I need a Python 3 script that will extract specific parts of text from a PDF file and generate an .rtf file with those text elements isolated, and some extra text added.
The input file is a typical "Written Discovery" request used in litigation (when people are suing/being sued). The first few pages have the case information and preliminary definitions. Then there are a series of numbered requests that all have the same title, but with a different number. Following the title, there is the text of the discovery request.
In the example I am using (see file: [login to view URL]), each discovery request is titled "SAMPLE DISCOVERY REQUEST" but in "real world" documents they could be a number of different titles (eg SPECIAL INTERROGATORY, REQUEST FOR PRODUCTION, REQUEST FOR ADMISSION), although they will all be the same in a single document.
I would like a script that uses OCR to extract all of the discovery requests and put them into an .rtf file with limited formatting (bold and underlined). Each request should be followed by the following text: RESPONSE TO [TITLE OF DISCOVERY REQUEST] (see file: [login to view URL])
What makes this tricky is that these documents always appear on "Pleading Paper" where each line is numbered on the left hand side of the page. This causes the text in OCR output to be interrupted by numbers (see file: OCR [login to view URL]).
The script will need to determine when the requests begin, what they are called, list all of them (in the example, there are 9, but 30-50 is more typical) and add in the "RESPONSE TO" language.
This is the first step in what could be a much larger project for the right developer.
Please let me know if you can handle this project in a short timeframe, and if you have any questions.
Dear client,
Thanks a lot for taking your precious time to read my message.
After browsing your job description, I am very interested in your project and I believe I’m qualified for the task.
Regarding OCR, I have +10 years in this field and have worked on many successful OCR projects in the past.
If you are looking for a highly skilled and rich experienced Image processing expert with a deep knowledge, a professional attitude, excellent communication skills and the highest code quality, then I'm the person you are looking for.
I'd like to talk more about this.
If given a chance, I am highly confident in my ability to deliver the highest quality. I am confident that my involvement in your project will bring it to a successful launch, on time and within budget.
I look forward to hearing from you.
Kind Regards
Hello Sir,
I am the expert freelancer here. I am on the 6th position through out the world to deliver the quality job. I have deliver here more than 395 + projects with 100% client satisfaction.
I can help you with Python Script PDF OCR to .rtf
I have more than 5 years of the experience in Python
Please ping me for more discussion.
Thanks.
Hello!\nI am a python developer.\nI looked at your project and it seems interesting.\nI have all necessary skills required for this project.\nPing me to discuss in detail.
As an:
1- Bachelor of Applied Mathematics
2- Engineer in Statistics and Applied Economics
3- Microsoft Office Technician (Microsoft Graduate: MOS Diploma),
I am able, available and ready to do this kind of work in the best conditions,