Machine Learning / TensorFlow / OCR Document Classification and Data Extraction

€750-1500 EUR

Completed
Posted over 7 years ago

Paid on delivery
Looking for an experienced Machine Learning specialist who can help us build an advanced document categorisation and data extraction application using Google’s ML and TensorFlow, if determined suitable.

We process tens of thousands of documents each day from a library of 100 different categories, each of which could have 1000 different variations. Currently, we use traditional expression-based document categorisation and template-based data extraction, which is cumbersome to set up and manage.

As this is a pilot project, it will be built in stages, starting with 2 basic identity documents: a driver's licence and a passport. The aim is to receive an image of the document and the OCR-extracted unstructured text file; from these we want to determine the document type and then structure the text into an organised JSON format. Ideally the application should be able to receive any document, determine which ones are identity documents, determine what type of identity document they are, and extract the text into a structured format.

Please outline your experience with Machine Learning and TensorFlow to assist in candidate selection for this project. Once the pilot project is successful, an opportunity could be available to build a larger project.

Questions to respond to in your proposal:
- How would you approach this project?
- What roadblocks do you see in this project?
Project ID: 11290414
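
For illustration only, the "organised JSON format" mentioned in the description could look something like the sketch below. The posting does not define a schema, so the class name `ExtractedDocument` and every field name and value here are assumptions made up for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class ExtractedDocument:
    """Hypothetical output record; the posting does not specify a schema."""
    is_identity_document: bool
    document_type: Optional[str]  # e.g. "passport" or "drivers_licence"
    fields: Dict[str, str] = field(default_factory=dict)


def to_json(doc: ExtractedDocument) -> str:
    """Serialise the record to a JSON string with stable key order."""
    return json.dumps(asdict(doc), indent=2, sort_keys=True)


# Invented sample values, not taken from any real document.
example = ExtractedDocument(
    is_identity_document=True,
    document_type="passport",
    fields={"surname": "DOE", "given_names": "JANE", "document_number": "X1234567"},
)
print(to_json(example))
```

A non-identity document would simply carry `is_identity_document=False`, `document_type=None` and an empty `fields` mapping under this assumed layout.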

About the project

8 proposals
Remote project
Active 8 yrs ago

Awarded to:
I registered at Freelancer specifically in response to this posting. I'm an entrepreneur working in the last stages of a self-funded fintech project, needing to generate income to push through the last few months of development. I've been writing empirical algorithms for data collection, feature extraction, and analysis since 2010. My first major project was a smart OCR utility for screen-scraping online gaming clients, parsing text and translating it to a database in real time. Currently I'm working on an end-to-end learner for futures market portfolio optimization by swarming recurrent neural networks. My everyday work revolves around deep learning and statistical optimization. I prefer Keras, an ML library wrapping TensorFlow and Theano, in most cases. It's an approachable, well-documented and well-maintained library with lots of power, and its framework can be easily extended when circumstances require non-standard approaches. Your pilot project needs this: convolutional neural nets for document classification, sequence-to-sequence (bidirectional) recurrent neural nets for parsing text files, and Bayesian hyper-parameter search with Mint for the best general settings with heterogeneous data. Your obstacle is document standardization, workflow, and preprocessing. scikit-learn provides easy-to-use, open-source facilities for this purpose. I'd be happy to discuss further and provide code samples with tests upon request. Best of luck on this project, Josh W.
€1,111 EUR in 15 days
5.0 (1 review)
4.3
Hello. I have a master's degree in AI and have worked on image processing algorithms and tools for more than 7 years. I'm the developer of this app: [login to view URL] and have worked with the latest algorithms and tools for OCR, image detection, feature extraction, SIFT, SURF, deep learning, ... As you mentioned, I think we can use TensorFlow and also Tesseract OCR in this project. Microsoft also has a good open source OCR tool. This API uses it: [login to view URL] I think we can first train a classifier to detect the type of document. We can use features such as image features and raw text from the OCR algorithm to detect the type, then parse the OCR text for each type and convert it to JSON. I assume that you want to build something like this: [login to view URL] I have worked with it in one of my projects before. One of the issues is that the accuracy of the OCR tool/algorithm is not 100%, especially for noisy images. To fix this we can apply filters to the different fields to make sure that the output is fine. For example, we can require that all characters of a field be digits, or that the length of the field be 10, ... Please let me know if you have any questions. Thanks, Helmot
€777 EUR in 20 days
4.8 (148 reviews)
7.7
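
The per-field filters suggested in the proposal above (all characters must be digits, the field must have length 10) can be sketched roughly as follows. This is a minimal illustration, not the bidder's implementation; the field name `licence_number` and the rule table `FIELD_RULES` are hypothetical.

```python
import re
from typing import Callable, Dict, List

Check = Callable[[str], bool]


def digits_only(value: str) -> bool:
    """True if the value consists solely of ASCII digits."""
    return re.fullmatch(r"[0-9]+", value) is not None


def has_length(n: int) -> Check:
    """Build a check that accepts only values of exactly n characters."""
    def check(value: str) -> bool:
        return len(value) == n
    return check


# Hypothetical rule table: field name -> checks the OCR output must pass.
FIELD_RULES: Dict[str, List[Check]] = {
    "licence_number": [digits_only, has_length(10)],
}


def invalid_fields(record: Dict[str, str], rules: Dict[str, List[Check]]) -> List[str]:
    """Return the names of fields that are missing or fail any of their checks."""
    bad = []
    for name, checks in rules.items():
        value = record.get(name, "")
        if not all(check(value) for check in checks):
            bad.append(name)
    return bad


# An OCR misread such as "S" for "5" is flagged for review.
print(invalid_fields({"licence_number": "01234S6789"}, FIELD_RULES))
```

Fields that fail a check could be routed to manual review or re-run through OCR with different preprocessing; which of those is appropriate depends on the client's workflow.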
8 freelancers are bidding on average €1,196 EUR for this job
I have an MS in Software Engineering, including courses on Data Engineering and Artificial Intelligence. I know all the data mining techniques (prediction and classification) and data analysis techniques. I have worked with k-means, ID3, Bayes' theorem, confusion matrices, the Hungarian algorithm and so on. My research was on Rough Set Theory. The tools I use are Weka, Matlab, RapidMiner, SPSS, Java, R and Excel. Please see my profile and reviews as well. Thanks
€1,000 EUR in 20 days
4.9 (206 reviews)
7.2
Hi, have a good day. My interest in this project is to gain data analysis experience through this platform. All my experience so far has been in industrial analysis, but I'm now looking for a new challenge that gives me the opportunity to expand my knowledge in this area. I didn't find a roadblock for this project; my limited experience is a challenge for me, and my ability to learn quickly is an advantage in tackling it. At present I'm only studying for my master's degree in advanced math and computing, which gives me the opportunity and time to take this job. I hope you will take the time to get to know me. Regards.
€1,250 EUR in 10 days
0.0 (0 reviews)
0.0
Hello, I understand the initial scope of this project, although I would like to discuss the job further in order to prepare the final concept. After a complete discussion over a call or in chat, I will prepare the following for you: - Technical project proposal - Flow chart for this project - Execution plan (step-by-step procedure explaining how and when we will execute each task)
€1,764 EUR in 40 days
0.0 (0 reviews)
0.0
Hi, I am about to finish my Data Science certification along with my master's from Harvard University, and I have done a significant number of projects and coursework related to ML along the way. In addition, I have 7+ years of programming experience and would be able to handle your project efficiently. I have done projects related to text classification and NLP in both supervised and unsupervised learning scenarios, and I feel confident that I can solve this problem. Here are my answers: - How would you approach this project? Without going into too much detail, here is a brief summary of the approach. The first step is to classify the document. Without having looked at the raw data, I assume it would be easier to classify the document from the OCR text than from the image. We can try supervised or unsupervised approaches here, depending on exactly what data sets you have available. Obviously we will have to clean and normalize the data and do feature identification before moving forward. The next step is converting the data into a structured format. I would need to look at sample data sets to give an opinion on that; one approach could be to use NLP to tokenize the data set and then identify certain key tokens. Finally, the code should be modular, so that more document types can be added in future without breaking everything. - What roadblocks do you see in this project? Converting the unstructured data into structured data (char limit reached)
€1,666 EUR in 20 days
0.0 (0 reviews)
0.0
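
The supervised, text-first classification step described in the proposal above (tokenize the OCR output, then classify on the tokens) might be prototyped along these lines. This is a rough sketch using a small multinomial Naive Bayes model with add-one smoothing, which is one possible choice and not a method named by the bidder; the training snippets and labels are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Iterable, List, Optional


def tokenize(text: str) -> List[str]:
    """Lower-case the OCR text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over tokens, with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()               # label -> number of documents
        self.vocabulary = set()

    def fit(self, texts: Iterable[str], labels: Iterable[str]) -> "NaiveBayesTextClassifier":
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)
        return self

    def predict(self, text: str) -> Optional[str]:
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(self.doc_counts[label] / total_docs)
            label_total = sum(self.token_counts[label].values())
            for token in tokens:
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented OCR-like snippets; real training data would come from the client.
train_texts = [
    "PASSPORT United States of America surname given names nationality",
    "Passport No date of issue authority nationality",
    "DRIVER LICENSE class restrictions endorsements",
    "Driving licence categories valid from address",
    "INVOICE total amount due payment terms",
    "Invoice number VAT subtotal",
]
train_labels = ["passport", "passport", "drivers_licence",
                "drivers_licence", "other", "other"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("passport nationality date of birth"))
```

With realistic volumes, a library model (for example from scikit-learn) and a held-out test set would be the more usual route; the hand-rolled version here only keeps the example dependency-free.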
Hello ► We follow Agile Scrum. ► We can help you with this. ► Please ping me back to discuss roadblocks. ► For your information, we have a team of 85+ in-house developers skilled in all major technologies including RoR, PHP, .NET, Android and iOS, along with designers and SEO specialists. ❰1❱ 2000+ successful project deliveries. ❰2❱ 85+ in-house developers & CSM / CSPO. ❰3❱ Offices in USA / Canada / Ireland / UK. ❰4❱ Execution methodology: Agile Scrum. Looking forward to your reply ASAP. Thanks & regards
€1,000 EUR in 20 days
0.0 (1 review)
0.0

About the client

Bucuresti, Romania
5.0
7
Payment method verified
Member since Apr 2, 2011

Client Verification

Other jobs from this client

XSLT file transformation
$30-250 USD
Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)