Develop web scraping software

Closed Posted Sep 14, 2015 Paid on delivery

I’m looking for an experienced data extraction developer to deliver a custom project.

The goal is to automatically retrieve information from a major search engine. This is not about the regular search results, but about the information snippets shown for important places, which contain the place name, phone number and opening hours.

My initial analysis of the project showed that the search engine generates this content dynamically using obfuscated/compressed JavaScript code. A plain wget/curl request will not return the results. You must have experience in this area, since the task does not look trivial.

Input data (string):

- Search query

Output data (JSON):

- Place name

- Address

- Phone number

- Opening hours

I see this as a command-line script that takes the search query as a parameter and writes the JSON response to STDOUT.
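To make the expected interface concrete, here is a minimal Python sketch of the command-line wrapper only. The JSON key names (`place_name`, `address`, `phone_number`, `opening_hours`), the `--timeout` option name and the `fetch_place` function are my assumptions for illustration; `fetch_place` is a hypothetical placeholder and does not perform any extraction.

```python
import argparse
import json
import sys


def build_parser():
    parser = argparse.ArgumentParser(
        description="Look up a place snippet and print it as JSON.")
    parser.add_argument("query", help="search query string")
    # Assumed option name; the brief only says the timeout must be a parameter.
    parser.add_argument("--timeout", type=float, default=10.0,
                        help="connection timeout in seconds")
    return parser


def fetch_place(query, timeout):
    # Hypothetical placeholder: the real extraction logic would go here.
    # The key names below are assumptions, not part of the brief.
    return {
        "place_name": None,
        "address": None,
        "phone_number": None,
        "opening_hours": None,
    }


def main(argv=None):
    args = build_parser().parse_args(argv)
    record = fetch_place(args.query, args.timeout)
    # Only the JSON document goes to STDOUT, so the output can be piped.
    json.dump(record, sys.stdout, ensure_ascii=False)
    sys.stdout.write("\n")
    return 0
```

In a real script, `main()` would be called under the usual `if __name__ == "__main__":` guard, so that an invocation along the lines of `python places.py "some query" --timeout 5` (file name hypothetical) prints a single JSON object.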

The script must use the proxy service provided by [login to view URL] and include automated dead-proxy detection and rotation. The connection timeout must be a parameter.
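As a rough illustration of what I mean by dead-proxy detection and rotation, here is a minimal Python sketch using only the standard library. The class and function names (`ProxyPool`, `open_via_proxy`, `fetch_with_rotation`) are hypothetical, and how the proxy list is obtained from the proxy service is left out. Treating any connection-level error as "proxy is dead" is a simplifying assumption; a real implementation would probably distinguish blocked requests from unreachable proxies.

```python
import itertools
import urllib.request


class ProxyPool:
    """Round-robin pool that skips proxies marked as dead."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._dead = set()
        self._cycle = itertools.cycle(self._proxies)

    def next_proxy(self):
        # Look at each proxy at most once per call.
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._dead:
                return proxy
        raise RuntimeError("no live proxies left")

    def mark_dead(self, proxy):
        self._dead.add(proxy)


def open_via_proxy(url, proxy, timeout):
    # Route both HTTP and HTTPS traffic through the given proxy URL.
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def fetch_with_rotation(url, pool, timeout, opener=open_via_proxy):
    while True:
        proxy = pool.next_proxy()  # raises RuntimeError once the pool is exhausted
        try:
            return opener(url, proxy, timeout)
        except OSError:
            # URLError, HTTPError, timeouts and connection errors are all
            # OSError subclasses; assumption: any of them means a dead proxy.
            pool.mark_dead(proxy)
```

The `opener` argument is injectable so that the rotation logic can be exercised without network access.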

The search engine uses HTTPS-encrypted connections.

Interpreted languages such as PHP or Python are preferred.

The script must run on a headless Linux/Debian server. It must not depend on a web browser or any other GUI application, so Selenium, for instance, is not acceptable.

The script must be able to run as multiple concurrent instances.

During testing you will provide an online demo of the script, where the search query is passed as a URL GET/POST parameter and the response contains the JSON.
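For the demo, something as simple as the following Python sketch would be enough. It uses only the standard library; the parameter name `q`, the port, the JSON key names and the `lookup` function are assumptions for illustration, and `lookup` is a hypothetical stand-in for the real scraper.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlsplit


def lookup(query):
    # Hypothetical placeholder for the real scraper; key names are assumed.
    return {"place_name": None, "address": None,
            "phone_number": None, "opening_hours": None}


def query_from_request(path, body=b""):
    # Accept the query from the URL (GET) or a form-encoded body (POST).
    params = parse_qs(urlsplit(path).query)
    params.update(parse_qs(body.decode("utf-8")))
    values = params.get("q")  # "q" is an assumed parameter name
    return values[0] if values else None


class DemoHandler(BaseHTTPRequestHandler):
    def _respond(self, query):
        if query is None:
            status, payload = 400, {"error": "missing 'q' parameter"}
        else:
            status, payload = 200, lookup(query)
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._respond(query_from_request(self.path))

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._respond(query_from_request(self.path, self.rfile.read(length)))


def serve(port=8000):
    # One thread per request, so several queries can be handled at once.
    ThreadingHTTPServer(("", port), DemoHandler).serve_forever()
```

Calling `serve()` starts the demo; a request such as `/search?q=some+query` would then return the JSON record.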

On completion you must provide the full, unencrypted source code of the project, along with build/compilation instructions if needed.

Data Mining Java PHP Python Web Scraping

Project ID: #8472130

About the project

44 proposals Remote project Active Nov 3, 2015