Simple web scraping project

$80-210 USD

Cancelled
Posted over 11 years ago

Paid on delivery
This project involves scraping several websites to compile a database of contacts for a targeted marketing program in the USA. The contact details should include all available contact data that can be obtained from the websites, such as “Contact Name”, “Title”, “Areas of Specialty / Services Provided”, “Business Name”, “Address”, “Phone”, “Fax”, “e-mail”, “Contact's Website”, “Data Source (i.e. source website URL)”, and so forth. The contact details should be provided in Excel format, one contact per row, with column headings.

The specific website contact directories are as follows:

Sub-Project 1: [login to view URL]

The first sub-project is to get contact data for US pediatricians from Healthgrades.com. The pediatricians directory starts here: [login to view URL] From there you would drill down to find all the pediatric doctor pages, such as [login to view URL] and [login to view URL], and so on. The address page has the contact details. There are many thousands of pediatricians in the US, so I expect you will get a lot of data.

You may run into a technical challenge: most large content websites have installed scrape detection, which blocks repeated requests from the same IP, especially if they are made too quickly. If you are familiar with this problem, let me know. The work-around is to cap the request rate per IP and to use IP proxies. I have a piece of Java code I wrote a while back to do this, which I can give you as part of the project. Let me know if you need it. You just need to load a list of public IP proxy servers and rotate the requests across the proxies. Again, I don't know whether Healthgrades has this feature, but it is likely.

I would also like you to compile contact data for pediatric nurses. You can start the scrape process here: [login to view URL]

Sub-Project 2: [login to view URL]

The second sub-project is to scrape lactation consultant contacts from ILCA. The search page is located here: [login to view URL] Since the web form requires entry of a zip code, I suggest you use a list of zip codes for all major US cities, found here: [login to view URL] This should cover most of the country.

Sub-Project 3: [login to view URL]

The third sub-project is to scrape [login to view URL]. The contact info for all US doulas can be found at: [login to view URL] The results are paginated, so you would need to scrape each page.

The successful applicant may use any technology they wish to achieve this task most efficiently. For example, Excel VBA, PHP/MySQL, Java / RegEx, ScrapeBox, CodeCentrix, or iOpus iMacros are all possible ways to obtain the data. If some of the data can’t be scraped, some limited manual data entry may be required. The applicant should provide a fixed price for the entire project and a duration for completion.

Conclusion

I believe sub-project 1 is the most complex and sub-project 3 is the simplest. If you want to start with sub-project 3, I am fine with that, since it lets us see results most quickly.
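
For bidders unfamiliar with the work-around mentioned under sub-project 1 (cap the request rate per IP and rotate requests across proxies), the following is a minimal sketch of the idea. It is written in Python and is not the Java code referred to in the posting, which is not shown here. The class name `ThrottledProxyRotator`, the helper `fetch`, and their parameters are all hypothetical, and whether any of the target sites actually throttles by IP is an assumption.

```python
import itertools
import time
import urllib.request


class ThrottledProxyRotator:
    """Hand out proxies round-robin, keeping a minimum gap between uses of each one."""

    def __init__(self, proxies, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        proxies = list(proxies)
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._min_interval_s = min_interval_s
        self._last_used = {}  # proxy -> clock reading at its most recent use
        self._clock = clock   # injectable so the throttle can be tested without waiting
        self._sleep = sleep

    def next_proxy(self):
        """Return the next proxy, sleeping first if it was used too recently."""
        proxy = next(self._cycle)
        last = self._last_used.get(proxy)
        if last is not None:
            wait = self._min_interval_s - (self._clock() - last)
            if wait > 0:
                self._sleep(wait)
        self._last_used[proxy] = self._clock()
        return proxy


def fetch(url, proxy, timeout_s=30):
    """Fetch one page through an HTTP proxy given as a URL, e.g. 'http://host:port'.

    Hypothetical helper: error handling and retries are left out on purpose.
    """
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout_s) as response:
        return response.read()
```

In use, a scraper would build one rotator from its proxy list, then call `fetch(url, rotator.next_proxy())` for each page. With N proxies and a minimum interval of T seconds per proxy, the overall rate tops out at roughly N requests every T seconds. Public proxies fail often, so a real run would also need retries and a way to drop dead proxies, neither of which is shown.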
Project ID: 4028362

About the project

Remote project
Active 11 yrs ago

About the client

Birmingham, United Kingdom
4.8
155
Payment method verified
Member since Jun 27, 2012
