I have a website that goes like this:
[login to view URL], where NNNN is a number starting with 1 and going upto 20000.
So it is
[login to view URL]
[login to view URL] etc.
When you reach the site, it will have three results:
1. Page would show: No practice ID
2. Page would show:
This site is not configured
3. A page without #1 or #2 above.
For the three examples output above, please see the attached files.
The crawler should identify the URLs that belong to #3 above and spit out an Excel file. I have defined an outline of the workflow below.
If you use Python, you will call ‘urllib2’ or ‘requests’ – requests is easy to work with.
These libs allows to call a URL and evaluate the response.
First is to check the response code – 200 or 404 or 500.
Next, these libs provide access to the response body.
Response body can be loaded as text for simple text search or it can be loaded as xml for advance traversal.
[Removed by Freelancer.com Admin]
18 freelancers are bidding on average €52 for this job
Hi, I have done several projects similar with this. Please view my portfolios and contact me to discuss your project. Let me know the details via interview. Thank you. Regards,
Hello There, I have rich experience in working with web scraping, web crawling. I would like to do your project. Message me for more details. We can discuss more over chat.
Hi, I have a great experience in writing scrapers (more than 70 for such web sites as [login to view URL] or [login to view URL]). Сan complete your project quickly, efficiently and qualitatively. Regards