Website scraper code, preferably using OutWit-Hub (since I have a copy), although a more suitable package would probably be OK.
The target site is a real estate website which provides a DB of agents across the US, searchable by location (town, suburb). I am interested in the selling/listing agents (the majority). Done manually, each agent name in the list for the target location has to be clicked one at a time. This loads a second page for that agent, which contains two of the three target fields: the agent name and the agency/brokerage name. The side panel box containing the agency name also has a “Website” link (for most records, probably 70-80% of them). This link needs to be opened in order to find the third required field for the record: an email address for the agent. The “Website” pages need to be scanned for the email address, which may be in the header banner or anywhere in the body of the page. It is also often on other pages of this third-level site, such as an “About Us” page. The “Contact Us” page usually opens a contact form and does not contain an email address, although some do.
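To illustrate the email scan described above, here is a minimal Python sketch, assuming the HTML of a third-level page has already been downloaded as a string. The function name `extract_emails` and the sample markup are hypothetical, and the regular expression is a simple approximation that will miss obfuscated addresses (for example "name [at] domain") and may need tuning for real pages.

```python
import html
import re

# Simple approximation of an email address; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# File extensions that look like email domains in image names such as "logo@2x.png".
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(page_html):
    """Return unique, lower-cased email addresses found in an HTML string."""
    # Decode entities first so "info&#64;example.org" becomes "info@example.org".
    text = html.unescape(page_html)
    found = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email.endswith(IMAGE_SUFFIXES):
            continue  # skip retina-image file names, not real addresses
        if email not in found:
            found.append(email)
    return found


if __name__ == "__main__":
    # Hypothetical sample markup, covering a mailto link and an entity-encoded address.
    sample = (
        '<a href="mailto:Jane.Doe@example.com">Email</a> '
        "or info&#64;agency.example.org. "
        '<img src="logo@2x.png">'
    )
    print(extract_emails(sample))
```

The same function could be run over the home page, the “About Us” page and the “Contact Us” page in turn, stopping at the first page that yields an address.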
Most agent records use a form for visitor communication with the agent, no doubt because the site operator gets some sort of fee or commission for introducing a prospective property seller to the agent, and also because most agents are lazy or stupid and are trying to avoid automatically generated spam (in the process they also “avoid” prospective clients who hate filling out forms). But around half have the email address as well, and these are my targets.
The top-level agent listing for a specific district in the directory spans multiple pages, each containing maybe 10 agent record summaries, and there are typically 100 to 500 agent records for any given location. So the first-level search needs to progress automatically through the entire first level, page by page.
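The page-by-page progression could be sketched in Python roughly as follows. This is only an outline under assumptions: the site's real pagination scheme is not known here, so the page download is passed in as a hypothetical `fetch_page` function that takes a page number and returns the agent links found on that listing page (an empty list once the listing runs out).

```python
def collect_agent_links(fetch_page, max_pages=100):
    """Walk listing pages 1, 2, 3, ... and gather unique agent links in order.

    fetch_page is a caller-supplied (hypothetical) function: page number -> list of links.
    max_pages is a safety cap so a misbehaving site cannot cause an endless loop.
    """
    links = []
    for page_number in range(1, max_pages + 1):
        page_links = fetch_page(page_number)
        if not page_links:
            break  # an empty page is taken to mean the end of the listing
        for link in page_links:
            if link not in links:
                links.append(link)  # skip agents repeated across pages
    return links


if __name__ == "__main__":
    # Stand-in for the real site: two listing pages with one repeated agent.
    fake_site = {1: ["/agent/1", "/agent/2"], 2: ["/agent/2", "/agent/3"]}
    print(collect_agent_links(lambda n: fake_site.get(n, [])))
```

With about 10 summaries per page and up to 500 records per location, the default cap of 100 pages leaves a comfortable margin.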
There is no expressed restriction on manual roaming and extracting fields for multiple records. I do not use the data in any way which is competitive with the DB operator.
I will provide the DB id and a sample (manual) extraction sheet once confidential dialogue with a prospective Freelancer is established. The DB is slow and clunky, and quite a few website links do not work. The DB is also fairly volatile due to turnover of agent members/subscribers and the presumed failure of member agents to update their data.
It would also be a good idea to capture both the agent name and agency name fields from the second-level pages even for those agent records without a website link (these can be saved to the same spreadsheet and separated out later by myself).
If it is not possible to access and scan the third-level (agent/agency website) pages, it would still be helpful to save me wasting time manually searching all the records which don’t have a website link on the second-level page side panel (saving these as per the preceding paragraph) and, for those with a website link, to simply give me the website hyperlink address instead of the email address. I have never seen an email address on either of the first two levels (because it would allow the visitor to bypass the DB operator).
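As an illustration of the output described above, the following Python sketch writes one spreadsheet row per agent, leaving the website and email columns blank when they are not available. The column names and the helper names `build_row` and `rows_to_csv` are assumptions for illustration only; the real layout would follow the sample extraction sheet.

```python
import csv
import io

# Hypothetical column layout; the real one would follow the sample extraction sheet.
FIELDS = ["agent_name", "agency_name", "website", "email"]


def build_row(agent_name, agency_name, website=None, emails=()):
    """Build one output row; website and email stay blank when not found."""
    return {
        "agent_name": agent_name,
        "agency_name": agency_name,
        "website": website or "",
        "email": "; ".join(emails),
    }


def rows_to_csv(rows):
    """Serialise rows to CSV text that opens directly in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        build_row("Jane Doe", "Doe Realty", "http://example.com", ["jane@example.com"]),
        build_row("John Roe", "Roe & Co", "http://example.org"),  # link only, no email found
        build_row("Ann Poe", "Poe Homes"),  # no website link on the second-level page
    ]
    print(rows_to_csv(rows))
```

Keeping all three cases in one sheet means the records can be sorted or filtered on the website and email columns later.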
Dear Sir! I write scraper code; this is my job. I'm ready to begin now. Please PM me, and check my reviews and profile: https://www.freelancer.com/u/ProfSoftStudio.html