I'm looking to have an entire website, both data and media (images), scraped and saved to a database.
Extraction target site: See attachment to this project for the URL.
1. Select Counties in top navbar
2. Click on the current date, i.e. "Wed
3. It lists the arrests of that date. People's names. Click on every name on every day and grab their information and mugshot. In this case, Dovon Anderson is first. We click his name/link.
4. Scrape all of this data and store it in a MySQL database. Classify by county and state (right now we're just doing FL, but we will move on to the other states soon).
- Data to be scraped on an individual mugshot page: everything except the advertisements and 'Tag This Mugshot'.
Scrape: Arrest Information (Full name, Date, Time, Arresting Agency, Total Bond), Personal Information, Charges.
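To make the fields above concrete, here is a rough sketch of how one scraped record could map to a MySQL row. The table name `arrests`, the column names, and the `mugshot_path` column are my own placeholders, not anything taken from the target site. Personal Information and Charges are stored as JSON text because I don't know their exact layout yet. The `%s` placeholders follow the parameter style used by the common Python MySQL drivers.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical schema; names and column sizes are assumptions for illustration.
CREATE_ARRESTS_TABLE = """
CREATE TABLE IF NOT EXISTS arrests (
    id INT AUTO_INCREMENT PRIMARY KEY,
    state CHAR(2) NOT NULL,
    county VARCHAR(64) NOT NULL,
    full_name VARCHAR(255) NOT NULL,
    arrest_date DATE,
    arrest_time TIME,
    arresting_agency VARCHAR(255),
    total_bond DECIMAL(12, 2),
    personal_info TEXT,
    charges TEXT,
    mugshot_path VARCHAR(512)
)
"""

# Column order shared by the INSERT statement and its parameter tuple.
COLUMNS = (
    "state", "county", "full_name", "arrest_date", "arrest_time",
    "arresting_agency", "total_bond", "personal_info", "charges",
    "mugshot_path",
)


@dataclass
class ArrestRecord:
    state: str
    county: str
    full_name: str
    arrest_date: str       # ISO date string, e.g. "YYYY-MM-DD"
    arrest_time: str       # e.g. "HH:MM:SS"
    arresting_agency: str
    total_bond: str        # kept as text until the site's format is known
    personal_info: Dict[str, str] = field(default_factory=dict)
    charges: List[str] = field(default_factory=list)
    mugshot_path: str = ""  # where the downloaded image was saved


def to_insert(record: ArrestRecord) -> Tuple[str, tuple]:
    """Build a parameterized INSERT statement and its values for one record."""
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    sql = f"INSERT INTO arrests ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    params = (
        record.state, record.county, record.full_name,
        record.arrest_date, record.arrest_time, record.arresting_agency,
        record.total_bond,
        json.dumps(record.personal_info),  # nested data serialized as JSON
        json.dumps(record.charges),
        record.mugshot_path,
    )
    return sql, params
```

The returned pair is meant to be passed to a driver cursor, along the lines of `cursor.execute(sql, params)`, so values are escaped by the driver rather than pasted into the SQL string.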
This should be repeated for all counties in FL, as well as for all dates listed in the gray bar that lists dates and arrests like the one above. I have a list of 40+ private proxies to be used and am willing to use a captcha service if need be. I prefer Mac OS X but can run it on Windows if that's easier for you.
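As a rough illustration of the repetition described above, the sketch below pairs every county with every date and assigns proxies in round-robin order. The function name `plan_jobs` and the sample values in the usage note are hypothetical; the real county names, dates, and proxy addresses would come from the site and from my proxy list.

```python
from itertools import cycle, product
from typing import Iterable, List, Tuple


def plan_jobs(
    counties: Iterable[str],
    dates: Iterable[str],
    proxies: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return one (county, date, proxy) job per county/date pair.

    Proxies are reused in round-robin order so the load is spread evenly.
    """
    proxy_list = list(proxies)
    if not proxy_list:
        raise ValueError("at least one proxy is required")
    pool = cycle(proxy_list)
    return [(county, date, next(pool)) for county, date in product(counties, dates)]
```

For example, `plan_jobs(["A", "B"], ["d1", "d2"], ["p1", "p2", "p3"])` yields four jobs, with the fourth wrapping back around to `p1`. Each job's proxy could then be handed to whatever HTTP client is used, for instance via the `proxies=` argument of `requests.get`.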
Summary: I want all the data and mugshot photos from [url removed, login to view] put into a database periodically so my dev can use the data to display on our own website.
Output: MySQL database
Let me know what you think. Thanks.