Custom scraping of media directories (pay per directory scraped)
$5-10 USD
Cancelled
Posted almost 12 years ago
Paid on delivery
Build custom scrapers that take profile data from many different online media directories and output it as YAML.
First directory to scrape: California Institute of Technology
<[login to view URL]>
## Deliverables
Job:
Build custom scrapers that take profile data from existing media directories and output it to a YAML file.
Pay:
Per directory scraped. The price includes up to two revisions to the script. Small revisions are often needed because each directory is unique, and we will need to give feedback before the YAML file is usable.
Details:
The data to be scraped is most often on individual profile pages within the directory. You will need to export the data as a YAML file that meets the following requirements:
1. List the profiles in alphabetical order.
2. Embed a predetermined location in each profile.
3. Leave out profiles with no expertise listed.
4. Capture name, title, phone number(s), email(s), expertise, links, awards and other profile information.
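The first three requirements can be handled after scraping, as a separate post-processing step. The following is only a minimal sketch in Python, assuming each scraped profile is a dict whose `fn` key holds the "Last, First" name and whose `tags` key holds the expertise list, as in the example output in this posting. The function name `prepare_profiles` and the sample records are hypothetical; the location values are the Harvard ones given later in this posting.

```python
from typing import Any, Dict, List


def prepare_profiles(raw: List[Dict[str, Any]],
                     location: Dict[str, str]) -> List[Dict[str, Any]]:
    """Filter, tag with a location, and sort scraped profiles."""
    kept = []
    for profile in raw:
        # Requirement 3: skip profiles with no expertise listed.
        tags = [t.strip() for t in profile.get("tags", []) if t and t.strip()]
        if not tags:
            continue
        cleaned = dict(profile)
        cleaned["tags"] = tags
        # Requirement 2: embed the predetermined location in each profile.
        cleaned["adr"] = dict(location)
        kept.append(cleaned)
    # Requirement 1: alphabetical order by name, ignoring case.
    kept.sort(key=lambda p: p["fn"].casefold())
    return kept


# Location taken from the Harvard entry in this posting.
HARVARD = {"locality": "Cambridge", "region": "Massachusetts",
           "postal-code": "02138"}

# Made-up sample records, for illustration only.
sample = [
    {"fn": "Zed, Ann", "tags": ["Ethics"]},
    {"fn": "adams, Bo", "tags": []},
    {"fn": "Baker, Cy", "tags": ["  Policy "]},
]
result = prepare_profiles(sample, HARVARD)
print([p["fn"] for p in result])
```

Keeping this step separate from the page-fetching code means the per-directory scraper only has to produce raw dicts, which may make the small per-directory revisions easier.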
This is the first media directory to scrape:
California Institute of Technology
<[login to view URL]>
Example of a profile output as YAML:
- :fn: Adell, Bernard L
  :tel: 613-533-6000 x74256
  :email: [login to view URL]@[login to view URL]
  :url: [login to view URL]
  :faculty: Faculty of Law
  :department:
  - Faculty of Law
  :tags:
  - Labour law
  - Employment law
  - Essential services strikes
  - Essential services lockouts
  - Employment law reform
  :org:
    :"organization-name": Queen's University
  :adr:
    :locality: Kingston
    :region: Ontario
    :"postal-code": K7L3N6
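The example above writes every key with a leading colon, which is how Ruby's YAML library writes symbol keys, and quotes key names that contain a hyphen. Below is a rough sketch of one way to produce that layout from Python without a YAML library. It assumes all values are strings, lists of strings, or nested dicts of strings; the helper names (`scalar`, `key`, `emit_mapping`, `emit_profiles`) are hypothetical. In practice a YAML library would probably be the safer choice, and this emitter quotes more values than the example does (for instance, phone numbers that start with a digit).

```python
import json
import re
from typing import Any, Dict, List

# Values matching this pattern are written unquoted; everything else is quoted.
_PLAIN = re.compile(r"[A-Za-z][A-Za-z0-9 ,.'@/_-]*")
# Words a YAML parser may read as booleans or null rather than as strings.
_RESERVED = {"true", "false", "yes", "no", "on", "off", "null", "y", "n"}


def scalar(value: str) -> str:
    """Return value unquoted when clearly safe, else as a double-quoted scalar."""
    safe = (
        _PLAIN.fullmatch(value)
        and not value.endswith(" ")
        and value.lower() not in _RESERVED
    )
    return value if safe else json.dumps(value, ensure_ascii=False)


def key(name: str) -> str:
    """Render a colon-prefixed key, quoting names that contain a hyphen."""
    return f':"{name}":' if "-" in name else f":{name}:"


def emit_mapping(mapping: Dict[str, Any], indent: str) -> List[str]:
    lines = []
    for name, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{indent}{key(name)}")
            lines.extend(emit_mapping(value, indent + "  "))
        elif isinstance(value, list):
            # List items sit at the same indent as their key, as in the example.
            lines.append(f"{indent}{key(name)}")
            lines.extend(f"{indent}- {scalar(item)}" for item in value)
        else:
            lines.append(f"{indent}{key(name)} {scalar(value)}")
    return lines


def emit_profiles(profiles: List[Dict[str, Any]]) -> str:
    out = []
    for profile in profiles:
        body = emit_mapping(profile, "  ")
        if not body:
            continue
        # The first line of each profile carries the "- " sequence marker.
        body[0] = "- " + body[0][2:]
        out.extend(body)
    return "\n".join(out) + "\n"


profile = {
    "fn": "Adell, Bernard L",
    "tel": "613-533-6000 x74256",
    "tags": ["Labour law", "Employment law"],
    "org": {"organization-name": "Queen's University"},
    "adr": {"locality": "Kingston", "region": "Ontario",
            "postal-code": "K7L3N6"},
}
text = emit_profiles([profile])
print(text)
```

Quoting anything that is not obviously a plain word is a deliberately cautious choice: a zip code such as 02138 stays a string instead of being read as a number by the consuming parser.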
Additional media directories to be scraped (others will be added if the work is done well):
Harvard University (location to embed in all Harvard profiles: Cambridge, Massachusetts, zip code 02138):
Harvard Divinity School
<[login to view URL]>
Harvard Graduate School of Education
<[login to view URL]>
Harvard's John F. Kennedy School of Government
<[login to view URL]>
Note:
The final script must be sent along with the YAML file. All work is the property of [[login to view URL]][1].
I attest under penalty of perjury that no information gathered will be used in violation of the rules of the source of that information. I additionally attest that the information will not be gathered or used in violation of the CAN-SPAM Act or any other U.S. law.