I want a developer with the following skills to write a simple report script:
Skills:
- Win32 Perl (ActiveState)
- spidering using LWP (regular expression skills needed)
- MS Excel access using OLE
Report script functionality:
- Open the Excel worksheet named on the command line and read a list of internet search engine query URLs, e.g. url #1, url #2, url #3, etc. Each URL has one or more associated regular expressions.
- Spider each URL read from the worksheet, strip away HTML tags, scripts (JavaScript etc.) and similar markup so that only the body text remains, and collect statistics from the body text using the regular expression(s) associated with that URL.
- Update the Excel worksheet with the date of the spidering, the URL id, the regular expression id(s), and the statistics gathered.
- It must be possible for a human to edit the Excel worksheets while the script is not running, so that extra URLs can be added and Excel graphs of the collected data can be created.
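To make the strip-and-count step concrete, here is a minimal sketch in core Perl. It is only an illustration under stated assumptions: the helper names body_text and count_matches are hypothetical, "statistics" is assumed to mean a match count per regular expression, and the sample HTML and patterns are invented. Fetching pages with LWP and reading or writing the worksheet through OLE are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce an HTML page to its body text.
# Regex-based stripping is crude; the real script may prefer an HTML parser.
sub body_text {
    my ($html) = @_;
    # Keep only what is inside <body>...</body>, if such a section exists.
    my ($body) = $html =~ m{<body\b[^>]*>(.*?)</body\s*>}is;
    $body = $html unless defined $body;
    $body =~ s{<(script|style)\b[^>]*>.*?</\1\s*>}{ }gis;  # drop scripts and styles
    $body =~ s{<!--.*?-->}{ }gs;                            # drop HTML comments
    $body =~ s{<[^>]*>}{ }g;                                # drop remaining tags
    $body =~ s{\s+}{ }g;                                    # collapse whitespace
    $body =~ s{^ | $}{}g;                                   # trim both ends
    return $body;
}

# Hypothetical helper: count the matches of each pattern in the text.
# Assumes the wanted statistic is simply the number of matches.
sub count_matches {
    my ($text, @patterns) = @_;
    my %stats;
    for my $pattern (@patterns) {
        my $count = () = $text =~ /$pattern/g;
        $stats{$pattern} = $count;
    }
    return \%stats;
}

# Invented sample page standing in for a fetched search result.
my $html = <<'HTML';
<html><head><title>Results</title><script>var perl = 1;</script></head>
<body><h1>Perl jobs</h1><p>Perl developer wanted. <b>perl</b> scripting.</p></body></html>
HTML

my $text  = body_text($html);
my $stats = count_matches($text, '(?i)perl', 'developer');
print "$text\n";
printf "%s => %d\n", $_, $stats->{$_} for sort keys %$stats;
```

In the full script, each count would be written back to the worksheet together with the date, the URL id and the regular expression id.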
Intended usage:
The Perl script will be run once per day to update the Excel worksheets. The worksheets themselves will be used to add new URLs and to browse the collected statistics.