Need to scrape data from a public source, save it as HTML, and generate an RSS feed
₹1500-12500 INR
Closed
Posted almost 10 years ago
Paid on delivery
A PHP script is needed to do the following:
The original website needs to be scraped; its structure is given below. The fetched data must be saved as HTML files in the corresponding folder structure of the target website, which is also given below. The PHP file will be stored in a protected directory and run through the PHP CLI at a specific time by a cron job. The inputs to the PHP script are given below; the programmer (you) will use them on the original website to retrieve the relevant data.
Once the data is obtained and stored, its details must be available from the target website through an RSS feed. The RSS feed will be derived only from data on the target website. The format of the RSS feed is given below.
Terminology: The website to be scraped is called the 'Original website'. The place where the data is stored is called the 'Target website'.
Original Website Data structure:
Category A
State
From Date
To Date
Category B
State
District
From Date
To Date
Target Website Data structure:
Category A -> State -> Year -> Month
Category B -> State -> District -> Year -> Month
Example:
Category B -> Tamil Nadu -> Chennai (South) -> 2014 -> 01 -> [login to view URL]
Filename Format: case-<case number>-<Year>-<month>-<date>
Example: [login to view URL]
Input to php script:
Example a) php /protecteddirectory/[login to view URL]
Example b) php /protecteddirectory/[login to view URL]
Example of Output storage path:
/webdirectory/tamilnadu/chennai-South/2014/01/[login to view URL]
The webdirectory is publicly accessible.
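The folder structure and filename format above could be assembled roughly as follows. This is only a sketch: the helper names slugify and buildStoragePath are hypothetical, the ".html" extension is an assumption (the posting's filename example is not visible), and the slug rule is a guess that yields "tamil-nadu" and "chennai-south" rather than the exact spellings in the example path.

```php
<?php
// Hypothetical helper: lower-case a name and replace every run of
// non-alphanumeric characters with a single hyphen.
function slugify(string $name): string
{
    $slug = strtolower(trim($name));
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

// Build <webRoot>/<state>/[<district>/]<year>/<month>/case-<case number>-<Y>-<m>-<d>.html
// Pass null as $district for Category A, which has no district level.
function buildStoragePath(
    string $webRoot,
    string $state,
    ?string $district,
    string $caseNumber,
    DateTimeInterface $date
): string {
    $parts = [rtrim($webRoot, '/'), slugify($state)];
    if ($district !== null) {
        $parts[] = slugify($district);
    }
    $parts[] = $date->format('Y');
    $parts[] = $date->format('m');
    // The case number may contain slashes (e.g. FA/465/2013), so it is slugified too.
    $parts[] = sprintf('case-%s-%s.html', slugify($caseNumber), $date->format('Y-m-d'));
    return implode('/', $parts);
}
```

For example, buildStoragePath('/webdirectory', 'Tamil Nadu', 'Chennai (South)', 'FA/465/2013', new DateTimeImmutable('2014-01-15')) gives /webdirectory/tamil-nadu/chennai-south/2014/01/case-fa-465-2013-2014-01-15.html. The slug rule would need adjusting if the exact folder names from the example are required.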
1) Access key: 16 digits.
2) date=today processes all entries on the original website for today.
3) 'eff' is the start date (From Date) to be used on the original website.
4) 'dis' is the end date (To Date) to be used on the original website.
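The four inputs above could be read and validated along these lines. This is a sketch under assumptions: the real command-line format is not visible in the posting, so "name=value" arguments, the parameter name "key" for the access key, and the YYYY-MM-DD date format are all hypothetical, as is the function name parseInputs.

```php
<?php
// Parse "name=value" arguments (assumed format) into a validated date range.
// $now is injectable so that "date=today" can be tested deterministically.
function parseInputs(array $args, ?DateTimeImmutable $now = null): array
{
    $now = $now ?? new DateTimeImmutable('today');

    $params = [];
    foreach ($args as $arg) {
        if (strpos($arg, '=') === false) {
            continue; // ignore anything that is not name=value
        }
        [$name, $value] = explode('=', $arg, 2);
        $params[$name] = $value;
    }

    // 1) The access key must be exactly 16 digits ("key" is an assumed name).
    if (!isset($params['key']) || preg_match('/^\d{16}\z/', $params['key']) !== 1) {
        throw new InvalidArgumentException('Access key must be exactly 16 digits.');
    }

    // 2) date=today: use today as both start and end date.
    if (($params['date'] ?? null) === 'today') {
        return ['key' => $params['key'], 'from' => $now, 'to' => $now];
    }

    // 3) and 4) eff = start date, dis = end date (YYYY-MM-DD is assumed).
    if (!isset($params['eff'], $params['dis'])) {
        throw new InvalidArgumentException('Give either date=today or both eff and dis.');
    }
    $from = DateTimeImmutable::createFromFormat('!Y-m-d', $params['eff']);
    $to   = DateTimeImmutable::createFromFormat('!Y-m-d', $params['dis']);
    if ($from === false || $to === false || $from > $to) {
        throw new InvalidArgumentException('eff and dis must be dates with eff <= dis.');
    }

    return ['key' => $params['key'], 'from' => $from, 'to' => $to];
}
```

From the CLI script it would be called as parseInputs(array_slice($argv, 1)). Comparing the supplied key against the expected secret is left out here.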
HTML meta tags will be generated from the content of the original website. The title tag will combine the values of the Case number, Complainant, and Respondent columns of the main table (example: Case FA/465/2013 - Parvez Ali VS Smt. Saswati Das).
The description tag will contain the 160 characters of the document that follow the keyword ": O R D E R :". If this keyword is not present, pick a random sentence from the middle of the document.
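The title and description rules could be sketched as below. The function names buildTitle and buildDescription are hypothetical, the input is assumed to be the document's plain text with tags already stripped, and "middle of the document" is interpreted here as the middle third of its sentences, which is only one possible reading.

```php
<?php
// Title: "Case <case number> - <complainant> VS <respondent>".
function buildTitle(string $caseNumber, string $complainant, string $respondent): string
{
    return sprintf('Case %s - %s VS %s', $caseNumber, $complainant, $respondent);
}

// Description: up to $limit characters following ": O R D E R :",
// otherwise a random sentence from the middle third of the text.
function buildDescription(string $text, int $limit = 160): string
{
    // Collapse whitespace so the keyword matches across line breaks.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    $keyword = ': O R D E R :';

    $pos = strpos($text, $keyword);
    if ($pos !== false) {
        // substr() counts bytes; mb_substr() would be safer for multibyte text.
        return trim(substr($text, $pos + strlen($keyword), $limit));
    }

    // Fallback: split after sentence-ending punctuation and pick from the middle.
    $sentences = preg_split('/(?<=[.!?])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $count = count($sentences);
    if ($count === 0) {
        return '';
    }
    $lo = intdiv($count, 3);
    $hi = max($lo, intdiv(2 * $count, 3) - 1);
    return substr($sentences[random_int($lo, $hi)], 0, $limit);
}
```

Both values should still be passed through htmlspecialchars() before being written into the title and meta tags of the saved HTML file.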
RSS feed Link type A: [login to view URL]
RSS feed Link type B: [login to view URL]
The RSS feed will include the subfolders of the target website directory (all states and districts).
RSS Format and version:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="[login to view URL]">
The RSS title and description will be taken from the meta tags.
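Deriving the feed purely from the stored files could look roughly like this. It is a sketch, not the required implementation: the function names are hypothetical, stored files are assumed to end in ".html", the base URL passed in is a placeholder, and the xmlns:atom attribute is left out because its URL is not visible in the posting.

```php
<?php
// Escape text for use inside an XML element.
function xmlEscape(string $s): string
{
    return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Read the <title> and the description meta tag from a stored HTML file.
function readTitleAndDescription(string $htmlFile): array
{
    $html = (string) file_get_contents($htmlFile);
    $title = basename($htmlFile); // fallback when no <title> is found
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m) === 1) {
        $title = trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    $meta = get_meta_tags($htmlFile); // built-in; keys are lower-cased meta names
    return [$title, $meta['description'] ?? ''];
}

// Build an RSS 2.0 document for every .html file under $dir, including subfolders.
function buildRss(string $dir, string $baseUrl, string $feedTitle): string
{
    $dir = rtrim($dir, '/');
    $baseUrl = rtrim($baseUrl, '/');
    $items = '';

    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile() || strtolower($file->getExtension()) !== 'html') {
            continue;
        }
        [$title, $description] = readTitleAndDescription($file->getPathname());
        // Public link = base URL + path relative to the scanned directory.
        $link = $baseUrl . substr($file->getPathname(), strlen($dir));
        $items .= "    <item>\n"
            . '      <title>' . xmlEscape($title) . "</title>\n"
            . '      <link>' . xmlEscape($link) . "</link>\n"
            . '      <description>' . xmlEscape($description) . "</description>\n"
            . "    </item>\n";
    }

    return '<?xml version="1.0" encoding="utf-8"?>' . "\n"
        . '<rss version="2.0">' . "\n  <channel>\n"
        . '    <title>' . xmlEscape($feedTitle) . "</title>\n"
        . '    <link>' . xmlEscape($baseUrl) . "</link>\n"
        . '    <description>' . xmlEscape($feedTitle) . "</description>\n"
        . $items
        . "  </channel>\n</rss>\n";
}
```

Pointing $dir at a state or district folder limits the feed to that subtree, which is one way the two feed link types might be served. Item order follows the filesystem, so sorting by date would need to be added.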
Hello, sir.
I read your job posting with interest.
I am very interested in your job.
I am an excellent web developer with 10+ years of experience in C and Java.
I can help you complete your job perfectly in a short time.
I would like to discuss this job in more detail.
Looking forward to your reply.
Thank you.
Best regards.