C# crawler (NCrawler) - Simple interface on top of NCrawler framework -
$10-30 USD
Cancelled
Posted almost 9 years ago
Paid on delivery
Hi, here's what we need:
- Write a Windows Forms or web interface on top of the NCrawler console application <[login to view URL]> (LGPL license).
* You could use other crawler frameworks too, such as Abot <[login to view URL]>, but most of the feature requests below are already covered by libraries included in NCrawler's latest source files.
**The scope of the current project**:
1.
a. Write a Windows Forms or web interface on top of the NCrawler console *, where an indexing job for the links of a given URL can be started, stopped, and resumed.
b. Stopping should also occur if, for example, the internet connection breaks down or the program is closed.
c. Where the retry count for failed URLs can be specified, as well as the link depth.
d. Where a proxy list can be specified and turned on/off (off meaning no proxy is used).
2.
Where each found page's property bag (URL, page content, date, etc.) is saved into a SQL Express database, and the currently processed URL is logged to a text-box area on the interface.
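The requirements above map fairly directly onto NCrawler's pipeline model. A minimal sketch follows, based on NCrawler's demo code; member names such as `MaximumCrawlDepth` and the `IPipelineStep` signature are taken from NCrawler 3.x and may differ in other versions, and the SQL persistence is left as a stub:

```csharp
using System;
using NCrawler;                 // NuGet package: NCrawler (assumed layout)
using NCrawler.HtmlProcessor;   // HtmlDocumentProcessor pipeline step
using NCrawler.Interfaces;

// Custom pipeline step: receives each downloaded page's PropertyBag,
// which is where the SQL Express insert would go (persistence omitted here).
public class SavePageStep : IPipelineStep
{
    public void Process(Crawler crawler, PropertyBag propertyBag)
    {
        // Log the currently processed URL (in the real app, to the UI text box).
        Console.WriteLine("Crawled: " + propertyBag.Step.Uri);
        // propertyBag.Text / propertyBag.Title hold the extracted content.
    }
}

class Program
{
    static void Main()
    {
        using (var c = new Crawler(new Uri("http://example.com"),
            new HtmlDocumentProcessor(),  // extracts links and text from HTML
            new SavePageStep()))
        {
            c.MaximumCrawlDepth = 3;      // link depth from the requirements
            c.MaximumThreadCount = 2;
            c.Crawl();                    // blocks until the crawl finishes
        }
    }
}
```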
**Target system**:
Our system has .NET 4 and Microsoft SQL Express.
**Deliverables**: We need a working sample with clean code, including all source files in C#, that can index [][1]<[login to view URL]> with a link depth of 3 and that can be paused and resumed: pause and resume when the internet connection drops and comes back, or when the program is closed and reopened and the job resumed. All data should be stored in MS SQL Express. (Watch out for UTF-8.)
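On the UTF-8 warning: SQL Server stores Unicode safely in `NVARCHAR` columns, and parameterized commands keep the text from being mangled on the way in. A hedged sketch, assuming a `CrawledPage` table and connection string that are purely illustrative (this requires a live SQL Express instance, so it is not runnable standalone):

```csharp
using System;
using System.Data.SqlClient;

public static class PageStore
{
    // Assumed table:
    // CREATE TABLE CrawledPage (Url NVARCHAR(2083), Content NVARCHAR(MAX), CrawledAt DATETIME)
    // NVARCHAR stores Unicode, so UTF-8 page text survives the round trip.
    const string ConnectionString =
        @"Data Source=.\SQLEXPRESS;Initial Catalog=Crawler;Integrated Security=True";

    public static void Save(string url, string content)
    {
        using (var con = new SqlConnection(ConnectionString))
        using (var cmd = new SqlCommand(
            "INSERT INTO CrawledPage (Url, Content, CrawledAt) VALUES (@url, @content, @at)",
            con))
        {
            // Parameterized values avoid SQL injection and encoding problems.
            cmd.Parameters.AddWithValue("@url", url);
            cmd.Parameters.AddWithValue("@content", content);
            cmd.Parameters.AddWithValue("@at", DateTime.UtcNow);
            con.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```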
----------------------------
**Information for the programmer to make your work easier:**
For stopping / resuming: Have a look at [login to view URL](false or true);
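Independently of whichever NCrawler call the hint above refers to, resume-after-restart comes down to persisting the pending-URL queue (e.g., in the same SQL Express database) and re-seeding it on startup. A minimal in-memory sketch of that idea; the class name and persistence-by-snapshot approach are my own, not part of NCrawler:

```csharp
using System;
using System.Collections.Generic;

// Minimal resumable frontier: pending URLs survive a restart because a
// snapshot of Pending is persisted (in the real app, an SQL Express table).
public class ResumableFrontier
{
    private readonly Queue<string> pending = new Queue<string>();
    private readonly HashSet<string> seen =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Re-seed from storage on startup to resume an interrupted job.
    public ResumableFrontier(IEnumerable<string> persistedPending)
    {
        foreach (string url in persistedPending)
            if (seen.Add(url))
                pending.Enqueue(url);
    }

    public bool TryAdd(string url)
    {
        if (!seen.Add(url)) return false;   // already queued or crawled
        pending.Enqueue(url);
        return true;
    }

    public bool TryTakeNext(out string url)
    {
        if (pending.Count == 0) { url = null; return false; }
        url = pending.Dequeue();
        return true;
    }

    // Snapshot to persist when pausing or on shutdown.
    public IEnumerable<string> Pending { get { return pending; } }
}
```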
Regarding link-name extraction: `[login to view URL] doc = new [login to view URL]();`
`[login to view URL]([login to view URL]);`
You can use RegEx.
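If you do go the RegEx route rather than an HTML parser, a naive `href` extractor looks like this (the pattern and class name are illustrative; a parser such as HtmlAgilityPack is more robust on real-world markup):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Matches href="..." or href='...' and captures the URL between the quotes.
    static readonly Regex HrefPattern = new Regex(
        "href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
            links.Add(m.Groups[1].Value);
        return links;
    }
}
```

For example, `LinkExtractor.Extract("<a href=\"/b\">B</a>")` yields a list containing `/b`.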
Have a good day and all the best.
We are very interested in this project. I have read your description above and I think it is well within our capabilities to execute it in a good time frame, so would you be kind enough to message me so we can quickly reach an agreement.
Create a user interface for NCrawler with a textbox for url and a "CRAWL" button;
Support Pause/Resume;
Support Multiple Instances;
Support Multiple Storage Options.
A WPF interface with three views
View 1 - Input the URLs to be crawled and view some results
View 2 - Details view of a crawled URL
View 3 - Additional settings (link depth, retry attempts before failure, etc.)
Database integration using Entity Framework
Built on top of .NET Framework 4 and the latest version of NCrawler for .NET Framework 4.0
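For the Entity Framework integration mentioned above, a code-first model on .NET 4 could be as small as the following; the entity and context names are illustrative, and the block needs the EntityFramework NuGet package plus a database, so it is a sketch rather than a runnable sample:

```csharp
using System;
using System.Data.Entity;   // NuGet package: EntityFramework

// Code-first entity for one crawled page.
public class CrawledPage
{
    public int Id { get; set; }
    public string Url { get; set; }
    public string Content { get; set; }
    public DateTime CrawledAt { get; set; }
}

// DbContext maps the entity to a table in the SQL Express database.
public class CrawlerContext : DbContext
{
    public DbSet<CrawledPage> Pages { get; set; }
}

// Usage:
// using (var db = new CrawlerContext())
// {
//     db.Pages.Add(new CrawledPage { Url = "http://example.com",
//                                    Content = "...",
//                                    CrawledAt = DateTime.UtcNow });
//     db.SaveChanges();
// }
```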
2 days to model database and resolving doubts
2 days of programming and resolving doubts
1 day of tests
Communication via Skype in English or Portuguese.