Crawler combination (rebuild)

$250-750 USD

Closed
Posted almost 8 years ago

Paid on delivery
We need someone to combine two of our crawlers.
Crawler A: It can scrape a website, strip out the markup, resolve links to absolute paths, save pictures and other resources, and store the page content in MongoDB. The data is stored in a tree structure, like the element tree you see when you press F12 to inspect a web page. This crawler also lets us import a file of websites. But it crawls very slowly, because it uses ChromeDriver.
Crawler B: It crawls really fast, but it can only write the data to a file, with all the markup still in it.
So basically, we want to combine them. For crawling, we want Crawler B's speed. For everything else, we want Crawler A's functions, especially the data storage in MongoDB. Please provide examples of your previous experience (a sample or demo). Suggestions for a better crawler framework are very welcome.
Project ID: 10707115
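
To make the request above concrete, here is a minimal Python sketch of the Crawler A side applied to HTML that a fast HTTP fetcher has already downloaded: it builds an element tree as nested dicts, rewrites `href`/`src` values to absolute URLs, and hands the result to MongoDB. Everything in it is an assumption for illustration, not the client's actual code: the names `TreeBuilder`, `html_to_tree` and `store_page`, the document layout, and the choice of the standard-library `html.parser`. `collection` is assumed to be a pymongo `Collection` (anything with an `insert_one` method works).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
LINK_ATTRS = {"href", "src"}       # attributes rewritten to absolute URLs
SKIP_TEXT_IN = {"script", "style"}  # code, not page content


class TreeBuilder(HTMLParser):
    """Builds a nested-dict element tree (hypothetical document layout)."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.root = {"tag": "#document", "attrs": {}, "text": "", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": {}, "text": "", "children": []}
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                value = urljoin(self.base_url, value)  # relative -> absolute
            node["attrs"][name] = value
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        node = self.stack[-1]
        text = data.strip()
        if text and node["tag"] not in SKIP_TEXT_IN:
            node["text"] = (node["text"] + " " + text).strip()


def html_to_tree(html, base_url):
    """Turn raw HTML into a dict that can be stored as one MongoDB document."""
    builder = TreeBuilder(base_url)
    builder.feed(html)
    builder.close()
    return {"url": base_url, "tree": builder.root}


def store_page(collection, html, base_url):
    """Insert the page tree; `collection` is assumed to be a pymongo Collection."""
    doc = html_to_tree(html, base_url)
    collection.insert_one(doc)
    return doc
```

With pymongo installed, `store_page(MongoClient()["crawl"]["pages"], html, url)` would store one page; the database and collection names there are placeholders. Pages that need JavaScript to render would still require a browser-based fetch, which this sketch does not cover.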

About the project

9 proposals
Remote project
Active 8 yrs ago

9 freelancers are bidding on average $525 USD for this job
Dear Sir, I am a TOP RANKED programmer with 10 years of experience. I can merge both crawlers and create a fast one. Send me the code.
$555 USD in 15 days
4.8 (464 reviews)
7.5
My guess is that the first crawler uses the Selenium framework, right? The program opens a browser window (as you mentioned, it uses ChromeDriver), waits for the browser to finish rendering the page, and then scrapes the data. The second crawler sends HTTP requests directly, so it is quick, but an HTTP request only gets the original source of the page; it cannot run JavaScript to render it. It's impossible to combine the two approaches directly, but there is another way to speed things up: multithreading. To use multiple threads, the work must be splittable into subtasks. For example, if you have 10,000 pages to scrape, you can spread them across 10 threads, 1,000 pages each.
$555 USD in 10 days
5.0 (44 reviews)
6.3
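
The multithreading idea in this proposal (10,000 pages split across 10 threads, 1,000 each) can be sketched in Python roughly as follows. This is an illustration under assumptions, not the bidder's code: the helper names `fetch`, `chunk`, `crawl_chunk` and `crawl` are made up, and `fetch` uses the standard-library `urlopen`, so it only retrieves raw page source, as the proposal notes for plain HTTP requests.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download the raw page source over plain HTTP (no JavaScript rendering)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def chunk(items, n_chunks):
    """Split items into n_chunks consecutive slices of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def crawl_chunk(urls, fetch_fn):
    """Fetch every URL in one slice; a failure is recorded, not raised."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch_fn(url)
        except Exception as exc:  # keep going if one page fails
            results[url] = exc
    return results


def crawl(urls, n_threads=10, fetch_fn=fetch):
    """Give each thread one slice of the URL list and merge the results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = pool.map(lambda c: crawl_chunk(c, fetch_fn),
                         chunk(urls, n_threads))
        for part in parts:
            merged.update(part)
    return merged
```

Calling `crawl(urls)` with a list of 10,000 URLs gives each of the 10 threads a slice of 1,000. Submitting URLs to the pool one at a time instead of in fixed slices would balance the load better when some pages are slow; the fixed split is used here only because it mirrors the proposal.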
Hi mate, I have a lot of experience with parsing and extracting links and elements from text. Combining the two crawlers should be a routine task that takes several days. Just contact me to discuss the details and the project will be a breeze.
$350 USD in 7 days
5.0 (2 reviews)
4.4
We have very good experience in developing web crawlers and website automation scripts in .NET and have done several similar projects in the past. You can see our reviews and the satisfaction level of our clients for such projects. Please share the website you want to get the data from, and we will prepare and send you a sample so that you can be 100% sure that we can do your work. Please message us soon, as we are ready to start today.
$850 USD in 10 days
5.0 (4 reviews)
4.0
Hi, I'm a software developer with 5 years of experience. I have created many scrapers and used all the good frameworks, including Selenium, Jsoup and HttpClient. My last scraping project involved downloading hundreds of thousands of shopping products from Ezbuy and storing the info in a CSV file. I can re-examine your problems with both scrapers and their tasks, then create a superior scraper that better fits your needs. This project would take 1-2 weeks. Anyway, if you're interested, PM me. Sincerely, Owen McMonagle. Software Eureka.
$750 USD in 10 days
5.0 (3 reviews)
3.3

About the client

Shanghai, China
5.0
45
Payment method verified
Member since Dec 9, 2015

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)