Automatic imports from Wikimedia

$750-1500 USD

In Progress
Posted over 8 years ago

Paid on delivery
I'm working on a project that creates fingerprints for images and videos. The code for this is available at [login to view URL] and is under constant development. What I'm looking for now is to use this blockhash algorithm to build a database of hashes for the images and videos available from Wikipedia (especially Wikimedia Commons), and to keep this database updated as works are added, removed, or updated on Wikipedia. This project creates the Python scripts that run in the background to keep the local database synchronised with Wikipedia.

We previously built a similar program, available at [login to view URL], but it has not been maintained and had some bugs. It also had no real support for removing or updating works as they changed on Wikipedia. The contractor who continues this work may choose to build on the existing code base or start anew (starting from scratch might be preferable).

The program should consist of two parts, a server and a client, that interact with each other in a way the contractor may define. Previously we have used RabbitMQ and interactions through the PostgreSQL database with roughly equal success.

The server is responsible for:
- Interacting with the Wikimedia API, finding works that have been newly added, removed, or updated (see the polling sketch after this description)
- Adding any such works to a "queue" for further processing
- Monitoring the clients' work (for instance, if a work cannot be processed and clients have tried it 3-4 times with varying intervals in between, marking the work as "error" so that it is not processed again)

The client is responsible for:
- Retrieving works from the queue to process (see the client sketch below)
- Getting information from the Wikimedia API about:
-- The title
-- The copyright statement (Creative Commons or similar)
-- The author (name)
-- The available media files (image or video files)
-- For each media file:
--- The URL of the media file
--- The blockhash of the media file (calculated by the blockhash command mentioned above)

Only basic information about a work should be retrieved. See for instance [login to view URL] for examples of what information we stored about each work in a previous project.

The server and client should both be constructed so that other sources of information, such as Flickr, can easily be added later; the logic would stay the same, but the exact API calls etc. would change (a possible abstraction is sketched below).

The retrieved information should be stored in a PostgreSQL database.
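A minimal sketch of how the server's polling-and-queueing side could look, using the MediaWiki recentchanges API with a PostgreSQL table acting as the queue. The table layout, status values, and connection details are illustrative assumptions, not part of the brief (RabbitMQ would serve equally well as the queue, as noted above):

    # Server sketch: poll Wikimedia Commons for changed File: pages and
    # enqueue them. Assumes an illustrative table such as:
    #   CREATE TABLE works (
    #       id serial PRIMARY KEY,
    #       source text,
    #       title text UNIQUE,
    #       change_type text,
    #       status text DEFAULT 'pending',  -- pending/processing/failed/done/error
    #       attempts int DEFAULT 0
    #   );
    import time
    import psycopg2
    import requests

    API = "https://commons.wikimedia.org/w/api.php"

    def poll_recent_changes(conn, since):
        """Enqueue File: pages changed since `since`; return the new cursor."""
        params = {
            "action": "query", "format": "json",
            "list": "recentchanges",
            "rcnamespace": 6,               # namespace 6 = "File:" pages
            "rctype": "new|edit|log",       # additions, updates, deletions
            "rcprop": "title|ids|timestamp",
            "rcstart": since, "rcdir": "newer",
            "rclimit": 500,
        }
        changes = requests.get(API, params=params, timeout=30).json()
        rcs = changes["query"]["recentchanges"]
        with conn, conn.cursor() as cur:
            for rc in rcs:
                cur.execute(
                    """INSERT INTO works (source, title, change_type)
                       VALUES ('wikimedia', %s, %s)
                       ON CONFLICT (title) DO UPDATE
                           SET change_type = EXCLUDED.change_type,
                               status = 'pending'""",
                    (rc["title"], rc["type"]))
        return rcs[-1]["timestamp"] if rcs else since

    def sweep_failed_works(conn, max_attempts=4):
        """Re-queue failed works, or give up after repeated failures."""
        with conn, conn.cursor() as cur:
            cur.execute("UPDATE works SET status = 'error' "
                        "WHERE status = 'failed' AND attempts >= %s",
                        (max_attempts,))
            # Real code would add a growing delay between retries here.
            cur.execute("UPDATE works SET status = 'pending' "
                        "WHERE status = 'failed' AND attempts < %s",
                        (max_attempts,))

    if __name__ == "__main__":
        conn = psycopg2.connect("dbname=blockhash")  # connection string assumed
        since = "2015-01-01T00:00:00Z"
        while True:
            since = poll_recent_changes(conn, since)
            sweep_failed_works(conn)
            time.sleep(60)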
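On the client side, a sketch of claiming a queued work, fetching the basic metadata through prop=imageinfo, and shelling out to the blockhash command for the fingerprint. It assumes the same illustrative works table as above (with a few extra columns for the metadata), PostgreSQL 9.5+ for SKIP LOCKED, and that the blockhash executable prints the hash as the first token of its output:

    # Client sketch: pull one pending work, hash it, store the result.
    import subprocess
    import tempfile
    import psycopg2
    import requests

    API = "https://commons.wikimedia.org/w/api.php"

    def claim_next_work(conn):
        """Atomically claim one pending work (needs PostgreSQL 9.5+)."""
        with conn, conn.cursor() as cur:
            cur.execute(
                """UPDATE works
                   SET status = 'processing', attempts = attempts + 1
                   WHERE id = (SELECT id FROM works WHERE status = 'pending'
                               ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED)
                   RETURNING id, title""")
            return cur.fetchone()

    def fetch_work_info(title):
        """Get title, license, author and file URL for one File: page."""
        params = {
            "action": "query", "format": "json", "titles": title,
            "prop": "imageinfo", "iiprop": "url|extmetadata",
        }
        pages = requests.get(API, params=params,
                             timeout=30).json()["query"]["pages"]
        info = next(iter(pages.values()))["imageinfo"][0]
        meta = info.get("extmetadata", {})
        return {
            "title": meta.get("ObjectName", {}).get("value", title),
            "license": meta.get("LicenseShortName", {}).get("value"),
            "author": meta.get("Artist", {}).get("value"),
            "url": info["url"],
        }

    def blockhash_of(url):
        """Download the media file and hash it with the blockhash command."""
        with tempfile.NamedTemporaryFile() as tmp:
            tmp.write(requests.get(url, timeout=60).content)
            tmp.flush()
            out = subprocess.run(["blockhash", tmp.name],
                                 capture_output=True, text=True, check=True)
            return out.stdout.split()[0]   # assumed output: "<hash> <file>"

    def process_one(conn):
        work = claim_next_work(conn)
        if work is None:
            return False
        work_id, title = work
        try:
            info = fetch_work_info(title)
            hash_ = blockhash_of(info["url"])
            with conn, conn.cursor() as cur:
                cur.execute(
                    """UPDATE works SET status = 'done', license = %s,
                              author = %s, url = %s, hash = %s
                       WHERE id = %s""",
                    (info["license"], info["author"], info["url"],
                     hash_, work_id))
        except Exception:
            # Leave it for the server's sweep to retry or mark as error.
            with conn, conn.cursor() as cur:
                cur.execute("UPDATE works SET status = 'failed' WHERE id = %s",
                            (work_id,))
        return True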
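To keep other sources such as Flickr easy to bolt on later, the source-specific API calls could live behind a small interface that both server and client consume; a purely illustrative shape:

    from abc import ABC, abstractmethod

    class Source(ABC):
        """What every media source (Wikimedia, Flickr, ...) must provide."""
        name: str

        @abstractmethod
        def list_changes(self, since):
            """Yield (identifier, change_type) pairs changed since `since`."""

        @abstractmethod
        def work_info(self, identifier):
            """Return title, license, author and media-file URLs for a work."""

    class WikimediaSource(Source):
        name = "wikimedia"

        def list_changes(self, since):
            ...   # wraps list=recentchanges, as in the server sketch above

        def work_info(self, identifier):
            ...   # wraps prop=imageinfo, as in the client sketch above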
Project ID: 8881685

About the project

1 proposal
Remote project
Active 8 yrs ago

About the client

Gnesta, Sweden
5.0 rating (9 reviews)
Payment method verified
Member since May 10, 2015
