Simhashing algorithm to locate similar files in a folder tree (repost)

I need a Visual Basic 6 algorithm to detect "similar" files in one or more folder trees.

It should use a simhashing method to create a "fingerprint" of each file and then use the Hamming distance between fingerprints to determine how similar the files are.

The approach should be similar to the one used by Google and described in this paper:

[url removed, login to view]~manku/papers/[url removed, login to view]
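To illustrate the general idea only, here is a rough sketch in Python (not VB6, and not the expected deliverable). It assumes overlapping 4-byte shingles as features, equal weights, a 64-bit fingerprint, and BLAKE2b as the feature hash; the names `simhash64` and `hamming_distance` are made up for this example, and the paper's own feature selection and weighting may differ.

```python
import hashlib


def simhash64(data: bytes, shingle_size: int = 4) -> int:
    """Return a 64-bit simhash fingerprint of `data`.

    Assumption: features are overlapping byte shingles, all with weight 1.
    """
    weights = [0] * 64
    # Inputs shorter than one shingle are treated as a single feature.
    shingle_count = max(len(data) - shingle_size + 1, 1)
    for start in range(shingle_count):
        shingle = data[start:start + shingle_size]
        digest = hashlib.blake2b(shingle, digest_size=8).digest()
        feature_hash = int.from_bytes(digest, "big")
        # Each feature votes +1 or -1 on every bit position.
        for bit in range(64):
            if (feature_hash >> bit) & 1:
                weights[bit] += 1
            else:
                weights[bit] -= 1
    fingerprint = 0
    for bit in range(64):
        if weights[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(first: int, second: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(first ^ second).count("1")
```

Two files would then be treated as "similar" when the Hamming distance between their fingerprints is at or below some small threshold; the threshold itself is something to agree on during the research phase.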

## Deliverables

The function must be cancellable while running and must provide progress feedback (completion percentage and the name of the file under examination) via Event notifications, which can be used to display progress in a dialog or status bar if desired.
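As a rough illustration of the control flow I have in mind (again in Python rather than VB6, and only a sketch), the loop below reports progress before each file and checks a cancel flag between files. The names `scan_with_progress`, `process_file`, `on_progress` and `cancel_requested` are hypothetical; in VB6 the callback would presumably be an Event raised with `RaiseEvent`, and the final design is open to discussion.

```python
import threading
from typing import Callable, List, Sequence


def scan_with_progress(
    file_paths: Sequence[str],
    process_file: Callable[[str], None],
    on_progress: Callable[[str, int], None],
    cancel_requested: threading.Event,
) -> List[str]:
    """Process files in order, reporting progress and honouring cancellation.

    Returns the list of files that were fully processed.
    """
    completed: List[str] = []
    total = len(file_paths)
    cancelled = False
    for index, path in enumerate(file_paths):
        # Check the cancel flag between files so the caller can stop early.
        if cancel_requested.is_set():
            cancelled = True
            break
        # Report the file about to be examined and the percentage done so far.
        on_progress(path, int(100 * index / total))
        process_file(path)
        completed.append(path)
    if not cancelled:
        # Final notification: no file under examination, work complete.
        on_progress("", 100)
    return completed
```

A caller would set `cancel_requested` from a Cancel button handler and update a dialog or status bar from `on_progress`.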

Previous experience with both simhashing and VB6 is REQUIRED for this project.

The implementation must be highly performant; maximum optimization is expected. A routine that simply does the job will not be sufficient. It must do so with the smallest memory footprint and the fewest CPU cycles possible.

I will be maintaining the code myself, so it must be well written and clearly documented. Hungarian notation (or a similar, consistently applied coding standard) must be used, and all variable names should be descriptive.

The input and output parameters of this function can be agreed upon by both parties once the programmer has done some initial research.

Any other questions can be discussed during the research and information gathering portion of this task.

Skills: Engineering, MySQL, PHP, Project Management, Software Architecture, Software Testing, Visual Basic


About the Employer:
( 29 reviews ) United States

Project ID: #3110460