I need a Visual Basic 6 algorithm to detect "similar" files in one or more folder trees.
It should use a simhashing method to create "fingerprints" of the files and then use the Hamming distance between fingerprints to determine how similar the files are.
The approach should be similar to the one used by Google and described in this paper:
[url removed, login to view]~manku/papers/[url removed, login to view]
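To make the request concrete, here is a minimal sketch of the fingerprint-and-compare idea, written in Python for brevity rather than in the VB6 required for delivery. The 64-bit fingerprint width, the 4-byte overlapping shingles used as features, the BLAKE2b feature hash, and the names `simhash` and `hamming_distance` are all assumptions for illustration, not details taken from the paper or fixed requirements.

```python
import hashlib

FINGERPRINT_BITS = 64  # assumed fingerprint width


def simhash(data: bytes, shingle_size: int = 4) -> int:
    """Return a 64-bit simhash of data using overlapping byte shingles as features."""
    counts = [0] * FINGERPRINT_BITS
    shingle_count = max(len(data) - shingle_size + 1, 1)
    for start in range(shingle_count):
        shingle = data[start:start + shingle_size]
        # Hash each feature to 64 bits (BLAKE2b is an arbitrary choice here).
        digest = hashlib.blake2b(shingle, digest_size=8).digest()
        feature_hash = int.from_bytes(digest, "big")
        # Each feature votes +1 or -1 on every bit position.
        for bit in range(FINGERPRINT_BITS):
            counts[bit] += 1 if (feature_hash >> bit) & 1 else -1
    # A fingerprint bit is set where the votes are positive.
    fingerprint = 0
    for bit in range(FINGERPRINT_BITS):
        if counts[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two files would then be reported as "similar" when the Hamming distance between their fingerprints is at or below some small threshold. The threshold value and the choice of features are open for discussion during the research phase.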
This function must be cancellable while in operation. It must also provide progress feedback (completion percentage and name of the file under examination) via Event notifications, which can be used to display progress on a dialog or status bar if desired.
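The control flow I have in mind looks roughly like the sketch below, again in Python purely for illustration. The names `list_files` and `process_files`, the callback signature, and the use of `threading.Event` as the cancel flag are all hypothetical. In VB6 I would expect the callback to become a raised Event and the cancel flag to be something the caller can set while the routine is running, but the exact mechanism is up for discussion.

```python
import os
import threading
from typing import Callable, Iterable, List


def list_files(roots: Iterable[str]) -> List[str]:
    """Collect every file path under one or more folder trees."""
    paths = []
    for root in roots:
        for folder, _subfolders, names in os.walk(root):
            for name in names:
                paths.append(os.path.join(folder, name))
    return sorted(paths)


def process_files(
    paths: List[str],
    on_progress: Callable[[float, str], None],
    cancel_event: threading.Event,
) -> List[str]:
    """Visit each file, reporting progress and stopping early if cancelled."""
    processed = []
    total = len(paths)
    for index, path in enumerate(paths):
        # Check the cancel flag before starting each file.
        if cancel_event.is_set():
            break
        # Report the percentage already completed and the file about to be examined.
        on_progress(100.0 * index / total, path)
        # Fingerprinting of the file would happen here.
        processed.append(path)
    return processed
```

A caller would pass the result of `list_files` to `process_files`, update a dialog or status bar from `on_progress`, and set `cancel_event` from a Cancel button.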
Previous experience with both simhashing and VB6 is REQUIRED for this project.
The implementation must be highly performant; maximum optimization is expected. A routine that simply does the job will not be sufficient. It must do the job with the smallest memory footprint and the fewest CPU cycles possible.
I will maintain the code myself, so it must be well written and clearly documented. Hungarian notation (or a similarly consistent coding standard) must be used, and all variable names should be descriptive.
The input and output parameters of this function can be discussed and agreed upon by both parties after some research has been done by the programmer.
Any other questions can be discussed during the research and information gathering portion of this task.