App to pass files in a directory to an external processor
$100-500 USD
Paid on delivery
I require an application that will scan a set of directories and subdirectories for PDF files and create text-searchable versions of them. The application that creates the text-searchable PDFs will be purchased separately; this application only needs to manage the directory crawling.
The application will also have a "watched folder" feature: when a file is placed in that folder, it will be OCR'd into another folder.
This application should run as a service, with an accompanying applet that provides configuration and a log file viewer for watching the scanning process in real time.
The application used for OCR is the command-line version of img2pdf. A downloadable demo is available.
## Deliverables
Process detail:
1. The folder (and its subfolders) is scanned at a predefined time, or when triggered via the applet.
2. When a PDF is found, its "Creator" property is evaluated. If it is "img2pdf", the file has already been recognized and is bypassed.
3. If the PDF's Creator property is not img2pdf, the external command-line application is called to create a new, OCR'd PDF.
4. If the process is successful, the old PDF is either renamed with a .old extension or deleted (based on a configuration variable), and the new file is renamed to the original file name.
5. If the configuration variable is set to log, the file name and OCR process time (if processed) are appended to a log file. Files that are skipped rather than processed are logged as well. Errors should be captured and logged.
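The crawl-and-replace steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not a specification: reading the Creator property needs a PDF library (for example pypdf's `PdfReader(path).metadata.creator`), so `needs_ocr` just takes the value as a string; the ".old" naming (appending to the full name, so `report.pdf` becomes `report.pdf.old`) and the tab-separated log format are hypothetical choices.

```python
import os
from datetime import datetime
from pathlib import Path

def find_pdfs(roots):
    """Step 1: yield every .pdf file under each configured folder, including subfolders."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.lower().endswith(".pdf"):  # match .pdf and .PDF alike
                    yield Path(dirpath) / name

def needs_ocr(creator):
    """Step 2: a PDF whose Creator is already "img2pdf" is skipped (case-insensitive here)."""
    return (creator or "").strip().lower() != "img2pdf"

def swap_in_result(original, ocr_output, keep_original):
    """Step 4: back up (or delete) the original, then give the OCR'd file its name."""
    original, ocr_output = Path(original), Path(ocr_output)
    if keep_original:
        # Assumption: ".old" is appended, so report.pdf becomes report.pdf.old.
        original.replace(original.with_name(original.name + ".old"))
    else:
        original.unlink()
    ocr_output.replace(original)
    return original

def log_entry(path, action, seconds=None, error=None):
    """Step 5: build one tab-separated log line (hypothetical format)."""
    fields = [datetime.now().isoformat(timespec="seconds"), action, str(path)]
    if seconds is not None:
        fields.append(f"{seconds:.1f}s")
    if error:
        fields.append(f"error={error}")
    return "\t".join(fields)
```

Appending ".old" to the full name has a side benefit: the backup no longer ends in ".pdf", so the next scheduled crawl will not pick it up again.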
Watched folder process detail:
1. The system scans the input folder every x minutes (defined in the configuration), or when triggered manually via the applet.
2. It looks for files with these extensions: TIFF, JPG, PNG, GIF, PCD, PSD, TGA, BMP, DCX, PIC, TIF
3. It runs each file through img2pdf.
4. If the process is successful, the source file is either renamed with a .old extension or deleted (based on a configuration variable), and the new PDF is written to the output folder.
5. If the configuration variable is set to log, the file name and OCR process time (if processed) are appended to a log file. Files that are skipped rather than processed are logged as well. Errors should be captured and logged.
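A minimal Python sketch of steps 1 to 2 of the watched-folder process might look like this. Two things are assumptions, not part of the spec: extensions are matched case-insensitively, and the output PDF takes the input's base name (`scan.tif` becomes `scan.pdf` in the output folder). `output_path_for` is a hypothetical helper name.

```python
from pathlib import Path

# Extensions from the spec; the case-insensitive match is an assumption.
WATCHED_EXTENSIONS = {".tiff", ".tif", ".jpg", ".png", ".gif", ".pcd",
                      ".psd", ".tga", ".bmp", ".dcx", ".pic"}

def is_watched_image(path):
    """True if the file has one of the listed image extensions."""
    return Path(path).suffix.lower() in WATCHED_EXTENSIONS

def watched_inputs(input_dir):
    """Image files sitting directly in the watched input folder, in name order."""
    return sorted(p for p in Path(input_dir).iterdir()
                  if p.is_file() and is_watched_image(p))

def output_path_for(image, output_dir):
    """Hypothetical naming rule: same base name, .pdf extension, in the output folder."""
    return Path(output_dir) / (Path(image).stem + ".pdf")
```

Because the folder is polled, a file that is still being copied in could be picked up half-written; a real implementation may want to skip files whose size or modification time changed since the previous poll.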
Configuration variables
1. img2pdf location
2. img2pdf command-line options. These only need to be any additional options to add to the command line.
3. Daily times for folder scanning (allow multiple)
4. Folders for folder scanning (drive letters accepted, multiple folders)
5. Original PDFs saved (true/false)
6. Debug log (true/false)
7. Watched folder check interval (every x minutes)
8. Watched folder input folder
9. Watched folder output folder
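One way to hold these nine variables is an INI file read with Python's standard `configparser`. The section name `[ocr]`, every key name, the `;` separator for folders, the `HH:MM` format for daily times, and the defaults are all hypothetical choices for this sketch. `next_scan` shows how multiple daily scan times could be turned into the next due time.

```python
import configparser
from dataclasses import dataclass
from datetime import datetime, time, timedelta

@dataclass
class Settings:
    img2pdf_path: str            # 1. img2pdf location
    extra_options: list[str]     # 2. additional command-line options
    scan_times: list[time]       # 3. daily scan times (multiple allowed)
    scan_folders: list[str]      # 4. folders to scan
    keep_originals: bool         # 5. keep originals as .old (True) or delete (False)
    debug_log: bool              # 6. write the log file
    watch_interval_minutes: int  # 7. watched-folder check interval
    watch_input: str             # 8. watched input folder
    watch_output: str            # 9. watched output folder

def _split(value, sep):
    """Split on sep, trimming whitespace and dropping empty parts."""
    return [part.strip() for part in value.split(sep) if part.strip()]

def load_settings(text):
    """Parse an INI document with a hypothetical [ocr] section."""
    parser = configparser.ConfigParser(interpolation=None)  # allow a literal % in paths
    parser.read_string(text)
    s = parser["ocr"]
    return Settings(
        img2pdf_path=s["img2pdf_path"],
        # Plain whitespace split: options containing spaces are not supported here.
        extra_options=s.get("img2pdf_options", fallback="").split(),
        scan_times=[time.fromisoformat(t) for t in _split(s.get("scan_times", fallback=""), ",")],
        scan_folders=_split(s.get("scan_folders", fallback=""), ";"),
        keep_originals=s.getboolean("keep_originals", fallback=True),
        debug_log=s.getboolean("debug_log", fallback=False),
        watch_interval_minutes=s.getint("watch_interval_minutes", fallback=5),
        watch_input=s.get("watch_input", fallback=""),
        watch_output=s.get("watch_output", fallback=""),
    )

def next_scan(now, scan_times):
    """Return the next scheduled scan after now, rolling over to tomorrow if needed."""
    if not scan_times:
        return None  # no daily scans configured
    today = [datetime.combine(now.date(), t) for t in scan_times]
    upcoming = [d for d in today if d > now]
    if upcoming:
        return min(upcoming)
    return datetime.combine(now.date() + timedelta(days=1), min(scan_times))
```

Folders are separated with `;` rather than `,` or `:` because Windows paths with drive letters already contain `:` and may contain commas.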
Command line information for img2pdf
Image2PDF v3.5: Convert and combine images into a PDF file.
Homepage: <[url removed, login to view]>
E-mail  : <support@[url removed, login to view]>
Build   : Dec 17 2008
Current version support image format: TIFF,JPG,PNG,GIF,PCD,PSD,TGA,BMP,DCX,PIC etc.
-------------------------------------------------------
Usage: img2pdf [options] <-o output> <images>
-l [log file name]  : specify log file for output message
-u [producer]       : producer
-d [creator]        : creator
-j [subject]        : subject
-t [title]          : title
-a [author]         : author
-k [keywords]       : keywords
-e [CreationDate]   : CreationDate, eg. 20070116230629-08'00'
-E [ModDate]        : ModDate, eg. 20070116230629-08'00'
-p [0 or 1]         : append to an exist pdf file, 0:insert at first page,
                      1:append to last page
-s                  : skew correct
-c                  : clear spot
-x [OCR type]       : OCR documents within conversion, the value of document
                      type must equal 1, this option is only available in
                      Image2PDF OCR Command Line Version
   -x 1             : create searchable PDF file from input image files
-r [resolution]     : set resolution in generated pdf file
   -r 0             : use the default image width and height information
   -r -1            : take over DPI info from original images
-o [PDF file name]  : PDF file will be generated
-b [num]            : specify bookmark attribute
   num can use any one of the following values:
   >= 0             : specify first number in bookmarks
   == -1            : don't use bookmark
   == -2            : read bookmarks from [url removed, login to view] file
   == -3            : use the filenames as bookmarks
   == -4            : use the filenames as bookmarks, one bookmark at the first page of each tif
-$ [Registration key] : register the image2pdf command line with your regcode
-------------------------------------------------------
Example:
    img2pdf -b 10 -o c:\[url removed, login to view] -r 100 c:\[url removed, login to view] c:\[url removed, login to view] c:\a3_dir
    img2pdf -b -3 -o c:\[url removed, login to view] c:\*.tif
    img2pdf -b -1 -o c:\[url removed, login to view] -r 300 c:\a*.jpg
    img2pdf -b -3 -o d:\*.pdf -r 300 c:\a*.jpg
    Img2PDF -b -3 -o "c:\pdf dir\*.pdf" "c:\*.*"
    img2pdf -j "subject" -t "title" -a "author" -k "keywords" -o c:\[url removed, login to view] c:\[url removed, login to view]
    img2pdf -p 0 -o c:\[url removed, login to view] c:\[url removed, login to view]
    img2pdf -p 1 -o c:\[url removed, login to view] c:\[url removed, login to view]
    img2pdf -x 1 -o c:\[url removed, login to view] c:\[url removed, login to view]
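Based only on the usage line and options listed above, the manager could assemble and run each img2pdf call roughly as below. Several points are assumptions to verify against the demo: that passing `-d img2pdf` stamps the Creator property the crawler checks for, that a non-zero exit code signals failure, and (for the PDF crawl) that img2pdf accepts an existing PDF as its input at all, which the help text does not say. The function names are hypothetical.

```python
import subprocess
import time
from pathlib import Path

def build_img2pdf_command(exe, extra_options, inputs, output, creator="img2pdf"):
    """Assemble argv following the usage line: img2pdf [options] <-o output> <images>."""
    cmd = [str(exe), *extra_options]
    if creator:
        # -d sets the Creator property; stamping it lets the crawler skip this file next time.
        cmd += ["-d", creator]
    cmd += ["-x", "1", "-o", str(output)]  # -x 1: create a searchable PDF
    cmd += [str(p) for p in inputs]
    return cmd

def run_img2pdf(cmd, output, timeout_s=600):
    """Run one conversion and return (ok, elapsed_seconds, message)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing executable, permission problem, or a hung conversion.
        return False, time.monotonic() - start, str(exc)
    # Assumption: success means exit code 0 *and* the output file exists.
    ok = proc.returncode == 0 and Path(output).is_file()
    return ok, time.monotonic() - start, (proc.stdout + proc.stderr).strip()
```

Passing a list to `subprocess.run` (instead of one command string) avoids hand-quoting paths that contain spaces, such as `"c:\pdf dir\..."` in the examples. The elapsed time returned here is what step 5 would log as the OCR process time.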
Project ID: #3753893