General scanner tips
The first and most important step is to convert your paper statements into an electronic format (TIFF or PDF), which StatementReader uses as its input. StatementReader's accuracy depends on how this input file is created, so follow these tips:
- Align your pages properly on the scanner.
- Set your scanner to at least 300 DPI. A lower DPI setting reduces the accuracy of the optical character recognition (OCR) process.
- If your scanner has OCR software installed (which is common for most office scanners), make use of it by setting your scanner to create a ‘Searchable PDF’.
- Ensure that all of the pages that you scan together as one file relate to the same bank.
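The 300 DPI guideline can be checked after scanning by dividing the pixel dimensions of a page image by the physical paper size. The sketch below is only an illustration and is not part of StatementReader. The names `effective_dpi`, `meets_minimum` and `MIN_DPI`, the supported paper sizes and the 1% rounding allowance are all assumptions made for this example.

```python
# Paper sizes in inches (width, height); A4 is 210 x 297 mm, US Letter is 8.5 x 11 in.
PAPER_SIZES_IN = {
    "A4": (210 / 25.4, 297 / 25.4),
    "Letter": (8.5, 11.0),
}

MIN_DPI = 300  # minimum resolution recommended in the tips above


def effective_dpi(width_px, height_px, paper="A4"):
    """Return the lower of the horizontal and vertical resolution in DPI."""
    width_in, height_in = PAPER_SIZES_IN[paper]
    # Sort the pixel dimensions so landscape scans are treated like portrait ones.
    short_px, long_px = sorted((width_px, height_px))
    return min(short_px / width_in, long_px / height_in)


def meets_minimum(width_px, height_px, paper="A4", min_dpi=MIN_DPI):
    """True if the scan reaches the minimum DPI, allowing 1% slack for rounding."""
    return effective_dpi(width_px, height_px, paper) >= min_dpi * 0.99
```

For example, an A4 page scanned at 2480 x 3508 pixels passes the check, while the same page at 1654 x 2339 pixels (about 200 DPI) does not.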
A searchable PDF document is a file that contains both images and text. A non-searchable PDF document contains scanned pages in an image format only, similar to a multipage TIFF file. StatementReader can process and extract data from both kinds of PDF document, but accuracy and speed are much better with a searchable PDF document.
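To tell the two kinds of PDF apart before processing, you can look at how much text each page yields when it is read with a PDF library. The sketch below assumes the per-page text has already been extracted by whatever library you use; the function name `classify_pdf` and the `min_chars` threshold are hypothetical and do not describe how StatementReader itself decides.

```python
def classify_pdf(page_texts, min_chars=20):
    """Classify a PDF from the text extracted from each of its pages.

    page_texts: list of strings, one per page, produced by any PDF
        library (an assumption; no specific library is implied).
    min_chars: hypothetical threshold of non-whitespace characters
        below which a page is treated as image-only.
    """
    if not page_texts:
        return "empty"
    # Count pages whose text, ignoring whitespace, reaches the threshold.
    pages_with_text = sum(
        1 for text in page_texts if len("".join(text.split())) >= min_chars
    )
    if pages_with_text == len(page_texts):
        return "searchable"
    if pages_with_text == 0:
        return "image-only"
    return "partially searchable"
```

A result of "partially searchable" usually means that some pages in the file were scanned without OCR, so it may be worth rescanning those pages with the ‘Searchable PDF’ setting.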
If your scanner does not have integrated OCR capabilities and therefore cannot create searchable PDF documents, you could use OCR software from either ABBYY (FineReader) or Nuance. Both can extract text even from low-quality scanned pages.
If you cannot create a searchable PDF document, StatementReader will use its internal OCR engine (Google’s Tesseract engine), which is often less accurate than the options above. In this case, take care to always align the paper correctly in the scanner and scan at 300 DPI or higher. Also select the ‘text’ mode on your scanner, so that the text contrast is enhanced and any background shading is removed.
A very effective alternative, an external server solution, is also available. Please contact us for more information about arranging a trial of this solution; setting it up takes just a few seconds.