API and server integration

If you’re looking for an offline API to OCR PDF bank statements into Excel with a free download and full support to get you started, keep reading.

 

StatementReader uses a dedicated template for each known bank statement format. Templates are already provided for a large number of the common banks operating in the UK, Canada, India and Australia. A built-in tool lets users create and share (internally) new templates in under a minute, and a direct support link lets our external support team quickly create and release new optimised templates.

 

StatementReader works as a user-controlled extraction and analysis tool: the user selects the bank template and the scanned file, then clicks ‘Go’ to see the extracted data in Excel.

 

StatementReader can also be configured to run as an offline API, with a full OCR, extraction and data validation converter ready to use today.  Each module runs offline within your network, and has been implemented for financial intermediaries with the highest IT security requirements.

 

The transactional data and other account information located on the page is extracted locally from the searchable PDF document into a standardised column structure, initially held in SQLite database files on the user’s machine. This process therefore does not require your own database installation. Once extracted in its raw form, the data is validated and output to Excel (by default) or CSV files, with a repair summary that is presented to the user.
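Because SQLite stores a database in an ordinary file, the extracted data can be read without a database server. The sketch below illustrates this with Python's standard `sqlite3` module. The table name `transactions` and its columns are hypothetical, chosen only for illustration; the actual StatementReader schema is not described here, and an in-memory database stands in for a real file.

```python
import sqlite3

# Hypothetical schema for illustration only; the real StatementReader
# table and column names may differ.
conn = sqlite3.connect(":memory:")  # a file path would be used in practice
conn.execute(
    "CREATE TABLE transactions ("
    " txn_date TEXT, description TEXT, amount REAL, source_page INTEGER)"
)

# Two made-up rows standing in for extracted statement lines.
sample_rows = [
    ("2023-01-03", "CARD PAYMENT", -12.50, 1),
    ("2023-01-04", "SALARY", 2000.00, 1),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", sample_rows)
conn.commit()

# Query the data directly; no separate database installation is involved.
count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM transactions"
).fetchone()
print(f"{count} rows, net movement {total:.2f}")
```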

 

The repair summary details the number of lines extracted, how many dates and amounts had a potential error, how many corrections were made and an accuracy percentage for before and after the automated corrections.
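As a rough illustration of how the before and after accuracy figures could relate to the other counts, the sketch below assumes that accuracy is the share of extracted lines without an outstanding potential error, and that each potential error affects a different line. Both assumptions, and the names `RepairSummary`, `accuracy_before` and `accuracy_after`, are hypothetical; the exact calculation used by StatementReader is not stated here.

```python
from dataclasses import dataclass


@dataclass
class RepairSummary:
    lines_extracted: int    # rows extracted from the statement
    potential_errors: int   # dates and amounts flagged as potential errors
    corrections_made: int   # flagged values corrected automatically

    def _accuracy(self, outstanding_errors: int) -> float:
        """Percentage of lines without an outstanding potential error."""
        if self.lines_extracted == 0:
            return 0.0
        clean_lines = self.lines_extracted - outstanding_errors
        return 100.0 * clean_lines / self.lines_extracted

    def accuracy_before(self) -> float:
        # Before correction, every flagged value counts against accuracy.
        return self._accuracy(self.potential_errors)

    def accuracy_after(self) -> float:
        # After correction, only the uncorrected flags remain.
        return self._accuracy(self.potential_errors - self.corrections_made)


summary = RepairSummary(lines_extracted=200, potential_errors=10, corrections_made=8)
print(f"before: {summary.accuracy_before():.1f}%")  # before: 95.0%
print(f"after:  {summary.accuracy_after():.1f}%")   # after:  99.0%
```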

 

The validated Excel documents show orange shading for any potential errors, and green shading for any corrected cells.  Also, hyperlinks are added for the user to access the PDF page from which each row was extracted.  The PDF pages accessed here are cached locally for a few days (as defined in the options).

 

PDF bank statements are converted to Excel and CSV, for seamless integration with your existing analysis process.  This is made easy with defined folders for the API output, and standardised database/CSV/Excel column structures.
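A downstream process could collect the CSV output from the defined folder along the following lines. This is a minimal sketch: the file name `statement_001.csv` and the column names `Date`, `Description` and `Amount` are hypothetical placeholders, not the actual standardised column structure, and a temporary directory stands in for the configured output folder.

```python
import csv
import tempfile
from pathlib import Path


def load_output_rows(output_dir):
    """Read every CSV file in the output folder into a list of dicts."""
    rows = []
    for csv_path in sorted(Path(output_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = csv_path.name  # remember where the row came from
                rows.append(row)
    return rows


# Demonstration with a temporary folder and a made-up output file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "statement_001.csv"
    sample.write_text(
        "Date,Description,Amount\n2023-01-03,CARD PAYMENT,-12.50\n",
        encoding="utf-8",
    )
    demo_rows = load_output_rows(tmp)

print(demo_rows)
```

Because every file shares the same column structure, the rows from many statements can be combined into one list without per-bank handling.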

 

Technical considerations

An offline, server-based StatementReader solution will incorporate the following:

 

- Licence authentication for an unlimited or volume-based usage model

 

- Software updates are initiated by your IT team and installed on each user’s machine

 

- Users can directly upload documents for template or processing support

 

- Output of Excel/CSV files/error logs to a defined folder

 

Access to an OCR server can be configured within the StatementReader options; this server may run within your network.  The OCR server is only required to convert documents into searchable PDF documents, so extracting transactions from PDF documents that are already searchable (with readable embedded text) does not require an OCR server.

Windows technical requirements

Computer and processor - 1 gigahertz (GHz) or faster x86 or x64 processor

Memory (RAM) - 1 gigabyte (GB) RAM (32-bit); 2 gigabytes (GB) RAM (64-bit)

Hard Disk - 2.0 gigabytes (GB) available

Display - 1024 x 576 or higher resolution monitor

Operating System - Windows 7/8/10 (32-bit or 64-bit)

 

OCR solution technical requirements

1) A proprietary OCR engine for Windows

All OCR products benefit from more cores, and every core will need 1GB of RAM. How many cores you allocate depends entirely on your expected volume and your resources. We advise allocating a minimum of 6 cores/6GB for the initial tests. You can then increase the cores/RAM at a later date, when you have a clearer picture of the volume that you will be handling.
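The sizing rule above is simple arithmetic, sketched here for planning purposes. The function name `ocr_ram_gb` is hypothetical, and the figure covers the OCR cores only, taking the 1GB per core rule at face value.

```python
GB_PER_CORE = 1      # each OCR core needs 1GB of RAM, per the guidance above
MIN_TEST_CORES = 6   # suggested minimum allocation for initial tests


def ocr_ram_gb(cores):
    """Return the RAM in GB to allocate for a given number of OCR cores."""
    if cores < 1:
        raise ValueError("at least one core is required")
    return cores * GB_PER_CORE


print(ocr_ram_gb(MIN_TEST_CORES))  # 6
print(ocr_ram_gb(12))              # 12
```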

 

2) StatementReader OCR Server Module

The StatementReader Server Module is the actual server that the StatementReader clients connect to. It must be installed on the aforementioned Windows machine. The Server Module listens on TCP port 501 by default (this can be configured to any other port number). The clients connect to the Server using a direct TCP connection and communicate using our own protocol (the StatementReader protocol). We therefore do not need to piggyback our messages on other protocols such as HTTP, HTTPS or SSH, and consequently no other servers have to be installed on the Windows machine.
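Since the clients use a direct TCP connection, an IT team may want to confirm that the configured port is reachable through the network and any firewalls. The sketch below only tests whether a TCP connection can be opened; it does not speak the StatementReader protocol, which is proprietary and not described here. The function name `server_reachable` is hypothetical, and a local listener stands in for the Server Module in the demonstration.

```python
import socket

DEFAULT_PORT = 501  # default port per the text above; configurable


def server_reachable(host, port=DEFAULT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Demonstration against a local listener on an ephemeral port,
# standing in for the Server Module.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]

reachable = server_reachable("127.0.0.1", demo_port)
listener.close()
print(reachable)
```

In practice the call would name the machine hosting the Server Module, using the port configured there.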

 

The StatementReader Server Module accepts input files from the clients, calls the OCR application to process them, and transmits the results back to the clients. The key requirement is that StatementReader receives a searchable PDF document to process, so it is not essential to use the suggested OCR solution.

 

 

Support

We are driven to satisfy our clients and are pleased to offer:

 

- Assisted implementation of the application (with a hosted OCR server)

 

- Creation of optimised templates for your bank statements within 24 hours

 

- Unlimited ongoing technical support including system migration and server monitoring

 

- Software upgrades with new bank templates created for our clients globally

 

- User training including document processing and template creation

 

- A limited 24-hour bank statement processing service is also provided by arrangement