Lloyds TSB duplicate bank template - Extracting a PDF with a scanned image and searchable text

Lloyds TSB send out bank statements like this:

Lloyds TSB duplicate bank template

Some of the text in this PDF is searchable and some is a scanned image. Because searchable text is present, the OCR engine assumes the text has already been extracted and skips the image. Unfortunately, the image on each page is the bank statement itself.
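To illustrate why a mixed page gets skipped, here is a minimal Python sketch. It is not StatementReader's actual logic: the `PageSummary` class, both function names, and the 0.5 coverage threshold are hypothetical and exist only for this example. It compares a naive rule (any searchable text means no OCR) with a safer rule that also sends a page to OCR when a large image sits alongside the text layer.

```python
from dataclasses import dataclass


@dataclass
class PageSummary:
    """Hypothetical per-page facts, assumed to come from some PDF library."""
    text_chars: int        # characters found in the searchable text layer
    image_coverage: float  # fraction of the page area covered by images (0.0 to 1.0)


def naive_needs_ocr(page: PageSummary) -> bool:
    # Naive rule: any searchable text means the page counts as already extracted.
    return page.text_chars == 0


def safer_needs_ocr(page: PageSummary, coverage_threshold: float = 0.5) -> bool:
    # Safer rule (assumed threshold): also OCR pages that carry a large image,
    # even when a searchable text layer is present.
    return page.text_chars == 0 or page.image_coverage >= coverage_threshold


# A page like the duplicate statement: some searchable text plus a large scan.
statement_page = PageSummary(text_chars=180, image_coverage=0.8)
print(naive_needs_ocr(statement_page))  # False: the scanned statement is skipped
print(safer_needs_ocr(statement_page))  # True: the page is sent to OCR
```

In StatementReader itself you do not need any code for this; forcing OCR through the settings described below has the same effect.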

To overcome this, I used a standalone OCR engine to force OCR over all of the text. Another method is to tick the ‘bypass PDF encryption’ box above the ‘Go’ button. Note that if you have already run this job in StatementReader, you will first have to right-click on it and select ‘remove OCR cache’ before running it again.

Here are the steps you can use:

1. Select the template UK -> Lloyds TSB duplicate.
2. Select your input non-searchable PDF document using the ‘browse’ button.
3. Untick ‘Parse PDF’ above the ‘Go’ button (this will use our external OCR server by default; you can check this from the Options -> Advanced options -> Engine window). Also tick ‘bypass PDF encryption’.
4. Click ‘Go’.


