Get Optical Character Reader - English

This tutorial shows how to use the Tesseract OCR engine and gImageReader to recognize (read) text from a PDF file or an image file.

Steps to Implement Optical Character Reader (English)

1. Download Tesseract OCR Engine

The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google and is probably one of the most accurate open source OCR engines available. Combined with the Leptonica Image Processing Library it can read a wide variety of image formats and convert them to text in over 40 languages.

2. Install Tesseract OCR Engine

  1. Double-click the downloaded exe to install the Tesseract OCR Engine
  2. Allow the setup to install the Tesseract Engine
  3. Click Yes when asked "Do you want to install Tesseract-OCR 3.01?"
  4. License Agreement
    • Check I accept the terms of the License Agreement
    • Click Next
  5. Choose Components
    • Keep the settings as they are
    • Click Next
  6. Choose Install Location
    • You may change the destination folder or keep the default. Note down the destination folder for later
    • Click Install
  7. Let the Setup Install all the files
  8. Completing the Tesseract-OCR 3.01 Setup Wizard
    • You may check or uncheck Show Readme
    • Click Finish
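
To confirm the engine landed where expected, a small script can look for the executable in the destination folder noted in step 6. This is only a sketch: the folder `C:\Program Files\Tesseract-OCR` and the assumption that `tesseract.exe` sits directly inside it are placeholders, so substitute the folder you actually wrote down.

```python
import shutil
from pathlib import Path


def find_tesseract(install_dir=None):
    """Return the path to the Tesseract executable, or None if it is not found."""
    if install_dir is not None:
        folder = Path(install_dir)
        # Assumption: the executable sits directly in the destination folder.
        for name in ("tesseract.exe", "tesseract"):
            candidate = folder / name
            if candidate.is_file():
                return str(candidate)
    # Fall back to whatever "tesseract" is reachable through PATH, if anything.
    return shutil.which("tesseract")


if __name__ == "__main__":
    # Hypothetical destination folder; replace it with the one noted in step 6.
    print(find_tesseract(r"C:\Program Files\Tesseract-OCR"))
```

If this prints `None`, the folder is probably not the one the installer used.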

We have now installed the OCR engine, which does the actual text recognition. For ease of use, however, we still need a graphical interface for it.

3. Download gImageReader

4. Install gImageReader

  1. Double-click the downloaded setup exe to install gImageReader
  2. Welcome to the gImageReader Setup Wizard
    • Click Next
  3. License Agreement
    • Click I Agree
  4. Read Me
    • Click Next >
  5. Choose Components
    • Keep the default settings
    • Click Next >
  6. Choose Install Location
    • You may change the destination folder. Note down the destination folder for future reference
    • Click Install
  7. Let the Setup install all the files
  8. Installation Complete
    • Click Close

You are now ready to use OCR. To read a PDF file, follow these steps:

  • Open a scanned PDF file or an image file.
  • Alternatively, acquire an image directly by selecting your scanner.
  • After you select the file, click Recognize All.
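
For batch work, the same recognition can be run without the graphical interface by calling the Tesseract command line from a script. The sketch below assumes the basic `tesseract <image> <output base> -l <language>` form; the helper names and the file names `scan.png` and `scan_text` are hypothetical. It takes an image file as input, so a scanned PDF would first need its pages exported as images.

```python
import subprocess
from pathlib import Path


def build_command(tesseract_exe, image_path, output_base, lang="eng"):
    """Build the argument list for one Tesseract run."""
    # Tesseract appends ".txt" to the output base on its own.
    return [str(tesseract_exe), str(image_path), str(output_base), "-l", lang]


def recognize(tesseract_exe, image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text."""
    subprocess.run(
        build_command(tesseract_exe, image_path, output_base, lang),
        check=True,  # raise an error if Tesseract reports a failure
    )
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


# Placeholder names; this line only prints the command and does not run it.
print(" ".join(build_command("tesseract", "scan.png", "scan_text")))
```

Calling `recognize` with the path to the installed executable and a real image would write `scan_text.txt` next to the script and return its contents.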

There you go! Simple and useful. Tell me if you find this useful.

We have not yet added the English dictionary. I will soon post a tutorial on adding it.
