Tesseract OCR

Tesseract is an open-source OCR engine that converts scanned images into .txt or .pdf documents with searchable text.
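As an illustration of the conversion described above, the following minimal sketch builds a Tesseract command line from Python. It assumes the standard command-line form (input image, output base name without extension, `-l` for the language, and the optional `pdf` output config). The function names, the file names and the `eng` default are hypothetical and are not part of the product documentation.

```python
import subprocess


def build_tesseract_command(image, output_base, lang="eng", searchable_pdf=False):
    """Return the argument list for a single Tesseract run.

    Without an output config Tesseract writes <output_base>.txt;
    with the "pdf" config it writes a searchable <output_base>.pdf.
    """
    command = ["tesseract", str(image), str(output_base), "-l", lang]
    if searchable_pdf:
        command.append("pdf")
    return command


def run_ocr(image, output_base, lang="eng", searchable_pdf=False):
    """Run Tesseract; assumes the executable can be found on PATH."""
    command = build_tesseract_command(image, output_base, lang, searchable_pdf)
    subprocess.run(command, check=True)
```

For example, `run_ocr("scan.png", "scan", searchable_pdf=True)` would be expected to create `scan.pdf` next to the script, provided that `scan.png` exists and the `eng` language data is installed.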

Note

For good-quality output, use Tesseract 3.03 or later.
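The version requirement can be checked from a script. The sketch below parses the text printed by `tesseract --version` and compares it with 3.03. It assumes that the first line of the output has the form `tesseract 3.05.02` or `tesseract v5.0.0`. Because some releases print this text to the error stream, both streams are read. All function names are hypothetical.

```python
import re
import subprocess

MINIMUM_VERSION = (3, 3)  # corresponds to "3.03"


def parse_tesseract_version(version_output):
    """Return (major, minor) parsed from `tesseract --version` text, or None."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_output, minimum=MINIMUM_VERSION):
    """Return True if the reported version is at least `minimum`."""
    version = parse_tesseract_version(version_output)
    return version is not None and version >= minimum


def read_version_output(executable="tesseract"):
    """Run `tesseract --version`; assumes the executable is on PATH."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=False
    )
    # Combine both streams, since the version text may appear on either one.
    return result.stdout + result.stderr
```

A call such as `meets_minimum(read_version_output())` then reports whether the installed engine satisfies the requirement.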

Installation

To install Tesseract on Windows, complete the following steps:

  1. Download the Tesseract installer executable from https://github.com/UB-Mannheim/tesseract/wiki.

  2. Once the download is complete, open the file and select the language.


../../_images/tesseract_language_en.png

  3. The welcome wizard opens. Press Next > to continue.


../../_images/tesseract_welcom_wizard_en.png

  4. Read the license terms, accept them, and then press I Agree.


../../_images/tesseract_license_agreement_en.png

  5. Choose whether to install for all users or only for the current user, then press Next >.


../../_images/tesseract_installation_user_en.png

  6. Choose the Tesseract features to install, then press Next >.


../../_images/tesseract_installation_features_en.png

Note

Installing additional language data requires a working internet connection, because the missing files must be downloaded.

  7. Choose the folder in which to install Tesseract, then press Next >.


../../_images/tesseract_installation_location_en.png

  8. Choose the folder in which to create the Tesseract shortcut, or select Do not create shortcuts to skip this step, then select Install.


../../_images/tesseract_shortcut_en.png

The installation starts, and the installer displays its progress:


../../_images/installing_en.png

  9. Once the installation is complete, press Next > and then Finish.


../../_images/tesseract_installation_completed_en.png
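After the wizard finishes, the result can be verified from a script. The sketch below locates the executable and lists the installed language data with `tesseract --list-langs`. It assumes that the output starts with a header line beginning with "List of", followed by one language code per line, and that any warnings mixed into the output can be ignored. The installer does not necessarily add the installation folder to PATH, so passing the full path to `tesseract.exe` may be needed. The function names are hypothetical.

```python
import shutil
import subprocess


def parse_language_list(list_langs_output):
    """Extract the language codes from `tesseract --list-langs` text."""
    codes = []
    for line in list_langs_output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of"):
            continue
        codes.append(line)
    return codes


def installed_languages(executable="tesseract"):
    """Return the installed language codes, or raise if Tesseract is missing.

    `executable` may be a bare name looked up on PATH or a full path.
    """
    path = shutil.which(executable)
    if path is None:
        raise FileNotFoundError(f"Tesseract executable not found: {executable}")
    result = subprocess.run(
        [path, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Some releases print the list to the error stream, so read both streams.
    return parse_language_list(result.stdout + result.stderr)
```

If a language selected in the features step is missing from the returned list, the language data was probably not downloaded during installation.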

Once the installation is finished, configure Tesseract on the Genius Server through the Configuration Tool. For further details, refer to Tesseract.

Warning

To install Tesseract on other operating systems, refer to https://github.com/tesseract-ocr/tesseract/wiki.

For further details about using Tesseract, refer to Tesseract.

Uninstallation

To uninstall Tesseract, proceed as follows:

  1. Open the Control Panel and select Programs and Features.


../../_images/control_panel_en.png

  2. Right-click Tesseract and select Uninstall.


../../_images/tesseract_uninstall_en.png

  3. Select a language and press OK.


../../_images/tesseract_language_uninstall_en.png

  4. Check the path of the folder from which Tesseract will be removed, then press Uninstall.


../../_images/tesseract_confirm_uninstall_en.png

  5. Once the uninstallation is complete, press Close.


../../_images/tesseract_uninstallation_complete_en.png

Tesseract has been successfully uninstalled.