In this article, we will cover what OCR is, how it works, and how we can use different AI OCRs in Python. AI OCRs are also called ICR (Intelligent Character Recognition).
We will implement the following OCRs:
- pytesseract
- EasyOCR
- Keras OCR
What Is Optical Character Recognition
In simple words, Optical Character Recognition is a technology for recognizing text inside images. The text can be typed, handwritten, or printed. It processes images and turns them into a machine-readable format.
OCR technology can be used in many ways, such as searching for text in images, scanning and verifying identity cards, improving data management, and much more.
ICR performs the same function as OCR. The main difference is that ICR uses artificial intelligence and machine learning to understand characters.
How Optical Character Recognition Works
OCR is a combination of NLP and computer vision, and the way an OCR works can be divided into three parts:
- Document Scanning
- Document Preprocessing
- Character Recognition
Document scanning is a significant part of OCR. In this part, computer vision does the work: it captures the image or document and converts it into a machine-readable format.
While using OCR, you might run into problems when the scanned image is slightly blurred, shaded, and so on. In those cases the performance of the OCR will not be up to the mark. For the best performance, we can preprocess the image, for example by deblurring it, converting it to grayscale, or removing the shade. Many preprocessing techniques can enhance the document or image so that the OCR performs at its best.
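As a rough illustration of such preprocessing, here is a minimal sketch using OpenCV. It assumes the image is already loaded as a BGR array (for example with `cv2.imread`). The function name `preprocess_for_ocr` and the parameter values (filter size 3, block size 31, offset 10) are illustrative assumptions, not recommended settings.

```python
def preprocess_for_ocr(image):
    """Return a denoised, binarized grayscale copy of a BGR image array."""
    # Imported inside the function so the sketch can be loaded even
    # before OpenCV is installed in the current environment.
    import cv2

    # Collapse the three color channels into a single intensity channel.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # A small median filter suppresses speckle noise while keeping edges.
    denoised = cv2.medianBlur(gray, 3)
    # Adaptive thresholding compares each pixel with its neighborhood,
    # which reduces the effect of shade and uneven lighting.
    binary = cv2.adaptiveThreshold(
        denoised, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 10
    )
    return binary
```

The returned array contains only the values 0 and 255 and can be passed to an OCR library in place of the original image. Which steps actually help depends on the input, so it is worth comparing results with and without them.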
After scanning and preprocessing, the OCR focuses on one character or block of text at a time, and the character is recognized using one of the following algorithms.
The first is pattern recognition (also known as matrix matching). When training this algorithm, we feed it characters in different fonts and formats, from which it learns to recognize each character. The algorithm then compares every character with the data it was fed and returns the result.
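To make the comparison step concrete, here is a toy sketch in plain Python. It assumes each character has already been cut out and reduced to a tiny 5x3 grid of `#` (ink) and `.` (background). The `TEMPLATES` table and the function names are made up for illustration; a real OCR stores many more examples per character.

```python
# Stored examples: one hand-drawn 5x3 template per character (illustrative).
TEMPLATES = {
    "H": ["#.#", "#.#", "###", "#.#", "#.#"],
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def match_score(glyph, template):
    """Count the pixels on which two equally sized glyphs agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the label of the stored template that agrees on the most pixels."""
    return max(TEMPLATES, key=lambda label: match_score(glyph, TEMPLATES[label]))
```

For example, an "H" with one missing pixel, `["#.#", "#.#", "###", "#.#", "#.."]`, still agrees with the stored "H" on 14 of 15 pixels, so `recognize` returns `"H"`. The weakness of this approach is also visible here: a glyph in a different size or font no longer lines up with the stored grid.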
The second is feature extraction. In this algorithm, we feed the training examples together with their features. For example, we can define the features of the character ‘H’ as “two straight lines and a single line between them.” During recognition, the algorithm compares the features of each character with these definitions and returns the result.
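The ‘H’ example can be sketched in a few lines of plain Python. This toy version assumes a glyph is a grid of `#` and `.` characters and uses only two features: the number of full-height vertical strokes and the number of full-width horizontal strokes. The `FEATURES` table and the function names are illustrative assumptions; real systems use far richer features.

```python
def count_runs(flags):
    """Count groups of consecutive True values, so a thick stroke counts once."""
    runs, previous = 0, False
    for flag in flags:
        if flag and not previous:
            runs += 1
        previous = flag
    return runs


def extract_features(glyph):
    """Return (vertical strokes, horizontal strokes) for a glyph drawn with '#'."""
    full_rows = [all(ch == "#" for ch in row) for row in glyph]
    full_cols = [all(row[i] == "#" for row in glyph) for i in range(len(glyph[0]))]
    return count_runs(full_cols), count_runs(full_rows)


# Feature definitions: 'H' is two vertical strokes joined by one horizontal stroke.
FEATURES = {(2, 1): "H", (1, 2): "I", (1, 1): "L", (1, 3): "E"}


def recognize(glyph):
    """Look up the character whose feature definition matches, or None."""
    return FEATURES.get(extract_features(glyph))
```

Because only the strokes are counted, the same definition recognizes a 5x3 "H", a 7x5 "H", and an "H" drawn with thicker lines, which is the main advantage over comparing pixels directly.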
Implementation of Optical Character Recognition with Python OCR Libraries
I hope at this point you were able to understand what optical character recognition is and how it works. Now it’s time to implement Optical Character Recognition with Python OCR libraries. All of the implementations are straightforward, so let’s get started.
First, you will need Python installed on your system. It is much better to use conda environments so that the implementations of multiple OCRs do not conflict with each other.
You can download Anaconda from this link:
After the setup, we are ready to create a conda environment. Open the terminal and enter the following command.
conda create -n myenv python=3.8
The above command creates an environment named “myenv” with Python 3.8.
We will implement pytesseract first. Open the terminal and activate the environment with this command:
conda activate myenv
Now install pytesseract along with OpenCV. Type the following commands:
pip install opencv-python
pip install pytesseract
You will also need to install Tesseract itself on your system, because pytesseract is only a Python wrapper around it. You can go through the link and install Tesseract according to your operating system.
After the setup, create a Python file and start writing code.
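A minimal sketch of such a script could look like this. The function name `extract_text` and the file name `sample.png` used afterwards are hypothetical; `cv2.imread`, `cv2.cvtColor`, and `pytesseract.image_to_string` are the library calls doing the work.

```python
def extract_text(image_path, lang="eng"):
    """Read an image from disk and return the text Tesseract finds in it."""
    # Imported inside the function so this sketch only needs the packages
    # of the environment it is actually run in.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)  # returns None when the file cannot be read
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # OpenCV loads pixels in BGR order, while pytesseract expects RGB.
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return pytesseract.image_to_string(rgb, lang=lang)
```

With an image saved as `sample.png`, `result = extract_text("sample.png")` stores the extracted text in `result`, which you can then print. If the Tesseract executable is not on your PATH, pytesseract lets you set its location through `pytesseract.pytesseract.tesseract_cmd`.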
In the above code, we import our libraries, read the image, and pass it to the pytesseract function. The result variable holds the extracted text. There are more functions that you can view in the pytesseract repo.
Here is the Git repository of pytesseract.
Next is EasyOCR. Since we are implementing another OCR, it is better to create another environment for it. After creating the environment, install EasyOCR with this command:
pip install easyocr
It will download some dependencies by itself. After that, you can write the following code.
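A minimal sketch of that code, assuming a hypothetical image file `sample.png`, could be the following. The wrapper function `read_text` is illustrative; `easyocr.Reader` and `readtext` are the library calls.

```python
def read_text(image_path, languages=("en",)):
    """Return (text, confidence) pairs for every text region EasyOCR finds."""
    # Imported inside the function so this sketch only needs the packages
    # of the environment it is actually run in.
    import easyocr

    # Loading the reader downloads the model weights on first use.
    reader = easyocr.Reader(list(languages))
    # By default each entry is (bounding box, text, confidence).
    result = reader.readtext(image_path)
    return [(text, confidence) for _box, text, confidence in result]
```

Calling `read_text("sample.png")` returns one pair per detected text region. Creating the reader is the slow part, so when processing many images it is usually better to create it once and reuse it.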
In the above code, after importing the library, we load the model for the English language.
After that, we call the readtext function and store the result in a variable.
You can read about EasyOCR in detail here.
Keras-OCR is an OCR that provides a ready-to-use detector as well as a pipeline for training your own custom detector. To train your own detector, you can visit this link.
To use a pre-trained model, follow the instructions below. The requirements are Python >= 3.6 and TensorFlow >= 2.0.0.
Once the requirements are met, you can install Keras-OCR using the command below:
pip install keras-ocr
It takes a list of images for detection. The code is below:
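A minimal sketch, assuming the pre-trained `keras_ocr.pipeline.Pipeline` (which bundles the detector with a recognizer) and a hypothetical image file `sample.png`. The wrapper function `recognize_words` is illustrative.

```python
def recognize_words(image_paths):
    """Return, for each image, the list of words keras-ocr recognizes in it."""
    # Imported inside the function so this sketch only needs the packages
    # of the environment it is actually run in.
    import keras_ocr

    # Creating the pipeline downloads the pre-trained weights on first use.
    pipeline = keras_ocr.pipeline.Pipeline()
    # keras-ocr expects a list of images, even for a single file.
    images = [keras_ocr.tools.read(path) for path in image_paths]
    # Each prediction group is a list of (word, box) tuples for one image.
    prediction_groups = pipeline.recognize(images)
    return [[word for word, _box in group] for group in prediction_groups]
```

For a single image, `recognize_words(["sample.png"])[0]` gives the words found in it. The boxes are dropped here for brevity; keep them if you need the position of each word.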
In the above code block, we read a single image and append it to a list, because keras-ocr takes a list of images as input. After that, we initialize the detector and make a prediction. You can find more detail about keras-ocr by visiting this link.
That’s it, guys. We have explored what Optical Character Recognition is and how it works. We also looked at how to implement it with Python OCR libraries.