Sample Project Report
On
HCR system to transform handwritten text to printed
format and evaluation of various models
Submitted
In partial fulfilment
For the award of the Degree of
This is to acknowledge our indebtedness to our Project Guide, Ms. Swapna Yennishetti, C-DAC ACTS, Pune, for her constant guidance and helpful suggestions in preparing this project, HCR system to transform handwritten text to printed format and evaluation of various models. We express our deep gratitude towards her for the inspiration, personal involvement, and constructive criticism that she provided us, along with technical guidance, during the course of this project.
We take this opportunity to thank the Head of the Department, Mr. Gaur Sunder, for providing us with such a great infrastructure and environment for our overall development.
We express sincere thanks to Mrs. Namrata Ailawar, Process Owner, for her kind cooperation and extended support towards the completion of our project.
It is our great pleasure to express sincere and deep gratitude towards Mrs. Risha P R (Program Head) and Mrs. Srujana Bhamidi (Course Coordinator, PG-DBDA) for their valuable guidance and constant support throughout this work, and for their help in pursuing additional studies.
Also, our warm thanks to C-DAC ACTS Pune, which provided us the opportunity to carry out this prestigious project and enhance our learning in various technical fields.
Chapter 1
Introduction
1.1 Introduction
In a society that is now digitally enhanced, we depend on computers to process huge amounts of data. Various economic and business requirements demand fast input of huge volumes of data into computers. This cannot be achieved by manually typing the data and entering it into the computers, as that is very time-consuming. Hence, mechanizing the manual process plays an important role. Much research has been carried out in the character recognition area, where optical character recognition (OCR) has made a mark. Detecting text regions in handwritten or printed document images containing various non-textual information is a difficult task, and locating the position of the text regions is even more challenging when we deal with a doctor's prescription.
Optical character recognition is the process of reading text from documents, both printed and handwritten, and converting it into a form that computers can operate on. In other words, it is the translation of handwritten, typewritten, or printed paper into machine-editable text using a scanning device and software. It is a field of research in pattern recognition, machine vision, and artificial intelligence. Each year, this technology helps free large amounts of physical storage space once given over to file cabinets and boxes of paper documents.
In our project work, we have proposed a model that uses CRNN and EasyOCR to convert scanned images of input documents into machine-editable text. Neural networks learn and remember what they have learned, enabling them to predict classes or values for new datasets; what makes a CRNN different is that, unlike ordinary neural networks, it relies on information from previous outputs to predict the upcoming input. First, the input documents are converted into an image format, which an input classifier then categorizes as printed, handwritten cursive, handwritten discrete, or semi-printed. The text in the image is predicted using the corresponding model. The predicted text is thus available as machine-editable text which can be retrieved easily whenever necessary.
EasyOCR is a Python package that converts images to text. It is by far the easiest way to implement OCR and supports more than 70 languages, including English, Chinese, Japanese, Korean, and Hindi, with more being added. EasyOCR was created by the company Jaided AI.
1.2 Objective
The objectives of the project work are as follows:
□ To test a system capable of recognizing English alphabets.
□ To gain a better understanding of CRNNs and apply them to character recognition.
□ To gain a better understanding of different digital image processing tools.
□ To understand the process and steps involved in the development of EasyOCR.
The study will put emphasis on testing the CRNN software using computer-printed and handwritten English alphabets, as the system is capable of learning and recognizing a single character at a time. The duration of training the system will therefore be long, because handwritten characters involve more complex factors, such as alignment and different writing styles.
Chapter 2
Literature Review
Dibyajyoti Dhar et al. [1] proposed a method to classify printed and handwritten texts found in doctor's prescriptions. As the proposed method successfully classified the printed and handwritten texts in the documents with very low complexity, it can easily be embedded into a recognition module with little additional resource requirement. The dataset used in this model contains handwritten prescriptions, which are used to classify whether the text is printed or handwritten.
Chammas et al. [3] presented a state-of-the-art CRNN system for text-line recognition of historical documents. They showed how to train such a system with little labeled text-line data. They also improved the performance of the system by augmenting the training set with specially crafted synthetic data at multiple scales. Finally, they proposed a model-based normalization scheme by introducing the notion of variability in the writing scale to the test data.
Shabana Mehfuz et al. [4] note that handwritten character recognition has always been a frontier area of research in the field of pattern recognition and image processing, and that there is a large demand for optical character recognition on handwritten documents. Their paper provides a comprehensive review of existing work in handwritten character recognition based on soft computing techniques during the past decade.
Shkarupa et al. [5] evaluated two RNN architectures for handwritten text recognition, based on Connectionist Temporal Classification (CTC) and the sequence-to-sequence learning approach, obtaining an average recognition rate of 81.5% over all manuscripts. Both methods showed promising results; the CTC model consistently outperformed the Seq2Seq model on both the training and test datasets.
Sueiras et al. [6] combine deep neural networks with sequence-to-sequence networks, also called an encoder-decoder. The proposed architecture aims to identify characters and contextualize them with their neighbors to recognize a given word. The IAM and RIMES datasets, which consist of handwritten text on a white background from many writers, were used for training and testing. The error rate on the test set is 12.7% on IAM and 6.6% on RIMES. This method is more effective for language translation and speech-to-text conversion than for handwriting recognition.
Chapter 3
Methodology and Techniques
3.1 Methodology
3.1.1 CRNN
The outline of the proposed work is represented in the block diagram shown in Fig. 1. The first step in the process is training on the dataset. The dataset is passed through CNN and RNN layers, and the obtained output and the ground-truth text are passed through the CTC layer to produce the trained model. The trained model is then used to recognize the text in the input image.
The input handwritten image is pre-processed by adjusting its resolution. The first step in recognition is to break the paragraph image down into line images. The line images are then further segmented into word images. The word images are preprocessed and passed through the same CNN and RNN layers that were used in training, as sketched below.
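As an illustration of the segmentation step, the following is a minimal sketch (not the project's actual code) that splits a paragraph image into line images using a horizontal projection profile; the same idea, applied with a vertical profile, splits lines into words. The file name and the min_gap threshold are illustrative assumptions.

# Minimal sketch: paragraph-to-line segmentation via horizontal projection.
# Assumed helper, not taken from the report; min_gap is an illustrative value.
import cv2

def split_into_lines(image_path, min_gap=5):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Invert and binarize so that ink pixels become non-zero.
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Sum ink pixels across each row: text rows have high sums, gaps are near zero.
    profile = binary.sum(axis=1)
    lines, start = [], None
    for y, value in enumerate(profile):
        if value > 0 and start is None:
            start = y                       # entering a text band
        elif value == 0 and start is not None:
            if y - start >= min_gap:        # ignore bands thinner than min_gap rows
                lines.append(img[start:y])  # crop one line image
            start = None
    if start is not None:
        lines.append(img[start:])
    return lines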
The output of the RNN layers is given to the CTC layer (the decoding level) to decode the output text with the help of the trained model. The method of extracting word images from a paragraph and combining CNN, RNN, and CTC techniques to train the neural network model is very effective in implementation. As a whole, we propose an end-to-end handwritten text recognition system implemented using CRNN and EasyOCR techniques.
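As a rough illustration of such an architecture (not this project's exact configuration, whose code appears in Chapter 4 as figures), a minimal CRNN in Keras, with layer sizes assumed for the sake of the example and loosely following the Keras handwriting recognition example cited in [10], might look like this:

from tensorflow import keras
from tensorflow.keras import layers

num_chars = 80  # assumed vocabulary size; +1 below for the CTC blank symbol
inputs = keras.Input(shape=(128, 32, 1), name="image")  # width x height x channels

# CNN feature extractor: two convolution blocks, each halving the spatial size.
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)

# Collapse the height axis so that each width position becomes one time step.
x = layers.Reshape(target_shape=(32, 8 * 64))(x)
x = layers.Dense(64, activation="relu")(x)

# RNN sequence modelling with stacked bidirectional LSTMs.
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True, dropout=0.25))(x)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.25))(x)

# Per-time-step character probabilities (softmax over vocabulary plus CTC blank).
outputs = layers.Dense(num_chars + 1, activation="softmax")(x)
model = keras.Model(inputs, outputs)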
3.1.2 EasyOCR
EasyOCR is a Python package that uses PyTorch as its backend. Like any other OCR engine (such as Google's Tesseract), EasyOCR detects text in images; in our experience it is the most straightforward way to do so, and the fact that a high-end deep learning library (PyTorch) supports it in the backend makes its accuracy more credible. EasyOCR supports more than 42 languages for detection purposes. EasyOCR was created by the company Jaided AI.
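As an illustration, reading text from an image with EasyOCR takes only a few lines; the file name below is a placeholder, not an artifact of this project:

import easyocr

reader = easyocr.Reader(['en'])  # downloads detection/recognition models on first use
results = reader.readtext('sample_prescription.png')
for bbox, text, confidence in results:  # each result: bounding box, text, confidence
    print(f"{confidence:.2f}  {text}")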
3.2 Dataset
The IAM dataset has been used for training the model. The IAM database consists of handwritten English sentences and is based on the Lancaster-Oslo/Bergen (LOB) corpus. The database serves as a basis for a variety of recognition tasks, and is particularly useful for recognition tasks where linguistic knowledge beyond the lexicon level is used. The IAM database also includes a few image-processing procedures for extracting the handwritten text from the forms and segmenting the text into lines and words.
The model was trained on the IAM dataset; along with the IAM dataset, custom handwritten paragraph images collected from random people were used for testing. These custom images were captured under normal lighting with a 5 MP camera at a resolution in the range of 800 to 1000 dpi.
The main goal of this step is to modify the images in a way that makes it easier and faster for the recognizer to learn from them. Images of similar size are nowadays commonly used in image recognition with convolutional networks. Since our main goal was rapid prototyping of various RNN architectures, the size of the input data was an important factor affecting the total training time.
We considered using convolutional layers for feature extraction, but initial experiments showed that they provided only a minor improvement in accuracy while considerably increasing the training time. We also tried simple image enhancement techniques (Gaussian blur for noise reduction, and binarization) as additional preprocessing steps; these provided a small increase in accuracy for the Seq2Seq approach but did not substantially affect the accuracy of the CTC approach.
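A minimal sketch of these preprocessing steps with OpenCV (the target size and blur kernel are illustrative assumptions, not the values used in the experiments) could look like this:

import cv2

def preprocess(image_path, target_size=(128, 32)):  # (width, height) in pixels
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, target_size)      # a fixed input size shortens training time
    img = cv2.GaussianBlur(img, (3, 3), 0)  # mild Gaussian blur for noise reduction
    _, img = cv2.threshold(img, 0, 255,
                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu binarization
    return img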
LSTMs are specially constructed RNN nodes designed to preserve long-lasting dependencies. They consist of a self-connected memory cell, which can be compared to the classical RNN node, and three gates that control the input and output of the node. Each gate is in fact a sigmoid function of the input to the LSTM node.
The first gate is the input gate, which controls whether new input is available to the node. The second gate is the forget gate, which makes it possible for the node to reset the activation values of the memory cell. The last gate is the output gate, which controls which parts of the cell output are available to the next nodes.
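In the standard formulation (stated here for completeness, not taken from the report), with \sigma the logistic sigmoid and \odot elementwise multiplication, the gates and cell update are:

\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}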
A further improvement to RNN models based on LSTMs is achieved by using two layers running in opposite directions, the so-called Bidirectional Long Short-Term Memory (BLSTM). The goal of the forward layer is to learn the context of the input by processing the sequence from beginning to end, while the backward layer performs the opposite operation, processing the sequence from end to beginning. It has been demonstrated that this architecture performs better than a simple unidirectional LSTM.
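In Keras, such a layer is obtained by wrapping an LSTM in the Bidirectional wrapper; the layer width below is an illustrative assumption:

from tensorflow.keras import layers

blstm = layers.Bidirectional(
    layers.LSTM(128, return_sequences=True),  # forward and backward copies share this configuration
    merge_mode="concat",                      # concatenate the outputs of the two directions
)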
Explicit segmentation is a very hard problem by itself, and its complexity compounds because errors are likely to be introduced already at the segmentation step; as a result, the context of the data learned by the RNN would also be limited. To address this problem, the Connectionist Temporal Classification (CTC) approach was introduced, originally for speech recognition and later also for handwriting recognition.
CTC makes it possible to avoid the previously mentioned direct alignment between the input variables and the target labels by interpreting the output of the network as a probability distribution over all possible label sequences for the given input sequence.
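A minimal sketch of how a CTC loss can be wired into a Keras model as a layer, following the pattern of the Keras example cited in [10] (an illustration, not necessarily this project's exact code):

import tensorflow as tf
from tensorflow import keras

class CTCLayer(keras.layers.Layer):
    def call(self, y_true, y_pred):
        batch_len = tf.cast(tf.shape(y_true)[0], "int64")
        input_len = tf.cast(tf.shape(y_pred)[1], "int64")  # time steps from the RNN output
        label_len = tf.cast(tf.shape(y_true)[1], "int64")
        input_length = input_len * tf.ones((batch_len, 1), dtype="int64")
        label_length = label_len * tf.ones((batch_len, 1), dtype="int64")
        # ctc_batch_cost sums over all possible alignments of labels to time steps.
        loss = keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)
        self.add_loss(loss)
        return y_pred  # pass predictions through unchanged for decoding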
Chapter 4
Implementation
1. Use of the Python platform for writing the code, with Keras, TensorFlow, and OpenCV
2. Hardware and Software Configuration:
Hardware Configuration:
● CPU: 8 GB RAM, quad-core processor
● GPU: Nvidia GTX 1080 Ti, 16 GB RAM
Software Required:
● Anaconda: a package management tool and a free, open-source distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify deployment.
● Jupyter Notebook:
Jupyter is a web-based interactive development environment for Jupyter
notebooks, code, and data.
Jupyter is flexible: configure and arrange the user interface to support a
wide range of workflows in data science, scientific computing, and
machine learning.
Jupyter is extensible and modular: write plugins that add new components
and integrate with existing ones.
● Spyder: Spyder, the Scientific Python Development Environment, is a free, open-source integrated development environment (IDE) included with Anaconda; it is written in Python, for Python, and designed by and for scientists, engineers, and data analysts.
It combines editing, interactive testing, debugging, and introspection features with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package.
CRNN Model –
Model Summary –
CTC Layer –
User Interface –
Chapter 5
Results
Epochs –
Loss Curve –
Accuracy Score –
Chapter 6
Conclusion
6.1 Conclusion
Chapter 7
References
[1] Dhar, Dibyajyoti; Garain, Avishek; Singh, Pawan; Sarkar, Ram (2021). HP_DocPres: a method for classifying printed and handwritten texts in doctor's prescription. Multimedia Tools and Applications, 80, 1-34. doi:10.1007/s11042-020-10151-w.
[2] AL-Saffar, A.; Awang, S.; AL-Saiagh, W.; AL-Khaleefa, A.S.; Abed, S.A. (2021). A Sequential Handwriting Recognition Model Based on a Dynamically Configurable CRNN. Sensors, 21, 7306. doi:10.3390/s21217306.
[3] Chammas, Edgard; Mokbel, Chafic; Likforman-Sulem, Laurence (2018). Handwriting Recognition of Historical Documents with Few Labeled Data. 43-48. doi:10.1109/DAS.2018.15.
[5] Shkarupa, Yaroslav; Mencis, Roberts; Sabatelli, Matthia (2016). Offline Handwriting Recognition Using LSTM Recurrent Neural Networks. The 28th Benelux Conference on Artificial Intelligence, November 10-11, 2016, Amsterdam (NL), 1, 88.
[6] Sueiras, J.; Ruiz, V.; Sanchez, A.; Velez, J.F. Offline Continuous Handwriting Recognition using Sequence to Sequence Neural Networks. Neurocomputing, Vol. 289.
[7] https://github.com/Breta01/handwriting-ocr
[8] https://github.com/githubharald/SimpleHTR
[9] https://github.com/solivr/tf-crnn
[10] https://keras.io/examples/vision/handwriting_recognition/