Abstract— In the field of Deep Learning for Computer Vision, scientists have made many enhancements that have contributed to the development of millions of smart devices. Scientists have also brought a revolutionary change to the field of image processing, and one of its biggest challenges is to identify documents in both printed and handwritten formats. One of the most widely used techniques for validating these types of documents is character recognition. This project seeks to classify an individual handwritten word so that handwritten text can be translated to a digital form. It demonstrates the use of neural networks for developing a system that can recognize handwritten English alphabets. In this system, each English alphabet is represented by binary values that are used as input to a simple feature extraction system, whose output is fed to our neural network system. Two approaches are used to accomplish this task: classifying words directly, and character segmentation. For the former, a Convolutional Neural Network (CNN) is used with various architectures to train a model that can accurately classify words. For the latter, Long Short Term Memory networks are used with convolution to construct bounding boxes for each character. We then pass the segmented characters to a CNN for classification, and then reconstruct each word according to the results of classification and segmentation.

Keywords: Computer Vision, CNN, Character Recognition, Classification, Deep Learning, Neural Networks

I. INTRODUCTION

Handwritten character recognition is a field of research in artificial intelligence, computer vision, and pattern recognition. A computer performing handwriting recognition is said to be able to acquire and detect characters in paper documents, pictures, touch-screen devices and other sources and convert them into machine-encoded form. Its application is found in optical character recognition and more advanced intelligent character recognition systems. Most of these systems nowadays implement machine learning mechanisms such as neural networks. Machine learning is a branch of artificial intelligence inspired by psychology and biology that deals with learning from a set of data and can be applied to solve a wide spectrum of problems. A supervised machine learning model is given instances of data specific to a problem domain and an answer that solves the problem for each instance. When learning is complete, the model is able to provide answers not only for the data it has learned on, but also for yet unseen data, with high precision. Neural networks are learning models used in machine learning. Their aim is to simulate the learning process that occurs in an animal or human neural system. Being one of the most powerful learning models, they are useful in the automation of tasks where the decision of a human being takes too long or is imprecise. A neural network can be very fast at delivering results and may detect connections between seen instances of data that humans cannot see. Having acquired the knowledge that is explained in this text, the neural network has been implemented at a low level without using libraries that already facilitate the process. By doing this, we evaluate the performance of neural networks on the given problem and provide source code for the network that can be used to solve many different classification problems. A small step towards this goal is explored in this work by training a neural network model to learn which parts of an image are interesting to human observers that search for a specific object. This knowledge can then be used to speed up object search in computer vision.

Adopting the principle of convolution to neural networks led to convolutional neural networks. The first driving force behind handwritten text classification was digit classification for postal mail. Jacob Rabinowitz's early postal readers incorporated scanning equipment and hardwired logic to recognize monospaced fonts [1]. Allum et al. improved this by making a sophisticated scanner which allowed for more variation in how the text was written, as well as encoding the information onto a barcode that was printed directly on the letter [2]. The first prominent piece of OCR software was invented by Ray Kurzweil in 1974; this software allowed recognition of any font [3]. It used a more developed version of the matrix method (pattern matching): essentially, it would compare bitmaps of the template character with bitmaps of the read character to determine which character it most closely matched. The downside was that this software was sensitive to variations in sizing and to the differences between each individual's way of writing.

II. PROBLEM IDENTIFICATION AND APPROACH

Despite the abundance of technological writing tools, many people still choose to take their notes traditionally: with pen and paper. However, there are drawbacks to handwriting text. It is difficult to store and access physical documents in an efficient manner, to search through them efficiently and to share
them with others. Thus, a lot of important knowledge gets lost or does not get reviewed because documents never get transferred to digital format. We have thus decided to tackle this problem in our project because we believe the significantly greater ease of management of digital text compared to written text will help people more effectively access, search, share, and analyze their records, while still allowing them to use their preferred writing method. The aim of this project is to further explore the task of classifying handwritten text and to convert handwritten text into a digital format.

Handwritten text is a very general term, and we wanted to narrow down the scope of the project by specifying the meaning of handwritten text for our purposes. In this project, we took on the challenge of classifying the image of any handwritten word, which might be in cursive or block writing. This project can be combined with algorithms that segment the word images in a given line image, which can in turn be combined with algorithms that segment the line images in a given image of a whole handwritten page. With these added layers, our project can take the form of a deliverable that would be used by an end user: a fully functional model that helps the user convert handwritten documents into digital format by prompting the user to take a picture of a page of notes. Note that even though some layers need to be added on top of our model to create a fully functional deliverable for an end user, we believe that the most interesting and challenging part of this problem is the classification part, which is why we decided to tackle it using Convolutional Neural Networks. We approach this problem with complete handwritten alphabet images because CNNs tend to work better on raw input pixels than on features or parts of an image [4]. Given our findings using handwritten alphabets, the program seeks improvement by extracting characters from the handwritten image and then classifying each character independently to reconstruct the digital letter. In summary, in both of our techniques, our models take in an image of a handwritten alphabet and output the alphabet digitally.

Two phases are involved in the overall processing of our proposed scheme: the pre-processing and the neural-network-based recognition tasks. The pre-processing steps handle the manipulations necessary to prepare the characters for feeding as input to the neural network system. First, the required character or part of a character needs to be extracted from the pictorial representation. This involves splitting the alphabets into 25-segment grids, scaling the segments so split to a standard size, and thinning the resultant character segments to obtain skeletal patterns. The following pre-processing steps may also be required to support the recognition process (a sketch of both follows below):
A. The alphabets can be thinned and their skeletons obtained using well-known image processing techniques, before extracting their binary forms.
B. The scanned documents can be "cleaned" and "smoothed" with the help of image processing techniques for better performance.
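For illustration, steps A and B above might be realized with standard image-processing libraries; the following sketch assumes OpenCV and scikit-image, which the paper does not explicitly name, and a hypothetical input file alphabet.png.

import cv2
import numpy as np
from skimage.morphology import skeletonize

img = cv2.imread('alphabet.png', cv2.IMREAD_GRAYSCALE)  # hypothetical scanned alphabet image
img = cv2.medianBlur(img, 3)                             # B: "clean" and "smooth" the scan
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
skeleton = skeletonize(binary > 0)                       # A: thin the character to its skeletal pattern
binary_form = skeleton.astype(np.uint8)                  # binary form used in the later stages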
III. DEEP LEARNING

Deep Learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Deep learning methods aim at learning feature hierarchies, with features from higher levels of the hierarchy formed by the composition of lower-level features. Automatically learning features at multiple levels of abstraction allows a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features. Deep learning focuses on the development of computer programs that can access data and use it to learn for themselves. The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in data and make better decisions in the future based on the examples that we provide. The primary aim is to allow computers to learn automatically, without human intervention or assistance, and to adjust actions accordingly. The algorithms are often categorized as supervised or unsupervised.

(i) Supervised Learning Algorithm
This algorithm can apply what has been learned in the past to new data, using labeled examples to predict future events. Starting from the analysis of a known training dataset, the learning algorithm produces an inferred function to make predictions about the output values. The system is able to provide targets for any new input after sufficient training. The learning algorithm can also compare its output with the correct, intended output and find errors in order to modify the model accordingly.

(ii) Unsupervised Learning Algorithm
In contrast, unsupervised machine learning algorithms are used when the information used to train is neither classified nor labeled. Unsupervised learning studies how systems can infer a function to describe a hidden structure from unlabeled data. The system does not figure out the right output, but it explores the data and can draw inferences from datasets to describe hidden structures in unlabeled data.

Semi-supervised machine learning algorithms fall somewhere in between supervised and unsupervised learning, since they use both labeled and unlabeled data for training – typically a small amount of labeled data and a large amount of unlabeled data. The systems that use this method are able to considerably improve learning accuracy. Usually, semi-supervised learning is chosen when the acquired labeled data requires skilled and relevant resources in order to train it or learn from it. Otherwise, acquiring unlabeled data generally does not require additional resources.

(iii) Reinforcement Learning Algorithm
This is a learning method that interacts with its environment by producing actions and discovering errors or rewards. Trial-and-error search and delayed reward are the most relevant characteristics of reinforcement learning. This method allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize performance.
Figure 5.a

The first phase, forward propagation, occurs when the network is exposed to the training data and the data crosses the entire neural network so that its predictions (labels) can be calculated. That is, the input data is passed through the network in such a way that all the neurons apply their transformation to the information they receive from the neurons of the previous layer and send it to the neurons of the next layer. When the data has crossed all the layers and all its neurons have made their calculations, the final layer is reached with a label prediction for those input examples. The loss function is then used to estimate the loss (or error) and to compare and measure how good or bad our prediction was in relation to the correct result (remember that we are in a supervised learning environment and we have the label that tells us the expected value). Ideally, we want our cost to be zero, that is, no divergence between the estimated and the expected value. Therefore, as the model is being trained, the weights of the interconnections between neurons are gradually adjusted until good predictions are obtained. Once the loss has been calculated, this information is propagated backwards; hence the name backpropagation. Starting from the output layer, the loss information propagates to all the neurons of the hidden layer that contribute directly to the output. However, the neurons of the hidden layer only receive a fraction of the total loss signal, based on the relative contribution that each neuron has made to the original output. This process is repeated, layer by layer, until all the neurons in the network have received a loss signal that describes their relative contribution to the total loss. The various stages are as follows:
A. Back Propagation
B. Loss Function
C. Optimiser
D. Model Parameterisation
E. Epochs
F. Batch Size
G. Learning Rate
H. Initialisation of parameter weights
I. Neural Network Methodology

B. Loss Function
A loss function is one of the parameters required to quantify how close a particular neural network is to the ideal weights during the training process. The choice of the best loss function resides in understanding what type of error is or is not acceptable for the problem in particular.

C. Optimisers
The optimizer is another of the arguments required in the compile() method. Keras currently offers several optimizers that can be used: SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax and Nadam. In general, the learning process is posed as a global optimization problem in which the parameters (weights and biases) must be adjusted in such a way that the loss function presented above is minimized.

D. Model Parameterisation
It is also possible to increase the number of epochs, add more neurons in a layer or add more layers. However, in these cases the gains in accuracy have the side effect of increasing the execution time of the learning process. We can check with the summary() method that the number of parameters increases (the network is fully connected) and that the execution time is significantly higher, even when reducing the number of epochs. With this model the accuracy reaches 94%, and if we increase to 20 epochs, a 96% accuracy is achieved.
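As an illustration of how the loss function, the optimizer and the parameter count reported by summary() fit together in Keras, a minimal sketch follows; the layer sizes, loss and optimizer chosen here are illustrative assumptions rather than the exact configuration used in this work.

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,)))  # hidden layer
model.add(Dense(10, activation='softmax'))                     # one neuron per class

# the loss function and the optimizer are both arguments of compile()
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()  # reports how the number of trainable parameters grows with the architecture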
E. Epochs
As we have already done, epochs tells us the number of times all the training data have passed through the neural network during the training process. A good clue is to increase the number of epochs until the accuracy metric on the validation data starts to decrease, even when the accuracy on the training data continues to increase (this is when we detect potential overfitting).

F. Batch Size
As we have said before, we can partition the training data into mini-batches to pass them through the network. In Keras, batch_size is the argument that indicates the size of these batches, used in the fit() method in one iteration of the training to update the gradient. The optimal size will depend on many factors, including the memory capacity of the computer that we use to do the calculations.
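Continuing the sketch above, both hyperparameters are passed directly to fit(); the arrays x_train/y_train and x_val/y_val are assumed to have been prepared beforehand.

history = model.fit(x_train, y_train,
                    epochs=10,        # number of passes over the full training set
                    batch_size=128,   # mini-batch size used for each gradient update
                    validation_data=(x_val, y_val))
# the returned history records training and validation accuracy per epoch,
# which helps spot the point where validation accuracy stops improving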
G. Learning Rate
The gradient vector has a direction and a magnitude. Gradient descent algorithms multiply the magnitude of the gradient by a scalar known as the learning rate (also sometimes called the step size) to determine the next point.
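In code, a single gradient descent step therefore scales the gradient by the learning rate before subtracting it from the current point; the values below are purely illustrative.

import numpy as np

def gradient_descent_step(w, grad, learning_rate=0.01):
    # the next point is the current point minus the gradient scaled by the learning rate
    return w - learning_rate * grad

w = np.array([0.5, -0.3])
grad = np.array([0.2, -0.1])
w = gradient_descent_step(w, grad)   # -> array([ 0.498, -0.299])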
H. Initialisation of parameter weights
Initialisation of the parameters' weights is not exactly a hyperparameter, but it is as important as any of them, which is why we devote a brief paragraph to it in this section. It is advisable to initialise the weights with small random values to break the symmetry between different neurons: if two neurons have exactly the same weights, they will always have the same gradient, which means they will have the same values in subsequent iterations and will not be able to learn different characteristics. Initialising the parameters randomly following a standard normal distribution is correct, but it can lead to possible problems of vanishing gradients (when the values of a gradient are too small and the model stops learning, or takes too long because of it) or exploding gradients (when the algorithm assigns an exaggeratedly high importance to the weights).
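A common way to follow this advice is to draw the weights from a scaled normal distribution, for example as below; the layer sizes and the 0.01 scale factor are illustrative choices, not values prescribed by the paper.

import numpy as np

n_inputs, n_hidden = 784, 128
# small random values break the symmetry between neurons,
# while the small scale keeps the gradients away from vanishing/exploding regimes
wh = np.random.randn(n_inputs, n_hidden) * 0.01
bh = np.zeros((1, n_hidden))   # biases can safely start at zero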
I. Neural Network Methodology
This is the step-by-step building methodology of a Neural Network (an MLP with one hidden layer, similar to the architecture shown above). At the output layer we have only one neuron, as we are solving a binary classification problem (predict 0 or 1). We could also have two neurons, one for predicting each of the two classes.
First, look at the broad steps. We take the input and output:
● X as an input matrix
● y as an output matrix

Step 1: We initialize weights and biases with random values (this is a one-time initialisation; in the next iteration, we will use the updated weights and biases). Let us define:

Step 3: Perform a non-linear transformation using an activation function (sigmoid). Sigmoid returns the output as 1/(1 + exp(-x)).

hiddenlayer_activations = sigmoid(hidden_layer_input)

Step 4: Perform a linear transformation on the hidden layer activation (take the matrix dot product with the weights and add the bias of the output layer neuron), then apply an activation function (sigmoid is used again, but you can use any other activation function depending on your task) to predict the output.

output_layer_input = matrix_dot_product(hiddenlayer_activations, wout) + bout
output = sigmoid(output_layer_input)

All the above steps are known as "Forward Propagation".

Step 5: Compare the prediction with the actual output and calculate the gradient of error (Actual – Predicted). The error is the mean squared loss = ((y – output)^2)/2.

E = y – output

Step 6: Compute the slope/gradient of the hidden and output layer neurons (to compute the slope, we calculate the derivative of the non-linear activation x at each layer for each neuron). The gradient of the sigmoid can be returned as x * (1 – x).

slope_output_layer = derivatives_sigmoid(output)
slope_hidden_layer = derivatives_sigmoid(hiddenlayer_activations)

Step 7: Compute the change factor (delta) at the output layer, dependent on the gradient of error multiplied by the slope of the output layer activation.
Step 9: Compute the change factor (delta) at the hidden layer: multiply the error at the hidden layer with the slope of the hidden layer activation.

d_hiddenlayer = Error_at_hidden_layer * slope_hidden_layer

Step 10: Update weights at the output and hidden layer: the weights in the network can be updated from the errors calculated for the training example(s).

wout = wout + matrix_dot_product(hiddenlayer_activations.Transpose, d_output) * learning_rate
wh = wh + matrix_dot_product(X.Transpose, d_hiddenlayer) * learning_rate

(learning_rate: the amount that weights are updated is controlled by a configuration parameter called the learning rate.)

Step 11: Update biases at the output and hidden layer: the biases in the network can be updated from the aggregated errors at each neuron.
● bias at output_layer = bias at output_layer + sum of delta of output_layer at row-wise * learning_rate
● bias at hidden_layer = bias at hidden_layer + sum of delta of hidden_layer at row-wise * learning_rate

bh = bh + sum(d_hiddenlayer, axis=0) * learning_rate
bout = bout + sum(d_output, axis=0) * learning_rate

Steps 5 to 11 are known as "Backward Propagation".

One forward and one backward propagation iteration is considered as one training cycle. As mentioned earlier, when we train a second time, the updated weights and biases are used for forward propagation. Above, we have updated the weights and biases for the hidden and output layer, and we have used the full batch gradient descent algorithm.
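A minimal NumPy implementation of the forward and backward passes described in Steps 1–11 might look like the following sketch. The variable names follow the pseudocode above; the toy data, network size, learning rate and number of epochs are illustrative assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def derivatives_sigmoid(x):
    # gradient of the sigmoid written in terms of its output: x * (1 - x)
    return x * (1.0 - x)

# toy data: 3 examples with 4 input features each, binary target
X = np.array([[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]], dtype=float)
y = np.array([[1], [1], [0]], dtype=float)

hidden_units, learning_rate, epochs = 3, 0.1, 5000
wh = np.random.randn(X.shape[1], hidden_units) * 0.01   # Step 1: random initialisation
bh = np.zeros((1, hidden_units))
wout = np.random.randn(hidden_units, 1) * 0.01
bout = np.zeros((1, 1))

for _ in range(epochs):
    # forward propagation (Steps 2-4)
    hidden_layer_input = np.dot(X, wh) + bh
    hiddenlayer_activations = sigmoid(hidden_layer_input)
    output_layer_input = np.dot(hiddenlayer_activations, wout) + bout
    output = sigmoid(output_layer_input)

    # backward propagation (Steps 5-11)
    E = y - output
    slope_output_layer = derivatives_sigmoid(output)
    slope_hidden_layer = derivatives_sigmoid(hiddenlayer_activations)
    d_output = E * slope_output_layer
    Error_at_hidden_layer = np.dot(d_output, wout.T)
    d_hiddenlayer = Error_at_hidden_layer * slope_hidden_layer
    wout += np.dot(hiddenlayer_activations.T, d_output) * learning_rate
    wh += np.dot(X.T, d_hiddenlayer) * learning_rate
    bout += np.sum(d_output, axis=0, keepdims=True) * learning_rate
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * learning_rate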
This makes it an excellent dataset for evaluating models, allowing the developer to focus on machine learning with very little data cleaning or preparation required. Each image is a 28 by 28 pixel square (784 pixels in total). A standard split of the dataset is used to evaluate and compare models, where 60,000 images are used to train a model and a separate set of 10,000 images is used to test it. It is a digit recognition task [13]. As such, there are 26 alphabets (A–Z and a–z) and 10 digits (0 to 9), or 10 classes to predict. Results are reported using prediction error, which is nothing more than the inverted classification accuracy. Excellent results achieve a prediction error of less than 1%; a state-of-the-art prediction error of approximately 0.2% can be achieved with large Convolutional Neural Networks.
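Assuming the dataset described here is the standard MNIST digit set [13], it could be loaded and flattened along these lines; the exact loading code is not given in the paper.

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()          # 60,000 train / 10,000 test images
x_train = x_train.reshape(60000, 784).astype('float32') / 255.0   # 28 x 28 = 784 pixels per image
x_test = x_test.reshape(10000, 784).astype('float32') / 255.0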
A. Evaluation Parameters
The performance of the two learning algorithms used in the multilayer perceptrons, backpropagation and resilient propagation, is measured. We have considered the scenario of recognition from an image, where the dataset consists of only 40 character image bitmaps per character. For this comparison, the datasets comprise only digit characters, so the dataset contains 400 examples. For relevant values, we have split the dataset into training and validation sets, with the ratio being 7:3 [14]. Also, before using the learning algorithms, the dataset has been randomly shuffled (a sketch of this split follows below). The configuration of the learning model whose results are presented here is:
• The regularization parameter is 0.
• The number of epochs is 100.
• In backpropagation, the learning rate is 0.3.
• In resilient backpropagation, η−, η+ and Δ0 are 0.5, 1.2 and 0.01, respectively.
• The perceptron architectures are as described in the plan of solution.
The measured error of the backpropagation and CNN algorithms on the training and validation sets has been tested using fractions of the dataset of various sizes, and a learning curve has been plotted. The learning curve represents error as a function of the dataset size and is a perfect tool to visualize high bias or variance.
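The shuffling and the 7:3 split described above could be reproduced with scikit-learn as follows; X and y stand in here for the 400 character bitmaps and their labels, and the placeholder arrays are purely illustrative.

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(400, 400)        # placeholder: 400 bitmaps unrolled to 400 features each
y = np.repeat(np.arange(10), 40)    # placeholder: 40 examples per digit class

# shuffle=True randomizes the order before splitting; 30% goes to the validation set
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42)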
Figure 3.1
Table 3.1

B. Experimental Methodology
As we have more data available in the touch mode than a pure image bitmap, we have also decided to collect the bitmap of stroke end points to be able to better distinguish characters such as '8' and 'B', as mentioned in the overview. The resized bitmaps of these characters are often similar, but the writing style of each is usually different. By providing this extra bitmap with each example, we are giving the neural network classifier a hint about which features to focus on when performing automatic feature extraction with the hidden layer. The pipeline for recognition based on an image or a camera frame is different (a sketch follows the list):
1. Acquire the image bitmap in gray-scale colors.
2. Apply a median filter to the bitmap.
3. Segment the bitmap using thresholding to get a binary bitmap.
4. Find the bounding boxes of external contours in the bitmap.
5. Extract sub-bitmaps from the bounding boxes.
6. Resize the sub-bitmaps to 20x20 pixels.
7. Unroll the sub-bitmap matrices to feature vectors of 400 elements.
8. Feed each feature vector to a trained multilayer perceptron, giving us predictions.
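This eight-step pipeline could be sketched with OpenCV as follows; the OpenCV 4.x API is assumed, and predict stands for the trained classifier's prediction function, which the paper does not specify.

import cv2
import numpy as np

def recognize_characters(frame, predict):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)                      # 1. gray-scale bitmap
    blurred = cv2.medianBlur(gray, 3)                                   # 2. median filter
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # 3. threshold to binary
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)             # 4. external contours
    predictions = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)                          # 4. bounding box
        sub = binary[y:y + h, x:x + w]                                  # 5. extract sub-bitmap
        sub = cv2.resize(sub, (20, 20))                                 # 6. resize to 20x20
        features = sub.reshape(1, 400).astype('float32') / 255.0        # 7. unroll to 400 features
        predictions.append(predict(features))                           # 8. feed to the trained MLP
    return predictions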
C. Validation
Figure 3.3
Figure 3.4
based language model to add a penalty/benefit score to each of the possible final beam search candidate paths, along with their combined individual softmax probabilities representing the probability of the sequence of characters/words. If the candidate word that the softmax layer and beam search consider most likely is very unlikely according to the language model given the context so far, as opposed to some other likely candidate words, then the model can correct itself accordingly.
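A minimal sketch of this rescoring idea follows; the weighting factor alpha and the language_model_score function are illustrative assumptions rather than details given in the paper.

import math

def rescore_candidates(candidates, language_model_score, alpha=0.5):
    # combine each candidate's softmax log-probability with a weighted
    # language-model score; higher totals are better
    rescored = []
    for text, softmax_log_prob in candidates:
        total = softmax_log_prob + alpha * language_model_score(text)
        rescored.append((text, total))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# toy usage: the language model penalizes out-of-vocabulary strings
toy_lm = lambda text: 0.0 if text in {"cat", "car"} else -5.0
candidates = [("cxt", math.log(0.45)), ("cat", math.log(0.40)), ("car", math.log(0.15))]
best_word = rescore_candidates(candidates, toy_lm)[0][0]   # -> "cat"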
recognition," Pattern Recognition, vol. 35, no. 11, pp. 2355–2364, Nov. 2002.
[11] K. Cheung, D. Yeung and R. T. Chin, "A Bayesian framework for deformable pattern recognition with application to handwritten character recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 12, pp. 1382–1388, Dec. 1998.
[12] I. J. Tsang, I. R. Tsang and D. V. Dyck, "Handwritten character