Asian Journal of Convergence In Technology Volume V Issue III

ISSN No : 2350-1145 I.F-5.11

Handwritten Character Recognition using Convolutional Neural Networks in Python with Keras
Hanu Priya Indiran, Student Member - IEEE, Bachelors in Electronics and Communication Engineering
Kumaraguru College of Technology, Coimbatore, Tamil Nadu, India.
hanupriya28@gmail.com

Abstract: In the field of Deep Learning for Computer Vision, scientists have made many
enhancements that have helped in the development of millions of smart devices. They have
also brought a revolutionary change to the field of image processing, where one of the
biggest challenges is to identify documents in both printed and handwritten formats. One
of the most widely used techniques for validating these types of documents is 'Character
Recognition'. This project seeks to classify an individual handwritten word so that
handwritten text can be translated to a digital form. It demonstrates the use of neural
networks for developing a system that can recognize handwritten English alphabets. In
this system, each English alphabet is represented by binary values that are used as input
to a simple feature extraction system, whose output is fed to our neural network system.
Two approaches are used to accomplish this task: classifying words directly and character
segmentation. For the former, a Convolutional Neural Network (CNN) is used with various
architectures to train a model that can accurately classify words. For the latter, Long
Short Term Memory networks are used with convolution to construct bounding boxes for each
character. We then pass the segmented characters to a CNN for classification, and then
reconstruct each word according to the results of classification and segmentation.

Keywords: Computer Vision, CNN, Character Recognition, Classification, Deep Learning,
Neural Networks

I. INTRODUCTION

Handwritten character recognition is a field of research in artificial intelligence,
computer vision, and pattern recognition. A computer performing handwriting recognition
is said to be able to acquire and detect characters in paper documents, pictures,
touch-screen devices and other sources and convert them into machine-encoded form. Its
application is found in optical character recognition and more advanced intelligent
character recognition systems. Most of these systems nowadays implement machine learning
mechanisms such as neural networks. Machine learning is a branch of artificial
intelligence inspired by psychology and biology that deals with learning from a set of
data and can be applied to solve a wide spectrum of problems. A supervised machine
learning model is given instances of data specific to a problem domain and an answer that
solves the problem for each instance. When learning is complete, the model is able to
provide answers not only for the data it has learned on, but also for yet unseen data,
with high precision. Neural networks are learning models used in machine learning. Their
aim is to simulate the learning process that occurs in an animal or human neural system.
Being one of the most powerful learning models, they are useful in the automation of tasks
where the decision of a human being takes too long or is imprecise. A neural network can
be very fast at delivering results and may detect connections between seen instances of
data that humans cannot see. Having acquired the knowledge that is explained in this
text, the neural network has been implemented on a low level without using libraries that
already facilitate the process. By doing this, we evaluate the performance of neural
networks on the given problem and provide source code for the network that can be used to
solve many different classification problems. A small step towards this goal is explored
in this work by training a neural network model to learn which parts of an image are
interesting to human observers that search for a specific object. This knowledge can then
be used to speed up object search in computer vision.

Adopting the principle of convolution in neural networks led to convolutional neural
networks. The first driving force behind handwritten text classification was digit
classification for postal mail. Jacob Rabinowitz's early postal readers incorporated
scanning equipment and hardwired logic to recognize monospaced fonts [1]. Allum et al.
improved this by making a sophisticated scanner which allowed for more variations in how
the text was written, as well as encoding the information onto a barcode that was printed
directly on the letter [2]. The first prominent piece of OCR software was invented by Ray
Kurzweil in 1974; the software allowed for recognition of any font [3]. This software
used a more developed form of the matrix method (pattern matching). Essentially, it
compared the bitmap of each template character with the bitmap of the read character to
determine which character it most closely matched. The downside was that this software
was sensitive to variations in sizing and to the distinctions between each individual's
way of writing.

II. PROBLEM IDENTIFICATION AND APPROACH

Despite the abundance of technological writing tools, many people still choose to take
their notes traditionally: with pen and paper. However, there are drawbacks to
handwriting text. It is difficult to store and access physical documents in an efficient
manner, to search through them efficiently and to share
them with others. Thus, a lot of important knowledge gets lost or does not get reviewed
because documents never get transferred to digital format. We have thus decided to tackle
this problem in our project because we believe the significantly greater ease of
management of digital text compared to written text will help people more effectively
access, search, share, and analyze their records, while still allowing them to use their
preferred writing method. The aim of this project is to further explore the task of
classifying handwritten text and to convert handwritten text into digital format.

Handwritten text is a very general term, and we wanted to narrow down the scope of the
project by specifying the meaning of handwritten text for our purposes. In this project,
we took on the challenge of classifying the image of any handwritten word, which might be
in cursive or block writing. This project can be combined with algorithms that segment
the word images in a given line image, which can in turn be combined with algorithms that
segment the line images in a given image of a whole handwritten page. With these added
layers, our project can take the form of a deliverable for an end user: a fully
functional model that helps the user convert handwritten documents into digital format by
prompting the user to take a picture of a page of notes. Note that even though some added
layers are needed on top of our model to create a fully functional deliverable for an end
user, I believe that the most interesting and challenging part of this problem is the
classification part, which is why we decided to tackle it using Convolutional Neural
Networks. I approach this problem with complete handwritten alphabet images because CNNs
tend to work better on raw input pixels than on features or parts of an image [4]. Given
our findings using handwritten alphabets, the program seeks improvement by extracting
characters from the handwritten image and then classifying each character independently
to reconstruct the digital letter. In summary, in both of our techniques, our models take
in an image of a handwritten alphabet and output the alphabet digitally.

Two phases are involved in the overall processing of our proposed scheme: pre-processing
and neural-network-based recognition. The pre-processing steps handle the manipulations
necessary to prepare the characters for feeding as input to the neural network system.
First, the required character or part of characters needs to be extracted from the
pictorial representation. This involves splitting the alphabets into 25 segment grids,
scaling the segments so split to a standard size, and thinning the resultant character
segments to obtain skeletal patterns. The following pre-processing steps may also be
required to furnish the recognition process:
A. The alphabets can be thinned and their skeletons obtained using well-known image
processing techniques, before extracting their binary forms.
B. The scanned documents can be “cleaned” and “smoothed” with the help of image
processing techniques for better performance.

III. DEEP LEARNING

Deep Learning is an application of artificial intelligence (AI) that provides systems the
ability to automatically learn and improve from experience without being explicitly
programmed. Deep learning methods aim at learning feature hierarchies, with features from
higher levels of the hierarchy formed by the composition of lower level features.
Automatically learning features at multiple levels of abstraction allows a system to
learn complex functions mapping the input to the output directly from data, without
depending completely on human-crafted features. Deep learning focuses on the development
of computer programs that can access data and use it to learn for themselves. The process
of learning begins with observations or data, such as examples, direct experience, or
instruction, in order to look for patterns in data and make better decisions in the
future based on the examples that we provide. The primary aim is to allow computers to
learn automatically without human intervention or assistance and adjust actions
accordingly. The algorithms are often categorized as supervised or unsupervised.

(i) Supervised Learning Algorithm
This algorithm can apply what has been learned in the past to new data, using labeled
examples to predict future events. Starting from the analysis of a known training
dataset, the learning algorithm produces an inferred function to make predictions about
the output values. The system is able to provide targets for any new input after
sufficient training. The learning algorithm can also compare its output with the correct,
intended output and find errors in order to modify the model accordingly.

(ii) Unsupervised Learning Algorithm
In contrast, unsupervised machine learning algorithms are used when the information used
to train is neither classified nor labeled. Unsupervised learning studies how systems can
infer a function to describe a hidden structure from unlabeled data. The system does not
figure out the right output, but it explores the data and can draw inferences from
datasets to describe hidden structures in unlabeled data.

Semi-supervised machine learning algorithms fall somewhere in between supervised and
unsupervised learning, since they use both labeled and unlabeled data for training,
typically a small amount of labeled data and a large amount of unlabeled data. Systems
that use this method are able to considerably improve learning accuracy. Usually,
semi-supervised learning is chosen when the acquired labeled data requires skilled and
relevant resources in order to train on it or learn from it. In contrast, acquiring
unlabeled data generally does not require additional resources.

(iii) Reinforcement Learning Algorithm
It is a learning method that interacts with its environment by producing actions and
discovering errors or rewards. Trial-and-error search and delayed reward are the most
relevant characteristics of reinforcement learning. This method allows machines and
software agents to automatically determine the
ideal behavior within a specific context in order to maximize its performance. Simple
reward feedback is required for the agent to learn which action is best; this is known as
the reinforcement signal.

Machine learning enables analysis of massive quantities of data. While it generally
delivers faster, more accurate results in order to identify profitable opportunities or
dangerous risks, it may also require additional time and resources to train it properly.
Combining machine learning with AI and cognitive technologies can make it even more
effective in processing large volumes of information.

IV. NEURAL NETWORK

The simplest definition of a neural network, more properly referred to as an 'artificial'
neural network (ANN), is provided by the inventor of one of the first neurocomputers, Dr.
Robert Hecht-Nielsen. He defines a neural network as "...a computing system made up of a
number of simple, highly interconnected processing elements, which process information by
their dynamic state response to external inputs" (in "Neural Network Primer: Part I" by
Maureen Caudill, AI Expert, Feb. 1989). ANNs are processing devices (algorithms or actual
hardware) that are loosely modeled after the neuronal structure of the mammalian cerebral
cortex but on much smaller scales. A large ANN might have hundreds or thousands of
processor units, whereas a mammalian brain has billions of neurons with a corresponding
increase in magnitude of their overall interaction and emergent behavior. Although ANN
researchers are generally not concerned with whether their networks accurately resemble
biological systems, some have been. For example, researchers have accurately simulated
the function of the retina and modeled the eye rather well. Although the mathematics
involved with neural networking is not a trivial matter, a user can rather easily gain at
least an operational understanding of their structure and function. Neural networks are
typically organized in layers. Layers are made up of a number of interconnected 'nodes'
which contain an 'activation function'. Patterns are presented to the network via the
'input layer', which communicates to one or more 'hidden layers' where the actual
processing is done via a system of weighted 'connections'.

Most ANNs contain some form of 'learning rule' which modifies the weights of the
connections according to the input patterns that it is presented with. In a sense, ANNs
learn by example as do their biological counterparts; a child learns to recognize dogs
from examples of dogs. Although there are many different kinds of learning rules used by
neural networks, this demonstration is concerned only with one: the delta rule. The delta
rule is often utilized by the most common class of ANNs, called 'backpropagation neural
networks' (BPNNs). Backpropagation is an abbreviation for the backwards propagation of
error. With the delta rule, as with other types of backpropagation, 'learning' is a
supervised process that occurs with each cycle or 'epoch' (i.e. each time the network is
presented with a new input pattern) through a forward activation flow of outputs, and the
backwards error propagation of weight adjustments. The process flow of a neural network
is as per the figure below.

Figure 4.a

Neural networks take a different approach to problem solving than that of conventional
computers. Conventional computers use an algorithmic approach, i.e. the computer follows
a set of instructions in order to solve a problem. Unless the specific steps that the
computer needs to follow are known, the computer cannot solve the problem. That restricts
the problem-solving capability of conventional computers to problems that we already
understand and know how to solve. But computers would be so much more useful if they
could do things that we don't exactly know how to do. Neural networks process information
in a similar way to the human brain. The network is composed of a large number of highly
interconnected processing elements (neurones) working in parallel to solve a specific
problem. Neural networks learn by example. They cannot be programmed to perform a
specific task. The examples must be selected carefully, otherwise useful time is wasted
or, even worse, the network might function incorrectly. The disadvantage is that because
the network finds out how to solve the problem by itself, its operation can be
unpredictable.

Neural networks and conventional algorithmic computers are not in competition but
complement each other. There are tasks that are more suited to an algorithmic approach,
like arithmetic operations, and tasks that are more suited to neural networks. Moreover,
a large number of tasks require systems that use a combination of the two approaches
(normally a conventional computer is used to supervise the neural network) in order to
perform at maximum efficiency. This project therefore uses Convolutional Neural Networks
to analyse the problem and provide the appropriate solution.

V. MODEL APPROACH

A neural network is made up of neurons connected to each other; each connection of our
neural network is associated with a weight that dictates the importance of this
relationship in the neuron when multiplied by the input value. Each neuron has an
activation function that defines the output of the neuron. The activation function is
used to introduce non-linearity in the modeling capabilities of the network. We have
several options for activation functions that we will present in this paper. Training our
neural network, that is, learning the values of our parameters (weights wij and biases
bj), is the most genuine part of Deep Learning, and we can see this learning process in a
neural network as an iterative process of “going and returning” through the layers of
neurons. The “going” is a forward
propagation of the information and the “return” is a backpropagation of the information.
The figure below illustrates the process.

Figure 5.a

The first phase, forward propagation, occurs when the network is exposed to the training
data and these cross the entire neural network for their predictions (labels) to be
calculated. That is, the input data is passed through the network in such a way that all
the neurons apply their transformation to the information they receive from the neurons
of the previous layer and send it to the neurons of the next layer. When the data has
crossed all the layers, and all its neurons have made their calculations, the final layer
is reached with a label prediction for those input examples. The loss function is used to
estimate the loss (or error) and to measure how good or bad our prediction was in
relation to the correct result (remember that we are in a supervised learning environment
and we have the label that tells us the expected value). Ideally, we want our cost to be
zero, that is, no divergence between estimated and expected value. Therefore, as the
model is being trained, the weights of the interconnections of the neurons are gradually
adjusted until good predictions are obtained. Once the loss has been calculated, this
information is propagated backwards; hence its name, backpropagation. Starting from the
output layer, that loss information propagates to all the neurons in the hidden layer
that contribute directly to the output. However, the neurons of the hidden layer only
receive a fraction of the total signal of the loss, based on the relative contribution
that each neuron has made to the original output. This process is repeated, layer by
layer, until all the neurons in the network have received a loss signal that describes
their relative contribution to the total loss.
The various stages are as follows:
A. Back Propagation
B. Loss Function
C. Optimiser
D. Model Parameterisation
E. Epochs
F. Batch Size
G. Learning Rate
H. Initialization of parameter weights
I. Neural Network Methodology

A. Back Propagation
Backpropagation is a method to alter the parameters (weights and biases) of the neural
network in the right direction. It starts by calculating the loss term first, and then
the parameters of the neural network are adjusted in reverse order with an optimization
algorithm taking into account this calculated loss. In Keras, three arguments are passed
to the compile() method: an optimizer, a loss function, and a list of metrics. In
classification problems like our example, accuracy is used as a metric. Let’s go a little
deeper into these arguments.

B. Loss Function
A loss function is one of the parameters required to quantify how close a particular
neural network is to the ideal weights during the training process. The choice of the
best loss function depends on understanding what type of error is or is not acceptable
for the problem in particular.

C. Optimisers
The optimizer is another of the arguments required in the compile() method. Keras
currently has different optimizers that can be used: SGD, RMSprop, Adagrad, Adadelta,
Adam, Adamax, Nadam. In general, the learning process is seen as a global optimization
problem where the parameters (weights and biases) must be adjusted in such a way that the
loss function presented above is minimized.

D. Model Parameterization
It is also possible to increase the number of epochs, add more neurons in a layer or add
more layers. However, in these cases, the gains in accuracy have the side effect of
increasing the execution time of the learning process. We can check with the summary()
method that the number of parameters increases (it is fully connected) and the execution
time is significantly higher, even when reducing the number of epochs. With this model,
the accuracy reaches 94%. If we increase to 20 epochs, a 96% accuracy is achieved.

E. Epochs
As already noted, epochs tells us the number of times all the training data have passed
through the neural network in the training process. A good rule of thumb is to increase
the number of epochs until the accuracy metric on the validation data starts to decrease,
even when the accuracy on the training data continues to increase (this is when we detect
potential overfitting).

F. Batch Size
As we have said before, we can partition the training data into mini batches to pass them
through the network. In Keras, batch_size is the argument that indicates the size of
these batches that will be used in the fit() method in an iteration of the training to
update the gradient. The optimal size will depend
on many factors, including the memory capacity of the computer that we use to do the
calculations.

G. Learning Rate
The gradient vector has a direction and a magnitude. Gradient descent algorithms multiply
the magnitude of the gradient by a scalar known as the learning rate (also sometimes
called step size) to determine the next point.

H. Initialisation of parameter weights
Initialization of the parameters’ weights is not exactly a hyperparameter, but it is as
important as any of them, which is why we give it a brief paragraph in this section. It
is advisable to initialize the weights with small random values to break the symmetry
between different neurons. If two neurons have exactly the same weights they will always
have the same gradient; this means that both keep the same values in subsequent
iterations, so they will not be able to learn different characteristics. Initializing the
parameters randomly following a standard normal distribution is correct, but it can lead
to possible problems of vanishing gradients (when the values of a gradient are too small
and the model stops learning or takes too long because of that) or exploding gradients
(when the algorithm assigns an exaggeratedly high importance to the weights).

I. Neural Network Methodology
This is the step-by-step methodology for building a neural network (an MLP with one
hidden layer, similar to the architecture shown above). At the output layer, we have only
one neuron as we are solving a binary classification problem (predict 0 or 1). We could
also have two neurons, one for predicting each of the two classes.
First look at the broad steps:
We take input and output
● X as an input matrix
● y as an output matrix

Step 1: We initialize weights and biases with random values (this is a one-time
initialization; in the next iteration, we will use the updated weights and biases). Let
us define:
● wh as weight matrix to the hidden layer
● bh as bias matrix to the hidden layer
● wout as weight matrix to the output layer
● bout as bias matrix to the output layer

Step 2: We take the matrix dot product of the input and the weights assigned to edges
between the input and hidden layer, then add the biases of the hidden layer neurons to
the respective inputs. This is known as a linear transformation:

    hidden_layer_input = matrix_dot_product(X, wh) + bh

Step 3: Perform a non-linear transformation using an activation function (sigmoid).
Sigmoid returns the output as 1/(1 + exp(-x)).

    hiddenlayer_activations = sigmoid(hidden_layer_input)

Step 4: Perform a linear transformation on the hidden layer activations (take the matrix
dot product with the weights and add the bias of the output layer neuron), then apply an
activation function (sigmoid is used again, but any other activation function can be used
depending on the task) to predict the output:

    output_layer_input = matrix_dot_product(hiddenlayer_activations, wout) + bout
    output = sigmoid(output_layer_input)

All the above steps are known as “Forward Propagation”.

Step 5: Compare the prediction with the actual output and calculate the gradient of the
error (Actual - Predicted). The error is the mean square loss, loss = ((y - output)^2)/2,
whose negative derivative with respect to the output is:

    E = y - output

Step 6: Compute the slope (gradient) of the hidden and output layer neurons. To compute
the slope, we calculate the derivative of the non-linear activation at each layer for
each neuron. For a sigmoid output x, the gradient of the sigmoid can be returned as
x * (1 - x).

    slope_output_layer = derivatives_sigmoid(output)
    slope_hidden_layer = derivatives_sigmoid(hiddenlayer_activations)

Step 7: Compute the change factor (delta) at the output layer, which is the gradient of
the error multiplied by the slope of the output layer activation:

    d_output = E * slope_output_layer

Step 8: At this step, the error propagates back into the network, which gives the error
at the hidden layer. For this, we take the dot product of the output layer delta with the
weight parameters of the edges between the hidden and output layer (wout.T):

    Error_at_hidden_layer = matrix_dot_product(d_output, wout.Transpose)

Step 9: Compute the change factor (delta) at the hidden layer by multiplying the error at
the hidden layer with the slope of the hidden layer activation:

    d_hiddenlayer = Error_at_hidden_layer * slope_hidden_layer

Step 10: Update the weights at the output and hidden layer. The weights in the network
can be updated from the errors calculated for the training example(s):

    wout = wout + matrix_dot_product(hiddenlayer_activations.Transpose, d_output) * learning_rate
    wh = wh + matrix_dot_product(X.Transpose, d_hiddenlayer) * learning_rate

(learning_rate: the amount by which the weights are updated is controlled by a
configuration parameter called the learning rate.)

Step 11: Update the biases at the output and hidden layer. The biases in the network can
be updated from the aggregated errors at that neuron.
● bias at output_layer = bias at output_layer + sum of delta of output_layer row-wise *
learning_rate
● bias at hidden_layer = bias at hidden_layer + sum of delta of hidden_layer row-wise *
learning_rate

    bh = bh + sum(d_hiddenlayer, axis=0) * learning_rate
    bout = bout + sum(d_output, axis=0) * learning_rate

Steps 5 to 11 are known as “Backward Propagation”.

One forward and backward propagation iteration is considered as one training cycle. As
mentioned earlier, when we train a second time, the updated weights and biases are used
for forward propagation.

Above, we have updated the weights and biases for the hidden and output layers using the
full batch gradient descent algorithm.

VI. RESULTS AND INFERENCES

The dataset was constructed from a number of scanned document datasets available from the
National Institute of Standards and Technology (NIST). This is where the name for the
dataset comes from, as the Modified NIST or MNIST dataset. Images of digits and alphabets
were taken from a variety of scanned documents, normalized in size and centered. This
makes it an excellent dataset for evaluating models, allowing the developer to focus on
machine learning with very little data cleaning or preparation required. Each image is a
28 by 28 pixel square (784 pixels total). A standard split of the dataset is used to
evaluate and compare models, where 60,000 images are used to train a model and a separate
set of 10,000 images are used to test it. It is a digit recognition task. [13] As such
there are 26 alphabets (A-Z and a-z) and 10 digits (0 to 9), or 10 classes to predict.
Results are reported using prediction error, which is nothing more than the inverted
classification accuracy (that is, 1 minus the accuracy). Excellent results achieve a
prediction error of less than 1%. State-of-the-art prediction error of approximately 0.2%
can be achieved with large Convolutional Neural Networks.

A. Evaluation Parameters
The performance is measured for the learning algorithms used in multilayer perceptrons:
backpropagation and resilient propagation. We have considered the scenario of recognition
from an image, where the dataset consists of only 40 character image bitmaps per
character. For this comparison, the datasets comprise only digit characters, therefore
the dataset contains 400 examples. For relevant values, we have split the dataset into
training and validation sets, with the ratio being 7:3. [14] Also, before using the
learning algorithms, the dataset has been randomly shuffled. The configuration of the
learning model whose results are presented here is:
• The regularization parameter is 0.
• The number of epochs is 100.
• In backpropagation, the learning rate is 0.3.
• In resilient backpropagation, η−, η+, and Δ0 are 0.5, 1.2, and 0.01, respectively.
• The perceptron architectures are as described in the plan of solution.
The error of the backpropagation and CNN algorithms was measured on the training and
validation sets. This has been tested using fractions of the dataset of various sizes,
and a learning curve has been plotted. A learning curve represents error as a function of
the dataset size and is a useful tool to visualize high bias or variance.
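
The forward and backward propagation steps of Section V.I, combined with the shuffled 7:3 split, 100 epochs and learning rate of 0.3 listed above, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the implementation used for the reported results: the function names, the number of hidden units, the weight range [-0.5, 0.5] and the division of the updates by the number of examples (which keeps a learning rate of 0.3 stable on a full batch) are all choices made here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def derivatives_sigmoid(s):
    # s is already a sigmoid output, so the slope is s * (1 - s).
    return s * (1.0 - s)


def shuffle_split(X, y, train_fraction=0.7, seed=0):
    """Randomly shuffle the examples, then split them 7:3 by default."""
    order = np.random.default_rng(seed).permutation(len(X))
    cut = int(round(train_fraction * len(X)))
    return X[order[:cut]], y[order[:cut]], X[order[cut:]], y[order[cut:]]


def train_mlp(X, y, hidden_units=8, epochs=100, learning_rate=0.3, seed=0):
    """Full-batch training of an MLP with one hidden layer (Steps 1 to 11)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Step 1: small random weights break the symmetry between neurons.
    wh = rng.uniform(-0.5, 0.5, size=(X.shape[1], hidden_units))
    bh = rng.uniform(-0.5, 0.5, size=(1, hidden_units))
    wout = rng.uniform(-0.5, 0.5, size=(hidden_units, 1))
    bout = rng.uniform(-0.5, 0.5, size=(1, 1))
    losses = []
    for _ in range(epochs):
        # Steps 2 to 4: forward propagation.
        hiddenlayer_activations = sigmoid(X @ wh + bh)
        output = sigmoid(hiddenlayer_activations @ wout + bout)
        # Step 5: error term and mean square loss.
        E = y - output
        losses.append(float(np.mean(E ** 2) / 2))
        # Steps 6 to 9: deltas at the output and hidden layers.
        d_output = E * derivatives_sigmoid(output)
        error_at_hidden_layer = d_output @ wout.T
        d_hiddenlayer = error_at_hidden_layer * derivatives_sigmoid(hiddenlayer_activations)
        # Steps 10 and 11: updates, averaged over the batch (assumption).
        wout = wout + hiddenlayer_activations.T @ d_output * learning_rate / n
        wh = wh + X.T @ d_hiddenlayer * learning_rate / n
        bout = bout + d_output.sum(axis=0, keepdims=True) * learning_rate / n
        bh = bh + d_hiddenlayer.sum(axis=0, keepdims=True) * learning_rate / n
    return (wh, bh, wout, bout), losses


def prediction_error(params, X, y):
    """Prediction error = 1 - classification accuracy."""
    wh, bh, wout, bout = params
    output = sigmoid(sigmoid(X @ wh + bh) @ wout + bout)
    return float(np.mean((output >= 0.5) != (y >= 0.5)))
```

Calling `shuffle_split` on a 400-example dataset yields 280 training and 120 validation examples; evaluating `prediction_error` on both sets for growing fractions of the training data gives the points of a learning curve.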

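For the resilient backpropagation settings quoted above (η− = 0.5, η+ = 1.2, Δ0 = 0.01), the per-weight step-size adaptation can be sketched as below. The paper does not state which RPROP variant was used; this sketch assumes the commonly described variant that zeroes the stored gradient after a sign change, and the step bounds `step_min` and `step_max` as well as the quadratic toy objective are illustrative assumptions.

```python
import numpy as np


def rprop_step(w, grad, prev_grad, step,
               eta_minus=0.5, eta_plus=1.2, step_min=1e-6, step_max=50.0):
    """One RPROP update: only the sign of the gradient is used."""
    sign_product = grad * prev_grad
    # Same sign as last time: grow the step. Opposite sign: shrink it.
    step = np.where(sign_product > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_product < 0, np.maximum(step * eta_minus, step_min), step)
    # After a sign change, skip the move for that weight in this iteration.
    grad = np.where(sign_product < 0, 0.0, grad)
    w = w - np.sign(grad) * step
    return w, grad, step


def minimize_quadratic(target, iterations=100, delta0=0.01):
    """Toy usage: minimize sum((w - target)^2) starting from w = 0."""
    target = np.asarray(target, dtype=float)
    w = np.zeros_like(target)
    prev_grad = np.zeros_like(target)
    step = np.full_like(target, delta0)
    for _ in range(iterations):
        grad = 2.0 * (w - target)
        w, prev_grad, step = rprop_step(w, grad, prev_grad, step)
    return w
```

Because the update depends only on gradient signs, each weight gets its own step size, which is one reason RPROP can reach a better minimum than plain backpropagation within a fixed number of epochs.
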

Figure 3.1 O
C. Validation

The proposed alphabet recognition system was trained to recognize
handwritten English alphabets. Since each alphabet is divided into
25 segments, the neural network architecture is designed specifically
for processing 25 input bits. The network parameters used for
training are:

Learning rate coefficient = 0.05
No. of units in input layer = 25
No. of hidden layers = 2
No. of units in hidden layer = 25
Initial weights = Random [0, 1]
Transfer function used for hidden layer 1 = "Logsig"
Transfer function used for hidden layer 2 = "Tansig"

The training set consists of the binary codes of the alphabets [15].
It was not practical to input these shapes individually when creating
training sets, because the shape of a particular segment of the actual
character depends on the handwriting. Therefore, this was automated
so that the entire letter is input to the system, and the shape of the
required segment is extracted from the full letter instead of drawing
the segment itself. The figures below show the results for the
training dataset (Figures 3.3 and 3.4) and the test dataset
(Figure 3.5).

Figure 3.2

In the learning curves, no significant overfitting or underfitting
is apparent. We can see that the RPROP algorithm manages to
converge to a better minimum than backpropagation given 100
epochs. This is due to the advantages of the RPROP algorithm over
pure backpropagation that we explained earlier in this work.
Table 1 confirms these findings.
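
As a reminder of why RPROP can behave differently from plain backpropagation, the sketch below shows one resilient update step. It is an illustration under assumptions, not the code used in this work. It uses the iRprop- variant with the commonly quoted factors 1.2 and 0.5 (the variant and constants used here are not stated in the text), and the function name `rprop_step` and the toy quadratic objective are hypothetical. Only the sign of each gradient is used; the size of the move comes from a per-weight step that grows while the sign is stable and shrinks when it flips.

```python
def sign(v):
    return (v > 0) - (v < 0)


def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One iRprop- update, applied in place to `weights` and `steps`.

    Returns the gradients to pass as `prev_grads` on the next call.
    """
    kept = []
    for i, g in enumerate(grads):
        change = g * prev_grads[i]
        if change > 0:
            # Same direction as last time: accelerate.
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif change < 0:
            # Sign flipped, so a minimum was jumped over: slow down and
            # skip the move for this weight in this iteration.
            steps[i] = max(steps[i] * eta_minus, step_min)
            g = 0.0
        # The gradient magnitude is ignored; only its sign matters.
        weights[i] -= sign(g) * steps[i]
        kept.append(g)
    return kept


def gradient(w):
    # Toy objective (w0 - 3)^2 + 10 * (w1 + 1)^2 with very different
    # curvature per dimension, where a single learning rate is awkward.
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


w = [0.0, 0.0]
steps = [0.1, 0.1]
prev = [0.0, 0.0]
for _ in range(200):
    prev = rprop_step(w, gradient(w), prev, steps)
```

Because each weight adapts its own step size, the badly scaled second dimension does not force a small step on the first. Plain gradient descent with one global learning rate does not have this property.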

Table 3.1.

B. Experimental Methodology
As we have more data available in the touch mode than a pure
image bitmap, we have also decided to collect the bitmap of
stroke end points to be able to better distinguish characters
such as '8' and 'B', as mentioned in the overview. The resized
bitmaps of these characters are often similar, but the writing
style of each is usually different. By providing this extra bitmap
with each example, we are giving a hint to the neural network
classifier about what features to focus on when performing
automatic feature extraction with the hidden layer.

Figure 3.3

Figure 3.4

The pipeline for recognition based on an image or a camera frame
is different:
1. Acquire the image bitmap in grayscale.
2. Apply a median filter to the bitmap.
3. Segment the bitmap using thresholding to get a binary
bitmap.
4. Find the bounding boxes of external contours in the
bitmap.
5. Extract sub-bitmaps from the bounding boxes.
6. Resize the sub-bitmaps to 20x20 pixels.
7. Unroll the sub-bitmap matrices into feature vectors of
400 elements each.
8. Feed each feature vector to a trained multilayer
perceptron, giving us predictions.
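
Steps 2 to 7 of the image-based pipeline can be sketched in plain Python as follows. This is a simplified illustration under assumptions, not the code used in this work: the bitmap is a list of rows of 0-255 gray values with dark ink on a light background, the threshold of 128 is arbitrary, a flood fill over connected ink pixels stands in for a contour finder, and resizing uses nearest-neighbour sampling. All function names are hypothetical.

```python
from statistics import median


def median_filter(img):
    """3x3 median filter; border pixels use whichever neighbours exist."""
    h, w = len(img), len(img[0])
    return [[median(img[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2)))
             for c in range(w)]
            for r in range(h)]


def threshold(img, level=128):
    """Binary bitmap: 1 for dark (ink) pixels, 0 for background."""
    return [[1 if px < level else 0 for px in row] for row in img]


def bounding_boxes(binary):
    """(top, left, bottom, right) of every 8-connected blob of ink."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            stack = [(r, c)]
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def crop_resize(binary, box, size=20):
    """Cut out one box and resample it to size x size (nearest neighbour)."""
    top, left, bottom, right = box
    bh, bw = bottom - top + 1, right - left + 1
    return [[binary[top + (r * bh) // size][left + (c * bw) // size]
             for c in range(size)]
            for r in range(size)]


def extract_features(gray, size=20):
    """Steps 2-7: one unrolled feature vector per character, left to right."""
    binary = threshold(median_filter(gray))
    boxes = sorted(bounding_boxes(binary), key=lambda b: b[1])
    return [[px for row in crop_resize(binary, b, size) for px in row]
            for b in boxes]


# Synthetic 12x24 test bitmap: two dark blobs plus one isolated noise pixel.
gray = [[255] * 24 for _ in range(12)]
for r in range(2, 10):
    for c in range(2, 8):
        gray[r][c] = 0
for r in range(3, 9):
    for c in range(14, 21):
        gray[r][c] = 0
gray[10][11] = 0  # single-pixel speck that the median filter should remove
features = extract_features(gray)
```

Each entry of `features` is a 400-element vector that could be passed to the trained multilayer perceptron in step 8. In practice an image-processing library would normally replace the hand-written filter and blob search.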


Figure 3.5

The binary input of each alphabet in different handwriting styles
is then fed to all the units of the input layer of the network at
once, and the network is trained on these inputs by processing
them through the hidden layers with the gradient descent
backpropagation training algorithm. The trained network is then
capable of recognizing the alphabets in different handwriting.
The results are tabulated below.

Table 3.2

We can observe from the table that recognition accuracy is lower
for similar input patterns such as c & e; i, j, l & r; and u & v.
Even the human eye cannot easily distinguish these similar patterns
in different handwriting; hence the machine needs many training
epochs to recognize them, and still makes some misclassifications.

VII. CONCLUSION AND FUTURE POSSIBILITIES

This work has mostly been focused on the machine learning methods
used in the project. At first, we reviewed the approaches that are
nowadays used in similar applications. After that, we delved into
the inner workings of a multilayer perceptron, focusing on
backpropagation and resilient backpropagation (RPROP), which has
been implemented.

Before training the models, we applied various preprocessing and
data augmentation techniques to our dataset in order to make the
data more compatible with the models and more robust to real-life
situations. Even though our dataset consists of images of every
word separately, some words within these images were slightly
tilted. This was because the participants were asked to write on
blank paper with no lines, and some of the words were written in
a more tilted fashion. This happens very frequently in real life
whether or not the page has lines, so we decided to make our
training data more robust to this issue by rotating an image
towards the right by a very small angle with random probability
and adding that image to our training set. This data augmentation
technique helped us make our model more robust to minor yet
frequent details that might come up in our test set.

A technique we found useful during experimentation was keeping
the number of epochs low when trying out different
hyperparameters. This approach certainly saved us a lot of
training time; however, it also had its own disadvantages.

With the knowledge described above, we specified the requirements
of the project and planned the solution. Several improvements to
the application or to the learning model used within it can be
suggested. For example, the feature extraction performed by the
neural network could be constrained to operate on more strictly
preprocessed data. Also, several classifiers learning on different
features could be combined to make the system more robust.

We proposed and developed a scheme for recognizing handwritten
English alphabets and numbers. It has been tested experimentally
on all English alphabets and numerical digits in several
handwriting styles. Experimental results show that the machine
successfully recognized the alphabets and numbers with an average
accuracy of 82.5%, which is significant and may be acceptable in
some applications. The machine was found to be less accurate in
classifying similar alphabets. In future, this misclassification
of similar patterns may be reduced, and a similar experiment can
be run on a larger data set with other optimized network
parameters to improve the accuracy of the machine.

Firstly, to have more compelling and robust training, we could
apply additional preprocessing techniques such as jittering. We
could also divide each pixel by its corresponding standard
deviation to normalize the data. Next, given time and budget
constraints, we were limited to 20 training examples for each
given word in order to efficiently evaluate and revise our model.
Another method of improving our character segmentation model
would be to move beyond a greedy search for the most likely
solution. We would approach this by considering a more exhaustive
but still efficient decoding algorithm such as beam search. We can
use a character/word-based language model to add a penalty/benefit
score to each of the possible final beam search candidate paths,
along with their combined individual softmax probabilities,
representing the probability of the sequence of characters/words.
If the language model indicates that the most likely candidate
word according to the softmax layer and beam search is very
unlikely given the context so far, as opposed to some other likely
candidate words, then the model can correct itself accordingly.

REFERENCES
[1] H. Al-Yousefi and S. S. Udpa, "Recognition of handwritten
Arabic characters," in Proc. SPIE 32nd Ann. Int. Tech. Symp.
Opt. Optoelectric Applied Sci. Eng. (San Diego, CA), Aug. 1988.
[2] K. Badi and M. Shimura, "Machine recognition of Arabic
cursive script," Trans. Inst. Electron. Commun. Eng., vol. E65,
no. 2, pp. 107-114, Feb. 1982.
[3] M. Altuwaijri and M. A. Bayoumi, "Arabic Text Recognition
Using Neural Network," ISCAS 94, IEEE International Symposium
on Circuits and Systems, vol. 6, 30 May-2 June 1994.
[4] C. Bahlmann, B. Haasdonk, and H. Burkhardt, "Online
Handwriting Recognition with Support Vector Machine - A Kernel
Approach," in Proc. 8th Int. Workshop in Handwriting Recognition
(IWHFR), pp. 49-54, 2002.
[5] H. S. M. Beigi, "An Overview of Handwriting Recognition,"
Proceedings of the 1st Annual Conference on Technological
Advancements in Developing Countries, Columbia University, New
York, July 24-25, 1993, pp. 30-46.
[6] C. Nadal, R. Legault, and C. Y. Suen, "Complementary
Algorithms for Recognition of totally Unconstrained Handwritten
Numerals," in Proc. 10th Int. Conf. Pattern Recognition, 1990,
vol. 1, pp. 434-449.
[7] S. Impedovo, P. Wang, and H. Bunke, editors, "Automatic
Bankcheck Processing," World Scientific, Singapore, 1997.
[8] C. L. Liu, K. Nakashima, H. Sako, and H. Fujisawa,
"Benchmarking of state-of-the-art techniques," Pattern
Recognition, vol. 36, no. 10, pp. 2271-2285, Oct. 2003.
[9] M. Shi, Y. Fujisawa, T. Wakabayashi, and F. Kimura,
"Handwritten numeral recognition using gradient and curvature of
gray scale image," Pattern Recognition, vol. 35, no. 10,
pp. 2051-2059, Oct. 2002.
[10] L. N. Teow and K. F. Loe, "Robust vision-based features and
classification schemes for off-line handwritten digit
recognition," Pattern Recognition, vol. 35, no. 11,
pp. 2355-2364, Nov. 2002.
[11] K. Cheung, D. Yeung, and R. T. Chin, "A Bayesian framework
for deformable pattern recognition with application to
handwritten character recognition," IEEE Trans. Pattern Anal.
Mach. Intell., vol. 20, no. 12, pp. 1382-1388, Dec. 1998.
[12] I. J. Tsang, I. R. Tsang, and D. V. Dyck, "Handwritten
character recognition based on moment features derived from image
partition," in Int. Conf. Image Processing, 1998, vol. 2,
pp. 939-942.
[13] H. Soltanzadeh and M. Rahmati, "Recognition of Persian
handwritten digits using image profiles of multiple
orientations," Pattern Recognition Lett., vol. 25, no. 14,
pp. 1569-1576, Oct. 2004.
[14] F. N. Said, R. A. Yacoub, and C. Y. Suen, "Recognition of
English and Arabic numerals using a dynamic number of hidden
neurons," in Proc. 5th Int. Conf. Document Analysis and
Recognition, 1999, pp. 237-240.
[15] J. Sadri, C. Y. Suen, and T. D. Bui, "Application of support
vector machines for recognition of handwritten Arabic/Persian
digits," in Proc. 2nd Iranian Conf. Machine Vision and Image
Processing, 2003, vol. 1, pp. 300-307.
