Optical Character Recognition Using Artificial Neural Network

International Journal of Engineering Research and General Science Volume 3, Issue 1, January-February, 2015
ISSN 2091-2730, www.ijergs.org

Dr. Mrs. V. V. Patil 1, Rajharsh Vishnu Sanap 2, Rohini Babanrao Kharate 3
1, 2 Dept. of Electronics Engg., Dr. J. J. Magdum College of Engg., Jaysingpur, India.
3 Dept. of Electronics & Telecom. Engg., TPCTs College of Engg., Osmanabad, India.
vvpatil2429@gmail.com, rvssanap@gmail.com, kharaterohini@gmail.com

Abstract - This paper examines the use of neural networks to accomplish optical character recognition. Recognition of handwritten
text has been one of the active and challenging areas of research in the field of image processing and pattern recognition [4]. The
whole process of recognition includes two phases: segmentation of the text into lines, words and characters, and then recognition
of the characters through a feedforward neural network. Our work describes an offline handwritten alphabetical character
recognition system that uses a multilayer feedforward neural network. A method of feature extraction is introduced for extracting
the features of the handwritten alphabets, and the extracted data is then used to train the artificial neural network. Such a system
contributes immensely to the advancement of automation and can improve the interface between man and machine in numerous
applications [7].
Keywords - Character Recognition, Training, Feature Extraction, Image Processing, ANN, OCR, Classification.
INTRODUCTION
Optical character recognition, usually abbreviated OCR, is the mechanical or electronic translation of images of handwritten,
typewritten or printed text (usually captured by a scanner) into machine-editable text [1]. It is a field of research in pattern
recognition, artificial intelligence and machine vision. Though academic research in the field continues, the focus on character
recognition has shifted to implementation of proven techniques. For many document-input tasks, character recognition is the most
cost-effective and speedy method available. Each year, the technology frees acres of storage space once given over to file cabinets
and boxes full of paper documents [10].
The goal of OCR is to classify optical patterns (often contained in a digital image) corresponding to alphanumeric or other
characters. The process involves several steps, including segmentation, feature extraction, and classification [12]. Neural network
technology can be used to analyze the stroke edge, the line of discontinuity between the text characters and the background [3].
Allowing for irregularities of printed ink on paper, each algorithm averages the light and dark along the side of a stroke, matches
it to known characters and makes a best guess as to which character it is. The OCR software then averages or polls the results from
all the algorithms to obtain a single reading [2]. Neural networks can be used if a suitable dataset is available for training and
learning purposes; the dataset is one of the most important factors when constructing a new neural network.
METHODOLOGY
To solve the handwritten character recognition problem as a classification task, we used the MATLAB computation software with the
Neural Network Toolbox and Image Processing Toolbox add-ons. Building a classifier involves two steps, training and testing, which
can be broken down further into substeps.
TRAINING
a. Pre-processing: processes the data so that it is in a suitable form.
b. Feature extraction: reduces the amount of data by extracting the relevant information, usually resulting in a vector of
scalar values. (The features also need to be normalized for distance measurements.)
c. Model estimation: from the finite set of feature vectors, a model (usually statistical) is estimated for each class of the
training data.
TESTING
a. Pre-processing
b. Feature extraction (both same as above)
c. Classification: compares the feature vectors to the various models and finds the closest match. A distance measure can be
used for this comparison.
Fig 1. Training And Testing Of Data
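The training and testing steps listed above can be sketched in a few lines. The following is only an illustration, written in Python rather than the MATLAB toolboxes used in this work, and it assumes the simplest possible "model" for each class: the mean of its normalized training vectors, compared with a Euclidean distance. The function names and the toy six-element feature vectors are hypothetical, and the recognizer described in this paper uses a neural network rather than this nearest-mean rule.

```python
import math

def normalize(vec):
    """Scale a feature vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def estimate_models(training_set):
    """Model estimation: one mean feature vector per class.

    training_set maps a class label to a list of feature vectors.
    """
    models = {}
    for label, vectors in training_set.items():
        normed = [normalize(v) for v in vectors]
        models[label] = [sum(col) / len(normed) for col in zip(*normed)]
    return models

def classify(vec, models):
    """Classification: return the label whose model is closest in Euclidean distance."""
    x = normalize(vec)
    return min(models, key=lambda label: math.dist(x, models[label]))

# Toy data (assumed for illustration only): two classes, two samples each.
training = {
    "I": [[0, 1, 0, 0, 1, 0], [0, 1, 0, 0, 1, 1]],
    "L": [[1, 0, 0, 1, 1, 1], [1, 0, 0, 1, 1, 0]],
}
models = estimate_models(training)
predicted = classify([0, 1, 0, 0, 1, 0], models)
```

Normalizing before measuring distances keeps a vector with many set elements from dominating the comparison, which is the reason the feature extraction step mentions normalization.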
AUTOMATIC IMAGE PREPROCESSING

The image is first converted to a grayscale image and then thresholded, which turns it into a binary image. The binary image is then
sent through a connectivity test in order to find the largest connected component, which is the box of the form [6]. After the box
is located, the individual characters are cropped into separate sub-images, which are the raw data for the feature extraction
routine that follows.
Binarization: the input is usually a grayscale image, so binarization is simply a matter of choosing a threshold value.
Morphological operators: these remove isolated specks and holes in characters; the majority operator can be used.
Segmentation: this is by far the most important aspect of the pre-processing stage. It allows the recognizer to extract features
from each individual character [12]. In the more complicated case of handwritten text, the segmentation problem becomes much more
difficult, as letters tend to be connected to each other. Segmentation checks the connectivity of shapes, then labels and isolates
them.
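As a rough illustration of the binarization and connectivity test described above, the sketch below converts RGB pixels to grayscale, applies a fixed threshold and returns the largest connected region of 1s. It is written in plain Python instead of the Image Processing Toolbox; the luma weights, the threshold of 0.5, the use of 8-connectivity and all function names are assumptions made for the example. Depending on whether ink is dark or light in the scan, the binary image may need to be inverted first so that the pixels of interest are the 1s.

```python
from collections import deque

def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples in 0..255 to intensities in 0..1 (common luma weights)."""
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for r, g, b in row] for row in rgb]

def binarize(gray, threshold=0.5):
    """Pixels above the threshold become 1, all others 0."""
    return [[1 if p > threshold else 0 for p in row] for row in gray]

def connected_components(binary):
    """Group 8-connected pixels equal to 1; returns a list of (row, col) lists."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def largest_component(binary):
    """Return the pixels of the biggest connected region (empty list if there is none)."""
    return max(connected_components(binary), key=len, default=[])

# Tiny 3 x 4 test image (assumed data): a 3-pixel region and a 2-pixel region.
rgb = [
    [(255, 255, 255), (250, 250, 250), (0, 0, 0), (0, 0, 0)],
    [(255, 255, 255), (0, 0, 0), (0, 0, 0), (240, 240, 240)],
    [(0, 0, 0), (0, 0, 0), (0, 0, 0), (255, 255, 255)],
]
binary = binarize(to_grayscale(rgb))
box = largest_component(binary)
```

The same labelling routine can also serve the segmentation step, since each returned component is one isolated shape.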
FEATURE EXTRACTION
The sub-images have to be cropped tightly to the border of the character in order to standardize them. The standardization is done
by finding the row and the column that contain the most 1s and, starting from this peak point, increasing and decreasing the counter
until white space, that is, a line with all 0s, is met. This technique is shown in the figure below, where a character S is cropped
and resized.
Fig 2. Cropped and resized picture
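A simplified version of this cropping step is sketched below in Python. Instead of walking outwards from the peak row and column, it takes the first and last rows and columns that contain any 1, which gives the same result when the sub-image holds a single connected character; that simplification, like the function name, is an assumption of the example and not the exact routine used here.

```python
def crop_to_character(binary):
    """Trim all-zero rows and columns from the border of a binary sub-image."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return []  # blank image: nothing to crop
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in binary[top:bottom + 1]]

# Assumed toy sub-image with a one-pixel empty border.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
cropped = crop_to_character(image)
```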

Image pre-processing is then followed by a further resize to meet the network input requirement, a 5 by 7 matrix, in which a value
of 1 is assigned to an element only where the corresponding 10 by 10 box of pixels is completely filled with 1s, as shown below:
Fig 3. Image resize again to meet the network input requirement
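The reduction to the network input size can be illustrated as follows. This Python sketch assumes that the cropped character has already been resized to 70 rows by 50 columns, so that each 10 by 10 box maps to one element of a matrix with 7 rows and 5 columns; that intermediate size and orientation are inferred from the 5 by 7 and 10 by 10 figures above and are not stated explicitly in the text.

```python
def block_reduce(binary, block=10):
    """Map each block x block box to 1 only if every pixel in the box is 1."""
    rows, cols = len(binary) // block, len(binary[0]) // block
    return [
        [
            1 if all(binary[r * block + dy][c * block + dx] == 1
                     for dy in range(block) for dx in range(block)) else 0
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Assumed 70 x 50 test image: the first box is full, the second misses one pixel.
image = [[0] * 50 for _ in range(70)]
for y in range(10):
    for x in range(20):
        image[y][x] = 1
image[3][15] = 0  # a single hole keeps the second box at 0

grid = block_reduce(image)
```

Requiring a completely filled box is a strict rule; thin strokes that cover only part of a box are dropped, which is one reason the earlier cropping and resizing steps matter.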

Finally, the 5 by 7 matrix is concatenated into a stream so that it can be fed into the 35 input neurons of the network. The input of
the network is actually the negative image of the figure, where the input range is 0 to 1, with 0 indicating black, 1 indicating
white, and the values in between showing the intensity of the relevant pixel [15]. In this way, we are able to extract the character
and pass it to the next stage for "classification" or "training" of the neural network.
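The concatenation and inversion can be written compactly, as in the Python sketch below. It assumes that the matrix is read row by row (a column-by-column order, as MATLAB's default indexing would give, works equally well as long as training and testing use the same order) and that the negative is obtained as 1 minus the pixel value; the function name is hypothetical.

```python
def to_network_input(grid):
    """Concatenate the rows of the matrix and invert each value (negative image)."""
    return [1.0 - float(v) for row in grid for v in row]

# Assumed 7-row by 5-column character matrix: a vertical bar three cells wide.
grid = [[0, 1, 1, 1, 0] for _ in range(7)]
vector = to_network_input(grid)
```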
DESIGN AND IMPLEMENTATION

We first develop the algorithm for character extraction, using MATLAB as the tool for implementing it. We then design the
neural network, which should give the optimum results. There is no specific way of finding the correct model of neural network; it
can only be found by trial and error [11]: take different models of neural network, train each one and note the output accuracy.
There are two main phases in our work: pre-processing and character recognition. In the first phase we pre-process the given scanned
document to separate the characters from it and normalize each character. Initially we specify an input image file, which is opened
for reading and pre-processing. The image is usually in RGB format, so we convert it into binary format [8]. To do this, the program
converts the input image to grayscale (if it is not already an intensity image) and then uses a threshold to convert the grayscale
image to binary, i.e. all pixels above a certain threshold are set to 1 and those below it to 0. We then needed a method to extract a
given character from the document.
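To make the notion of a network "model" concrete, the sketch below shows the forward pass of a small multilayer feedforward network in Python. Only the 35 inputs come from the text; the hidden layer size of 10, the 26 outputs (one per letter), the sigmoid activation and the random, untrained weights are all assumptions chosen for illustration, and the network actually built with the Neural Network Toolbox may differ in every one of these respects.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """One row per neuron: n_in weights followed by a bias, all small random values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(layers, inputs):
    """Propagate an input vector through every layer in turn."""
    activations = list(inputs)
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron[:-1], activations)) + neuron[-1])
            for neuron in layer
        ]
    return activations

rng = random.Random(0)  # fixed seed so the sketch is repeatable
HIDDEN = 10             # hypothetical size; the text finds this by trial and error
layers = [make_layer(35, HIDDEN, rng), make_layer(HIDDEN, 26, rng)]

outputs = forward(layers, [0.0] * 35)
# The index of the strongest output is read as a letter (meaningless until trained).
letter = chr(ord("A") + outputs.index(max(outputs)))
```

Trying a different model in the trial and error sense then amounts to changing HIDDEN or adding another layer to the list, retraining, and comparing the recognition accuracy.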
The character recognition application can be used in two different ways. The first way is to type every command in the
MATLAB console, with the workspace at hand. The second way is to use the prepared Graphical User Interface (GUI) [10]. The GUI
consists of two files: the first file includes all the necessary programming code, and the second file includes the visible
interface shapes and forms. The interface follows the workflow of the recognition process. First we load the image, then we select
the character, and after that we click crop, pre-process, feature extraction and finally recognize [7]. At every stage, the GUI shows
a new image, which is unique to that step. The images can be viewed in the Main, RGB, Binary, Crop to Edges and Features windows.
CONCLUSION
This paper carries out a study of handwritten character recognition using an artificial neural network. Artificial neural networks
are commonly used to perform character recognition due to their high noise tolerance, and such systems have the ability to yield
excellent results. The feature extraction step of optical character recognition is the most important: a poorly chosen set of
features will yield poor classification rates with any neural network. At the current stage of development, the software performs
well in terms of either speed or accuracy, but not better than existing OCR methods. It is unlikely to replace them, especially for
English text. A simplistic approach for recognition of optical characters using artificial neural networks has been described.
REFERENCES:
[1] S. Mori, C. Y. Suen and K. Yamamoto, Historical review of OCR research and development, Proc. of IEEE, vol. 80, (1992)
July, pp. 1029-1058.
[2] N. Arica and F. Yarman-Vural, An Overview of Character Recognition Focused on Off-line Handwriting, IEEE Transactions on
Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 31, no. 2, (2001), pp. 216 - 233.
[3] A. Rajavelu, M. T. Musavi and M. V. Shirvaikar, A Neural Network Approach to Character Recognition, Neural Networks, vol.
2, (1989), pp. 387-393.
[4] R. Plamondon and S. N. Srihari, On-line and offline handwritten character recognition: A comprehensive survey, IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, (2000), pp. 63-84.
[5] U. Bhattacharya and B. B. Chaudhuri, Handwritten numeral databases of Indian scripts and multistage recognition of mixed
numerals, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 3, (2009), pp. 444-457.
[6] M. Hanmandlu, K. R. M. Mohan and H. Kumar, Neural-based Handwritten character recognition, in Proceedings of Fifth IEEE
International Conference on Document Analysis and Recognition, ICDAR 99, Bangalore, India, (1999), pp. 241-244.
[7] T. V. Ashwin and P. S. Sastry, A font and size-independent OCR system for printed Kannada documents using support vector
machines, in Sadhana, vol. 27, Part 1, (2002) February, pp. 35-58.
[8] S. V. Rajashekararadhya and P. Vanajaranjan, Efficient zone based feature extraction algorithm for handwritten numeral
recognition of four popular south-Indian scripts, Journal of Theoretical and Applied Information Technology, JATIT, vol. 4, no. 12,
(2008), pp. 1171-1181.
[9] R. G. Casey and E. Lecolinet, A Survey of Methods and Strategies in Character Segmentation, IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 18, no. 7, (1996) July, pp. 690-706.
[10] R. C. Gonzalez, R. E. Woods and S. L. Eddins, Digital Image Processing using MATLAB, Pearson Education, Dorling
Kindersley, South Asia, (2004).
[11] T. V. Ashwin and P. S. Sastry, A font and size-independent OCR system for printed Kannada documents using support
vector machines, in Sadhana, vol. 27, Part 1, (2002) February, pp. 35-58.
[12] M. Blumenstein and B. Verma, Neural-based solutions for the segmentation and recognition of difficult handwritten words
from a benchmark database, in Proceedings of the Fifth International Conference on Document Analysis and Recognition, ICDAR
99, (1999) September, pp. 281-284.
[13] Y. Tay, M. Khalid, R. Yusof and C. Viard-Gaudin, Offline cursive handwriting recognition system based on hybrid markov
model and neural networks, in Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and
Automation 2003, vol. 3, (2003) July, pp. 1190-1195.
[14] G. Kim, V. Govindaraju and S. Srihari, A segmentation and recognition strategy for handwritten phrases, in Proceedings of the
13th International Conference on Pattern Recognition 1996, vol. 4, (1996) August, pp. 510-514.
[15] Y. Y. Chung and M. T. Wong, Handwritten character recognition by fourier descriptors and neural network, in Proceedings of
IEEE Region 10 Annual Conference on Speech and Image Technologies for Computing and Telecommunications, TENCON 97, vol.
1, (1997) December, pp. 391-394.