Optical Character Recognition Using Artificial Neural Network
ISSN 2091-2730
Keywords - Character Recognition, Training, Feature Extraction, Image Processing, ANN, OCR, Classification.
INTRODUCTION
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text [1]. It is a field of research in pattern recognition, artificial intelligence and machine vision. Although academic research in the field continues, the focus of character recognition has shifted to the implementation of proven techniques. For many document-input tasks, character recognition is the most cost-effective and fastest method available, and each year the technology frees acres of storage space once given over to file cabinets and boxes full of paper documents [10].
The goal of Optical Character Recognition (OCR) is to classify optical patterns (often contained in a digital image) corresponding to alphanumeric or other characters. The OCR process involves several steps, including segmentation, feature extraction and classification [12]. Neural network technology can be used to analyze the stroke edge, that is, the line of discontinuity between the text characters and the background [3]. Allowing for irregularities of printed ink on paper, each recognition algorithm averages the light and dark along the side of a stroke, matches it to known characters and makes a best guess as to which character it is. The OCR software then averages or polls the results from all the algorithms to obtain a single reading [2]. Neural networks can be used for this task provided a suitable dataset is available for training; the dataset is one of the most important factors when constructing a new neural network.
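The final combining step, in which several recognizers' readings are merged into one, can be sketched in two ways: polling (a majority vote over the labels each recognizer returns) and averaging (taking the mean of per-label confidence scores). This is a minimal illustration under our own assumptions. The recognizers, their labels and their scores are hypothetical, and the tie-breaking rule is a choice made here rather than something the source specifies.

```python
from collections import Counter


def poll(readings):
    """Majority vote: return the label reported by the most recognizers.

    Ties are broken in favour of the label that appears first in `readings`
    (an assumption made for this sketch).
    """
    if not readings:
        raise ValueError("need at least one reading")
    counts = Counter(readings)
    top = max(counts.values())
    for label in readings:
        if counts[label] == top:
            return label


def average(score_maps):
    """Average per-label confidence scores across recognizers; return the best label.

    A label missing from one recognizer's output contributes a score of 0 there.
    """
    if not score_maps:
        raise ValueError("need at least one score map")
    totals = {}
    for scores in score_maps:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    means = {label: total / len(score_maps) for label, total in totals.items()}
    return max(means, key=means.get)


# Three hypothetical recognizers looking at the same glyph.
print(poll(["S", "5", "S"]))
print(average([{"S": 0.6, "5": 0.4}, {"S": 0.3, "5": 0.7}, {"S": 0.8, "5": 0.2}]))
```

Both calls print `S`. Polling needs only each recognizer's top answer. Averaging can overrule a confident minority only when the recognizers also report scores.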
METHODOLOGY
To solve the handwritten character classification problem, we used the MATLAB computing environment with the Neural Network Toolbox and Image Processing Toolbox add-ons. Building a classifier involves two steps, training and testing, and each of these steps can be broken down further into sub-steps.
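The paper builds its classifier with MATLAB's Neural Network Toolbox, and its code is not given. The sketch below is a rough, toolbox-free illustration of the same two phases in NumPy. It holds out part of the data, trains a one-hidden-layer network (tanh hidden units, softmax output, full-batch gradient descent on cross-entropy), and then measures accuracy on the held-out part. The architecture, hyperparameters and toy data are all assumptions chosen for illustration and do not describe the authors' network.

```python
import numpy as np


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the samples once and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]


def train_mlp(X, y, n_classes, n_hidden=16, lr=0.5, epochs=500, seed=0):
    """Training phase: tanh hidden layer + softmax output, full-batch gradient descent.

    n_hidden, lr and epochs are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    T = np.eye(n_classes)[y]                       # one-hot targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                   # hidden activations
        Z = H @ W2 + b2
        Z -= Z.max(axis=1, keepdims=True)          # stabilise the softmax
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)          # class probabilities
        dZ = (P - T) / len(X)                      # d(mean cross-entropy)/d(logits)
        dH = (dZ @ W2.T) * (1.0 - H ** 2)          # back-propagate through tanh
        W2 -= lr * (H.T @ dZ)
        b2 -= lr * dZ.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2


def predict(params, X):
    """Testing phase: pick the class with the largest output for each sample."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)


# Hypothetical stand-in for character feature vectors: two well-separated 2-D clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (100, 2)), rng.normal(2.0, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

X_train, y_train, X_test, y_test = train_test_split(X, y)
params = train_mlp(X_train, y_train, n_classes=2)
accuracy = np.mean(predict(params, X_test) == y_test)
print(f"test accuracy: {accuracy:.2f}")
```

The test samples are never seen during training, so the printed accuracy estimates how the classifier behaves on new characters. Real character data would use one output per character class and a feature vector per sub-image.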
TRAINING
a. Pre-processing: process the data so that it is in a suitable form.
b. Feature extraction: reduce the amount of data by extracting relevant information, which usually results in a vector of scalar values. (The features also need to be normalized for distance measurements.)
c. Model estimation: from the finite set of feature vectors, estimate a model (usually statistical) for each class of the training data.
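Sub-steps b and c can be illustrated together. The sketch below z-score normalizes the features, using statistics computed on the training data only, so that no single feature dominates a Euclidean distance. It then estimates the simplest statistical model per class, the class mean, and classifies new vectors by nearest class mean. The nearest-mean model is an assumption chosen for brevity and is not the paper's neural network. The two feature columns are hypothetical.

```python
import numpy as np


def fit_normalizer(X_train):
    """Per-feature mean and standard deviation, computed on training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)   # leave constant features unscaled
    return mean, std


def normalize(X, mean, std):
    return (X - mean) / std


def estimate_class_means(X, y):
    """Model estimation: one mean feature vector per class."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, means


def classify(X, classes, means):
    """Assign each sample to the class whose mean is nearest (squared Euclidean distance)."""
    d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]


# Two hypothetical features on very different scales.
X_train = np.array([[0.0, 100.0], [1.0, 110.0], [10.0, 500.0], [11.0, 520.0]])
y_train = np.array([0, 0, 1, 1])

mean, std = fit_normalizer(X_train)
classes, means = estimate_class_means(normalize(X_train, mean, std), y_train)
X_new = np.array([[0.5, 105.0], [10.5, 510.0]])
print(classify(normalize(X_new, mean, std), classes, means))
```

This prints `[0 1]`. New samples must be normalized with the training mean and standard deviation, not their own, or the distances to the stored class models become inconsistent.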
TESTING
a. Pre-processing
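The paper does not say what its pre-processing consists of. Both phases do, however, feed the cropping step, which expects a binary image with 1s for ink and 0s for background. One common way to get there is global thresholding with Otsu's method. This is our assumption and is not stated in the source. The sketch picks the grey level that maximizes the between-class variance of the dark and light pixel groups, then maps dark pixels to 1.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the grey level t that maximises Otsu's between-class variance.

    Pixels <= t form one class (assumed dark ink), pixels > t the other
    (assumed light background). `gray` must hold integers in 0..255.
    """
    hist = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                       # pixel count at or below t
    w1 = hist.sum() - w0                       # pixel count above t
    s0 = np.cumsum(hist * levels)              # intensity sum at or below t
    mu0 = s0 / np.where(w0 == 0, 1.0, w0)
    mu1 = (s0[-1] - s0) / np.where(w1 == 0, 1.0, w1)
    between = w0 * w1 * (mu0 - mu1) ** 2       # proportional to between-class variance
    return int(np.argmax(between))


def binarize(gray):
    """Map dark ink to 1 and light background to 0."""
    gray = np.asarray(gray)
    return (gray <= otsu_threshold(gray)).astype(np.uint8)


page = np.full((5, 5), 200, dtype=np.uint8)   # hypothetical light background
page[1:4, 2] = 50                            # a dark vertical stroke
print(binarize(page))
```

A single global threshold assumes fairly even lighting. Unevenly lit scans usually call for a local (adaptive) threshold instead.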
www.ijergs.org
International Journal of Engineering Research and General Science Volume 3, Issue 1, January-February, 2015
FEATURE EXTRACTION
The sub-images have to be cropped tightly to the border of the character in order to standardize them. This standardization is done by finding the row and the column that contain the most 1s and then, starting from this peak point, decreasing and increasing a counter in each direction until white space, that is, a line of all 0s, is reached. This technique is shown in the figure below, where a character S is being cropped and resized.
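The cropping procedure described above can be sketched directly. Starting from the row and the column with the most 1s, the sketch walks outward in both directions until it meets a row or column of all 0s (or the image edge). It then resizes the crop to a fixed shape. The paper does not state its resizing method or target size, so nearest-neighbour interpolation and the 20x20 default are assumptions made here. The sketch also assumes the sub-image holds a single character.

```python
import numpy as np


def crop_to_character(img):
    """Crop a binary image (1 = ink, 0 = background) to the character's border.

    Starts from the row and the column that contain the most 1s and walks
    outward until a row/column of all 0s or the image edge is reached.
    """
    img = np.asarray(img)
    if not img.any():
        raise ValueError("image contains no foreground pixels")
    row_sums, col_sums = img.sum(axis=1), img.sum(axis=0)

    def expand(sums):
        peak = int(np.argmax(sums))
        lo = hi = peak
        while lo > 0 and sums[lo - 1] > 0:               # decrease the counter
            lo -= 1
        while hi < len(sums) - 1 and sums[hi + 1] > 0:   # increase the counter
            hi += 1
        return lo, hi

    top, bottom = expand(row_sums)
    left, right = expand(col_sums)
    return img[top:bottom + 1, left:right + 1]


def resize_nearest(img, shape=(20, 20)):
    """Nearest-neighbour resize to a fixed shape (20x20 is an assumed size)."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]


glyph = np.zeros((10, 10), dtype=np.uint8)
glyph[3:7, 2:5] = 1                      # a hypothetical 4x3 block of ink
cropped = crop_to_character(glyph)
print(cropped.shape)
print(resize_nearest(cropped).shape)
```

This prints `(4, 3)` and then `(20, 20)`. Because the walk stops at the first blank line, a character whose parts are separated by an empty row or column (such as the dot of an "i") would be cut at the gap. Taking the bounding box of all non-zero rows and columns avoids that at the cost of departing from the described procedure.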
CONCLUSION
This paper presents a study of handwritten character recognition using an artificial neural network. Artificial neural networks are commonly used to perform character recognition because of their high noise tolerance, and such systems can yield excellent results. The feature extraction step of optical character recognition is the most important one: a poorly chosen set of features will yield poor classification rates with any neural network. At the current stage of development, the software performs well in terms of speed and accuracy, but no better than existing OCR methods, and it is unlikely to replace them, especially for English text. A simple approach to recognizing optical characters using artificial neural networks has been described.
REFERENCES:
[1] S. Mori, C. Y. Suen and K. Yamamoto, Historical review of OCR research and development, Proceedings of the IEEE, vol. 80, no. 7, (1992) July, pp. 1029-1058.
[2] N. Arica and F. Yarman-Vural, An Overview of Character Recognition Focused on Off-line Handwriting, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 31, no. 2, (2001), pp. 216-233.
[3] A. Rajavelu, M. T. Musavi and M. V. Shirvaikar, A Neural Network Approach to Character Recognition, Neural Networks, vol.
2, (1989), pp. 387-393.
[4] R. Plamondon and S. N. Srihari, On-line and off-line handwriting recognition: A comprehensive survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, (2000), pp. 63-84.
[5] U. Bhattacharya and B. B. Chaudhuri, Handwritten numeral databases of Indian scripts and multistage recognition of mixed numerals, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 3, (2009), pp. 444-457.
[6] M. Hanmandlu, K. R. M. Mohan and H. Kumar, Neural-based Handwritten character recognition, in Proceedings of Fifth IEEE
International Conference on Document Analysis and Recognition, ICDAR99, Bangalore, India, (1999), pp. 241-244.
[7] T. V. Ashwin and P. S. Sastry, A font and size-independent OCR system for printed Kannada documents using support vector machines, Sadhana, vol. 27, Part 1, (2002) February, pp. 35-58.
[8] S. V. Rajashekararadhya and P. Vanajaranjan, Efficient zone based feature extraction algorithm for handwritten numeral
recognition of four popular south-Indian scripts, Journal of Theoretical and Applied Information Technology, JATIT, vol. 4, no. 12,
(2008), pp. 1171-1181.
[9] R. G. Casey and E. Lecolinet, A Survey of Methods and Strategies in Character Segmentation, IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 18, no. 7, (1996) July, pp. 690-706.
[10] R. C. Gonzalez, R. E. Woods and S. L. Eddins, Digital Image Processing Using MATLAB, Pearson Education, Dorling Kindersley, South Asia, (2004).
[11] T. V. Ashwin and P. S. Sastry, A font and size-independent OCR system for printed Kannada documents using support vector machines, Sadhana, vol. 27, Part 1, (2002) February, pp. 35-58.
[12] M. Blumenstein and B. Verma, Neural-based solutions for the segmentation and recognition of difficult handwritten words from a benchmark database, in Proceedings of the Fifth International Conference on Document Analysis and Recognition, ICDAR 99, (1999) September, pp. 281-284.
[13] Y. Tay, M. Khalid, R. Yusof and C. Viard-Gaudin, Offline cursive handwriting recognition system based on hybrid Markov model and neural networks, in Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation 2003, vol. 3, (2003) July, pp. 1190-1195.
[14] G. Kim, V. Govindaraju and S. Srihari, A segmentation and recognition strategy for handwritten phrases, in Proceedings of the 13th International Conference on Pattern Recognition 1996, vol. 4, (1996) August, pp. 510-514.
[15] Y. Y. Chung and M. T. Wong, Handwritten character recognition by Fourier descriptors and neural network, in Proceedings of IEEE Region 10 Annual Conference on Speech and Image Technologies for Computing and Telecommunications, TENCON 97, vol. 1, (1997) December, pp. 391-394.