OCR Sanskrit CNN
Abstract—Ancient Sanskrit manuscripts are a rich source of knowledge about science, mathematics, Hindu mythology, Indian civilization, and culture. It therefore becomes critical that access to these manuscripts is made easy, to share this knowledge with the world and to facilitate further research on this ancient literature. In this paper, we propose a Convolutional Neural Network (CNN) based Optical Character Recognition (OCR) system which accurately digitizes ancient Sanskrit manuscripts (Devanagari script) that are not necessarily in good condition. We use an image segmentation algorithm based on pixel intensities to identify letters in the image. The OCR treats typical compound characters (half letter combinations) as separate classes in order to improve the segmentation accuracy. The novelty of the OCR is its robustness to image quality, image contrast, font style and font size, which makes it an ideal choice for digitizing soiled and poorly maintained Sanskrit manuscripts.

Index Terms—Devanagari script, Sanskrit, Hindi, deep learning, OCR, digitization, optical character recognition, CNN

I. INTRODUCTION

Sanskrit is gaining importance in various academic communities due to the presence of ancient scientific and mathematical research work written in this language. Scientists all over the world are spending an increasing amount of time trying to understand these ancient research manuscripts. However, the lack of accurately digitized and tagged versions of Sanskrit manuscripts is a major bottleneck. In addition, poor maintenance and text quality add to the problem. Hence, it becomes essential to digitize such ancient manuscripts, which are not only important for research but are also an important part of the culture and heritage of India. In order to facilitate digitization of ancient Sanskrit material, we build an Indic Optical Character Recognition (OCR) system, specifically for Sanskrit.

In recent years, several OCRs have been developed for various Indian languages such as Hindi, Bangla, Telugu etc. [10,11,12,13]. However, very little work has been done to develop good OCRs for Sanskrit. Even though both Hindi and Sanskrit are written in the Devanagari script, it is important to use a Sanskrit OCR instead of a Hindi OCR to digitize Sanskrit text due to the significant difference in complexity between the two languages. Sanskrit text consists of several compound characters which are formed by different combinations of half letter and full letter consonants. Some examples of compound characters are shown in Fig 3 and Fig 4. Since such compound characters are either less frequent or completely absent in Hindi text, Hindi OCRs would not be trained to segment and classify such characters correctly. Consequently, Hindi OCRs display poor results on Sanskrit text.

Most of the recent Indic OCR systems make use of machine learning algorithms such as support vector machines (SVMs) [12] and artificial neural networks (ANNs) [11,16] to classify letters in the image. The classifier models used in these OCRs are trained with input images that are often downsampled by applying PCA [15,16], Gabor filters [15,27], geometric feature graphs [27] etc., in order to reduce the complexity of the data. However, this results in a loss of important information necessary to make the classifier robust. For example, the SVM classifier [12] displays different classification accuracy for different font styles, showing that it does not generalize across font styles. In addition, existing Indic OCRs display poor results on degraded or poorly maintained documents, and their digitizing capability is limited to good quality text documents [27].

In order to develop a robust OCR system which can digitize soiled and noisy documents with high accuracy, we propose the use of convolutional neural networks (convnets) as opposed to the popular use of SVMs and ANNs, as convnets possess very high learning capacity and the capability to handle high dimensional data such as images [2]. Convnets have displayed these characteristics consistently in various large-scale image classification and video recognition tasks. Popular convnet architectures such as GoogLeNet [5], ResNet [24] and VGG Net [6] have achieved state of the art results in image classification challenges like the ILSVRC (ImageNet) challenge. In addition, researchers make use of convnets for various other tasks such as human pose estimation, dense semantic segmentation [26] etc.

The main contributions of the paper are 1) developing an OCR framework for Sanskrit which can digitize soiled and poorly maintained documents, 2) the use of CNNs as classifiers for Sanskrit OCRs, and 3) a Sanskrit letter dataset consisting of 11,230 images belonging to 602 classes.¹

The rest of the paper is organized as follows. We first review the related work in Section 2 and describe the features of the Devanagari script in Section 3. In Section 4, we discuss the approach used to segment letters in the image and the procedure

¹ Link for dataset: https://github.com/avadesh02/Sanskrit-letter-dataset
Figure 3: Examples of compound characters that are considered as unique classes.
Table I: Convnet architectures trained on the data. A convolution layer is depicted as Conv<filter size>-<number of filters>; dropout is represented as dp <probability>. Each network takes the input letter image and ends with a softmax layer over the letter classes.

A (6 weight layers): Conv3-32, Conv3-32, Maxpool, Conv3-64, Conv3-64, Maxpool, Fc-2048, Fc-1024 (dp 0.2), Softmax
B (8 weight layers): Conv3-32, Conv3-32, Conv3-32, Maxpool, Conv3-64, Conv3-64, Conv3-64, Maxpool, Fc-2048, Fc-1024 (dp 0.2), Softmax
C (8 weight layers): Conv3-64, Conv3-64, Conv3-64, Maxpool, Conv3-64, Conv3-64, Conv3-64, Maxpool, Fc-4096, Fc-2048 (dp 0.2), Softmax
D (8 weight layers): Conv3-64, Conv3-64, Conv3-64, Maxpool, Conv3-64, Conv3-64, Conv3-64, Maxpool, Fc-4096 (dp 0.2), Fc-2048 (dp 0.2), Softmax

Figure 5: Pictorial representation of the proposed convnet architecture for the Sanskrit OCR.
Each convolution layer is followed by a ReLU function [2,4]. The ReLU functions do not saturate while training the convnet and thus help avoid the vanishing gradient problem [2,24]. A max-pooling operation is used after every 2 or 3 convolution operations, depending on the architecture design, to reduce the computational intensity of the architecture. A kernel/filter size of 2x2 with a stride of 2 pixels is used in each max-pooling operation. The final convolution layer is followed by 2 fully connected layers. Subsequently, the convnet terminates with a softmax layer. The number of channels in each fully connected layer and the number of filters in each convolution operation varies for different convnets.
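For concreteness, the proposed architecture (convnet D in Table I, Fig 5) can be written down as a short Keras model. This is a minimal sketch, not the released implementation: the 32x32 grayscale input size and the "same" padding are illustrative assumptions (they are not fixed by the text), while the layer sequence, the 2x2/stride-2 max pooling, the dropout of 0.2 and the 602-class softmax follow Table I and the description above.

```python
# Minimal Keras sketch of convnet D from Table I.
# Assumptions (not fixed by the text): 32x32 grayscale letter crops
# and "same" padding; 602 classes matches the Sanskrit letter dataset.
from tensorflow.keras import layers, models

def build_convnet_d(input_shape=(32, 32, 1), num_classes=602):
    model = models.Sequential([
        # Block 1: three 3x3 convolutions with 64 filters, ReLU after each
        layers.Conv2D(64, 3, padding="same", activation="relu",
                      input_shape=input_shape),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        # Block 2: three 3x3 convolutions with 64 filters
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        # Two fully connected layers, each followed by dropout (dp 0.2)
        layers.Flatten(),
        layers.Dense(4096, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(2048, activation="relu"),
        layers.Dropout(0.2),
        # Softmax over the Sanskrit letter classes
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```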
C. Training

The convnets are trained by optimizing a cross-entropy loss function using mini-batch gradient descent and backpropagation [3]. The batch size is set to 32, with a Nesterov momentum of 0.9. The learning rate is set to a constant value of 0.001, i.e. this value is not altered during training. A constant dropout of 0.2 [1] is used to prevent over-fitting. The weights are randomly initialized with a standard deviation of 0.1. Each convnet is trained for a variable number of epochs (between 120 and 140). While training, the validation and train accuracies were closely observed, and training is terminated when the validation accuracy starts to drop while the training accuracy continues to improve. In other words, each convnet is trained until just before it starts to overfit on the data.
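This setup maps directly onto the Keras/TensorFlow stack used for the implementation (Section V-E). The sketch below is an approximation under stated assumptions: build_convnet_d is the helper from the previous listing, x_train/y_train/x_val/y_val are hypothetical in-memory arrays with one-hot labels, the patience value stands in for the manual "stop when validation accuracy drops" rule, and the std-0.1 random weight initialization would be supplied to each layer as kernel_initializer=RandomNormal(stddev=0.1).

```python
# Sketch of the training procedure: SGD with Nesterov momentum 0.9,
# constant learning rate 0.001, batch size 32, cross-entropy loss,
# and early stopping approximating the manual stopping rule.
from tensorflow.keras import callbacks, optimizers

def train_convnet(x_train, y_train, x_val, y_val):
    # build_convnet_d is the helper sketched in the previous listing
    model = build_convnet_d()
    sgd = optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True)
    model.compile(optimizer=sgd,
                  loss="categorical_crossentropy",  # assumes one-hot labels
                  metrics=["accuracy"])
    # patience=5 is an assumed stand-in for "stop once validation
    # accuracy starts to drop while training accuracy keeps improving"
    early_stop = callbacks.EarlyStopping(monitor="val_accuracy",
                                         patience=5,
                                         restore_best_weights=True)
    history = model.fit(x_train, y_train,
                        batch_size=32,
                        epochs=140,  # upper bound of the reported 120-140
                        validation_data=(x_val, y_val),
                        callbacks=[early_stop])
    return model, history
```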
D. Rationale for the proposed architecture

The various modifications made to the baseline architecture follow the philosophy of Simonyan and Zisserman [6], i.e. the depth of the convnet is increased while the filter size is kept the same. In addition, the number of channels in the fully connected layers is also altered. All convnet architectures trained on the data are shown in Table I.

Initially, convnet A (Table I) is trained on the data using the procedure described in Section V-C. The training error of convnet A reached a constant value, showing that a more complex model is required to attain better results. Subsequently, convnet B is designed by adding an additional conv3-32 and conv3-64 layer (Table I). The training error for convnet B did not stagnate; rather, convnet B started to overfit on the data. This showed that slight changes in the architecture would be sufficient to improve the results. As a result, convnet C is designed by doubling the number of neurons in each fully connected layer of convnet B. However, convnet C showed only a slight improvement in results before it too started to overfit on the data. In order to prevent overfitting, a dropout layer is added after the first fully connected layer. The resulting architecture, convnet D, achieves the best results on the dataset. Hence, convnet D is the proposed classifier for the OCR (Fig 5). The model accuracies of the convnets on the train data are shown in Table II.

In all the convnets, a series of convolution operations with 3x3 filters is used to generate an effective receptive field of 5x5 or 7x7, instead of directly using 5x5 or 7x7 filters. Two consecutive 3x3 filters produce the effective receptive field of a 5x5 filter while reducing the number of parameters: for C input and output channels, two stacked 3x3 layers use 2 x 9C^2 = 18C^2 weights against 25C^2 for a single 5x5 layer. Similarly, three consecutive 3x3 filters produce the effective receptive field of a 7x7 filter with 27C^2 weights against 49C^2 [6]. In addition, increasing the number of convolution operations also increases the non-linearity, which increases the learning capacity of the network.

We avoided top-performing convnets such as GoogLeNet [5], Microsoft ResNet [24] and VGG Net [6] because they are designed for more complex datasets like ImageNet (ILSVRC), which contain larger images and many more classes. Such deep and complex convnet architectures would overfit on a relatively simple dataset such as ours. In addition, these deep and complex convnets demand computing power that is not available in the ordinary computers for which the proposed OCR has been designed, to enable prevalent use. Finally, these complex nets display slow run times while classifying images, which makes them unsuitable for an OCR where hundreds of letters must be classified on each page.
Table II: Train and validation errors of the convnet architectures.

Convnet architecture   Train Error (%)   Validation Error (%)
A                      10                11.97
B                      7.9               12.46
C                      6.5               11.94
D                      4.93              4

E. Implementation Details

The entire software is implemented in Python. Image segmentation and letter localization are carried out with the help of open source libraries such as OpenCV and PIL. The convnets are implemented in Keras [7] using TensorFlow [14] as the backend.
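As an illustration of the pixel-intensity approach mentioned in the abstract, the following OpenCV sketch locates text lines and letter candidates from projection profiles. It is a minimal sketch of the general idea, not the paper's algorithm: the Otsu binarization and the min_gap heuristic are assumptions, and a real Devanagari segmenter additionally has to handle the header line (shirorekha) and compound characters, which this sketch ignores.

```python
# Illustrative pixel-intensity (projection profile) segmentation:
# dark rows mark text lines, dark columns within a line mark letter
# candidates. Thresholds are illustrative, not the paper's tuned values.
import cv2
import numpy as np

def segment_lines_and_letters(image_path, min_gap=2):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Binarize with Otsu so ink pixels become 1 and background 0
    _, binary = cv2.threshold(gray, 0, 1,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    def runs(profile):
        # Return (start, end) pairs where the intensity profile is non-zero
        nonzero = profile > 0
        boundaries = np.flatnonzero(np.diff(nonzero.astype(int)))
        edges = np.concatenate(([0], boundaries + 1, [len(profile)]))
        return [(s, e) for s, e in zip(edges[:-1], edges[1:])
                if nonzero[s] and e - s >= min_gap]

    letters = []
    # Horizontal projection: ink pixels per row -> text lines
    for top, bottom in runs(binary.sum(axis=1)):
        line = binary[top:bottom]
        # Vertical projection within the line -> letter candidates
        for left, right in runs(line.sum(axis=0)):
            letters.append((top, bottom, left, right))
    return letters
```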
Sanskrit manuscripts and documents.

VII. CONCLUSION AND FUTURE SCOPE

We present an OCR for Sanskrit (Devanagari script). We introduce a novel approach of using convnets as classifiers for Indic OCRs. We show that convnets are more suitable than SVMs and ANNs for multi-class image classification problems. In addition, we show that our OCR is ideal for digitizing old and poorly maintained material, as it is robust to font size and style, image quality and contrast.

To improve the OCR system further, learning can be introduced for letter segmentation and identification. This could be achieved with the help of a selective search algorithm followed by an R-CNN [23].

VIII. REFERENCES

[1] Hinton, Geoffrey E., Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
[2] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." In Advances in Neural Information Processing Systems, pp. 1097-1105. 2012.
[3] LeCun, Yann, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. "Backpropagation applied to handwritten zip code recognition." Neural Computation 1, no. 4 (1989): 541-551.
[4] Nair, Vinod, and Geoffrey E. Hinton. "Rectified linear units improve restricted Boltzmann machines." In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807-814. 2010.
[5] Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9. 2015.
[6] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
[7] Keras GitHub repository: https://github.com/fchollet/keras.
[8] Smith, Ray (Google Inc.). "An overview of the Tesseract OCR engine." ICDAR 2007. IEEE, 2007.
[9] Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, pp. 886-893. IEEE, 2005.
[10] Sankaran, Naveen, and C. V. Jawahar. "Recognition of printed Devanagari text using BLSTM neural network." In Pattern Recognition (ICPR), 2012 21st International Conference on, pp. 322-325. IEEE, 2012.
[11] Rahiman, M. Abdul, and M. S. Rajasree. "A detailed study and analysis of OCR research in south Indian scripts." In Advances in Recent Technologies in Communication and Computing, 2009. ARTCom'09. International Conference on, pp. 31-38. IEEE, 2009.
[12] Jawahar, C. V., M. N. S. S. K. Pavan Kumar, and S. S. Ravi Kiran. "A bilingual OCR for Hindi-Telugu documents and its applications." ICDAR 2003. IEEE, 2003.
[13] Chaudhuri, B. B., and U. Pal. "An OCR system to read two Indian language scripts: Bangla and Devanagari (Hindi)." IEEE, 1997.
[14] Abadi, Martín, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado et al. "TensorFlow: Large-scale machine learning on heterogeneous distributed systems." arXiv preprint arXiv:1603.04467 (2016).
[15] Pal, U., and B. B. Chaudhuri. "Indian script character recognition: a survey." Pattern Recognition 37, no. 9 (2004): 1887-1899.
[16] Dineshkumar, R., and J. Suganthi. "A research survey on Sanskrit offline handwritten character recognition." KTVR Knowledge Park for Engineering and Technology, Hindusthan College of Engineering and Technology, Tamilnadu (2013).
[17] Dineshkumar, R., and J. Suganthi. "Sanskrit character recognition system using neural network." Indian Journal of Science and Technology 8, no. 1 (2015): 65.
[18] "Google OCR," https://support.google.com/drive/answer/176692?hl=en.
[19] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European Conference on Computer Vision, 2014.
[20] ITRANS converter and code: http://www.aczoom.com/itrans
[21] Sanskrit Chandamama source: https://archive.org/details/Chandamama
[22] Yadav, Divakar, Sonia Sánchez-Cuadrado, and Jorge Morato. "Optical character recognition for Hindi language using a neural-network approach." JIPS 9, no. 1 (2013): 117-140.
[23] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
[24] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778. 2016.
[25] Bansal, Veena, and R. M. K. Sinha. "Integrating knowledge sources in Devanagari text recognition system." IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 30, no. 4 (2000): 500-505.
[26] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440. 2015.
[27] Jayadevan, R., Satish R. Kolhe, Pradeep M. Patil, and Umapada Pal. "Offline recognition of Devanagari script: A survey." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 41, no. 6 (2011): 782-796.