
Performance Improvement of Dot-Matrix Character Recognition by Variation Model based Learning

Koji Endo, Wataru Ohyama, Tetsushi Wakabayashi and Fumitaka Kimura

Graduate School of Engineering, Mie University
1577 Kurimamachiya-cho, Tsu-shi, Mie 5148507, Japan
{endo, ohyama}@hi.info.mie-u.ac.jp

Abstract. This paper describes an effective learning technique for optical dot-matrix character recognition. An automatic reading system for dot-matrix characters is promising for reducing the cost and labor required for quality control of products. Although dot-matrix characters are constructed from specific dot patterns, variation in character appearance due to three-dimensional rotation of the printing surface, bleeding of ink and missing parts of characters is not negligible. This appearance variation degrades recognition accuracy. The authors propose a technique that improves the accuracy and robustness of dot-matrix character recognition against such variation using variation model based learning. Variation model based learning generates training samples containing four types of appearance variation and trains a Modified Quadratic Discriminant Function (MQDF) classifier on the generated samples. The effectiveness of the proposed learning technique is empirically evaluated on a dataset of 38 classes (2030 character samples) captured from actual products with standard digital cameras. The recognition accuracy improved from 78.37% to 98.52% with the introduction of variation model based learning.

1 Introduction
Dot-matrix characters are widely used to convey important information about a product, such as expiration or best-before dates. The dot-matrix characters must be printed directly on the products so that both consumers and producers can read information about them. An automatic reading system for dot-matrix characters is promising for reducing the cost and labor required for quality control of products.
Fig. 1 shows examples of actual camera-captured dot-matrix characters. As implied by the figure, recognition of dot-matrix characters differs from standard character recognition in several respects. Since a dot-matrix character is constructed from multiple dots, which are observed as multiple separate connected components during recognition, a preprocessing step that connects these dots is required to handle a character as one connected component. Although dot-matrix characters are constructed from specific dot patterns, variation in character appearance due to three-dimensional rotation of the printing surface, diffusion or squeezing out of ink, and missing dots in a printed pattern is not negligible.
For accurate recognition of these dot-matrix characters, several approaches have been proposed [1–5]. These methods fall mainly into two groups: preprocessing-based and training-based methods. The preprocessing-based methods employ several ad hoc preprocessing techniques, such as blob connection and slant and rotation correction, to restrict the appearance variation of captured dot-matrix characters [3–5]. However, dot-matrix characters used in actual production settings contain a large amount of variation in matrix font patterns, dot size, printing quality and degradation, so constructing a universal preprocessing technique is quite difficult. Training-based methods employ classification models that are trained to capture the possible appearance variation of dot-matrix characters; artificial neural networks have been employed as classifiers in the literature [1, 2].
There are also many undocumented dot-matrix recognition products for factory automation. Many of them simplify the recognition task by restricting the appearance variation of the characters through a controlled image-capturing environment, and it is relatively easy to recognize such controlled dot-matrix characters using a training dataset captured under the same conditions. A universal recognition technique that is applicable in uncontrolled or less controlled environments would therefore make a significant contribution to industrial applications.
In this paper, the authors propose a technique that improves the accuracy and robustness of dot-matrix character recognition against such variation using variation model based learning. Variation model based learning generates training samples containing four types of appearance variation and trains a Modified Quadratic Discriminant Function (MQDF) classifier on the generated samples. The effectiveness of the proposed learning technique is empirically evaluated on a dataset containing 38 classes (2030 character samples) captured from actual products.
The paper is organized as follows. Section 2 describes the generation models for the training dataset. Section 3 presents the classification process used in this research. Section 4 provides information about the evaluation experiments and results. Finally, Section 5 gives the conclusions of the paper.

2 Variation model based learning

A training dataset that correctly reflects the data generation model is necessary for a classifier to obtain high recognition performance. Since obtaining an a priori generation model is generally difficult, an approach in which the generation model is estimated from given or sampled training data is usually employed. Although the size of the training data does not guarantee the accuracy of the estimated model, a large training set is expected to improve the statistical accuracy of the model.

Fig. 1. Examples of actual camera-captured dot-matrix characters

Fig. 2. Variation of actual dot-matrix font face

A small training set sometimes leads to insufficient accuracy of the estimated model. Generative learning, which artificially generates a training dataset from assumptions or a priori knowledge about the data generation process, has been proposed as a promising solution for such situations [6].
The variation model based learning proposed in this paper likewise generates large-scale training data containing four possible appearance variations of the dot-matrix characters to be recognized. In this section, we describe the data generation process for these appearance variations.

2.1 Multiple dot-matrix font faces

Fig. 2 shows examples of actual dot-matrix characters from the same class. As shown in the figure, multiple dot-matrix font faces are employed even for the same character. The proposed method employs two matrix sizes, 5 by 7 and 5 by 5, which are widely used on actual products. Fig. 3 (a) and (b) show the generated 5 by 7 and 5 by 5 standard dot-matrix characters, respectively.

(a) 5 by 7 matrix characters
(b) 5 by 5 matrix characters
(c) extra dot-matrix font faces

Fig. 3. Generated multiple dot-matrix font faces

Some character classes consist of multiple font faces to improve readability for humans. For instance, '3' has multiple font faces, as shown in Fig. 2. To correctly recognize these characters, the proposed method also generates extra font faces in addition to the standard characters. Specifically, one extra font face is added for each of '1', '2', '5' and '6', and two are added for '3'. Fig. 3 (c) shows the actual extra font faces.
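As a concrete illustration, the following sketch renders a binary dot-matrix character image from a boolean dot pattern. It is a minimal example assuming NumPy; the pattern matrix for '1', the pixel pitch and the function name are illustrative choices and are not taken from the paper.

import numpy as np

# Illustrative 5x7 dot pattern for the character '1' (True = dot printed).
# The actual font-face matrices used by the authors are not given in the paper.
PATTERN_1_5x7 = np.array([
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
], dtype=bool)

def render_dot_matrix(pattern, pitch=16, diameter=9):
    """Render a binary dot-matrix character image.

    pattern  : 2-D boolean array, True where a dot is printed.
    pitch    : distance in pixels between neighbouring dot centres.
    diameter : dot diameter d in pixels (cf. Section 2.3).
    """
    rows, cols = pattern.shape
    h, w = rows * pitch, cols * pitch
    img = np.zeros((h, w), dtype=np.uint8)
    yy, xx = np.mgrid[0:h, 0:w]
    r = diameter / 2.0
    for i in range(rows):
        for j in range(cols):
            if pattern[i, j]:
                cy, cx = (i + 0.5) * pitch, (j + 0.5) * pitch
                # Draw a filled circle of diameter d around the dot centre.
                img[(yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2] = 255
    return img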

2.2 Three dimensional rotation


A major problem in camera-based character recognition is three-dimensional rotation of the captured characters. In dot-matrix character recognition, characters printed on a rounded surface easily change their appearance depending on the spatial relationship between the camera and the product. To handle this appearance variation, the proposed method generates rotated dot-matrix characters and adds them to the training dataset. Fig. 4 illustrates the generation process for three-dimensionally rotated characters.
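The rotation itself can be sketched as a rigid transform of the dot-centre coordinates followed by projection back onto the image plane. The minimal example below assumes NumPy and an orthographic projection; the composition order of the axis rotations and the function name are illustrative choices, since the paper does not specify them.

import numpy as np

def rotate_dot_centres(centres, rx=0.0, ry=0.0, rz=0.0):
    """Rotate dot centres lying on the z = 0 plane in 3-D and project them
    back onto the image plane (orthographic projection).

    centres    : (n, 2) array of (x, y) dot-centre coordinates.
    rx, ry, rz : rotation angles in radians about the x, y and z axes.
    """
    pts = np.hstack([centres, np.zeros((len(centres), 1))])   # lift to 3-D
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    rotated = pts @ (Rz @ Ry @ Rx).T                           # apply rotation
    return rotated[:, :2]                                      # drop the depth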

2.3 Size of dots


The dots that form a character easily diffuse during printing, and diffused dots also cause large appearance variation. To handle such variation, the proposed method generates training dot-matrix characters with multiple dot diameters d. Fig. 5 shows examples of the dot-matrix character 'A' generated with different values of d.

(a) three dimensional rotation (b) x-axis rotation (c) y-axis rotation (d) z-axis rotation

Fig. 4. Generation of variation model by three dimensional rotation

(a) d=5 (b) d=7 (c) d=9 (d) d=11

Fig. 5. Appearance variation by difference of diameter of dots
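Generating the dot-size variants of Fig. 5 then amounts to re-rendering the same pattern with different diameters. A minimal usage example, reusing the illustrative render_dot_matrix() and PATTERN_1_5x7 sketched in Section 2.1 (the pitch value is likewise an assumption):

# Dot-size variants of one pattern, mirroring the diameters shown in Fig. 5.
variants = {d: render_dot_matrix(PATTERN_1_5x7, pitch=16, diameter=d)
            for d in (5, 7, 9, 11)}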

2.4 Missing dots


The dots that form a character sometimes disappear due to printing errors or the capturing environment. Even when a single dot disappears, the appearance of the dot-matrix character changes significantly. To recognize characters with missing dots, the method generates characters in which one dot has been deleted, as shown in Fig. 6. However, deleting one dot from a character can result in an appearance that is ambiguous between two different character classes; such ambiguous characters are excluded from the training dataset. Fig. 7 shows examples of excluded dot-matrix patterns.
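The missing-dot variants can be generated by deleting each printed dot in turn and discarding any variant that matches a manually prepared exclusion list. A minimal sketch assuming NumPy; the function name and the representation of the exclusion list are illustrative, and building the exclusion list itself is the manual step shown in Fig. 7.

import numpy as np

def missing_dot_variants(pattern, excluded=()):
    """Yield copies of a boolean dot pattern with exactly one dot removed.

    pattern  : 2-D boolean dot pattern (True = dot printed).
    excluded : collection of boolean patterns that are ambiguous with
               another class and must be skipped (cf. Fig. 7).
    """
    for i, j in zip(*np.nonzero(pattern)):
        variant = pattern.copy()
        variant[i, j] = False                    # delete one dot
        if any(np.array_equal(variant, ex) for ex in excluded):
            continue                             # ambiguous: exclude
        yield variant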

3 Classification
3.1 Gradient feature extraction
In this research, the gradient feature vector [7] is used for classification. The gradient feature vector is composed of directional histograms of the gradient of the input character image. In this section, we summarize gradient feature extraction, which is performed in the following steps:

Fig. 6. Examples of generated characters in which one constituent dot is deleted

Step 1: A 2 × 2 mean filtering is applied 4 times on the input image.


Step 2: The gray-scale image obtained in Step 1 is normalized so that the gray values lie in the range of -1 to +1, with a mean value of 0.
Step 3: The Roberts filter is applied to the image to obtain the gradient image. The direction of the gradient is initially quantized into 4D directions, and the strength of the gradient is accumulated for each quantized direction. The strength and the direction of the gradient are defined by (1) and (2), respectively.

f(x, y) = \sqrt{(\Delta u)^2 + (\Delta v)^2},  (1)

\theta(x, y) = \tan^{-1}\frac{\Delta v}{\Delta u},  (2)

where
\Delta u = g(x + 1, y + 1) - g(x, y),
\Delta v = g(x + 1, y) - g(x, y + 1).

Step 4: The enclosing rectangle of the input character is divided into (2m − 1) × (2m − 1) square blocks, and a histogram of (2m − 1)^2 × 4D dimensions is extracted.
Step 5: The directional histograms of the (2m − 1) × (2m − 1) blocks and 4D directions are down-sampled into m × m blocks and D directions using Gaussian filters, and a histogram of m^2 × D dimensions is obtained.

In this research, m and D are set to 6 and 8, respectively, so a 6^2 × 8 = 288-dimensional feature vector is obtained.
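The following sketch outlines this feature extraction pipeline in Python with NumPy and SciPy. It is a simplified approximation: the block partition and the Gaussian down-sampling of [7] are only imitated, and the filter parameters and direction grouping are illustrative choices rather than the authors' exact procedure.

import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def gradient_feature(img, m=6, D=8):
    """Simplified sketch of the gradient feature of Section 3.1.

    img : 2-D float array containing one cropped character image.
    Returns an m*m*D-dimensional vector (288-D for m = 6, D = 8).
    """
    # Step 1: 2x2 mean filtering applied 4 times.
    g = img.astype(float)
    for _ in range(4):
        g = uniform_filter(g, size=2)
    # Step 2: normalise grey values to the range [-1, +1] with zero mean.
    g = (g - g.min()) / max(g.max() - g.min(), 1e-9) * 2.0 - 1.0
    g = g - g.mean()
    # Step 3: Roberts filter -> gradient strength and direction.
    du = np.zeros_like(g)
    dv = np.zeros_like(g)
    du[:-1, :-1] = g[1:, 1:] - g[:-1, :-1]     # g(x+1, y+1) - g(x, y)
    dv[:-1, :-1] = g[:-1, 1:] - g[1:, :-1]     # g(x+1, y) - g(x, y+1)
    strength = np.sqrt(du ** 2 + dv ** 2)
    direction = np.arctan2(dv, du)             # in (-pi, pi]
    # Quantise the direction into 4D bins and accumulate the strength.
    nbins = 4 * D
    bins = ((direction + np.pi) / (2 * np.pi) * nbins).astype(int) % nbins
    # Step 4: (2m-1) x (2m-1) spatial blocks of directional histograms.
    nb = 2 * m - 1
    hist = np.zeros((nb, nb, nbins))
    h, w = g.shape
    ys = np.minimum((np.arange(h) * nb) // h, nb - 1)
    xs = np.minimum((np.arange(w) * nb) // w, nb - 1)
    for y in range(h):
        for x in range(w):
            hist[ys[y], xs[x], bins[y, x]] += strength[y, x]
    # Step 5: Gaussian smoothing, then down-sample to m x m blocks and
    # D directions (here by keeping every other block and grouping bins).
    hist = gaussian_filter(hist, sigma=(1.0, 1.0, 1.0))
    feat = hist[::2, ::2]                       # (m, m, 4D)
    feat = feat.reshape(m, m, D, 4).sum(axis=3) # merge groups of 4 directions
    return feat.ravel()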

Fig. 7. Excluded dot-matrix characters due to ambiguous appearance

3.2 MQDF classifier


The MQDF [8] classifier was employed. The MQDF is expressed by:
g(X) = (N + N_0 + n - 1) \ln\Bigl[ 1 + \frac{1}{N_0 \sigma^2} \Bigl[ \|X - M\|^2 - \sum_{i=1}^{k} \frac{\lambda_i}{\lambda_i + \frac{N_0}{N}\sigma^2} \bigl\{ \Phi_i^T (X - M) \bigr\}^2 \Bigr] \Bigr] + \sum_{i=1}^{k} \ln\Bigl( \lambda_i + \frac{N_0}{N}\sigma^2 \Bigr) - 2 \ln P(\omega),  (3)

where X denotes an n-dimensional gradient feature vector of the input dot-matrix character, M is the mean vector of the training samples, and λ_i and Φ_i are the i-th eigenvalue and the corresponding eigenvector of the covariance matrix of the training samples, respectively. k is a parameter that denotes the number of eigenvectors used for classification. N and P(ω) denote the number of training samples and the a priori probability of class ω. σ^2 is the variance of the spherical a priori distribution assumed for X and is determined by the mean of all eigenvalues over all classes.
The parameter k, the number of eigenvectors used for classification, is determined by a prior experiment on a verification dataset, i.e. a subset of the experimental dataset. N_0 is defined by
N_0 = \frac{\alpha}{1 - \alpha} N,  (4)
where the parameter α (0 < α < 1) is also determined by a prior experiment.
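A minimal sketch of MQDF training and classification following (3) and (4) is given below, assuming NumPy. Uniform class priors are assumed so that the -2 ln P(ω) term can be dropped from the comparison, and the hyper-parameters k and α are left to be tuned on a verification set as described above; the class and method names are illustrative.

import numpy as np

class MQDF:
    """Minimal sketch of the MQDF classifier of Eq. (3), uniform priors."""

    def __init__(self, k=20, alpha=0.5):
        self.k, self.alpha = k, alpha

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.stats_, all_eigvals = {}, []
        for c in self.classes_:
            Xc = X[y == c]
            N = len(Xc)
            M = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False)
            eigval, eigvec = np.linalg.eigh(cov)
            order = np.argsort(eigval)[::-1]          # descending eigenvalues
            self.stats_[c] = (M, eigval[order], eigvec[:, order], N)
            all_eigvals.append(eigval)
        # sigma^2: mean of all eigenvalues over all classes (spherical prior).
        self.sigma2_ = float(np.mean(np.concatenate(all_eigvals)))
        return self

    def _g(self, x, c):
        M, lam, Phi, N = self.stats_[c]
        n = len(x)
        N0 = self.alpha / (1.0 - self.alpha) * N      # Eq. (4)
        d = x - M
        lam_k, Phi_k = lam[:self.k], Phi[:, :self.k]
        proj2 = (Phi_k.T @ d) ** 2                    # {Phi_i^T (X - M)}^2
        shrunk = lam_k + (N0 / N) * self.sigma2_
        quad = d @ d - np.sum(lam_k / shrunk * proj2)
        return ((N + N0 + n - 1) * np.log(1.0 + quad / (N0 * self.sigma2_))
                + np.sum(np.log(shrunk)))             # uniform prior dropped

    def predict(self, X):
        # The class minimising g(X) is chosen.
        return np.array([min(self.classes_, key=lambda c: self._g(x, c))
                         for x in X])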

4 Experiments and results


To confirm the effectiveness of the proposed variation model based learning, we
conducted evaluation experiments.

4.1 Evaluation dataset


The evaluation dataset employed in this research consists of 2030 dot-matrix character images belonging to 38 character classes. Each character image was captured from actual industrial, medical and food products with digital cameras and was extracted and binarized manually. As shown in Fig. 1, the evaluation dataset contains appearance variations due to distortion, blur, multiple font faces and three-dimensional rotation. Since the character classes '0 (numeral)' and 'O (alphabetical)' have the same appearance, they are handled as a single class.

4.2 Experimental results


Fig. 8 shows the recognition performance for different combinations of training data generation methods. Labels (a) to (d) in the table below the bar chart denote:
(a) multiple dot-matrix font faces,
(b) three-dimensional rotation,
(c) multiple dot sizes,
(d) missing dots,
respectively. A circle in the table means that the corresponding variation model was employed for data generation.
The original recognition accuracy of 78.37% was improved by introducing the generated training data, and the highest recognition accuracy of 98.52% was obtained when all four variation models were employed for training data generation. These results imply that training data generation reflecting appearance variation is effective for performance improvement even when no actual captured character images are available in the training dataset.
Misrecognized samples are shown in Fig. 9. Since such significant degradation of characters was not included in the training dataset, the MQDF classifier could not handle these degraded characters.

Fig. 8. Improvement of recognition performance by variation model based learning

Fig. 9. Examples of failed character recognition due to significant degradation of dot-matrix patterns

5 Conclusions
In this paper, the authors proposed a technique that improves the accuracy and robustness of dot-matrix character recognition against appearance variation using variation model based learning. Variation model based learning generates training samples containing four types of appearance variation and trains an MQDF classifier on the generated samples. The effectiveness of the proposed training data generation was confirmed by experiments using actual camera-captured dot-matrix characters. The MQDF classifier trained on the generated dataset, which did not contain any actual captured data, successfully recognized dot-matrix characters under the appearance variations that occur in typical industrial settings.
Future study topics include (1) further performance improvement by optimizing the data generation conditions, (2) integration of the proposed recognition method with a dot-matrix character detection method, and (3) the introduction of failed-printing detection.

Acknowledgement. Part of this research was supported by OMRON Corporation.

References

1. Yanikoglu, B.A.: Pitch-based segmentation and recognition of dot-matrix text. International Journal on Document Analysis and Recognition 3 (2000) 34–39
2. Namane, A., Soubari, E.H., Meyrueis, P.: Degraded dot matrix character recognition using CSM-based feature extraction. In: Proceedings of the 10th ACM Symposium on Document Engineering, ACM (2010) 207–210
3. Grafmüller, M., Beyerer, J.: Segmentation of printed gray scale dot matrix characters. In: Proceedings of the 14th World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI). Volume 2. (2010) 87–91
4. Du, Y., Ai, H., Lao, S.: Dot text detection based on FAST points. In: 2011 International Conference on Document Analysis and Recognition (ICDAR), IEEE (2011) 435–439
5. Mohammad, K., Agaian, S.: Practical recognition system for text printed on clear reflected material. ISRN Machine Vision 2012 (2012)
6. Ishida, H., Yanadume, S., Takahashi, T., Ide, I., Mekada, Y., Murase, H.: Recognition of low-resolution characters by a generative learning method. Proc. CBDAR (2005) 45–51
7. Shi, M., Fujisawa, Y., Wakabayashi, T., Kimura, F.: Handwritten numeral recognition using gradient and curvature of gray scale image. Pattern Recognition 35 (2002) 2051–2059
8. Kimura, F., Wakabayashi, T., Tsuruoka, S., Miyake, Y.: Improvement of handwritten Japanese character recognition using weighted direction code histogram. Pattern Recognition 30 (1997) 1329–1337
