Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
ARTICLE INFO
Article history:
Received 1 March 2007
Received in revised form 15 January 2008
Accepted 15 March 2008
Keywords:
Markovian models
Hidden Markov models
Dynamic Bayesian networks
Historical documents
Broken character recognition
ABSTRACT
In this paper, we investigate the application of dynamic Bayesian networks (DBNs) to the recognition of
degraded characters. DBNs are an extension of one-dimensional hidden Markov models (HMMs) which
can handle several observation and state sequences. In our study, characters are represented by the
coupling of two HMM architectures into a single DBN model. The interacting HMMs are a vertical HMM
and a horizontal HMM whose observable outputs are the image columns and image rows, respectively.
Various couplings are proposed where interactions are achieved through the causal influence between
state variables. We compare non-coupled and coupled models on two tasks: the recognition of artificially
degraded handwritten digits and the recognition of real degraded old printed characters. Our experiments
show that coupled architectures perform more accurately on degraded characters than basic HMMs, the
linear combination of independent HMM scores, and discriminative methods such as support vector
machines (SVMs).
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
Since the seminal work of Rabiner [1], stochastic approaches
such as hidden Markov models (HMMs) have been widely applied to
speech recognition, handwriting [2,3] and degraded text recognition
[4,5]. This is largely due to their ability to cope with incomplete information and non-linear distortions. These models can handle variable-length observation sequences and offer joint segmentation and
recognition which are useful to avoid segmenting cursive words into
characters [6]. However, HMMs may also be used as classifiers for
single characters [7,8] or characters segmented from words by an
"explicit'' segmentation method [9]: the scores output for each character and each class are combined at the word level. Another property of HMMs is that they belong to the class of generative models.
Generative models cope better with degradation since they rely on
the scores output for each character and each class, while discriminative
models, like neural networks and support vector machines (SVMs),
are powerful at discriminating classes through decision boundaries. In the case of
degradation, characters are expected to still be correctly classified
by generative models, even if lower scores are given.
Noisy and degraded text recognition is still a challenging task
for a classifier [10]. In the field of historical document analysis, old
printed documents have a high occurrence of degraded characters,
especially broken characters due to ink fading. When dealing with
broken characters, several options are generally considered: restoring and enhancing characters [11--13] or recovering characters
through sub-graphs within a global word graph optimization scheme
[14]. Another solution is to combine classifiers or to combine data.
Several methods can be used for combining classifiers [15], one of
them consists of multiplying or summing the output scores of each
classifier. In the works of [16,17], two HMMs are combined to recognize words. A first HMM, modeling pixel columns, proposes word
hypotheses and the corresponding word segmentation into characters. The hypothesized characters or sub-segments are then given to
a second HMM modeling pixel rows. This second HMM normalizes
and classifies single characters. The results of both HMMs are combined by a weighted voting approach or by multiplying scores. Our
approach differs from restoration methods as it aims at enhancing
the classification of characters without restoration. This is motivated by the fact that preprocessing may introduce distortions to
character images. In our previous work [18], we compared data and
decision fusion and showed that data fusion yields better accuracy
than decision fusion for HMM-based printed character recognition.
The present dynamic Bayesian network (DBN) approach is a data
fusion scheme which couples two data streams, image columns and
image rows into a single DBN classifier. It differs from the approach
presented in [16,17] where two classifiers are coupled (one classifier per stream) in a decision fusion scheme, and from a data fusion
scheme consisting of a multi-stream HMM which would require
large and full covariance matrices in order to take into account
dependencies between the streams [18].
Our study consists of building DBN models which include in a
single classifier two sequences of observations: the pixel rows and
the pixel columns.

2. Dynamic Bayesian networks

A Bayesian network encodes the joint distribution of a set of random variables X_1, ..., X_N as a product of conditional distributions, one for each variable X_i given its parents Pa(X_i) in the directed graph:

P(X_1, ..., X_N) = ∏_{i=1}^{N} P(X_i | Pa(X_i))
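To make this factorization concrete, here is a minimal Python sketch, assuming discrete variables with tabular conditional distributions; the network, variable names and probability values are illustrative and not taken from the paper.

```python
import numpy as np

# Minimal sketch of the Bayesian-network factorization
# P(X_1, ..., X_N) = prod_i P(X_i | Pa(X_i)).
# Each variable maps to (parent list, CPT indexed by parent values, then own value).
network = {
    "X1": ([], np.array([0.6, 0.4])),                    # P(X1)
    "X2": (["X1"], np.array([[0.7, 0.3], [0.2, 0.8]])),  # P(X2 | X1)
    "X3": (["X2"], np.array([[0.9, 0.1], [0.4, 0.6]])),  # P(X3 | X2)
}

def joint_probability(assignment):
    """P(assignment) as the product of one CPT entry per variable."""
    p = 1.0
    for var, (parents, cpt) in network.items():
        idx = tuple(assignment[pa] for pa in parents) + (assignment[var],)
        p *= cpt[idx]
    return p

print(joint_probability({"X1": 0, "X2": 1, "X3": 1}))  # 0.6 * 0.3 * 0.6 = 0.108
```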
DBNs provide general-purpose training and decoding algorithms
based on the expectation-maximization (EM) algorithm and on inference mechanisms [22]. Model training consists of estimating model
parameters, CPTs and CPDs. Inference algorithms are performed on
the network to compute the best state sequences or the likelihoods
of observation sequences.
An HMM is a particular case of DBN with only one observation stream and one state sequence. The dynamic character of
DBNs makes them suitable for applications such as speech and character recognition. In [23,24], DBNs are used to model the interactions
between speech observations at different frequency bands in a way
that is robust with respect to noise.
3. Independent and coupled architectures
In this study, we couple data streams into single DBN classifiers. This coupling is performed through various DBN architectures
(graphical representations) which combine two basic HMMs: the
vertical HMM, whose outputs are the columns of pixels, and the
horizontal HMM, whose outputs are the image rows. In our models, the interactions are usually (but not only) performed through
states, leading to efficient models in terms of model complexity (see
Section 3.3). Brand et al. [25] have proposed coupled architectures
called "coupled HMMs" for modeling human interactions: in their models,
a state of one HMM is linked to all other HMM states of the adjacent
time-slice. This yields symmetric architectures, while our coupled
architectures are highly non-symmetric.
In our framework, all character classes share the same DBN architecture. Admissible architectures do not include continuous variables with discrete children (for exact inference purposes [23]) and
also have a small number of parameters (in order to get a tractable
inference algorithm). One approach consists of learning the network architecture from data [26]. This approach is tractable for static BNs
when dealing with a few observed variables but rapidly becomes too
complex in the presence of hidden state variables. Automatic architecture learning is beyond the scope of this paper; our strategy
consists of heuristically looking for various admissible architectures
and selecting those which provide the best recognition performance.
X^i and Y^i denote the hidden state and observation attributes in stream i; X_t^i and Y_t^i are the random variables (nodes) for X^i and Y^i at time t.

We assume that the process modelled by DBNs is first-order Markovian and stationary. In practice, this means that the parents of any variable X_t^i or Y_t^i belong to time-slice t or t−1 only, and that model parameters are independent of t. Parameters are thus tied, and a DBN can be represented by its first two time slices, as in Fig. 1. For each observation sequence, the network is repeated as many times as necessary. Fig. 1 shows an example of an unrolled DBN for an observation sequence of length T = 3: the initial network is repeated T times. Parameters for this model are given by CPTs and CPDs: the three CPTs are the initial state distribution encoding P(X_1^1), the conditional state distribution P(X_t^2 | X_t^1) and the state transition distribution P(X_t^2 | X_{t-1}^2); the two CPDs are the Gaussian pdfs P(Y_t^i | X_t^i), i = 1, 2.

Fig. 1. Because of parameter tying, a DBN can be represented by only two time slices (left). To fit the two observation sequences {Y^1} and {Y^2} of length T = 3, the DBN is unrolled and represented on 3 time slices (right).
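To illustrate parameter tying, the following sketch unrolls a tied transition CPT over T slices and propagates the state marginals; the number of states and the probability values are illustrative assumptions, not the paper's.

```python
import numpy as np

# Parameter tying: a DBN defined on two time slices is unrolled to length T
# by reusing the same CPTs at every slice t >= 2.
Q = 3
pi = np.full(Q, 1.0 / Q)            # initial state distribution P(X_1)
A = np.array([[0.8, 0.2, 0.0],      # tied left-right transition CPT P(X_t | X_{t-1}):
              [0.0, 0.8, 0.2],      # each state stays or moves one state up
              [0.0, 0.0, 1.0]])

def unroll_marginals(T):
    """State marginals P(X_t) for an unrolled chain of length T."""
    marginals = [pi]
    for _ in range(2, T + 1):
        marginals.append(marginals[-1] @ A)  # same tied A at every slice
    return marginals

for t, m in enumerate(unroll_marginals(3), start=1):
    print(f"P(X_{t}) = {np.round(m, 3)}")
```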
Fig. 2. Independent HMMs represented as DBNs: (a) vertical-HMM and (b) horizontal-HMM.
Fig. 3. Horizontal and vertical observation sequences obtained by scanning digit 3 from top to bottom and from left to right, respectively. Digit images are normalized to size d × d. The length of the observation sequences is T = d; the length of the observation vectors is also d.
For each stream i, the observation CPDs B are single Gaussian pdfs:

B_k^i(y_t^i) = P(Y_t^i = y_t^i | X_t^i = k) = N(y_t^i; μ_k^i, Σ_k^i), k ∈ [1, Q],   (1)

and the initial state distributions are π^i(k) = P(X_1^i = k), k ∈ [1, Q].
Because of the stationarity assumption, all nodes X_t^i share the same CPT A and all nodes Y_t^i share the same matrix B.
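A minimal sketch of the emission pdf of Eq. (1), assuming SciPy's multivariate normal; the vector length d and the parameter values are arbitrary illustrations.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Observation CPD of Eq. (1): state k of stream i emits a pixel column (or row)
# through a single Gaussian pdf, B_k(y_t) = N(y_t; mu_k, Sigma_k).
d = 5                                # length of an observation vector (illustrative)
rng = np.random.default_rng(0)
mu_k = rng.random(d)                 # mean vector of state k
Sigma_k = 0.1 * np.eye(d)            # full covariances are allowed; identity here

def emission_prob(y, mu, Sigma):
    return multivariate_normal(mean=mu, cov=Sigma).pdf(y)

y_t = rng.random(d)                  # one observed pixel column
print(emission_prob(y_t, mu_k, Sigma_k))
```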
In auto-regressive (AR) models, each observation also depends on the previous observation: the mean of the Gaussian probability density function (pdf) N(y_t^i; μ_k^i, Σ_k^i), where μ_k^i is the mean vector for state k, becomes μ_k^i + W_k^i y_{t-1}^i, where W_k^i is a regression matrix. The CPDs B are written for each stream i and for t ≥ 2 as

B_k^i(y_t^i, y_{t-1}^i) = P(Y_t^i = y_t^i | X_t^i = k, Y_{t-1}^i = y_{t-1}^i) = N(y_t^i; μ_k^i + W_k^i y_{t-1}^i, Σ_k^i), k ∈ [1, Q].   (2)
The matrix A and the initial distribution remain the same as for basic HMMs. The matrix A is still constrained to be left-right. Note that basic HMMs are a particular case of AR-HMMs with W_k^i = 0.
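The AR emission of Eq. (2) only shifts the Gaussian mean by a regression on the previous observation, as in this sketch (same assumptions as above; the value of W_k is illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

# AR-HMM observation CPD of Eq. (2):
# B_k(y_t, y_{t-1}) = N(y_t; mu_k + W_k y_{t-1}, Sigma_k).
# With W_k = 0 this reduces to the basic HMM emission of Eq. (1).
d = 5
rng = np.random.default_rng(1)
mu_k = rng.random(d)
W_k = 0.5 * np.eye(d)                # regression matrix of state k (illustrative)
Sigma_k = 0.1 * np.eye(d)

def ar_emission_prob(y_t, y_prev, mu, W, Sigma):
    return multivariate_normal(mean=mu + W @ y_prev, cov=Sigma).pdf(y_t)

y_prev, y_t = rng.random(d), rng.random(d)
print(ar_emission_prob(y_t, y_prev, mu_k, W_k, Sigma_k))
```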
Fig. 4. Independent auto-regressive AR-HMMs represented as DBNs: (a) AR-vertical and (b) AR-horizontal.
Fig. 5. Observation and state sequences for simple O and H shapes on a 5 × 5 grid. Joint configurations of long bars of pixels (observations a) and short bars of pixels (observations b) occur for different state configurations.
In our coupled architectures, the value of the current horizontal state X_t^2 depends on the value of the previous horizontal state X_{t-1}^2 and on the value of the current vertical state X_t^1. This dependence between the horizontal and the vertical states expresses
the dependence between the observations, i.e. between row t and column
t. Although row t and column t share only one pixel in common, the
whole row and the whole column of pixels may be correlated. The
more they are correlated, the higher the probability of observing one
column configuration, captured by the vertical state, in conjunction
with one row configuration, captured by the horizontal state. As an
example, consider simple shapes on a 5 × 5 grid belonging to two
classes, H and O shapes, as shown in Fig. 5. We set the number of
states to three and we consider two discrete observation symbols
a and b: Y_t^i = a when the number of pixels in column (or row) t is
greater than 3, else Y_t^i = b. For H shapes, the long central bar (row observation
a) is correlated with short bars (column observation b) in the central area of the image. For O shapes, long bars (row observations
a) are correlated with long bars (column observations a) at the top
and bottom of the image. The state/observation sequences shown
for both models in Fig. 5 express these correlations. For O shapes,
when (X_t^1 = 1, X_t^2 = 1) or (X_t^1 = 3, X_t^2 = 3), the probability of observing
long bars (a) in both row and column is high. For H shapes, when
(X_t^1 = 2, X_t^2 = 2), the long horizontal bar (a) is observed jointly with
a short bar (b).
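This discrete observation scheme can be written down directly; in the sketch below, the 5 × 5 H-shape bitmap is an illustrative rendering of Fig. 5, not the paper's exact grid.

```python
import numpy as np

# Toy observation scheme: column (or row) t maps to symbol 'a' (long bar)
# when it contains more than 3 pixels, else to 'b' (short bar).
H = np.array([[1, 0, 0, 0, 1],
              [1, 0, 0, 0, 1],
              [1, 1, 1, 1, 1],
              [1, 0, 0, 0, 1],
              [1, 0, 0, 0, 1]])

def symbols(image, axis):
    """axis=0: column observations (vertical stream); axis=1: row observations."""
    return ["a" if n > 3 else "b" for n in image.sum(axis=axis)]

print("columns:", symbols(H, axis=0))  # ['a', 'b', 'b', 'b', 'a']
print("rows:   ", symbols(H, axis=1))  # ['b', 'b', 'a', 'b', 'b']
```

The long central row bar (a) indeed co-occurs with short column bars (b) in the centre of the H, which is exactly the correlation the coupled states are meant to capture.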
To obtain the first coupled architecture, called the state-coupled
model (ST_CPL), we add directed edges between the hidden state
nodes of the vertical and horizontal HMMs, as shown in Fig. 6a. The
parameters of ST_CPL are the CPDs b^i and the CPTs A and U. The conditional probability table A, capturing the HMM left-right structure
of the vertical sequence {X^1}, can be written:

A_{j,k} = P(X_t^1 = k | X_{t-1}^1 = j), k, j ∈ [1, Q],
where A_{j,k} is a left-right state transition matrix, as for classical left-right HMMs. The value k of the current state is either equal to the
value j of the preceding state or to j + 1. For t ≥ 2, we write:
U_{j,k,l} = P(X_t^2 = l | X_{t-1}^2 = j, X_t^1 = k), k, j, l ∈ [1, Q],   (3)
where the CPD b_k^i is a single Gaussian pdf N(y_t^i; μ_k^i, Σ_k^i), as for basic
HMMs. The CPTs U are more complex and of larger size than the
CPTs A: the CPTs U are a set of Q matrices (one for each value of X_{t-1}^2)
of size Q × Q. Left-right transitions are allowed for state transitions
within stream 2, while ergodic transitions are allowed for state
transitions from stream 1 to stream 2, as shown in Fig. 7(a). All state
values increase through time because of the left-right constraint.
On the other hand, the value of X_t^2 can be equal to, greater than or
less than the value of X_t^1 because of the ergodic property (all state
transitions are allowed between X_t^1 and X_t^2). In practice, the values
of X_t^2 and X_t^1 follow each other, as can be seen in Fig. 7(b) during the
decoding of a sample digit, but without a predefined order. Sample
values from the CPT U of the digit-four model are shown in Fig. 8: for instance, a
transition can be made to state X_t^2 = 8 from states (X_{t-1}^2 = 7, X_t^1 = 8).
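The sparsity pattern of such a CPT can be sketched as follows; Q and the stay/move probabilities are illustrative assumptions, and only the left-right/ergodic constraint pattern reflects the text.

```python
import numpy as np

# Coupled CPT of Eq. (3): U[j, k, l] = P(X_t^2 = l | X_{t-1}^2 = j, X_t^1 = k).
# Left-right within stream 2 (l in {j, j+1}); ergodic conditioning on the
# stream-1 state k (every value of k indexes its own distribution).
Q = 4
U = np.zeros((Q, Q, Q))
for j in range(Q):
    for k in range(Q):
        if j + 1 < Q:
            U[j, k, j], U[j, k, j + 1] = 0.7, 0.3  # stay or move one state up
        else:
            U[j, k, j] = 1.0                       # last state can only stay

assert np.allclose(U.sum(axis=2), 1.0)  # each (j, k) slice is a distribution
print(U[1, 2])                          # P(X_t^2 = . | X_{t-1}^2 = 1, X_t^1 = 2)
```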
The initial state distributions π^1 and π^2 for the vertical and the
horizontal streams, respectively, can be written:

π^1(k) = P(X_1^1 = k), k ∈ [1, Q],   π^2_{j,k} = P(X_1^2 = k | X_1^1 = j), k, j ∈ [1, Q].   (4)
In the general coupled model (GNL_CPL), the vertical observations depend on both the vertical and the horizontal states, and the CPDs can be written:

b^1_{j,k}(y_t^1) = P(Y_t^1 = y_t^1 | X_t^1 = j, X_t^2 = k) = N(y_t^1; μ^1_{j,k}, Σ^1_{j,k}),   (5)

b^2_k(y_t^2) = P(Y_t^2 = y_t^2 | X_t^2 = k) = N(y_t^2; μ^2_k, Σ^2_k).

The form of the distribution of the horizontal observations is the
same as in the previous ST_CPL model. The distribution of the vertical
observations b^1 is a single pdf with a mean μ^1_{j,k} of length d and a d × d covariance matrix.
Fig. 7. (a) Types of transitions (ergodic or left-right) allowed between the different state variables. (b) Example of state value sequences for both streams (horizontal and vertical) during decoding of digit 4. Horizontal and vertical state values increase through time (left-right assumption), but at time t the value of the horizontal state can be equal to, greater than or less than the value of the vertical state.
Fig. 8. Sample values from CPT U, which includes state transition probabilities to state X_t^2 from states X_{t-1}^2 and X_t^1.
In the AR-coupled model (AR_CPL), the mean of each observation pdf also depends on the previous observation y_{t-1}^i through the regression matrix W, as in Eq. (2). This model benefits
from both the predicting abilities of AR models and the fusion of
observations performed by the coupling through states.
3.3. Complexity
The above architectures, associated with their parameters (CPTs and
CPDs), provide character models. In order to limit the number of parameters and to follow the HMM paradigm, we assume that the matrix A is left-right for all models. We also assume that for each CPT
U (which can couple up to three states together), state transitions
within the same stream are left-right, whereas state transitions between different streams can be ergodic (see Section 3.2).
Table 1
Space and time complexity of independent and coupled models as functions of Q (number of states) and d (length of observation vectors)

Model           Cov + Mean         A   π   U     W      Decoding
Single HMM      Qd² + Qd           Q   1   -     -      O(Qd)
Single AR-HMM   Qd² + Qd           Q   1   -     Qd²    O(Qd)
ST_CPL          2(Qd² + Qd)        Q   1   2Q²   -      O(Q²d)
GNL_CPL         (Q² + Q)(d² + d)   Q   1   2Q²   -      O(Q²d)
AR_CPL          2(Qd² + Qd)        Q   1   2Q²   2Qd²   O(Q²d)
Covariance matrices for Gaussian pdfs may be full matrices. The space complexity of all models is given in Table 1 as a function of the common
number of states Q and of the length d of the observation vectors. Because of character size normalization, the length of the observation
sequence is T = d. Since the number of states Q is smaller than the
length T of the observation sequences (Q < T, as in classical HMMs),
the coupled model with the lowest complexity is ST_CPL: its complexity
is of order O(Qd²), similar to that of the AR_CPL model. Though the
AR-coupled model has the most dependence between observations,
the GNL_CPL model is the one with the highest space complexity: its
complexity is of order O(Q²d²), since the dimension of the space of
conditioning states for the observations {Y_t^1} is Q × Q in this case. The
computational time complexity is dominated by inference. Indeed,
inference complexity depends both on the size of the cliques in the
junction tree and on their number [23,30]. Since all coupled models
share the same number of cliques, only clique sizes may differ. Time complexity is of order O(TQ^{p+1}), where p is the maximum
number of parents of the hidden state variables in the original graph.
This complexity is reduced by a factor Q in our models because the
state space is reduced in the cliques which include the hidden state
variables: there is always one hidden variable related to its parent by a left-right transition. Inference time complexity is shown in
Table 1 for the decoding (likelihood of an observation set) of a single
character. Our time estimation does not include the computation of
the observation pdfs for all states in each time-slice.
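The covariance-and-mean counts of Table 1 follow directly from these formulas; the sketch below evaluates them for illustrative values Q = 14 and d = 28.

```python
# Space complexity (covariance + mean parameters) from Table 1,
# as functions of Q (number of states) and d (observation vector length).
def cov_mean_params(model, Q, d):
    counts = {
        "Single HMM":    Q * d**2 + Q * d,         # Q full covariances + Q means
        "Single AR-HMM": Q * d**2 + Q * d,
        "ST_CPL":        2 * (Q * d**2 + Q * d),   # two streams
        "GNL_CPL":       (Q**2 + Q) * (d**2 + d),  # vertical pdfs indexed by (j, k)
        "AR_CPL":        2 * (Q * d**2 + Q * d),
    }
    return counts[model]

for m in ("Single HMM", "ST_CPL", "GNL_CPL", "AR_CPL"):
    print(m, cov_mean_params(m, Q=14, d=28))
```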
Fig. 9. Pairs of original (left) and degraded (right) characters. Two breaks are created
within each digit (w = 2).
Fig. 10. Sample document from the Renaissance Festival Books collection.
White borders are added to training characters in order to deal with intra-word spaces. In other cases, white borders can be removed, as well
as the states representing each border: models with lower complexity are then obtained by reducing the number of states from Q
to Q − 2.
Because some classes had very few samples, we selected the
classes which had enough samples (around 50) within the first three
pages dedicated to training. This led to 16 classes: a, b, c, d, e, i, l,
m, n, o, p, r, s, long s, t and u. Sample characters from standard and
degraded sets are shown in Fig. 11.
Fig. 11. Sample old printed characters from standard and degraded pages.
Fig. 12. Performance of independent and coupled models according to the number of states. [Four panels: independent HMMs (vertical, horizontal); independent AR models (vertical, horizontal); coupled DBNs (ST_CPL, GNL_CPL); AR-coupled DBN (AR_CPL). Y-axis: recognition rate (%); x-axis: number of states (10-22).]
The number of states Q was selected on a training set of 4000 digits and a test set of 1000 digits. The
value of Q ranges from 6-state models to 22-state models. Results
in Fig. 12 show that recognition performance increases with Q until
reaching a maximum for Q = 14 or 18. There is no improvement for
values of Q > 18 and a slight improvement is obtained for coupled
models by increasing Q from Q =14 to 18. The price for this improvement is higher space and time complexity: in the following, we set
Q = 14 which offers the best compromise between complexity and
performance.
Fig. 12 also shows that the best performance is always reached
by the AR_CPL model, regardless of the number of states Q. This shows
the superiority of the AR_CPL model over all independent and other
coupled models (see also Section 5.1).
For training digit models, we used a subset S of 5000 samples (500
per class) from the MNIST training database. Then, we conducted
cross-validation experiments in the following way: the subset S was
split into F = 5 sets of 1000 characters. Each DBN architecture was
trained on F − 1 sets and tested on the remaining set, F times. Within
each model, the parameters yielding the best cross-validation recognition performance were selected for testing. Testing was performed
on the 10 000 digits of the MNIST test set.
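A sketch of this 5-fold protocol, assuming samples and labels are NumPy arrays; train() and score() are hypothetical stand-ins for DBN training (EM) and validation accuracy.

```python
import numpy as np

def cross_validate(samples, labels, train, score, F=5, seed=0):
    """Mean validation accuracy over F folds (train on F-1 folds, test on 1)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(samples)), F)
    accuracies = []
    for f in range(F):
        val_idx = folds[f]
        train_idx = np.concatenate([folds[g] for g in range(F) if g != f])
        model = train(samples[train_idx], labels[train_idx])
        accuracies.append(score(model, samples[val_idx], labels[val_idx]))
    return float(np.mean(accuracies))
```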
For training old printed character models, 50 characters per class
were selected from the first three standard pages. Then character
models were tested on two test sets: a standard test set (test-s) from
the two remaining standard pages and a degraded test set (test-d)
from the two degraded pages. The standard and degraded test sets
include 1009 and 1079 characters, respectively, for the 16 classes
considered.
During recognition, each character was assigned to the class with
the highest log-likelihood value. We used the BayesNet toolbox [35],
which provides general Matlab source code for training and inference in static and dynamic Bayesian networks.
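The decision rule is maximum log-likelihood over the class models; in this sketch, loglik is a hypothetical placeholder for DBN inference (the paper uses the BayesNet toolbox) and the models are dummies.

```python
def classify(observations, class_models, loglik):
    """Return the class whose model assigns the highest log-likelihood."""
    return max(class_models, key=lambda c: loglik(class_models[c], observations))

# Dummy usage: two fake class "models" holding fixed log-likelihood values.
models = {"3": -120.4, "8": -131.9}
print(classify(None, models, lambda model_ll, obs: model_ll))  # -> "3"
```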
5. Experimental results
We evaluate the various DBN architectures, independent and
coupled, for the problem of the recognition of degraded characters.
Two off-line recognition tasks are considered. We first evaluate DBN
architectures for the recognition of artificially broken handwritten
digits: their robustness to degradation is evaluated against different degradation parameters. We also test these architectures for the
recognition of real degraded printed characters.
5.1. Handwritten digits
5.1.1. Comparison between independent and coupled architectures
Independent and coupled architectures are first tested on the set
of artificially broken digits. We consider three levels of degradation:
no additional degradation using the original MNIST test set (w = 0),
one break created within digits (w=1) and two breaks created (w=2).
Recognition accuracies for each model are given in Table 2.
For each degradation level, vertical models perform better than
horizontal ones within each type of independent models (HMM/AR).
This means that columns of character images are more discriminating than rows for handwritten digits. This is also observed for
old printed characters: there is a predominance of vertical strokes
for these forms of letters and digits [28]. Comparing the independent models, the auto-regressive vertical model performs better
than the basic vertical-HMM. This is due to the fact that the basic
HMM assumes conditional independence of the observation variables
with respect to the hidden states, whereas the AR model assumes an explicit dynamic dependence between observations. For the horizontal independent models, the performances of the basic horizontal-HMM and
the horizontal-AR model are comparable: the prediction of image
rows is less efficient than the prediction of image columns.
Table 2
Recognition rates (%) for handwritten digits under different levels of degradation (w = 0: no additional degradation, w = 1: one break, w = 2: two breaks)

Model                          w = 0   w = 1   w = 2
Vertical-HMM                   90.2    86.9    83.8
Horizontal-HMM                 87.4    82.8    75.3
Vertical-AR                    93.2    89.8    85.3
Horizontal-AR                  87.7    81.6    75.6
ST_CPL                         92.4    90.8    87.4
GNL_CPL                        93.4    90.0    86.2
AR_CPL                         94.9    93.4    90.9
Combination of HMM scores      93.1    90.6    87.0
Combination of AR-HMM scores   94.7    91.9    89.0
SVM                            96.1    91.1    85.4
Table 3
Recognition rates (%) for old printed characters on the standard (test-s) and degraded (test-d) test sets

Model                          Standard (test-s)   Degraded (test-d)
Vertical-HMM                   98.3                93.8
Horizontal-HMM                 93.7                88.1
Vertical-AR                    97.9                94.5
Horizontal-AR                  96.2                91.2
ST_CPL                         98.7                95.5
GNL_CPL                        98.6                94.0
AR_CPL                         98.8                96.0
Comb. of HMM scores            98.4                95.4
Comb. of AR-HMM scores         98.7                95.5
SVM                            98.4                94.9
Fig. 13. Combination of HMM scores: recognition rate versus weight factor. [Y-axis: recognition rate (%); x-axis: weight factor, from 0.1 to 1.]
Fig. 14. Comparison of the AR_CPL model with the combination of AR-HMM scores according to degradation. [Y-axis: recognition rate (%); x-axis: degradation level (0-2).]

Fig. 15. Comparison of the AR_CPL model with the combination of AR-HMM scores for old printed characters and several degradation levels (test-s, test-d, test-h). [Y-axis: recognition rate (%).]
The performance of the GNL_CPL model decreases more rapidly on the degraded set because defects in the vertical observations disturb both horizontal and vertical state sequences.
The auto-regressive coupled model (AR_CPL) always performs
better than the independent models, the state-coupled model and the
combination of HMM scores, for which the optimal weight value 0.65
was found on a validation set of 1038 characters (Fig. 13). As before,
the ST_CPL and AR_CPL coupled architectures cope better with degraded characters (test-d) than independent HMMs (basic and AR)
and the combination of HMM scores.
5.2.2. Comparison with the combination of auto-regressive HMM scores
For the combination of auto-regressive HMMs (AR-HMMs), the
optimal weight found on the validation set is equal to 0.4.
Recognition accuracies are given in Table 3. The AR_CPL model performs better than the combination of AR-HMMs whatever the level
of degradation. However, several classifiers reach high performance
on the set of non-degraded characters (test-s): coupled DBN classifiers and the combination of AR-HMM scores perform accurately on
such characters. The improvement brought by the AR_CPL model is
enhanced for degraded characters. To highlight this property, an additional set of highly degraded characters is obtained by lowering
the binarization threshold of degraded characters by 20%. This leads
to the test-h set (highly degraded), which includes severely faded characters. Recognition accuracies are compared in Fig. 15 for all test sets.
The improvement brought by the AR_CPL model over the combination of AR-HMM scores increases as degradation increases, as seen
previously for handwritten digits (see Section 5.1.3).
When characters are broken due to a natural fading process or a
low binarization threshold, the AR_CPL model performs better than
the other models. This coupled architecture is thus particularly well suited to degraded characters.
References
[1] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in
speech recognition, Proc. IEEE 77 (2) (1989) 257--286.
[2] R. Plamondon, S. Srihari, On-line and off-line handwriting recognition: a
comprehensive survey, IEEE PAMI 22 (1) (2000) 63--84.
[3] C. Bahlmann, H. Burkhardt, The writer independent online handwriting
recognition system frog on hand and cluster generative statistical dynamic time
warping, IEEE PAMI 26 (3) (2004) 299--310.
[4] M. Schenkel, M. Jabri, Low resolution degraded document recognition using
neural networks and hidden Markov models, Pattern Recogn. Lett. 19 (1998)
365--371.
[5] A. Senior, A. Robinson, An off-line cursive handwriting recognition system, IEEE
PAMI 20 (3) (1998) 309--321.
[6] A. Vinciarelli, S. Bengio, H. Bunke, Offline handwriting recognition of
unconstrained handwritten texts using HMMs and statistical language models,
IEEE PAMI 26 (6) (2004) 709--720.
[7] J.-C. Anigbogu, A. Belaid, Recognition of multifont text using Markov models,
In: Proceedings of the Seventh Scandinavian Conference on Image Analysis,
Aalborg (Denmark), 1991, pp. 469--476.
[8] H.S. Park, S. Lee, Off-line recognition of large-set handwritten characters with
multiple hidden Markov models, Pattern Recogn. 31 (1998) 1849--1864.
[9] N. Arica, F.T. Yarman-Vural, Optical character recognition for cursive
handwriting, IEEE PAMI 24 (6) (2002) 801--813.
[10] H. Baird, The state of the art of document image degradation modeling, in:
Proceedings of the Fourth Workshop on Document Analysis Systems, DAS, Rio
de Janeiro, 2000, pp. 1--16.
[11] A. Whichello, H. Yan, Linking broken character borders with variable sized
masks to improve recognition, Pattern Recogn. 29 (8) (1996) 1429--1435.
[12] B. Allier, N. Bali, H. Emptoz, Automatic accurate broken character restoration
for patrimonial documents, IJDAR 8 (4) (2006) 246--261.
[13] A. Antonacopoulos, D. Karatzas, Document image analysis for World War II
personal records. In: First International Workshop on Document Image Analysis
for Libraries, DIAL 04, Palo Alto, 2004, pp. 336--341.
[14] M. Droettboom, Correcting broken characters in the recognition of historical
printed documents, in: Proceedings of Joint Conference on Digital Libraries,
JCDL'03, 2003.
[15] J. Kittler, M. Hatef, R. Duin, J. Matas, On combining classifiers, IEEE PAMI 3 (20)
(1998) 226--239.
[16] W. Wang, A. Brakensiek, G. Rigoll, Combining HMM-based two pass classifiers
for off-line word recognition, in: Proceedings of ICPR, Quebec, 2002,
pp. 151--154.
[17] A.J. Elms, S. Procter, J. Illingworth, The advantage of using an HMM based
approach for faxed word recognition, IJDAR 1 (1998) 18--36.
[18] K. Hallouli, L. Likforman-Sulem, M. Sigelle, A comparative study between
decision fusion and data fusion in Markovian printed character recognition, in:
Proceedings of ICPR, Quebec, 2002, pp. 147--150.
[19] X. Xiao, G. Leedham, Signature verification using a modified Bayesian network,
Pattern Recogn. 35 (2002) 983--995.
[20] S. Cho, J. Kim, Bayesian Network modeling of hangul characters for on-line
handwriting recognition, In: Proceedings of ICDAR, 2003, pp. 297--211.
[21] R. Sicard, T. Artieres, E. Petit, Modeling on-line handwriting using pairwise
relational features, In: Proceedings of IWFHR, La Baule, 2006.
[22] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference, second ed, Morgan Kaufman, Los Altos, CA, 1988.
[23] G. Zweig, Bayesian network structures and inference techniques for automatic
speech recognition, Comput. Speech Language 17 (2003) 173--193.
[24] K. Daoudi, D. Fohr, C. Antoine, Dynamic Bayesian networks for multi-band
automatic speech recognition, Comput. Speech Language 17 (2003) 263--285.
[25] M. Brand, N. Oliver, A. Pentland, Coupled hidden Markov models for complex
action recognition, in: Proceedings of the IEEE Conference CVPR 97, 1997,
pp. 994--999.
[26] N. Friedman, D. Koller, Being Bayesian about network structure: a Bayesian
approach to structure discovery in Bayesian networks, Mach. Learning 2001,
pp. 201--210.
[27] J.D. Hamilton, Analysis of time series subject to changes in regime, J. Econometr.
45 (1990) 39--70.
[28] C. Sirat, Handwriting and the writing hand, in: W.C. Watt (Ed.), Writing
Systems and Cognition: Perspectives from Psychology, Physiology, Linguistics,
and Semiotics, Kluwer Academic Publishers, Dordrecht, 1994, pp. 375--459.
[29] A. Tonazzini, S. Vezzosi, L. Bedini, Analysis and recognition of highly degraded
printed characters, IJDAR 6 (2003) 236--247.
[30] G. Bilmes, Dynamic Bayesian multinets, in: UAI '00: Proceedings of the
16th Conference in Uncertainty in Artificial Intelligence, Stanford, CA, 2000,
pp. 38--45.
[31] British Library, British Library Digitised Festival Books. Available on the web
at: http://www.bl.uk/treasures/festivalbooks/homepage.html.
[32] Y. LeCun, C. Cortes, The MNIST handwritten digit database, 1998. Available on
the web at: http://yann.lecun.com/exdb/mnist/.
[33] H. Baird, Document image defect models, in: H.S. Baird, H. Bunke, K. Yamamoto
(Eds.), Structured Document Image Analysis, Springer, New York, 1992,
pp. 546--556.
[34] S. Procter, A.J. Elms, J. Illingworth, A method for connected hand-printed
numeral recognition using hidden Markov models, in: IEE European Conference
on Handwriting Analysis and Recognition, Brussels, 1998.
[35] K. Murphy, BayesNet Toolbox for Matlab, 2003. Available on the web at:
http://www.ai.mit.edu/murphyk/Bayes/bnintro.html.
[36] C.-L. Liu, K. Nakashima, H. Sako, H. Fujisawa, Handwritten digit recognition:
benchmarking of state-of-the-art techniques, Pattern Recogn. 36 (2003)
2271--2285.
[37] K.-M. Lin, C.-J. Lin, A study on reduced support vector machines, IEEE Trans.
Neural Networks 14 (2003) 1449--1559.
[38] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, 2001.
Software available at: http://www.csie.ntu.edu.tw/cjlin/libsvm.
About the Author---LAURENCE LIKFORMAN-SULEM graduated in engineering from ENST-Bretagne (Ecole Nationale Supérieure des Télécommunications) in 1984
and received her PhD from ENST-Paris in 1989. She is an Associate Professor at TELECOM ParisTech (formerly ENST) in the Department of Signal and Image Processing, where
she serves as a senior instructor in Pattern Recognition and Document Analysis.
Her research concerns document analysis dedicated to handwritten and historical documents, document image understanding and character recognition. Laurence
Likforman and co-researchers won first place at the ICDAR'05 competition on Arabic handwritten word recognition.
Laurence Likforman is a founding member of the francophone GRCE (Groupe de Recherche en Communication Ecrite), an association for the development of research activities
in the field of document analysis and written communication. She chaired the program committee of the last CIFED (Conférence Internationale Francophone sur l'Ecrit et le
Document), held in Fribourg (Switzerland) in 2006.
About the Author---MARC SIGELLE was born in Paris in 1954. He graduated from Ecole Polytechnique Paris in 1975 and from Ecole Nationale Supérieure des
Télécommunications Paris in 1977. In 1993 he obtained a PhD from Ecole Nationale Supérieure des Télécommunications. He first worked at Centre National d'Etudes des
Télécommunications on physics and computer algorithms. Since 1989 he has been working on image and, more recently, speech processing at Ecole Nationale Supérieure
des Télécommunications.
His main fields of interest are the restoration and segmentation of signals and images with Markov random fields (MRFs), hyperparameter estimation methods and their relationship
with statistical physics. His work was first devoted to blood vessel reconstruction in angiographic images and then to the processing of remote-sensed satellite and
synthetic aperture radar images. His most recent interests concern an MRF approach to image restoration using level sets for total variation and its extensions. He also
works on speech and character recognition using MRFs and Bayesian networks. M. Sigelle has been an IEEE Senior Member since Fall 2003.