Towards Compact and Tractable Automaton-Based Representations of Time Granularities

Ugo Dal Lago¹, Angelo Montanari², and Gabriele Puppis²

¹ Dipartimento di Scienze dell'Informazione, Università di Bologna,
Mura Anteo Zamboni 7, 40127 Bologna, Italy
dallago@cs.unibo.it
² Dipartimento di Matematica e Informatica, Università di Udine,
via delle Scienze 206, 33100 Udine, Italy
{montana,puppis}@dimi.uniud.it
Abstract. Different approaches to time granularity have been proposed
in the database literature to formalize the notion of calendar, based on
algebraic, logical, and string-based formalisms. In this paper, we further
develop an alternative approach based on automata, originally proposed
in [4], which makes it possible to deal with infinite time granularities in
an effective (and efficient) way. In particular, such an approach provides
an effective solution to fundamental problems such as equivalence and
conversion of time granularities. We focus our attention on two kinds
of optimization problems for automaton-based representations, namely,
computing the smallest representation and computing the most tractable
representation, that is, the one on which crucial algorithms (e.g., granule
conversion algorithms) run fastest. We first introduce and compare these
two minimization problems; then, we give a polynomial time algorithm
that solves the latter.
1
Introduction
The notion of time granularity comes into play in a variety of problems involving time representation and management in database applications, including
temporal database design, temporal data conversion, temporal database interoperability, temporal constraint reasoning, data mining, and time management
in workflow systems. Different approaches to time granularity have been proposed in the database literature, based on algebraic [1,9], logical [3], and string-based [11] formalisms. We restrict our attention to the latter.
The string-based formalism eases access to and manipulation of data associated with different granularities, making it possible to solve some basic problems
about time granularities, such as the equivalence problem, in an effective way.
String-based algorithms, however, may potentially process every element (symbol) of a representation, regardless of its redundancy, thus requiring a large amount of computational time. This efficiency problem is dealt with by
C. Blundo and C. Laneve (Eds.): ICTCS 2003, LNCS 2841, pp. 72–85, 2003.
© Springer-Verlag Berlin Heidelberg 2003
the automaton-based approach to time granularity, which revises and extends the string-based one.
According to such an approach, granularities are viewed as strings generated
by a specific class of automata, called Simple Single-String Automata (Simple
SSA for short), thus making it possible to (re)use well-known results from automata theory. Simple SSA were originally proposed by Dal Lago and Montanari
to model infinite periodical granularities [4]. Furthermore, they showed that regularities of modeled granularities can be naturally expressed by extending Simple
SSA with counters (let us call SSA the resulting class of automata). This extension makes the structure of the automata more compact, and it allows one to
efficiently deal with those granularities which have a quasi-periodic structure.
In [5], we proved that SSA provide an efficient solution to the fundamental
problems of equivalence, namely, the problem of establishing whether two different representations define the same granularity, and granule conversion, namely,
the problem of relating granules of a given granularity to those of another one.
To this end, we introduced a suitable variant of SSA, called Restricted Labeled
Single-String Automata (RLA for short), and we showed that these automata
are at least as expressive as the string-based formalism, while being better suited to direct algorithmic manipulation. As an example, granule conversion problems can be
solved in polynomial time with respect to the size of the involved RLA.
The algorithmic flavor of automaton-based representations of time granularity suggests an alternative point of view on their role: RLA can be used not only
as a formalism for the direct specification of time granularities, but also as a
low-level formalism into which high-level time granularity specifications can be
mapped. From this point of view, the problem of reducing as much as possible
the complexity of basic algorithms becomes even more crucial. In [5], we defined
a suitable set of algorithms mapping expressions of Calendar Algebra (the high-level formalism for modeling time granularities developed by Ning et al. in [9])
to equivalent RLA-based representations. In this paper, we focus our attention
on minimization problems for RLA.
There exist at least two possible notions of minimization. According to the
first one, minimizing means computing the smallest representation of a given
time granularity; according to the second one, minimizing means computing
the most tractable representation of a given granularity, that is, the one on
which crucial algorithms run fastest. The former kind of automaton-based representation is called a size-optimal representation, while the latter is called a
complexity-optimal representation. These two criteria are clearly not equivalent,
since the smallest representation is not necessarily the most tractable one, and
vice versa. Furthermore, we claim that both problems yield non-unique solutions.
In the following, we tackle the complexity-minimization problem by using dynamic programming: we state some closure properties of RLA with respect to
concatenation, iteration, and repetition of words, and we show how to compute
complexity-optimal automata from smaller (optimal) ones in a bottom-up fashion. The resulting algorithm runs in polynomial time with respect to the size of
the string-based description of the involved granularity.
The rest of the paper is organized as follows. In Section 2, we give a definition of time granularity and we briefly describe the main features of Wijsen’s
string-based formalism, which represents regular granularities by means of (encodings of) ultimately periodic words. In Section 3 we focus our attention on
the automaton-based approach to time granularity. We define RLA and we state
some basic properties of them. In Section 4 we briefly describe some polynomial
algorithms which can be used to efficiently solve the equivalence and granule conversion problems for RLA-based representations of time granularities. In Section
5 we introduce the size-minimization and complexity-minimization problems,
we point out important aspects about their solutions, and we give an intuitive
explanation of the computation of complexity-optimal automata. In Section 6,
we discuss the details of the proposed solution; in particular, we show how a
complexity-optimal automaton recognizing a given ultimately periodic word can
be effectively built up from a suitable representation of the repetitions of the
word. In Section 7 we outline future research directions, with a special emphasis
on possible improvements on the proposed complexity-minimization algorithm
and on promising strategies to efficiently solve the size-minimization problem.
(Reference [6] is an extended version of this work, including all proof details.)
2
The String-Based Model of Time Granularities
Since in many applications different time granularities can be used to specify
the validity intervals of different facts [1], database systems need the ability to properly relate granules belonging to different time granularities. Such an
ability presupposes the formalization of the notion of granularity. In this section,
we first give a formal definition of time granularity, which captures a reasonably
large class of temporal structures; then, we specialize such a definition in order
to allow a finite representation and an efficient manipulation of the associated
data.
Definition 1. Given a set T of temporal instants and a total order < on T , a
time granularity on the temporal domain (T, <) is a total function G : Z → 2T
such that, for every pair of integers x and y, x < y implies
∀ tx ∈ G(x). ∀ ty ∈ G(y). tx < ty .
Each non-empty set G(x), with x ∈ Z, is called a granule and each integer in the set {x ∈ Z : G(x) ≠ ∅} is called a label. Note that Definition 1 captures both
time granularities that cover the entire temporal domain, such as Day, Week, and
Month, and time granularities with gaps within and between granules, like, for
instance, BusinessDay, BusinessWeek, and BusinessMonth. Figure 1 depicts
some of these granularities.
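To make Definition 1 concrete, the following small sketch (the function names are ours, not from the paper) models Week and BusinessWeek over the temporal domain of days, together with a brute-force check of the monotonicity condition on a finite range of labels:

```python
def week(x):
    # Week over days: granule x covers days 7(x-1)+1 .. 7x
    return set(range(7 * (x - 1) + 1, 7 * x + 1))

def business_week(x):
    # BusinessWeek: only the first five days of each week (weekend gaps)
    return set(range(7 * (x - 1) + 1, 7 * (x - 1) + 6))

def is_granularity(G, labels):
    # brute-force check of the condition of Definition 1: x < y implies
    # every instant of G(x) precedes every instant of G(y)
    return all(tx < ty
               for x in labels for y in labels if x < y
               for tx in G(x) for ty in G(y))
```

Both example functions satisfy the definition; BusinessWeek simply leaves the weekend instants uncovered.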
In the following, we assume granularity labels to belong to the set N+ (as
a matter of fact, most applications assume the existence of a first granule). It is easy to see that the set of all functions satisfying Definition 1 becomes uncountable as soon as the underlying temporal domain becomes infinite. As
Fig. 1. Some examples of time granularities.
a consequence, it is not possible to deal with all of them by means of a finitary formalism. However, the problem of mastering temporal structures for time
granularity can be tackled in an effective way by restricting to periodical granularities. In [11], Wijsen shows that such granularities can be naturally expressed
in terms of ultimately periodic words over an alphabet of three symbols, namely, ■ (filler), □ (gap), and ≀ (separator), which are respectively used to denote time points covered by some granule, to denote time points not covered by any granule, and to delimit granules. In the following, we assume the reader to be familiar
with basic terminology and notation on finite and infinite strings [10]. In particular, we will often write a generic string u as u[1]u[2]u[3] . . ., where u[i] denotes
the i-th element of the string, and we will use the notation u[i, j] to denote the
substring u[i]u[i+1] . . . u[j] of u. Furthermore, given a finite set S, we will denote
by S ∞ the set S ω ∪ S ∗ , where S ω (respectively, S ∗ ) stands for the set of all and
only the infinite (respectively, finite) strings over S.
Definition 2. Given a word u ∈ {■, □, ≀}ω containing infinitely many occurrences of non-separator symbols, we say that u represents G if, for every pair of positive integers x and y, x ∈ G(y) if and only if u[x + y − 1] = ■ and u[1, x + y − 2] contains exactly y − 1 occurrences of ≀.
As an example, the infinite word ■■■■■□□ ≀ ■■■■■□□ ≀ . . . represents the granularity BusinessWeek over the temporal domain of days.
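The decoding prescribed by Definition 2 is easy to transcribe. In the following hedged sketch we use the ASCII stand-ins 'f', 'g', and '|' for the filler, gap, and separator symbols:

```python
def granule(u, y):
    """Time points in granule y of the granularity represented by u,
    per Definition 2: x is in G(y) iff the (x+y-1)-th symbol of u is a
    filler and the preceding prefix contains exactly y-1 separators."""
    pts, point, seps = set(), 0, 0
    for c in u:
        if c == '|':                 # separator
            seps += 1
        else:                        # filler or gap: one more time point
            point += 1
            if c == 'f' and seps == y - 1:
                pts.add(point)
    return pts

# a finite prefix of the word for BusinessWeek over days
bw = 'fffffgg|' * 3
```

On this prefix, granule 1 is {1, ..., 5} and granule 2 is {8, ..., 12}, as expected for BusinessWeek.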
In order to finitely model time granularities, Wijsen introduces the notion
of granspec. A granspec is an ordered pair (u, v) of finite strings such that u, v ∈ {■, □, ≀}∗ and v contains at least one occurrence of a non-separator symbol. Strings u and v are respectively called the prefix and the repeating pattern of the ultimately periodic string u · v ω representing the (periodical) time granularity. For instance, the granularity BusinessWeek ■■■■■□□ ≀ ■■■■■□□ ≀ . . . can be encoded by the granspec (ε, ■■■■■□□ ≀). Furthermore, to solve the equivalence problem for (representations of) time granularities, Wijsen proposed a
suitable canonical form of granspecs, which turns out to be a sort of minimum
representation of periodical granularities. However, it is worth mentioning that,
whenever the granularity to be represented has a long period and/or a long
prefix, the granspec formalism produces lengthy canonical granspecs. As a consequence, computations on time granularities represented by granspecs may take
a great deal of time. For example, if (u, v) is a granspec representing months of
the Gregorian Calendar in terms of days, we have that |u| + |v| ≥ 146097. In
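The figure 146097 is the length, in days, of the 400-year cycle of the Gregorian calendar; it can be double-checked with Python's standard calendar module:

```python
import calendar

# days in one 400-year Gregorian cycle: 400*365 plus one day per leap year
leap_years = sum(1 for y in range(2000, 2400) if calendar.isleap(y))
cycle_days = 400 * 365 + leap_years
```

There are 97 leap years per cycle (every fourth year, minus the three non-leap century years), giving 146097 days in total.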
the following section, we introduce the automaton-based approach, which yields
more succinct representations of time granularities.
3
From Strings to Automata
The idea of viewing granularities as ultimately periodic strings naturally connects time granularity to the fields of formal languages and automata, because
any ω-regular language is uniquely determined by its ultimately periodic words
[2]. The basic idea underlying the automaton-based approach to time granularity is simple: we take an automaton M recognizing a single word u ∈ {■, □, ≀}ω
and we say that M represents granularity G if and only if u represents G. In
the following, we introduce Restricted Labeled Single-String Automata (RLA
for short), which differ from finite automata and Büchi automata as they accept
single words instead of sets of words. As a matter of fact, RLA can also be viewed
as a variant of SSA [4], in which counters over discrete domains are exploited to
obtain succinct representations of time granularities.
Before formalizing the notion of RLA, we give an intuitive explanation of
the structure and behavior of automata belonging to this class. In order to
simplify the notation and the formalization of useful properties, RLA label states
instead of transitions. The set of states of an RLA, denoted by S, is partitioned
into two groups, respectively denoted by SΣ and Sε . SΣ is the set of states
where the labeling function is defined, while Sε is the set of states where it is
not defined. Furthermore, there are two kinds of transitions, respectively called
primary and secondary transitions. Intuitively, primary transitions are defined in
the standard way, while secondary transitions have been introduced to succinctly
represent repetitions. At any point of the computation, at most one (primary or secondary) transition is taken, according to an appropriate rule that depends on the current state of the automaton and on the value of a counter.
Figure 2 depicts two RLA, which respectively recognize the words (■ □6 ≀)ω and (■ ≀ (□ ≀)6 )ω, both representing Mondays in terms of days (the former associates the labels 1, 2, 3, . . . with the granules, while the latter associates the labels 1, 8, 15, . . . with them). States in SΣ are represented by labeled circles, while
states in Sε are represented by triangles. Primary and secondary transitions are
represented by continuous and dashed arrows, respectively. The initial state is
identified by a little triangular tip. The (initial values of) counters are associated
with states in Sε . This simple example provides an intuitive idea of how RLA
allow one to compactly encode repeating patterns in granularities by means of
counters and transitions.
Definition 3. A Restricted Labeled Single-String Automaton is an 8-tuple
M = (SΣ , Sε , Σ, Ω, δ, γ, s0 , C0 ), where
• SΣ and Sε are disjoint finite sets of states (let S = SΣ ∪ Sε );
• Σ is a finite alphabet;
• Ω : SΣ → Σ is the labeling function;
• δ : S ⇀ S is a partial function, called primary transition function;
Fig. 2. Two RLA that represent Mondays in terms of days.
• γ : Sε → S is a total function, called secondary transition function, such that:
i) for every s ∈ Sε , (γ(s), s) belongs to the reflexive and transitive closure δ ∗ of δ; the least n ∈ N such that (γ(s), s) ∈ δ n is called the γ-degree of s, and ΓM ⊆ Sε × S is the relation such that (s, r) ∈ ΓM iff r = δ i (γ(s)) with i less than or equal to the γ-degree of s;
ii) the reflexive and transitive closure ΓM ∗ of ΓM must be antisymmetric;
• s0 ∈ S is the initial state;
• C0 : Sε → N is the initial valuation.
Conditions i) and ii) on the secondary transition function enforce the existence of a partial order ΓM ∗ on the states of M . Such an order immediately suggests an induction principle, called γ-induction, which may be used in both formal definitions and proofs.
The definition of the computation of an RLA is based on the notion of configuration. For any finite set S of states, a valuation C on S is any function
C : S → N. In the following, we denote as CM the class NSε of all the valuations
for the counters of an RLA M = (SΣ , Sε , Σ, Ω, δ, γ, s0 , C0 ). A configuration is a
pair (s, C), with s ∈ S and C ∈ CM . The transitions of M are taken according
to a partial function ∆M : S × CM ⇀ S × CM such that ∆M (s, C) = (t, D) if
and only if one of the following three conditions holds:
• s ∈ SΣ ∧ t = δ(s) ∧ ∀ r. D(r) = C(r) (namely, whenever the automaton lies in a labeled state, it always makes a primary transition);
• s ∈ Sε ∧ C(s) > 0 ∧ t = γ(s) ∧ D(s) = C(s) − 1 ∧ ∀ r ≠ s. D(r) = C(r) (namely, whenever the automaton lies in a non-labeled state and the corresponding counter is positive, it makes a secondary transition and decrements the counter);
• s ∈ Sε ∧ C(s) = 0 ∧ t = δ(s) ∧ D(s) = C0 (s) ∧ ∀ r ≠ s. D(r) = C(r) (namely, whenever the automaton lies in a non-labeled state and the corresponding counter is 0, it makes a primary transition and re-initializes the counter).
The computation of M is the maximum (possibly infinite) sequence ρ ∈ (S × CM )∞ such that ρ[1] = (s0 , C0 ) and ∆M (ρ[i]) = ρ[i + 1] for every i ≥ 1. From the computation ρ of M , it is easy to extract a sequence of states ρΣ ∈ SΣ ∞ by discarding states belonging to Sε as well as valuations. We say that M recognizes the word u if and only if u = Ω(ρΣ ), where Ω(ρΣ ) is the sequence obtained by applying the labeling function Ω to (each element of) the sequence of states ρΣ .
Thus, RLA recognize either finite words or ultimately periodic words (namely,
those words which result from concatenating a finite prefix to a repeating pattern). Note that the computation of M may be an infinite sequence, even if the
recognized word is finite. However, we can overcome this clumsy situation by
discarding useless states and transitions of RLA.
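The three transition rules of ∆M translate almost literally into a small simulator. The encoding below (plain dictionaries for Ω, δ, γ, and C0) is ours, not the paper's; note that, under these rules, a counter with value c yields c + 1 passes through the corresponding loop:

```python
def run_rla(labels, delta, gamma, c0, s0, n_out):
    """Simulate the transition rules of an RLA and return the first
    n_out symbols of the recognized word.
    labels : state -> symbol, for states in S_Sigma
    delta  : primary transition function (partial)
    gamma  : secondary transition function on S_eps
    c0     : initial counter valuation on S_eps"""
    s, C = s0, dict(c0)
    out = []
    while len(out) < n_out:
        if s in labels:            # labeled state: emit and move by delta
            out.append(labels[s])
            s = delta[s]
        elif C[s] > 0:             # positive counter: secondary transition
            C[s] -= 1
            s = gamma[s]
        else:                      # zero counter: reset it, primary move
            C[s] = c0[s]
            s = delta[s]
    return ''.join(out)

# Mondays in terms of days: one filler, six gaps, one separator per week;
# the non-labeled state 'c' counts repetitions of the gap state 'g'
word = run_rla(labels={'f': 'f', 'g': 'g', 'p': '|'},
               delta={'f': 'g', 'g': 'c', 'c': 'p', 'p': 'f'},
               gamma={'c': 'g'},
               c0={'c': 5},
               s0='f',
               n_out=16)
```

The simulation spells out the period f g⁶ | over and over, as expected for the first Mondays automaton of Figure 2.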
As already mentioned, the main feature of RLA is the way they encode
repeating patterns of words. As a matter of fact, it is possible to provide a
formal characterization of the words recognized by RLA in terms of repetitions
of smaller substrings. Precisely, given an RLA M = (SΣ , Sε , Σ, Ω, δ, γ, s0 , C0 ), one can show that it recognizes the word

u = Ω(ρs0 · ρδ(s0 ) · ρδ2 (s0 ) · . . .)
where each ρs is defined to be
• s, whenever s ∈ SΣ ;
• (ργ(s) · ρδ(γ(s)) · ρδ2 (γ(s)) · . . . · ρδn−1 (γ(s)) )C0 (s) , where n is the γ-degree
of s, whenever s ∈ Sε .
As a consequence, any word recognized by an RLA can be represented using expressions such as (■5 □2 ≀)ω , ((■2 □)2 ■ ≀)ω , . . . denoting nested repetitions.
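The recursive characterization of ρs can be transcribed directly; here we reuse the same hypothetical encoding as in the earlier sketch (dictionaries for the labeling, transition functions, and initial valuation) on an automaton for the Mondays word:

```python
def rho(s, labels, delta, gamma, c0):
    """Unfold rho_s: a labeled state contributes its own symbol; a
    counter state s contributes the block rho_{gamma(s)} ... up to (but
    excluding) s along primary transitions, repeated C0(s) times."""
    if s in labels:
        return labels[s]
    block, t = '', gamma[s]
    while t != s:                  # walk delta from gamma(s) back to s
        block += rho(t, labels, delta, gamma, c0)
        t = delta[t]
    return block * c0[s]

labels = {'f': 'f', 'g': 'g', 'p': '|'}
delta = {'f': 'g', 'g': 'c', 'c': 'p', 'p': 'f'}
gamma = {'c': 'g'}
c0 = {'c': 5}

# one turn of the primary cycle f -> g -> c -> p spells out one period
period = ''.join(rho(s, labels, delta, gamma, c0) for s in 'fgcp')
```

Here ρc unfolds to five extra copies of the gap symbol, so one turn of the primary cycle yields one filler, six gaps, and a separator.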
4
Granularity Equivalence and Granule Conversions
In this section, we briefly discuss the equivalence and the granule conversion
problems. The decidability of the former problem implies the possibility of effectively testing the semantic equivalence of two descriptions, making it possible to
use smaller, or more tractable, representations in place of bigger, or less tractable,
ones. The relevance of the granule conversion problem has been advocated by several authors, e.g., [1], even though most existing solutions work it out only partially and in a rather complex way.
To explain our solutions, we first address a simpler problem, which arises very often when dealing with representations of time granularities as well as with
infinite strings in general, namely, the problem of finding the n-th occurrence of
a given symbol in a string. From the point of view of the theory of automata,
this problem can obviously be solved in linear time with respect to the number of transitions needed to reach the n-th occurrence of the symbol: it suffices
to follow the transitions of the automaton (of the RLA in our case) until the
n-th occurrence of the symbol is recognized. Nevertheless, we can improve this
straightforward solution by taking advantage of the definition of RLA. For instance, if we are searching for an occurrence of a symbol a ∈ Σ in the word u
recognized by the RLA M = (SΣ , Sε , Σ, Ω, δ, γ, s0 , C0 ) and Ω(ρs0 ) contains no
occurrences of symbol a, then we can avoid processing the first |ρs0 | symbols
in u. Similarly, if s0 ∈ Sε and Ω(ρs0 ) contains at least one occurrence of a, but
Ω(ργ(s0 ) ) does not, then we can start searching for an occurrence of a in u from
the position (1 + |ργ(s0 ) |). For every state s ∈ S and every symbol a ∈ Σ, the
length of ρs and the number of occurrences of a in Ω(ρs ) can be computed in
polynomial time with respect to the number of states by exploiting the definition
of ρs . Furthermore, such values can be pre-computed and stored in appropriate data structures for M . On the grounds of the above observations, we can
define an algorithm, called SeekAtOccurrence, which returns the configuration
reached by simulating transitions of M from a given configuration (s, C) until
the n-th occurrence of a symbol in a distinguished set A ⊆ Σ has been read. As
a side effect, SeekAtOccurrence(M, s, C, A, n, counter ) returns in counter [a] the
number of processed occurrences for each symbol a ∈ Σ.
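The skipping idea behind SeekAtOccurrence can be illustrated at the string level: once per-period occurrence counts are available, whole periods of u · v ω are jumped over with a single division instead of being scanned symbol by symbol. The sketch below is ours and works on the granspec representation rather than on an RLA:

```python
def seek_occurrence(prefix, period, A, n):
    """1-based position, in the word prefix . period^omega, of the n-th
    occurrence of a symbol from the set A."""
    in_prefix = sum(c in A for c in prefix)
    if n <= in_prefix:
        seen = 0
        for i, c in enumerate(prefix):
            if c in A:
                seen += 1
                if seen == n:
                    return i + 1
    n -= in_prefix
    per_period = sum(c in A for c in period)
    # jump over whole periods arithmetically
    skipped, rest = divmod(n - 1, per_period)
    pos, seen = len(prefix) + skipped * len(period), 0
    for i, c in enumerate(period):
        if c in A:
            seen += 1
            if seen == rest + 1:
                return pos + i + 1
```

For the Mondays word (f g⁶ |)ω, the third filler is found at position 17 after skipping two whole periods.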
In spite of the simplicity of this idea, SeekAtOccurrence turns out to be rather
complex and the formal analysis of its complexity is even more involved [5]. However, it is not difficult to show that the worst-case time for SeekAtOccurrence(M,
s, C, A, n, counter ) is asymptotically equivalent to a suitable complexity measure,
defined in terms of the nesting structure of the transition functions of M . We
use ‖M‖ to denote such a measure, which is defined, according to the principle of γ-induction, as follows:

‖M‖ = max {C^M_{s0,t} : (s0 , t) ∈ δ ∗ },

where, for each pair of states (s, t) ∈ δ ∗ , C^M_{s,t} is defined to be
• 1, if s = t;
• 1 + C^M_{δ(s),t} , if s ∈ SΣ and s ≠ t;
• max {1 + C^M_{δ(s),t} , C^M_{γ(s),s} }, if s ∈ Sε and s ≠ t.
As for relationships between the complexities of automaton-based and string-based representations, there exist a number of cases that account for the compactness and tractability of RLA with respect to granspecs. As an example, it is not difficult to provide an RLA representing the granularity Month in terms of days and having complexity 520, which is significantly less than the size of any equivalent granspec.
We now give an intuitive account of how to decide whether or not two
given RLA represent the same granularity. Details of the algorithm, which exploits noticeable properties of equivalent representations and extensively uses
SeekAtOccurrence, are given in [5]. The basic ingredients are the following ones.
First, it holds that two RLA M and N represent the same granularity if and
only if ultimately periodic words u and v, recognized respectively by M and N
and having prefix lengths pu and pv and period lengths qu and qv , are G-aligned.
Two ultimately periodic words u and v are said to be G-aligned if and only if
all occurrences of the filler symbol in u and v lie at the same positions and are
interleaved by the same number of occurrences of the separator symbol. Such a
characterization of equivalent representations can be exploited by showing that
a sufficient condition for u and v to be G-aligned is that two prefixes of u and v
(not shorter than max (pu + qu , pv + qv ) + lcm(qu , qv )) are G-aligned. Algorithm SeekAtOccurrence can be used to check the G-alignment property on words recognized by two RLA M and N in time O((‖M‖ + ‖N‖) n), where n bounds the number of occurrences of ■ in the prefixes and periods of u and v.
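At the string level, the G-alignment test amounts to comparing, for each filler, its time point and the number of separators preceding it. A hedged sketch of ours (ASCII stand-ins 'f', 'g', '|' for ■, □, ≀; operating on sufficiently long expanded prefixes):

```python
from math import lcm

def sufficient_length(pu, qu, pv, qv):
    # prefix length that suffices for the G-alignment test, from the text
    return max(pu + qu, pv + qv) + lcm(qu, qv)

def filler_profile(w):
    """For each filler of w: (its time point, separators seen before it).
    Time points are counted over non-separator symbols only."""
    profile, point, seps = [], 0, 0
    for c in w:
        if c == '|':
            seps += 1
        else:
            point += 1
            if c == 'f':
                profile.append((point, seps))
    return profile

def g_aligned(u, v):
    return filler_profile(u) == filler_profile(v)

# gaps placed before vs. after the separator: same granularity
same = g_aligned('f|gggggg' * 3, 'fgggggg|' * 3)
# Mondays labeled 1, 2, 3, ... vs. 1, 8, 15, ...: different granularities
diff = g_aligned('fgggggg|' * 2, ('f|' + 'g|' * 6) * 2)
```

The second comparison fails because the extra separators shift the labels, even though the covered time points coincide.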
Consider now the problem of converting temporal intervals from a given granularity to a coarser or finer one. RLA can be exploited to solve many conversion
problems in polynomial time with respect to the number of states of the involved automata. In particular, we can define two functions mapping intervals
of temporal points to intervals of labels of a given granularity covering the input
interval, and vice versa. It is worth pointing out that such functions are similar
to the conversion operators introduced by Snodgrass et al. [7] and that they
can be computed on RLA by exploiting the algorithm SeekAtOccurrence. As an
example, the following two algorithms solve conversion problems by requiring
only a finite number of calls to SeekAtOccurrence. It is not difficult to show that
such algorithms, as well as many others which compute similar functions, can
be executed in time O( M ), where M is the RLA representing the involved
granularity.
UpConversion(M, t1 , t2 )
1: let M = (SΣ , Sε , Σ, Ω, δ, γ, s0 , C0 )
2: (s, C) ← (s0 , C0 )
3: SeekAtOccurrence(M, s, C, {■, □}, t1 − 1, counter 1 )
4: SeekAtOccurrence(M, s, C, {■}, 1, counter 2 )
5: x1 ← counter 1 [≀] + counter 2 [≀] + 1
6: (s, C) ← (s0 , C0 )
7: SeekAtOccurrence(M, s, C, {■, □}, t2 , counter 3 )
8: (s, C) ← (s0 , C0 )
9: SeekAtOccurrence(M, s, C, {■}, counter 3 [■], counter 4 )
10: x2 ← counter 4 [≀]
11: return (x1 , x2 )
DownConversion(M, x1 , x2 )
1: let M = (SΣ , Sε , Σ, Ω, δ, γ, s0 , C0 )
2: (s, C) ← (s0 , C0 )
3: SeekAtOccurrence(M, s, C, {≀}, x1 − 1, counter 1 )
4: SeekAtOccurrence(M, s, C, {■}, 1, counter 2 )
5: t1 ← counter 1 [■] + counter 2 [■] + counter 1 [□] + counter 2 [□]
6: (s, C) ← (s0 , C0 )
7: SeekAtOccurrence(M, s, C, {≀}, x2 , counter 3 )
8: (s, C) ← (s0 , C0 )
9: SeekAtOccurrence(M, s, C, {■}, counter 3 [■], counter 4 )
10: t2 ← counter 4 [■] + counter 4 [□]
11: return (t1 , t2 )
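The intended input/output behavior of the two algorithms can be mirrored naively at the string level (our sketch, computed on an expanded prefix w that must be long enough to contain the relevant granules, rather than on the RLA itself):

```python
def granule_of(w):
    # map each covered time point to the label of its granule
    m, point, seps = {}, 0, 0
    for c in w:
        if c == '|':
            seps += 1
        else:
            point += 1
            if c == 'f':
                m[point] = seps + 1
    return m

def up_convert(w, t1, t2):
    """Labels (x1, x2): granule of the first covered point >= t1 and
    granule of the last covered point <= t2."""
    m = granule_of(w)
    return (min(g for t, g in m.items() if t >= t1),
            max(g for t, g in m.items() if t <= t2))

def down_convert(w, x1, x2):
    """Time points (t1, t2) spanned by the granules labeled x1 .. x2."""
    m = granule_of(w)
    return (min(t for t, g in m.items() if g >= x1),
            max(t for t, g in m.items() if g <= x2))

bw = 'fffffgg|' * 3     # BusinessWeek over days, expanded prefix
```

For instance, the day interval [6, 10] maps up to the single label 2, and label 2 maps back down to the day interval [8, 12].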
5
Optimality of Automaton-Based Representations
In the previous section we briefly summarized the main features of two basic algorithms working on an RLA M , whose worst-case time complexity linearly depends on ‖M‖. It immediately follows that it is worth minimizing ‖M‖. Furthermore, there exists a widespread recognition of the fact that state minimization is a crucial problem in classical automata theory as well as in the theory of reactive systems. Another goal of practical interest is thus the minimization of the
Fig. 3. Size-optimal and complexity-optimal automata.
number of states of M (let us denote it by |M |), so that smaller representations can be used in place of bigger ones. The former problem is called the complexity-minimization problem, while the latter is called the size-minimization problem. Size and complexity of an RLA are obviously related to one another; however, the corresponding problems are not equivalent at all. In particular, the size-minimization problem seems to be harder than the complexity-minimization problem and it will not be discussed in detail in this paper. Furthermore, optimal automata are not guaranteed to be unique (up to isomorphisms) as it
happens, for instance, for Deterministic Finite Automata. As an example, Figure 3 depicts two size-optimal automata (M and N ) and two complexity-optimal automata (M and O) recognizing the same finite word.
Automata minimization problems can be addressed in many different ways,
e.g., by partitioning the state space or by exploiting noticeable relations between
automata and expressions encoding recognized words. In this paper, we cope with
the minimization problem for RLA by using dynamic programming, that is, by
computing an optimal automaton starting from smaller (optimal) automata in
a bottom-up fashion. The key point of such a solution is the proof that the
problem enjoys an optimal-substructure property. In the following we describe
three operations on RLA, and we prove closure properties for them; then, we
compare the complexity of compound automata with that of their components.
In the next section we will take advantage of these results to give an optimal
substructure property for RLA.
The class of RLA is closed with respect to the operations of concatenation, repetition, and iteration of words. Given two RLA M and N , which respectively recognize a (finite) word u and a (not necessarily finite) word v, let
Concatenate(M, N ), Iterate(M ), and Repeat(M, k) respectively be the concatenation of M and N , which recognizes the word u · v, the iteration of M , which
recognizes the word uω , and the k-repetition of M , which recognizes the word
uk . The resulting automata can be computed as follows:
• the automaton Concatenate(M, N ) can be obtained in the usual way by
linking the final state of M , namely, the state reached at the end of the
computation of M , to the initial state of N by means of a primary transition;
• the automaton Iterate(M ) can be obtained by linking the final state of M
to the initial state of M by means of a primary transition;
• the automaton Repeat(M, k) can be obtained by introducing a new non-labeled state sloop and by adding (i) a primary transition from the final state of M to sloop , (ii) a secondary transition from sloop to the initial state of M , and (iii) a counter on sloop , with initial valuation equal to k.
Moreover, the complexity of these automata can be given in terms of the complexities of the component automata as follows:
• Concatenate(M, N ) has complexity max {‖M‖, n + ‖N‖}, where n is the cardinality of the set of states reachable from the initial state of M by means of primary transitions only;
• Iterate(M ) has complexity ‖M‖;
• Repeat(M, k) has complexity ‖M‖ + 1.
As a matter of fact, Concatenate, Iterate, and Repeat can be given the status of algorithms running in linear time, as can be easily checked.
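The recurrences above suggest a simple bookkeeping scheme: represent each automaton by the pair (complexity, n), where n is the number of states reachable by primary transitions from the initial state. The updates of n for compound automata below are our reading of the three constructions, not stated explicitly in the paper:

```python
def atom():
    # single-state RLA recognizing one symbol
    return (1, 1)

def concatenate(m, n):
    (cm, pm), (cn, pn) = m, n
    return (max(cm, pm + cn), pm + pn)

def iterate(m):
    return m        # complexity and primary-reachable states unchanged

def repeat(m, k):
    cm, pm = m      # one extra non-labeled state s_loop on the path
    return (cm + 1, pm + 1)

# (filler gap^6 sep)^omega, built as iterate(f . repeat(g, 6) . sep)
mondays = iterate(concatenate(atom(),
                              concatenate(repeat(atom(), 6), atom())))
```

Under these assumptions the Mondays automaton has complexity 4, independently of the counter value, which is precisely what makes counters pay off for long repetitions.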
Finally, let Σ be a finite alphabet and let BΣ be the set {Ma : a ∈ Σ},
where Ma is the single-state RLA recognizing a ∈ Σ. We denote by CΣ the
class of all the RLA which can be obtained from BΣ by applying the operations
of Concatenate, Iterate, and Repeat. CΣ is properly included in the class of all
the RLA, that is, there exist some RLA, including size-optimal and complexity-optimal ones (e.g., the automaton M in Figure 3), which cannot be generated
from automata in BΣ by applying the operations of concatenation, iteration, and
k-repetition. Nevertheless, it turns out that, for every RLA M , CΣ always contains at least one RLA which is equivalent to M and has the same complexity.
This property can be used to prove that a complexity-optimal automaton for a
given string can be generated by appropriately composing smaller (complexity-optimal) automata using the operators Concatenate, Iterate, and Repeat. Unfortunately, similar properties do not hold for the size of RLA.
6
Computing Complexity-Optimal Automata
6.1
Sharing-Free Automata
For every RLA M = (SΣ , Sε , Σ, Ω, δ, γ, s0 , C0 ) and for every pair of states r, s such that (r, s) ∈ δ ∗ , let ∆M_{r,s} denote the set {δ i (r) : 0 ≤ i ≤ n}, where n is the least natural number such that s = δ n (r).
Definition 4. Given an RLA M = (SΣ , Sε , Σ, Ω, δ, γ, s0 , C0 ) and a state s ∈ Sε , s is said to be sharing if there is a state t ∉ ∆M_{γ(s),s} \ {s0 } such that the set ∆M_{t,s} ∩ ∆M_{γ(s),s} contains states other than s itself. M is sharing-free if Sε does not contain sharing states.
As a matter of fact, any automaton in CΣ is sharing-free. The following lemma shows that sharing states can be eliminated by replicating states, without increasing complexity.
Lemma 1. For every RLA M , there exists an equivalent sharing-free RLA, denoted SharingFree(M ), such that ‖M‖ = ‖SharingFree(M )‖.
6.2
An Optimal Substructure Property
In this section, we prove that RLA satisfy the optimal substructure property.
Lemma 1 implies that for any (finite or ultimately periodic) word u ∈ Σ ∞ , there
Fig. 4. Relationship between partial periods and borders.
is at least one sharing-free complexity-optimal automaton M which recognizes
u. In fact, we are going to show that we can choose M in such a way that it
belongs to CΣ and it is decomposable into complexity-optimal automata.
As a preliminary step, we characterize repeating patterns of words through
the notions of period, partial period, and border.
Definition 5. A word u has period p if there is a positive integer k such that
u = u[1, p]k . The period of u is the minimum period of u. By analogy, we define the prefix length and the period of an ultimately periodic word u to be the integers l and q such that u = u[1, l] · u[l + 1, l + q]ω and l + q is minimum. Furthermore, p is said to be a partial period of u provided u is a prefix of u[1, p]ω . Finally, a border of a finite word u is a non-empty string v different from u such that v is both a prefix and a suffix of u.
It is worth noticing that u[1, q] is a (maximum) border of u if and only if p =
|u| − q is a (minimum) partial period of u (see Figure 4).
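Assuming words are represented as plain strings, the notions of Definition 5 and the border/partial-period duality stated above can be sketched as follows (the function names are illustrative, not taken from the paper; strings are 0-indexed, as usual in Python):

```python
def has_period(u, p):
    """u has period p if u = u[1,p]^k for some positive integer k."""
    return p >= 1 and len(u) % p == 0 and u[:p] * (len(u) // p) == u

def is_partial_period(u, p):
    """p is a partial period of u if u is a prefix of u[1,p]^ω."""
    return p >= 1 and all(u[i] == u[i % p] for i in range(len(u)))

def is_border(u, v):
    """A border of u is a non-empty string, different from u,
    that is both a prefix and a suffix of u."""
    return 0 < len(v) < len(u) and u.startswith(v) and u.endswith(v)

def min_partial_period(u):
    """Minimum partial period, computed naively; it always equals
    |u| minus the length of the maximum border of u."""
    return min(p for p in range(1, len(u) + 1) if is_partial_period(u, p))
```

For instance, u = "ababa" has maximum border "aba" (of length q = 3) and minimum partial period p = 5 − 3 = 2, matching the duality above.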
The following two theorems state optimal substructure properties for finite
and ultimately periodic words, respectively. Notice that both theorems provide
only a finite number of ways to build a complexity-optimal automaton for u from
(optimal) automata for substrings of u.
Theorem 1. Given a finite word u such that |u| > 1, one of the following
conditions holds:
i) for every pair of complexity-optimal automata M and N recognizing respectively the prefix u[1] and the suffix u[2, |u|] of u, Concatenate(M, N) is a
complexity-optimal automaton recognizing u;
ii) there exists an integer r ∈ [1, |u| − 1] such that whenever M and N are
two complexity-optimal automata recognizing respectively the prefix u[1, p]
(with p being the period of u[1, r]) and the suffix u[r + 1, |u|] of u, then
Concatenate(Repeat(M, r/p), N) is a complexity-optimal automaton recognizing u;
iii) for every complexity-optimal automaton M recognizing u[1, p], with p < |u|
being the period of u, Repeat(M, |u|/p) is a complexity-optimal automaton
recognizing u.
Theorem 2. Given an ultimately periodic word u with minimum prefix length
l and minimum period q, one of the following conditions holds:
i) l > 0 and for every pair of complexity-optimal automata M and N recognizing respectively the prefix u[1] and the suffix u[2, ω] of u, Concatenate(M, N)
is a complexity-optimal automaton recognizing u;
ii) l > 0 and there is an integer r ∈ [1, 2l + 2q] such that whenever M and
N are two complexity-optimal automata recognizing respectively the prefix
u[1, p] (with p being the period of u[1, r]) and the suffix u[r + 1, ω[ of u, then
Concatenate(Repeat(M, r/p), N) is a complexity-optimal automaton recognizing u;
iii) l = 0 and for every complexity-optimal automaton M recognizing u[1, q],
Iterate(M) is a complexity-optimal automaton recognizing u.
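To illustrate the case analysis for finite words, here is a toy dynamic program shaped after the three cases of Theorem 1. The cost model, which simply counts operators in the resulting expression, is a placeholder for the actual RLA complexity measure and does not reproduce it; all names are illustrative:

```python
from functools import lru_cache

def exact_period(w):
    """Minimum p such that w is an exact repetition of w[:p]."""
    n = len(w)
    for p in range(1, n + 1):
        if n % p == 0 and w[:p] * (n // p) == w:
            return p

def optimal_expression(u):
    """Dynamic program shaped after the three cases of Theorem 1,
    with expression size as a stand-in cost. Returns (cost, expr)."""
    @lru_cache(maxsize=None)
    def best(w):
        if len(w) == 1:
            return (1, w)  # base case: single-letter automaton
        candidates = []
        # case i): split off the first letter and concatenate
        c1, e1 = best(w[0])
        c2, e2 = best(w[1:])
        candidates.append((c1 + c2 + 1, f"Concat({e1},{e2})"))
        # cases ii) and iii): repeat a periodic prefix w[:r]
        for r in range(2, len(w) + 1):
            p = exact_period(w[:r])
            if p < r:  # w[:r] = (w[:p])^(r/p), a genuine repetition
                c, e = best(w[:p])
                c, e = c + 1, f"Repeat({e},{r // p})"
                if r < len(w):  # case ii): concatenate with the rest
                    c2, e2 = best(w[r:])
                    c, e = c + c2 + 1, f"Concat({e},{e2})"
                candidates.append((c, e))
        return min(candidates)
    return best(u)
```

For example, `optimal_expression("abab")` yields `(4, "Repeat(Concat(a,b),2)")`, picking the repetition of the periodic prefix over a plain left-to-right concatenation.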
Theorems 1 and 2 suggest a simple dynamic programming algorithm which,
given a finite string u or a string-based representation of an ultimately periodic
word u, computes in polynomial time a complexity-optimal RLA recognizing
u. This algorithm heavily relies on information about the periods of all the
substrings of u. For any finite string v (or any finite prefix v of a given ultimately
periodic word), the periods of all the substrings of v can be efficiently computed
in time Θ(|v|^2) by exploiting notable properties of periods and borders (the
approach is somewhat similar to the one used by Knuth, Morris, and Pratt
to compute the prefix function of a pattern in the context of string-matching
problems [8]). In particular, it turns out that the length q(j) of the maximum
border of v[1, j] satisfies the equations q(1) = 0 and, for every j > 1,
q(j) = max({0} ∪ {l : v[l] = v[j] ∧ l − 1 ∈ q^+(j − 1)}), where q^+ denotes the
transitive closure of the function q. Since each maximum border corresponds to
a minimum partial period, it turns out that the minimum partial periods of all
the prefixes of v can be computed in linear time. The above-mentioned bound
easily follows.
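A linear-time computation of the maximum-border lengths q(j) of all prefixes, in the style of the Knuth-Morris-Pratt prefix function mentioned above, can be sketched as follows (0-indexed Python strings; q[j] refers to the prefix of length j, and the function names are illustrative):

```python
def max_border_lengths(v):
    """q[j] = length of the maximum border of the prefix v[:j],
    computed in linear time (KMP-style prefix function)."""
    n = len(v)
    q = [0] * (n + 1)  # q[0] and q[1] are 0 by definition
    k = 0              # length of the border currently being extended
    for j in range(2, n + 1):
        # fall back to shorter borders until one extends with v[j-1]
        while k > 0 and v[k] != v[j - 1]:
            k = q[k]
        if v[k] == v[j - 1]:
            k += 1
        q[j] = k
    return q

def min_partial_periods(v):
    """The minimum partial period of v[:j] is j - q[j]."""
    q = max_border_lengths(v)
    return [j - q[j] for j in range(1, len(v) + 1)]
```

The minimum partial period of each prefix is then read off in constant time; running this once per suffix of v gives the Θ(|v|^2) bound for all substrings.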
7 Further Work
In this paper we gave a polynomial time algorithm that determines a complexity-optimal RLA representation. We believe that such an algorithm can actually
be improved by exploiting subtle relationships between repeating patterns of
strings and secondary transition functions of complexity-optimal RLA. As a
matter of fact, we conjecture that loops of primary and secondary transition
functions of a complexity-optimal RLA can be related to maximal repetitions in
the recognized word (a maximal repetition of u is a periodic substring u[i, j]
whose minimum period increases as soon as u[i, j] is extended to the right, e.g.,
u[i, j + 1], or to the left, e.g., u[i − 1, j]).
Another interesting research direction is the development of an algorithm
that efficiently solves the size-minimization problem. To this end, we conjecture
that size-optimal automata can be built up from smaller components, as we did
for complexity-optimal ones, via concatenation, repetition, iteration, and a new
operator which collapses “non-distinguishable” states of RLA (at the moment,
the major stumbling block is finding an appropriate definition of
distinguishability for RLA states).
References
1. C. Bettini, S. Jajodia, and X.S. Wang. Time Granularities in Databases, Data
Mining, and Temporal Reasoning. Springer, July 2000.
2. H. Calbrix, M. Nivat, and A. Podelski. Ultimately periodic words of rational
ω-languages. In Proceedings of the 9th International Conference on Mathematical
Foundations of Programming Semantics, volume 802 of Lecture Notes in Computer
Science, pages 554–566. Springer, 1994.
3. C. Combi, M. Franceschet, and A. Peron. A logical approach to represent and
reason about calendars. In Proceedings of the 9th International Symposium on
Temporal Representation and Reasoning, pages 134–140. IEEE Computer Society
Press, 2002.
4. U. Dal Lago and A. Montanari. Calendars, time granularities, and automata. In
Proceedings of the 7th International Symposium on Spatial and Temporal Databases
(SSTD), volume 2121 of Lecture Notes in Computer Science, pages 279–298.
Springer, July 2001.
5. U. Dal Lago, A. Montanari, and G. Puppis. Time granularities, calendar algebra,
and automata. Technical Report 4, Dipartimento di Matematica e Informatica,
Università degli Studi di Udine, Italy, February 2003.
6. U. Dal Lago, A. Montanari, and G. Puppis. Towards compact and tractable
automaton-based representations of time granularities. Technical Report 17, Dipartimento di Matematica e Informatica, Università degli Studi di Udine, Italy,
July 2003.
7. C.E. Dyreson, W.S. Evans, H. Lin, and R.T. Snodgrass. Efficiently supporting
temporal granularities. IEEE Transactions on Knowledge and Data Engineering,
12(4):568–587, July/August 2000.
8. D.E. Knuth, J.H. Morris, and V.R. Pratt. Fast pattern matching in strings. SIAM
Journal on Computing, 6:323–350, 1977.
9. P. Ning, S. Jajodia, and X.S. Wang. An algebraic representation of calendars.
Annals of Mathematics and Artificial Intelligence, 36:5–38, 2002.
10. W. Thomas. Languages, automata, and logic. In G. Rozenberg and A. Salomaa,
editors, Handbook of Formal Languages, volume 3, pages 389–455. Springer, 1997.
11. J. Wijsen. A string-based model for infinite granularities. In C. Bettini and A. Montanari, editors, Proceedings of the AAAI Workshop on Spatial and Temporal Granularities, pages 9–16. AAAI Press, 2000.