23 Finite-state Methods Featuring Semantics

Tim Fernando

23.1 Introduction
“It may turn out to be very useful for semantic representations too.” So
concludes the abstract of Lauri Karttunen’s COLING 84 paper, Features and values (F&V), referring to “the new Texas version of the
‘DG (directed graph)’ package” which was “primarily intended for representing morphological and syntactic information” (page 28). That the
directed graph was essentially a finite automaton may have been too
obvious an observation for F&V to state — or, assuming it had already
been stated, restate. Be that as it may, this observation is used in Fernando 2016 to extract typed feature structures from Robin Cooper’s
record type approach to frames (Fillmore 1982, Cooper 2012). I restate
the observation here to develop its uses for semantic representations
further, egged on by the aforementioned statement from F&V, and (as
with Kornai 2017) the prospects of bringing together “two lines of research that lie at the opposite ends on the field” (Karttunen 2007).
Just how to view feature structures as finite automata is detailed in
the next section (section 2); why this view might pay off is explored in
section 3.
As bottom-dwellers in the Chomsky hierarchy, finite automata have
well-known limitations to test the maxim keep it simple. Finite-state
methods are structured below around semantic notions |= of satisfaction between models and sentences, kept simple through Leibniz’s Law,
Identity of Indiscernibles. A logical formalism well-known from Hennessy & Milner 1985 (among other papers) is applied to feature structures in section 2 broadly along the lines of Blackburn 1993, but with
particular attention to certain sets of strings to which directed graphs
can be reduced, called trace sets. A set Σ of attributes is paired with a
trace set T ⊆ Σ∗ for a signature (Σ, T ), picking out the set Mod (Σ, T )
of trace sets L sandwiched between T and Σ∗
T ⊆ L ⊆ Σ∗ .
A trace set L ∈ Mod (Σ, T ) is a (Σ, T )-model , as the notation Mod (Σ, T )
suggests, against which to evaluate a Σ-sentence. A (Σ, T )-model L can
be construed as a record of record type (Σ, T ) with an s-component
Ls , for every string s ∈ T , that is a trace set satisfying a Σ-sentence ϕ
precisely if L satisfies the Σ-sentence ⟨s⟩ϕ

L |= ⟨s⟩ϕ ⇐⇒ Ls |= ϕ.
To analyze satisfaction |=, it suffices to keep the set Σ in a signature
(Σ, T ) finite, and integrate different signatures within a category (following the so-called Grothendieck construction). Behind the somewhat
technical details below is the intuition that signatures are bounded
granularities that simplify calculations of satisfaction |=. That simplification, called the Translation Axiom in Barwise 1974 (page 235) and
the Satisfaction Condition in Goguen & Burstall 1992 (page 102), applies to unification in F&V with negative and disjunctive constraints
that refine the sets Mod (Σ, T ).
Supposing a typed feature structure can be viewed as a finite automaton (which section 2 takes pains to show), so what? To make the
view compelling, we turn in section 3 to runs of finite automata with the
eventual goal of understanding these runs as uses of linguistic resources
encoded by typed feature structures. An approach to temporality in
which time arises from running automata is presented paralleling section 2, with Monadic Second Order logic in place of Hennessy-Milner
logic, and superposition in place of unification (for building models
bottom-up, subject to constraints). Careful attention is paid to shifts
in bounded granularity and to the assorted forces that take shape as
granularity is refined. This is in contrast to the practice of fixing some
space of possible worlds once and for all, without any provisions for
varying granularity.
23.2 From Features to Strings and Types
Some notions taken up in F&V are collected in the first column of Table
1, which we analyze in this section according to the second column.
path ⟨a1 · · · an⟩ of attributes        string a1 · · · an
(rooted) directed graph G              set L(G) of strings
generalize(G, G′)                      L(G) ∩ L(G′)
constraints C                          set ΦC of sentences with ¬, ∨
unifyC(G, G′)                          L(G) ∪ L(G′) if it satisfies ΦC

Table 1
We take for granted in Table 1 a set Σ of attributes (a, ai , . . .), and
define a Σ-deterministic system to be a partial function δ : Q × Σ ⇁ Q
to some set Q of nodes from the set Q × Σ of node-attribute pairs.
We picture a triple (q, a, δ(q, a)) in δ as a deterministic transition q −a→ δ(q, a), and formulate a (rooted) directed graph G as a pair (δ, q) of
a Σ-deterministic system δ and a node q ∈ Q. To define the language
L(G) ⊆ Σ∗ , let δq : Σ∗ ⇁ Q be the ⊆-smallest subset F of Σ∗ × Q such
that
(i) (ǫ, q) ∈ F , and
(ii) (sa, q′′) ∈ F whenever (s, q′) ∈ F and (q′, a, q′′) ∈ δ.
Now, if the directed graph G is the pair (δ, q), then its language L(G)
is the set dom(δq ) of strings s for which δq (s) is defined. The language
dom(δq ) is called the trace set of (δ, q), and strings in dom(δq ) are
called traces of (δ, q).
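Concretely (a minimal Python sketch, with the graph, node names and attributes invented for illustration), a Σ-deterministic system can be coded as a finite map from node–attribute pairs to nodes; δq is evaluated by following attributes, and dom(δq) is enumerated up to a length bound, which matters only if the graph has cycles.

    # A Sigma-deterministic system as a dict from (node, attribute) to node.
    # Hypothetical toy graph for a small feature structure.
    delta = {
        ("q0", "subj"): "q1",
        ("q0", "agr"): "q2",
        ("q1", "agr"): "q2",      # re-entrancy: subj's agr is the same node q2
        ("q2", "num"): "q3",
        ("q2", "per"): "q4",
    }

    def run(delta, q, s):
        """delta_q(s): follow the attributes of s from node q; None if undefined."""
        for a in s:
            if (q, a) not in delta:
                return None
            q = delta[(q, a)]
        return q

    def traces(delta, q, max_len):
        """dom(delta_q) restricted to strings of length <= max_len."""
        out, frontier = set(), [((), q)]
        for _ in range(max_len + 1):
            next_frontier = []
            for s, node in frontier:
                out.add(s)
                for (p, a), r in delta.items():
                    if p == node:
                        next_frontier.append((s + (a,), r))
            frontier = next_frontier
        return out

    print(run(delta, "q0", ("subj", "agr", "num")))     # q3
    print(sorted(traces(delta, "q0", 3)))               # (), (agr,), (subj, agr, num), ...

generalize(G, G′) and unifyC(G, G′) from Table 1 then amount to intersecting and (subject to ΦC) uniting such trace sets.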
Next, with Σ fixed, we express constraints through the set sen(Σ)
of sentences ϕ generated from attributes a ∈ Σ by the grammar
ϕ ::= ⊤ | ⟨a⟩ϕ | ¬ϕ | ϕ ∨ ϕ′
interpreted against a Σ-deterministic system δ : Q×Σ ⇁ Q by a binary
relation |=δ ⊆ Q × sen(Σ) that treats ⊤ as a tautology
q |=δ ⊤
for every q ∈ Q,
⟨a⟩ as the Diamond modal operator with accessibility relation δ(·, a)

q |=δ ⟨a⟩ϕ ⇐⇒ (q, a) ∈ dom(δ) and δ(q, a) |=δ ϕ,

¬ as Boolean negation

q |=δ ¬ϕ ⇐⇒ not q |=δ ϕ

and ∨ as Boolean disjunction

q |=δ ϕ ∨ ϕ′ ⇐⇒ q |=δ ϕ or q |=δ ϕ′
(Hennessy & Milner 1985, Blackburn 1993). Collecting the sentences in
sen(Σ) that q |=δ -satisfies in
sen Σ (δ, q) := {ϕ ∈ sen(Σ) | q |=δ ϕ},
it turns out that directed graphs satisfy the same subset of sen(Σ)
precisely if they have the same trace set1

senΣ(δ, q) = senΣ(δ′, q′) ⇐⇒ dom(δq) = dom(δ′q′)     (23.1)

(Hennessy & Milner 1985). Under Leibniz’s Identity of Indiscernibles,
with discernibility based on sen(Σ), (23.1) reduces a directed graph
(δ, q) to its trace set dom(δq). The trace set captures a fragment of
sen(Σ)

dom(δq) = {s ∈ Σ∗ | ⟨s⟩⊤ ∈ senΣ(δ, q)}
consisting of sentences of the form ⟨s⟩⊤, where for every ϕ ∈ sen(Σ),
the sentence ⟨s⟩ϕ in sen(Σ) is defined by induction on s ∈ Σ∗, starting
with the null string ǫ,

⟨ǫ⟩ϕ := ϕ

and using modal operators ⟨a⟩ elsewhere

⟨as⟩ϕ := ⟨a⟩⟨s⟩ϕ

so that ⟨a1 · · · an⟩ϕ is ⟨a1⟩ · · · ⟨an⟩ϕ and

q |=δ ⟨s⟩ϕ ⇐⇒ s ∈ dom(δq) and δq(s) |=δ ϕ.     (23.2)
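A satisfaction checker for this language is a few lines of Python (a sketch; the sentence encoding as nested tuples and the toy δ are invented here), with ⟨s⟩ϕ built by the induction just given, so the final two lines instantiate (23.2).

    # Hennessy-Milner satisfaction q |=_delta phi, with sentences coded as
    # "top", ("dia", a, phi), ("not", phi), ("or", phi1, phi2).
    def sat(delta, q, phi):
        if phi == "top":
            return True
        op = phi[0]
        if op == "dia":                              # <a>phi
            return (q, phi[1]) in delta and sat(delta, delta[(q, phi[1])], phi[2])
        if op == "not":
            return not sat(delta, q, phi[1])
        if op == "or":
            return sat(delta, q, phi[1]) or sat(delta, q, phi[2])
        raise ValueError(phi)

    def dia_path(s, phi):
        """<s>phi := <a1>...<an>phi for s = a1...an."""
        for a in reversed(s):
            phi = ("dia", a, phi)
        return phi

    delta = {("q0", "agr"): "q1", ("q1", "num"): "q2"}
    print(sat(delta, "q0", dia_path(("agr", "num"), "top")))   # True:  (agr, num) is a trace
    print(sat(delta, "q0", dia_path(("num",), "top")))         # False: (num,) is not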
In the remainder of this section, we replace q and δ in (23.2) by a
prefix-closed language over Σ with derivatives (in §2.1), and expand Σ
to flesh out Table 1 (in §2.2), systematized category-theoretically (in
§2.3) to link up with section 3.
23.2.1 Languages and Transitions to Derivatives
Given a set L ⊆ Σ∗ of strings over Σ and a string s over Σ, the s-derivative of L is the set
Ls := {s′ | ss′ ∈ L}
of strings that, put after s, belong to L (Brzozowski 1964). For any Σ-deterministic system δ : Q × Σ ⇁ Q and node q, one can check that
dom(δq ) is the set of strings s such that the null string ǫ is in the
s-derivative of dom(δq )
s ∈ dom(δq ) ⇐⇒ ǫ ∈ (dom(δq ))s
and that for every s ∈ dom(δq ), the s-derivative of dom(δq ) is the trace
set of (δ, δq (s))
(dom(δq ))s = dom(δq′ ) where q ′ = δq (s).
1 Readers familiar with, for example, Barwise & Moss 1996 will note that determinism simplifies matters considerably, reducing bisimulation equivalence between
(δ, q) and (δ′, q′) to trace equivalence dom(δq) = dom(δ′q′), and allowing us to talk
of sets of strings instead of non-well-founded sets.
Indeed, the chain of equivalences
a1 a2 · · · an ∈ L ⇐⇒ a2 · · · an ∈ La1
⇐⇒ · · · ⇐⇒ ǫ ∈ La1 ···an
from a1 · · · an to the null string ǫ means that L is accepted by the
deterministic automaton with
- s-derivatives Ls as states
- initial state L = Lǫ
- a-transitions from Ls to Lsa (for every symbol a ∈ Σ)
- final (accepting) states Ls such that ǫ ∈ Ls.
The s-derivative of L equals the s′ -derivative of L precisely if s and s′
concatenate with the same strings to produce strings in L
Ls = Ls′ ⇐⇒ (∀w ∈ Σ∗ ) (sw ∈ L ⇐⇒ s′ w ∈ L)
so that the Myhill-Nerode Theorem says that for finite Σ,

L is regular ⇐⇒ {Ls | s ∈ Σ∗} is finite
(e.g. Hopcroft & Ullman 1979). Note that Ls is non-empty precisely if
s is the prefix of some string in L. Moreover, if Ls is empty then so
is Lsa for every a ∈ Σ. That is, ∅ is a sink state that we may safely
exclude from the states of the automaton above, at the cost of making
the transition function partial.
Let us call a language L prefix-closed if for all sa ∈ L, s ∈ L. Note
that trace sets are prefix-closed and non-empty. Let Mod (Σ) denote the
set
Mod(Σ) := {L ⊆ Σ∗ | L ≠ ∅ and L is prefix-closed}
of non-empty prefix-closed subsets of Σ∗ , and let us refer to an element
of Mod (Σ) as a Σ-state. Not only are trace sets Σ-states, but conversely,
if δ̂ is the Σ-deterministic system
{(L, a, La ) | L ∈ Mod (Σ) and a ∈ Σ ∩ L}
then every Σ-state L is the trace set of (δ̂, L). Keeping δ̂ implicit, a Σ-state L makes an s-transition to its s-derivative Ls precisely if s ∈ L,
specializing the biconditional (23.2) above to

L |= ⟨s⟩ϕ ⇐⇒ s ∈ L and Ls |= ϕ.
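Derivatives of a finite language can be computed directly; the sketch below (strings as tuples, with an invented example language) checks prefix-closure, takes an s-derivative, and counts the distinct derivatives, which are the states of the automaton described above.

    # Brzozowski derivatives of a finite language L of tuples.
    def derivative(L, s):
        """L_s = { w : s + w in L }."""
        return {w[len(s):] for w in L if w[:len(s)] == s}

    def is_sigma_state(L):
        """Non-empty and prefix-closed."""
        return bool(L) and all(w[:i] in L for w in L for i in range(len(w)))

    L = {(), ("agr",), ("agr", "num"),
         ("subj",), ("subj", "agr"), ("subj", "agr", "num")}

    print(is_sigma_state(L))                       # True
    print(derivative(L, ("subj",)))                # {(), ('agr',), ('agr', 'num')}

    # Distinct derivatives L_s (s in L) = states reachable from L via a-transitions L_s -> L_sa.
    states = {frozenset(derivative(L, s)) for s in L}
    print(len(states))                             # 4 distinct derivatives here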
23.2.2 Adding Attributes, Types and Constraints
Identity as indiscernibility relative to sen(Σ) presupposes that all differences which matter are captured by the set Σ. An obvious problem
is that the single trace set {ǫ} cannot differentiate between atomic values. But it is easy enough to introduce for every atomic value v, a fresh
attribute av to Σ for, say, the trace set {av, ǫ}. At least two objections
can be made to this move. The first is that a trace set of {ǫ} is arguably what it means for a value v to be atomic; any larger trace set
would make v non-atomic. If “atomic” is understood this way, identity
as indiscernibility leaves us no choice but to differentiate between values by making all but perhaps one of them non-atomic. A more serious
objection is that if the alphabet Σ is to be finite, then we cannot introduce fresh attributes to Σ indefinitely. Or can we? Given any set A,
no matter how large, we can form its set Fin(A) of finite subsets
Fin(A) := {Σ ⊆ A | Σ is finite}
and let Σ vary over members of Fin(A); each attribute a ∈ A−Σ added
to Σ leads to the different member Σ ∪ {a} of Fin(A). The challenge
then becomes to implement the variations in Σ systematically. This is
where signatures and institutions enter.
But first, it will prove convenient to expand sen(Σ) with a modal
operator ✸ for a sentence ✸ϕ equivalent to the disjunction over all
s ∈ Σ∗ of the sentences ⟨s⟩ϕ. More precisely,
q |=δ ✸ϕ ⇐⇒ (∃s ∈ Σ∗) q |=δ ⟨s⟩ϕ     (23.3)
for any Σ-deterministic system δ : Q × Σ ⇁ Q and node q ∈ Q. Incorporating ✸ into sen(Σ) and senΣ(δ, q) for sen✸(Σ) and sen✸Σ(δ, q)
respectively, it is not difficult to verify that trace equivalence remains
indiscernibility up to sen✸(Σ)

sen✸Σ(δ, q) = sen✸Σ(δ′, q′) ⇐⇒ dom(δq) = dom(δ′q′).
Thus, we can again reduce (δ, q) to its trace set dom(δq) and |=δ to a
binary relation |=Σ ⊆ Mod(Σ) × sen✸(Σ) between a Σ-state L and a
sentence ϕ ∈ sen✸(Σ), simplifying (23.3) to

L |=Σ ✸ϕ ⇐⇒ (∃s ∈ Σ∗) L |=Σ ⟨s⟩ϕ
        ⇐⇒ (∃s ∈ L) Ls |=Σ ϕ
(adding the subscript Σ to prepare for the aforementioned variations).
As usual, we let ✷ϕ abbreviate ¬✸¬ϕ for
L |=Σ ✷ϕ ⇐⇒ (∀s ∈ L) Ls |=Σ ϕ
alongside the Boolean conventions ϕ ⊃ ψ for ψ ∨ ¬ϕ, and ϕ ∧ ψ for
¬(¬ϕ ∨ ¬ψ). Given a subset Φ of sen ✸ (Σ), we say a Σ-state L is a
Σ-model of Φ, and write L |=Σ Φ, if it satisfies every sentence in Φ
L |=Σ Φ ⇐⇒ (∀ϕ ∈ Φ) L |=Σ ϕ.
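For finite Σ-states, |=Σ is computable by recursion on sentences and derivatives; in the sketch below (✸ written poss, the Σ-state L invented), ✷ is derived as ¬✸¬.

    # L |=_Sigma phi for a finite Sigma-state L (non-empty, prefix-closed set of tuples).
    # Sentences: "top", ("dia", a, phi), ("not", phi), ("or", p, q), ("poss", phi).
    def deriv(L, s):
        return {w[len(s):] for w in L if w[:len(s)] == s}

    def sat(L, phi):
        if phi == "top":
            return True
        op = phi[0]
        if op == "dia":                                      # <a>phi
            return (phi[1],) in L and sat(deriv(L, (phi[1],)), phi[2])
        if op == "not":
            return not sat(L, phi[1])
        if op == "or":
            return sat(L, phi[1]) or sat(L, phi[2])
        if op == "poss":                                     # diamond: some s in L with L_s |= phi
            return any(sat(deriv(L, s), phi[1]) for s in L)
        raise ValueError(phi)

    def nec(phi):                                            # box phi := not poss not phi
        return ("not", ("poss", ("not", phi)))

    L = {(), ("agr",), ("agr", "num")}
    print(sat(L, ("poss", ("dia", "num", "top"))))           # True:  num is reachable somewhere
    print(sat(L, nec(("dia", "num", "top"))))                # False: not at every s in L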
Now, to pick out a particular Σ-state through a sentence ϕ, let Uniq Σ (ϕ)
be the set
Uniq Σ (ϕ) := {✸(ϕ ∧ ψ) ⊃ ✷(ϕ ⊃ ψ) | ψ ∈ sen(Σ)}
of implications ✸(ϕ ∧ ψ) ⊃ ✷(ϕ ⊃ ψ) ensuring that if ψ should ever
occur with ϕ, it always occurs with ϕ. Since trace equivalence is indiscernibility with respect to sen(Σ), it follows that the sentences ψ
appearing in UniqΣ(ϕ) can be restricted to those of the form ⟨s⟩⊤ for
s ∈ Σ∗ without changing the Σ-models of Uniq Σ (ϕ), and that
(†) for any Σ-model L of Uniq Σ (ϕ), and s, s′ ∈ L,
if Ls |=Σ ϕ and Ls′ |=Σ ϕ then Ls = Ls′ .
If we are to introduce an attribute av to name a particular value v
through the sentence ⟨av⟩⊤, then we must restrict our (Σ ∪ {av})-states
to (Σ ∪ {av})-models of UniqΣ∪{av}(⟨av⟩⊤). An attribute a might also
be introduced to name a type that applies to more than one (Σ ∪ {a})-state, implicating a (Σ ∪ {a})-state that fails to satisfy some sentence in
UniqΣ∪{a}(⟨a⟩⊤). There is a curious twist here on treatments of “identity and mere likeness” (F&V, page 29) and re-entrancy (connected
with a feature path s that appears in the present set-up as a subscript
in Ls and inside a modal operator in ⟨s⟩ϕ). Σ-states L and L′ can be
distinct only if some sentence in sen(Σ) differentiates them (shifting, as
it were, the burden of proof from identification to differentiation, and
suggesting refinements of identity through expansions of Σ).
Additional attributes may serve purposes other than refining discernibility. For example, they may provide representations of sentences
in sen ✸ (Σ) as follows. Given a subset Φ of sen ✸ (Σ), a sentence ϕ ∈
sen ✸ (Σ), and a string s ∈ Σ∗ , let us agree that s (Σ, Φ)-represents ϕ if
every Σ-model of Φ satisfies
✷(ϕ ≡ ⟨s⟩⊤)
where ϕ ≡ ψ is (ϕ ⊃ ψ) ∧ (ψ ⊃ ϕ), and consequently, for any Σ-state
L,
L |=Σ ✷(ϕ ≡ ψ) ⇐⇒ (∀s ∈ L)(Ls |=Σ ϕ ⇐⇒ Ls |=Σ ψ).
Because we can build ϕ with the connectives ¬ and ∨, we cannot expect there to be a string that (Σ, ∅)-represents ϕ. But we can always
introduce an attribute aϕ ∉ Σ and set Φ to {✷(ϕ ≡ ⟨aϕ⟩⊤)} so that
aϕ (Σ ∪ {aϕ }, Φ)-represents ϕ. And we can put together attributes aϕ
and aψ that (Σ, Φ)-represent ϕ and ψ respectively, as Σ-models of Φ
satisfy
✷((ϕ ∧ ψ) ≡ (⟨aϕ⟩⊤ ∧ ⟨aψ⟩⊤)).
We can then avoid the addition of aϕ∧ψ , provided we generalize our
notion of representation to a language L̂ ⊆ Σ∗ as follows. We say L̂
(Σ, Φ)-represents ϕ if for every Σ-model L of Φ and s ∈ L,
Ls |=Σ ϕ ⇐⇒ (∀s′ ∈ L̂) Ls |=Σ ⟨s′⟩⊤
        ⇐⇒ L̂ ⊆ Ls     (23.4)
Clearly, a string s (Σ, Φ)-represents ϕ iff the singleton language {s}
(Σ, Φ)-represents ϕ. But why should we care about representing sentences by languages?
Table 1 at the beginning of the present section mentions not only
directed graphs G and G′ but also constraints C. Directed graphs are
formulated here as Σ-states (models), and constraints as subsets of
sen ✸ (Σ). A Σ-state can be viewed as a token, and a sentence ϕ in
sen ✸ (Σ) as the type
Mod Σ (ϕ) := {L ∈ Mod (Σ) | L |=Σ ϕ}
of Σ-states satisfying ϕ. A set Φ ⊆ sen✸(Σ) of sentences amounts to
the conjunction ⋀Φ specifying the type

ModΣ(Φ) := ⋂ϕ∈Φ ModΣ(ϕ)
of Σ-states satisfying every sentence in Φ. Inclusion ⊆ between sets
of strings over Σ in (23.4) is easily confused with that between sets
Mod Σ (ϕ) and Mod Σ (ψ) of such sets
Mod Σ (ϕ) ⊆ Mod Σ (ψ) ⇐⇒ (∀L ∈ Mod (Σ)) L |=Σ ϕ ⊃ ψ
signifying an entailment from ϕ to ψ (and reversing the direction in
(23.4) from the less informative L̂ to the more informative Ls ). Converting a sentence ϕ to a Σ-state that (Σ, Φ)-represents it requires a
set Φ of constraints that we can find in, if necessary, an expansion of Σ.
Resorting to Φ as {✷(ϕ ≡ ⟨aϕ⟩⊤)} with aϕ thrown into Σ is perhaps
too easy, shoving all the work over to Φ. But there is surely a role for
Φ, since L can only (Σ, ∅)-represent a sentence with the same Σ-models
as {⟨s⟩⊤ | s ∈ L}, leaving out many sentences formed with negation ¬
and disjunction ∨. The models of a sentence ϕ that a language (Σ, ∅)-represents are closed under inclusion ⊆
(∀L ∈ Mod Σ (ϕ))(∀L′ ∈ Mod (Σ)) L ⊆ L′ implies L′ |=Σ ϕ
and intersection ∩
(∀L ∈ Mod Σ (ϕ))(∀L′ ∈ Mod Σ (ϕ)) L ∩ L′ |=Σ ϕ.
But closure under inclusion fails for the negation ¬⟨a⟩⊤, and closure under intersection fails for the disjunction ⟨a⟩⊤ ∨ ⟨a′⟩⊤ (with two
different ⊆-minimal models, for a ≠ a′).
As a binary operation on directed graphs, unification in F&V is
defined on Σ-states, and, pace Blackburn 1993, not on sentences (in
terms of the connective ∧). The constraints determining when two directed graphs are unifiable do, however, bring in sen✸(Σ), as does
talk of negative and disjunctive features inasmuch as these involve
the sen ✸ (Σ)-connectives ¬ and ∨. Evidently, a mix of Σ-states and
Σ-sentences is required. Accordingly, let us pair Σ with a language
T ⊆ Σ∗ , revising Mod (Σ) to
Mod (Σ, T ) := {L ∈ Mod (Σ) | T ⊆ L}
and Mod Σ (Φ), for Φ ⊆ sen ✸ (Σ), to
Mod Σ,T (Φ) := {L ∈ Mod Σ (Φ) | T ⊆ L}.
Then relative to constraints Φ, we can analyze the unification of Σ-states L and L′ in terms of ModΣ,L∪L′(Φ), which may be empty even
if neither ModΣ,L(Φ) nor ModΣ,L′(Φ) is, accounting for the partiality
of unification

L and L′ are unifiable relative to Φ ⇐⇒ ModΣ,L∪L′(Φ) ≠ ∅.
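A minimal sketch of this partiality, with constraints coded directly as Python predicates on a Σ-state (standing in for sentences of sen✸(Σ); the states and the no-clash constraint are invented): unification returns the union of trace sets when that union meets every constraint, in which case ModΣ,L∪L′(Φ) is certainly non-empty, and fails otherwise.

    # Unification of Sigma-states as union of trace sets, relative to constraints
    # given here as predicates on a Sigma-state (e.g. "never both num sg and num pl").
    def no_sg_pl_clash(L):
        return not (("num", "sg") in L and ("num", "pl") in L)

    def unify(L1, L2, Phi):
        """L1 ∪ L2 if it meets every constraint in Phi, else None (unification fails)."""
        L = L1 | L2                 # a union of prefix-closed sets is prefix-closed
        return L if all(constraint(L) for constraint in Phi) else None

    L1 = {(), ("num",), ("num", "sg")}
    L2 = {(), ("num",), ("num", "pl")}
    L3 = {(), ("per",), ("per", "3")}

    print(unify(L1, L3, [no_sg_pl_clash]) is not None)   # True:  unifiable
    print(unify(L1, L2, [no_sg_pl_clash]))               # None:  sg/pl clash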
Negation and disjunction in features may (or may not) require expanding Σ with a¬ϕ and aϕ∨ψ, and Φ with constraints

✷(ϕ′ ≡ ⟨aϕ′⟩⊤)     for ϕ′ ∈ {¬ϕ, ϕ ∨ ψ}.
Fixing some large set A to which all the required attributes belong, we
let Σ vary over the set Fin(A) of finite subsets of A, and note
Fact 1 Let Σ′ ∈ Fin(A), Σ ⊆ Σ′ , ϕ ∈ sen(Σ), and L′ ∈ Mod (Σ′ ).
Then L′ ∩ Σ∗ ∈ Mod(Σ) and

L′ |=Σ′ ϕ ⇐⇒ L′ ∩ Σ∗ |=Σ ϕ

and moreover, for every s ∈ L′ ∩ Σ∗, (L′ ∩ Σ∗)s ∈ Mod(Σ) and

L′ |=Σ′ ⟨s⟩ϕ ⇐⇒ (L′ ∩ Σ∗)s |=Σ ϕ.
The first part of Fact 1 says that the attributes that matter in satisfying
ϕ are only those that appear in ϕ,2 while the second part interprets
the modal operator ⟨s⟩ against Σ′-states L′ under the presupposition
that s belongs to L′ .
2 Fact 1 leaves ✸ out of ϕ precisely because ✸ does not identify the attributes
relevant to the satisfaction of sentences built with ✸. To bring ✸ into ϕ in Fact 1,
we can add subscripts X ranging over subsets of Σ to make the pertinent attributes
in ✸X explicit, with
q |=δ ✸Xψ ⇐⇒ (∃s ∈ X∗) q |=δ ⟨s⟩ψ
(Fernando 2016).
23.2.3 The Grothendieck Construction and an Institution
Some category-theoretic structure lurking in Fact 1 will resurface in
section 3 under a different guise and is worth spelling out. We fix a
large set A of attributes, and for each finite subset Σ ∈ Fin(A) of
A, turn the set Mod (Σ) of Σ-states into a category Q(Σ) as follows. A
Q(Σ)-morphism from Σ-state L to Σ-state L′ is a pair (L, s) with s ∈ L
and Ls = L′ . Q(Σ)-morphisms compose by concatenating strings
(L, s); (Ls , s′ ) := (L, ss′ )
and (L, ǫ) is the identity morphism for L. Whenever Σ ⊆ Σ′ ∈ Fin(A),
we define the functor Q(Σ′ , Σ) : Q(Σ′ ) → Q(Σ) from Q(Σ′ ) to Q(Σ)
mapping
- a Σ′ -state L′ to the Σ-state L′ ∩ Σ∗ , and
- a Q(Σ′ )-morphism (L′ , s) to the Q(Σ)-morphism (L′ ∩ Σ∗ , πΣ (s))
where πΣ(s) is the longest prefix of s in Σ∗

πΣ(ǫ) := ǫ
πΣ(as) := a πΣ(s) if a ∈ Σ, and ǫ otherwise.
Construing Fin(A) as a category with morphisms given by inclusion
⊆, the foregoing defines a contravariant functor Q : Fin(A)op → Cat
into the category Cat of small categories. The Grothendieck construction (e.g., Tarlecki, Burstall & Goguen 1991) applied to Q yields the
category ∫Q where
- an object is a pair (Σ, L) ∈ Fin(A) × Mod (Σ), and
- a morphism from (Σ′ , L′ ) to (Σ, L) is a pair
((Σ′ , Σ), (L′′ , s))
of a Fin(A)op -morphism (Σ′ , Σ) and a Q(Σ)-morphism (L′′ , s) such
that
L′′ = L′ ∩ Σ∗ and L = L′′s .
Reversing the morphisms in ∫Q for the category Sign of signatures
(Σ, L), we define two functors from Sign, one covariant and the other
contravariant
(i) sen : Sign → Set with sen(Σ, L) := sen(Σ) and

    sen((Σ, Σ′), (L′′, s)) : ϕ ↦ ⟨s⟩ϕ

(ii) Mod : Signop → Cat where the set Mod(Σ, L) of Σ-states that
⊆-contain L is turned into a full subcategory of Q(Σ), and

    Mod((Σ′, Σ), (L′′, s)) : L̂ ↦ (L̂ ∩ Σ∗)s.
To build an institution (Goguen & Burstall 1992) from Sign, sen, and
Mod , it remains to form, for every signature (Σ, L), a relation |=Σ,L
by intersecting |=Σ with Mod (Σ, L) × sen(Σ). Fact 1 is essentially the
Satisfaction Condition characterizing institutions
for every signature (Σ′ , L′ ), subset Σ of Σ′ , string s ∈ L′ ∩ Σ∗ , sentence
ϕ ∈ sen(Σ), and L̂ ∈ Mod (Σ′ , L′ ),
L̂ |=Σ′,L′ ⟨s⟩ϕ ⇐⇒ (L̂ ∩ Σ∗)s |=Σ,L ϕ
where the subscript L above is short for (L′ ∩ Σ∗ )s .
Introduced by Goguen and Burstall to cope with the proliferation of
logical systems in computer science, the notion of an institution has
attracted considerable attention and found numerous applications (e.g.
Diaconescu 2012, Kutz et al. 2010). Under Fact 1, features and values
can be seen as part of that body of work.
23.3 Time for and from Running Automata
It is one thing to encode a linguistic resource as a feature structure
equivalent to a finite automaton. It is quite another matter to understand the use of such a resource as the use of a finite automaton. To use
a finite automaton is (arguably first and foremost) to run it, accepting
strings that end in a final/accepting state. But such runs take place in
isolation, whereas it is only in combination with other resources that
the encoding or use of a linguistic resource is interesting. The whole
point of the category-theoretic approach from the previous section is
to relate different feature structures. Similarly, the present section considers runs of an automaton not so much in isolation as in combination
with other automata, constructing a notion of time from such runs.
A simple way to superpose runs of two finite automata is defined in
§3.1, and related to the approximation of Priorean temporal models
in §3.2 by strings constructed from temporal propositions. We adopt
the custom from Artificial Intelligence of referring to temporal propositions as fluents. We fix some large set Θ of fluents much as we fixed
a large set A of attributes in the previous section. The plan roughly is
to fill out Table 2, embracing Leibniz’s Identity of Indiscernibles (as in
section 2), with granularity given by a finite subset A of Θ (analogous
to Σ ∈ Fin(A) in section 2) to form strings over the alphabet 2^A of
subsets of A. The ⊆-larger the subset A, the more refined the A-models
and the more expressive the A-sentences can be.
                     section 2           section 3
information merge    unify graphs        superpose strings
large set            A of attributes     Θ of fluents
grain/signature      Σ ∈ Fin(A)          A ∈ Fin(Θ)
model                language over Σ     string over 2^A
sentence             Hennessy-Milner     Monadic Second-Order

Table 2
Helpful examples for orientation are provided by representations of a
calendar year at various granularities. The set A = {Jan, Feb, . . ., Dec}
of months suggests the string
sA := [Jan][Feb] · · · [Dec]

of length 12. Enlarging A with days d1, d2, . . ., d31

A′ := A ∪ {d1, d2, . . ., d31}

refines sA to the string

sA′ := [Jan,d1][Jan,d2] · · · [Jan,d31][Feb,d1] · · · [Dec,d31]

of length 366 for a leap year. We draw boxes (rendered here with square brackets, instead of the usual curly braces { and }) around sets qua symbols to suggest a film strip. A change in A can cause a box to split (much like hairs in Shan 2015), as [Jan] in sA does (30 times) on adding days

[Jan] ❀ [Jan,d1][Jan,d2] · · · [Jan,d31]

in sA′. Similarly, a common Reichenbachian account of the progressive puts a reference time R inside the event time E, splitting E into 3 boxes

[E] ❀ [E][E,R][E]

(one before, one simultaneous, and one after R).
examples in tense and aspect are taken up at length in Fernando 2015.
The aim of the present section is to link that work with the previous
section through the notion of an institution. The hope is that this might
contribute to understanding the use of linguistic resources encoded as
feature structures in terms of runs of finite automata — runs that give
rise to time at bounded granularities.
23.3.1 From Superposition to Reducts and MSO
Given two equally long strings s = α1 · · · αn and s′ = α′1 · · · α′n of sets
αi and α′i, let us define the superposition s & s′ of s and s′ to be the
string obtained by their componentwise unions αi ∪ α′i
α1 · · · αn & α′1 · · · α′n := (α1 ∪ α′1 ) · · · (αn ∪ α′n ).
For example,

[E][E][E] & [ ][R][ ] = [E][E,R][E].
Extending the operation to sets L and L′ of strings of sets, the superposition L&L′ of L and L′ is the set of superpositions of strings of the
same length from L and L′
L & L′ := {s&s′ | (s, s′ ) ∈ L × L′ and length(s) = length(s′ )}
allowing us to conflate a string s with its singleton language {s} (making s&s′ = ∅ in case s and s′ differ in length). Given finite automata
accepting L and L′ , the usual product construction on finite automata
for their intersection L ∩ L′ (e.g. Hopcroft & Ullman 1979) can be
adjusted to combine transitions →L for L and →L′ for L′ to form non-deterministic transitions

(q, q′) −α∪α′→ (r, r′) ⇐⇒ q −α→L r and q′ −α′→L′ r′
for L&L′ in lockstep but with labels that may differ. We will loosen
the lockstep requirement in §3.2, but first consider constraints that we
might impose on superposition (analogous to C on unifyC (G, G′ ) in
section 2).
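Superposition itself is a one-line operation on strings of sets; the sketch below (with invented fluent names, boxes as frozensets) extends it to languages by pairing strings of equal length, reproducing the [E][E,R][E] example above.

    # Superposition of strings of sets (frozensets as boxes) and of languages.
    def sup(s, t):
        """Componentwise union of two equally long strings of sets."""
        return tuple(a | b for a, b in zip(s, t))

    def sup_lang(L1, L2):
        """L1 & L2: superpositions of equally long strings from L1 and L2."""
        return {sup(s, t) for s in L1 for t in L2 if len(s) == len(t)}

    E, R = frozenset({"E"}), frozenset({"R"})
    empty = frozenset()

    s = (E, E, E)                   # [E][E][E]
    t = (empty, R, empty)           # [ ][R][ ]
    print(sup(s, t) == (E, E | R, E))                    # True: [E][E,R][E]
    print(sup_lang({s}, {t, (R,)}))                      # only the length-3 pair superposes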
For example, we may wish to require that a fluent a is never followed
by a fluent b, as expressed by the predicate logic formula
(∀x)(∀y)(Pa (x) ∧ S(x, y) ⊃ ¬Pb (y))
where x and y range over string positions, and
S(x, y) says: next after position x is y
while for every fluent a ∈ Θ,
Pa (x) says: a occurs at position x.
More precisely, given a string s = α1 · · · αn of n sets αi of fluents, we
can interpret S as the binary relation

Sn := {(1, 2), (2, 3), . . . , (n − 1, n)}

on the set

[n] := {1, 2, . . . , n}

of integers from 1 to n, and Pa as the subset

Pa^s := {i ∈ [n] | a ∈ αi}     (where s = α1 · · · αn)
of [n], for each fluent a. That is, a string s ∈ (2^Θ)n specifies a structure

Ms := ⟨[n], Sn, {Pa^s}a∈Θ⟩
against which to interpret predicate logic formulas built from S and
the Pa ’s such as the formulas ϕ of Monadic Second-Order Logic (MSO;
e.g. Libkin 2010) generated by the seven clauses
ϕ ::= S(x, y) | Pa(x) | X(x) | ϕ ∨ ϕ′ | ¬ϕ | ∃xϕ | ∃Xϕ
from three disjoint infinite sets Var 1 , Var 2 and Θ of first-order variables
x, y ∈ Var 1 , second-order variables X ∈ Var 2 , and fluents a ∈ Θ,
respectively. For any such MSO-formula ϕ, only finitely many fluents
may occur in ϕ, which we collect in ϕ’s vocabulary, voc(ϕ) ∈ Fin(Θ)
voc(S(x, y)) = voc(X(x)) = ∅
voc(Pa (x)) = {a}
voc(ϕ ∨ ϕ′ ) = voc(ϕ) ∪ voc(ϕ′ )
voc(¬ϕ) = voc(∃xϕ) = voc(∃Xϕ) = voc(ϕ).
An MSO-sentence is understood to be an MSO-formula in which all
variable occurrences are bound. For every A ∈ Fin(Θ), we put every
MSO sentence with vocabulary contained in A into the set MSO (A)
MSO (A) := {ϕ | ϕ is an MSO-sentence and voc(ϕ) ⊆ A}
and define a binary relation
|=A ⊆ (2^A)∗ × MSO(A)

between (2^A)∗ and MSO(A) in the usual Tarskian manner, associating
a string s ∈ (2^A)∗ with Ms. (Apologies for reusing the symbol |=.)
For any string s of sets of fluents, let the A-reduct ρA (s) of s be the
componentwise intersection of s with A
ρA (α1 · · · αn ) := (α1 ∩ A) · · · (αn ∩ A)
(so-called because ρA (s) is precisely the part of s needed to extract
from Ms its A-reduct ⟨[n], Sn, {Pa^s}a∈A⟩).
Fact 2 For all A ∈ Fin(Θ), ϕ ∈ MSO(A) and s ∈ (2^A)∗,

s |=A ϕ ⇐⇒ ρvoc(ϕ)(s) |=voc(ϕ) ϕ.
With Fact 2, the relations {|=A}A∈Fin(Θ) become an institution with
signature category Fin(Θ) provided we
(i) extend the map A ↦ MSO(A) to pairs (A, A′) such that A ⊆ A′ ∈
Fin(Θ), setting MSO(A, A′) to the inclusion MSO(A) ↪ MSO(A′)
mapping ϕ ∈ MSO(A) ⊆ MSO(A′) to itself, and
(ii) turn the map A ↦ (2^A)∗ into a contravariant functor M from
Fin(Θ) so that whenever A ⊆ A′ ∈ Fin(Θ), M(A′, A) : (2^A′)∗ →
(2^A)∗ is the restriction of ρA to (2^A′)∗

M(A′, A)(s) = ρA(s)     for all s ∈ (2^A′)∗.
Büchi’s theorem equating sentences in MSO(A) with regular languages
over A (e.g. Libkin 2010, page 124) holds also in the present set-up for
languages over 2^A (the advantage of 2^A over A being the availability
of reducts for Fact 2).
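On the string side, satisfaction and reducts are equally direct; the sketch below (fluents a, b and the sample string invented) checks the constraint above, that a is never followed by b, on Ms, and the verdict is unchanged after dropping fluents outside the constraint's vocabulary, as Fact 2 requires.

    # Strings over 2^Theta as tuples of frozensets; S and P_a read off as in M_s.
    def P(s, a):
        return {i for i, box in enumerate(s, start=1) if a in box}

    def never_followed_by(s, a, b):
        """(∀x)(∀y)(P_a(x) ∧ S(x,y) ⊃ ¬P_b(y)) on the structure M_s."""
        return all(not (a in s[i] and b in s[i + 1]) for i in range(len(s) - 1))

    def reduct(s, A):
        """rho_A(s): componentwise intersection with A."""
        return tuple(box & A for box in s)

    f = frozenset
    s = (f({"a"}), f({"a", "c"}), f({"c"}), f({"b"}))     # [a][a,c][c][b]

    print(P(s, "a"))                                      # {1, 2}
    print(never_followed_by(s, "a", "b"))                 # True: no a-box immediately before a b-box
    # Fact 2 in miniature: only voc = {a, b} matters for this constraint.
    print(never_followed_by(reduct(s, {"a", "b"}), "a", "b"))   # True, same verdict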
23.3.2 Compression, Branching and Superposition Modified
A string s ∈ (2^A)∗ is understood above to have granularity A. Variations in A are described in Fact 2 that preserve string length using
A-reducts. It is natural, however, to expect the length of a string to
grow with A, as hinted by the discussion above of

sA := [Jan][Feb] · · · [Dec]

and

sA′ := [Jan,d1][Jan,d2] · · · [Jan,d31][Feb,d1] · · · [Dec,d31].
Put the other way around, the A-reduct of sA′

ρA(sA′) = [Jan]^31 [Feb]^29 · · · [Dec]^31

has substrings such as [Jan]^31 which we might compress to [Jan] for

bc(ρA(sA′)) = [Jan][Feb] · · · [Dec] = sA
where for any string s, bc(s) compresses blocks α^n of n > 1 consecutive
occurrences in s of the same symbol α to a single α, leaving s otherwise
unchanged

bc(s) := bc(αs′)      if s = ααs′
         α bc(α′s′)   if s = αα′s′ with α ≠ α′
         s            otherwise.
To require that time progress only with change (discernible at some
bounded granularity A), let us work with strings α1 α2 · · · αn that are
stutter-free in that αi ≠ αi+1 for i from 1 to n − 1. That is,

a string s is stutter-free ⇐⇒ s = bc(s).
The restriction of bc to any finite alphabet is computable by a finite-state transducer, as are, for all A′ ∈ Fin(Θ) and A ⊆ A′, the composition ρA ; bc for bcA

bcA(s) := bc(ρA(s))     for s ∈ (2^A′)∗.
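Both bc and bcA are short functions; the sketch below (with an abbreviated two-day, two-month calendar standing in for the example above) confirms that compressing the month-reduct of the finer string recovers the coarser one.

    # Block compression bc and the map bc_A = bc . rho_A, on tuples of frozensets.
    def bc(s):
        out = []
        for box in s:
            if not out or out[-1] != box:        # keep a box only when it differs from its predecessor
                out.append(box)
        return tuple(out)

    def reduct(s, A):
        return tuple(box & A for box in s)

    def bc_A(s, A):
        return bc(reduct(s, A))

    f = frozenset
    # A two-month toy calendar at day granularity (2 days per month for brevity).
    s_fine = (f({"Jan", "d1"}), f({"Jan", "d2"}), f({"Feb", "d1"}), f({"Feb", "d2"}))
    months = {"Jan", "Feb"}

    print(bc_A(s_fine, months) == (f({"Jan"}), f({"Feb"})))   # True: [Jan][Feb] recovered
    print(bc((f({"E"}), f({"E"}), f({"E"}))))                 # ([E],) i.e. stutter removed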
Without the compression bc in bcA , we are left with the map ρA that
leaves the ontology intact (insofar as the domain of an MSO-model
is given by the string length), whilst restricting the vocabulary (for A-reducts). The institution described by Fact 2 can be adjusted to another
institution in which
- the models are stutter-free strings3
- the reducts ρA are replaced by bcA , and
- the satisfaction relations |=′A are given by explicitly referring to the
sentence’s vocabulary
s |=′A ϕ ⇐⇒ bcvoc(ϕ) (s) |=voc(ϕ) ϕ.
Compressing strings via bcA allows us to lengthen the strings by inversion. The inverse limit IL(Θ, bc) of Θ, bc consists of functions a :
Fin(Θ) → Fin(Θ)∗ that respect the projections bcA
a(A) = bcA (a(A′ )) whenever A ⊆ A′ ∈ Fin(Θ).
The prefix relation on strings
s prefix s′ ⇐⇒ s′ = sŝ for some ŝ
lifts to maps a and a′ in IL(Θ, bc) by universal quantification for an
irreflexive relation
a ≺ a′ ⇐⇒ a ≠ a′ and (∀A ∈ Fin(Θ)) a(A) prefix a′(A)
that is tree-like on IL(Θ, bc) — i.e., transitive and left linear: for every
a ∈ IL(Θ, bc), and all a1 ≺ a and a2 ≺ a,
a1 ≺ a2 or a2 ≺ a1 or a2 = a1 .
In other words, time branches at the inverse limit IL(Θ, bc).
Even if the strings we are interested in are stutter-free, strings that
are not stutter-free can be useful. For instance, to relax the requirement
of L & L′ that L and L′ run in lockstep, let us collect the strings bc-equivalent to a string in L in

L^bc := {s ∈ (2^Θ)∗ | (∃s′ ∈ L) bc(s) = bc(s′)}

and define the bc-superposition L &bc L′ of L and L′ to be the image
under bc of the superposition of L^bc and L′^bc

L &bc L′ := {bc(s) | s ∈ L^bc & L′^bc}
(a regular language, if L and L′ are). Then for any two fluents a, a′ ∈ Θ,
the bc-superposition [a] &bc [a′] is the set

{bc(s) | s ∈ (2^{a,a′})∗ and bc{a}(s) = [a] and bc{a′}(s) = [a′]}

consisting of 13 strings, one for each interval relation in Allen 1983.
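One way to see the count of 13 (an independent brute-force check, not an implementation of &bc itself): enumerate the stutter-free strings over 2^{a,a′} in which each fluent holds over a single non-empty contiguous block of boxes and no initial or final box is empty.

    # Enumerate stutter-free strings over 2^{a,a'} in which each fluent holds over one
    # contiguous non-empty block and no first/last box is empty: 13 strings, one per
    # Allen relation ("b" plays the role of a').
    from itertools import product

    f = frozenset
    boxes = [f(), f({"a"}), f({"b"}), f({"a", "b"})]

    def contiguous(s, x):
        pos = [i for i, box in enumerate(s) if x in box]
        return pos != [] and pos == list(range(pos[0], pos[-1] + 1))

    def ok(s):
        return (all(s[i] != s[i + 1] for i in range(len(s) - 1))   # stutter-free
                and s[0] and s[-1]                                  # no empty end boxes
                and contiguous(s, "a") and contiguous(s, "b"))

    allen = [s for n in range(1, 4)          # longer strings cannot stay stutter-free here
             for s in product(boxes, repeat=n) if ok(s)]
    print(len(allen))                        # 13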
More generally, for any finite set A = {a1, . . . , an} ∈ Fin(Θ) of fluents,
the bc-superposition

[a1] &bc · · · &bc [an]

represents the event structures over A in the sense of Russell-Wiener
(Kamp & Reyle 1993, Fernando 2015).

3 Apart from applying bc, a string can also be made stutter-free by superposition
with ([ ][tic])∗ ([ ] + ǫ) for some fresh fluent tic. The crucial point is that stutter-freeness ensures the vocabulary is large enough to express the distinctions of interest
(lengthening a string if necessary).
23.3.3 Taking Stock
What are we to make of the difference between the institutions in sections 2 and 3? At its simplest, the difference is between, on the one
hand, a program or automaton (as a piece of code) and, on the other
hand, an execution or run of it — a modern incarnation of the Aristotelian dichotomy between potentiality and actuality. Focusing on applications to natural language semantics, Table 3 lists contrasts based
not only on the widespread encoding of linguistic resources as feature
structures (including frames), but also on the notion defended in Carlson 1995 that the truth of a generic statement rests not on “the episodic
instances but rather the causal forces behind those instances” (page
225), as well as the distinction between individual-level and stage-level
predicates (Carlson 1977).
section 2            section 3
automata             run
resource             use
generic              episodic
causal               temporal
force                event
universal            particular/instance
individual-level     stage-level

Table 3
Much work remains to flesh out Table 3, and win over the skeptical
reader. At stake in Table 3 is justification for viewing the directed
graphs in F&V as finite automata.4

4 My thanks to Cleo Condoravdi for inviting me to contribute to this Festschrift,
András Kornai for feedback on this paper, and, not to forget, Lauri Karttunen for
setting standards towards which to aspire.

References

Allen, James F. 1983. Maintaining knowledge about temporal intervals. Communications of the ACM 26:832–843.
Barwise, Jon. 1974. Axioms for abstract model theory. Annals of Mathematical Logic 7:221–265.
Barwise, Jon and Larry Moss. 1996. Vicious Circles: On the Mathematics of
Non-Wellfounded Phenomena. CSLI.
Blackburn, Patrick. 1993. Modal logic and attribute value structures. In
M. de Rijke, ed., Diamonds and Defaults, pages 19–65. Kluwer.
Brzozowski, Janusz A. 1964. Derivatives of regular expressions. Journal of
the ACM 11:481–494.
Carlson, Greg N. 1977. A unified analysis of the English bare plural. Linguistics & Philosophy 1:413–458.
Carlson, Greg N. 1995. Truth conditions of generic sentences: Two contrasting views. In The Generic Book , pages 224–237. University of Chicago
Press.
Cooper, Robin. 2012. Type theory and semantics in flux. In R. Kempson,
T. Fernando, and N. Asher, eds., Philosophy of Linguistics, pages 271–323.
North-Holland.
Diaconescu, Răzvan. 2012. Three decades of institution theory. In J.-Y.
Beziau, ed., Universal Logic: An Anthology, pages 309–322. Springer.
Fernando, Tim. 2015. The semantics of tense and aspect: A finite-state
perspective. In S. Lappin and C. Fox, eds., The Handbook of Contemporary
Semantic Theory, Second Edition, pages 203–236. Wiley.
Fernando, Tim. 2016. Types from frames as finite automata. In A. Foret, G. Morrill, R. Muskens, and R. Osswald, eds., Formal Grammar 2015/2016, pages 19–40. Springer.
Fillmore, Charles J. 1982. Frame semantics. In Linguistics in the Morning
Calm, pages 111–137. Hanshin Publishing Co.
Goguen, Joseph and Rod Burstall. 1992. Institutions: Abstract model theory
for specification and programming. Journal of the ACM 39:95–146.
Hennessy, Matthew and Robin Milner. 1985. Algebraic laws for nondeterminism and concurrency. Journal of the ACM 32:137–161.
Hopcroft, John and Jeffrey Ullman. 1979. Introduction to Automata Theory,
Languages, and Computation. Addison-Wesley.
Kamp, Hans and Uwe Reyle. 1993. From Discourse to Logic. Kluwer.
Karttunen, Lauri. 1984. Features and values (F&V). In COLING ’84 , pages
28–33.
Karttunen, Lauri. 2007. Word play. Computational Linguistics 33:443–467.
Kornai, András. 2017. Truth or dare. This volume.
Kutz, Oliver, Till Mossakowski, and Dominik Lücke. 2010. Carnap, Goguen,
and the hyperontologies: Logical pluralism and heterogeneous structuring
in ontology design. Logica Universalis 4:255–333.
Libkin, Leonid. 2010. Elements of Finite Model Theory. Springer.
Shan, Chung-chieh. 2015. Splitting hairs. In Proceedings of the 20th Amsterdam Colloquium, pages 363–367.
Tarlecki, Andrzej, Rod Burstall, and Joseph Goguen. 1991. Some fundamental algebraic tools for the semantics of computation. Part 3: Indexed categories. Theoretical Computer Science pages 239–264.