23 Finite-state Methods Featuring Semantics

Tim Fernando

23.1 Introduction
“It may turn out to be very useful for semantic representations too.” So
concludes the abstract of Lauri Karttunen’s COLING 84 paper, Features and values (F&V), referring to “the new Texas version of the
‘DG (directed graph)’ package” which was “primarily intended for representing morphological and syntactic information” (page 28). That the
directed graph was essentially a finite automaton may have been too
obvious an observation for F&V to state — or, assuming it had already
been stated, restate. Be that as it may, this observation is used in Fernando 2016 to extract typed feature structures from Robin Cooper’s
record type approach to frames (Fillmore 1982, Cooper 2012). I restate
the observation here to develop its uses for semantic representations
further, egged on by the aforementioned statement from F&V, and (as
with Kornai 2017) the prospects of bringing together “two lines of research that lie at the opposite ends on the field” (Karttunen 2007).
Just how to view feature structures as finite automata is detailed in
the next section (section 2); why this view might pay off is explored in
section 3.
As bottom-dwellers in the Chomsky hierarchy, finite automata have
well-known limitations to test the maxim keep it simple. Finite-state
methods are structured below around semantic notions |= of satisfaction between models and sentences, kept simple through Leibniz’s Law,
Identity of Indiscernibles. A logical formalism well-known from Hennessy & Milner 1985 (among other papers) is applied to feature structures in section 2 broadly along the lines of Blackburn 1993, but with
particular attention to certain sets of strings to which directed graphs
can be reduced, called trace sets. A set Σ of attributes is paired with a
trace set T ⊆ Σ∗ for a signature (Σ, T ), picking out the set Mod (Σ, T )
of trace sets L sandwiched between T and Σ∗
T ⊆ L ⊆ Σ∗ .
A trace set L ∈ Mod (Σ, T ) is a (Σ, T )-model , as the notation Mod (Σ, T )
suggests, against which to evaluate a Σ-sentence. A (Σ, T )-model L can
be construed as a record of record type (Σ, T ) with an s-component
Ls , for every string s ∈ T , that is a trace set satisfying a Σ-sentence ϕ
precisely if L satisfies the Σ-sentence ⟨s⟩ϕ

L |= ⟨s⟩ϕ ⇐⇒ Ls |= ϕ.
To analyze satisfaction |=, it suffices to keep the set Σ in a signature
(Σ, T ) finite, and integrate different signatures within a category (following the so-called Grothendieck construction). Behind the somewhat
technical details below is the intuition that signatures are bounded
granularities that simplify calculations of satisfaction |=. That simplification, called the Translation Axiom in Barwise 1974 (page 235) and
the Satisfaction Condition in Goguen & Burstall 1992 (page 102), applies to unification in F&V with negative and disjunctive constraints
that refine the sets Mod (Σ, T ).
Supposing a typed feature structure can be viewed as a finite automaton (which section 2 takes pains to show), so what? To make the
view compelling, we turn in section 3 to runs of finite automata with the
eventual goal of understanding these runs as uses of linguistic resources
encoded by typed feature structures. An approach to temporality in
which time arises from running automata is presented paralleling section 2, with Monadic Second Order logic in place of Hennessy-Milner
logic, and superposition in place of unification (for building models
bottom-up, subject to constraints). Careful attention is paid to shifts
in bounded granularity and to the assorted forces that take shape as
granularity is refined. This is in contrast to the practice of fixing some
space of possible worlds once and for all, without any provisions for
varying granularity.
23.2 From Features to Strings and Types
Some notions taken up in F&V are collected in the first column of Table
1, which we analyze in this section according to the second column.
path ⟨a1 · · · an⟩ of attributes        string a1 · · · an
(rooted) directed graph G              set L(G) of strings
generalize(G, G′)                      L(G) ∩ L(G′)
constraints C                          set ΦC of sentences with ¬, ∨
unifyC(G, G′)                          L(G) ∪ L(G′) if it satisfies ΦC

Table 1
We take for granted in Table 1 a set Σ of attributes (a, ai , . . .), and
define a Σ-deterministic system to be a partial function δ : Q × Σ ⇁ Q
to some set Q of nodes from the set Q × Σ of node-attribute pairs.
We picture a triple (q, a, δ(q, a)) in δ as a deterministic transition q −a→ δ(q, a), and formulate a (rooted) directed graph G as a pair (δ, q) of
a Σ-deterministic system δ and a node q ∈ Q. To define the language
L(G) ⊆ Σ∗ , let δq : Σ∗ ⇁ Q be the ⊆-smallest subset F of Σ∗ × Q such
that
(i) (ǫ, q) ∈ F , and
(ii) (sa, q′′) ∈ F whenever (s, q′) ∈ F and (q′, a, q′′) ∈ δ.
Now, if the directed graph G is the pair (δ, q), then its language L(G)
is the set dom(δq ) of strings s for which δq (s) is defined. The language
dom(δq ) is called the trace set of (δ, q), and strings in dom(δq ) are
called traces of (δ, q).
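Concretely (a minimal Python sketch, with the graph, node names and attributes invented for illustration), a Σ-deterministic system can be coded as a finite map from node–attribute pairs to nodes; δq is evaluated by following attributes, and dom(δq) is enumerated up to a length bound, which matters only if the graph has cycles.

    # A Sigma-deterministic system as a dict from (node, attribute) to node.
    # Hypothetical toy graph for a small feature structure.
    delta = {
        ("q0", "subj"): "q1",
        ("q0", "agr"): "q2",
        ("q1", "agr"): "q2",      # re-entrancy: subj's agr is the same node q2
        ("q2", "num"): "q3",
        ("q2", "per"): "q4",
    }

    def run(delta, q, s):
        """delta_q(s): follow the attributes of s from node q; None if undefined."""
        for a in s:
            if (q, a) not in delta:
                return None
            q = delta[(q, a)]
        return q

    def traces(delta, q, max_len):
        """dom(delta_q) restricted to strings of length <= max_len."""
        out, frontier = set(), [((), q)]
        for _ in range(max_len + 1):
            next_frontier = []
            for s, node in frontier:
                out.add(s)
                for (p, a), r in delta.items():
                    if p == node:
                        next_frontier.append((s + (a,), r))
            frontier = next_frontier
        return out

    print(run(delta, "q0", ("subj", "agr", "num")))     # q3
    print(sorted(traces(delta, "q0", 3)))               # (), (agr,), (subj, agr, num), ...

generalize(G, G′) and unifyC(G, G′) from Table 1 then amount to intersecting and (subject to ΦC) uniting such trace sets.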
Next, with Σ fixed, we express constraints through the set sen(Σ)
of sentences ϕ generated from attributes a ∈ Σ by the grammar
ϕ ::= ⊤ | ⟨a⟩ϕ | ¬ϕ | ϕ ∨ ϕ′
interpreted against a Σ-deterministic system δ : Q×Σ ⇁ Q by a binary
relation |=δ ⊆ Q × sen(Σ) that treats ⊤ as a tautology
q |=δ ⊤
for every q ∈ Q,
⟨a⟩ as the Diamond modal operator with accessibility relation δ(·, a)

q |=δ ⟨a⟩ϕ ⇐⇒ (q, a) ∈ dom(δ) and δ(q, a) |=δ ϕ,

¬ as Boolean negation

q |=δ ¬ϕ ⇐⇒ not q |=δ ϕ

and ∨ as Boolean disjunction

q |=δ ϕ ∨ ϕ′ ⇐⇒ q |=δ ϕ or q |=δ ϕ′
(Hennessy & Milner 1985, Blackburn 1993). Collecting the sentences in
sen(Σ) that q |=δ -satisfies in
sen Σ (δ, q) := {ϕ ∈ sen(Σ) | q |=δ ϕ},
it turns out that directed graphs satisfy the same subset of sen(Σ)
precisely if they have the same trace set1

senΣ(δ, q) = senΣ(δ′, q′) ⇐⇒ dom(δq) = dom(δ′q′)     (23.1)

(Hennessy & Milner 1985). Under Leibniz’s Identity of Indiscernibles,
with discernibility based on sen(Σ), (23.1) reduces a directed graph
(δ, q) to its trace set dom(δq). The trace set captures a fragment of
sen(Σ)

dom(δq) = {s ∈ Σ∗ | ⟨s⟩⊤ ∈ senΣ(δ, q)}
consisting of sentences of the form ⟨s⟩⊤, where for every ϕ ∈ sen(Σ),
the sentence ⟨s⟩ϕ in sen(Σ) is defined by induction on s ∈ Σ∗, starting
with the null string ǫ,

⟨ǫ⟩ϕ := ϕ

and using modal operators ⟨a⟩ elsewhere

⟨as⟩ϕ := ⟨a⟩⟨s⟩ϕ

so that ⟨a1 · · · an⟩ϕ is ⟨a1⟩ · · · ⟨an⟩ϕ and

q |=δ ⟨s⟩ϕ ⇐⇒ s ∈ dom(δq) and δq(s) |=δ ϕ.     (23.2)
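A satisfaction checker for this language is a few lines of Python (a sketch; the sentence encoding as nested tuples and the toy δ are invented here), with ⟨s⟩ϕ built by the induction just given, so the final two lines instantiate (23.2).

    # Hennessy-Milner satisfaction q |=_delta phi, with sentences coded as
    # "top", ("dia", a, phi), ("not", phi), ("or", phi1, phi2).
    def sat(delta, q, phi):
        if phi == "top":
            return True
        op = phi[0]
        if op == "dia":                              # <a>phi
            return (q, phi[1]) in delta and sat(delta, delta[(q, phi[1])], phi[2])
        if op == "not":
            return not sat(delta, q, phi[1])
        if op == "or":
            return sat(delta, q, phi[1]) or sat(delta, q, phi[2])
        raise ValueError(phi)

    def dia_path(s, phi):
        """<s>phi := <a1>...<an>phi for s = a1...an."""
        for a in reversed(s):
            phi = ("dia", a, phi)
        return phi

    delta = {("q0", "agr"): "q1", ("q1", "num"): "q2"}
    print(sat(delta, "q0", dia_path(("agr", "num"), "top")))   # True:  (agr, num) is a trace
    print(sat(delta, "q0", dia_path(("num",), "top")))         # False: (num,) is not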
In the remainder of this section, we replace q and δ in (23.2) by a
prefix-closed language over Σ with derivatives (in §2.1), and expand Σ
to flesh out Table 1 (in §2.2), systematized category-theoretically (in
§2.3) to link up with section 3.
23.2.1 Languages and Transitions to Derivatives
Given a set L ⊆ Σ∗ of strings over Σ and a string s over Σ, the s-derivative of L is the set
Ls := {s′ | ss′ ∈ L}
of strings that, put after s, belong to L (Brzozowski 1964). For any Σ-deterministic system δ : Q × Σ ⇁ Q and node q, one can check that
dom(δq ) is the set of strings s such that the null string ǫ is in the
s-derivative of dom(δq )
s ∈ dom(δq ) ⇐⇒ ǫ ∈ (dom(δq ))s
and that for every s ∈ dom(δq ), the s-derivative of dom(δq ) is the trace
set of (δ, δq (s))
(dom(δq ))s = dom(δq′ ) where q ′ = δq (s).
1 Readers familiar with, for example, Barwise & Moss 1996 will note that determinism simplifies matters considerably, reducing bisimulation equivalence between
(δ, q) and (δ′, q′) to trace equivalence dom(δq) = dom(δ′q′), and allowing us to talk
of sets of strings instead of non-well-founded sets.
Indeed, the chain of equivalences
a1 a2 · · · an ∈ L ⇐⇒ a2 · · · an ∈ La1
⇐⇒ · · · ⇐⇒ ǫ ∈ La1 ···an
from a1 · · · an to the null string ǫ means that L is accepted by the
deterministic automaton with
- s-derivatives Ls as states
- initial state L = Lǫ
- a-transitions from Ls to Lsa (for every symbol a ∈ Σ)
- final (accepting) states Ls such that ǫ ∈ Ls.
The s-derivative of L equals the s′ -derivative of L precisely if s and s′
concatenate with the same strings to produce strings in L
Ls = Ls′ ⇐⇒ (∀w ∈ Σ∗ ) (sw ∈ L ⇐⇒ s′ w ∈ L)
so that the Myhill-Nerode Theorem says that for finite Σ,

L is regular ⇐⇒ {Ls | s ∈ Σ∗} is finite
(e.g. Hopcroft & Ullman 1979). Note that Ls is non-empty precisely if
s is the prefix of some string in L. Moreover, if Ls is empty then so
is Lsa for every a ∈ Σ. That is, ∅ is a sink state that we may safely
exclude from the states of the automaton above, at the cost of making
the transition function partial.
Let us call a language L prefix-closed if for all sa ∈ L, s ∈ L. Note
that trace sets are prefix-closed and non-empty. Let Mod (Σ) denote the
set
Mod(Σ) := {L ⊆ Σ∗ | L ≠ ∅ and L is prefix-closed}
of non-empty prefix-closed subsets of Σ∗ , and let us refer to an element
of Mod (Σ) as a Σ-state. Not only are trace sets Σ-states, but conversely,
if δ̂ is the Σ-deterministic system
{(L, a, La ) | L ∈ Mod (Σ) and a ∈ Σ ∩ L}
then every Σ-state L is the trace set of (δ̂, L). Keeping δ̂ implicit, a Σ-state L makes an s-transition to its s-derivative Ls precisely if s ∈ L,
specializing the biconditional (23.2) above to

L |= ⟨s⟩ϕ ⇐⇒ s ∈ L and Ls |= ϕ.
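Derivatives of a finite language can be computed directly; the sketch below (strings as tuples, with an invented example language) checks prefix-closure, takes an s-derivative, and counts the distinct derivatives, which are the states of the automaton described above.

    # Brzozowski derivatives of a finite language L of tuples.
    def derivative(L, s):
        """L_s = { w : s + w in L }."""
        return {w[len(s):] for w in L if w[:len(s)] == s}

    def is_sigma_state(L):
        """Non-empty and prefix-closed."""
        return bool(L) and all(w[:i] in L for w in L for i in range(len(w)))

    L = {(), ("agr",), ("agr", "num"),
         ("subj",), ("subj", "agr"), ("subj", "agr", "num")}

    print(is_sigma_state(L))                       # True
    print(derivative(L, ("subj",)))                # {(), ('agr',), ('agr', 'num')}

    # Distinct derivatives L_s (s in L) = states reachable from L via a-transitions L_s -> L_sa.
    states = {frozenset(derivative(L, s)) for s in L}
    print(len(states))                             # 4 distinct derivatives here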
23.2.2 Adding Attributes, Types and Constraints
Identity as indiscernibility relative to sen(Σ) presupposes that all differences which matter are captured by the set Σ. An obvious problem
is that the single trace set {ǫ} cannot differentiate between atomic values. But it is easy enough to introduce for every atomic value v, a fresh
attribute av to Σ for, say, the trace set {av, ǫ}. At least two objections
can be made to this move. The first is that a trace set of {ǫ} is arguably what it means for a value v to be atomic; any larger trace set
would make v non-atomic. If “atomic” is understood this way, identity
as indiscernibility leaves us no choice but to differentiate between values by making all but perhaps one of them non-atomic. A more serious
objection is that if the alphabet Σ is to be finite, then we cannot introduce fresh attributes to Σ indefinitely. Or can we? Given any set A,
no matter how large, we can form its set Fin(A) of finite subsets
Fin(A) := {Σ ⊆ A | Σ is finite}
and let Σ vary over members of Fin(A); each attribute a ∈ A−Σ added
to Σ leads to the different member Σ ∪ {a} of Fin(A). The challenge
then becomes to implement the variations in Σ systematically. This is
where signatures and institutions enter.
But first, it will prove convenient to expand sen(Σ) with a modal
operator ✸ for a sentence ✸ϕ equivalent to the disjunction over all
s ∈ Σ∗ of the sentences ⟨s⟩ϕ. More precisely,
q |=δ ✸ϕ ⇐⇒ (∃s ∈ Σ∗) q |=δ ⟨s⟩ϕ     (23.3)
for any Σ-deterministic system δ : Q × Σ ⇁ Q and node q ∈ Q. Incorporating ✸ into sen(Σ) and senΣ(δ, q) for sen✸(Σ) and sen✸Σ(δ, q)
respectively, it is not difficult to verify that trace equivalence remains
indiscernibility up to sen✸(Σ)

sen✸Σ(δ, q) = sen✸Σ(δ′, q′) ⇐⇒ dom(δq) = dom(δ′q′).
Thus, we can again reduce (δ, q) to its trace set dom(δq) and |=δ to a
binary relation |=Σ ⊆ Mod(Σ) × sen✸(Σ) between a Σ-state L and a
sentence ϕ ∈ sen✸(Σ), simplifying (23.3) to

L |=Σ ✸ϕ ⇐⇒ (∃s ∈ Σ∗) L |=Σ ⟨s⟩ϕ
        ⇐⇒ (∃s ∈ L) Ls |=Σ ϕ
(adding the subscript Σ to prepare for the aforementioned variations).
As usual, we let ✷ϕ abbreviate ¬✸¬ϕ for
L |=Σ ✷ϕ ⇐⇒ (∀s ∈ L) Ls |=Σ ϕ
alongside the Boolean conventions ϕ ⊃ ψ for ψ ∨ ¬ϕ, and ϕ ∧ ψ for
¬(¬ϕ ∨ ¬ψ). Given a subset Φ of sen ✸ (Σ), we say a Σ-state L is a
Σ-model of Φ, and write L |=Σ Φ, if it satisfies every sentence in Φ
L |=Σ Φ ⇐⇒ (∀ϕ ∈ Φ) L |=Σ ϕ.
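For finite Σ-states, |=Σ is computable by recursion on sentences and derivatives; in the sketch below (✸ written poss, the Σ-state L invented), ✷ is derived as ¬✸¬.

    # L |=_Sigma phi for a finite Sigma-state L (non-empty, prefix-closed set of tuples).
    # Sentences: "top", ("dia", a, phi), ("not", phi), ("or", p, q), ("poss", phi).
    def deriv(L, s):
        return {w[len(s):] for w in L if w[:len(s)] == s}

    def sat(L, phi):
        if phi == "top":
            return True
        op = phi[0]
        if op == "dia":                                      # <a>phi
            return (phi[1],) in L and sat(deriv(L, (phi[1],)), phi[2])
        if op == "not":
            return not sat(L, phi[1])
        if op == "or":
            return sat(L, phi[1]) or sat(L, phi[2])
        if op == "poss":                                     # diamond: some s in L with L_s |= phi
            return any(sat(deriv(L, s), phi[1]) for s in L)
        raise ValueError(phi)

    def nec(phi):                                            # box phi := not poss not phi
        return ("not", ("poss", ("not", phi)))

    L = {(), ("agr",), ("agr", "num")}
    print(sat(L, ("poss", ("dia", "num", "top"))))           # True:  num is reachable somewhere
    print(sat(L, nec(("dia", "num", "top"))))                # False: not at every s in L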
Now, to pick out a particular Σ-state through a sentence ϕ, let Uniq Σ (ϕ)
be the set
Uniq Σ (ϕ) := {✸(ϕ ∧ ψ) ⊃ ✷(ϕ ⊃ ψ) | ψ ∈ sen(Σ)}
of implications ✸(ϕ ∧ ψ) ⊃ ✷(ϕ ⊃ ψ) ensuring that if ψ should ever
occur with ϕ, it always occurs with ϕ. Since trace equivalence is indiscernibility with respect to sen(Σ), it follows that the sentences ψ
appearing in UniqΣ(ϕ) can be restricted to those of the form ⟨s⟩⊤ for
s ∈ Σ∗ without changing the Σ-models of Uniq Σ (ϕ), and that
(†) for any Σ-model L of Uniq Σ (ϕ), and s, s′ ∈ L,
if Ls |=Σ ϕ and Ls′ |=Σ ϕ then Ls = Ls′ .
If we are to introduce an attribute av to name a particular value v
through the sentence ⟨av⟩⊤, then we must restrict our (Σ ∪ {av})-states
to (Σ ∪ {av})-models of UniqΣ∪{av}(⟨av⟩⊤). An attribute a might also
be introduced to name a type that applies to more than one (Σ ∪ {a})-state, implicating a (Σ ∪ {a})-state that fails to satisfy some sentence in
UniqΣ∪{a}(⟨a⟩⊤). There is a curious twist here on treatments of “identity and mere likeness” (F&V, page 29) and re-entrancy (connected
with a feature path s that appears in the present set-up as a subscript
in Ls and inside a modal operator in ⟨s⟩ϕ). Σ-states L and L′ can be
distinct only if some sentence in sen(Σ) differentiates them (shifting, as
it were, the burden of proof from identification to differentiation, and
suggesting refinements of identity through expansions of Σ).
Additional attributes may serve purposes other than refining discernibility. For example, they may provide representations of sentences
in sen ✸ (Σ) as follows. Given a subset Φ of sen ✸ (Σ), a sentence ϕ ∈
sen ✸ (Σ), and a string s ∈ Σ∗ , let us agree that s (Σ, Φ)-represents ϕ if
every Σ-model of Φ satisfies
✷(ϕ ≡ ⟨s⟩⊤)
where ϕ ≡ ψ is (ϕ ⊃ ψ) ∧ (ψ ⊃ ϕ), and consequently, for any Σ-state
L,
L |=Σ ✷(ϕ ≡ ψ) ⇐⇒ (∀s ∈ L)(Ls |=Σ ϕ ⇐⇒ Ls |=Σ ψ).
Because we can build ϕ with the connectives ¬ and ∨, we cannot expect there to be a string that (Σ, ∅)-represents ϕ. But we can always
introduce an attribute aϕ ∉ Σ and set Φ to {✷(ϕ ≡ ⟨aϕ⟩⊤)} so that
aϕ (Σ ∪ {aϕ }, Φ)-represents ϕ. And we can put together attributes aϕ
and aψ that (Σ, Φ)-represent ϕ and ψ respectively, as Σ-models of Φ
satisfy
✷((ϕ ∧ ψ) ≡ (⟨aϕ⟩⊤ ∧ ⟨aψ⟩⊤)).
We can then avoid the addition of aϕ∧ψ , provided we generalize our
notion of representation to a language L̂ ⊆ Σ∗ as follows. We say L̂
(Σ, Φ)-represents ϕ if for every Σ-model L of Φ and s ∈ L,
Ls |=Σ ϕ ⇐⇒ (∀s′ ∈ L̂) Ls |=Σ ⟨s′⟩⊤
        ⇐⇒ L̂ ⊆ Ls     (23.4)
Clearly, a string s (Σ, Φ)-represents ϕ iff the singleton language {s}
(Σ, Φ)-represents ϕ. But why should we care about representing sentences by languages?
Table 1 at the beginning of the present section mentions not only
directed graphs G and G′ but also constraints C. Directed graphs are
formulated here as Σ-states (models), and constraints as subsets of
sen ✸ (Σ). A Σ-state can be viewed as a token, and a sentence ϕ in
sen ✸ (Σ) as the type
Mod Σ (ϕ) := {L ∈ Mod (Σ) | L |=Σ ϕ}
of Σ-states satisfying ϕ. A set Φ ⊆ sen✸(Σ) of sentences amounts to
the conjunction ⋀Φ specifying the type

ModΣ(Φ) := ⋂ϕ∈Φ ModΣ(ϕ)
of Σ-states satisfying every sentence in Φ. Inclusion ⊆ between sets
of strings over Σ in (23.4) is easily confused with that between sets
Mod Σ (ϕ) and Mod Σ (ψ) of such sets
Mod Σ (ϕ) ⊆ Mod Σ (ψ) ⇐⇒ (∀L ∈ Mod (Σ)) L |=Σ ϕ ⊃ ψ
signifying an entailment from ϕ to ψ (and reversing the direction in
(23.4) from the less informative L̂ to the more informative Ls ). Converting a sentence ϕ to a Σ-state that (Σ, Φ)-represents it requires a
set Φ of constraints that we can find in, if necessary, an expansion of Σ.
Resorting to Φ as {✷(ϕ ≡ ⟨aϕ⟩⊤)} with aϕ thrown into Σ is perhaps
too easy, shoving all the work over to Φ. But there is surely a role for
Φ, since L can only (Σ, ∅)-represent a sentence with the same Σ-models
as {⟨s⟩⊤ | s ∈ L}, leaving out many sentences formed with negation ¬
and disjunction ∨. The models of a sentence ϕ that a language (Σ, ∅)-represents are closed under inclusion ⊆
(∀L ∈ Mod Σ (ϕ))(∀L′ ∈ Mod (Σ)) L ⊆ L′ implies L′ |=Σ ϕ
and intersection ∩
(∀L ∈ Mod Σ (ϕ))(∀L′ ∈ Mod Σ (ϕ)) L ∩ L′ |=Σ ϕ.
But closure under inclusion fails for the negation ¬⟨a⟩⊤, and closure under intersection fails for the disjunction ⟨a⟩⊤ ∨ ⟨a′⟩⊤ (with two
different ⊆-minimal models, for a ≠ a′).
As a binary operation on directed graphs, unification in F&V is
defined on Σ-states, and, pace Blackburn 1993, not on sentences (in
terms of the connective ∧). The constraints determining when two directed graphs are unifiable do, however, bring in sen✸(Σ), as does
talk of negative and disjunctive features inasmuch as these involve
the sen ✸ (Σ)-connectives ¬ and ∨. Evidently, a mix of Σ-states and
Σ-sentences is required. Accordingly, let us pair Σ with a language
T ⊆ Σ∗ , revising Mod (Σ) to
Mod (Σ, T ) := {L ∈ Mod (Σ) | T ⊆ L}
and Mod Σ (Φ), for Φ ⊆ sen ✸ (Σ), to
Mod Σ,T (Φ) := {L ∈ Mod Σ (Φ) | T ⊆ L}.
Then relative to constraints Φ, we can analyze the unification of Σ-states L and L′ in terms of ModΣ,L∪L′(Φ), which may be empty even
if neither ModΣ,L(Φ) nor ModΣ,L′(Φ) is, accounting for the partiality
of unification

L and L′ are unifiable relative to Φ ⇐⇒ ModΣ,L∪L′(Φ) ≠ ∅.
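A minimal sketch of this partiality, with constraints coded directly as Python predicates on a Σ-state (standing in for sentences of sen✸(Σ); the states and the no-clash constraint are invented): unification returns the union of trace sets when that union meets every constraint, in which case ModΣ,L∪L′(Φ) is certainly non-empty, and fails otherwise.

    # Unification of Sigma-states as union of trace sets, relative to constraints
    # given here as predicates on a Sigma-state (e.g. "never both num sg and num pl").
    def no_sg_pl_clash(L):
        return not (("num", "sg") in L and ("num", "pl") in L)

    def unify(L1, L2, Phi):
        """L1 ∪ L2 if it meets every constraint in Phi, else None (unification fails)."""
        L = L1 | L2                 # a union of prefix-closed sets is prefix-closed
        return L if all(constraint(L) for constraint in Phi) else None

    L1 = {(), ("num",), ("num", "sg")}
    L2 = {(), ("num",), ("num", "pl")}
    L3 = {(), ("per",), ("per", "3")}

    print(unify(L1, L3, [no_sg_pl_clash]) is not None)   # True:  unifiable
    print(unify(L1, L2, [no_sg_pl_clash]))               # None:  sg/pl clash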
Negation and disjunction in features may (or may not) require expanding Σ with a¬ϕ and aϕ∨ψ, and Φ with constraints

✷(ϕ′ ≡ ⟨aϕ′⟩⊤)     for ϕ′ ∈ {¬ϕ, ϕ ∨ ψ}.
Fixing some large set A to which all the required attributes belong, we
let Σ vary over the set Fin(A) of finite subsets of A, and note
Fact 1 Let Σ′ ∈ Fin(A), Σ ⊆ Σ′ , ϕ ∈ sen(Σ), and L′ ∈ Mod (Σ′ ).
Then L′ ∩ Σ∗ ∈ Mod(Σ) and

L′ |=Σ′ ϕ ⇐⇒ L′ ∩ Σ∗ |=Σ ϕ

and moreover, for every s ∈ L′ ∩ Σ∗, (L′ ∩ Σ∗)s ∈ Mod(Σ) and

L′ |=Σ′ ⟨s⟩ϕ ⇐⇒ (L′ ∩ Σ∗)s |=Σ ϕ.
The first part of Fact 1 says that the attributes that matter in satisfying
ϕ are only those that appear in ϕ,2 while the second part interprets
the modal operator ⟨s⟩ against Σ′-states L′ under the presupposition
that s belongs to L′ .
2 Fact 1 leaves ✸ out of ϕ precisely because ✸ does not identify the attributes
relevant to the satisfaction of sentences built with ✸. To bring ✸ into ϕ in Fact 1,
we can add subscripts X ranging over subsets of Σ to make the pertinent attributes
in ✸X explicit, with
q |=δ ✸Xψ ⇐⇒ (∃s ∈ X∗) q |=δ ⟨s⟩ψ
(Fernando 2016).
23.2.3 The Grothendieck Construction and an Institution
Some category-theoretic structure lurking in Fact 1 will resurface in
section 3 under a different guise and is worth spelling out. We fix a
large set A of attributes, and for each finite subset Σ ∈ Fin(A) of
A, turn the set Mod (Σ) of Σ-states into a category Q(Σ) as follows. A
Q(Σ)-morphism from Σ-state L to Σ-state L′ is a pair (L, s) with s ∈ L
and Ls = L′ . Q(Σ)-morphisms compose by concatenating strings
(L, s); (Ls , s′ ) := (L, ss′ )
and (L, ǫ) is the identity morphism for L. Whenever Σ ⊆ Σ′ ∈ Fin(A),
we define the functor Q(Σ′ , Σ) : Q(Σ′ ) → Q(Σ) from Q(Σ′ ) to Q(Σ)
mapping
- a Σ′ -state L′ to the Σ-state L′ ∩ Σ∗ , and
- a Q(Σ′ )-morphism (L′ , s) to the Q(Σ)-morphism (L′ ∩ Σ∗ , πΣ (s))
where πΣ(s) is the longest prefix of s in Σ∗

πΣ(ǫ) := ǫ
πΣ(as) := a πΣ(s) if a ∈ Σ, and ǫ otherwise.
Construing Fin(A) as a category with morphisms given by inclusion
⊆, the foregoing defines a contravariant functor Q : Fin(A)op → Cat
into the category Cat of small categories. The Grothendieck construction (e.g., Tarlecki, Burstall & Goguen 1991) applied to Q yields the
category ∫Q where
- an object is a pair (Σ, L) ∈ Fin(A) × Mod (Σ), and
- a morphism from (Σ′ , L′ ) to (Σ, L) is a pair
((Σ′ , Σ), (L′′ , s))
of a Fin(A)op -morphism (Σ′ , Σ) and a Q(Σ)-morphism (L′′ , s) such
that
L′′ = L′ ∩ Σ∗ and L = L′′s .
Reversing the morphisms in ∫Q for the category Sign of signatures
(Σ, L), we define two functors from Sign, one covariant and the other
contravariant
(i) sen : Sign → Set with sen(Σ, L) := sen(Σ) and

    sen((Σ, Σ′), (L′′, s)) : ϕ ↦ ⟨s⟩ϕ

(ii) Mod : Signop → Cat where the set Mod(Σ, L) of Σ-states that
⊆-contain L is turned into a full subcategory of Q(Σ), and

    Mod((Σ′, Σ), (L′′, s)) : L̂ ↦ (L̂ ∩ Σ∗)s.
To build an institution (Goguen & Burstall 1992) from Sign, sen, and
Mod , it remains to form, for every signature (Σ, L), a relation |=Σ,L
by intersecting |=Σ with Mod (Σ, L) × sen(Σ). Fact 1 is essentially the
Satisfaction Condition characterizing institutions
for every signature (Σ′ , L′ ), subset Σ of Σ′ , string s ∈ L′ ∩ Σ∗ , sentence
ϕ ∈ sen(Σ), and L̂ ∈ Mod (Σ′ , L′ ),
L̂ |=Σ′,L′ ⟨s⟩ϕ ⇐⇒ (L̂ ∩ Σ∗)s |=Σ,L ϕ
where the subscript L above is short for (L′ ∩ Σ∗ )s .
Introduced by Goguen and Burstall to cope with the proliferation of
logical systems in computer science, the notion of an institution has
attracted considerable attention and found numerous applications (e.g.
Diaconescu 2012, Kutz et al. 2010). Under Fact 1, features and values
can be seen as part of that body of work.
23.3 Time for and from Running Automata
It is one thing to encode a linguistic resource as a feature structure
equivalent to a finite automaton. It is quite another matter to understand the use of such a resource as the use of a finite automaton. To use
a finite automaton is (arguably first and foremost) to run it, accepting
strings that end in a final/accepting state. But such runs take place in
isolation, whereas it is only in combination with other resources that
the encoding or use of a linguistic resource is interesting. The whole
point of the category-theoretic approach from the previous section is
to relate different feature structures. Similarly, the present section considers runs of an automaton not so much in isolation as in combination
with other automata, constructing a notion of time from such runs.
A simple way to superpose runs of two finite automata is defined in
§3.1, and related to the approximation of Priorean temporal models
in §3.2 by strings constructed from temporal propositions. We adopt
the custom from Artificial Intelligence of referring to temporal propositions as fluents. We fix some large set Θ of fluents much as we fixed
a large set A of attributes in the previous section. The plan roughly is
to fill out Table 2, embracing Leibniz’s Identity of Indiscernibles (as in
section 2), with granularity given by a finite subset A of Θ (analogous
to Σ ∈ Fin(A) in section 2) to form strings over the alphabet 2^A of
subsets of A. The ⊆-larger the subset A, the more refined the A-models
and the more expressive the A-sentences can be.
                     section 2           section 3
information merge    unify graphs        superpose strings
large set            A of attributes     Θ of fluents
grain/signature      Σ ∈ Fin(A)          A ∈ Fin(Θ)
model                language over Σ     string over 2^A
sentence             Hennessy-Milner     Monadic Second-Order

Table 2
Helpful examples for orientation are provided by representations of a
calendar year at various granularities. The set A = {Jan, Feb, . . ., Dec}
of months suggests the string
sA := [Jan][Feb] · · · [Dec]

of length 12. Enlarging A with days d1, d2, . . ., d31

A′ := A ∪ {d1, d2, . . ., d31}

refines sA to the string

sA′ := [Jan,d1][Jan,d2] · · · [Jan,d31][Feb,d1] · · · [Dec,d31]

of length 366 for a leap year. We draw boxes (rendered here with square brackets, instead of the usual curly braces { and }) around sets qua symbols to suggest a film strip. A change in A can cause a box to split (much like hairs in Shan 2015), as [Jan] in sA does (30 times) on adding days

[Jan] ❀ [Jan,d1][Jan,d2] · · · [Jan,d31]

in sA′. Similarly, a common Reichenbachian account of the progressive puts a reference time R inside the event time E, splitting E into 3 boxes

[E] ❀ [E][E,R][E]

(one before, one simultaneous, and one after R).
examples in tense and aspect are taken up at length in Fernando 2015.
The aim of the present section is to link that work with the previous
section through the notion of an institution. The hope is that this might
contribute to understanding the use of linguistic resources encoded as
feature structures in terms of runs of finite automata — runs that give
rise to time at bounded granularities.
23.3.1 From Superposition to Reducts and MSO
Given two equally long strings s = α1 · · · αn and s′ = α′1 · · · α′n of sets
αi and α′i, let us define the superposition s & s′ of s and s′ to be the
string obtained by their componentwise unions αi ∪ α′i
α1 · · · αn & α′1 · · · α′n := (α1 ∪ α′1 ) · · · (αn ∪ α′n ).
For example,

[E][E][E] & [ ][R][ ] = [E][E,R][E].
Extending the operation to sets L and L′ of strings of sets, the superposition L&L′ of L and L′ is the set of superpositions of strings of the
same length from L and L′
L & L′ := {s&s′ | (s, s′ ) ∈ L × L′ and length(s) = length(s′ )}
allowing us to conflate a string s with its singleton language {s} (making s&s′ = ∅ in case s and s′ differ in length). Given finite automata
accepting L and L′ , the usual product construction on finite automata
for their intersection L ∩ L′ (e.g. Hopcroft & Ullman 1979) can be
adjusted to combine transitions →L for L and →L′ for L′ to form non-deterministic transitions

(q, q′) −α∪α′→ (r, r′) ⇐⇒ q −α→L r and q′ −α′→L′ r′
for L&L′ in lockstep but with labels that may differ. We will loosen
the lockstep requirement in §3.2, but first consider constraints that we
might impose on superposition (analogous to C on unifyC (G, G′ ) in
section 2).
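Superposition itself is a one-line operation on strings of sets; the sketch below (with invented fluent names, boxes as frozensets) extends it to languages by pairing strings of equal length, reproducing the [E][E,R][E] example above.

    # Superposition of strings of sets (frozensets as boxes) and of languages.
    def sup(s, t):
        """Componentwise union of two equally long strings of sets."""
        return tuple(a | b for a, b in zip(s, t))

    def sup_lang(L1, L2):
        """L1 & L2: superpositions of equally long strings from L1 and L2."""
        return {sup(s, t) for s in L1 for t in L2 if len(s) == len(t)}

    E, R = frozenset({"E"}), frozenset({"R"})
    empty = frozenset()

    s = (E, E, E)                   # [E][E][E]
    t = (empty, R, empty)           # [ ][R][ ]
    print(sup(s, t) == (E, E | R, E))                    # True: [E][E,R][E]
    print(sup_lang({s}, {t, (R,)}))                      # only the length-3 pair superposes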
For example, we may wish to require that a fluent a is never followed
by a fluent b, as expressed by the predicate logic formula
(∀x)(∀y)(Pa (x) ∧ S(x, y) ⊃ ¬Pb (y))
where x and y range over string positions, and
S(x, y) says: next after position x is y
while for every fluent a ∈ Θ,
Pa (x) says: a occurs at position x.
More precisely, given a string s = α1 · · · αn of n sets αi of fluents, we
can interpret S as the binary relation

Sn := {(1, 2), (2, 3), . . . , (n − 1, n)}

on the set

[n] := {1, 2, . . . , n}

of integers from 1 to n, and Pa as the subset

Pa^s := {i ∈ [n] | a ∈ αi}     (where s = α1 · · · αn)
of [n], for each fluent a. That is, a string s ∈ (2^Θ)n specifies a structure

Ms := ⟨[n], Sn, {Pa^s}a∈Θ⟩
against which to interpret predicate logic formulas built from S and
the Pa ’s such as the formulas ϕ of Monadic Second-Order Logic (MSO;
e.g. Libkin 2010) generated by the seven clauses
ϕ ::= S(x, y) | Pa(x) | X(x) | ϕ ∨ ϕ′ | ¬ϕ | ∃xϕ | ∃Xϕ
from three disjoint infinite sets Var 1 , Var 2 and Θ of first-order variables
x, y ∈ Var 1 , second-order variables X ∈ Var 2 , and fluents a ∈ Θ,
respectively. For any such MSO-formula ϕ, only finitely many fluents
may occur in ϕ, which we collect in ϕ’s vocabulary, voc(ϕ) ∈ Fin(Θ)
voc(S(x, y)) = voc(X(x)) = ∅
voc(Pa (x)) = {a}
voc(ϕ ∨ ϕ′ ) = voc(ϕ) ∪ voc(ϕ′ )
voc(¬ϕ) = voc(∃xϕ) = voc(∃Xϕ) = voc(ϕ).
An MSO-sentence is understood to be an MSO-formula in which all
variable occurrences are bound. For every A ∈ Fin(Θ), we put every
MSO sentence with vocabulary contained in A into the set MSO (A)
MSO (A) := {ϕ | ϕ is an MSO-sentence and voc(ϕ) ⊆ A}
and define a binary relation
|=A ⊆ (2^A)∗ × MSO(A)

between (2^A)∗ and MSO(A) in the usual Tarskian manner, associating
a string s ∈ (2^A)∗ with Ms. (Apologies for reusing the symbol |=.)
For any string s of sets of fluents, let the A-reduct ρA (s) of s be the
componentwise intersection of s with A
ρA (α1 · · · αn ) := (α1 ∩ A) · · · (αn ∩ A)
(so-called because ρA (s) is precisely the part of s needed to extract
from Ms its A-reduct ⟨[n], Sn, {Pa^s}a∈A⟩).
Fact 2 For all A ∈ Fin(Θ), ϕ ∈ MSO(A) and s ∈ (2^A)∗,

s |=A ϕ ⇐⇒ ρvoc(ϕ)(s) |=voc(ϕ) ϕ.
With Fact 2, the relations {|=A}A∈Fin(Θ) become an institution with
signature category Fin(Θ) provided we
(i) extend the map A ↦ MSO(A) to pairs (A, A′) such that A ⊆ A′ ∈
Fin(Θ), setting MSO(A, A′) to the inclusion MSO(A) ↪ MSO(A′)
mapping ϕ ∈ MSO(A) ⊆ MSO(A′) to itself, and
(ii) turn the map A ↦ (2^A)∗ into a contravariant functor M from
Fin(Θ) so that whenever A ⊆ A′ ∈ Fin(Θ), M(A′, A) : (2^A′)∗ →
(2^A)∗ is the restriction of ρA to (2^A′)∗

M(A′, A)(s) = ρA(s)     for all s ∈ (2^A′)∗.
Büchi’s theorem equating sentences in MSO(A) with regular languages
over A (e.g. Libkin 2010, page 124) holds also in the present set-up for
languages over 2^A (the advantage of 2^A over A being the availability
of reducts for Fact 2).
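On the string side, satisfaction and reducts are equally direct; the sketch below (fluents a, b and the sample string invented) checks the constraint above, that a is never followed by b, on Ms, and the verdict is unchanged after dropping fluents outside the constraint's vocabulary, as Fact 2 requires.

    # Strings over 2^Theta as tuples of frozensets; S and P_a read off as in M_s.
    def P(s, a):
        return {i for i, box in enumerate(s, start=1) if a in box}

    def never_followed_by(s, a, b):
        """(∀x)(∀y)(P_a(x) ∧ S(x,y) ⊃ ¬P_b(y)) on the structure M_s."""
        return all(not (a in s[i] and b in s[i + 1]) for i in range(len(s) - 1))

    def reduct(s, A):
        """rho_A(s): componentwise intersection with A."""
        return tuple(box & A for box in s)

    f = frozenset
    s = (f({"a"}), f({"a", "c"}), f({"c"}), f({"b"}))     # [a][a,c][c][b]

    print(P(s, "a"))                                      # {1, 2}
    print(never_followed_by(s, "a", "b"))                 # True: no a-box immediately before a b-box
    # Fact 2 in miniature: only voc = {a, b} matters for this constraint.
    print(never_followed_by(reduct(s, {"a", "b"}), "a", "b"))   # True, same verdict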
23.3.2 Compression, Branching and Superposition Modified
A string s ∈ (2^A)∗ is understood above to have granularity A. Variations in A are described in Fact 2 that preserve string length using
A-reducts. It is natural, however, to expect the length of a string to
grow with A, as hinted by the discussion above of

sA := [Jan][Feb] · · · [Dec]

and

sA′ := [Jan,d1][Jan,d2] · · · [Jan,d31][Feb,d1] · · · [Dec,d31].
Put the other way around, the A-reduct of sA′

ρA(sA′) = [Jan]^31 [Feb]^29 · · · [Dec]^31

has substrings such as [Jan]^31 which we might compress to [Jan] for

bc(ρA(sA′)) = [Jan][Feb] · · · [Dec] = sA
where for any string s, bc(s) compresses blocks α^n of n > 1 consecutive
occurrences in s of the same symbol α to a single α, leaving s otherwise
unchanged

bc(s) := bc(αs′)      if s = ααs′
         α bc(α′s′)   if s = αα′s′ with α ≠ α′
         s            otherwise.
To require that time progress only with change (discernible at some
bounded granularity A), let us work with strings α1 α2 · · · αn that are
stutter-free in that αi ≠ αi+1 for i from 1 to n − 1. That is,

a string s is stutter-free ⇐⇒ s = bc(s).
The restriction of bc to any finite alphabet is computable by a finite-state transducer, as are, for all A′ ∈ Fin(Θ) and A ⊆ A′, the composition ρA ; bc for bcA

bcA(s) := bc(ρA(s))     for s ∈ (2^A′)∗.
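Both bc and bcA are short functions; the sketch below (with an abbreviated two-day, two-month calendar standing in for the example above) confirms that compressing the month-reduct of the finer string recovers the coarser one.

    # Block compression bc and the map bc_A = bc . rho_A, on tuples of frozensets.
    def bc(s):
        out = []
        for box in s:
            if not out or out[-1] != box:        # keep a box only when it differs from its predecessor
                out.append(box)
        return tuple(out)

    def reduct(s, A):
        return tuple(box & A for box in s)

    def bc_A(s, A):
        return bc(reduct(s, A))

    f = frozenset
    # A two-month toy calendar at day granularity (2 days per month for brevity).
    s_fine = (f({"Jan", "d1"}), f({"Jan", "d2"}), f({"Feb", "d1"}), f({"Feb", "d2"}))
    months = {"Jan", "Feb"}

    print(bc_A(s_fine, months) == (f({"Jan"}), f({"Feb"})))   # True: [Jan][Feb] recovered
    print(bc((f({"E"}), f({"E"}), f({"E"}))))                 # ([E],) i.e. stutter removed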
Without the compression bc in bcA , we are left with the map ρA that
leaves the ontology intact (insofar as the domain of an MSO-model
is given by the string length), whilst restricting the vocabulary (for A-reducts). The institution described by Fact 2 can be adjusted to another
institution in which
- the models are stutter-free strings3
- the reducts ρA are replaced by bcA , and
- the satisfaction relations |=′A are given by explicitly referring to the
sentence’s vocabulary
s |=′A ϕ ⇐⇒ bcvoc(ϕ) (s) |=voc(ϕ) ϕ.
Compressing strings via bcA allows us to lengthen the strings by inversion. The inverse limit IL(Θ, bc) of Θ, bc consists of functions a :
Fin(Θ) → Fin(Θ)∗ that respect the projections bcA
a(A) = bcA (a(A′ )) whenever A ⊆ A′ ∈ Fin(Θ).
The prefix relation on strings
s prefix s′ ⇐⇒ s′ = sŝ for some ŝ
lifts to maps a and a′ in IL(Θ, bc) by universal quantification for an
irreflexive relation
a ≺ a′ ⇐⇒ a ≠ a′ and (∀A ∈ Fin(Θ)) a(A) prefix a′(A)
that is tree-like on IL(Θ, bc) — i.e., transitive and left linear: for every
a ∈ IL(Θ, bc), and all a1 ≺ a and a2 ≺ a,
a1 ≺ a2 or a2 ≺ a1 or a2 = a1 .
In other words, time branches at the inverse limit IL(Θ, bc).
Even if the strings we are interested in are stutter-free, strings that
are not stutter-free can be useful. For instance, to relax the requirement
of L & L′ that L and L′ run in lockstep, let us collect the strings bc-equivalent to a string in L in

L^bc := {s ∈ (2^Θ)∗ | (∃s′ ∈ L) bc(s) = bc(s′)}

and define the bc-superposition L &bc L′ of L and L′ to be the image
under bc of the superposition of L^bc and L′^bc

L &bc L′ := {bc(s) | s ∈ L^bc & L′^bc}
(a regular language, if L and L′ are). Then for any two fluents a, a′ ∈ Θ,
the bc-superposition [a] &bc [a′] is the set

{bc(s) | s ∈ (2^{a,a′})∗ and bc{a}(s) = [a] and bc{a′}(s) = [a′]}

consisting of 13 strings, one for each interval relation in Allen 1983.
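One way to see the count of 13 (an independent brute-force check, not an implementation of &bc itself): enumerate the stutter-free strings over 2^{a,a′} in which each fluent holds over a single non-empty contiguous block of boxes and no initial or final box is empty.

    # Enumerate stutter-free strings over 2^{a,a'} in which each fluent holds over one
    # contiguous non-empty block and no first/last box is empty: 13 strings, one per
    # Allen relation ("b" plays the role of a').
    from itertools import product

    f = frozenset
    boxes = [f(), f({"a"}), f({"b"}), f({"a", "b"})]

    def contiguous(s, x):
        pos = [i for i, box in enumerate(s) if x in box]
        return pos != [] and pos == list(range(pos[0], pos[-1] + 1))

    def ok(s):
        return (all(s[i] != s[i + 1] for i in range(len(s) - 1))   # stutter-free
                and s[0] and s[-1]                                  # no empty end boxes
                and contiguous(s, "a") and contiguous(s, "b"))

    allen = [s for n in range(1, 4)          # longer strings cannot stay stutter-free here
             for s in product(boxes, repeat=n) if ok(s)]
    print(len(allen))                        # 13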
More generally, for any finite set A = {a1, . . . , an} ∈ Fin(Θ) of fluents,
the bc-superposition

[a1] &bc · · · &bc [an]

represents the event structures over A in the sense of Russell-Wiener
(Kamp & Reyle 1993, Fernando 2015).

3 Apart from applying bc, a string can also be made stutter-free by superposition
with ([ ][tic])∗ ([ ] + ǫ) for some fresh fluent tic. The crucial point is that stutter-freeness ensures the vocabulary is large enough to express the distinctions of interest
(lengthening a string if necessary).
23.3.3 Taking Stock
What are we to make of the difference between the institutions in sections 2 and 3? At its simplest, the difference is between, on the one
hand, a program or automaton (as a piece of code) and, on the other
hand, an execution or run of it — a modern incarnation of the Aristotelian dichotomy between potentiality and actuality. Focusing on applications to natural language semantics, Table 3 lists contrasts based
not only on the widespread encoding of linguistic resources as feature
structures (including frames), but also on the notion defended in Carlson 1995 that the truth of a generic statement rests not on “the episodic
instances but rather the causal forces behind those instances” (page
225), as well as the distinction between individual-level and stage-level
predicates (Carlson 1977).
section 2            section 3
automata             run
resource             use
generic              episodic
causal               temporal
force                event
universal            particular/instance
individual-level     stage-level

Table 3
Much work remains to flesh out Table 3, and win over the skeptical
reader. At stake in Table 3 is justification for viewing the directed
graphs in F&V as finite automata.4

4 My thanks to Cleo Condoravdi for inviting me to contribute to this Festschrift,
András Kornai for feedback on this paper, and, not to forget, Lauri Karttunen for
setting standards towards which to aspire.

References

Allen, James F. 1983. Maintaining knowledge about temporal intervals. Communications of the ACM 26:832–843.
Barwise, Jon. 1974. Axioms for abstract model theory. Annals of Mathematical Logic 7:221–265.
Barwise, Jon and Larry Moss. 1996. Vicious Circles: On the Mathematics of
Non-Wellfounded Phenomena. CSLI.
Blackburn, Patrick. 1993. Modal logic and attribute value structures. In
M. de Rijke, ed., Diamonds and Defaults, pages 19–65. Kluwer.
Brzozowski, Janusz A. 1964. Derivatives of regular expressions. Journal of
the ACM 11:481–494.
Carlson, Greg N. 1977. A unified analysis of the English bare plural. Linguistics & Philosophy 1:413–458.
Carlson, Greg N. 1995. Truth conditions of generic sentences: Two contrasting views. In The Generic Book , pages 224–237. University of Chicago
Press.
Cooper, Robin. 2012. Type theory and semantics in flux. In R. Kempson,
T. Fernando, and N. Asher, eds., Philosophy of Linguistics, pages 271–323.
North-Holland.
Diaconescu, Răzvan. 2012. Three decades of institution theory. In J.-Y.
Beziau, ed., Universal Logic: An Anthology, pages 309–322. Springer.
Fernando, Tim. 2015. The semantics of tense and aspect: A finite-state
perspective. In S. Lappin and C. Fox, eds., The Handbook of Contemporary
Semantic Theory, Second Edition, pages 203–236. Wiley.
Fernando, Tim. 2016. Types from frames as finite automata. In A. Foret, G. Morrill, R. Muskens, and R. Osswald, eds., Formal Grammar 2015/2016, pages 19–40. Springer.
Fillmore, Charles J. 1982. Frame semantics. In Linguistics in the Morning
Calm, pages 111–137. Hanshin Publishing Co.
Goguen, Joseph and Rod Burstall. 1992. Institutions: Abstract model theory
for specification and programming. Journal of the ACM 39:95–146.
Hennessy, Matthew and Robin Milner. 1985. Algebraic laws for nondeterminism and concurrency. Journal of the ACM 32:137–161.
Hopcroft, John and Jeffrey Ullman. 1979. Introduction to Automata Theory,
Languages, and Computation. Addison-Wesley.
Kamp, Hans and Uwe Reyle. 1993. From Discourse to Logic. Kluwer.
Karttunen, Lauri. 1984. Features and values (F&V). In COLING ’84 , pages
28–33.
Karttunen, Lauri. 2007. Word play. Computational Linguistics 33:443–467.
Kornai, András. 2017. Truth or dare. This volume.
Kutz, Oliver, Till Mossakowski, and Dominik Lücke. 2010. Carnap, Goguen,
and the hyperontologies: Logical pluralism and heterogeneous structuring
in ontology design. Logica Universalis 4:255–333.
Libkin, Leonid. 2010. Elements of Finite Model Theory. Springer.
Shan, Chung-chieh. 2015. Splitting hairs. In Proceedings of the 20th Amsterdam Colloquium, pages 363–367.
Tarlecki, Andrzej, Rod Burstall, and Joseph Goguen. 1991. Some fundamental algebraic tools for the semantics of computation. Part 3: Indexed categories. Theoretical Computer Science pages 239–264.