Papers in Historical Phonology
http://journals.ed.ac.uk/pihph
ISSN 2399-6714
Volume 7, 44–80
DOI: 10.2218/pihph.7.2022.7405
Licensed under a Creative Commons Attribution 4.0 International License
What are cognates?
MARIEKE MEELEN
University of Cambridge
NATHAN W. HILL
Trinity College Dublin
HANNES FELLNER
University of Vienna
Abstract
The popularity of computational methods in historical linguistics has
primarily been motivated by mere access to the new methods themselves,
rather than by looking for tools to solve problems. Investigators have looked
for problems with which to showcase their tools. This dynamic is one reason
why eye-catching but long-solved problems, such as the homeland of the
Indo-Europeans (Gray & Atkinson 2003) have received more attention than
genuinely unsolved or controversial questions, such as how to incorporate the Hittite ḫi-conjugation into an understanding of the Indo-European
verbal system (Jasanoff 2003). One assumption of Bayesian methods is that
cognacy can be conceptualized as binary. Although this is how historical
linguists themselves often speak, it is not how they work. The goal of this
article is to more precisely delimit what is meant when we call two words
cognate, to emphasize that this is not a binary relation, but to suggest that
this relationship can still be modeled formally.
1
Introduction
The idea of ‘cognates’ is fundamental to research in historical linguistics, both that carried out in a traditional framework and that making use
of recent computational methodologies (cf. Labat & Lefever 2019). The
term ‘cognate’ can be used both for languages and for linguistic material,
usually words. According to Crystal (2008, 83), for instance, a cognate
is a ‘language or a linguistic form which is historically derived from the
same source as another language/form’. This definition, along with similar
formulations (e.g. Trask 2015, von Mengden 2008, Bynon 1977 etc.) raises
many questions. To say ‘historically derived’ assumes a method, i.e. the
‘Comparative Method’, to reconstruct the sounds of the forms in and the
overall phonological system of the proto-language, from which the cognate
derives. Nonetheless, systematic, implementable accounts of the Comparative Method are scarce; those coming close generally start with ‘compare
words from cognate languages that have similar meanings’. In both the
traditional and computational contexts the means of deciding whether
two words are cognates or not remains largely opaque; there are no clear
and explicit heuristics for determining the cognacy of two words. Basic
textbooks teach via anecdotal examples (Anttila 1989, Campbell 2004,
Hock 1991) and more advanced methodological works have a conceptual
or theoretical focus that is not aimed at providing a practically implementable formalization of methodology as a series of steps (Hale 2007,
Hoenigswald 1960). In our view, one reason that the historical linguistics
of non-Indo-European languages lags behind work on the Indo-European
family is precisely because so much of Indo-European practice remains
tacit, implicitly absorbed as disciplinary norms, and consequently not
communicated to those working elsewhere (Schwink 1994, 29, Fellner &
Hill 2019a). Remarkably, in computational phylogenetic contexts, cognacy
is typically left to human experts (e.g. Gray & Jordan 2000, Chang, Cathcart,
Hall & Garrett 2015).
To rigorously formalize the comparative method would yield two paramount benefits: (1) to better teach the method to practitioners, particularly
those working outside of Indo-European, (2) to potentially automate certain stages of the workflow and thereby spare the time of researchers to
concentrate on the conceptually more challenging steps. However, before
we can formalize the workflow of the comparative method it is necessary
to formalize what is meant by the key concepts that this workflow makes
use of; this paper focuses on determining what is meant by the relationship
of cognacy between two words or morphemes. We propose a systematic
method for diagnosing cognates and a practical workflow that is easy to
implement. We present:
• a workflow for establishing cognate sets (section 3)
• a typology of cognates and a hierarchy of cognacy (section 4)
• diagnostics for categorising cognates (section 5)
We begin by exploring the boundaries of what may legitimately be called
cognates (section 2) by examining two pairs of words: the famous comparison of Greek θεός and Latin deus, a pair that looks related but is not
(section 2.1), and the comparison of English tooth and Old Irish dofúaid
‘he has eaten’, words distant in form and not close in meaning but that
descend from the same Indo-European root (section 2.2). Next we discuss
the value of clear diagnostics in other areas of linguistics, demonstrating
that there are currently no diagnostics to determine the limits of cognacy
(section 2.3). The bulk of the paper establishes a typology of cognacy
(section 4), dividing individual cases among strict (section 4.1), medium
(section 4.2), and weak (section 4.3). The subsequent section discusses
the challenges to explicitly diagnosing the different types of cognacy, also
proposing some solutions (section 5). We end the paper with some preliminary conclusions, including directions for future research (section 6).
2
The limits of cognacy
In this section we explore the boundaries of what may be called cognates
by looking at two examples. The main heuristic here is the phonological
form, measured by the regularity of sound laws: words or morphemes are
cognates if, and only if, their phonology can be reconstructed following
regular sound changes back to the proto-language (or whichever intermediate stage at which two cognate languages started to diverge). In addition,
the meaning and function of the two forms should be the same or at least
similar. This second heuristic is less black-and-white and will be discussed
in more detail in sections 5.2 and 5.3 below.
The first example is the famous comparison of Greek θεός and Latin
deus, which looks obvious but turns out to be false (section 2.1). The
second, the comparison of English tooth and Old Irish dofúaid ‘he has
eaten’ is prima facie ridiculous but turns out to be genuine (section 2.2).
Consideration of these two cases sheds light on the fact that ‘cognacy’ is
not once and for all, but instead that words come in and out of cognacy as
scientific understanding deepens.
2.1
Greek θεός ‘god’ and Latin deus ‘god’: worse than it looks
The comparison of Latin deus and Greek θεός, so familiar from the handbooks (Fortson 2010, 25), is the Paradebeispiel for the methodological
principle that form trumps meaning in etymology. The two words look
similar and mean the same thing, but an initial Latin d- should correspond
to Greek δ- (e.g. Lat. domus ‘house’ and Gk. δόμος ‘house’) and not to θ-.
The comparison of θεός and deus dates to before Franz Bopp (1791-1867);
Bopp’s student, August Friedrich Pott, included both deus and θεός as cognates of Sanskrit deva ‘god’ in his Etymologische Forschungen auf dem
Gebiete der indogermanischen Sprachen (1833-6), but noted the phonological irregularity (Davies 1998, 173-174). Scholars such as Theodor Benfey
(1837) and Georg Curtius (1862) slowly brought the opinio communis to
reject this proposal, with Max Müller holding on to the defunct comparison
as late as 1875.
The power of the comparative method is to show that obvious looking
cognates such as these are in fact impossible from the point of view of
historical phonology. But with the benefit of hindsight it is easy to forget
that in its day the comparison was not foolish, but necessary; in less well
studied language families similarly plausible, but unjustifiable comparisons are rife. Consider from the Trans-Himalayan family the comparison
of Old Tibetan dmyig ‘eye’ with Burmese myak, and Chinese 目 mjuwk <
*C.muk. The vowel correspondence i : a : u is unique to this example, but
to dismiss the cognacy of the items at the current state of research would
be premature.
2.2
English tooth and Old Irish dofúaid ‘he has eaten’: better than it
looks
At the opposite end of the conceptual spectrum from the ‘god’ example
is the case where two words do have a shared history, but the naïve proposal
of their common ancestry would be unwarranted. The initial t- in English
tooth and the final -d in Old Irish dofúaid ‘he has eaten’ both continue the
*d of the Indo-European root *h₁ed- ‘eat’. On the one hand, Old Irish dofúaid
‘he has eaten’ is a suppletive third person singular perfect deuterotonic
stem (see Thurneysen 1946, 27-29, 351-352) of ithid ‘eat’. A transponant
of dofúaid for Proto-Insular-Celtic would be *dī-wo-ād, in turn from Proto-Celtic *dē-uɸo-ād-e, and, if ultimately projected back into Indo-European,
*dē + *upo + *h₁e-h₁od-e, two preverbs before a third person singular perfect. The form *h₁e-h₁od-e is itself straightforwardly the singular perfect
of the root *h₁ed. On the other hand, English tooth is from Old English tōþ,
from Proto-Germanic *tanþs (cf. Old Saxon tand, Dutch tand, Go. tunþus),
from Proto-Indo-European *h₁dónts; cognates include Latin dent-, Aeolic
Greek ἔδοντ- (see Ringe 2006, 70), Old Irish dét, and Lithuanian dantìs.
The stem *h₁dónt- is itself straightforwardly the active participle of the
root *h₁ed ‘eat’. Nonetheless, it is not in accord with normal practice in
historical linguistics to regard tooth and dofúaid as cognate. Permitting
such examples would countenance all of the excesses of the long rangers.
From the ontological perspective such cases are clearly cognate, but from
the epistemological perspective this loose a notion of cognacy has little
practical methodological value, unless we define a clear hierarchical typology of cognacy (for this see section 4). The knowledge that these words
are in any sense related is the end product of a vast amount of research; it
is not the starting point for an investigation.
2.3
The benefits of clear diagnostics
As in other sciences, a linguistic study often starts with an observation
and an attempt to accurately describe the object of study: a sound, word,
sentence etc. in one or more language(s). Ideally we then move beyond
the description to try and explain how and why we observe the patterns,
constructions, forms in the way we do. In the introduction to a collection
of papers Diagnosing Syntax, Cheng & Corver (2013) compare the study
of syntax and the discovery of ‘underlying’ or ‘hidden’ structures to the
work of physicians: the nature of the illness or disorder can be identified
based on a patient’s signs and symptoms. Similarly, a careful and rigorous
study of the properties and characteristics (i.e. the symptoms) of a syntactic phenomenon, for example, can identify it: ‘the nature of an object
or phenomenon is understood by means of the ability to discern relevant
features of that object or phenomenon’ (Cheng & Corver 2013, 1). In order
to diagnose a physical condition, physicians conduct a range of tests or diagnostic procedures (e.g. blood or urine tests). A good syntactician should
thus be a good diagnostician: capable of designing and consistently conducting the right tests to identify a phenomenon within a language or even
cross-linguistically. Within a language, these tests can be quite specific. In
a language like Dutch for example one can distinguish unaccusative and
unergative intransitive verbs¹ by testing which auxiliary they take in the
perfect, i.e. ‘be’ or ‘have’ respectively:
(1)
a. Hij is vertrokken.
   he be.3SG departed
   ‘He has departed.’ (‘be’ AUX: unaccusative)
b. Hij heeft gedanst.
   he have.3SG danced
   ‘He has danced.’ (‘have’ AUX: unergative)
¹ Unaccusative verbs are intransitive verbs whose subject is not considered to be the
‘semantic agent’ or ‘external argument’ (in generative grammar). This subject is therefore
structurally and semantically similar to the direct object or patient of a transitive verb,
e.g. arrive, die. They contrast with unergative intransitive verbs whose subject voluntarily
initiates the action, e.g. dance, run. See Alexiadou, Anagnostopoulou & Everaert (2004).
This, along with a number of other tests, helps identify the type of intransitivity of certain verbs in Dutch (and some other languages, e.g. German,
French, Italian, etc.). However, as is clear from the translation, the same
test will not work for English, because Present-Day English only has one
auxiliary for the perfect (‘have’). To diagnose unaccusative verbs in English,
other tests are needed, such as nominal modifiers or resultative adjuncts.
To give an example of the former, past participles of unaccusative verbs in
English can be used as active nominal modifiers, whereas those of unergatives cannot:
(2)
a. The departed guests. / The melted snow. (OK: unaccusative)
b. *The danced girl. / *The slept child. (NOT OK: unergative)
Like syntacticians, historical linguists, when looking for answers to ‘how’
and ‘why’ questions, are confronted with ‘hidden structures’. Phonological comparison and reconstruction are good examples of such hidden
structures. Our objective in this paper is to identify the nature of the phenomenon, in this case, the level of inherited similarity or ‘cognacy’. In order
to do that, as historical linguists, we also need a set of clearly defined tests.
A lack of clear heuristics and diagnostics makes it difficult to verify and
compare results consistently. Just like theoretical syntacticians or linguists
in other subdisciplines like neuro- and psycholinguistics, historical linguists would benefit from more well-defined ways to make predictions
and to test results. If we were to say with greater precision what is meant
by the claim that two words are ‘cognate’ and provide clear methods for
identifying whether two forms are cognate, this would help those areas of
historical linguistics where progress is currently stymied.
3
Workflow for establishing cognate sets
The aforementioned two pairs of examples illustrating the limits of cognacy (section 2.1 and section 2.2) highlight a distinction between what
we can conveniently call comparanda, words suspected of descent from a
single form (Latin deus and Greek θεο� ς), and comparata, words that have
been shown to descend from a single form (English tooth and Old Irish
dofúaid). A suspicion can be more or less strong, a demonstration more or
less secure; as such, both suspicion and demonstration are scalar rather
than binary predicates. Therefore, being a comparandum or a comparatum
is a concomitantly complex affair. The best-behaved cognates are those
where ex ante any observer would have a strong suspicion of their shared
origin and their shared origin has been demonstrated in an ex post facto
straightforward and watertight way, i.e. where a good comparandum is a
good comparatum. As etymological research progresses one relies less on
comparanda and more on comparata; the machinery of known historical
phenomena becomes more powerful as it becomes more finely stated. The
exactness of science replaces the groping of guesswork. The rest of this
section attempts to answer the question of how we change comparanda
into comparata at a specific moment in the history of research.
3.1
Step 1: Heuristics for finding comparanda
There are three elements essential to establishing cognate sets: a set of
cognate languages, a set of potential cognates (comparanda) and, ideally,
a body of existing knowledge to test against, namely, a set of sound correspondences (C) thought to be regular at a particular point in time (t);
we call this set of regular sound correspondences Ct. For languages with
well-established phylogenies and a large body of secondary literature, all
three necessary elements are readily available. The existence of a well-developed set of sound correspondences (Ct), in particular, permits one
to go straight to the diagnostics that help determine the type and level of
cognacy (see section 5).
For under-researched language families for which no such literature
and resources exist, we need initial heuristics to get the workflow started.
In such cases, the three essential elements may instead be conceptualized
as steps in a preliminary mini-workflow:
1. Choose languages to compare
2. Choose words to compare
3. Choose a set of allowable correspondences (Ct)
The first step of the mini-workflow is not of methodological interest,
since in principle all of the world’s languages could be compared pairwise. In practice, languages will be compared to languages of presumed
genetic affiliation or geographic proximity. As for the second step, existing
computational methods for finding potential cognates may not yet be well-equipped to diagnose the type and level of cognacy, but they certainly
permit the identification of potential comparanda, faute de mieux. LexStat
is a prominent example of such an automatic cognate detection algorithm
(List 2012); when computation time is costly, because words from many
languages are compared, BipSkip is an alternative that performs faster,
but less well (Rama & List 2019).
The third step is more difficult: when investigating comparanda from
two languages that have never been looked at together before, no set
of established sound correspondences exists. As a further heuristic in
these cases, one can look to well-established sound correspondence sets
in other language families to identify plausible candidate correspondences.
The phoneme /t/, for instance, often corresponds with /t/, /d/ or /tʰ/,
but a correspondence between /t/ and a vowel or approximant, or even
other stops like /p/ or /k/, is unprecedented, or at least rare. Since in this
case Ct is the set of allowed correspondence patterns at the very beginning of
research (t = 0), we refer to this set of correspondence patterns as C0. The
heuristic under discussion populates C0 with correspondence patterns
that are widespread across the world’s languages. To give a simple example,
one will certainly want to include in C0 at this point correspondences of
identity such as {m, m, m}, {n, n, n}, {h, h, h} and {s, s, s}. This initial step
of populating C0 results in a set of hypotheses only, which is important to
bear in mind when phonological segments are aligned and morphemes
are tested in the next step of the main workflow.
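As a minimal sketch of how C0 might be seeded, the following represents each correspondence pattern as a tuple with one segment per language. The function name `seed_c0` and both inventories are illustrative assumptions rather than lists given here; only the identity patterns and the reflexes of /t/ are taken from the discussion above.

```python
from itertools import product

# Illustrative assumptions: neither inventory is a list given in the text.
IDENTITY_SEGMENTS = ["m", "n", "h", "s"]
COMMON_REFLEXES = {"t": ["t", "d", "tʰ"]}  # /t/ often corresponds to /t/, /d/ or /tʰ/


def seed_c0(n_languages, identity=IDENTITY_SEGMENTS, reflexes=COMMON_REFLEXES):
    """Return C0 as a set of tuples with one segment per language."""
    # Correspondences of identity, e.g. ("m", "m", "m") for three languages.
    c0 = {(seg,) * n_languages for seg in identity}
    # Every combination of cross-linguistically common reflexes of a segment.
    for options in reflexes.values():
        c0.update(product(options, repeat=n_languages))
    return c0


c0 = seed_c0(3)
```

Every member of the resulting set remains a hypothesis: a pattern such as ("t", "d", "tʰ") is merely permitted at t = 0, not asserted for the languages under comparison.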
3.2
Step 2: Aligning and checking phonological segments
Once we have a set of comparanda and at least a start on a set of sound
correspondences, we continue with the alignment of the phonological
segments. It is important to note that although this may seem trivial to a
trained historical linguist, this is a non-trivial task for a computer when the
length of the phonological segments of the comparanda differs; particular
difficulties are, for example, knowing when to permit a segment to compare
to zero and whether to compare a diphthong with a vowel, or a sequence
of vowel and glide (List 2014 and List, Walworth, Greenhill, Tresoldi &
Forkel 2018). At this step we refine the ‘historically derived from the same
source’ part of the initial definition of cognates we cited in section 1. In
order to develop a straightforwardly implementable method, we propose
to re-define the ‘historical derivation’ in terms of minimum requirements
of cognacy: for comparanda to be cognate, at the very least they need to
have one aligned phonological segment that can be found in the set of
established sound correspondences between the respective languages.
In addition, at a minimum it is necessary to tell an informal, intuitively
plausible story about how one single meaning can develop into those seen
in the comparanda.
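The formal half of this minimum requirement can be sketched as follows. The alignment routine is a generic Needleman-Wunsch procedure, swapped in here purely for illustration and not the method of List (2014); the scoring scheme (+1 for a pair found in Ct, -1 for any other pair or for a gap), the function names, and the toy Ct are all assumptions.

```python
def align(a, b, ct, gap="-"):
    """Globally align two segment lists (Needleman-Wunsch).

    Assumed scoring: +1 if the pair is in ct, -1 otherwise; a gap costs -1.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i
    for j in range(1, m + 1):
        score[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = 1 if (a[i - 1], b[j - 1]) in ct else -1
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] - 1,
                              score[i][j - 1] - 1)
    # Trace back from the bottom-right cell, preferring segment-to-segment columns.
    columns, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = 1 if (a[i - 1], b[j - 1]) in ct else -1
            if score[i][j] == score[i - 1][j - 1] + pair:
                columns.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] - 1:
            columns.append((a[i - 1], gap))  # segment in a compared to zero
            i -= 1
        else:
            columns.append((gap, b[j - 1]))  # segment in b compared to zero
            j -= 1
    return columns[::-1]


def meets_minimum(columns, ct):
    """True if at least one aligned column is an established correspondence."""
    return any(column in ct for column in columns)


# Toy Ct for Latin and (transliterated) Greek; an assumption for illustration.
CT = {("d", "d"), ("o", "o"), ("m", "m"), ("s", "s")}
columns = align(list("domus"), list("domos"), CT)
passes = meets_minimum(columns, CT)
```

The semantic half of the requirement, an intuitively plausible story connecting the meanings, is not modelled in this sketch.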
After aligning all phonological segments, we check whether the resulting sound correspondences exist in our set of established correspondences
(Ct) for the languages in question.² If the phonological segments of a root
morpheme can be aligned and the resulting correspondences exist in Ct,
the comparanda pass this initial test and can be called ‘cognate’. If the
concept of ‘roots’ is unclear in the languages under investigation, then this
test can be relaxed to apply to any morpheme. If there is no single morpheme for which phonological segments can be aligned, the comparanda are
rejected as cognates at this stage (see figure 2). Note that this immediately
rules out the cognacy of the roots of Latin deus and Greek θεός, although
it leaves open the possibility of considering the endings -os and -us as
potential cognate candidates, which is the desired result. It does not a
priori rule out the cognacy of English tooth and Old Irish dofúaid ‘he has
eaten’, but the difficulty of aligning the one corresponding phonological
segment immediately reveals the weakness of the cognacy, which is again the
desired result.
² In keeping with the adage falsely attributed to Voltaire that etymology is ‘où la voyelle ne
fait rien, et la consonne fort peu de chose’ (‘where vowels count for nothing, and consonants for very little’), one could assign greater weight to correspondences of consonants
than those of vowels.
For newly compared languages, C0 is pre-populated only with sound
correspondences well attested across the globe. There may well be valid
correspondences that have not yet been countenanced by any existing
research tradition. In this case the comparanda that evince correspondence patterns not found in C0 should not be rejected out of hand. Instead,
one uses the comparanda to add to and verify the set of sound correspondences. This process of identifying correspondence patterns makes
up the backbone of the comparative method and for this reason alignment
is particularly essential in the early stage of research (Anttila 1972, 230,
Koch 1996, 221, Dimmendaal 2011, 13, Weiss 2014, 128, Trask 2015, 196).
3.3
Step 3: Diagnosing cognacy type and level
Once we have established that we are in fact dealing with cognates, we
can establish the type and level of cognacy. We present a number of diagnostic tests that evaluate the form (phonology), meaning (semantics) and
function (morpho-syntax and pragmatics) of the comparanda. These diagnostics will first of all determine the level of cognacy ranging from strong
(‘strict cognates’) to weak. Second, the diagnostics establish whether
cognates are synonymous or non-synonymous in meaning and function.
Among any level of cognates we can distinguish ‘synonymous cognates’
(Koch & Hercus 2013, 34), for those comparisons where both members
maintain the inherited meaning unchanged, and ‘non-synonymous cognates’ for those comparisons where one or both members of the comparison have undergone semantic change. Strict synonymous cognates such
as German Herz ‘heart’ and English heart, and German Distel ‘thistle’ and
English thistle, are the most straightforward type and play a unique
role in the early days of research on a particular family (List 2019). It is no
coincidence that automatic cognate detection algorithms, such as LexStat,
require synonymous comparanda as their input.³ As for the strict non-synonymous cognates, we can allow for major semantic changes along
the lines of German Zimmer ‘room’ and English timber, or German Zaun
‘fence’ and English town, but what we cannot permit is complete semantic
laxness.
And finally, diagnostics can classify comparanda belonging to certain
sub-types, e.g. ‘partial cognates’, ‘oblique cognates’ or ‘quasi-strict cognates’. These types and levels are treated in detail in section 4, and the
diagnostics in section 5. After the application of these diagnostics, new
cognates are labeled and categorised, for instance as ‘strict synonymous’,
‘weak synonymous’ or ‘quasi-strict non-synonymous’ cognates; they are
now comparata. Where relevant, their sound correspondences, including
any additional features such as their phonological context or conditioning,
can now be added to the set of established sound correspondences (Ct).
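The labelling step can be illustrated with a deliberately crude sketch. It assumes that the comparanda are already aligned into columns, that a comparison with no exceptional column is strict and one with a single exceptional column is quasi-strict, and that synonymy can be approximated by comparing glosses. The function name, the catch-all label ‘weaker’, and the toy Ct are assumptions made for illustration; the diagnostics themselves are more fine-grained than this.

```python
def cognacy_label(columns, ct, gloss_a, gloss_b):
    """Return a label such as 'strict synonymous', or None if rejected."""
    misses = sum(1 for column in columns if column not in ct)
    if misses == len(columns):
        return None  # no aligned segment is in Ct: fails the test of Step 2
    if misses == 0:
        level = "strict"
    elif misses == 1:
        level = "quasi-strict"  # a single exceptional segment
    else:
        level = "weaker"  # placeholder for the remaining levels
    meaning = "synonymous" if gloss_a == gloss_b else "non-synonymous"
    return f"{level} {meaning}"


# Toy Ct for German and English; an assumption for illustration only.
CT = {("ts", "t"), ("aʊ", "aʊ"), ("n", "n"), ("h", "h"), ("s", "s")}

zaun_town = cognacy_label([("ts", "t"), ("aʊ", "aʊ"), ("n", "n")], CT, "fence", "town")
haar_hair = cognacy_label([("h", "h"), ("aː", "ɛ"), ("r", "r")], CT, "hair", "hair")
```

Under these assumptions `zaun_town` comes out as a strict non-synonymous comparison, while `haar_hair` is labelled ‘weaker’ only because the toy Ct lacks the relevant vowel and r correspondences; the label is relative to Ct at a given moment, not a permanent property of the words.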
3.4
Step 4: Extending and refining the sound correspondence set
After newly established cognates or comparata emerge from Step 2 and
are labeled in Step 3, the sound correspondence set is re-calibrated to
reflect the knowledge gained (e.g. the addition of a set of features from
new examples to an established correspondence pattern).
           ‘man’    ‘son’     ‘home’    ‘hair’    ‘house’   ‘mouse’
Swedish    m a n    s oː n    h ɛ m     h oː r    h ʉː s    m ʉː s
English    m æ n    s ʌ n     h əʊ m    h ɛ r     h aʊ s    m aʊ s
German     m a n    s oː n    h aɪ̯ m    h aː r    h aʊ̯ s    m aʊ̯ s
Figure 1: Aligned cognates from Swedish, English, and German apud Anttila 1972, 230.
Orthographic forms are rewritten as broad phonemic IPA transcription.
³ It is also not a coincidence that the words in figure 1 (discussed anon) are strict synonymous cognates.
Suppose that we were for the first time investigating the relationship
among Germanic languages. Figure 1 shows a few aligned comparanda,
namely words meaning ‘man’, ‘son’, ‘home’, ‘hair’, ‘house’, and ‘mouse’ in
modern Swedish, English, and German (following Anttila 1972, 230). These
words exhibit the correspondences m:m:m (3x), n:n:n (2x), h:h:h (3x), s:s:s
(3x), ʉː:aʊ:aʊ (2x). The first four correspondence patterns pass muster,
since we included the correspondence patterns {m, m, m}, {n, n, n}, {h, h,
h} and {s, s, s} ex hypothesi in C0. Step 3 will have classified these comparisons as quasi-strict synonymous cognates, quasi-strict because the fifth
correspondence pattern, ʉː:aʊ:aʊ, was not included in C0. However, on the
basis of these, and other, examples, the pattern can be added to C1. In a
future iteration of the workflow, these comparisons will come out as strict
synonymous cognates.
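The counts cited for figure 1 can be reproduced mechanically. The sketch below assumes one aligned row per language (Swedish, English, German) and drops the non-syllabic diacritic on the German diphthongs so that the columns match the pattern ʉː:aʊ:aʊ as cited; the variable and function names are illustrative.

```python
from collections import Counter

# Figure 1: (Swedish, English, German) rows for each gloss.
# Assumption: the non-syllabic diacritic of German aɪ̯, aʊ̯ is omitted.
FIGURE_1 = {
    "man":   (["m", "a", "n"],  ["m", "æ", "n"],  ["m", "a", "n"]),
    "son":   (["s", "oː", "n"], ["s", "ʌ", "n"],  ["s", "oː", "n"]),
    "home":  (["h", "ɛ", "m"],  ["h", "əʊ", "m"], ["h", "aɪ", "m"]),
    "hair":  (["h", "oː", "r"], ["h", "ɛ", "r"],  ["h", "aː", "r"]),
    "house": (["h", "ʉː", "s"], ["h", "aʊ", "s"], ["h", "aʊ", "s"]),
    "mouse": (["m", "ʉː", "s"], ["m", "aʊ", "s"], ["m", "aʊ", "s"]),
}

C0 = {("m", "m", "m"), ("n", "n", "n"), ("h", "h", "h"), ("s", "s", "s")}


def count_patterns(aligned):
    """Count how often each correspondence pattern (column) occurs."""
    counts = Counter()
    for rows in aligned.values():
        counts.update(zip(*rows))  # one column per aligned position
    return counts


counts = count_patterns(FIGURE_1)
# Recurrent patterns that C0 does not yet contain: candidates for C1.
new_patterns = {p: k for p, k in counts.items() if p not in C0 and k > 1}
```

With this data the only recurrent pattern outside C0 is ʉː:aʊ:aʊ, which is the candidate for addition to C1; patterns attested once, such as r:r:r, are left for later iterations with more data.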
The major conceptual hurdle is that it is not always clear how to distinguish the major sound correspondences relating two languages from
those that should be regarded as one-offs. For example, when we consider
the Latin word quinque ‘five’, and contemplate the reason for it lacking the
*p- of its progenitor *pénkʷe ‘five’ (Gk. πέντε, Skt. páñca, Lithuanian penkì),
it is reasonable to see the non-alignable element as contamination from
*kʷetu̯or- (Gk. τέσσαρες, Lat. quattuor, Av. caθβar). However, the same
form can be explained with a sound change *p ... *kʷ > *kʷ ... kʷ, whereby
the *p- is assimilated to the labiovelar of a following syllable (Weiss 2009,
73). The latter explanation has the advantage of also explaining Lat. coquit
‘cooks’ as the outcome of *pekʷ-e-ti ‘cooks’ (cf. Skt. pacati, Gk. πέσσω). In
practice, what we can do is set an arbitrary frequency threshold to accept
only the commonly attested patterns into Ct, e.g. those occurring in 15 or
more cognates, but to lower the threshold as Ct becomes populated with
more and more refined information about the historical phonology of the
languages in question as the workflow goes through progressive iterations.
Since the workflow itself will weed out spurious comparisons and add
in non-obvious comparisons, the threshold chosen to accept new correspondence patterns after any given iteration really does not matter. To play
it safe, the most common pattern not yet in Ct would be the only pattern
examined, and if it is accepted, the whole workflow would be rerun.
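The cautious variant just described, in which only the most common new pattern is examined at each pass, can be sketched as a loop. This is a simplification under stated assumptions: the set of aligned comparanda is held fixed, whereas in the full workflow Steps 2 and 3 would be rerun after each addition; the function names and the toy threshold of 2 (rather than, say, 15) are illustrative.

```python
from collections import Counter


def most_common_new_pattern(aligned_sets, ct):
    """Return (pattern, count) for the most frequent pattern not in ct, or None."""
    counts = Counter(
        column
        for rows in aligned_sets
        for column in zip(*rows)
        if column not in ct
    )
    return counts.most_common(1)[0] if counts else None


def extend_ct(aligned_sets, ct, threshold):
    """Add one pattern at a time to a copy of ct until none reaches the threshold."""
    ct = set(ct)
    while True:
        best = most_common_new_pattern(aligned_sets, ct)
        if best is None or best[1] < threshold:
            return ct
        # In the full workflow, Steps 2 and 3 would be rerun at this point.
        ct.add(best[0])


# Toy data: Swedish and English rows from figure 1.
aligned = [
    (["h", "ʉː", "s"], ["h", "aʊ", "s"]),
    (["m", "ʉː", "s"], ["m", "aʊ", "s"]),
    (["h", "ɛ", "m"], ["h", "əʊ", "m"]),
]
c0 = {("h", "h"), ("s", "s"), ("m", "m")}
c1 = extend_ct(aligned, c0, threshold=2)
```

Here the pattern ʉː:aʊ is accepted because it recurs, while ɛ:əʊ, attested once, stays outside Ct until further comparanda support it or the threshold is lowered.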
Figure 2 presents a schematic overview of the entire workflow proposed in this paper.
4
The typology of cognates
In this section we present a typology of cognates, based on the level of
similarity of their three core variables: form, meaning and function. In the
previous section we have already established a crucial minimum level of
similarity in form: two cognates must both contain at least one segment
that continues the same inherited segment in a common ancestral ‘root’
morpheme.⁴
Figure 2: Schematic workflow of establishing cognates.
The definition of ‘root’ morpheme can vary depending on the language
family, but in principle cognacy can always be established (or rejected)
on the level of the morpheme. Morphemes such as derivational suffixes,
case endings, or verb inflections can thus also be compared and tested for
cognacy (see note 6).
Prototypical examples of the main types of cognacy are discussed in
three broad subsections ranging from phonologically strong to weak cognates. Alongside the phonological form, most definitions of cognacy refer
to a similarity in meaning (semantics), to which we add a potential similarity in function (morpho-syntax and pragmatics). We adopt the notions
of ‘synonymous’ and ‘non-synonymous’ cognates to reflect the latter variables and argue that the level of similarity in meaning and function can
be established for each pair of cognates, no matter how strong or weak in
phonological form.
⁴ Admittedly Fr. être and Sp. ser are not cognate by this measure, but many forms of their
paradigm are (e.g. Fr. sommes and Sp. somos). The correct theorization of such tricky
cases is best addressed in future research.
4.1
Strict cognates
We propose the term strict cognates for those words or word parts for
which we can demonstrate that their change followed the regular
‘laws’ of sound change. Such cases contrast with words whose histories
include additional factors such as morphological derivations that impact
directly on pronunciation, or sporadic sound changes due to analogy, assimilation, metathesis, etc.
The strict level of cognacy is only common at a fairly shallow time depth
for a small selection of languages, since at a greater time depth erratic
analogical developments more and more affect an ever greater portion
of the vocabulary. After enough time such analogical developments even
compromise the ability of an analyst today to find the regular correspondences.
Afro-Asiatic is, for example, a family so ancient as to make the identification of regular phonological correspondences exceedingly difficult
(Huehnergard 2004, 141); we face similar problems in the deep reconstruction of Trans-Himalayan (Hill 2019), and, as discussed for some examples
in detail below, even in Indo-European linguistics, it is often difficult to find
reflexes of well-established proto-forms which are regular in all respects.
Adding further languages to the comparison asymptotically increases the
number of exceptional analogical developments in a comparison (since 1
analogical innovation among n languages leads to n irregular comparisons),
so strict cognates are much easier to identify in a pairwise fashion, two
languages at a time. Strict cognates (cognates, such as the examples in
figure 1, where all phonological segments can be aligned and for which
the correspondences can be found in Ct , see section 4.1) have a unique
importance for the discovery of correspondence patterns (List 2019).
4.2
Medium cognates
To classify some types of relationships as medium-strength cognates is
primarily an expository exercise; in other words, ‘medium cognates’ are all
those that are neither strong nor weak. We distinguish two types: ‘quasi-strict cognates’ (section 4.2.1) and ‘word equations’ (section 4.2.2). In this
section we first discuss the three types of quasi-strict cognates and then
we provide examples of word equations.
4.2.1
Quasi-strict cognates
In some cases two related words have mostly been affected by regular
phonological change, but one or both of them have also been affected by
a non-phonological change that has resulted in an exceptional status for
an individual segment. Since such cases are almost as straightforward
as strict cognates, we refer to them as ‘quasi-strict’. Nonetheless, the exceptional segment needs to be located and a reasonable explanation put
forward to account for its existence. Quasi-strict cognates arise particularly due to three causes: paradigm internal analogy, contamination, and
inter-dialectal borrowing.
Paradigm-internal analogy Some single segmental exceptions to regularity result from paradigm internal analogies. Because of the grammatical
motivation for the analogy, such cases may have the appearance of grammatically conditioned sound changes. For example, Crowley finds that the
failure of initial *l loss in Paamese verbs, such as loh ‘he runs’ ≪ *oh is
‘a clear example of a sound change that does not involve purely phonological conditioning factors but also involves grammatical conditioning’
(2010, 173), but he fails to mention that it is only in non-negated third
singular realis verb forms that *l- would have been word initial (Crowley
1982, 129–130). The paradigmatic pressure to analogically restore l- in
this environment was overwhelming (e.g. navō ‘I stink’ : vō ‘he stinks’ ::
naloh ‘I run’ : X ‘he runs’, with loh ‘he runs’ replacing inherited *oh) (see
Hill 2014, 222).
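The four-part proportion behind the Paamese restoration can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `solve_proportion` is hypothetical, forms are treated as plain strings, and only prefixal proportions of the type a : b :: c : X are handled.

```python
from typing import Optional


def solve_proportion(a: str, b: str, c: str) -> Optional[str]:
    """Solve a : b :: c : X for proportions that differ only in a prefix.

    Returns None when c does not begin with the prefix that separates a from b.
    """
    # Length of the longest common suffix of a and b.
    k = 0
    while k < min(len(a), len(b)) and a[-1 - k] == b[-1 - k]:
        k += 1
    prefix_a = a[: len(a) - k]  # material found only in a
    prefix_b = b[: len(b) - k]  # material found only in b
    if not c.startswith(prefix_a):
        return None
    # Swap the prefix of a for the prefix of b in the third term.
    return prefix_b + c[len(prefix_a):]


# navō 'I stink' : vō 'he stinks' :: naloh 'I run' : X
print(solve_proportion("navō", "vō", "naloh"))
```

The proportion yields loh rather than the inherited *oh, which is the analogical restoration described above. A fuller model would also need to handle suffixal and stem-internal proportions.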
Contamination. In cases of contamination (Hock 1991, 197–199, Trask
2000, 72–73, etc.) the pronunciation of a word is affected by the pronunciation of a word with which it is semantically associated (see esp. Hockett
1967). A well-known instance is that of Indo-European *kʷetu̯ṓr ‘four’ (cf.
Skt. catvā́r-) irregularly becoming Proto-Germanic *petu̯ṓr > *fedu̯ōr (Go.
fidwōr, OEng. fēower) under the influence of the *p- in *pénkʷe ‘five’ (Gk.
πέντε, Skt. páñca, Lith. penkì). If we compare, for example, Sanskrit catvā́r- and Gothic fidwōr, some segments are alignable according to regular phonology (-a-:-i-, -t:-d-, -r:-r) but it would be a mistake to mechanically align
the f- of Gothic with Sanskrit c-, since that correspondence does not regularly recur in other vocabulary. To give another example, also well-known
from textbooks (Hock 1991, 230, Trask 2015, 31, etc.), German Bräutigam
and Dutch bruidegom ‘groom’ are strict cognates, but neither is alignable
with English bridegroom, because the second -r- of the latter has no reflex
in the other two languages, resulting as it does from contamination with
groom and/or the existing -r- in bride.
Interdialectal borrowing. Borrowing between closely related languages
can also lead to forms that are epistemologically (at least initially) indistinguishable from quasi-strict cognates, but which, as loans, are by definition not
cognates. An example of this process is German Damm ‘dam’ (< OHG
tamb), which shows initial d instead of expected t due to contact with
Low German varieties (where d is the regular initial). Although different
processes are at work here (close language contact and replacement of
one item by a similar cognate item from a closely related language variant
vs. language-internal modification of a form due to analogy with a form
of a different meaning in the same language), the resulting patterns are
very similar, in so far as strictness of sound correspondence patterns is
maintained for the most part throughout the word, but one segment does
not follow the expected pattern.
4.2.2
Word equations
In addition to these ‘quasi-strict’ cognates, there is another type of medium
cognate which we label ‘word equations’. These are cognates exhibiting the
same form derived from the same root, continuing at least one (but not all)
inherited grammatical feature(s). In Indo-European linguistics two forms
enter a word equation when they exhibit the same form of the same root
and continue some inherited grammatical feature under discussion (Vine
1993, 49, Jasanoff 2003, 3, 13, et passim, Clackson 2007, 187, 210, Weiss
2009, 430). Two of Jasanoff’s (2003) equations give a feel for how the term
is used. He supports the continuity of the Hittite mi-conjugation present
singular 3rd person personal ending -zi from the Proto-Indo-European
primary active ending *-ti with the following word equations (Jasanoff
2003, 3):
• Hitt. 3rd sing. ēšzi = Vedic ásti = Gk. ἐστι < PIE *h₁és-ti
• Hitt. kuenzi ‘slays’ = Vedic hánti < PIE *gʷʰén-ti
Note that all strict cognates are necessarily word equations (but not all
word equations are strict cognates),5 although one would tend not to
refer to a set of monomorphemic strict cognates as word equations. It
is perhaps no surprise that in the research traditions of those language
families, such as Austronesian or Trans-Himalayan, with members less
rich in morphology (or with poorly understood morphology), the term
‘word equation’ does not appear.
5 As an example of an imperfectly alignable word equation, Jasanoff equates Hitt. mimma- ‘refuse’ (prima facie from *mimne-) and Gk. μίμνω ‘stand fast’ (< *mimn-), while arguing
that reduplicated presents with the reduplicating vowel -i- are associated with the
Hittite ḫi-conjugation (Jasanoff 2003, 129). The Hittite stem-final vowel -a is not alignable
with anything in Greek; Jasanoff explains it as an analogical innovation on the model
of the 3rd pl. mimmanzi (2003, 131), i.e. danzi ‘they give’ : dāi ‘he gives’ :: mimmanzi
‘they refuse’ : X = mimmai ‘he refuses’. The two members of the word equation are not
alignable but they continue the same form of the same stem (*mimn-) and share a relevant
grammatical feature (here present reduplication with the reduplicating vowel -i-).
4.3
Weak cognates
Weak cognates are morphologically altered with respect to the proto-form.
We sub-categorize weak cognates into two types, namely partial cognates
(section 4.3.1), and oblique cognates (section 4.3.2).
4.3.1
Partial cognates
We define ‘partial cognates’ as forms that share at least one strictly cognate
morpheme, where at least one of the comparanda contains an
additional morpheme not present in the other. Thus, Spanish sol (< Vulgar
Lat. sōl; Lat. sōl) and French soleil (< Vulgar Lat. *sōliculus; REW, §8067)
are related as partial cognates, as are Atsi mau²¹mjiŋ⁵¹ ‘thunder’ and Maru
mjaŋ³¹kʰa³⁵ ‘wolf’, since they both continue an inherited morpheme for
‘thunder’; the Maru word has the morphological structure ‘thunder’ + ‘dog’
(cf. Maru lə̆³¹kʰa⁵ ‘dog’) (Hill & List 2017, 68).
Among partial cognates, one can distinguish a subtype of ‘root cognates’
for cases when the two reflexes exhibit the same form of the same root
(Trask 2000, 290).6 Examples are Latin lātus ‘borne’ (< *tl̥h₂-tó-) and Old
Church Slavonic tĭla ‘foundation, bottom’ (< *tl̥h₂-ó-). Both words continue
the same form of the same root (the zero-grade *tl̥h₂-), but also contain
non-cognate concatenating morphology (the suffixes *-tó- and *-ó-). The
meaning of ‘root’ will of course depend on the specific morphological
profiles of particular language families. For any language it is likely possible
to work with a definition such as ‘inherited morpheme in a word that, of
the morphemes in the word, has the lowest synchronic frequency across
the lexical entries of that language.’
6 A proto-language could have had a form Root₁+Suffix₁, where no daughter language
preserves this combination, offering only Root₂+Suffix₁ or Root₁+Suffix₂. As such, there
are partial cognates that are not root cognates, forms that share a stem but have a different root (e.g. 3sg.prs. *gʷm̥-sḱ-é-ti > Skt. gacchati and 1sg.prs. *ǵi-ǵneh₃-sḱ-ó-h₂ >
Gk. γιγνώσκω ‘know’). Nonetheless, a word equation (e.g. 3sg.prs. *gʷm̥-sḱ-é-ti > Skt.
gacchati ‘go’ and 1sg.prs. *gʷm̥-sḱ-ó-h₂ > Gk. βάσκω) is much better evidence for the
reconstruction of a suffix.
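The frequency-based working definition of ‘root’ given in section 4.3.1 lends itself to a small sketch. The toy lexicon, its morpheme segmentation and the function names below are hypothetical, and the sketch ignores the requirement that the morpheme be inherited, which cannot be read off synchronic frequencies.

```python
from collections import Counter

# Hypothetical toy lexicon: each entry is already segmented into morphemes.
LEXICON = {
    "singer": ["sing", "er"],
    "baker": ["bake", "er"],
    "walker": ["walk", "er"],
    "singing": ["sing", "ing"],
    "baking": ["bake", "ing"],
    "walking": ["walk", "ing"],
    "walks": ["walk", "s"],
}


def morpheme_frequencies(lexicon):
    """Count in how many lexical entries each morpheme occurs."""
    counts = Counter()
    for morphemes in lexicon.values():
        counts.update(set(morphemes))
    return counts


def guess_root(word, lexicon):
    """Return the morpheme of `word` with the lowest frequency across entries."""
    counts = morpheme_frequencies(lexicon)
    return min(lexicon[word], key=lambda morpheme: counts[morpheme])


print(guess_root("singer", LEXICON))
```

In this toy lexicon the affixes -er and -ing each occur in three entries, whereas sing occurs in only two, so the lower-frequency morpheme is picked out as the root. Ties are resolved here by taking the first morpheme, which is an arbitrary choice of the sketch.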
4.3.2
Oblique cognates
As described above (section 2.2), cognates such as English tooth and Old
Irish dofúaid are not conventionally called cognates because prima facie
there is no wisdom in bringing together these words for comparison. We
next turn to comparisons that are much more fruitful, but perhaps no less
complex in terms of their historical relatedness. Consider English ‘feather’
compared to Greek πτερόν ‘feather, wing’. Indo-European had an original
proterokinetic heteroclitic noun with rectus stem *pét-r̥ and obliquus stem
*pt-én- (cf. Hitt. pettar, pettan- ‘wing, feather’). English ‘feather’ derives
from *pét-r-eh₂- ‘collection of feathers’ with the *-eh₂ collective suffixed to
the inherited rectus stem *pét-r-. In turn, Greek πτερόν continues *pt-er-ó- ‘feathery thing’, a possessive *-o- derivative of a stem *pt-er-, which is
an analogically renewed obliquus stem, i.e. rectus *pér-tu- (ON fjǫrðr) :
obliquus *pr̥-téṷ- (Lat. portus, Eng. ford) ‘crossing’ :: rectus *pét-r : obliquus
X, X = *pt-ér-, or the like. The comparison of ‘feather’ and πτερόν is what
Trask uses in his definition of ‘oblique cognate’ (Trask 2000, 235). Trask
defines an oblique cognate as ‘[t]wo or more words in related languages
which continue alternate forms of a single root in the ancestral language’
(2000, 234–5). This definition refers to ‘a single root’, so oblique cognates
could be viewed as a type of root cognates. However, we prefer to use ‘root
cognate’ for those cases where the reflexes inherit the same form of the
root and reserve ‘oblique cognate’ for the cases where this criterion is
not necessarily met. Thus, strictly speaking we regard all cases of root
cognates as also instances of oblique cognates, but practically speaking
one would not typically call cases in which the same form of the root
appears in two reflexes ‘oblique cognates’ because the more precise term
‘root cognate’ is available. As this example shows, oblique cognates are the
result of extensive analogical and derivational developments; no single
état de langue is likely to have contained both *pét-r-eh₂- and *pt-er-ó- (Fellner & Hill 2019a, 168–169).
Oblique cognates arise primarily from non-concatenating morphology.
The importance of accent and ablaut patterns to Indo-European morphology means that oblique cognates are very common in this family. The
15 etyma in Allen Nussbaum’s (1986) account of words for ‘head’ and
‘horn’ in the older Indo-European languages all descend ultimately from
*ḱér-h₂/*ḱr-éh₂, but none is entirely lautgesetzlich. The simplest case is
Mycenaean Greek kerā ‘horn (material)’ (< *ḱér-eh₂), either a reflex of
the rectus stem with an analogical full-grade of the suffix or a reflex of
the obliquus stem with an analogical full-grade of the root. In contrast,
the pathway from the same proto-form to Latin cerebrum (< *ḱérh₂sro-)
requires six steps, which include a variety of morphological affixations,
analogical derivational developments, and semantic changes. First, *ḱér-h₂,
oblique *ḱr-éh₂ → *ḱr-ḗh₂, oblique *ḱr̥-h₂- as a regular productive, so-called
‘internal’, derivation (see discussion in Fellner & Hill 2019b, 117 n. 39
and cf., e.g., *si̯éṷH-mn̥, oblique *si̯uH-mén- in Skt. syū́ma ‘band’ → *si̯uH-mḗn, oblique *si̯uH-mn- in Gk. ὑμήν ‘membrane’) (Nussbaum 1986, 120,
134), accompanied by a change of meaning to ‘the head bone’. Second, the
meaning shifted further to ‘skull, head’. Third, the analogy *h₂eu̯s : oblique
*h₂us-es- ‘ear’ :: *ḱr-ḗh₂ : oblique X = *ḱr̥h₂-es-, led to the obliquus stem becoming *ḱr̥h₂-es- (Nussbaum 1986, 214). Fourth, in Proto-Indo-European,
in addition to the originally endingless locative stem (with its own ablaut
grade different from the rectus and obliquus), there existed several affixal
markers to characterize the locative, the most prominent being *-i and *-er
(cf. Vedic uṣás-i ‘at dawn’ (paradigmatic locative of uṣas-) < *h₂us-és + *-i
next to (a substantive that arose by paradigmatic split of a locative) uṣar- ‘thing at dawn’ < *h₂us-s + *-er), the latter of which suffixed to our form
gave *ḱr̥h₂-s-er ‘on the head’ (Nussbaum 1986, 236). Fifth, this form was
itself turned into an adjective with the adjective-forming suffix *-ó- to yield
*ḱr̥h₂-s-r-ó- ‘adj. in/at/on the head’ (cf. Vedic usrá- ‘early’ < *h₂us-s-r-ó- (Nussbaum 1986, 243)). In the final step, this adjective is nominalized
with a change of accent to *ḱérh₂sro- ‘thing on the head’ (Nussbaum 1986,
243); cf. Gk. λευκός ‘white’ : Gk. λεῦκος ‘white thing > whitefish’; Skt.
kṛśnás ‘black’ : Skt. kṛ́śnas ‘black thing > black antelope’. Latin cerebrum is
the direct lautgesetzliche outcome of *ḱérh₂sro-. The somewhat surprising
change *-sr- > -br- is regular in Latin (see Weiss 2009, 163).
In Asian historical linguistics, many investigators reconstruct various
alternate forms of the same root (see Blust 1990, 142-143 for Austronesian
and Matisoff 1973, 123 for Trans-Himalayan). The pervasiveness of such
reconstructed doublets itself suggests an inflectional morphological profile
for the relevant proto-language (pace LaPolla 2017, 40, 51).
4.4
Core cognate dimensions
The level of similarity between two cognates can be measured and visualised in three dimensions. A pair of comparanda may get a perfect score
in phonological form on the y-axis, for instance, if all their phonological
segments can be aligned and their sound correspondences are found in
the correspondence set (Ct ). However, one of the comparanda (or both of
them), may have undergone various shifts in meaning and function, yielding a much lower similarity score on the x- and z-axes. The next section
presents the diagnostics and proposed scoring metrics in detail.
5
Diagnosing cognates
In this section we propose a number of diagnostics to first of all determine
whether comparanda are cognates and, second, if they are, what type
and level of cognacy they represent. The first diagnostic test, described
in section 5.1, is based on phonological form only. We next zoom in on
the distinction between synonymous and non-synonymous cognates to
establish the similarity of cognate pairs in terms of semantic similarity
(section 5.2) as well as a number of morpho-syntactic and pragmatic
variables (section 5.3).
5.1
Phonological alignment
Operationally the easiest metric to compute the level of cognacy is to focus
on phonological similarity only. Naively, we could thus take the number of
segments in word w1 and word w2 that are alignable (ci ) and found in the
sound correspondence set of the two language comparanda (Ct ) over the
total number of alignable segments (i), i.e.
\[ \mathrm{Cog}(w_1, w_2) = \frac{\sum c_i}{i} \]
This would work fine for examples like the ones shown in figure 1, where
all comparanda have the same number of phonological segments and
aligning the sound correspondences is straightforward. However, if we
want to align Spanish sol with French soleil ‘sun’, the final segments of
French soleil do not have any equivalent in Spanish sol. If we want to align
these segments anyway, they will have zero as equivalents in Spanish. Since
sound correspondences with zero are not found in the Spanish-French set
of sound correspondences Ct , the result of the above equation would tell
us Spanish sol and French soleil are only partially cognate. This in itself is
not a bad result, but problems arise when the alignment of segments is
less obvious.
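The naive score can be sketched as follows. The function name `cog_score`, the orthographic stand-ins for segments and the three-pair toy correspondence set are our assumptions for illustration only; a real Ct for Spanish and French would be far larger and phonologically conditioned.

```python
GAP = "-"  # placeholder for a zero segment


def cog_score(alignment, c_t):
    """Share of aligned segment pairs that are attested in the set C_t."""
    if not alignment:
        raise ValueError("alignment must contain at least one column")
    regular = sum(1 for pair in alignment if pair in c_t)
    return regular / len(alignment)


# Toy correspondence set (assumed, not an actual Spanish-French C_t).
C_T = {("s", "s"), ("o", "o"), ("l", "l")}

# Spanish sol vs French soleil, with zeros added at the end of the shorter word.
sol_soleil = [("s", "s"), ("o", "o"), ("l", "l"),
              (GAP, "e"), (GAP, "i"), (GAP, "l")]

print(cog_score(sol_soleil, C_T))  # 3 regular pairs out of 6 columns -> 0.5
```

Because correspondences with zero are absent from the toy set, the three unmatched French segments lower the score, mirroring the ‘only partially cognate’ outcome described above.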
As discussed in section 3 above, aligning segments of varying length is a
non-trivial task to automate as in principle, without any prior knowledge,
it is impossible to know where the zero segments should be added. A
default ‘end of word’ approach would happen to work for sol-soleil but
sometimes, zeros should be added to the beginning or right in the middle
of words (e.g. in cases of epenthesis).
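To make the difficulty concrete, the sketch below uses a standard global alignment procedure (Needleman-Wunsch dynamic programming) that is free to place zeros anywhere in either word. This is a generic technique swapped in for illustration, not a method proposed in this paper; the scoring values and the function name `align` are assumptions, and segments are approximated by letters.

```python
GAP = "-"


def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two segment sequences, returning a list of column pairs."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


for column in align("sol", "soleil"):
    print(column)
```

For sol and soleil several alignments receive the same score (the Spanish l can be matched with either French l), so which one is returned depends only on the tie-breaking order of the traceback. Without prior knowledge of the sound correspondences, the procedure has no principled way to choose between them.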
Ideally, we would calculate the number of (unlautgesetzlich) innovations that separate two forms, but this is only rigorously possible at an
exceedingly advanced stage of research when Ct has been extended, tested
and well-refined in terms of phonological conditioning. In the next sections
we discuss how phonological alignment can be used as a diagnostic to
determine the level of cognacy, ranging from strong (‘strict cognates’) to
medium (‘quasi-strict cognates’) and weak cognates.
We propose instead to make the distinction between medium and weak
cognates based on morphology, rather than phonology. In theory, we could
propose a threshold of a minimum proportion of phonological segments
that can be aligned as a cut-off point for medium cognates. However, this
would make the distinctions more fluid and scalar making it harder to
categorise comparanda. Therefore we propose a simple diagnostic for distinguishing medium cognates from weak cognates: if the comparanda are
morphologically different and derived from different morphological proto-forms, they should be categorised as weak cognates. In the aforementioned
comparanda Spanish sol (< sōl) vs. French soleil (< *sōliculus, see REW,
§8067), only the first part of the French word is etymologically derived
from the same stem as the Spanish. The second part of the French soleil is
derived from a Vulgar Latin diminutive -iculus, a morpheme which is not
found in the history of Spanish sol. These can therefore not be medium
cognates since not all morphemes are derived from the same proto-form;
instead, they are weak cognates.
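The morphological diagnostic amounts to comparing the proto-morphemes that each form continues. A minimal sketch, assuming that an etymological segmentation of each comparandum is already available (the function name and the list representation are hypothetical):

```python
def is_weak_cognate_pair(proto_morphemes_a, proto_morphemes_b):
    """True when two forms do not continue the same sequence of proto-morphemes."""
    return list(proto_morphemes_a) != list(proto_morphemes_b)


# Spanish sol continues sōl; French soleil continues sōl plus diminutive -iculus.
sol = ["sōl"]
soleil = ["sōl", "-iculus"]

print(is_weak_cognate_pair(sol, soleil))  # True: weak cognates
```

Pairs for which the function returns False remain candidates for medium or strict cognacy, to be decided on phonological grounds.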
5.1.1
Strict cognates
Strict cognates are the strongest type of cognates, because all phonological
segments of the cognate sets can be aligned, segment by segment, and the
resulting correspondence patterns can be found in the permitted set of
sound correspondences Ct .
If two forms descend from the same ancestor and have been perfectly
transmitted in every segment from the proto-language, their pairwise
segmental differences are explainable by regular sound change alone, and
we can then arrange the words in a matrix where each word is placed in a
row in such a way that regularly corresponding segments are placed in the
same column (see figure 1 above), with segments not corresponding to any
other segments (resulting from loss or epenthesis) being compared with
null-segments (gaps), usually represented by a dash (-) symbol. Stated
more formally:
• let w1 be a word in language l1 and w2 a word in language l2
• let ci be {n, m} where n is the nth segment of w1 or a gap and m is
the mth segment of w2 or a gap, and where n ∪ m ≠ ∅
• let Ct be the predefined set of all phonological correspondence patterns relating l1 and l2 that are deemed regular at time t
• if ∀ci ∈ Ct , then w1 and w2 are strict cognates
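The formal statement above can be sketched directly in code. The correspondence set below is a toy stand-in for Ct: the pairs a:i, t:d and r:r are those cited in section 4.2.1 for Sanskrit catvā́r- and Gothic fidwōr, while v:w and ā:ō are added here as assumptions so that only the initial segment is exceptional. Function names are hypothetical and accent marks are omitted.

```python
GAP = "-"


def exceptional_columns(alignment, c_t):
    """Indices of aligned columns whose segment pair is not in C_t."""
    exceptions = []
    for index, (seg1, seg2) in enumerate(alignment):
        if seg1 == GAP and seg2 == GAP:
            raise ValueError("a column may not pair two gaps")
        if (seg1, seg2) not in c_t:
            exceptions.append(index)
    return exceptions


def is_strict(alignment, c_t):
    """Strict cognates: every aligned column is a member of C_t."""
    return not exceptional_columns(alignment, c_t)


# Toy C_t for Sanskrit-Gothic (partly assumed, see above).
C_T = {("a", "i"), ("t", "d"), ("v", "w"), ("ā", "ō"), ("r", "r")}

catvar_fidwor = [("c", "f"), ("a", "i"), ("t", "d"),
                 ("v", "w"), ("ā", "ō"), ("r", "r")]

print(is_strict(catvar_fidwor, C_T))            # False
print(exceptional_columns(catvar_fidwor, C_T))  # [0]
```

A pair with exactly one exceptional column, as here, is a candidate quasi-strict cognate; the exception still needs a non-phonological explanation.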
Two strict (phonologically alignable) cognates might still not descend from
the same inherited form; this is particularly a risk if the putative cognates
are morphologically derived and the derivational morphology is itself cognate. To take an example, Brugmann (1881, 302) identifies Skt. tyājáyāmi
‘causes to quit, leave’ and Gk. σοβέω ‘scare away (birds), shoo (flies)’. The
stems of both words reconstruct straightforwardly to *ti̯ogʷ-éi̯e-. However,
Watkins (1990, 297) suspects, presumably on the basis of its relatively
late attestation and transparent semantics, that tyājáyāmi ‘is productively
formed and pace Brugmann does not make a true equation’ with σοβέω. In
other words, Skt. tyājáyāmi provides evidence for the reconstruction of a
root *ti̯ogʷ and also provides evidence for the causative suffix *-éi̯e-, but it
does not directly support the reconstruction of a verbal stem *ti̯ogʷ-éi̯e- in
Proto-Indo-European. Rix (LIV, 643), by omitting tyājáyāmi from the descendants of *ti̯ogʷ-éi̯e-, concurs with Watkins. The lesson of this example
is that the diagnostic criterion of alignability must be counterbalanced
by the heuristics that late attestation and straightforward semantics (in
morphologically derived words) weigh against a proposal of cognacy. Naturally, tyājáyāmi and σοβέω are still correctly regarded as cognates, but
as partial cognates (section 4.3.1) rather than as strict cognates.
5.1.2
Labelling various medium cognates
Medium cognates are those cases where two words derived from the same
proto-form have been affected by a non-phonological change, resulting in
an exceptional status for an individual segment. In section 4.2 we listed
two types of medium cognates: ‘quasi-strict’ (section 4.2.1) and ‘word
equations’ (section 4.2.2). As discussed above, ‘quasi-strict cognates’ have
three types of origins characterised by the manner in which one of their
segments is affected by a non-phonological change, viz. through paradigm-internal analogy, contamination or interdialectal borrowing. Word equations form a somewhat separate category: these are cases derived from
the same root continuing furthermore at least some inherited grammatical
feature.
An etiological classification of the quasi-strict cognates could also be
operationalised with the following diagnostics:
1. Check if the form participates in a paradigm that might provide a
motivation for analogical change
2. Check if the form participates in a semantically coherent subsystem
(e.g. numerals) of the type that is known to precipitate contamination
3. If neither of the first two checks yields anything promising,
conclude it is likely to be a case of dialect borrowing
Ideally, one can provide independent evidence (e.g. facts about the historical phonology of the donor dialect) for the supposition of inter-dialectal
borrowing, but this is often not possible and inter-dialectal borrowing can
be seen as a ‘catch-all’ for the residue of as yet unexplained forms.
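The three diagnostics can be read as an ordered decision procedure. In the sketch below the two boolean inputs stand for the analyst's answers to the first two checks; the function name and the returned labels are our own illustrative choices.

```python
def candidate_explanations(in_paradigm, in_semantic_subsystem):
    """Return plausible etiologies for an exceptional segment.

    in_paradigm: the form belongs to a paradigm that could motivate analogy.
    in_semantic_subsystem: the form belongs to a semantically coherent
        subsystem (e.g. numerals) of the kind that precipitates contamination.
    """
    candidates = []
    if in_paradigm:
        candidates.append("paradigm-internal analogy")
    if in_semantic_subsystem:
        candidates.append("contamination")
    if not candidates:
        # Residual category when neither check is promising.
        candidates.append("interdialectal borrowing")
    return candidates


# Examples from section 4.2.1.
print(candidate_explanations(True, False))   # Paamese loh 'he runs'
print(candidate_explanations(False, True))   # Proto-Germanic 'four'
print(candidate_explanations(False, False))  # German Damm
```

The function returns a list rather than a single label because a form may pass both of the first two checks, in which case the choice between them is left to the analyst.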
A word equation is to some extent discourse specific, since the two
words compared must share an inherited category that the analyst is
attempting to establish as present in the proto-language. As such, the
computational identification of word equations is not necessarily sensible
as a task.
5.1.3
Labelling ‘partial’ and ‘oblique’ weak cognates
As mentioned above (section 4.3), there are two types of weak cognates:
partial and oblique cognates. Both can be distinguished from medium
cognates because, unlike in medium cognates, not all morphemes are derived
from the same morphological proto-form. Cases like Spanish sol vs French
soleil above, where only one morpheme is derived from a different source,
are called ‘Partial Cognates’. ‘Oblique Cognates’, on the other hand, are
the result of extensive analogical and derivational developments. English
feather (< *pétr-eh₂-) and Greek πτερόν (< *pter-ó-) are good examples of
these as they exhibit different forms of the root as well as different suffixes
(section 4.3.2). The most extreme form of ‘oblique cognates’ are examples
like English tooth and Old Irish dofúaid, where only one phoneme in each
of the forms can still be derived from the same root.
5.2
Semantic alignment
Once the level of cognacy is established based on the historical phonological similarity between the two comparanda, the next dimension of
comparison is the semantics: if two forms are cognate, of whichever level,
are they synonymous or non-synonymous? Since lexical semantics is inherently biased when comparing two words in different languages, the
only way to automate this process objectively is through distributional
semantics. In theory, this can be done through state-of-the-art NLP methods using diachronic word embeddings tracking the change of words in a
particular language over time (see Hamilton, Leskovec & Jurafsky 2016,
Kutuzov, Øvrelid, Szymanski & Velldal 2018, Bizzoni, Degaetano-Ortlieb,
Menzel, Krielke & Teich 2019, Dubossarsky, Weinshall & Grossman 2017,
Dubossarsky, Tsvetkov, Dyer & Grossman 2015, Dubossarsky, Weinshall &
Grossman 2016). Since usual cross-linguistic methods are biased (as they
rely on pre-established bilingual dictionaries), the only way to compare
the semantic changes between the comparanda using diachronic word
embeddings is by comparing the developments and rates of change in each
of the languages. In practice, however, we face a number of difficulties
working with scarcely attested stages of languages (cf. Meelen 2019, Fonteyn 2020, Felbur, Meelen & Vierthaler 2022). To get good results using
diachronic word embeddings, we need large amounts of data at various
stages/windows of the languages involved. When it comes to phonological
reconstruction, we could therefore perhaps imagine comparing Modern
Spanish sol to French soleil, vectorising stages of the languages all the way
back to Classical Latin. Although this would require a large amount of
preprocessing of the data in various stages (ensuring lemmatised and balanced, comparable corpora from which word embeddings are created), it
is possible as long as there is enough data at each selected stage. When going beyond Latin, however, or when trying to reconstruct any proto-form,
we have no data to work with, making comparison of diachronic word embeddings impossible. Further research in line with the work of Montariol
& Allauzen (2019) on scarce data is necessary before these methods can
be effectively extended to the work on historical reconstruction we are
concerned with here.
When tracing the development of forms back to proto-languages, it
is therefore better to rely on alternative methods for the time being. We
propose that using colexification databases, such as CLICS (Rzymski &
Tresoldi 2019) is currently the best way to diagnose the level of semantic
similarity between cognates. The use of meaning ‘concepts’ is particularly
useful. In its most simplified form, we only check whether cognates are
listed as the same concepts and are thus synonymous (e.g. Dutch stad and
German Stadt ‘town’). Going one step further, we could diagnose different
levels of ‘non-synonymous’ cognates, namely those that have undergone
only slight changes in meaning for which a clear path of semantic change
can be established and those that are completely different. The colexifications in the CLICS database can help with that, e.g. Dutch land ‘country’
and English land. When looking for the concept COUNTRY, the concept LAND
is the first colexification with 217 links. Dutch land ‘country’ and English
land are thus closely related even though they are not strictly synonymous.
Dutch tuin ‘(fenced) garden’, German Zaun ‘fence’ and English town, on the
other hand, appear completely different at first sight. The concept TOWN,
however, has a number of colexifications (e.g. VILLAGE, FORTRESS), and each
of these colexifications can be linked by subgraphs, e.g. FENCE yielding the
German Zaun. In turn, FENCE can be connected to YARD yielding the Dutch
tuin ‘garden’ as a result (see figure 3 and table 1). These connections can
thus be quantified depending on their colexifications, yielding options
ranging from strict synonymous cognates to less strict (i.e. through one or
more colexifications) and non-synonymous cognates.
Figure 3: Subgraph from CLICS database, showing colexification strengths among the
concepts ‘town’, ‘fence’, and ‘garden’.
As a concrete metric, we propose to count the number of edges that
must be travelled to link the meaning of one cognate with the meaning of
the other. A higher number of edges is a weaker semantic link. However,
we want to count heavy-weighted edges for less, because they are widely
attested colexifications. Consequently, we propose that we take the sum
of edges, where each edge is counted as the inverse of its weight.

Start node    End node    Number of colexifications
fence         yard        8
yard          garden      9
garden        village     3
village       town        66

Table 1: Colexification strengths along the path from ‘fence’ to ‘town’ in the CLICS
database.

Stated more formally:
• let w1 be a word in language l1 and w2 a word in language l2
• let w̄1 and w̄2 be the concepts in the Concepticon database to which
w1 and w2 are mapped as models of their denotation
• let ei be the ith edge in the path that starts at w̄1 and ends at w̄2
• let Fi be the weight7 assigned to ei in the CLICS database
We can then define the semantic closeness of w1 and w2 as follows:

\[ \mathrm{Sem}(w_1, w_2) = \sum_i \frac{1}{F_i} \]

To give a few examples:

\[ \mathrm{Sem}(\textit{land}_{\mathrm{Dut.}}, \textit{land}_{\mathrm{Eng.}}) = \frac{1}{127} = 0.0079 \]
\[ \mathrm{Sem}(\textit{Zaun}_{\mathrm{Ger.}}, \textit{tuin}_{\mathrm{Dut.}}) = \frac{1}{8} + \frac{1}{9} = 0.2361 \]
\[ \mathrm{Sem}(\textit{Zaun}_{\mathrm{Ger.}}, \textit{town}_{\mathrm{Eng.}}) = \frac{1}{8} + \frac{1}{9} + \frac{1}{3} + \frac{1}{66} = 0.5846 \]

Note that Sem(toothEng., dofúaidOIr.) cannot currently be calculated with
this methodology because ‘tooth’ and ‘eat’ are not connected in the CLICS
database. One can presume, however, that if the etymology is correct, then
a future edition of the database will link these two graphs, albeit very
weakly, i.e. Sem(toothEng., dofúaidOIr.) will be a large number.
7 We use Fi, inspired by the F (force) of physics, in order to avoid the confusion of using
w, which is already used for ‘word’.
This methodology of course relies on the correct mapping of words to
their closest meanings in the Concepticon Database. The Dutch word tuin
includes the connotation that the garden is enclosed with a fence, thus it is
actually semantically closer to German Zaun than the mapping to GARDEN
in the CLICS database makes clear.
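The metric of this section can be sketched over a small graph built from the colexification counts in table 1. Two points are assumptions of the sketch rather than part of the proposal: where several paths exist we take the one with the smallest sum of inverse weights (found with Dijkstra's algorithm), and concepts that are not connected are given an infinite score. The COUNTRY–LAND weight of 127 is the value used in the worked example above.

```python
import heapq
import math

# Undirected edges with their colexification counts (table 1 plus COUNTRY-LAND).
EDGES = {
    ("FENCE", "YARD"): 8,
    ("YARD", "GARDEN"): 9,
    ("GARDEN", "VILLAGE"): 3,
    ("VILLAGE", "TOWN"): 66,
    ("COUNTRY", "LAND"): 127,
}


def build_graph(edges):
    """Adjacency map in which each edge costs the inverse of its weight F_i."""
    graph = {}
    for (a, b), weight in edges.items():
        graph.setdefault(a, {})[b] = 1 / weight
        graph.setdefault(b, {})[a] = 1 / weight
    return graph


def sem(graph, start, end):
    """Smallest sum of 1/F_i over any path from start to end (inf if none)."""
    if start not in graph or end not in graph:
        return math.inf
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == end:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, cost in graph[node].items():
            new_dist = dist + cost
            if new_dist < best.get(neighbour, math.inf):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return math.inf


GRAPH = build_graph(EDGES)
print(round(sem(GRAPH, "COUNTRY", "LAND"), 4))  # Dutch land vs English land
print(round(sem(GRAPH, "FENCE", "GARDEN"), 4))  # German Zaun vs Dutch tuin
print(round(sem(GRAPH, "FENCE", "TOWN"), 4))    # German Zaun vs English town
```

Rounded to four decimals, the three calls give 0.0079, 0.2361 and 0.5846, matching the worked examples. A pair such as ‘tooth’ and ‘eat’, which this toy graph does not contain, returns infinity.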
5.3
Syntactic and pragmatic alignment
Apart from similarity in phonological form and meaning, cognates can also
be more or less similar in syntactic and pragmatic function. Reanalyses
and grammaticalisation processes as well as other syntactic and pragmatic
developments between the proto-language and the present-day form can
change the function of cognates, just as much as their phonological form or
meaning can change over time. In principle, the similarity of any number
of functional parameters can be measured. For the present paper, we focus
on the main morpho-syntactic categories as well as their subtypes.
First we determine the core part of speech for each cognate, i.e. its
prevalent morpho-syntactic function in the language at its current state. Although nouns and verbal stems are most commonly compared, in principle
many core parts of speech can occur, as in the following (non-exhaustive)
list:
• verbs (verbal roots or stems)
• nouns
• pronouns
• numerals
• adverbs
• adjectives
• adpositions (prepositions, postpositions)
• determiners (articles, demonstratives)
• particles (negation, focus, question, etc)
Although there are exceptions, especially with weak cognates (e.g. English
tooth vs Old Irish dofúaid), often the core parts of speech of each of the
cognates will be the same. In the following sections we therefore present
a number of diagnostics to distinguish between various subtypes, which
can reveal more detailed changes of syntactic or pragmatic functions. As
long as all of the cognates under comparison are submitted to the same
diagnostic tests, when testing similarity of function we can go into any
level of detail. In practice a high level of detail may only be useful in automated procedures where the similarity of large numbers of cognates (in
form, meaning and function) is computed. When manually comparing cognates a more superficial level of detail, e.g. a simple comparison on the
part-of-speech level could be sufficient information to determine whether
cognates are similar or not in terms of their function. The following subsections provide some examples of how to classify some parts of speech
further based on their core functions.
5.3.1
Verbs
Verbs can be classified as intransitives, (optionally) transitives or ditransitives depending on the number of arguments (one, two or three respectively) they take. Intransitive verbs can furthermore be split into unergatives and unaccusatives depending on the nature of their one core argument.
Diagnostics for distinguishing between these subcategories can vary from
language to language. In section 2.3 we presented detailed examples from
Dutch and English, but these can, to a certain extent, be applied to other
languages as well, e.g. German, Italian and French.
5.3.2
Nouns
Nouns could be divided into various subtypes as well, but for present
purposes, we limit ourselves to a basic distinction between mass and count
nouns. Diagnosing count nouns can be easily done by testing whether
plural markers (affixes, determiners, etc) and numeral modifiers of two
and three or higher are allowed. In English, the count noun cloth can
be distinguished from clothing, because three cloths is possible, whereas
*three clothings is not. Note, however, that certain mass and collective nouns in
many languages can be unitised, in which case the plural interpretation is
derived from the unit in which the noun is measured. Examples of these in English
are rice or milk, where two rices/milks in fact denotes ‘two bowls of rice’
and ‘two glasses of milk’ respectively.
5.3.3
Adverbs
Adverbs that are adjuncts (e.g. adverbs of time or place) often exhibit a
certain amount of distributional freedom (cf. Bonami, Godard & Kampers-Manhe 2004); the function and distribution of scopal adverbs, on the other
hand, is more restricted. Various functions of adverbs could be tested for in
theory, but we limit ourselves to one core example known from traditional
classification of adverbs in the literature (e.g. Jackendoff 1972), i.e. their
scope. Many adverbs in English and other languages have either broad
or narrow scope. Some, however, can have both narrow scope (i.e. just
over the verb phrase: ‘VP scope’) and broad scope (i.e. over the entire
proposition: ‘CP scope’ or so-called ‘S adverbs’). Examples of each of these
types are given in (3), whereas example (4) shows that certain adverbs, like
English hopefully, can have either function:
(3) a. He completely ate the cheese. [VP adverb]
    b. He evidently ate the cheese. [S adverb]
(4) a. He hopefully walked home, thinking this time he finally made a difference. [VP adverb]
    b. Hopefully, the weather will be nice tomorrow. [S adverb]
The syntactic position of adverbs that can function as either VP or S
adverbs determines their scope. The VP adverb hopefully in example (4-a),
which modifies the verb phrase only, cannot occur sentence-initially with this reading. If it does occur there,
as shown in (4-b), its scope widens to modify the entire proposition. In
the same vein, Potsdam (n.d.) gives the following examples
showing that broad-scoped adverbs must precede narrow-scoped adverbs in
English:
(5) a. Hulk Hogan [evidently]S [completely]VP annihilated his opponent.
    b. *Hulk Hogan [completely]VP [evidently]S annihilated his opponent.
In addition to these VP and S adverbs, Jackendoff (1972) identifies a third
type, which has the positional distribution of neither of the former two
classes. Potsdam (n.d.) labels these ‘E(xtent) Adverbs’ because they describe the extent to which a situation holds. English examples
are merely, hardly, scarcely, etc. More detailed cross-linguistic distinctions between different types of adverbs, based on their positional
distribution, are made by, among others, Cinque (1999) and Rizzi (2004).
5.3.4
Other particles
‘Other particles’ come in various shapes and forms and are deliberately not
specified here further to facilitate cross-linguistic comparison. Depending
on the language, any ‘markers’, ‘operators’, ‘particles’ or any remaining
parts of speech can convey pragmatic functions. We briefly discuss information-structural and speech-act features here.
There are three core dimensions of information structure:
• focus vs background
• topic vs comment
• given vs new information
These features can be expressed in the language through phonology
(e.g. intonational phrases indicate certain types of topics in English, Japanese and German, cf. Krifka & Musan 2012, 34), morphology (e.g. suffixes to
mark VP focus such as -go in Chadic, cf. Hartmann & Zimmermann 2007),
syntax (e.g. various V2 and cleft orders in Middle Welsh, cf. Meelen 2016,
chapter 5) and lexical items and particles. In this section, we focus on
lexical and functional items, as these are the most likely to be reconstructed;
establishing the cognacy of morphological affixes is, however, also possible.
In Dutch, for instance, ook ‘also’ often functions as a focus marker. Old
Frisian āk ‘also, even’ and German auch are both adverbs that function
as focus markers too. These are strict cognates as they exhibit perfect
phonological alignment. They are furthermore synonymous, since their
semantic and pragmatic functions are the same as well. Old English ēac
‘with, besides’, however, is a preposition. It thus scores slightly lower on
the functional similarity scale.
Good examples of functional markers exhibiting speech-act features
are pragmaticalised Norwegian sånn, German son and Dutch zo’n ‘such an
X’ (of the kind that we both know). Speech-act features can be oriented
towards the speaker, the hearer or both participants. Kinn & Meelen (forthcoming) argue that in both Norwegian and Dutch, the new pragmatic function
relates sånn and zo’n to both Speaker and Hearer features, yielding a new pragmatic ‘recognitional’ interpretation, which marks
that the noun phrase they modify is in the common ground of both participants. Both items originally derive from demonstratives with a deictic function.
In this case, then, German son and Dutch zo’n are not just strict cognates
exhibiting perfect phonological alignment; they are also synonymous in
terms of semantics and pragmatics. Other examples of cognates whose
pragmatic functions can be meaningfully compared in this way are, for example, a number of evidentiality markers that exhibit speech-act features
as well.
5.4
Metrics for automatic cognate detection
Figure 4 presents a simplified representation of the core variables of
form, meaning and function. Note that this 3-dimensional visualisation is
just a simplification. In practice, each subvariable could be a metric and
each of the variables could get a weight to give prominence to whichever
factors are deemed more important in the comparison, e.g. phonology and
semantics. Synonymous cognates differ from non-synonymous cognates
because they exhibit similarity in meaning and function. In the above sections on semantic, morpho-syntactic and pragmatic similarity, we gave
a number of diagnostics to test whether cognates exhibit functional similarity or not. Each of these could be linked to a clear scoring metric to
facilitate automatic cognate similarity comparison. Semantic similarity can
be measured by using the CLICS database of concepts, checking how distant colexifications are. When it comes to morpho-syntactic and pragmatic
variables, similarity scores can be established for each of the features under comparison, e.g. cognates that are both verbal stems, but differ in level
of transitivity, are more similar than cognate pairs consisting of nouns and
verbs, but less similar than those pairs that are both unergative intransitive
verbs. A scoring scale can thus be established for each cognate language
pair under investigation.
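One way such a scoring scale might be set up is sketched below in Python. The sketch assumes that the functional classification of each form is encoded as a sequence of labels running from coarse to fine (part of speech, valency, subtype), that each matching level counts equally, and that all three similarity scores lie between 0 and 1 with 1 meaning identical. The label inventory and the names functional_similarity and combined_similarity are illustrative assumptions; the discussion above does not prescribe any of them.

```python
from typing import Sequence, Tuple

# Hypothetical functional labels, ordered from coarse to fine
# (part of speech, valency, subtype). Illustrative assumptions only.
UNERGATIVE = ("verb", "intransitive", "unergative")
UNACCUSATIVE = ("verb", "intransitive", "unaccusative")
TRANSITIVE = ("verb", "transitive")
COUNT_NOUN = ("noun", "count")


def functional_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of classification levels on which two forms agree (0.0 to 1.0).

    Levels are compared from coarse to fine and counting stops at the
    first mismatch, so a difference in part of speech always yields 0.0.
    """
    if not a or not b:
        raise ValueError("each form needs at least a part-of-speech label")
    shared = 0
    for label_a, label_b in zip(a, b):
        if label_a != label_b:
            break
        shared += 1
    return shared / max(len(a), len(b))


def combined_similarity(
    form: float,
    meaning: float,
    function: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted mean of three similarity scores, each assumed to lie in [0, 1]."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    scores = (form, meaning, function)
    return sum(w * s for w, s in zip(weights, scores)) / total
```

With these assumed labels, a pair of unergative verbs scores 1.0, an unergative and a transitive verb score 1/3, and a noun-verb pair scores 0.0, which reproduces the ordering described above. Raising the first weight in combined_similarity gives more prominence to phonological form; the equal default weights are a placeholder rather than a recommendation.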
Figure 4 shows the three core dimensions and two samples of resulting
three-dimensional planes: the smaller the surface of the plane (i.e. the
closer all three scores are to (0,0,0) or to an arbitrary reference value, e.g. 100%), the
stricter (qua phonological form) and more synonymous (in meaning and
function) the cognates are.
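The surface of such a plane can be computed directly if each of the three scores is read as a distance from the origin along its own axis. The Python sketch below assumes that reading (0 meaning identical, larger values meaning less similar) and treats the plane as the triangle spanned by the three axis intercepts. The function name plane_area and the choice of a triangle are illustrative assumptions, since the text above does not specify a formula.

```python
import math


def plane_area(form: float, meaning: float, function: float) -> float:
    """Area of the triangle with vertices (form, 0, 0), (0, meaning, 0)
    and (0, 0, function), where each argument is a non-negative distance."""
    if min(form, meaning, function) < 0:
        raise ValueError("distances must be non-negative")
    # Half the norm of the cross product of two edge vectors of the triangle.
    return 0.5 * math.sqrt(
        (form * meaning) ** 2
        + (meaning * function) ** 2
        + (function * form) ** 2
    )
```

One caveat of this particular choice is that the area is zero whenever two of the three distances are zero, however large the third. A practical metric would therefore probably combine the area with, or replace it by, a measure such as a weighted sum of the three distances.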
6
Conclusion
We hope that the foregoing discussion has succeeded in tightening up what
is meant by ‘cognate’ in historical linguistics and partially formalizing the
evaluation of whether two forms are cognate. This (partial) formalization
of cognacy and its associated workflow serves as one small subcomponent
of an overall formalization of the comparative method. The need for such
an overall formalization is now widely recognized, both for its inherent
intellectual merits and because a computational implementation of the
comparative method is the only way that we can reasonably hope for non-Indo-European language families to become as well understood as Indo-European.
Historical linguists typically speak as if cognacy is a binary relationship. This conceit is perhaps a convenient simplification in the context of
traditional historical linguistics, but it is a dangerous misunderstanding
when taken for granted in machine-readable datasets. In particular, future phylogenetic work would benefit from deploying a more sophisticated
model of cognacy.
Figure 4: Cognacy variables in 3 dimensions indicating similarity in form, meaning and function.
Comments invited
PiHPh relies on post-publication review of the papers that it publishes. If
you have any comments on this piece, please add them to its comments
site. You are encouraged to consult this site after reading the paper, as
there may be comments from other readers there, and replies from the
author. This paper’s site is here:
http://dx.doi.org/10.2218/pihph.7.2022.7405
Acknowledgements
The authors would like to thank Johann-Mattis List for valuable comments on an earlier version of this paper. We also acknowledge the ERC
grant ‘ASIA: Beyond Boundaries: Religion, Region, Language and the State’
(2014-2020, ID: 609823), under the auspices of which Marieke Meelen
and Nathan W. Hill first began working on this paper. Research for this
paper was also supported by the Austrian Science Fund (FWF): project
number Y-1044.
Author contact details
Marieke Meelen
University of Cambridge
Trinity Hall
Trinity Lane
Cambridge CB2 1TJ
United Kingdom
mm986@cam.ac.uk
Nathan W. Hill
Trinity College Dublin
Trinity Centre for Asian Studies
Dublin 2
Ireland
nathan.hill@tcd.ie
Hannes Fellner
University of Vienna
Department of Linguistics
Sensengasse 3a
1090 Vienna
Austria
hannes.fellner@univie.ac.at
References
Alexiadou, Artemis, Elena Anagnostopoulou & Martin Everaert. 2004. The
unaccusativity puzzle: explorations of the syntax-lexicon interface.
Anttila, Raimo. 1972. An introduction to historical and comparative linguistics. New York: Macmillan.
Anttila, Raimo. 1989. Historical and comparative linguistics, vol. 6. John
Benjamins Publishing.
Bizzoni, Yuri, Stefania Degaetano-Ortlieb, Katrin Menzel, Pauline Krielke
& Elke Teich. 2019. Grammar and meaning: analysing the topology of
diachronic word embeddings. In Proceedings of the 1st International
Workshop on Computational Approaches to Historical Language Change,
175–185.
Blust, Robert. 1990. Patterns of sound change in the Austronesian languages. In Philip Baldi (ed.), Linguistic change and reconstruction methodology, 231–270. Berlin; New York: Mouton de Gruyter.
Bonami, Olivier, Danièle Godard & Brigitte Kampers-Manhe. 2004. Adverb
classification. Handbook of French semantics. 143–184.
Brugmann, Karl. 1881. Griechische etymologien. Zeitschrift für vergleichende Sprachforschung auf dem Gebiete der Indogermanischen
Sprachen (25). 298–307.
Bynon, Theodora. 1977. Historical Linguistics. Cambridge: Cambridge University Press.
Campbell, Lyle. 2004. Historical linguistics: an introduction. 2nd edn. Edinburgh: Edinburgh University Press.
Chang, Will, Chundra Cathcart, David Hall & Andrew Garrett. 2015.
Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis. Language 91(1). 194–244.
Cheng, Lisa Lai-Shen & Norbert Corver. 2013. Diagnosing syntax, vol. 46.
Oxford University Press.
Cinque, Guglielmo. 1999. Adverbs and functional heads: a cross-linguistic
perspective. Oxford University Press.
Clackson, James. 2007. Indo-European linguistics. Cambridge: Cambridge
University Press.
Crowley, Terry. 1982. The Paamese language of Vanuatu. Canberra, A.C.T.,
Australia: Dept. of Linguistics, Research School of Pacific Studies, Australian National University.
Crowley, Terry & Claire Bowern. 2010. An introduction to historical linguistics. 4th edn. Oxford: Oxford University Press.
Crystal, David. 2008. A dictionary of linguistics and phonetics. Malden, MA:
Blackwell.
Davies, Anna Morpurgo. 1998. Nineteenth-century linguistics. Giulio
Lepschy (ed.) (History of Linguistics IV). London: Longman.
Dimmendaal, Gerrit J. 2011. Historical linguistics and the comparative study
of African languages. Amsterdam: John Benjamins Publishing Company.
Dubossarsky, Haim, Yulia Tsvetkov, Chris Dyer & Eitan Grossman. 2015.
A bottom up approach to category mapping and meaning change. In
Networds, 66–70.
Dubossarsky, Haim, Daphna Weinshall & Eitan Grossman. 2016. Verbs
change more than nouns: A bottom-up computational approach to semantic change. Unpublished Manuscript. https://www.academia.edu/25793914.
Dubossarsky, Haim, Daphna Weinshall & Eitan Grossman. 2017. Outta
control: laws of semantic change and inherent biases in word representation models. In Proceedings of the 2017 Conference on Empirical
Methods in Natural Language Processing, 1136–1145.
Felbur, Rafal, Marieke Meelen & Paul Vierthaler. 2022. Crosslinguistic semantic textual similarity of Buddhist Chinese and Classical Tibetan.
Journal of Open Humanities Data 8(1). 23, 1–14.
Fellner, Hannes & Nathan W. Hill. 2019a. The differing status of reconstruction in Trans-Himalayan and Indo-European. Cahiers de Linguistique
– Asie Orientale 48(2). 159–172. https://brill.com/view/journals/clao/48/2/article-p159_5.xml.
Fellner, Hannes & Nathan W. Hill. 2019b. Word families, allofams, and the
comparative method. Cahiers de linguistique – Asie Orientale 48(2).
91–124.
Fonteyn, Lauren. 2020. What about grammar?: Using BERT embeddings
to explore functional-semantic shifts of semi-lexical and grammatical
constructions. In Proceedings of CHR 2020: Workshop on computational
humanities research, 257–268. CEUR-WS.
Fortson, Benjamin W. 2010. Indo-European languages and culture: an
introduction. Malden, Oxford & Victoria: Blackwell.
Gray, Russell D. & Quentin D. Atkinson. 2003. Language-tree divergence
times support the Anatolian theory of Indo-European origin. Nature
426(6965). 435–439.
Gray, Russell D. & F. M. Jordan. 2000. Language trees support the express-train sequences of Austronesian expansion. Nature (405). 1052–1055.
Hale, Mark. 2007. Historical Linguistics: Theory and Method. 1st edn.
(Blackwell Textbooks in Linguistics). Malden, Oxford & Victoria: Wiley-Blackwell.
Hamilton, William L, Jure Leskovec & Dan Jurafsky. 2016. Diachronic word
embeddings reveal statistical laws of semantic change. arXiv preprint
arXiv:1605.09096.
Hartmann, Katharina & Malte Zimmermann. 2007. Focus strategies in
Chadic–the case of Tangale revisited. Studia Linguistica 61(2). 95–129.
Hill, Nathan W. 2014. Grammatically conditioned sound change. Language
and Linguistics Compass 8(6). 211–229. http://eprints.soas.ac.uk/18595/.
Hill, Nathan W. 2019. The historical phonology of Tibetan, Burmese, and
Chinese. London: Cambridge University Press.
Hill, Nathan W. & Johann-Mattis List. 2017. Challenges of annotation and
analysis in computer-assisted language comparison: A case study on
Burmish languages. Yearbook of the Poznań Linguistic Meeting 3(1).
47–76.
Hock, Hans Henrich. 1991. Principles of historical linguistics. Berlin:
Mouton de Gruyter.
Hockett, Charles F. 1967. Where the tongue slips, there slip I. In To honor
Roman Jakobson, 910–936. Mouton.
Hoenigswald, Henry Max. 1960. Language change and linguistic reconstruction. 4. Aufl. 1966. Chicago: The University of Chicago Press & Univ. of
Chicago Press.
Huehnergard, John. 2004. Afro-Asiatic and Semitic languages. In Roger
D. Woodard (ed.), The Cambridge encyclopedia of the world’s ancient
languages, 225–246. Cambridge: Cambridge University Press.
Jackendoff, Ray S. 1972. Semantic interpretation in generative grammar.
Jasanoff, Jay H. 2003. Hittite and the Indo-European verb. Oxford: Oxford
University Press.
Kinn, Kari & Marieke Meelen. Forthcoming. Formalising pragmaticalisation
in Dutch and Norwegian DPs.
Koch, Harold. 1996. Reconstruction in morphology. In Mark Durie (ed.),
218–263. New York: Oxford University Press.
Koch, Harold & Luise Hercus. 2013. Obscure vs. transparent cognates
in linguistic reconstruction. In Robert Mailhammer (ed.), Lexical and
structural etymology, 33–51. Berlin & New York: de Gruyter.
Krifka, Manfred & Renate Musan. 2012. Information structure: overview
and linguistic issues. The expression of information structure 5. 1–44.
Kutuzov, Andrey, Lilja Øvrelid, Terrence Szymanski & Erik Velldal. 2018.
Diachronic word embeddings and semantic shifts: a survey. arXiv preprint arXiv:1806.03537.
Labat, Sofie & Els Lefever. 2019. A classification-based approach to cognate
detection combining orthographic and semantic similarity information.
In Recent advances in natural language processing 2019, 603–611.
LaPolla, Randy. 2017. Overview of Sino-Tibetan morphosyntax. In Randy J.
LaPolla & Graham Thurgood (eds.), The Sino-Tibetan languages, 40–69.
Routledge.
List, Johann-Mattis. 2012. Multiple sequence alignment in historical linguistics: A sound class based approach. In Enrico Boone, Kathrin Linke
& Maartje Schulpen (eds.), Proceedings of ConSOLE XIX, 241–260.
List, Johann-Mattis. 2014. Sequence comparison in historical linguistics.
Düsseldorf: Düsseldorf University Press.
List, Johann-Mattis. 2019. Automatic inference of sound correspondence
patterns across multiple languages. Computational Linguistics 45(1).
137–161. https://www.aclweb.org/anthology/J19-1004.
List, Johann-Mattis, Mary Walworth, Simon J. Greenhill, Tiago Tresoldi &
Robert Forkel. 2018. Sequence comparison in computational historical
linguistics. Journal of Language Evolution 3(2). 130–144.
Matisoff, J. A. 1973. Tonogenesis in Southeast Asia. In Larry H. Hyman (ed.),
Consonant Types and Tone, 71–95. Los Angeles: UCLA.
Meelen, Marieke. 2016. Why Jesus and Job spoke bad Welsh: The origin and
distribution of V2 orders in Middle Welsh. Utrecht: LOT dissertation
series.
Meelen, Marieke. 2019. Darling, dukeling, duckling: How historical corpora
can verify predicted pathways of language change. Keynote talk at the
Cambridge Language Sciences Symposium, 19 November 2019.
Meyer-Lübke, Wilhelm (comp.). 1911. Romanisches etymologisches Wörterbuch (Sammlung romanischer Elementar- und Handbücher 3.3).
Heidelberg: Winter.
Montariol, Syrielle & Alexandre Allauzen. 2019. Empirical study
of diachronic word embeddings for scarce data. arXiv preprint
arXiv:1909.01863.
Nussbaum, Alan J. 1986. Head and horn in Indo-European. The words for
“horn,” “head,” and “hornet”. Berlin & New York: de Gruyter.
Potsdam, Eric. N.d. A syntax for adverbs. In The Proceedings of the 1998
Western Conference on Linguistics (WECOL98).
Rama, Taraka & Johann-Mattis List. 2019. An automated framework for
fast cognate detection and Bayesian phylogenetic inference in computational historical linguistics. In 57th Annual Meeting of the Association for
Computational Linguistics, 6225–6235. Association for Computational
Linguistics.
Ringe, Donald. 2006. A linguistic history of English. Vol. 1: From Proto-Indo-European to Proto-Germanic. Oxford: Oxford University Press.
Rix, Helmut (ed.). 2001. LIV. Lexikon der Indogermanischen Verben: Die
Wurzeln und ihre Primärstammbildungen. In collab. with Martin Kümmel, Thomas Zehnder, Reiner Lipp & Brigitte Schirmer. Wiesbaden:
Reichert.
Rizzi, Luigi. 2004. The structure of CP and IP: The cartography of syntactic
structures, vol. 2. Oxford University Press.
Rzymski, Christoph & Tiago Tresoldi. 2019. The database of cross-linguistic
colexifications, reproducible analysis of cross-linguistic polysemies.
https://clics.clld.org/.
Schwink, Frederick. 1994. Linguistic typology, universality and the realism
of reconstruction. Washington: Institute for the Study of Man.
Thurneysen, Rudolf. 1946. A grammar of Old Irish. Trans. by D. A. Binchy
& Osborn Bergin. Dublin: School of Celtic Studies, Dublin Institute for
Advanced Studies.
Trask, Robert Larry (comp.). 2000. The dictionary of historical and comparative linguistics. Edinburgh: Edinburgh University Press.
Trask, Robert Larry. 2015. Trask’s historical linguistics. Robert McColl Millar (ed.). 3rd edn. London & New York: Routledge.
Vine, Brent. 1993. Greek -ίσκω and Indo-European “*-isk̑e/o-”. Historische
Sprachforschung / Historical Linguistics 106(1). 49–60. http://www.jstor.org/stable/40849080.
von Mengden, Ferdinand. 2008. Paul Georg Meyer, Synchronic English
Linguistics: An Introduction. Anglia-Zeitschrift für englische Philologie
126(1). 114–118.
Watkins, Calvert. 1990. Etymologies, equations, and comparanda: Types
and values, and criteria for judgment. In Philip Baldi (ed.), Linguistic
change and reconstruction methodology, 289–303. Berlin; New York:
Mouton de Gruyter.
Weiss, Michael. 2009. Outline of the historical and comparative grammar
of Latin. Ann Arbor: Beech Stave Press.
Weiss, Michael. 2014. The comparative method. In Claire Bowern & Nicholas Evans (eds.), The Routledge Handbook of Historical Linguistics,
1st edn. (Routledge Handbooks in Linguistics), 127–145. New York:
Routledge.