New Pictish Language - Proceedings of The Royal Society A
Proc. R. Soc. A
doi:10.1098/rspa.2010.0041
Published online
AND PAULINE ZIMAN³
¹ School
1. Introduction
Among the durable artefacts left by prehistoric societies, there are many instances of enigmatic scripts. These scripts typically consist of very short sequences of regularly placed symbols (or single symbols) and range from the inscribed pottery of the Chinese Neolithic (Li et al. 2002), through the inscribed clay tablets and seals of the Indus Valley culture (Rao et al. 2009), to the inscribed stones of Late Iron Age Scotland (Wainwright et al. 1955; Mack 1997). A longstanding conundrum has been to determine whether any of these symbol sets might be an example of a written language. A number of problems have impeded progress in this area: the non-availability of reliable corpora describing the specific symbols, a lack of agreement on the definition of individual symbol types, small corpus sizes ranging from a couple of hundred to a few thousand symbols, the often short nature of individual inscriptions (one to three symbols in length) and the lack of
*Author for correspondence (r.lee@exeter.ac.uk).
Received 27 January 2010
Accepted 26 February 2010
R. Lee et al.
2. Theory
The problem that the Pictish symbols pose can be broken down into two questions. (i) Are they random in nature (admittedly unlikely, since they appear to have been carved for a purpose)? (ii) If they are not random, what type of communication do they convey: (a) semasiography, where
Figure 1. Pictish symbol stones. (a) Class I stone, Grantown, with two symbols: stag and rectangle. (b) Aberlemno 2, a Class II stone with two symbols, a divided rectangle with a Z-rod and a triple disc, as well as other imagery (a battle scene; the cross is on the other face).
Figure 2. Plot of F1 (uni-gram entropy) versus log2 Nu (number of different uni-grams), showing the 99.9% confidence ellipse for prediction of the random data. This figure tests whether the stones correspond to similar-sized samples from a finite alphabet with equal relative frequencies of uni-gram occurrence. It is extremely unlikely that the observed values for the Pictish stones would occur by chance if they were indeed a random dataset. Open squares, random data; filled triangles, Pictish Symbol Stones; dotted line, upper 99.9% confidence ellipse for prediction; solid line, lower 99.9% confidence ellipse for prediction.
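The caption does not say how the prediction ellipse was constructed. As a minimal sketch, assuming the (log2 Nu, F1) points of the random texts follow a bivariate normal distribution, one standard construction (the Hotelling-type prediction region for a single new observation) tests whether a further point, such as a Pictish stone dataset, falls inside the 99.9% ellipse. This is not necessarily the authors' procedure; the function names are hypothetical, and the F(2, m) quantile uses the closed form that exists for two numerator degrees of freedom, so only the standard library is needed.

```python
def f2_quantile(q: float, m: int) -> float:
    """Quantile of the F(2, m) distribution.

    Uses the closed-form CDF P(F <= x) = 1 - (1 + 2x/m) ** (-m/2),
    valid when the numerator has 2 degrees of freedom.
    """
    return (m / 2.0) * ((1.0 - q) ** (-2.0 / m) - 1.0)


def in_prediction_ellipse(points, new_point, q=0.999):
    """True if new_point lies inside the 100q% prediction ellipse of points.

    Assumes the points are a sample from a bivariate normal distribution.
    A new independent observation then satisfies
        n/(n+1) * d2 <= 2(n-1)/(n-2) * F(2, n-2; q),
    where d2 is its squared Mahalanobis distance from the sample mean.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("degenerate covariance matrix")
    dx, dy = new_point[0] - mx, new_point[1] - my
    # Squared Mahalanobis distance via the explicit 2x2 inverse.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    limit = 2.0 * (n - 1) / (n - 2) * f2_quantile(q, n - 2)
    return n / (n + 1) * d2 <= limit
```

With 40 random datasets, as in figure 2, `points` would hold 40 (log2 Nu, F1) pairs and `new_point` the pair computed for the stones.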
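$$F_1 = -\sum_{i=1}^{N_u} p_i \log_2 p_i, \qquad (2.1)$$

$$F_2 = -\sum_{i,j} p(i,j)\,\log_2 p_i(j), \qquad (2.2)$$

where p_i is the relative frequency of uni-gram i, p(i,j) that of the di-gram ij, and p_i(j) = p(i,j)/p_i the conditional probability that j follows i.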
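Equations (2.1) and (2.2) can be estimated directly from symbol counts. The sketch below is illustrative, not the authors' code: it treats each inscription as a sequence of symbol labels, pools counts over all inscriptions, forms di-grams only within an inscription (an assumption), and estimates p_i for the conditional probability from the first position of each di-gram so that the p_i(j) sum to one. The helper names are hypothetical.

```python
import math
from collections import Counter


def unigram_entropy(inscriptions):
    """F1 (equation 2.1): entropy of the pooled uni-gram frequencies."""
    counts = Counter(sym for insc in inscriptions for sym in insc)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def digram_entropy(inscriptions):
    """F2 (equation 2.2): conditional entropy of a symbol given its predecessor."""
    pairs = Counter()
    for insc in inscriptions:
        # Assumption: di-grams are formed only within one inscription,
        # never across the boundary between two inscriptions.
        pairs.update(zip(insc, insc[1:]))
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("no di-grams: every inscription has fewer than two symbols")
    first = Counter()  # how often each symbol i opens a di-gram
    for (i, _), c in pairs.items():
        first[i] += c
    # p(i,j) = c / total and p_i(j) = p(i,j) / p_i = c / first[i]
    return -sum((c / total) * math.log2(c / first[i]) for (i, _), c in pairs.items())
```

For example, `unigram_entropy(["abcd"])` is 2.0 bits, and `digram_entropy(["ab", "ac"])` is 1.0 bit because `a` is followed by `b` or `c` with equal probability. Inscriptions may equally be lists of symbol names such as `["crescent", "mirror"]`.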
Figure 3. Plot of F2 (di-gram entropy) versus Tu (text size based on the total number of uni-gram characters) for a wide range of texts and character types. The di-gram entropy is similar for different types of characters in datasets with small sample sizes, owing to the incomplete nature of the di-gram lexicons. Dashes, sematograms (heraldry); filled diamonds, letters (prose, poetry and inscriptions); grey filled triangles, syllables (prose, poetry and inscriptions); open squares, words (genealogical lists); crosses, code characters; open diamonds, letters (genealogical lists); filled squares, words (prose, poetry and inscriptions).
$$U_r = \frac{F_2}{\log_2 (N_d/N_u)}. \qquad (2.3)$$
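A minimal sketch of equation (2.3), under the same assumed counting conventions (pooled counts, di-grams formed within inscriptions only). One way to read the denominator: log2(Nd/Nu) is the di-gram entropy that would result if every symbol were followed, with equal probability, by the same number Nd/Nu of distinct successors. The function name is hypothetical, and rejecting texts with Nd ≤ Nu is a choice of this sketch.

```python
import math
from collections import Counter


def ur_ratio(inscriptions):
    """Ur (equation 2.3): F2 divided by log2(Nd / Nu)."""
    unigrams = Counter(s for insc in inscriptions for s in insc)
    digrams = Counter(p for insc in inscriptions for p in zip(insc, insc[1:]))
    nu, nd = len(unigrams), len(digrams)
    if nd <= nu:
        # log2(Nd/Nu) would be zero or negative; this sketch rejects such texts.
        raise ValueError("Ur needs more distinct di-grams than distinct uni-grams")
    td = sum(digrams.values())
    first = Counter()  # di-gram counts grouped by their first symbol
    for (i, _), c in digrams.items():
        first[i] += c
    f2 = -sum((c / td) * math.log2(c / first[i]) for (i, _), c in digrams.items())
    return f2 / math.log2(nd / nu)
```

For the toy inscriptions `["abc", "acb", "bca"]` there are Nu = 3 symbols, Nd = 5 distinct di-grams and F2 = 2/3 bit, so Ur = (2/3)/log2(5/3) ≈ 0.90.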
Figure 4. Plot of F2 (di-gram entropy) versus Nd/Nu (degree of di-gram lexicon completeness) on a log-linear scale. The di-gram entropy for different types of characters depends on the level of completeness of the di-gram lexicon. Dashes, sematograms (heraldry); filled diamonds, letters (prose, poetry and inscriptions); grey filled triangles, syllables (prose, poetry and inscriptions); open squares, words (genealogical lists); crosses, code characters; open diamonds, letters (genealogical lists); filled squares, words (prose, poetry and inscriptions).
$$C_r = \frac{N_d}{N_u} + a\,\frac{S_d}{T_d}, \qquad (2.4)$$
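Equation (2.4) can be sketched the same way. Two quantities are not defined in the surviving text: the constant a, whose value the caller must supply, and Sd, which this sketch assumes to be the number of distinct di-grams that occur exactly once (by analogy with the uni-gram singleton fraction used for the random texts). Td is taken as the total number of di-grams. Treat both readings, and the hypothetical function name, as assumptions.

```python
from collections import Counter


def cr_ratio(inscriptions, a):
    """Cr (equation 2.4): Nd/Nu + a * Sd/Td.

    `a` is the constant of equation 2.4 and must be supplied by the caller.
    Assumption: Sd counts the distinct di-grams that occur exactly once.
    """
    unigrams = Counter(s for insc in inscriptions for s in insc)
    # Di-grams are formed within each inscription only (assumption).
    digrams = Counter(p for insc in inscriptions for p in zip(insc, insc[1:]))
    td = sum(digrams.values())
    if td == 0:
        raise ValueError("no di-grams to measure")
    nu, nd = len(unigrams), len(digrams)
    sd = sum(1 for c in digrams.values() if c == 1)
    return nd / nu + a * sd / td
```

For `["abc", "acb", "bca"]`, Nd/Nu = 5/3 and Sd/Td = 4/6, so with a hypothetical a = 1 the sketch gives Cr = 7/3.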
Figure 5. Plot of Sd/Td (degree of di-gram repetition) versus Nd/Nu (degree of di-gram lexicon completeness). The degree of di-gram repetition also depends on the level of completeness of the di-gram lexicon, and this dependency differs between standard lexigraphic characters and heraldic sematogram characters. Dashes, sematograms (heraldry); filled diamonds, letters (prose, poetry and inscriptions); grey filled triangles, syllables (prose, poetry and inscriptions); open squares, words (genealogical lists); crosses, code characters; open diamonds, letters (genealogical lists); filled squares, words (prose, poetry and inscriptions).
3. Results
Figure 2 shows the 99.9 per cent confidence ellipse for prediction around 40 sets of random data. The datasets plotted in figure 2 were generated as follows: characters were sampled from a uniform distribution (i.e. with equal relative frequencies) into small units of text similar to the small units of glyphs seen on the stones. The key properties (total number of uni-grams, number of different uni-grams and the resulting fraction of uni-grams appearing only once) bracketed the corresponding properties observed in the stones. Figure 2 therefore tests whether the stones correspond to similar-sized samples from a finite alphabet with equal relative frequencies of uni-gram occurrence. Texts based on written
[Decision tree: Cr < 4.89 → repetitive text (heraldic sematograms, code characters and repetitive lexigraphic characters); Cr ≥ 4.89 → non-repetitive text, split on Ur into words (Ur ≥ 1.37), syllables (1.09 ≤ Ur < 1.37) and letters (Ur < 1.09).]
Figure 6. Two-parameter decision tree that separates repetitive text from non-repetitive text. The tree classifies the character types found in non-repetitive text into the three main lexigraphic character units (words, syllables, letters). Repetitive text consists of two main categories of characters, non-lexigraphic heraldic characters and lexigraphic code characters, as well as non-concordant letter, syllable and word character texts that are repetitive.
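The two split parameters of figure 6 (Cr at 4.89, then Ur at 1.37 and 1.09) amount to a very small classifier. The sketch below assumes that values exactly at a threshold go to the upper branch, and its function name is hypothetical. Applied to the Cr and Ur values of table 1, it reproduces the table's classifications: syllables for Mack's typology and words for Allen & Anderson's.

```python
def classify_text(cr: float, ur: float) -> str:
    """Apply the two splits of figure 6 (thresholds read from the figure).

    Assumption: a value exactly at a threshold takes the upper branch.
    """
    if cr < 4.89:
        # Heraldic sematograms, code characters or repetitive lexigraphic text.
        return "repetitive"
    if ur >= 1.37:
        return "words"
    if ur >= 1.09:
        return "syllables"
    return "letters"


# (Cr, Ur) values from table 1 for the two symbol typologies.
table_1 = {
    ("I", "Mack"): (5.92, 1.28),
    ("II", "Mack"): (6.11, 1.36),
    ("I", "Allen & Anderson"): (5.64, 1.39),
    ("II", "Allen & Anderson"): (6.16, 1.45),
}
for (stone_class, typology), (cr, ur) in table_1.items():
    print(stone_class, typology, classify_text(cr, ur))
```

Because Mack's values (Ur = 1.28 and 1.36) sit just below the 1.37 split, a modest widening of the symbol set could move them across the syllable/word boundary.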
Figure 7. The effect of increasing the character vocabulary constraint for letters on the empirical cumulative distributions of Ur (F2/log2(Nd/Nu)). As the vocabulary becomes more constrained, the distribution of Ur becomes narrower and its mean value decreases. Short-dashed line, empirical cumulative distribution for letter characters for all prose, poetry and inscriptions; long-dashed line, empirical cumulative distribution for letter characters for constrained genealogical lists; solid line, empirical cumulative distribution for letter characters from very constrained lists.
followed by u in English) and (ii) the constrained nature of the letter lexicon compared with the word lexicon (26 letters in English versus a word vocabulary of hundreds even for the most constrained texts). This means that, for a given value of Nd/Nu, we should expect F2 to be larger for words than for letters, and thus Ur to be larger too; figures 3 and 6 show this to be the case. The separation of the lexigraphic character types is independent of language and of sign type (i.e. alphabetic, syllabic and logographic scripts).
As a character vocabulary is constrained, it becomes easier to predict the next character, decreasing F2 and Ur. The effect of constraining the character vocabulary on the distribution of Ur is shown in figure 7. Normal texts exhibit a wide variety of vocabulary constraints. Constraining the character vocabulary (e.g. king lists and genealogical lists, which are restricted to a vocabulary of names, or genealogical lists using an even smaller vocabulary of familiar, diminutive names) gives a narrower distribution and a lower mean value of Ur.
The tree classifier developed here suggests that the Pictish symbols are lexigraphic in nature, because they have values of Cr in the interval [5.6, 6.2] (table 1), above the Cr = 4.89 split of figure 6. In particular, we infer that the Pictish symbols are not drawn from a distribution of heraldic characters. Table 1 shows that Mack's symbol categorization gives values of Ur (1.28 and 1.36) that fall on the syllable side of the syllable/word boundary at Ur = 1.37. However, Mack's categorization of the symbol types is much narrower than that of other workers (Allen & Anderson 1903; Diack 1944; Forsyth 1997). If Mack's categorization is incorrect, it will artificially constrain the symbol lexicon, lowering F2 and Ur. The larger symbol categorization proposed by Allen and Anderson in The Early Christian Monuments of Scotland implies that the Pictish symbols are very constrained words, similar in constraint to the genealogical name lists. Thus, it is likely that the symbols are actually words, but that Mack's categorization has lowered the symbol di-gram entropy such that the data fall in the syllable band.

Table 1. Values of Cr and Ur calculated for the Class I and Class II Pictish symbol stones using the symbol types given by Mack (1997) and by Allen & Anderson (1903).

stone class   symbol types        Cr     Ur     character classification
I             Mack                5.92   1.28   syllable
II            Mack                6.11   1.36   syllable
I             Allen & Anderson    5.64   1.39   word
II            Allen & Anderson    6.16   1.45   word
4. Discussion
Since there are many complete stones inscribed only with a single symbol, it
seems unlikely (although not impossible) that the symbols are single syllables.
In order to answer the question of whether the symbols are words or syllables,
and thus define a system from which a decipherment can be initiated, a complete
visual catalogue of the stones and the symbols will need to be created and the
effect of widening the symbol set investigated. However, demonstrating that the
Pictish symbols are writing, with the symbols probably corresponding to words,
opens a unique line of further research for historians and linguists investigating
the Picts and how they viewed themselves.
Having shown that it is possible to use an entropic technique to investigate the
degree of communication in very small and incomplete written systems, it may be
possible to extend this to other areas with similar problems. For example, animal
language studies using Shannon entropies are often hampered by small sample
datasets (McCowan et al. 1999). By building a similar set of data for spoken or
verbal human communication, it should be possible to make similar comparisons
of the level of information communicated by animal languages.
(l) Random
Randomly generated character texts, with alphabets ranging from two to 100 different characters, were used with text sizes of 15–1000 characters. The texts bracketed the values observed in the stones for the total number of uni-grams (Tu), the number of different uni-grams (Nu) and the fraction of uni-grams appearing only once.
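The random texts can be generated along the following lines. This is a sketch under stated assumptions rather than the published procedure: inscription lengths are drawn uniformly from one to three symbols (matching the short inscriptions described in the introduction), and the "fraction of uni-grams appearing only once" is read as the share of distinct uni-grams that occur exactly once. Function names are hypothetical. Each summary yields one random point (log2 Nu, F1) of the kind plotted in figure 2.

```python
import math
import random
from collections import Counter


def random_inscriptions(alphabet_size, text_size, max_len=3, seed=None):
    """Sample text_size characters uniformly from alphabet_size symbols
    and split them into short inscriptions.

    Assumption: inscription lengths are uniform on 1..max_len.
    """
    rng = random.Random(seed)
    units, remaining = [], text_size
    while remaining > 0:
        k = min(rng.randint(1, max_len), remaining)
        units.append([rng.randrange(alphabet_size) for _ in range(k)])
        remaining -= k
    return units


def summarize(units):
    """Return (Tu, Nu, singleton fraction, F1) for one random text."""
    counts = Counter(s for unit in units for s in unit)
    tu, nu = sum(counts.values()), len(counts)
    # Assumption: share of distinct uni-grams that occur exactly once.
    singleton_fraction = sum(1 for c in counts.values() if c == 1) / nu
    f1 = -sum((c / tu) * math.log2(c / tu) for c in counts.values())
    return tu, nu, singleton_fraction, f1


# One random point per alphabet size.
for size in (2, 10, 100):
    tu, nu, once, f1 = summarize(random_inscriptions(size, 200, seed=size))
    print(size, tu, nu, round(once, 2), round(math.log2(nu), 2), round(f1, 2))
```

Because the characters are equiprobable, F1 for these texts stays close to (and never above) log2 Nu, which is why the random points form a tight band in figure 2.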
Glossary
Tu: total number of characters (uni-grams) in a text. Tu is the text size for that character type; thus a text of 200 words may have a letter text size of 900 letters and a syllable text size of 520 syllables.
Nu: the number of different characters (uni-grams) in a text. Thus, a 200-word text might have 25 different letters, 100 different syllables and 130 different words.
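The dependence of Tu and Nu on the character type can be made concrete with a toy text (invented for illustration, not part of any corpus) and a hypothetical helper that counts one character type at a time:

```python
from collections import Counter


def text_size_and_lexicon(units):
    """Return (Tu, Nu): total and distinct counts of one character type."""
    counts = Counter(units)
    return sum(counts.values()), len(counts)


sentence = "the cat and the hat"
words = sentence.split()
letters = [ch for word in words for ch in word]

print("words:", text_size_and_lexicon(words))      # words: (5, 4)
print("letters:", text_size_and_lexicon(letters))  # letters: (15, 7)
```

The same five-word text thus has Tu = 5 and Nu = 4 as words, but Tu = 15 and Nu = 7 as letters.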
References
Allen, J. R. & Anderson, J. 1903 The early Christian monuments of Scotland. Balgavies, Angus: The Pinkfoot Press. (Reprinted by The Pinkfoot Press 1993.)
Anderson, M. O. 1973 Kings and kingship in early Scotland, pp. 119–204. Edinburgh, UK: Scottish Academic Press.
Atkinson, Q. D., Meade, A., Venditti, C., Greenhill, S. J. & Pagel, M. 2008 Languages evolve in punctuational bursts. Science 319, 588. (doi:10.1126/science.1149683)
Bouissac, P. A. 1997 In Archaeology and language I (eds R. Blench & M. Spriggs), pp. 53–62. London, UK: Routledge.
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. 1984 Classification and regression trees, ch. 2–8. London, UK: Chapman & Hall.
Burke, B. 1962 A genealogical history of the dormant, abeyant, forfeited and extinct peerages of the British Empire. London, UK: Burke's Peerage Ltd.
Byrne, J. F. 1973 Irish kings and high-kings, pp. 275–301. New York, NY: St Martin's Press.
Collingwood, R. G. & Wright, R. P. 1965 The Roman inscriptions of Britain, volume I: inscriptions on stone. Oxford, UK: Oxford University Press.
Diack, F. C. 1944 In The inscriptions of Pictland (eds W. M. Alexander & J. Macdonald), pp. 7–42. Aberdeen, UK: Third Spalding Club.
Dunn, M., Terrill, A., Reesink, G., Foley, R. A. & Levinson, S. C. 2005 Structural phylogenetics and reconstruction of ancient language history. Science 309, 2072–2075. (doi:10.1126/science.1114615)
Forsyth, K. F. 1997 In The worm, the germ, and the thorn (ed. D. Henry), pp. 85–98. Balgavies, Angus: The Pinkfoot Press.
Foster, P. & Toth, A. 2003 Toward a phylogenetic chronology of Ancient Gaulish, Celtic and Indo-European. Proc. Natl Acad. Sci. USA 100, 9079–9084. (doi:10.1073/pnas.1331158100)
Li, X., Harbottle, G., Zhang, J. & Eang, C. 2002 The earliest writing? Sign use in the seventh millennium BC at Jiahu, Henan Province, China. Antiquity 77, 31–45.
Lung, M. 2009 Chinese text initiative, University of Virginia library. See http://etext.virginia.edu/chinese/.
Macalister, R. A. S. 1945 Corpus inscriptionum insularum celticarum, vol. I. Dublin, Ireland: Dublin Stationery Office. (Reprinted by Four Courts Press 1996.)
Macalister, R. A. S. 1949 Corpus inscriptionum insularum celticarum, vol. II. Dublin, Ireland: Dublin Stationery Office.
Mack, A. 1997 Field guide to the Pictish symbol stones. Balgavies, Angus: The Pinkfoot Press (updated 2006).
Mardia, K. V., Kent, J. T. & Bibby, J. M. 1979 Multivariate analysis, 1st edn, ch. 2, pp. 38–40. London, UK: Academic Press.
McCowan, B., Hanser, S. F. & Doyle, L. R. 1999 Quantitative tools for comparing animal communication systems: information theory applied to bottlenose dolphin whistle repertoires. Anim. Behav. 57, 409–419.
McManus, D. 1991 A guide to Ogam. Maynooth Monographs 4. Maynooth, Ireland: An Sagart.
Montague-Smith, P. W. 1992 The royal line of succession. Andover, UK: Pitkin.
Nash-Williams, V. E. 1950 The early Christian monuments of Wales. Cardiff, UK: University of Wales Press.
Okasha, E. 1971 Hand-list of Anglo-Saxon non-runic inscriptions. Cambridge, UK: Cambridge University Press.
Page, R. I. 1995 Runes and runic inscriptions, pp. 207–244. Woodbridge, UK: Boydell.
Pagel, M., Atkinson, Q. D. & Meade, A. 2007 Frequency of word-use predicts rates of lexical evolution throughout Indo-European history. Nature 449, 717–720. (doi:10.1038/nature06176)
Palmer, L. R. 1998 The interpretation of Mycenaean Greek texts. Oxford, UK: Oxford University Press.
Picard, R. & Cook, D. 1984 Cross-validation of regression models. J. Am. Stat. Assoc. 79, 575–583. (doi:10.2307/2288403)
Powell, B. B. 2009 Writing: theory and history of the technology of civilization, pp. 1–59. Chichester, UK: Wiley-Blackwell.
Rao, R. P. N., Yadav, N., Vahia, M. N., Joglekar, H., Adhikari, R. & Mahadevan, I. 2009 Entropic evidence for linguistic structure in the Indus script. Science 324, 1165. (doi:10.1126/science.1170391)
Rosenfeld, R. 2000 Incorporating linguistic structure into statistical language models. Phil. Trans. R. Soc. Lond. A 358, 1311–1324. (doi:10.1098/rsta.2000.0588)
Samson, R. 1992 The reinterpretation of the Pictish symbols. J. Brit. Arch. Assoc. 145, 29–65.
Sanders, I. J. 1960 English baronies. Oxford, UK: Oxford University Press.
Shannon, C. E. 1993a A mathematical theory of communication. In Claude Shannon collected papers (eds N. J. A. Sloane & A. D. Wyner), pp. 5–83. Piscataway, NJ: IEEE Press.
Shannon, C. E. 1993b Prediction and entropy of printed English. In Claude Shannon collected papers (eds N. J. A. Sloane & A. D. Wyner), pp. 194–208. Piscataway, NJ: IEEE Press.
Stone, M. 1974 Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. B 36, 111–147.
Thomas, C. 1991–1992 The early Christian inscriptions of southern Scotland. Glasgow Archaeol. J. 17, 1–10.
UDHR 2008 The Universal Declaration of Human Rights. Library of translations, Office of the High Commissioner for Human Rights. See http://www.unhchr.org/.
Wainwright, F. T., Feachem, R. W., Jackson, K. H., Piggott, S. & Stevenson, R. B. K. 1955 Problem of the Picts. Perth, UK: Melven Press. (Reprinted by Melven Press 1980.)
Warnow, T. 1997 Mathematical approaches to comparative linguistics. Proc. Natl Acad. Sci. USA 94, 6585–6590. (doi:10.1073/pnas.94.13.6585)
Yaglom, A. M. & Yaglom, I. M. 1983 Probability and information, pp. 44–100 [transl. V. K. Jain]. Dordrecht, The Netherlands: D. Reidel Publishing Co.
Zauzich, K.-T. 2004 Discovering Egyptian hieroglyphs [transl. A. M. Roth]. London, UK: Thames and Hudson.