Specialized Article

An investigation of speech rhythm in London English
Eivind Nessa Torgersen a,*, Anita Szakay b,1

a Sør-Trøndelag University College, HiST/ALT, NO-7004 Trondheim, Norway
b University of British Columbia, UBC Department of Linguistics, 2613 West Mall, Vancouver, British Columbia, V6T 1Z4, Canada
ARTICLE INFO
Article history:
Received 30 March 2011
Received in revised form 8 October 2011
Accepted 10 January 2012
Available online 16 February 2012
Keywords:
Speech rhythm
Normalized Pairwise Variability Index
Multicultural London English
Syllable timing
Dialect contact
Ethnicity
ABSTRACT
Recent work on London English has found innovation in inner city areas, most likely as the outcome of dialect contact. These innovations are shared by speakers of different ethnic backgrounds, and have been identified as features of Multicultural London English (MLE). This study examines whether syllable timing is a feature of MLE, as work on rhythm shows that dialect and language contact may lead to varieties of English becoming more syllable-timed. Narratives as told by teenagers of different ethnic backgrounds and elderly speakers were segmented by forced phonemic alignment and measurements of vocalic normalized Pairwise Variability Index (nPVI), as an indicator of rhythmic patterns, were calculated. The results revealed that young speakers of non-Anglo background were significantly more syllable-timed than young Anglo speakers and the inner-London speakers were more syllable-timed than the outer-London speakers. Additionally, there was a correlation between articulation rate and nPVI for the non-Anglo speakers: speakers with a high vocalic articulation rate were more syllable-timed. Changes in the duration of particular diphthongs and schwa may have influenced the overall speech rhythm. The relatively low nPVI for all speaker groups may also indicate London’s status as a center of linguistic innovation due to long-standing migration.
2012 Elsevier B.V. All rights reserved.
1. Introduction
London has for a long time been regarded as the center of linguistic innovation in England (Wells, 1982); it is the largest city in Europe with
high levels of immigration and high levels of dialect and language contact. Until recently there had been no large-scale sociolinguistic investigations
there taking into account the complex demographic and linguistic situation. However, over the last few years, two large research projects have
collected sociolinguistic data in London.2 Around 250 speakers have been interviewed and the orthographically transcribed corpora comprise three
million words. A number of different phonological, morpho-syntactic and discourse features have been investigated including diphthongs (Kerswill
et al., 2008), monophthongs and consonants (Cheshire et al., 2008, 2011), the indefinite article (Cheshire et al., 2011; Gabrielatos et al., 2010), past
tense BE (Cheshire and Fox, 2009; Cheshire et al., 2011), pragmatic markers (Torgersen et al., 2011) and quotatives (Cheshire et al., 2011;
Fox, 2012). All these studies point to one central finding: there is linguistic
* Corresponding author. Tel.: +47 73 55 97 90; fax: +47 73 55 90 51.

E-mail addresses: eivind.n.torgersen@hist.no (E.N. Torgersen), szakay@interchange.ubc.ca (A. Szakay).
1 Tel.: +1 604 822 0415; fax: +1 604 822 9687.
E.N. Torgersen, A. Szakay / Lingua 122 (2012) 822–840 823
2 Linguistic Innovators: The English of Adolescents in London (ESRC 000-23-0680) 2004–2007. PI Paul Kerswill, CI Jenny Cheshire. RAs Sue Fox and Eivind
Torgersen; Multicultural London English: the emergence, acquisition and diffusion of a new variety (ESRC 062-23-0814) 2007–2010. PI Paul Kerswill, CI Jenny Cheshire. RAs
Sue Fox, Arfaan Khan and Eivind Torgersen.
0024-3841/$ – see front matter 2012 Elsevier B.V. All rights reserved.
doi:10.1016/j.lingua.2012.01.004
Fig. 1. London map with the locations Hackney and Havering highlighted.
innovation in inner city areas of London, most likely as the outcome of dialect and language contact (Cheshire et al., 2008, 2011; Kerswill et al.,
2008). Accelerated language change and linguistic innovation can be observed in high-contact areas due to people’s diverse friendship networks
(Cheshire et al., 2008; Kerswill and Williams, 2000). The set of different innovations forms a new variety, or set of varieties, which has been termed
Multicultural London English (MLE) (Cheshire et al., 2008, 2011; Kerswill et al., 2008). The innovations are shared by speakers of different ethnic
backgrounds, but the speakers who are leading the changes are of recent immigrant backgrounds; in a majority of cases they are the children of
first or second generation immigrants to the city. This article focuses on the data from the first project, Linguistic Innovators: The English of
Adolescents in London. The data for this study was collected in two localities: Hackney in the traditional East End and Havering to the east,
formerly in Essex. The localities are shown in Fig. 1.
Hackney and Havering were chosen as fieldwork sites because of differences in demography. Hackney has a large degree of dialect and language
contact due to higher levels of migration and a population consisting of people with different ethnic backgrounds: around 44% of the population is
white British, while the remainder has no particular ethnic group in the majority (2001 Census data).1 The traditional East End of London, of which
Hackney is part, has also been regarded as being in the lead in linguistic innovation in England (Wells, 1982). Havering is much more homogeneous
in terms of ethnicity: 92% of the population is white British (2001 Census data).
Torgersen and Kerswill (2004) hypothesized that the outcome of phonological processes that were observed in south-east England, outside of
London itself, was the result of geographical diffusion (Trudgill, 1974, 1986) from central London. Applying the same idea to areas closer to the
center of London, such as Havering, we would expect to find London features there as well. Havering, together with other outer-London boroughs,
also had an increase in population after WW2 because of the resettling of a portion of the inner-city population due to slum clearances (Palmer,
2000). This meant that London features were expected in both Hackney and Havering, but perhaps with further developments and linguistic
innovation in Hackney due to its more complex demography. The young speakers included in the study were 100 college students aged 16–19 who
all had working-class backgrounds. Elderly working-class speakers aged 70 or above were also interviewed as a baseline for apparent-time changes
(Bailey, 2002). The speakers were born in London and had been living there all their lives. The young speakers were divided into two groups:
‘Anglos’ whose families had been living in the area for more than three generations and ‘non-Anglos’ who were born in London, but were the
children of parents with recent immigrant background. Speakers were also asked to name their closest friends and to state whether they were male
or female, whether or not they knew them from the college they attended, and their ethnic background, as the social network has an influence on
1 Data from http://www.statistics.gov.uk/census2001/access_results.asp. Accessed 25 March 2011.
someone’s speech (Milroy and Milroy, 1985). Each speaker was then assigned a score of 1–5 depending on the ethnic makeup of the friendship
network:
1 = all friends same ethnicity as self

2 = up to 20% of a different ethnicity than self
3 = 21–40% of a different ethnicity than self

The network score analysis revealed that for Anglo adolescents in Hackney a score of 3 was the minimum, whereas for those in Havering, 3
was the maximum. When we examine the findings from the London studies, we do in fact see an effect of friendship network on some variables,
such as monophthongs (Cheshire et al., 2008). Anglo speakers in an ethnically diverse friendship group had realizations of GOOSE2 that were
intermediate between those of Anglo speakers in less diverse networks who had front/central realizations, and those of the non-Anglo speakers
who had heavily fronted realizations. These findings indicate that London English in Hackney is more influenced by dialect and language contact
than the London English spoken in Havering.
2. Suprasegmentals, language change and multicultural varieties
Compared to the number of studies on segmental features, investigations into the role of suprasegmental features in language variation and
change are few. However, the studies that exist have found extra-linguistic effects on the realization of suprasegmentals.
Examples include Bauer’s (1994) investigation of change in Standard English stress placement and Hilton’s (2010) study of stress assignment in
loan words in Norwegian. Both studies link a particular stress assignment to social status and education. Analyses of intonation show that particular
prosodic contours may be linked to gender and style (e.g., Lowry, 2011), gender identity (e.g., Podesva, 2007) and regional varieties (e.g., Grabe
et al., 2000). Particular prosodic contours may also have specific social meanings (e.g., Podesva, 2011). A study of suprasegmentals and ethnicity,
the Intonational Variation in English (IViE) project (Grabe, 2004), investigated intonational contours produced by speakers of different varieties
of British English, including speakers of Afro-Caribbean background living in London and speakers of Punjabi background in Bradford, and
differences in realization between the varieties were observed. However, narrowing down the intonational features that may characterize specific
ethnicities is difficult because of the complex interplay between different acoustic variables (Thomas and Reaser, 2004:80). In fact, we do not know
much about the relationship between suprasegmentals and ethnicity – in particular whether specific realizations function as markers of speaker
ethnicity (Yaeger-Dror and Fagyal, 2010:127).
The studies on phonological variation and innovation in London English have only dealt with segmental features. Studies on other multiethnic
varieties in Europe, which have also discovered linguistic innovation probably with origins in language contact situations, such as Kiezdeutsch in
Berlin (Wiese, 2009) and Rinkebysvenska in Stockholm (Kotsinas, 1988), have mainly focused on grammatical features. However, studies of
multicultural varieties in Denmark (Quist, 2008) and Sweden (Bodén, 2010) state that these varieties have a characteristic prosody and rhythm, in
that speakers of the multicultural varieties have a more monotonous and ‘staccato’ rhythm, i.e., the syllables are more equidistant in time than in other
local varieties, although this has not yet been quantified. This poses the question of whether effects of dialect and language contact on speech
rhythm can be observed elsewhere and quantified in multicultural varieties.
If the changes in suprasegmentals in MLE mirror the developments in the segmental features, and if MLE behaves like other multicultural
varieties in terms of prosody, we should expect to find effects of dialect and language contact on speech rhythm also among the speakers of the
multicultural variety in Hackney; possibly a similar type of ‘staccato’ rhythm as the one that has been observed in Denmark and Sweden. In
addition, we should not be able to find this form of rhythm in the variety of London English spoken in Havering or at least if we do, it should be to
a much smaller degree.
3. Speech rhythm and the measurement of speech rhythm
Traditionally, languages were labeled as either syllable- or stress-timed in terms of speech rhythm (Abercrombie, 1967); that is, having an equal
time interval between individual syllables or between individually stressed syllables, respectively. It was later shown that speech rhythm in
languages could not be assigned to such discrete categories: Roach (1982) showed that it was possible to quantify languages in terms of speech
rhythm. In reality, there turned out to be a continuum from more syllable-timed to more stress-timed languages (Dauer, 1983; Miller, 1984).
Different metrics have been proposed to quantify speech rhythm. Some have focused on the phonological structure of syllables in different
languages while others have focused on differences in vowel phonology between those linguistic systems which have long and short, stressed and
unstressed vowels and those linguistic systems which do not have such differences. The rationale behind the metrics is that particular local structures
in speech can be quantified, e.g., by measuring their durations, and when these local structures are compared it gives an overall picture of the
rhythmic patterns. However, none of the metrics, even when applied in combination with each other, is able to successfully identify a language
solely on its rhythmic structure, although the metrics can provide groupings of different languages (Loukina et al., 2011). Ramus et al. (1999)
proposed a system taking into account a language’s phonotactics, viz. vowel quantity, syllable codas and consonant clusters as part of the whole
utterance and the intervals between vowels and consonants. Low et al. (2000) proposed a system looking at the relationships between pairs of units
2 We are using Wells’ (1982) lexical sets.
such as vowels in adjacent syllables and consonants in adjacent syllables. This is known as the Pairwise Variability Index (PVI). The idea is that a
syllable-timed language will have a more equal relationship in, say, duration between pairs of adjacent units than a stress-timed language. Although
there are several other acoustic influences on speech rhythm such as amplitude, ‘the vast majority of research using the PVI has calculated it for
duration alone’ and ‘[d]uration ... is a natural dimension to consider when seeking rhythm, since rhythm is intimately bound to time’ (Nolan and
Asu, 2009:68). The PVI has been used for calculating relationships between vowels and consonants. The formula was later modified to allow for
differences in articulation rate, as the duration of units is shorter when the articulation rate increases. Without this modification the pairwise
differences would ‘become larger in a way which will falsely imply increasing irregularity’ (Nolan and Asu, 2009:65). This version of the formula
is known as the normalized PVI (nPVI). Grabe and Low (2002), using the nPVI, examined a reading passage in several languages, including
Spanish, Mandarin, French, British English, Dutch and Thai. They found a continuum from syllable-timed to stress-timed languages: Spanish and
Mandarin had the lowest scores (under 30), while Dutch and Thai had the highest (around 65). French was located towards the syllable-timed end
(43.5), while English was towards the stress-timed end (57.2). Even though the nPVI has been widely used to investigate rhythmic patterns in a
number of languages, it can be criticized for merely classifying rather than giving a deeper insight into the rhythmic characteristics of different
languages, in particular how listeners perceive speech rhythm (Arvaniti, 2009; Kohler, 2009).
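The difference between the raw and the normalized index can be illustrated with a minimal sketch. It assumes the formulation usually given in the literature (e.g., Grabe and Low, 2002): the raw PVI is the mean absolute difference between successive interval durations, and the nPVI divides each pairwise difference by the mean of the pair and scales the average by 100. The function names and the duration values are invented for illustration and are not taken from the study.

```python
from typing import Sequence


def raw_pvi(durations: Sequence[float]) -> float:
    """Mean absolute difference between successive interval durations."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    pairs = zip(durations, durations[1:])
    return sum(abs(a - b) for a, b in pairs) / (len(durations) - 1)


def npvi(durations: Sequence[float]) -> float:
    """Normalized PVI: each difference is divided by the mean of the pair,
    and the average over all pairs is multiplied by 100."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    total = 0.0
    for a, b in zip(durations, durations[1:]):
        total += abs(a - b) / ((a + b) / 2)
    return 100 * total / (len(durations) - 1)


# Hypothetical vocalic durations in milliseconds (illustration only).
even = [100.0, 100.0, 100.0, 100.0]         # equal durations
alternating = [150.0, 50.0, 150.0, 50.0]    # long-short alternation
fast = [d / 2 for d in alternating]         # same pattern at twice the rate

print(raw_pvi(alternating), raw_pvi(fast))  # raw index depends on rate
print(npvi(alternating), npvi(fast))        # normalized index does not
print(npvi(even))                           # no variability between neighbours
```

In this toy input the slower rendition has a raw PVI twice that of the faster one although the rhythmic pattern is identical, whereas the nPVI is the same for both. A lower nPVI corresponds to more equal adjacent durations, i.e., a more syllable-timed pattern.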
3.1. Studies on speech rhythm and dialect and language contact
There are few sociolinguistic studies of speech rhythm, and even fewer have investigated whether language and dialect contact has an effect on
rhythmic patterns. Some studies have examined varieties of English in contact with other varieties of English and with other languages.
Thomas and Carter (2006) investigated whether there was a difference between white American English and African American Vernacular
English (AAVE) speakers in terms of speech rhythm. They calculated nPVI in recordings of ex-slave AAVE speakers who were born before the
American civil war, speakers born between 1869 and 1961 and speakers born between 1961 and 1985, and compared the values to those of white
American English speakers born in the same periods. They also analyzed speakers of Jamaican and Hispanic English. There turned out to be no
difference between AAVE speakers and white American English speakers in the two youngest groups of speakers, with all groups of speakers
located towards the stress-timed end. However, the findings suggest greater differences between the oldest groups of white and AAVE speakers
born before the American civil war. This means that, over time, AAVE has become more like white American English in having a larger degree of
stress-timing: recordings with the ex-slaves, born before the American civil war, showed a greater degree of syllable-timing. The oldest AAVE
speakers had rhythmic patterns like Jamaican English, a variety which was found to be significantly more syllable-timed. In addition, contact
varieties such as Hispanic English appear to be more syllable-timed, consistent with the findings of other studies. The study clearly shows the effect
of dialect contact: where there is long-term contact, rhythmic patterns become more similar. Coggshall (2008) examined Cherokee American
English, Lumbee American English and white American English. Native American Englishes and white American English were different in terms
of rhythmic structures with Native American English being more syllable-timed, probably because the Lumbee and Cherokee languages are
syllable-timed. There was also an effect of age: young speakers of Lumbee American English were more syllable-timed than the older speakers.
Coggshall argues in addition for an effect of identity on rhythmic patterns: over the last 40 years there has been an increased interest in Native
American culture and political awareness. Increased contact with other Native American tribes may have led speakers with a less syllable-timed
rhythm to become more syllable-timed. Szakay (2008) found that Maori New Zealand English was more syllable-timed than Pakeha New Zealand
English (i.e., white New Zealand English). This is probably due to Maori being mora-timed, which is more similar to syllable- than stress-timed
rhythm. In mora-timing two successive syllables each containing a short vowel are rhythmically equivalent to a single syllable containing a long
vowel (Warren and Bauer, 2004). In Szakay (2008) young speakers of both ethnic varieties had a lower vocalic nPVI than older speakers, which
demonstrates an increased degree of syllable-timed rhythm over time, indicating possible long-term effects of language contact. There was also an
effect of ethnic integration: Maori New Zealand English speakers were more syllable-timed the more strongly they were integrated in Maori culture
and society. Newman (2010) calculated nPVI for Chinese and white New York English speakers and a Latino New York English speaker based on
a 60-second reading passage. He found that the Chinese speakers and the Latino speaker were more syllable-timed than the others, which is probably
due to the first language background of the speakers. Fagyal (2010) examined French and foreign accented French in a suburb of Paris; however,
all speakers were said to speak the same variety, had grown up in the same area and had the same socio-demographic background. It was
hypothesized that the speakers with African background, who in addition to French spoke languages that were typologically stress-timed, would
be less syllable-timed than the ethnic French (i.e., non-immigrant) speakers. This did not turn out to be the case: both groups were syllable-timed.
However, the ethnic French speakers had greater variability in vowel duration. Hansen and Pharao (2010) measured the duration of short and long
vowels in lists of words in Copenhagen Danish read by multicultural and ethnic Danish (i.e., non-immigrant) speakers. While the ethnic Danish
speakers had a significant difference in the
duration of short and long vowels preceding the stressed vowel in the word, the speakers of multicultural Copenhagen Danish lacked this difference.
The authors suggest that this ‘may contribute to the impression of an alternate rhythm in [Copenhagen] Multiethnolect compared to Copenhagen
[Danish]’ (2010:94).
We have seen that the largest effects on speech rhythm can be observed in contact varieties. That does not mean to say that there is no variation
in speech rhythm between, say, different accents of L1 British English, but such effects are small and are likely to be caused by speaking style and
the data that is selected for analysis (White and Mattys, 2007a). White and Mattys (2007b) found that the rhythmic patterns were a strong predictor
of how native-like a speaker’s L2 English sounded. Yuan (2008) had similar findings for French, German, Italian, Russian, and Spanish
L2 English speakers compared with L1 English speakers. The reason is most probably a reduction in the duration difference between stressed
full vowels and unstressed reduced vowels; the effect was greater for languages that do not distinguish between full and reduced vowels, e.g.,
Spanish more than German. There was also a relationship between L2 learning and acquisition of rhythmic structures in the L2. Diez et al. (2008)
similarly found that increased L2 proficiency led to more native-speaker-like vocalic durations, which were reflected in rhythmic patterns.
3.2. Vowel duration, articulation rate and speech rhythm in British English
Existing studies on rhythm in British English are small scale: they have analyzed a small number of speakers and have included a small amount
of data for each speaker. In addition, it has often been reading passages and not free speech that has been the basis of the analysis. Low et al. (2000)
examined two sets of ten read sentences: one with full and potentially reduced vowels, while the other set had no potentially reduced vowels. Ten
speakers of British English and ten speakers of Singapore English were included in the study. They found that the reduced vowels in Singapore
English had longer duration than the reduced vowels in British English. In addition, the reduced vowels in Singapore English were less centralized
in the F1/F2 plane (Low et al., 2000:389), i.e., a more peripheral or tense quality. Deterding (2001) examined free speech of six speakers of Standard
Southern British (SSB) (British expats living in Singapore) and compared it to
six speakers of Singapore English. He analyzed around 50 utterances of six or more syllables per speaker which were taken from a conversation that
included a variety of topics. The Singapore English speakers were more syllable-timed than the SSB speakers. He argues that this is because
Singapore English has a smaller contrast between long and short vowels and there is less reduction of vowels in unstressed syllables. Vowel
durations are also more variable in British English, which is likely to be the main difference between the two varieties. Singapore English is
influenced by long-term contact with syllable-timed Chinese languages (Deterding, 2001). In line with this, White and Mattys (2007b) found that
varieties of British English which have a substrate influence (Welsh, Orkney and Shetland English) were more syllable-timed
than SSB. Bristol English was located between SSB and the substrate varieties in terms of rhythmic structure, but the differences were very small and
may have been caused by the materials. To conclude, the main difference between British English and Singapore English (and other varieties such
as Jamaican English and Maori New Zealand English) appears to be caused by the latter varieties reducing unstressed vowels less (Nolan and Asu,
2009:66), which again leads to a more syllable-timed rhythm. Loukina and Kochanski (2010) examined whether the varieties included in IViE
(Grabe et al., 2000) could be identified based on the rhythmic patterns alone. They found above-chance identifications, but also large individual
variation between the speakers. Punjabi and Afro-Caribbean British English were usually distinguished
from the other varieties, but there were no distinct rhythm classes within the varieties included in IViE. When examining the duration of vocalic
and consonantal segments there were differences between groups of speakers with different ethnic backgrounds: the speakers of Punjabi and Afro-
Caribbean backgrounds have different patterns of variation from the speakers of other varieties of British English.
Differences in syllable duration in stress- and syllable-timed languages are largely due to vowel phonology (Hoequist, 1983). We have seen
that this locates English among the languages towards the stress-timed end of the rhythm continuum. Umeda (1975) has reported on American
English vowel duration in continuous speech, and found variation in terms of duration between the phonological context (voiceless and voiced
positions), between lexical and function words and between vowels in stressed and unstressed position. Gender differences have also been found.
Hillenbrand et al. (1995) examined vowel duration in American English, and found that female speakers have longer vowel duration than male
speakers. This was seen for all vowels. Differences in duration are also found between vowels in read and spontaneous speech and between regional
accents. Of course there is also individual variation, both between and within speakers. Any changes in the duration of vowels and other
phonological processes such as laxing of short monophthongs, tensing of schwa and monophthongization of diphthongs may influence rhythmic
patterns.
Clopper et al. (2005) investigated the duration of vowels in different accents of American English. They found that differences between accents
were significant, mainly between Southern and Northern/Midlands areas, but found no gender differences. This was confirmed by Jacewicz et al.
(2007) who observed that the female speakers had longer vowels overall than males, but the difference was not significant. Both studies found that
vowels were longer in Southern American English. In particular, PRICE is longer in North Carolina (South) than in the two other (Northern) states
investigated. However, these studies did not take articulation rate into consideration. Jacewicz et al. (2009, 2010) examined articulation rate in
American English accents, and showed that speakers of Southern American English speak more slowly than Northern American English speakers
and that this is the reason for differences in vowel duration between the varieties. Male speakers have slightly faster articulation rates than women
for read speech, but no differences were found in conversational speech. This shows that differences in articulation rate between varieties of English
are small.
Are vowel duration differences between groups of speakers larger in contact varieties? Wassink (1999, 2001, 2006) examined vowel duration
in Jamaican English. Female speakers of both Jamaican Creole and Jamaican English had a greater difference in duration between pairs of long
and short vowels than male speakers of these varieties. This may be the reason for differences in rhythmic patterns observed in other studies:
Jamaican English (though the variety was not specified) was found to be more syllable-timed than American English (Thomas and Carter, 2006)
and British English (Grabe and Low, 2002).
We have seen that the reason for the more syllable-timed rhythm may be a reduction in the durational distinction between long and short vowels
(and between stressed and unstressed vowels). Varieties that are undergoing changes in the vowel system could also display changes in the rhythmic
patterns. Diphthongs in inner London are currently undergoing change: traditional variants with open onsets are being replaced by variants with
raised onsets, and some speakers have near-monophthongal qualities (Kerswill et al., 2008). We do not know whether these new variants are also
shorter in duration than the traditionally shifted diphthongs (Wells, 1982:306); if so, then a change in the duration of diphthongs in London English
may have had an effect on the rhythmic structures. If the durational differences between vowels are being leveled, we would also expect schwa to
have increased in duration, with inner London in the lead: the duration of phonological schwa may be longer in Hackney than in Havering and
the non-Anglos may have a longer phonological schwa than the Anglos. This is similar to the process in Multicultural Copenhagen Danish where
short vowels are getting longer and long vowels shorter (Hansen and Pharao, 2010). Singapore English has less reduction of vowels in unstressed
syllables (Deterding, 2001; Low et al., 2000), which is part of the reason for a more syllable-timed rhythm than SSB.
A number of studies have also investigated the relationship between speech rhythm and articulation rate. Deterding (2001) has investigated
how articulation rate influenced speech rhythm in Singapore English and Dellwo (2010) has studied the relationship between speech rhythm and
articulation rate in German. Both found that fast speech was more syllable-timed than slow speech, due to a reduction in the durational difference
between syllables. Languages that have a faster average speaking rate, measured in number of syllables per second, are also among those that are
considered syllable-timed (Arvaniti, 2009:59). Szakay (2008) also found an interaction between speech rhythm and articulation rate in New Zealand
English: a faster articulation rate leads to a more syllable-timed rhythm. However, this was only found for Pakeha speakers (and not Maori
speakers).
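The kind of relationship between articulation rate and rhythm described above can be sketched as a per-speaker correlation. This is only an illustration under stated assumptions: articulation rate is taken here to be the number of vocalic intervals per second of pause-free speech, which may differ from the definitions used in the cited studies, and all speaker values are invented. The helper names `articulation_rate` and `pearson_r` are hypothetical.

```python
from math import sqrt
from typing import Sequence


def articulation_rate(n_vowels: int, speech_seconds: float) -> float:
    """Vocalic intervals per second of pause-free speech (assumed definition)."""
    if speech_seconds <= 0:
        raise ValueError("speech time must be positive")
    return n_vowels / speech_seconds


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical speakers: (number of vowels, seconds of speech without pauses).
counts = [(240, 60.0), (300, 60.0), (330, 60.0), (390, 60.0)]
rates = [articulation_rate(n, t) for n, t in counts]

# Hypothetical nPVI scores for the same four speakers.
npvi_scores = [58.0, 54.0, 51.0, 47.0]

r = pearson_r(rates, npvi_scores)
print(f"r = {r:.2f}")  # negative: faster speakers have lower nPVI here
```

A negative coefficient in such an analysis would mean that speakers with a faster articulation rate have lower nPVI values, i.e., are more syllable-timed, which is the direction of the effect reported in the studies cited above.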
Based on the findings discussed above, we hypothesized that MLE speakers would show suprasegmental innovations, having more syllable-
timed rhythm than has previously been reported for British English (Deterding, 2001; Grabe and Low, 2002; Low et al., 2000; White and Mattys,
2007b) which may correlate with a faster articulation rate (Dellwo, 2010; Deterding, 2001; Szakay, 2008). We will argue that the process is caused
by changes in the duration of particular vowels: changes in the duration of particular segments may be the reason for overall changes in rhythmic
patterns. These durational changes (an internal effect) are in turn caused by dialect and language contact (an external effect). A leveling of the
contrast between long and short vowels and stressed and unstressed vowels may in fact be a feature of contact varieties (or multicultural varieties).
This means that these varieties are more syllable-timed regardless of their typological background, as we have seen in the foreign accented French
in the suburb of Paris (Fagyal, 2010).
4. Methodology
4.1. Speakers
The study included young and old speakers from both Hackney (inner London) and Havering (outer London) who were all born in London and
had working-class backgrounds. The young Hackney speakers were 36 teenagers (aged 16–19, born 1985–1988) of Anglo and non-Anglo
backgrounds. As a baseline we also included 7 elderly speakers born between 1918 and 1938 and 4 speakers born between 1874 and 1892. The
latter speakers were taken from recordings made by Eva Sivertsen in Tower Hamlets (Bow) in the mid-1950s and formed the basis of her study on
Cockney phonology (Sivertsen, 1960). The young Anglo group was divided into speakers in a mainly Anglo friendship network (score 1–3) and a
mixed friendship network (score 4–5). The non-Anglo speakers all had a mixed friendship network (score 4–5). The Havering speakers were 14
young Anglo speakers, all in a mainly Anglo friendship network (score 1–3), and 7 elderly speakers. All interviews with the young speakers were
carried out by one of the research assistants on the Linguistic Innovators project. In general, these interviews had very little fieldworker
involvement; most are group discussions between the participants. Some of the interviews with the older speakers (5 in Hackney and 3 in Havering)
were carried out by another fieldworker. These interviews were either one-to-one or in pairs. Both fieldworkers were native speakers of
south-eastern British English.
4.2. Materials and measurements
The selection of data has been shown to have the largest effect on rhythm scores. This effect is much stronger than that of the measurer
(for example, inconsistencies in the measurements) and, to a more limited degree, than that of individual variation within one variety
(Wiget et al., 2010). There are, in addition, effects of speaking style (Arvaniti, 2009). The suggested solution is to use a dataset of randomly
sampled sentences or natural spontaneous speech (Wiget et al., 2010:1566). It is also important to have comparable data while still using
continuous natural speech. Our solution was to examine personal narratives to control for variation in speech style. The narratives are about
fights, police incidents and family issues, and all include a great deal of personal involvement. The duration of the narratives varied from 45
to 180 s for each speaker, which is in line with previous research (Szakay, 2008). Narratives consist of relatively fluent speech, and the other
speakers do not disrupt its fluency: when they speak, they produce short responses that facilitate the narrative. In any case, the effects of
interlocutors on a particular speaker’s speech rhythm in conversation are quite small: Reed (2010), using Singapore English and British English
data, found that it was only at the beginning of turns that speakers were influenced by interlocutors’ speech rhythm.

Fig. 2. Segmentation of a sound file into lexical, vocalic and consonantal elements and pauses. Tier 1 shows the segmental elements and tier 2 the
lexical elements. IPA transcription: ðezbafIfti7neIʃәnbɔIzәnәbLs.

The speech was segmented into consonantal and vocalic elements by forced phonemic alignment (Yuan and Liberman, 2008), which produces
a statistical best match between the acoustic structure of a sound file and the phonemic representation of the words in a corresponding text file. The
aligner was used together with the British English Example Pronunciation phonemic dictionary (see footnote 3). The output was a Praat TextGrid with
separate phonemic and lexical tiers. Because of naturally occurring connected speech processes, all segmentation was checked manually, and the
duration of each vocalic segment, consonantal segment and pause was corrected as required by comparing the visual information in the spectrogram
with the auditory information. Fig. 2 shows a corrected segmentation into lexical, vocalic and consonantal segments together with the spectrogram
and waveform display. In total, 16,586 vocalic elements were included in the analysis, with an average of 237 vocalic elements per speaker
(range 72–661).
Pauses and hesitations, which in most cases were transcribed as a very long vowel, were excluded from the analysis. Diphthongs were treated
as one vowel. Following Thomas and Carter (2006), adjacent vowels were divided into separate elements where there was a noticeable change in
vowel quality. The beginning of a vowel following a stop consonant was set at the onset of voicing after the release of the stop burst. Glides
and liquids were treated as consonants, apart from vocalized /l/ following a vowel, which was analyzed with the vowel. Unlike Thomas and Carter
(2006), who did not include vowels in pre-pausal feet positions because of lengthening of these vowels (Klatt, 1976), we chose to include all vowels
in the analysis. This is in line with Grabe and Low (2002) and White and Mattys (2007a:507) who state that lengthening does not occur before all
pauses.
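The inclusion and exclusion rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the `Interval` class, its `kind` labels ("vowel", "consonant", "vocalized_l", "pause", "hesitation") and the function name are hypothetical, and the sketch assumes that each interval of the manually corrected phonemic tier has already been classified by hand or by a lookup table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    """One interval of the corrected phonemic tier (hypothetical structure)."""
    label: str    # phone label
    start: float  # start time in seconds
    end: float    # end time in seconds
    kind: str     # assumed labels: "vowel", "consonant", "vocalized_l", "pause", "hesitation"


def vocalic_durations(intervals: List[Interval]) -> List[float]:
    """Return the durations of the vocalic elements, in order of occurrence.

    - Pauses and hesitations are excluded.
    - Each vowel interval (a diphthong is a single interval) is one element,
      so adjacent vowels that were segmented separately stay separate.
    - A vocalized /l/ directly after a vowel is added to that vowel.
    - Consonants, including glides and liquids, are ignored.
    - Pre-pausal vowels are kept, as in the text.
    """
    durations: List[float] = []
    prev_kind: Optional[str] = None
    for iv in intervals:
        length = iv.end - iv.start
        if iv.kind == "vowel":
            durations.append(length)
        elif iv.kind == "vocalized_l" and prev_kind == "vowel":
            durations[-1] += length  # analyze vocalized /l/ together with the vowel
        prev_kind = iv.kind
    return durations
```

The sketch returns one flat list per recording. Whether vowel pairs that span an excluded pause should count as successive elements is not specified in the text, so that decision is left open here.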
Measurements of normalized vocalic PVI, as an indicator of rhythmic patterns (Grabe and Low, 2002), were calculated using a Praat script.
Articulation rate for an individual speaker (Vrate) was defined as the number of vocalic elements divided by the duration of all vocalic elements in
the particular recording. Wilcoxon tests and linear regression analysis were employed to test for statistical significance.
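As a minimal illustration of the two measures just defined, the following Python sketch computes the vocalic nPVI, using the standard formula from Grabe and Low (2002), and the vocalic articulation rate as described above. The study itself used a Praat script; the function names here are hypothetical and the code is a stand-in for illustration, not a reproduction of that script. Durations are assumed to be in seconds.

```python
from typing import Sequence


def npvi(durations: Sequence[float]) -> float:
    """Normalized Pairwise Variability Index over successive vocalic durations.

    For each pair of successive elements, the absolute difference in duration
    is divided by the mean duration of the pair; the mean of these ratios is
    multiplied by 100. Lower values indicate more even (syllable-timed) durations.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two vocalic elements")
    total = 0.0
    for d1, d2 in zip(durations, durations[1:]):
        total += abs(d1 - d2) / ((d1 + d2) / 2)
    return 100 * total / (len(durations) - 1)


def vocalic_articulation_rate(durations: Sequence[float]) -> float:
    """Vrate: number of vocalic elements divided by their summed duration."""
    return len(durations) / sum(durations)
```

Because each difference is normalized by the local pair mean, the index is intended to be less sensitive to overall tempo than a raw duration difference would be, which is why nPVI and Vrate can be examined as separate variables.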
6. Discussion
We have documented that a more syllable-timed rhythm and possibly an increased vocalic articulation rate are features of MLE. Together
with the documented phonological, grammatical and pragmatic innovations (Cheshire et al., 2008, 2011; Fox, in press; Kerswill et al., 2008;
Gabrielatos et al., 2010; Torgersen et al., 2011), this shows that major language change can be evidenced in the English of Londoners. The relatively
low vocalic nPVI for all speaker groups in Hackney may confirm inner London’s status as a center of linguistic innovation and long-standing
migration and dialect contact: having more syllable-timed rhythm is a feature of contact varieties such as Singapore English (nPVI 52.3) (Grabe
and Low, 2002) and Maori New Zealand English (nPVI 48.1) (Szakay, 2008). The inner-London speakers are more syllable-timed due to language
and dialect contact in friendship groups. For (some) non-Anglo speakers there may also be effects of home languages other than English and other
varieties of English such as Jamaican English. Part of the cause of the changes in rhythmic patterns may be a change in the duration of
particular segments. We have shown how FACE, and to some degree GOAT, have shorter durations in Hackney than in Havering. The non-Anglos
are in the lead in having near-monophthongal variants and also have a more syllable-timed rhythm than the other speakers. The most syllable-timed
rhythm was observed in Hackney and all speaker groups there were more syllable-timed than the speakers in Havering, which may indicate that
inner London English has been more syllable-timed for a long time. Hackney was also in the lead in reversal of diphthong shifting. Thus, there
may be a relationship between speech rhythm, the duration of particular vowels and the location of vowels in vowel space. Measurements of the
duration of schwa show effects of age, location and sex, but not ethnicity. Lengthening of schwa is likely to be a feature of the speech of young
males and the Hackney speakers are in the lead. Again, this points to inner London having more advanced forms than outer London.
The changes in the duration of some vowels have led to a leveling of durational differences. The outcome is a reduced difference in the duration
of vowels in successive syllables, which was seen in the reduced nPVI. This supports the view that speech rhythm as a phonological characteristic
in languages is gradient rather than categorical: particular vowels may change their durational characteristics which locates different languages and
language varieties on a scale where successive vowels are more or less equidistant in terms of duration. We have argued that the durational changes
we observed in diphthongs and schwa were caused by dialect and language contact. Why do we get these changes? Studies show that there are L2
transfer effects on speech rhythm (Yuan, 2010), which indicates that we may be observing a kind of prosodic interlanguage (Diez et al., 2008).
Indeed, Cheshire et al. (2011) argue that some linguistic processes that can be observed in inner London, where minority ethnic groups are in the
lead in language change, have features of second language acquisition. Speakers of immigrant backgrounds may acquire their English from speakers
who also have English as their second language. This has been labeled group second language acquisition (Winford, 2003) and the L2 speakers are
then the English language model for the other speakers of immigrant backgrounds. In fact, there are quite strong substrate effects on rhythmic
patterns (White and Mattys, 2007a, 2007b; Thomas and Carter, 2006; Szakay, 2008). White and Mattys (2007a) found that Spanish native speakers
had a vocalic nPVI of 36 in their L1 Spanish and 66 in their L2 English. English native speakers had a vocalic nPVI of 73 in their English and 51
in their Spanish, indicating that speakers find it hard to achieve native-like prosody in their L2. However, Diez et al. (2008) also found that language
proficiency had an effect on rhythmic patterns: increased L2 proficiency leads to more native-like rhythmic patterns.
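The leveling argument at the start of this paragraph can be made concrete with a small worked example. The durations below are hypothetical values in milliseconds chosen for illustration only, not measurements from the study; the formula is the standard vocalic nPVI of Grabe and Low (2002), where d_k is the duration of the k-th vocalic element and m is the number of elements.

```latex
\[
\mathrm{nPVI} = 100 \times \frac{1}{m-1} \sum_{k=1}^{m-1}
  \left| \frac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right|
\]
% Hypothetical durations in ms, not measurements from the study.
% Alternating long and short vowels:
\[
(200, 50, 200, 50):\quad
\mathrm{nPVI} = 100 \times \frac{1}{3}
  \left( \frac{150}{125} + \frac{150}{125} + \frac{150}{125} \right) = 120
\]
% Same sequence after shortening the long vowels and lengthening the short ones:
\[
(160, 80, 160, 80):\quad
\mathrm{nPVI} = 100 \times \frac{1}{3}
  \left( \frac{80}{120} + \frac{80}{120} + \frac{80}{120} \right) \approx 66.7
\]
```

In this toy case, shortening the long vowels and lengthening the short ones lowers the index without any change in the number of elements, which is the mechanism proposed here for shortened diphthongs and lengthened schwa.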
Multicultural varieties like MLE may then also display effects of language and dialect contact on suprasegmentals. We have already seen how
Multicultural Copenhagen Danish was different from Copenhagen Danish in the durational patterns of long and short vowels (Hansen and Pharao,
2010). We may have observed a related phenomenon in MLE, where some diphthongs and schwa have changed in duration. In turn, these changes
have led to changes in the rhythmic patterns. The results of the present study, combined with work on other varieties, reinforce the idea that the
tendency for English to become more syllable-timed is a global phenomenon fuelled by language and dialect contact. The work on London English
also shows how large corpora of conversational data can be used to increase our understanding of the effects of dialect and language contact on
language change.
3 Downloaded from http://www.speech.cs.cmu.edu/comp.speech/Section1/Lexical/beep.html. Accessed 13 March 2010.
Acknowledgements
The data was collected as part of the project Linguistic Innovators: The English of Adolescents in London funded by the ESRC (RES 000-23-
1680). We would like to thank Sue Fox, the guest editors and three anonymous reviewers for valuable comments on this article.
References
Abercrombie, D., 1967. Elements of General Phonetics. Edinburgh University Press, Edinburgh.
Arvaniti, A., 2009. Rhythm, timing and the timing of rhythm. Phonetica 66, 46–63.
Bailey, G., 2002. Real and apparent time. In: Chambers, J.K., Trudgill, P., Schilling-Estes, N. (Eds.), The Handbook of Language Variation and Change. Blackwell, Oxford, pp.
312–332.
Bauer, L., 1994. Watching English Change. Longman, Harlow.
Bodén, P., 2010. Pronunciation in Swedish multiethnolect. In: Quist, P., Svendsen, B.A. (Eds.), Multilingual Urban Scandinavia. Multilingual Matters, Bristol, pp. 65–78.
Cheshire, J., Fox, S., 2009. Was/were variation: a perspective from London. Language Variation and Change 21, 1–38.
Cheshire, J., Fox, S., Kerswill, P., Torgersen, E., 2008. Ethnicity, friendship network and social practices as the motor of dialect change: linguistic innovation in London.
Sociolinguistica 22, 1–23.
Cheshire, J., Kerswill, P., Fox, S., Torgersen, E., 2011. Contact, the feature pool and the speech community: the emergence of Multicultural London English. Journal of
Sociolinguistics 15, 151–196.
Clopper, C.G., Pisoni, D.B., de Jong, K., 2005. Acoustic characteristics of the vowel systems of six regional varieties of American English. Journal of the Acoustical Society of
America 118, 1661–1676.
Coggshall, E.L., 2008. The prosodic rhythm of two varieties of native American English. University of Pennsylvania Working Papers in Linguistics 14 (2), 1–9.
Dauer, R.M., 1983. Stress-timing and syllable-timing reanalyzed. Journal of Phonetics 11, 51–62.
Dellwo, V., 2010. Influences of speech rate on the acoustic correlates of speech rhythm: an experimental phonetic study based on acoustic and perceptual evidence. PhD thesis,
University of Bonn.
Deterding, D., 2001. The measurement of rhythm: a comparison of Singapore and British English. Journal of Phonetics 29, 217–230.
Diez, F.G., Dellwo, V., Gevalda, N., Rosen, S., 2008. The development of measurable speech rhythm during second language acquisition. Journal of the Acoustical Society of
America 123, 3886.
Fagyal, Zs., 2010. Rhythm types and the speech of working-class youth in a banlieue of Paris: the role of vowel elision and devoicing. In: Preston, D.R., Niedzielski, N. (Eds.), A
Reader in Sociophonetics. Mouton de Gruyter, Berlin, pp. 91–132.
Fox, S., 2012. Performed narrative: the pragmatic function of this is + speaker and other quotatives in London adolescent speech. In: van Alphen, I., Buchstaller, I. (Eds.), Quotatives:
Cross-linguistic and Cross-disciplinary Perspectives. Benjamins, Amsterdam, pp. 231–258.
Gabrielatos, C., Torgersen, E., Hoffmann, S., Fox, S., 2010. A corpus-based sociolinguistic study of indefinite article forms in London English. Journal of English Linguistics 38,
297–334.
Grabe, E., 2004. Intonational variation in urban dialects of English spoken in the British Isles. In: Gilles, P., Peters, J. (Eds.), Regional Variation in Intonation. Niemeyer, Tübingen,
pp. 9–31.
Grabe, E., Low, E.L., 2002. Durational variability in speech and the rhythm class hypothesis. In: Gussenhoven, C., Warner, N. (Eds.), Papers in Laboratory Phonology, vol. 7.
Mouton, Berlin, pp. 515–546.
Grabe, E., Post, B., Nolan, F., Farrar, K., 2000. Pitch accent realization in four varieties of British English. Journal of Phonetics 28, 161–185.
Hansen, G.F., Pharao, N., 2010. Prosody in the Copenhagen multiethnolect. In: Quist, P., Svendsen, B.A. (Eds.), Multilingual Urban Scandinavia. Multilingual Matters, Bristol, pp.
79–95.
Hillenbrand, J., Getty, L.A., Clark, M.J., Wheeler, K., 1995. Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America 97, 3099–3111.
Hilton, N.H., 2010. Regional dialect levelling and language standards: changes in the Hønefoss dialect. PhD thesis, University of York.
Hoequist, C., 1983. Syllable duration in stress-, syllable- and mora-timed languages. Phonetica 40, 203–237.
Jacewicz, E., Fox, R.A., Salmons, J., 2007. Vowel duration in three American English Dialects. American Speech 82, 367–385.
Jacewicz, E., Fox, R.A., O’Neill, C., Salmons, J., 2009. Articulation rate across dialect, age, and gender. Language Variation and Change 21, 233–256.
Jacewicz, E., Fox, R.A., Wei, L., 2010. Between-speaker and within-speaker variation in speech tempo of American English. Journal of the Acoustical Society of America 128,
839–850.


* Corresponding author. Tel.: +47 73 55 97 90; fax: +47 73 55 90 51.

1 = all friends same ethnicity as self

3 = 21–40% of a different ethnicity than self
