LANGUAGE, Linguistics and Languages of The World
LANGUAGE
I LANGUAGE
Language, the principal means used by human beings to communicate with one another.
Language is primarily spoken, although it can be transferred to other media, such as writing. If
the spoken means of communication is unavailable, as may be the case among the deaf, visual
means such as sign language can be used. A prominent characteristic of language is that the
relation between a linguistic sign and its meaning is arbitrary: There is no reason other than
convention among speakers of English that a dog should be called dog, and indeed other
languages have different names (for example, Spanish perro, Russian sobaka, Japanese inu).
Language can be used to discuss a wide range of topics, a characteristic that distinguishes it
from animal communication. The dances of honey bees, for example, can be used only to
communicate the location of food sources (see Honey Bee: Communication). While the
language-learning abilities of apes have surprised many—and there continues to be controversy
over the precise limits of these abilities—scientists and scholars generally agree that apes do
not progress beyond the linguistic abilities of a two-year-old child (see Communication:
Communication Among Animals).
II LINGUISTICS
Linguistics is the scientific study of language. Several of the subfields of linguistics that
will be discussed here are concerned with the major components of language: Phonetics is
concerned with the sounds of languages, phonology with the way sounds are used in individual
languages, morphology with the structure of words, syntax with the structure of phrases and
sentences, and semantics with the study of meaning. Another major subfield of linguistics,
pragmatics, studies the interaction between language and the contexts in which it is used.
Synchronic linguistics studies a language's form at a fixed time in history, past or present.
Diachronic, or historical, linguistics, on the other hand, investigates the way a language changes
over time. A number of linguistic fields study the relations between language and the subject
matter of related academic disciplines, such as sociolinguistics (sociology and language) and
psycholinguistics (psychology and language). In principle, applied linguistics is any application
of linguistic methods or results to solve problems related to language, but in practice it tends to
be restricted to second-language instruction.
LANGUAGES OF THE WORLD
Estimates of the number of languages spoken in the world today vary depending on where the
dividing line between language and dialect is drawn. For instance, linguists disagree over whether
Chinese should be considered a single language because of its speakers' shared cultural and literary
tradition, or whether it should be considered several different languages because of the mutual
unintelligibility of, for example, the Mandarin spoken in Beijing and the Cantonese spoken in Hong Kong
(see Chinese Language). If mutual intelligibility is the basic criterion, current estimates indicate that there
are about 6,000 languages spoken in the world today. However, many languages with a smaller number
of speakers are in danger of being replaced by languages with large numbers of speakers. In fact, some
scholars believe that perhaps 90 percent of the languages spoken in the 1990s will be extinct or doomed
to extinction by the end of the 21st century. The 10 most widely spoken languages, with approximate
numbers of native speakers, are as follows: Chinese, 1.2 billion; Arabic, 422 million; Hindi, 366 million;
English, 341 million; Spanish, 322 to 358 million; Bengali, 207 million; Portuguese, 176 million; Russian,
167 million; Japanese, 125 million; German, 100 million. If second-language speakers are included in
these figures, English is the second most widely spoken language, with 508 million speakers. See also
Indian Languages.
A Language Classification
Linguists classify languages using two main classification systems: typological and genetic. A
typological classification system organizes languages according to the similarities and differences in their
structures. Languages that share the same structure belong to the same type, while languages with
different structures belong to different types. For example, despite the great differences between the two
languages in other respects, Mandarin Chinese and English belong to the same type, grouped by word-
order typology. Both languages have a basic word order of subject-verb-object.
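A typological grouping of this kind can be sketched in a few lines of Python. This is only an illustration: the feature table is a hypothetical, hand-made one, containing the two languages named above plus Amharic, whose subject-object-verb order is mentioned later in this article. The function name `group_by_type` is likewise an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical feature table: language -> basic word order.
# S = subject, V = verb, O = object.
BASIC_WORD_ORDER = {
    "English": "SVO",
    "Mandarin Chinese": "SVO",
    "Amharic": "SOV",
}

def group_by_type(features):
    """Group languages that share the same value of a structural feature."""
    groups = defaultdict(list)
    for language, order in features.items():
        groups[order].append(language)
    # Sort each group so the result does not depend on insertion order.
    return {order: sorted(langs) for order, langs in groups.items()}

types = group_by_type(BASIC_WORD_ORDER)
print(types["SVO"])  # ['English', 'Mandarin Chinese']
```

Note that the grouping says nothing about historical descent: English and Mandarin Chinese fall into the same type here even though they belong to different language families.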
A genetic classification of languages divides them into families on the basis of their historical
development: A group of languages that descend historically from the same common ancestor form a
language family. For example, the Romance languages form a language family because they all
descended from the Latin language. Latin, in turn, belongs to a larger language family, Indo-European,
the ancestor language of which is called Proto-Indo-European. Some genetic groupings are universally
accepted. However, because documents attesting to the form of most ancestor languages, including
Proto-Indo-European, have not survived, much controversy surrounds the more wide-ranging genetic
groupings. A conservative survey of the world's language families follows.
B Indo-European Language Family
The Indo-European languages are the most widely spoken languages in Europe, and they also
extend into western and southern Asia. The family consists of a number of subfamilies or branches
(groups of languages that descended from a common ancestor, which in turn is a member of a larger
group of languages that descended from a common ancestor). Most of the people in northwestern
Europe speak Germanic languages, which include English, German, and Dutch as well as the
Scandinavian languages, such as Danish, Norwegian, and Swedish. The Celtic languages, such as
Welsh and Gaelic, once covered a large part of Europe but are now restricted to its western fringes. The
Romance languages, all descended from Latin, are the only survivors of a somewhat more extensive
family, Italic, which includes, in addition to Latin, a number of now extinct languages of Italy (see Italic
Languages). Languages of the Baltic and Slavic (Slavonic) branches are closely related. Only two of the
Baltic languages survive: Lithuanian and Latvian. The Slavic languages, which cover much of eastern
and central Europe, include Russian, Ukrainian, Polish, Czech, Serbo-Croatian, and Bulgarian. In the
Balkan Peninsula, two branches of Indo-European exist that each consist of a single language—namely
the Greek language and the Albanian language. Farther east, in Caucasia, the Armenian language
constitutes another single-language branch of Indo-European.
The other main surviving branch of the Indo-European family is Indo-Iranian (see Indo-Iranian
Languages). It has two subbranches, Iranian and Indo-Aryan (Indic). Iranian languages are spoken
mainly in southwestern Asia and include Persian, Pashto (spoken in Afghanistan), and Kurdish. Indo-
Aryan languages are spoken in the northern part of South Asia (Pakistan, northern India, Nepal, and
Bangladesh) and also in most of Sri Lanka (see Indian Languages). This branch includes Hindi-Urdu,
Bengali, Nepali, and Sinhalese (the language spoken by the majority of people in Sri Lanka). Historical
documents attest to other, now extinct, branches of Indo-European, such as the Anatolian languages,
which were once spoken in what is now Turkey and include the ancient Hittite language.
The Uralic languages constitute the other main language family of Europe. They are spoken
mostly in the northeastern part of the continent, spilling over into northwestern Asia; one language,
Hungarian, is spoken in central Europe. Most Uralic languages belong to the family's Finno-Ugric branch
(see Finno-Ugric Languages). This branch includes (in addition to Hungarian) Finnish, Estonian, and
Saami. Europe also has one language isolate (a language not known to be related to any other
language): Basque, which is spoken in the Pyrenees. At the boundary between southeastern Europe and
Asia lie the Caucasus Mountains. Since ancient times the region has contained a large number of
languages, including two groups of languages that have not been definitively related to any other
language families. The South Caucasian, or Kartvelian, languages are spoken in Georgia and include
the Georgian language. The North Caucasian languages fall into North-West Caucasian, North-Central
Caucasian, and North-East Caucasian subgroups. The genetic relation of North-West Caucasian to the
other subgroups is not universally agreed upon. The North-West Caucasian languages include Abkhaz,
the North-Central Caucasian languages include Chechen, and the North-East Caucasian languages
include the Avar language (see Caucasian Languages).
South Asia contains, in addition to the Indo-Aryan branch of Indo-European, two other large
language families. The Dravidian family is dominant in southern India and includes Tamil and Telugu.
The Munda languages represent the Austro-Asiatic language family in India and contain many
languages, each with relatively small numbers of speakers. The Austro-Asiatic family also spreads into
Southeast Asia, where it includes the Khmer (Cambodian) and Vietnamese languages (see Austro-
Asiatic Languages). South Asia contains at least one language isolate, Burushaski, spoken in a remote
part of northern Pakistan. See also Indian Languages.
A number of linguists believe that many of the languages of central, northern, and eastern Asia
form a single Altaic language family, although others consider Turkic, Tungusic, and Mongolic to be
separate, unrelated language families (see Altaic Languages). The Turkic languages include Turkish and
a number of languages of the former Union of Soviet Socialist Republics (USSR), such as Uzbek and
Tatar. The Tungusic languages are spoken mainly by small population groups in Siberia and Northeast
China. This family includes the nearly extinct Manchu language. The main language of the Mongolic
family is Mongolian. Some linguists also assign Korean and Japanese to the Altaic family, although
others regard these languages as isolates. In northern Asia there are a number of languages that appear
either to form small, independent families or to be language isolates, such as the Chukotko-Kamchatkan
language family of the Chukchi and Kamchatka peninsulas in the far east of Russia. These languages
are often referred to collectively as Paleo-Siberian (Paleo-Asiatic), but this is a geographic, not a genetic,
grouping.
The Sino-Tibetan language family covers not only most of China, but also much of the Himalayas
and parts of Southeast Asia (see Sino-Tibetan Languages). The family's major languages are Chinese,
Tibetan, and Burmese. The Tai languages constitute another important language family of Southeast
Asia. They are spoken in Thailand, Laos, and southern China and include the Thai language. The Miao-
Yao, or Hmong-Mien, languages are spoken in isolated areas of southern China and northern Southeast
Asia. The Austronesian languages, formerly called Malayo-Polynesian, cover the Malay Peninsula and
most islands to the southeast of Asia and are spoken as far west as Madagascar and throughout the
Pacific islands as far east as Easter Island. The Austronesian languages include Malay (called Bahasa
Malaysia in Malaysia, and Bahasa Indonesia in Indonesia), Javanese, Hawaiian, and Maori (the
language of the aboriginal people of New Zealand).
Although the inhabitants of some of the coastal areas and offshore islands of New Guinea speak
Austronesian languages, most of the main island's inhabitants, as well as some inhabitants of nearby
islands, speak languages unrelated to Austronesian. Linguists collectively refer to these languages as
Papuan languages, although this is a geographical term covering about 60 different language families.
The languages of Aboriginal Australians constitute another unrelated group, and it is debatable whether
all Australian languages form a single family (see Australia).
The languages of Africa may belong to as few as four families: Afro-Asiatic, Nilo-Saharan, Niger-
Congo, and Khoisan, although the genetic unity of Nilo-Saharan and Khoisan is still disputed (see
African Languages). Afro-Asiatic languages occupy most of North Africa and also large parts of
southwestern Asia. The family consists of several branches. The Semitic branch includes Arabic,
Hebrew, and many languages of Ethiopia and Eritrea, including Amharic, the dominant language of
Ethiopia (see Semitic Languages). The Chadic branch, spoken mainly in northern Nigeria and adjacent
areas, includes Hausa, one of the two most widely spoken languages of sub-Saharan Africa (the other
being Swahili). Other subfamilies of Afro-Asiatic are Berber, Cushitic, and the single-language branch
Egyptian, which contains the now-extinct language of the ancient Egyptians (see Egyptian Language;
Coptic Language).
The Niger-Congo family covers most of sub-Saharan Africa and includes such widely spoken
West African languages as Yoruba and Fulfulde, as well as the Bantu languages of eastern and southern
Africa, which include Swahili and Zulu. The Nilo-Saharan languages are spoken mainly in eastern Africa,
in an area between those covered by the Afro-Asiatic and the Niger-Congo languages. The best-known
Nilo-Saharan language is Masai, spoken by the Masai people in Kenya and Tanzania. The Khoisan
languages are spoken in the southwestern corner of Africa and include the Nama language (formerly
called Hottentot).
Most linguists separate the indigenous languages of the Americas into a large number of families
and isolates, while one linguist has proposed grouping these languages into just three superfamilies.
Nearly all specialists reject this proposal. Well-established families include Inuit-Aleut (Eskimaleut). The
family stretches from the eastern edge of Siberia to the Aleutian Islands, and across Alaska and northern
Canada to Greenland, where one variety of the Inuit language, Greenlandic, is an official language. The
Na-Dené languages, the main branch of which comprises the Athapaskan languages, occupy much of
northwestern North America. The Athapaskan languages also include, however, a group of languages in
the southwestern United States, one of which is Navajo. Languages of the Algonquian and Iroquoian
families constitute the major indigenous languages of northeastern North America, while the Siouan
family is one of the main families of central North America.
The Uto-Aztecan family extends from the southwestern United States into Central America and
includes Nahuatl, the language of the Aztec civilization and its modern descendants (see Aztec Empire).
The Mayan languages are spoken mainly in southern Mexico and Guatemala (see Maya). Major
language families of South America include Carib and Arawak in the north, and Macro-Gê and Tupian in
the east. Guaraní, recognized as a national language in Paraguay alongside the official language,
Spanish, is an important member of the Tupian family. In the Andes Mountains region, the dominant
indigenous languages are Quechua and Aymara; the genetic relation of these languages to each other
and to other languages remains controversial. See also Native American Languages.
Individual pidgin and creole languages pose a particular problem for genetic classification
because the vocabulary and grammar of each come from different sources. Consequently, many
linguists do not try to classify them genetically. Pidgin and creole languages are found in many parts of
the world, but there are particular concentrations in the Caribbean, West Africa, and the islands of the
Indian Ocean and the South Pacific. English-based creoles such as Jamaican Creole and Guyanese
Creole, and French-based creoles such as Haitian Creole, can be found in the Caribbean. English-based
creoles are widespread in West Africa. About 10 percent of the population of Sierra Leone speaks Krio
as a native language, and an additional 85 percent speaks it as a second language. The creoles of the
Indian Ocean islands, such as Mauritius, are French-based. An English-based pidgin, Tok Pisin, is
spoken by more than 2 million people in Papua New Guinea, making it the most widely spoken auxiliary
language of that country. The inhabitants of Solomon Islands and Vanuatu speak similar varieties of Tok
Pisin, called Pijin and Bislama, respectively.
H International Languages
International languages include both existing languages that have become international means of
communication and languages artificially constructed to serve this purpose. The most famous and
widespread artificial international language is Esperanto; however, the most widespread international
languages are not artificial. In medieval Europe, Latin was the principal international language. Today,
English is used in more countries as an official language or as the main means of international
communication than any other language. French is the second most widely used language, largely due
to the substantial number of African countries with French as their official language. Other languages
have more restricted regional use, such as Spanish in Spain and Latin America, Arabic in the Middle
East, and Russian in the republics of the former USSR.
Languages continually undergo changes, although speakers of a language are usually unaware
of the changes as they are occurring. For instance, American English has an ongoing change whereby
the pronunciation difference between the words cot and caught is being lost. The changes become more
dramatic after longer periods of time. Modern English readers may require notes to understand fully the
writings of English playwright William Shakespeare, who wrote during the late 16th and early 17th
centuries. The English of 14th-century poet Geoffrey Chaucer differs so greatly from the modern
language that many readers prefer a translation into modern English. Learning to read the writings of
Alfred the Great, the 9th-century Saxon king, is comparable to acquiring a reading knowledge of
German.
A Sound Change
Historical change can affect all components of language. Sound change is the area of language
change that has received the most study. One of the major sound changes in the history of the English
language is the so-called Great Vowel Shift. This shift, which occurred during the 15th and 16th
centuries, affected the pronunciation of all English long vowels (vowels that have a comparatively long
sound duration). In Middle English, spoken from 1100 to 1500, the word house was pronounced with the
vowel sound of the modern English word boot, while boot was pronounced with the vowel sound of the
modern English boat. The change that affected the pronunciation of house also affected the vowels of
mouse, louse, and mouth. This illustrates an important principle of sound change: It tends to be regular—
that is, a particular sound change in a language tends to occur in the same way in all words.
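The regularity principle can be illustrated with a small Python sketch. The transcriptions below are simplified, broad phonetic approximations chosen for the example, and the two-entry rule table covers only the two vowels discussed above; neither is a full account of the Great Vowel Shift. The names `SHIFT` and `apply_shift` are assumptions made for the illustration.

```python
import re

# Simplified long-vowel correspondences (illustrative only):
# the vowel of Middle English "house" and the vowel of Middle English "boot".
SHIFT = {"uː": "aʊ", "oː": "uː"}

# One pattern matching any vowel that has a rule.
_PATTERN = re.compile("|".join(re.escape(v) for v in SHIFT))

def apply_shift(pronunciation):
    """Apply every rule in a single pass, so a shifted vowel is not shifted again."""
    return _PATTERN.sub(lambda m: SHIFT[m.group(0)], pronunciation)

# Approximate Middle English pronunciations (assumed for the example).
middle_english = {
    "house": "huːs",
    "mouse": "muːs",
    "louse": "luːs",
    "mouth": "muːθ",
    "boot": "boːt",
}

shifted = {word: apply_shift(p) for word, p in middle_english.items()}
print(shifted["house"], shifted["mouse"], shifted["boot"])  # haʊs maʊs buːt
```

The same rule produces the modern vowel in house, mouse, louse, and mouth without any word-by-word exceptions, which is what regularity means. Applying the rules in one pass matters: if they were applied one after another, the new vowel of boot would wrongly be shifted a second time.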
The principle of the regularity of sound change has been particularly important to linguists when
comparing different languages for genetic relatedness. Linguists compare root words from the different
languages to see if they are similar enough to have once been the same word in a common ancestor
language. By establishing that the sound differences between similar root words are the result of regular
sound changes that occurred in the languages, linguists can support the conclusion that the different
languages descended from the same original language. For example, by comparing the Latin word pater
with its English translation, father, linguists might claim that the two languages are genetically related
because of certain similarities between the two words. Linguists could then hypothesize that the Latin p
had changed to f in English, and that the two words descended from the same original word. They could
search for other examples to strengthen this hypothesis, such as the Latin word piscis and its English
translation, fish, and the Latin pes and the English translation, foot. The correspondence between f in
the Germanic languages and p in most other branches of Indo-European is part of a famous set of sound
changes called Grimm's Law, named for German grammarian Jacob Grimm (see Grimm Brothers).
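The search for recurring correspondences can be sketched in Python using the three word pairs given above. This is a deliberately crude illustration: it compares initial letters of the spelling rather than reconstructed sounds, and the function name `initial_correspondences` is an assumption made for the example.

```python
from collections import Counter

# Latin word and its English translation, taken from the examples above.
word_pairs = [("pater", "father"), ("piscis", "fish"), ("pes", "foot")]

def initial_correspondences(pairs):
    """Count how often each pair of initial letters lines up across the word pairs."""
    return Counter((latin[0], english[0]) for latin, english in pairs)

counts = initial_correspondences(word_pairs)
print(counts[("p", "f")])  # 3
```

A correspondence that recurs across many unrelated words is evidence of a regular sound change; a single matching pair could be coincidence or borrowing. Real comparative work uses far larger word lists and compares sounds in every position, not just the first letter.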
B Morphological Change
The morphology of a language can also change. An ongoing morphological change in English is
the loss of the distinction between the nominative, or subject, form who and the accusative, or object,
form whom. English speakers use both the who and whom forms for the object of a sentence, saying
both “Who did you see?” and “Whom did you see?” However, English speakers use only the form who
for a sentence's subject, as in “Who saw you?” Old English, the historical form of English spoken from
about 700 to about 1100, had a much more complex morphology than modern English. The modern
English word stone has only three additional forms: the genitive singular stone's, the plural stones, and
the genitive plural stones'. All three of these additional forms have the same pronunciation. In Old
English these forms were all different from one another: stan, stanes, stanas, and stana, respectively. In
addition, there was a dative singular form stane and a dative plural form stanum, used, for instance, after
certain prepositions, as in under stanum (under stones).
C Syntactic Change
Change can also affect syntax. In modern English, the basic word order is subject-verb-object, as
in the sentence “I know John.” The only other possible word order is object-subject-verb, as in “John I
know (but Mary I don't).” Old English, by contrast, allowed all possible word order permutations, including
subject-object-verb, as in Gif hie ænigne feld secan wolden, meaning “If they wished to seek any field,”
or literally “If they any field to seek wished.” The loss of word-order freedom is one of the main syntactic
changes that separate the modern English language from Old English.
The meanings of words can also change. In Middle English, the word nice usually had the
meaning “foolish,” and sometimes “shy,” but never the modern meaning “pleasant.” Change in the
meanings of words is known as semantic change and can be viewed as part of the more general
phenomenon of lexical change, or change in a language's vocabulary. Words not only can change their
meaning but also can become obsolete. For example, modern readers require a note to explain
Shakespeare's word hent (take hold of), which is no longer in use. In addition, new words can be
created, such as feedback.
While much change takes place in a given language without outside interference, many changes
can result from contact with other languages. Linguists use the terms borrowing and loan to refer to
instances in which one language takes something from another language. The most obvious cases of
borrowing are in vocabulary. English, for example, has borrowed a large part of its vocabulary from
French and Latin. Most of these borrowed words are somewhat more scholarly, as in the word human
(Latin humanus), because the commonly used words of any language are less likely to be lost or
replaced. However, some of the words borrowed into English are common, such as the French word
very, which replaced the native English word sore in such phrases as sore afraid, meaning “very
frightened.” The borrowing of such common words reflects the close contact that existed between the
English and the French in the period after the Norman Conquest of England in 1066.
Borrowing can affect not only vocabulary but also, in principle, all components of a language's
grammar. The English suffix -er, which is added to verbs to form nouns, as in the formation of baker from
bake, is ultimately a borrowing from the Latin suffix -arius. The suffix has been incorporated to such an
extent, however, that it is used with indigenous words, such as bake, as well as with Latin words. Syntax
also can be borrowed. For example, Amharic, a Semitic language of Ethiopia, has abandoned the usual
Semitic word-order pattern, verb-subject-object, and replaced it with the word order subject-object-verb,
borrowed from neighboring non-Semitic languages. Although in principle any component of language
can be borrowed, some components are much more susceptible to borrowing than others. Cultural
vocabulary is the most susceptible to borrowing, while morphology is the least susceptible.
F Reconstructing Languages
Linguistic reconstruction is the recovery of the stages of a language that existed prior to those
found in written documents. Using a number of languages that are genetically related, linguists try to
reconstruct at least certain aspects of the languages' common ancestor, called the protolanguage.
Linguists theorize that those features that are the same among the protolanguage's descendant
languages, or those features that differ but can be traced to a common origin, can be considered
features of the ancestor language. Nineteenth-century linguistic science made significant progress in
reconstructing the Proto-Indo-European language. While many details of this reconstruction remain
controversial, in general linguists have gained a good conception of Proto-Indo-European's phonology,
morphology, and vocabulary. However, due to the range of syntactic variation among Proto-Indo-
European's descendant languages, linguists have found syntactic reconstruction more problematic.
NONORAL LANGUAGE
Language, although primarily oral, can also be represented in other media, such as
writing. Under certain circumstances, spoken language can be supplanted by other media, as in
sign language among the deaf (see Sign Language). Writing can be viewed in one sense as a
more permanent physical record of the spoken language. However, written and spoken
languages tend to diverge from one another, partly because of the difference in medium. In
spoken language, the structure of a message cannot be too complex because of the risk that
the listener will misunderstand the message. Since the communication is face-to-face, however,
the speaker has the opportunity to receive feedback from the listener and to clarify what the
listener does not understand. Sentence structures in written communication can be more
complex because readers can return to an earlier part of the text to clarify their understanding.
However, the writer usually does not have the opportunity to receive feedback from the reader
and to rework the text, so texts must be written with greater clarity. An example of this difference
between written and spoken language is found in languages that have only recently developed
written variants. In the written variants there is a rapid increase in the use of words such as
because and however in order to make explicit links between sentences—links that are normally
left implicit in spoken language.
Sign languages, which differ from signed versions of spoken languages, are the native
languages of most members of deaf communities. Linguists have only recently begun to
appreciate the levels of complexity and expressiveness found in sign languages. In particular,
as in oral languages, sign languages are generally arbitrary in their use of signs: In general, no
reason exists, other than convention, for a certain sign to have a particular meaning. Sign
languages also exhibit dual patterning, in which a small number of components combine to
produce the total range of signs, similar to the way in which letters combine to make words in
English. In addition, sign languages use complex syntax and can discuss the same wide range
of topics possible in spoken languages.
Body language refers to the conveying of messages through body movements other than those
movements that form a part of sign or spoken languages. Some gestures can have quite
specific meanings, such as those for saying good-bye or for asking someone to approach. Other
gestures more generally accompany speech, such as those used to emphasize a particular
point. Although there are cross-cultural similarities in body language, substantial differences
also exist both in the extent to which body language is used and in the interpretations given to
particular instances of body language. For example, the head gestures for “yes” and “no” used
in the Balkans seem inverted to other Europeans. Also, the physical distance kept between
participants in a conversation varies from culture to culture: A distance considered normal in one
culture can strike someone from another culture as aggressively close.
In certain circumstances, other media can be used to convey linguistic messages, particularly
when normal media are unavailable. For example, Morse code directly encodes a written
message, letter by letter, so that it can be transmitted by a medium that allows only two
values—traditionally, short and long signals or dots and dashes (see Morse Code,
International). Drums can be used to convey messages over distances beyond the human
voice's reach—a method known as drum talk. In some cases, such communication methods
serve the function of keeping a message secret from the uninitiated. This is often the case with
whistle speech, a form of communication in which whistling substitutes for regular speech,
usually used for communication over distances.
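The letter-by-letter encoding used by Morse code, mentioned above, can be sketched in Python. The table covers only the 26 letters of International Morse Code. Using a space between letters and " / " between words is a common written convention assumed here for readability, not part of the transmitted signal. The function name `encode` is hypothetical.

```python
# International Morse Code for the letters A-Z (digits and punctuation omitted).
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

def encode(message):
    """Encode a written message letter by letter into dots and dashes.

    Raises KeyError for any character that is not a letter or whitespace.
    """
    words = message.upper().split()
    return " / ".join(" ".join(MORSE[ch] for ch in word) for word in words)

print(encode("SOS"))        # ... --- ...
print(encode("drum talk"))  # -.. .-. ..- -- / - .- .-.. -.-
```

Because the code works on letters, it encodes the written form of a message rather than speech itself, which is the point made above.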