Collecting and Analysing Data
5 Collecting and analysing data
Learning objectives
In this unit, you will:
■ carry out systematic studies using relevant data for English Language studies (AO4)
■ develop the skills to analyse and synthesise language information from a variety of sources (AO5)
■ learn about the guidelines which govern how research is carried out in a fair and appropriate
manner (AO4)
■ apply these principles to research in English Language topics (AO4).
The Cambridge International AS & A Level course does not require you to carry out your
own research project, but it is important that you are aware of the standard research
techniques. This will allow you to better understand research papers that you read.
ACTIVITY 1
Discuss with a partner whether the following topics are suitable for A
Level English Language investigation. For any suitable topics, suggest
a method of investigation. Suggest why some topics are unsuitable.
For example, you might think that the topic is impractical to
investigate, or too general.
• analysis of one minute of a sporting commentary to assess what
techniques of unscripted discourse are used
• analysis of the front pages of two newspapers from non-English-speaking areas of the world, to see how much English lexis they use
• comparison of the lyrics of two songs from different time periods to assess syntactic and lexical differences
• comparison of two pieces of travel writing from different
times/centuries to assess different language styles of writing
• recording two minutes of an infant’s speech at monthly intervals from 18 to 24 months of age to assess language acquisition
• comparison of the language used to market two cosmetic or household products aimed at women, from different time periods, to assess contrasts in the language of persuasion and any features of language and gender
• analysis of two Facebook posts (one written by a man and one by a woman) to assess whether there are lexical and stylistic differences between genders.
Research topics and data sources
This section outlines the research methods you are most likely to use for working with English Language
data.
Large numbers of spoken and written texts, as they are naturally used, are now stored electronically. Such a collection of texts is known as a corpus (plural: corpora), and the information stored in it is corpus data. More information on the use of corpus data is found in Section 7.
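Corpus software automates this kind of counting. As a rough, hypothetical illustration (not a tool used in this course), the Python sketch below treats three invented social-media-style sentences as a tiny ‘corpus’, counts how often each word occurs, and lists every use of a chosen keyword with the words on either side of it, a display usually called a concordance or keyword-in-context (KWIC) view. The function names `tokenise`, `word_frequencies` and `concordance` are our own, and the simple rule used to split text into words is an assumption; real corpus tools handle punctuation and spelling variants far more carefully.

```python
import re
from collections import Counter

# A tiny, invented sample standing in for a corpus (hypothetical texts).
corpus = [
    "lol that match was amazing btw",
    "btw the match starts at three",
    "the commentary on that match was brilliant lol",
]

def tokenise(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(texts):
    """Count how often each word occurs across all the texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenise(text))
    return counts

def concordance(texts, keyword, width=3):
    """Return keyword-in-context lines showing up to `width` words either side."""
    lines = []
    for text in texts:
        tokens = tokenise(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{token}] {right}")
    return lines

freqs = word_frequencies(corpus)
print(freqs.most_common(3))  # the three most frequent words with their counts
for line in concordance(corpus, "match"):
    print(line)
```

With a real corpus, the same two steps (frequency counts and concordance lines) let you check, for example, how often a neologism such as ‘btw’ appears and in what contexts.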
Some of the most popular topic areas for English Language research studies are:
• lexis: distinctive jargon relevant to a particular field (e.g. sporting commentaries, or professions such as education)
• neologisms: new words and acronyms, particularly those used in social media and advertising (e.g. ‘lol’, ‘btw’, ‘404’, ‘tweet cred’)
• features of style in a particular text (e.g. rhetorical questions, metaphor, puns, modification using adjectives and adverbs)
• syntax: the length, structure and types of sentences in a text (e.g. imperative, exclamative, interrogative)
• semantics: the meanings and generally accepted associations of particular words or phrases (e.g. ‘home’ means a place where someone lives, but it also carries associations of warmth, security and belonging)
• the form and layout of a text (e.g. brochures, posters, speeches)
• unscripted discourse features including conversational features, accents and dialects, varieties of world
English, and language and gender
• tracking diachronic changes to word meanings and their usage.
KEY CONCEPT
Diversity
The diversity of English offers a rich opportunity for analysis,
comparison and exploration. Data relevant to English Language study
must be collected and processed according to ethical guidelines
before it is analysed and presented in a systematic way. Discuss what
you understand by ethical guidelines and where they should be used
in the analysis of English Language data.
ACTIVITY 2
Sources of data
A wealth of written data is available from sources such as advertisements, brochures, leaflets, editorials, news stories, articles, reviews, blogs, investigative journalism, letters, podcasts, (auto)biographies, children’s books, diaries, essays, scripted speech and narrative/descriptive writing.
Spoken data is a very interesting source to investigate, but it must be recorded and transcribed before it can be analysed carefully. The main categories are:
• real speech (e.g. friends talking; a teacher giving a lesson; an infant/child talking to friends or to adults)
• represented speech, such as a TV or film drama or a scripted speech
• media (e.g. TV, film advertisements, news)
• digital data, where the boundaries between spoken and written language become blurred (e.g. social networking sites).
It is easy to gather much more spoken data than you actually need. Transcribing speech can be very time-consuming and laborious, as you should write down not only every word but also every hesitation and pause. Just two minutes of discourse can require a lot of transcription time! If you are analysing how something is said, rather than what is said, you may need to use phonetic transcription. When you are analysing a variety of world English or a dialect, specialist books and online sources will teach you the symbols (such as those of the International Phonetic Alphabet) that represent each sound.
ACTIVITY 3
Questionnaire design
A questionnaire is a set of questions, often (but not always) offering a choice of answers, which a sample of respondents completes. The answers are then analysed to produce results.
Designing a questionnaire and asking people questions seems deceptively easy, but it is important to ensure that respondents understand the questions and answer them honestly, according to their own views. Much more information about questionnaire design is available online; the following points are general guidelines:
• The questionnaire should be simple in design, polite and friendly. It should clearly explain the aims of
the survey.
• Early questions should engage the participants’ interest and should be straightforward.
• Important questions requiring thought and extended answers should be in the middle of the
questionnaire.
• Questions likely to cause offence should be avoided.
• Technical questions should be avoided if the questionnaire is for a non-specialist audience.
• Open-ended questions, which take a long time to complete, should be kept to a minimum.
• ‘Loaded’ questions, which suggest the expected answer to respondents, should be avoided.
ACTIVITY 4
ACTIVITY 5
Read the pilot survey questions a–d, then answer the questions which
follow.
a How much do you earn?
b Do you agree or disagree with the advertiser’s untruthful claim
that ‘women will be more beautiful’ after using their face
cream?
c How old are you?
d Do you agree that synthetic personalisation in language helps
media institutions reinforce their linguistic control over their
audience?
1 Why would these questions be inappropriate in a survey that respondents complete without an interviewer?
2 Rephrase each question so that it is more appropriate and easier for respondents to answer.
Data analysis
Your research is likely to produce data which can be measured in different ways; specialist statistics books and online tutorials give further information and help. The scales of measurement you are most likely to use are:
1 Nominal: data allocated to categories that have no natural order, which can then be counted (e.g. ‘yes/no’ answers; the number of virtuous errors a child makes). Virtuous errors are errors young children make when they apply the regular rules of the language they hear around them to irregular forms: for example, they may say ‘runned’ instead of the standard ‘ran’ (see Unit 8.4).
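2 Ordinal: data which can be ranked in order, although the gaps between ranks cannot be measured (e.g. results ranking the second languages people speak, to show where English comes relative to other languages; or answers on a scale from ‘strongly agree’ to ‘strongly disagree’)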
3 Interval: data in which the differences between values can be measured on an equal scale, but there is no true zero (e.g. temperature in degrees Celsius)
4 Ratio: similar to interval data, but with a true zero (e.g. height)
Note: you are unlikely to need to use interval and ratio data in English Language studies.
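To show the practical difference between nominal and ordinal data, here is a minimal Python sketch. Everything in it is invented for illustration: the yes/no answers, the list of virtuous errors and the agreement responses are hypothetical, and the helper `error_category` uses a deliberately crude rule (forms ending in ‘-ed’ count as past-tense errors, everything else as plural errors) rather than any method from this course. Nominal data can only be counted by category; ordinal data can also be ranked, so a median response is meaningful, whereas averaging the labels is usually avoided because the gaps between them are not known.

```python
from collections import Counter
from statistics import median_low

# Hypothetical yes/no answers to one questionnaire item (nominal data).
answers = ["yes", "no", "yes", "yes", "no"]

# Hypothetical virtuous errors noted in a child's speech (nominal data:
# each form is allocated to a category, and the categories have no order).
errors = ["runned", "goed", "mouses", "sheeps", "eated"]

def error_category(form):
    """Label an over-regularised form (hypothetical, simplified rule).

    Forms ending in '-ed' are treated as past-tense errors;
    everything else is treated as a plural error.
    """
    return "past tense" if form.endswith("ed") else "plural"

# Ordinal scale: the order of the labels matters, but the gaps between
# them cannot be measured, so responses are ranked rather than averaged.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
responses = ["agree", "neutral", "strongly agree", "agree", "disagree"]

answer_counts = Counter(answers)                        # tally per category
error_counts = Counter(error_category(e) for e in errors)
ranks = [SCALE.index(r) + 1 for r in responses]         # 1 = strongly disagree
median_response = SCALE[median_low(ranks) - 1]          # median_low always returns an actual rank

print(answer_counts)
print(error_counts)
print("Median response:", median_response)
```

With real data, you would replace the invented lists with your own tallies from a transcript or questionnaire; the counting and ranking steps stay the same.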
KEY CONCEPT
Diversity
The diversity of English offers a rich opportunity for analysis which
must be carried out according to best practice and ethics. What
ethical issues might arise in an analysis of English Language data?
Much of the research carried out on English Language topics uses corpus linguistics. However, where research involves observing people (such as children using language) or measuring attitudes towards language, the welfare of the participants must be respected.
ACTIVITY 6
Self-assessment checklist
Reflect on what you’ve learnt in this unit and indicate your confidence level between 1 and 5. If you
score below 3, revisit that section. Come back to this list later in your course. Has your confidence
grown?