The SAGE Handbook of
Online Research Methods
Online Research Methods
and Social Theory
Contributors: Nigel Fielding & Raymond M. Lee & Grant Blank
Print Pub. Date: 2008
Online Pub. Date:
Print ISBN: 9781412922937
Online ISBN: 9780857020055
DOI: 10.4135/9780857020055
Print pages: 537-550
This PDF has been generated from SAGE Research Methods. Please note that the
pagination of the online version will vary from the pagination of the print book.
Uni of Oxford (Bodleian Lib)
Copyright ©2013
SAGE Research Methods
10.4135/9780857020055.n29
Online Research Methods and Social
Theory
Grant Blank
ABSTRACT
[p. 537 ↓ ] The enormous growth in online activities has created all sorts of
new opportunities for research. These opportunities are theoretical as well as
methodological. The theoretical opportunities have been present in prior chapters but
never emphasized; this chapter brings theory into focus without losing sight of methods.
Specifically, the chapter discusses the explanatory power of theory based on online
methodologies to address important social issues. With this goal in mind, it describes three
themes common to the preceding chapters: the volume of data, additional computer
resources, and the ‘qualitative analysis bottleneck’. Each theme presents problems as
well as opportunities, and the goal of this chapter is to explore how methods and theory
work together to define and mitigate the problems as well as to exploit the opportunities.
INTRODUCTION
The link between methods and theory has a history almost as long as modern science
itself. It begins at the dawn of empirical science, over 350 years ago. One of the
earliest scientific communities was formed around Robert Boyle, leading a group of
experimentalists who were exploring the relationships between pressure, temperature,
and volume. Their primary technology was a vacuum pump that they called an ‘air
pump’. Their findings were codified into what we now know as ‘Boyle’s Laws’. (Much of
the following discussion is drawn from Shapin and Schaffer (1985) and Zaret (1989).)
These early scientists are interesting not just because they developed some of the
earliest experimental research methods using the high technology of the day, and
not just for their exploration of the relations between pressure, temperature, and
volume, but also because their work is intimately linked to social theory. The mid-1600s
in England was a period of political turmoil: there was the English Civil War, the
Regicide (1649), the creation of the Republic (1649-53), Oliver Cromwell's Protectorate
(1653-58), and the Restoration of the Stuart monarchy in 1660. As they attempted
to understand this period of bitter political, social, and religious conflict, many English
people came to the conclusion that the fundamental source of social conflict was
differing views of religious truth.
[p. 538 ↓ ] The implication of this assumption was that when everyone believed in the
same religion there would be an end to these extraordinary conflicts.
The Restoration of the Monarchy in 1660 had restored a central political authority, but
it did not dampen religious strife. As a result, the 1660s were marked by a new crisis of
civil authority. The issue was the role of religious belief, particularly the Protestant and
Puritan emphasis on an individual's personal religious beliefs. This created a problem
that became a major source of tension and conflict.
The problem was that it is very difficult to settle disputes when everyone relies on their
own personal vision of truth. Under such circumstances, how can anyone determine
whose personal vision is fairer? More just? Or, in any sense, better? In fact, when
people believe that their highly individualized versions of truth are the only correct
version then political compromise and accommodation become very difficult. Thoughtful
English people saw society and politics splitting into a large number of semi-hostile
groups, each suspiciously defending its personal vision of the truth. This was not
attractive, for it looked as though the jealous incompatibility of these visions might make
a cohesive society with normal politics impossible.
In this social environment the experimental scientists offered an alternative vision of
community. This community claimed to have created an understanding of conflict and
social unity that stood in stark contrast to the disorder plaguing English society. Their
signal achievement was that they were able to settle disputes and achieve consensus
without resorting to violence and without powerful individuals imposing their beliefs on
others.
In their view, facts were a social creation. They were made when the community
freely assented. Stable agreement was won because experimentalists organized
themselves into a defined and bounded society that excluded those who did not
accept the fundamentals of good order. Consensus agreement on facts was an
accomplishment of that community. It was not imposed by an external authority.
Facts themselves were uncovered by experiments, attested to by competent
observers. When there was a disagreement it could be settled by appeal to facts
made experimentally manifest and confirmed by competent witnesses from within the
community. The early experimentalists presented themselves as a godly community, as
‘priests of nature’. Robert Boyle suggested that experimental trials be carried out only
on Sunday.
This did not imply consensus was always easily reached. Indeed, Hooke's vehement
disagreements with Newton and others anticipated a long line of hostile quarrels among
scientists. This is another respect in which the early experimentalists formed something
that looks like science.
Despite their internal disagreements, despite the inevitable tensions of ego and
competition for status, in the context of strife-ridden, post-Civil War society the model
of a community committed to joint discovery of facts was an attractive alternative. It
contributed to the political support required to set up the early institutions that supported
and fostered the development of science: the Royal Society and its journal.
The point is that there is a fundamental link between social theory and research
methods embedded in the culture of science from the very beginning. The link continues
in the online research methods described in this volume. In addition to the symbolic
and thematic link, a dramatic unity based on location also ties this book to the early
experimentalists: in March 2007 the authors of this volume participated in a conference
organized by Roger Burrows under the auspices of the ESRC e-Society Programme
held in London at the Royal Society. While most of this book is concerned with the
methodological issues raised by online research, this chapter will focus on the interface
between methods and theory. It is clear that the new technologies of the online world
are creating new opportunities for substantive theory, just as they are creating new
opportunities for methodology. The new theoretical opportunities come in part from the
new social forms and new communities being created by online technologies. They also
come from the fact that online research can offer a novel perspective that casts new
light on older, pre-existing social forms. This chapter does not attempt to develop new
theory, and it is not a ‘theory chapter’; instead, the question I try to answer is: how do
online research methods relate to theory?
[p. 539 ↓ ] A volume on methods leads naturally to discussion of a particular kind of
theory. This is not the ‘grand theory’ of Marx, Weber, and Durkheim; instead, it is the
middle-range theory or substantive theory that is commonly used in conjunction with
standard methodological tools like statistical hypothesis testing. This sort of theory has
several relationships to methods. Theoretical concepts are operationalized in scales
or indices, in survey questions, by describing attributes of people and organizations,
or by coding qualitative data into appropriate categories. Relations between concepts
are described by hypotheses, which often form the basis for inferential tests. Related
hypotheses can be collected into theories that may be modeled with statistical or
mathematical methods. The theory and all its components remain fairly concretely tied
to empirical data and to measurement. The pay-off from the use of this kind of theory
is often a clearer understanding of contemporary social problems or issues. Thus this
chapter discusses the explanatory power of theory based on online methodologies to
address important social issues.
In the course of this task I draw together many common methodological themes from
prior chapters. This is a personal reading of these chapters, and no one should infer
that my opinions are shared by the authors themselves or by other editors. I found
that the papers in this volume each attempt to deal with new opportunities offered by
gathering data online, while suggesting ways to cope with special problems posed by
online research. I generally draw on online examples with some comparison to offline
work. I found three common themes. Each theme reflects attempts to deal with new
problems or opportunities in online work. They are: the volume of data, additional
computer resources, and the ‘qualitative analysis bottleneck’.
VOLUME OF DATA
For researchers used to gathering data in the offline world, one of the striking
characteristics of online research is the sheer volume of data. The quantity of data
comes in two forms. First, people leave electronic traces everywhere, as Ted Welser,
Marc Smith, Danyel Fisher, and Eric Gleave point out (this volume). In cashless
financial transactions, communication via e-mail, text message, or instant message,
voice telephone records, medical records, or interactions with official government
agencies, many aspects of people’s lives are captured and electronically recorded. For
anyone accustomed to the painful cost of collecting data offline, the extent and easy
availability of electronic data is breathtaking. For reasons of declining cost and ease of
accessibility we are seeing rapid increases in electronic record gathering and storage.
These trends are likely to continue. These factors alone are likely to encourage much
more use of online data in future research.
Most of this is unobtrusive, as Dietmar Janetzko's chapter on non-reactive data
collection says, in the sense that people are recorded as they go about their ordinary
lives. They are not explicit research subjects, nor do they think of themselves as
being part of a research project. Yet, any data that are archived can be incorporated
into a research project. There are many research possibilities here, but there are
subtle, often serious problems. Most of these electronic records are collected for
administrative purposes and they share the problems of paper records. Their content
reflects the narrow administrative purposes for which they were collected, the needs
and convenience of bureaucrats, not the demands of research methods or theory.
[p. 540 ↓ ] Furthermore, many of these data are proprietary. Corporations collect them
for their own economic purposes, and private companies are usually unwilling to create
public-use datasets or to supply data to researchers because of privacy and competitive
fears. Once they give data to a researcher it is out of their
control. The data could be mined for important competitive information if it fell into their
competitors’ hands. Therefore, giving proprietary data to a researcher requires a major
leap of faith and trust, with no likely business benefit. It isn't likely to happen easily or
often. An example of proprietary data used for research is Marc Sanford's (2008) retail
scanner data. After fourteen months of persuasion, requiring what Sanford describes
as ‘countless hours on the phone’ and signing several legal agreements designed to
limit the use of the data, and ensure security and confidentiality, Sanford was given over
750 million records. An example of public e-mail is the Enron data (see Klimt and Yang,
2004; Culotta et al., 2004), discussed in chapters by Janetzko and by Eynon et al. It
consists of about 200,000 e-mails exchanged between 151 top executives. They were
released during the court cases that followed the Enron accounting fraud. These are
exceptions that prove the rule.
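Because the Enron messages are distributed as plain-text message files, even basic descriptive work with such a corpus — say, tallying how many messages each sender wrote — needs only standard tools. The following is a minimal sketch using Python’s standard library; the sample messages and addresses below are invented stand-ins for real corpus files.

```python
from collections import Counter
from email.parser import Parser

def count_senders(raw_messages):
    """Tally how many messages each From: address sent.

    raw_messages: iterable of RFC-822 message strings, e.g. the
    individual files of a public e-mail corpus.
    """
    parser = Parser()
    counts = Counter()
    for raw in raw_messages:
        # headersonly=True skips parsing the body; we only need From:
        msg = parser.parsestr(raw, headersonly=True)
        sender = msg.get("From", "").strip().lower()
        if sender:
            counts[sender] += 1
    return counts

# Two toy messages standing in for corpus files (hypothetical addresses).
sample = [
    "From: alice@example.com\nSubject: budget\n\nSee attached.",
    "From: alice@example.com\nSubject: re: budget\n\nThanks.",
    "From: bob@example.com\nSubject: schedule\n\nMonday works.",
]
print(count_senders(sample).most_common(1))  # [('alice@example.com', 2)]
```

Real corpus files would be read from disk rather than defined inline, but the parsing step is the same.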
Even if the data are made available for research, there is yet another problem. The
core problem with these data is the lack of a link to substantive theory. They typically
do not contain key variables that social scientists incorporate into their theories. Race,
for example, is often not included in financial records. Religion and ethnicity are rarely
available. Websites, blogs, listservs, Usenet, and online games contain only the data
that users think important, which is inconsistent from person to person. With many
interesting variables unavailable, people are, at best, thinly described. Because of these
problems many forms of electronic record are very difficult for researchers to use.
Sometimes this problem can be addressed by combining electronically collected
data with other data, adding race, ethnicity, crime rates, or whatever substantive
variables are missing. Geographic data can often be merged with other geographically
coded data, such as census tract data or police crime statistics. This is how Sanford
(2008) added theory variables to his retail scanner data. In almost all cases such
datasets need time-consuming, highly skilled work to put them into a condition where
interesting substantive problems can be addressed. They often need to be aggregated
or disaggregated to theoretically meaningful levels or to be matched with other data.
This requires serious data management skills, which are usually not taught as part
of graduate training and are, in fact, rare among social scientists. There will be social
science uses for some of this new data, but they depend on creative, imaginative
thought to make them workable. A notable example is Marc Smith's NetScan work
collecting Usenet mail headers (see the Welser et al. chapter). This example is in many
respects typical of other attempts to create useful social data and theory from electronic
traces. We will return below to some of the issues raised by this example.
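Merging administrative records with census-tract attributes, as Sanford did, is at bottom a keyed join on a shared geographic code. A minimal sketch — the field names (`tract`, `median_income`, `crime_rate`) and values are hypothetical:

```python
def add_tract_variables(records, tract_data):
    """Join per-transaction records to census-tract attributes by tract ID."""
    merged = []
    for rec in records:
        # Look up this record's tract; records with unknown tracts get no extras.
        extra = tract_data.get(rec["tract"], {})
        merged.append({**rec, **extra})
    return merged

# Toy scanner-style transactions and tract-level statistics.
sales = [
    {"store": "S1", "tract": "17031", "amount": 12.50},
    {"store": "S2", "tract": "17032", "amount": 8.00},
]
tracts = {
    "17031": {"median_income": 41000, "crime_rate": 3.2},
    "17032": {"median_income": 58000, "crime_rate": 1.1},
}
for row in add_tract_variables(sales, tracts):
    print(row["store"], row["median_income"])
```

The join itself is trivial; the time-consuming work the chapter describes lies in cleaning the keys and choosing the right level of aggregation before a join like this is even possible.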
The volume of data has a second form: online researchers who can collect their own
data have it easy. Internet and online surveys are cheap and fast compared to the
offline alternatives. Responses can be automatically checked against other answers
and stored directly in a dataset, ready for analysis. Indeed, simple analyses, such as
descriptive statistics and frequencies, can be automatically produced. This can largely
eliminate the time-consuming, difficult, and costly steps of data cleaning and data
input. Four chapters discuss this: Vehovar and Manfreda's Overview of online surveys,
Fricker's Sampling Methods for Web and Email Surveys, Best and Krueger's Internet
Survey Design, and Kaczmirek's Internet Survey Software Tools. Sample sizes can be,
at least potentially, extremely large. Website click-through data can have sample sizes
of over 100 million cases (remember, that is the sample!). As the Hindmarsh chapter
on Distributed Video Analysis points out, cheap cameras and inexpensive disk storage
increase the feasibility of video recordings. The resulting data can contain records of
individuals almost unprecedented in their details.
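The automatic cross-checking of answers mentioned above can be as simple as a validation function run on each response at submission time. A sketch, with hypothetical field names and an assumed survey year of 2008:

```python
def validate_response(resp):
    """Return a list of consistency problems found in one survey response.

    Field names are hypothetical; real checks depend on the instrument.
    """
    problems = []
    # Range check on a single field.
    if not 0 <= resp.get("age", -1) <= 120:
        problems.append("age out of range")
    # Logical check across two fields.
    if resp.get("employment") == "unemployed" and resp.get("hours_worked", 0) > 0:
        problems.append("hours_worked inconsistent with employment status")
    # Cross-check two ways of asking the same thing (survey year assumed 2008).
    if resp.get("year_of_birth") and resp.get("age"):
        if abs((2008 - resp["year_of_birth"]) - resp["age"]) > 1:
            problems.append("age disagrees with year_of_birth")
    return problems

print(validate_response({"age": 30, "year_of_birth": 1990,
                         "employment": "employed"}))
# ['age disagrees with year_of_birth']
```

Run at the moment of submission, checks like these let respondents correct inconsistencies themselves, which is what eliminates much of the downstream data-cleaning cost.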
[p. 541 ↓ ] The low cost of online data collection and the possibility for using video
have been widely noticed. Less widely remarked is the fact that the low cost and easy
access to subjects also applies to ethnographic research. Six chapters describe the
implications of online research for various forms of qualitative data collection: Hine's
overview chapter on Virtual Ethnography, O'Connor et al.'s chapter on Internet-based
Interviewing, Wakeford and Cohen's chapter on Using Blogs for Research, Gaiser's
chapter on Online Focus Groups, the Schroeder and Bailenson chapter on Multiuser
Virtual Environments, and the Hindmarsh chapter. Enormous amounts of qualitative
data can be collected very quickly. For blogs, listservs, Usenet, or e-mail the electronic
form is the only form in which the data exist. These data do not need to be converted
to electronic form by transcription and this eliminates major costs, time delays, and
sources of error. The new wealth of data opens a real opportunity for all kinds of
innovative research.
There is no ‘free lunch’ in online data collection. The price of simple, low-cost access
to subjects is a set of complicated, difficult ethical questions. Ethics are discussed
in several chapters, but they are directly addressed in the Eynon et al. chapter on
the Ethics of Internet Research. There are two primary protections for social science
research subjects: anonymity and informed consent, and under online conditions both
are more difficult to achieve. The same easy access to online data and the ease of
matching individual respondents to other datasets that make online data collection so
much simpler also make it much easier for someone to break anonymity and discover
the identity of individual respondents. Files in all their versions are often preserved on
backups, which are often automatically created and not under control of the researcher.
Someone could obtain an early version of a file with identifiers still in place, possibly
without the researcher even knowing. An obvious situation where this could occur is
when a government agency is interested in a research project that collected data on
illegal actions, such as criminal behavior, drug use, illegal immigration, or terrorism.
Anonymity can also be compromised on the Internet. Anything sent across the Internet
will be stored on multiple servers between its origin and its destination, and can be
intercepted at any point. Emails and all attachments are preserved on the servers
and backup systems of both the sender and addressee, even if users delete them.
Backup copies of e-mails are often preserved for years due to legal requirements.
These facts mean that researchers cannot guarantee anonymity for anything sent
over the Internet and especially not for e-mail. My impression is that human subjects
protection committees have not always understood that ‘Internet privacy’ is often
an oxymoron.
Informed consent is much more complicated in settings like listservs or role-playing
games, where subjects may never anticipate that their conversations are being
recorded for research purposes. Additional complexity is introduced by the fact that
many listservs and Usenet groups retain message archives. Since most listservs
include frequent messages where people write statements such as ‘that topic was
previously discussed, search the archives’, the existence and location of the archives
is public knowledge. Often the archives can be searched and retrieved by any
interested person. So how can they be anything but public? If they are public, is
consent necessary? If consent is necessary, how can a subject give informed consent
concerning messages that may have been written years ago? How can the researcher
even find the authors to obtain their consent?
Two special online problems are worth highlighting here, one faced by qualitative
researchers and one for survey researchers. Authenticity is a special problem in online
qualitative research. Like the famous New Yorker cartoon that proclaimed ‘On the
Internet no one knows you are a dog’, qualitative researchers can never be sure who
is responding. Researchers often have no way to check respondents’ claims of racial,
ethnic, or gender identity. This is a particular problem for research on role-playing
games, where part of the attraction of the game is the possibility to experiment with
a different identity. Identifying respondents using passwords or IP addresses offers
no guarantee. Passwords can be given to friends. Anecdotally, I just counted and
I have current logon information (including passwords) for at least five friends on
various systems. The IP address of a computer only says which computer transmitted
a message; it does not identify the person using the computer. Since many computers
can be used by several people, identifying respondents is especially difficult online. The
difficulty of authenticating users is one major reason why online voting has not been
widely adopted (cf. Jefferson et al., 2004).
[p. 542 ↓ ] The paradox is that anonymity is more difficult to achieve online, but many
forms of faking and fraud are easier. People can be more easily identified, but they
can also hide their identity more easily. This problem is larger than the problem of
online identity. Most e-mail is spam, and most researchers will have to filter it out before
they analyze the rest. Fraudulent websites, phishing schemes, and other corrupt or
illegal activities are endemic. The ease of online data collection has its downside; it is
also easier for online subjects to manipulate the data collection process. For example,
they can send copies of e-mail questionnaires to friends (who can fill them out and
send them in), they can fill out multiple copies of online questionnaires, or they can lie
about significant attributes like race or gender. While there are solutions to all of these
problems, nonetheless they are harder to detect online. Online researchers have to
cope with the omnipresent possibility of fraud in a way that has few parallels in offline
research.
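One common, partial defence against multiple submissions is to flag responses that share both an IP address and an identical answer pattern. A heuristic sketch only — as noted above, shared IPs are common (labs, households, shared computers), so matches are candidates for review, not proof of fraud:

```python
from collections import defaultdict

def flag_duplicates(responses):
    """Group response IDs that share an IP address *and* identical answers."""
    seen = defaultdict(list)
    for resp in responses:
        # Sort answers so the key is order-independent.
        key = (resp["ip"], tuple(sorted(resp["answers"].items())))
        seen[key].append(resp["id"])
    # Keep only groups with more than one submission.
    return [ids for ids in seen.values() if len(ids) > 1]

batch = [
    {"id": 1, "ip": "10.0.0.5", "answers": {"q1": "a", "q2": "b"}},
    {"id": 2, "ip": "10.0.0.5", "answers": {"q1": "a", "q2": "b"}},  # repeat
    {"id": 3, "ip": "10.0.0.9", "answers": {"q1": "a", "q2": "b"}},
]
print(flag_duplicates(batch))  # [[1, 2]]
```

A determined respondent can defeat this by varying answers or switching networks, which is the chapter’s point: such checks reduce, but cannot eliminate, the possibility of fraud.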
The special problem of online survey research is that there is no way to construct a
sampling frame. There is no online equivalent to random digit dialing. Therefore it is
not generally possible to select online respondents according to some randomized
process. Even if Internet usage reached saturation levels, for most populations the
sampling frame problem would remain intractable. An exception, as Fricker points out
in his chapter, is a survey of an organization where the organization has a complete list
of its members and everyone has e-mail addresses; but that is unusual. As discussed
in the chapters on Internet surveys, the solution to this problem is to use mixed-mode
research. Online and offline data collection can be combined in various ways to
overcome the lack of a random sample in online research, while still retaining many of
its advantages of low cost and easy administration.
What will we do with all these data?
The signal characteristic that distinguishes online from offline data collection is the
enormous amount of data available online. As we think about our new-found wealth of
data I want to raise two skeptical questions, one for qualitative data and the second for
quantitative data.
The sources of massive amounts of qualitative data that are being and could be
collected include automated security cameras, credit card transactions, affinity card
transactions, as well as purposefully collected data like video and audio tapes. These
data promise a remarkably fine-grained, detailed picture of people in all kinds of social
situations. Given this fact, here is the place to raise the key question: So what? What
is the pay-off? Does the availability of this enormous volume of data promise soon-to-come
advances in our understanding of society, politics, and culture, along with much
better theory? My best-guess answer to this question is ‘probably not’.
To see why, it is necessary to realize that detailed qualitative data is not new. Ecological
psychologists (e.g. Barker and Wright, 1951; 1954) collected it over 50 years ago.
Barker and his students created and published minute-by-minute records of the
activities of children from morning to night. We can reasonably ask, if there is going to
be a major payoff from data-intensive studies of social life, why haven't we heard more
about the ecological psychologists? Why aren't they more important? Why didn't people
pay attention?
[p. 543 ↓ ] One answer is that the theory Barker developed was exceptionally closely
tied to concrete social settings and situations. Barker studied under Kurt Lewin and he
was influenced by Lewin's theories of the importance of the environment in predicting
behavior. Barker himself argued that behavior was radically situated, meaning that
accurate predictions about behavior require detailed knowledge of the situation or
environment in which people find themselves. His work often consisted of recording how
expected behavior is situational: people act differently in different behavioral settings;
e.g., in their roles as students or teachers in school or as customers in a store. In
his theory of behavior settings Barker is fairly explicit that he believes broad, ‘grand’
theories cannot usefully predict behavior. Since Barker focuses so tightly on behavior
in a very local setting, his research doesn’t generalize very well to other settings. This
is an implication built into the idea of behavioral settings and it is intentional. Ecological
psychology has sustained its intellectual ground as a school, but its results turn out to
be fairly limited in their applications. This is a disadvantage. Other researchers, looking
for theories that help them in their research, will not find rich sources of insightful ideas
that they can use. Since few other researchers found the ecological psychologists’ work
useful, it was not widely adopted.
A second reason for the lack of attention to the ecological psychologists’ work is that
detailed observational data has built-in limitations. Observational data gives no access
to the internal states of people. Observers have no information on attitudes, emotions,
motives, or meaning. This is a serious limitation, because much human action depends
on meaning. The same actions can have multiple meanings and, for different people,
they can have completely different meanings. Without the ability to gain access to
meanings it is very hard to develop theory.
Of course, no researcher ever has complete access to internal states like emotions
or meaning. To a greater or lesser extent, meanings, motives, or other mental states
always have to be inferred. It is a matter of the degree to which meaning can be inferred
from particular kinds of data. The point here is that observational data supplies much
less direct access to internal states than other kinds of data.
The problems of fine-grained data raise a key question: under what circumstances
will detailed, fine-grained data about social life be useful? Several answers have been
developed to this question. Most answers describe the use of case studies as their
approach to data-intensive research. Burawoy (1991) developed the ‘Extended Case
Method’ as a way to use detailed case studies to identify weaknesses in existing theory,
and to extend and refine theory; for example, by describing subtypes of a phenomenon.
Ragin (1987) links case studies to the study of commonalities; comparable cases are
studied to construct a single composite portrait of the phenomenon. Case studies
are holistic and they emphasize causal complexity and conjunctural causation. This
use of cases is similar to that used by anthropological ethnography, where holistic
understanding is a central goal. What can we learn from these two examples? Both
emphasize the importance of case selection.
Case selection is key. Flyvbjerg (2006) summarizes four case-selection strategies to
maximize the researcher’s ability to understand the phenomenon (see also Ragin,
1992). Average or ordinary cases are not rich sources of information. Instead, extreme
or deviant cases may reveal more about the relevant actors and mechanisms. These
cases are chosen to emphasize a central aspect of a phenomenon. Second, a theoretical
sampling strategy can be used: choose cases as different as possible. If this ‘maximum
difference’ strategy is followed, then any commonalities discovered are much more
likely to be fundamental to the phenomenon rather than artifacts of a biased selection
of cases. Third, critical cases have special characteristics or properties that make
them unusually relevant to the problem. For example, a case can be chosen because
it seems most likely to disconfirm the hypothesis of interest. If the hypothesis is
confirmed for this case, then the researcher can argue that it is likely to be true in all
less critical cases. The researcher argues, ‘This case had the best chance of falsifying
my argument and it failed; hence, my argument must be true’. Finally, cases can be
selected because they form an exemplar. These cases form the basis for exemplary
research that shows how a particular paradigm (Kuhn, 1970) can be applied in a
concrete research setting. An example is Geertz's (1973) study of the ‘deep play’ of
the Balinese cockfight. The importance of case selection underlines that there is no
substitute for good research design. For fine-grained data to be useful, it must be
carefully chosen to illuminate issues of broader interest.
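When cases have measurable attributes, the ‘maximum difference’ strategy can even be made mechanical: greedily pick each next case to be as far as possible, on the chosen attributes, from those already selected. A toy sketch, with invented site profiles and a simple attribute distance:

```python
def max_difference_cases(cases, distance, k):
    """Greedy 'maximum difference' selection: start from the first case,
    then repeatedly add the case whose nearest already-chosen neighbour
    is farthest away."""
    chosen = [cases[0]]  # arbitrary seed; a substantive choice in practice
    while len(chosen) < k:
        best = max(
            (c for c in cases if c not in chosen),
            key=lambda c: min(distance(c, s) for s in chosen),
        )
        chosen.append(best)
    return chosen

# Hypothetical field sites described by two numeric attributes each.
sites = {"A": (0, 0), "B": (0, 1), "C": (10, 0), "D": (10, 10)}
dist = lambda x, y: abs(sites[x][0] - sites[y][0]) + abs(sites[x][1] - sites[y][1])

print(max_difference_cases(list(sites), dist, 3))  # ['A', 'D', 'C']
```

The mechanical step is the least important part; as the surrounding discussion stresses, deciding which attributes define ‘difference’ is itself a theoretical choice.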
[p. 544 ↓ ] Research design is usually driven by a theoretical understanding of what
to investigate. Theory is important because it gives direction and focus to research. It
identifies important issues and categories. It suggests the kinds of research settings
and data that could speak to those issues. It suggests relevant related concepts to
investigate. This can be overstated; many theories are also vague and incomplete. In
practice there are limits to the ability of theory to guide empirical research. But within
these limits, theories play a major role in research design.
There are some research settings where individual cases are of great interest: the
French Revolution is one example. But under most circumstances, the cases are
more interesting as examples of a larger phenomenon. Here theory plays a key role:
theory connects cases. It is broader than individual cases and so it tells us which cases
are examples of the same event or situation and which cases are different. Powerful
theories can link disparate settings that seem to have little in common, and show how
they are actually examples of the same phenomenon. For researchers the categories
and links between categories that make up theory supply conceptual tools to help them
think about their research, their research site(s), and their data. Theories are good to
think with. As noted above in the discussion of ecological psychology, one of the most
valuable pay-offs from theory is that theories developed in one setting may serve as
sources of creative ideas for researchers working in other settings.
The problem with ‘found’ data – data collected by other entities for their own purposes
and later used for research – is that the researcher has to take what is available. This
makes research design more difficult. Large amounts of data alone don't guarantee that
any useful research result will emerge. This problem exists in contemporary qualitative
research. It exists when researchers attempt to collect large quantities of observational
data, like using video, without other sources of data. The greatest strength of this work
– the possibility for rich, detailed observations of social action – is also its greatest
weakness. The rich observations can be too rich and they can be hard to transform into
theory. This will inevitably limit the value of video, audio, and similar sources of data.
A solution to this dilemma is to collect mixed-mode data. The weaknesses of pure
observational data can be overcome when the observations are supplemented by
theoretically informed interviews or questionnaires. These are more directly able to
address questions of meanings, emotions, or attitudes that give texture and significance
to social action. Many of the chapters discuss the use of mixed-mode data. Mixed-mode
data has a number of advantages. The first is that the weaknesses of each mode can
be offset by the strengths of another mode. Multiple sources of data convey a more
detailed, ‘richer’ understanding of the phenomenon, and each mode may (or may not)
validate the results obtained from other modes. Results validated by multiple modes
have enhanced confidence and credibility.
Online quantitative data have a related problem. I readily concede that Marc Smith's
NetScan dataset of 1.2 billion cases is far bigger than any dataset I've worked with.
Hoovering up all possible data is a strategy with valuable advantages. Collecting all this
data gives a great flexibility, including the ability to address questions that Smith never
thought of. Researchers are not interested in the whole 1.2 billion cases, but in various
subsets, including studies of change over time. It is impossible to know in advance
which subset will be useful. So Smith needed to collect the full dataset to be able to
gain access to the appropriate subset. With a dataset this large, a researcher can select
random samples without replacement. A model can be created using one sample and
then validated or refined with an entirely separate sample (see Little and Schucking's
chapter on Data Mining for further discussion). If case studies yield dense, rich stories,
the dry, technical content of Usenet headers must be the opposite. It is a tribute to
Smith and his collaborators that they have shown how much interesting information is
available in such sparse data.
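The split-sample strategy described above can be sketched in a few lines. This is a hypothetical illustration, not Smith's actual procedure: the case IDs, sample sizes, and function name are invented, and Python's standard library stands in for whatever tooling NetScan researchers actually use.

```python
import random

def disjoint_samples(cases, n_model, n_validate, seed=42):
    """Draw two non-overlapping random samples without replacement:
    one to build a model, a second to validate or refine it."""
    rng = random.Random(seed)
    drawn = rng.sample(cases, n_model + n_validate)  # no case drawn twice
    return drawn[:n_model], drawn[n_model:]

# Hypothetical stand-in for a large collection of Usenet message IDs
cases = list(range(100_000))
model_sample, validation_sample = disjoint_samples(cases, 5_000, 5_000)
assert not set(model_sample) & set(validation_sample)  # samples are disjoint
```

Because the two samples share no cases, a model fitted on the first cannot trivially 'memorize' the second, which is what makes the validation informative.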
[p. 545 ↓ ] The costs of working with these data are significant. First, even with 1.2
billion cases the NetScan dataset has problems with missing data. Specifically, there is
an undercount problem. NetScan is not the whole Usenet. No one knows the extent of
the whole Usenet. Like the US decennial census, NetScan has an undercount problem
whose size and scope are unclear. But the undercount is not random.
Second, almost any research using the NetScan data requires major data management
work to create a dataset in a form that can be analyzed. The data management is
required because NetScan is stored in a form designed for flexible, efficient storage
and retrieval. Appropriate data have to be extracted from the existing tables and
combined or aggregated into a form that software used for statistical analysis or network
analysis will accept. This is time-consuming even for skilled people, and few social
scientists have data management skills. As noted above, this is a common problem
when attempting to use electronic traces for research purposes.
An alternative strategy complements the NetScan approach. It involves the use of
sampling with a wireless technology: beepers. Mihaly Csikszentmihalyi's ‘experience
sampling method’ (ESM) stands in sharp contrast to both qualitative and quantitative
‘collect everything’ research strategies. The ESM is a data-gathering technique where
subjects are given a beeper and a stack of questionnaires.
They are beeped at random times during the day. When beeped, they fill out one copy
of the questionnaire. The questionnaire not only asks what they are doing and how
they feel about it, but also what it means to them (Hektner et al. (2007) give details
and summarize the entire research stream). This too has weaknesses, but they are
different weaknesses. They are mostly the well-known problems of questionnaire
research: response rates, sampling bias, reliability, validity, and others.
The use of the ESM has a variety of advantages. The questionnaires give more direct
access to internal states: attitudes, emotions, and meanings. Questionnaires can be
designed to ask about theoretically grounded empirical categories. Finally, the data are
based on a random sample of times of the day. The point is that, instead of spending the
time and money to collect everything, then spending more time deciding what you really
want, and finally throwing away the data you decided you don't need, you can simply
collect what you wanted to begin with. Samples are really valuable; they are much faster
and easier to collect and to analyze. You lose very little
by employing a sample. I think it is a reasonable methodological question to ask: under
what circumstances is there value in collecting more than a random sample? Why not
collect only the data you need in the first place?
I was born in Missouri, which is called the ‘show-me’ state. This nickname supposedly
comes from Missourians' habit of asking people to ‘show me the evidence’. There
is something to be said for this. In the context of online methodology, show me
the theoretical pay-off. It is hard to find a concept as striking or as influential as
Csikszentmihalyi's (1991) idea of flow, developed out of his studies using the ESM.
ADDITIONAL COMPUTATIONAL
RESOURCES
One methodological theme for the past few decades has been the continuing stream of
additional computer resources. We have experienced the remarkable impact of small
computers on methodology. We are now seeing the effects of electronic networks.
Some of these effects are described in Fernandes’ chapter on web- and Grid-based
middleware solutions to the problem of access to distributed data resources and in
the Crouchley and Allan chapter on Grid-based statistical analysis. Network analysis
methods are described in Hogan's chapter.
[p. 546 ↓ ] It is clear that additional computational resources have made possible all
sorts of new developments. Many types of statistical analysis, modeling, and graphics
require intensive computational resources. There has been a clear theoretical pay-off
from this work. Two examples will make this point clearer.
First, rapid advances in network theory rely heavily on the availability of abundant
computational power. Social network theory is the single biggest theoretical and
empirical success story of the social sciences over the past decade. Much can be said
about it, but I wish to make one particular point here. As the history of research
into social networks shows, much of the groundwork was laid in the 1950s and 1960s,
but the field then ran into a crippling limit. The computational power needed to analyze data on real
networks simply didn't exist. This is a case where theoretical development was hindered
by lack of appropriate methodological tools (see Watts, 2003). Research continued,
but at a slower pace until major advances in computing began to have an impact. The
research pace has quickened in line with increases in available computing power.
That restriction has lifted, but it has not completely disappeared. Grid computing (see
the Crouchley and Allan chapter) promises to help alleviate this problem. Much of the
recent blossoming of work on social networks owes a great debt to rapid increases in
the power of computing.
Second, as early as 1966 the Coleman Report on Equality of Educational Opportunity
made pioneering social science use of multiple regression (Coleman et al., 1966).
Multiple regression allowed Coleman to compare the relative influence of the major
factors influencing student school achievement. Coleman found that the dominant
influence on school achievement was parents and the home environment that they
created. Other factors, like teacher training, dollars spent per pupil, and the quality of
the school facilities, had an impact, but even combined they were less important than
the influence of parents. This was an unexpected finding. It was a striking, controversial
conclusion at the time. After all, administrators want to believe that if they can only
raise teacher salaries, hire more teachers with advanced credentials, improve their
science labs and multimedia centers, or improve other inputs, then they will produce
a corresponding improvement in student learning outcomes. The multiple regressions
allowed Coleman and the researchers who followed to argue that student learning may
in fact improve, but not very much. Parents are overwhelmingly more important. This
conclusion is of continuing relevance today, over 40 years later. The Crouchley and
Allan chapter discusses the impact of contemporary policy initiatives on educational
achievement.
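The logic of comparing relative influence can be illustrated with a two-predictor regression in standardized form, where the coefficients are directly comparable. The data below are invented and have nothing to do with Coleman's actual variables; the closed-form expressions for the standardized coefficients are standard two-predictor regression algebra.

```python
from statistics import mean

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def standardized_betas(y, x1, x2):
    """Standardized coefficients for y ~ x1 + x2, computed from the
    three pairwise correlations."""
    ry1, ry2, r12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    b1 = (ry1 - ry2 * r12) / (1 - r12 ** 2)
    b2 = (ry2 - ry1 * r12) / (1 - r12 ** 2)
    return b1, b2

# Invented data: achievement tracks the home-environment score far more
# closely than it tracks per-pupil spending.
home = [1, 2, 3, 4, 5, 6, 7, 8]
spend = [2, 1, 4, 3, 6, 5, 8, 7]
achievement = [1.2, 1.9, 3.1, 4.0, 5.2, 5.9, 7.1, 8.0]
b_home, b_spend = standardized_betas(achievement, home, spend)
# b_home dominates b_spend, mirroring the structure of Coleman's argument
```

Standardizing removes the units of measurement, which is what licenses statements of the form 'factor A matters more than factor B'.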
The continuing increases in computer speed and storage capacity and the further
development of networks promise continuing theoretical pay-offs. Certainly network
theory will continue to benefit. Exactly which other theories will benefit is harder to
predict. In a general sense it is safe to say that there is a great deal of potential here,
but it is hard to say specifically where the potential will be realized.
THE ‘QUALITATIVE ANALYSIS
BOTTLENECK’
Although our ability to record social events has increased dramatically, our ability to
analyze the recorded data has not expanded nearly so fast. On the one hand, certain types
of analysis are much easier today. Statistical analysis, for example, has benefited.
Certainly one effect of the additional computational power is that many more models
can be examined and model diagnostics are easier. Statistics have always been a
way to summarize data. In general, the ideas of central tendency, spread, and other
statistical concepts can summarize a large dataset about as effectively as a small one.
So as datasets become larger the nature of the statistical summaries does not change.
Networks, the Grid, and data archives give researchers convenient access to statistics
and data that they could never use before. See Keith Cole et al.'s chapter on archives
and secondary analysis.
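The point that statistical summaries do not change character as datasets grow can be seen in a toy sketch: a small random sample describes a simulated population about as well as the full dataset does. The distribution and sizes are invented for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)
# Simulated 'large dataset': 100,000 draws from a bell curve
population = [random.gauss(50, 10) for _ in range(100_000)]
small_sample = random.sample(population, 200)

# Central tendency and spread are nearly identical at either scale
large_summary = (mean(population), stdev(population))
small_summary = (mean(small_sample), stdev(small_sample))
```

The two summaries differ only by sampling error, so the description of the data stays essentially the same while the dataset grows five hundredfold.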
[p. 547 ↓ ] In general, computer power makes possible much more thorough
explorations of data using statistical and also graphical techniques. Graphical analysis
and visualization have blossomed remarkably with the increase in computing power.
Computers draw all kinds of diagrams and plots so much faster than they can be drawn
by hand. The advances in online research methods have been almost wholly positive
for quantitative researchers.
Collection of qualitative data has always been extremely slow and difficult. As a result,
past qualitative researchers have typically worked with small amounts of data. They
worked intensively on their data. Many recent developments in qualitative research
have focused on improving data collection. Fielding and Lee's chapter on Qualitative
e-Social Science/cyber-research highlights these advances. Because most online data
– web pages, online role-playing games, e-mails, blogs, video, still images, graphics,
etc. – are readily available in electronic form, qualitative researchers have gone from
being data-poor to being overwhelmed by rich, new sources of data. Fielding and
Lee's chapter and Carmichael's chapter on Secondary Qualitative Analysis describe
developments in secondary analysis that further increase the availability of data.
There have also been advances in qualitative, non-statistical analysis. Brent's chapter
on Artificial Intelligence and Hindmarsh's chapter on Distributed Video Analysis point
to some of those advances. The process has been improved by the use of qualitative
analysis software like NVivo, Atlas/ti, or Qualrus.
The software adds reliability and speed. In spite of these advances in artificial
intelligence and in software, qualitative analysis has not changed much. As in the
offline world, online data must be coded into theoretically meaningful categories to
be analyzed. Coding has always been a time-consuming process requiring highly
skilled researchers or carefully trained assistants. Coding remains a labor-intensive,
agonizingly slow process.
Although Fielding and Lee's chapter points to interesting developments in qualitative
analysis, a central problem remains unsolved. In ordinary language people frequently
use synonyms to refer to the same object. To use a simple example, people refer to
fraternities not only by their formal name, but also by such names as the ‘Baker Street
house’, the ‘boys on the hill’, the ‘guys across from the Student Union’, and more. For
this reason, meaning can only be derived from the context. Determining context and
meaning is not something that computers do well. This has been a crippling limit on the
use of automated categorizing of text or other qualitative data. Inexpensive, powerful,
small computers have had a much greater impact on statistical analysis than they have
on analysis of qualitative data.
Considerable work has been done on the automated processing of text. I am aware
of projects working with web pages, e-mail, blogs, and online versions of newspapers.
The most sophisticated work is proprietary, owned by corporations or governments.
Major statistical software companies have produced text mining software; for example,
SAS Institute's TextMiner and SPSS's LexiQuest products. The academic work is
based on qualitative analysis software, such as NVivo, MAXQDA, or Qualrus. There is
also software specifically developed to support mixed-mode research, like QDA Miner.
Looking at these products, it is clear that any sort of truly automated processing of text
still lies in the future, except in some highly restricted domains where controlled vocabularies
can be used.
However, there are situations where synonyms are not common. In those settings,
controlled vocabularies are feasible. Many corporations, for example, keep careful
track of any time one of their products or a competing product is mentioned in the
press. Since products are almost always mentioned by name – e.g. ‘Rice Krispies’ or
‘Prius’ – a controlled vocabulary is fairly obvious. It is then relatively straightforward
to write software that will automatically search the websites of major newspapers
and magazines and automatically download all articles that contain any word in the
vocabulary. Of course a person still has to read the article, but the software has
eliminated a major step that was a rote part of the process. Other situations where
controlled vocabularies are feasible include research involving names of people,
geographic locations, or events. These situations, however, remain a minor element in
the abundance of electronic qualitative data.
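The vocabulary-matching step described above is straightforward to sketch. The vocabulary, article snippets, and function name here are invented for illustration; a production system would add crawling, deduplication, and handling of spelling variants.

```python
import re

# Hypothetical controlled vocabulary of product names
VOCABULARY = ("Rice Krispies", "Prius")

def matched_terms(article_text, vocabulary=VOCABULARY):
    """Return the vocabulary terms that occur in an article as whole words."""
    return {term for term in vocabulary
            if re.search(r'\b' + re.escape(term) + r'\b', article_text)}

articles = [
    "Toyota reported strong Prius sales this quarter.",
    "Local fraternity hosts a charity bake sale.",
]
# Only articles with at least one match would be passed to a human coder
keep = [a for a in articles if matched_terms(a)]
```

The rote filtering step is automated; a person still reads the articles that survive it, exactly as the text describes.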
[p. 548 ↓ ] Qualitative analysts have mostly reacted to their new-found wealth of data
by ignoring it. They have used their new computerized analysis possibilities to do more
detailed analysis of the same (small) amount of data. Qualitative analysis has not really
come to terms with the fact that enormous amounts of qualitative data are now available
in electronic form. Analysis techniques have not been developed that would allow
researchers to take advantage of this fact.
The impact of qualitative online research has been weakened by this analysis
bottleneck. As more social scientists see the potential of the Internet for their own
research, awareness will grow of the disparity between the vast wealth of data and
the difficulty of analyzing even small parts of it. In analysis there is often a tradeoff
between simplicity and depth. There is a continuum. At one end of the continuum,
researchers can conduct an intense and time-consuming data reduction effort to create
standardized variables that can be analyzed with statistical techniques. At the other
extreme, researchers can do an analysis that takes into account the full richness
of the data and they have to accept that this will allow thorough analysis of only a
small amount of data. Since only a few case studies can be analyzed, this effort is
typically not generalizable to a population and the researcher must make the case for
its value on other grounds, as described in the discussion of case studies. Where on
this continuum a researcher will be located can't easily be determined in advance. It will
depend on the goals and needs of each individual research project.
The Pandora website (http://www.pandora.com) suggests one interesting solution
to the analysis bottleneck. Pandora is software that uses a mathematical formula to
characterize and identify music. Users pick artists or songs that they like. Pandora
uses the mathematical characteristics of the music to stream similar songs by similar
artists to the listener. This supplies listeners with music that they will like without them
having to make explicit choices. Discussion of music often focuses on musical tastes
categorized by genre, which is not a very fine-grained category system. Pandora
seems to be a reliable approach that promises more sensitive, fine-grained categories.
This seems to fit well with the extraordinary diversity of music available online. To my
knowledge no one is currently conducting research using an approach like Pandora,
but it seems like a fruitful area to explore. A creative research project might yield very
interesting results.
Of course, songs have a number of characteristics that they don't share with most
qualitative data. They are short, discrete, and they can be characterized using musical
attributes that have often been well defined for centuries. Nonetheless, the example
of Pandora points to a possible way to overcome the qualitative analysis bottleneck.
There is other work in this area, part of a research stream on score-matching algorithms
(e.g. Dannenberg and Raphael, 2006). If pattern recognition software similar to Pandora
could be written to characterize text, videos, or other qualitative data – even in limited
domains – then it may be possible to automate recognition of faces, objects, or actions
in order to automate descriptions in fieldnotes. This software could scan video images
and automatically record its observations. This would be a major change and it would
make film much more attractive as a research tool.
[p. 549 ↓ ] The central value of such software is that it would dramatically speed up
coding of qualitative data.
THE ONLINE WORLD
The online world is new enough that we are currently exploring what it can do.
Reading these chapters, one gets a strong sense of people struggling to understand its
capabilities, struggling to use the new tools that it makes available, and trying to take
advantage of its strengths; in short, trying to use it to help them solve their problems. It
is exciting to watch and I look forward to the future.