The SAGE Handbook of Online Research Methods
Online Research Methods and Social Theory
Contributors: Nigel Fielding & Raymond M. Lee & Grant Blank
Print Pub. Date: 2008
Online Pub. Date:
Print ISBN: 9781412922937
Online ISBN: 9780857020055
DOI: 10.4135/9780857020055
Chapter DOI: 10.4135/9780857020055.n29
Print pages: 537-550

Online Research Methods and Social Theory
Grant Blank

ABSTRACT

The enormous growth in online activities has created all sorts of new opportunities for research. These opportunities are theoretical as well as methodological. The theoretical opportunities have been present in prior chapters but never emphasized; this chapter brings theory into focus without losing sight of methods. Specifically, the chapter discusses the explanatory power of theory based on online methodologies to address important social issues. To this end it describes three themes common to the preceding chapters: the volume of data, additional computer resources, and the ‘qualitative analysis bottleneck’. Each theme presents problems as well as opportunities, and the goal of this chapter is to explore how methods and theory work together to define and mitigate the problems as well as to exploit the opportunities.

INTRODUCTION

The link between methods and theory has a history almost as long as modern science itself. It begins at the dawn of empirical science, over 350 years ago. One of the earliest scientific communities was formed around Robert Boyle, leading a group of experimentalists who were exploring the relationships between pressure, temperature, and volume. Their primary technology was a vacuum pump that they called an ‘air pump’. Their findings were codified into what we now know as ‘Boyle’s Law’. (Much of the following discussion is drawn from Shapin and Schaffer (1985) and Zaret (1989).)

These early scientists are interesting not just because they developed some of the earliest experimental research methods using the high technology of the day, and not just for their exploration of the relations between pressure, temperature, and volume, but also because their work is intimately linked to social theory. The mid-1600s in England were a period of political turmoil: there were the English Civil War, the Regicide (1649), the creation of the Republic (1649-53), Oliver Cromwell’s Protectorate (1653-58), and the Restoration of the Stuart monarchy in 1660. As they attempted to understand this period of bitter political, social, and religious conflict, many English people came to the conclusion that the fundamental source of social conflict was differing views of religious truth. The implication of this assumption was that when everyone believed in the same religion there would be an end to these extraordinary conflicts. The Restoration of the Monarchy in 1660 had restored a central political authority, but it did not dampen religious strife. As a result, the 1660s were marked by a new crisis of civil authority. The issue was the role of religious belief, particularly the Protestant and Puritan emphasis on an individual's personal religious beliefs.
This created a problem that became a major source of tension and conflict. The problem was that it is very difficult to settle disputes when everyone relies on their own personal vision of truth. Under such circumstances, how can anyone determine whose personal vision is fairer? More just? Or, in any sense, better? In fact, when people believe that their highly individualized versions of truth are the only correct version, then political compromise and accommodation become very difficult. Thoughtful English people saw society and politics splitting into a large number of semi-hostile groups, each suspiciously defending its personal vision of the truth. This was not attractive, for it looked as though the jealous incompatibility of these visions might make a cohesive society with normal politics impossible.

In this social environment the experimental scientists offered an alternative vision of community. This community claimed to have created an understanding of conflict and social unity that stood in stark contrast to the disorder plaguing English society. Their signal achievement was that they were able to settle disputes and achieve consensus without resorting to violence and without powerful individuals imposing their beliefs on others. In their view, facts were a social creation. They were made when the community freely assented. Stable agreement was won because experimentalists organized themselves into a defined and bounded society that excluded those who did not accept the fundamentals of good order. Consensus agreement on facts was an accomplishment of that community. It was not imposed by an external authority. Facts themselves were uncovered by experiments, attested to by competent observers. When there was a disagreement it could be settled by appeal to facts made experimentally manifest and confirmed by competent witnesses from within the community.

The early experimentalists presented themselves as a godly community, as ‘priests of nature’. Robert Boyle suggested that experimental trials be carried out only on Sunday. This did not imply consensus was always easily reached. Indeed, Hooke’s vehement disagreements with Newton and others anticipated a long line of hostile quarrels among scientists. This is another respect in which the early experimentalists formed something that looks like science. Despite their internal disagreements, despite the inevitable tensions of ego and competition for status, in the context of strife-ridden, post-Civil War society the model of a community committed to joint discovery of facts was an attractive alternative. It contributed to the political support required to set up the early institutions that supported and fostered the development of science: the Royal Society and its journal.

The point is that there is a fundamental link between social theory and research methods embedded in the culture of science from the very beginning. The link continues in the online research methods described in this volume. In addition to the symbolic and thematic link, a dramatic unity based on location also ties this book to the early experimentalists: in March 2007 the authors of this volume participated in a conference organized by Roger Burrows under the auspices of the ESRC e-Society Programme held in London at the Royal Society.
While most of this book is concerned with the methodological issues raised by online research, this chapter will focus on the interface between methods and theory. It is clear that the new technologies of the online world are creating new opportunities for substantive theory, just as they are creating new opportunities for methodology. The new theoretical opportunities come in part from the new social forms and new communities being created by online technologies. They also come from the fact that online research can offer a novel perspective that casts new light on older, pre-existing social forms. This chapter does not attempt to develop new theory, and it is not a ‘theory chapter’; instead, the question I try to answer is: how do online research methods relate to theory?

A volume on methods leads naturally to discussion of a particular kind of theory. This is not the ‘grand theory’ of Marx, Weber, and Durkheim; instead, it is the middle-range theory or substantive theory that is commonly used in conjunction with standard methodological tools like statistical hypothesis testing. This sort of theory has several relationships to methods. Theoretical concepts are operationalized in scales or indices, in survey questions, by describing attributes of people and organizations, or by coding qualitative data into appropriate categories. Relations between concepts are described by hypotheses, which often form the basis for inferential tests. Related hypotheses can be collected into theories that may be modeled with statistical or mathematical methods. The theory and all its components remain fairly concretely tied to empirical data and to measurement. The pay-off from the use of this kind of theory is often a clearer understanding of contemporary social problems or issues. Thus this chapter discusses the explanatory power of theory based on online methodologies to address important social issues.

In the course of this task I draw together many common methodological themes from prior chapters. This is a personal reading of these chapters, and no one should infer that my opinions are shared by the authors themselves or by the other editors. I found that the papers in this volume each attempt to deal with new opportunities offered by gathering data online, while suggesting ways to cope with special problems posed by online research. I generally draw on online examples with some comparison to offline work. I found three common themes. Each theme reflects attempts to deal with new problems or opportunities in online work. They are: the volume of data, additional computational resources, and the ‘qualitative analysis bottleneck’.

VOLUME OF DATA

For researchers used to gathering data in the offline world, one of the striking characteristics of online research is the sheer volume of data. The quantity of data comes in two forms. First, people leave electronic traces everywhere, as Ted Welser, Marc Smith, Danyel Fisher, and Eric Gleave point out (this volume). In cashless financial transactions, communication via e-mail, text message, or instant message, voice telephone records, medical records, or interactions with official government agencies, many aspects of people’s lives are captured and electronically recorded.
For anyone accustomed to the painful cost of collecting data offline, the extent and easy availability of electronic data is breathtaking. For reasons of declining cost and ease of accessibility we are seeing rapid increases in electronic record gathering and storage. These trends are likely to continue. These factors alone are likely to encourage much more use of online data in future research. Most of this collection is unobtrusive, as Dietmar Janetzko’s chapter on non-reactive data collection says, in the sense that people are recorded as they go about their ordinary lives. They are not explicit research subjects, nor do they think of themselves as being part of a research project. Yet any data that are archived can be incorporated into a research project.

There are many research possibilities here, but there are subtle, often serious problems. Most of these electronic records are collected for administrative purposes and they share the problems of paper records. Their content reflects the narrow administrative purposes for which they were collected, the needs and convenience of bureaucrats, not the demands of research methods or theory.

Furthermore, many of these data are proprietary. Corporations collect them for their economic purposes. Private companies are usually unwilling to create public-use datasets. They are usually unwilling to supply the data to researchers because of privacy and competitive fears. Once they give data to a researcher it is out of their control. The data could be mined for important competitive information if they fell into competitors’ hands. Therefore, giving proprietary data to a researcher requires a major leap of faith and trust, with no likely business benefit. It isn’t likely to happen easily or often. An example of proprietary data used for research is Marc Sanford’s (2008) retail scanner data. After fourteen months of persuasion, requiring what Sanford describes as ‘countless hours on the phone’ and signing several legal agreements designed to limit the use of the data and ensure security and confidentiality, Sanford was given over 750 million records. An example of public e-mail is the Enron data (see Klimt and Yang, 2004; Culotta et al., 2004), discussed in the chapters by Janetzko and by Eynon et al. It consists of about 200,000 e-mails exchanged between 151 top executives. They were released during the court cases that followed the Enron accounting fraud. These are exceptions that prove the rule.

Even if the data are made available for research, there is yet another problem. The core problem with these data is the lack of a link to substantive theory. They typically do not contain key variables that social scientists incorporate into their theories. Race, for example, is often not included in financial records. Religion and ethnicity are rarely available. Websites, blogs, listservs, Usenet, and online games contain only the data that users think important, which is inconsistent from person to person. With many interesting variables unavailable, people are, at best, thinly described. Because of these problems many forms of electronic record are very difficult for researchers to use. Sometimes this problem can be addressed by combining electronically collected data with other data, adding race, ethnicity, crime rates, or whatever substantive variables are missing.
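The sketch below illustrates one way such a combination might be done in practice, by merging administratively collected records with census data on a shared geographic code (the strategy described in the next paragraph). It is a minimal, hypothetical example: the file names, the column names (tract_id, pct_minority, median_income), and the aggregation to census tracts are assumptions introduced purely for illustration, not features of any dataset discussed in this chapter.

```python
# Hypothetical sketch: enriching administratively collected transaction
# records with census variables, so that theoretically relevant measures
# (here, tract-level demographics) become available for analysis.
# All file and column names are invented for illustration.
import pandas as pd

# Transaction records as collected for administrative purposes:
# one row per transaction, with a store identifier and a census tract code.
transactions = pd.read_csv("transactions.csv")   # columns: store_id, tract_id, amount

# Census tract data carrying the substantive variables the records lack.
census = pd.read_csv("census_tracts.csv")        # columns: tract_id, pct_minority, median_income

# Aggregate the raw records to a theoretically meaningful level
# (total spending and number of transactions per tract).
by_tract = (transactions
            .groupby("tract_id", as_index=False)
            .agg(total_spend=("amount", "sum"),
                 n_transactions=("amount", "count")))

# Attach the census variables by matching on the shared geographic code.
merged = by_tract.merge(census, on="tract_id", how="left")

print(merged.head())
```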
Geographic data can often be merged with other geographically coded data, such as census tract data or police crime statistics. This is how Sanford (2008) added theory variables to his retail scanner data. In almost all cases such datasets need time-consuming, highly skilled work to put them into a condition where interesting substantive problems can be addressed. They often need to be aggregated or disaggregated to theoretically meaningful levels or to be matched with other data. This requires serious data management skills, which are usually not taught as part of graduate training and are, in fact, rare among social scientists. There will be social science uses for some of this new data, but they depend on creative, imaginative thought to make them workable. A notable example is Marc Smith’s NetScan work collecting Usenet mail headers (see the Welser et al. chapter). This example is in many respects typical of other attempts to create useful social data and theory from electronic traces. We will return below to some of the issues raised by this example.

The volume of data has a second form: online researchers who can collect their own data have it easy. Internet and online surveys are cheap and fast compared to the offline alternatives. Responses can be automatically checked against other answers and stored directly in a dataset, ready for analysis. Indeed, simple analyses, such as descriptive statistics and frequencies, can be automatically produced. This can largely eliminate the time-consuming, difficult, and costly steps of data cleaning and data input. Four chapters discuss this: Vehovar and Manfreda’s Overview of Online Surveys, Fricker’s Sampling Methods for Web and Email Surveys, Best and Krueger’s Internet Survey Design, and Kaczmirek’s Internet Survey Software Tools. Sample sizes can be, at least potentially, extremely large. Website click-through data can have sample sizes of over 100 million cases (remember, that is the sample!). As the Hindmarsh chapter on Distributed Video Analysis points out, cheap cameras and inexpensive disk storage increase the feasibility of video recordings. The resulting data can contain records of individuals almost unprecedented in their detail.

The low cost of online data collection and the possibility of using video have been widely noticed. Less widely remarked is the fact that the low cost and easy access to subjects also apply to ethnographic research. Six chapters describe the implications of online research for various forms of qualitative data collection: Hine’s overview chapter on Virtual Ethnography, O’Connor et al.’s chapter on Internet-based Interviewing, Wakeford and Cohen’s chapter on Using Blogs for Research, Gaiser’s chapter on Online Focus Groups, the Schroeder and Bailenson chapter on Multiuser Virtual Environments, and the Hindmarsh chapter. Enormous amounts of qualitative data can be collected very quickly. For blogs, listservs, Usenet, or e-mail the electronic form is the only form in which the data exist. These data do not need to be converted to electronic form by transcription, and this eliminates major costs, time delays, and sources of error. The new wealth of data opens a real opportunity for all kinds of innovative research. There is no ‘free lunch’ in online data collection.
The price of simple, low-cost access to subjects is a set of complicated, difficult ethical questions. Ethics are discussed in several chapters, but they are directly addressed in the Eynon et al. chapter on the Ethics of Internet Research. There are two primary protections for social science research subjects: anonymity and informed consent, and under online conditions both are more difficult to achieve. The same easy access to online data and the ease of matching individual respondents to other datasets that make online data collection so much simpler also make it much easier for someone to break anonymity and discover the identity of individual respondents. Files in all their versions are often preserved on backups, which are often automatically created and not under the control of the researcher. Someone could obtain an early version of a file with identifiers still in place, possibly without the researcher even knowing. An obvious situation where this could occur is when a government agency is interested in a research project that collected data on illegal actions, such as criminal behavior, drug use, illegal immigration, or terrorism.

Anonymity can also be compromised on the Internet itself. Anything sent across the Internet will be stored on multiple servers between its origin and its destination, and can be intercepted at any point. E-mails and all attachments are preserved on the servers and backup systems of both the sender and the addressee, even if users delete them. Backup copies of e-mails are often preserved for years due to legal requirements. These facts mean that researchers cannot guarantee anonymity for anything sent over the Internet, and especially not for e-mail. My impression is that human subjects protection committees have not always understood that ‘Internet privacy’ is often an oxymoron.

Informed consent is much more complicated in settings like listservs or role-playing games, where subjects may never anticipate that their conversations are being recorded for research purposes. Additional complexity is introduced by the fact that many listservs and Usenet groups retain message archives. Since most listservs include frequent messages where people write statements such as ‘that topic was previously discussed, search the archives’, the existence and location of the archives is public knowledge. Often the archives can be searched and retrieved by any interested person. So how can they be anything but public? If they are public, is consent necessary? If consent is necessary, how can a subject give informed consent concerning messages that may have been written years ago? How can the researcher even find the authors to obtain their consent?

Two special online problems are worth highlighting here, one faced by qualitative researchers and one faced by survey researchers. Authenticity is a special problem in online qualitative research. Like the famous New Yorker cartoon that proclaimed ‘On the Internet, nobody knows you’re a dog’, qualitative researchers can never be sure who is responding. Researchers often have no way to check respondents’ claims of racial, ethnic, or gender identity.
This is a particular problem for research on role-playing games, where part of the attraction of the game is the possibility of experimenting with a different identity. Identifying respondents using passwords or IP addresses offers no guarantee. Passwords can be given to friends. Anecdotally, I just counted and I have current logon information (including passwords) for at least five friends on various systems. The IP address of a computer only says which computer transmitted a message; it does not identify the person using the computer. Since many computers can be used by several people, identifying respondents is especially difficult online. The difficulty of authenticating users is one major reason why online voting has not been widely adopted (cf. Jefferson et al., 2004).

The paradox is that anonymity is more difficult to achieve online, but many forms of faking and fraud are easier. People can be more easily identified, but they can also hide their identity more easily. This problem is larger than the problem of online identity. Most e-mail is spam, and most researchers will have to filter it out before they analyze the rest. Fraudulent websites, phishing schemes, and other corrupt or illegal activities are endemic. The ease of online data collection has its downside; it is also easier for online subjects to manipulate the data collection process. For example, they can send copies of e-mail questionnaires to friends (who can fill them out and send them in), they can fill out multiple copies of online questionnaires, or they can lie about significant attributes like race or gender. While there are solutions to all of these problems, such manipulations are nonetheless harder to detect online. Online researchers have to cope with the omnipresent possibility of fraud in a way that has few parallels in offline research.

The special problem of online survey research is that there is no way to construct a sampling frame. There is no online equivalent to random digit dialing. Therefore it is not generally possible to select online respondents according to some randomized process. Even if Internet usage reached saturation levels, for most populations the sampling frame problem would remain intractable. An exception, as Fricker points out in his chapter, is a survey of an organization where the organization has a complete list of its members and everyone has an e-mail address; but that is unusual. As discussed in the chapters on Internet surveys, the solution to this problem is to use mixed-mode research. Online and offline data collection can be combined in various ways to overcome the lack of a random sample in online research, while still retaining many of its advantages of low cost and easy administration.

What will we do with all these data? The signal characteristic that distinguishes online from offline data collection is the enormous amount of data available online. As we think about our new-found wealth of data I want to raise two skeptical questions, one for qualitative data and the second for quantitative data.
The sources of massive amounts of qualitative data that are being and could be collected include automated security cameras, credit card transactions, and affinity card transactions, as well as purposefully collected data like video and audio tapes. These data promise a remarkably fine-grained, detailed picture of people in all kinds of social situations. Given this fact, here is the place to raise the key question: So what? What is the pay-off? Does the availability of this enormous volume of data promise soon-to-come advances in our understanding of society, politics, and culture, along with much better theory? My best-guess answer to this question is ‘probably not’. To see why, it is necessary to realize that detailed qualitative data is not new. Ecological psychologists (e.g. Barker and Wright, 1951; 1954) collected it over 50 years ago. Barker and his students created and published minute-by-minute records of the activities of children from morning to night. We can reasonably ask, if there is going to be a major pay-off from data-intensive studies of social life, why haven’t we heard more about the ecological psychologists? Why aren’t they more important? Why didn’t people pay attention?

One answer is that the theory Barker developed was exceptionally closely tied to concrete social settings and situations. Barker studied under Kurt Lewin and he was influenced by Lewin’s theories of the importance of the environment in predicting behavior. Barker himself argued that behavior was radically situated, meaning that accurate predictions about behavior require detailed knowledge of the situation or environment in which people find themselves. His work often consisted of recording how expected behavior is situational: people act differently in different behavioral settings, e.g. in their roles as students or teachers in school or as customers in a store. In his theory of behavior settings Barker is fairly explicit that he believes broad ‘grand theories’ cannot usefully predict behavior. Since Barker focuses so tightly on behavior in a very local setting, his research doesn’t generalize very well to other settings. This is an implication built into the idea of behavioral settings and it is intentional. Ecological psychology has sustained its intellectual ground as a school, but its results turn out to be fairly limited in their applications. This is a disadvantage. Other researchers, looking for theories that help them in their research, will not find rich sources of insightful ideas that they can use. Since few other researchers found the ecological psychologists’ work useful, it was not widely adopted.

A second reason for the lack of attention to the ecological psychologists’ work is that detailed observational data has built-in limitations. Observational data gives no access to the internal states of people. Observers have no information on attitudes, emotions, motives, or meaning. This is a serious limitation, because much human action depends on meaning. The same actions can have multiple meanings and, for different people, they can have completely different meanings. Without the ability to gain access to meanings it is very hard to develop theory. Of course, no researcher ever has complete access to internal states like emotions or meaning.
To a greater or lesser extent, meanings, motives, or other mental states always have to be inferred. It is a matter of the degree to which meaning can be inferred from particular kinds of data. The point here is that observational data supplies much less direct access to internal states than other kinds of data.

The problems of fine-grained data raise a key question: under what circumstances will detailed, fine-grained data about social life be useful? Several answers have been developed to this question. Most answers describe the use of case studies as their approach to data-intensive research. Burawoy (1991) developed the ‘Extended Case Method’ as a way to use detailed case studies to identify weaknesses in existing theory, and to extend and refine theory; for example, by describing subtypes of a phenomenon. Ragin (1987) links case studies to the study of commonalities; comparable cases are studied to construct a single composite portrait of the phenomenon. Case studies are holistic and they emphasize causal complexity and conjunctural causation. This use of cases is similar to that used by anthropological ethnography, where holistic understanding is a central goal. What can we learn from these two examples? Both emphasize the importance of case selection.

Case selection is key. Flyvbjerg (2006) summarizes four case-selection strategies to maximize the researcher’s ability to understand the phenomenon (see also Ragin, 1992). First, average or ordinary cases are not rich sources of information; instead, extreme or deviant cases may reveal more about the relevant actors and mechanisms. These cases are chosen to emphasize a central aspect of a phenomenon. Second, a theoretical sampling strategy is often used: choose cases that are as different as possible. If this ‘maximum difference’ strategy is followed, then any commonalities discovered are much more likely to be fundamental to the phenomenon rather than artifacts of a biased selection of cases. Third, critical cases have special characteristics or properties that make them unusually relevant to the problem. For example, a case can be chosen because it seems most likely to disconfirm the hypothesis of interest. If the hypothesis is confirmed for this case, then the researcher can argue that it is likely to be true in all less critical cases. The researcher argues, ‘This case had the best chance of falsifying my argument and it failed; hence, my argument must be true’. Finally, cases can be selected because they form an exemplar. These cases form the basis for exemplary research that shows how a particular paradigm (Kuhn, 1970) can be applied in a concrete research setting. An example is Geertz’s (1973) study of the ‘deep play’ of the Balinese cockfight. The importance of case selection underlines that there is no substitute for good research design. For fine-grained data to be useful, they must be carefully chosen to illuminate issues of broader interest.

Research design is usually driven by a theoretical understanding of what to investigate. Theory is important because it gives direction and focus to research. It identifies important issues and categories. It suggests the kinds of research settings and data that could speak to those issues. It suggests relevant related concepts to investigate. This can be overstated; many theories are also vague and incomplete.
In practice there are limits to the ability of theory to guide empirical research. But within these limits, theories play a major role in research design. There are some research settings where individual cases are of great interest: the French Revolution is one example. But under most circumstances, the cases are more interesting as examples of a larger phenomenon. Here theory plays a key role: theory connects cases. It is broader than individual cases and so it tells us which cases are examples of the same event or situation and which cases are different. Powerful theories can link disparate settings that seem to have little in common, and show how they are actually examples of the same phenomenon. For researchers, the categories and links between categories that make up theory supply conceptual tools to help them think about their research, their research site(s), and their data. Theories are good to think with. As noted above in the discussion of ecological psychology, one of the most valuable pay-offs from theory is that theories developed in one setting may serve as sources of creative ideas for researchers working in other settings.

The problem with ‘found’ data – data collected by other entities for their own purposes and later used for research – is that the researcher has to take what is available. This makes research design more difficult. Large amounts of data alone don’t guarantee that any useful research result will emerge. This problem exists in contemporary qualitative research. It exists when researchers attempt to collect large quantities of observational data, such as video, without other sources of data. The greatest strength of this work – the possibility for rich, detailed observations of social action – is also its greatest weakness. The rich observations can be too rich and they can be hard to transform into theory. This will inevitably limit the value of video, audio, and similar sources of data.

A solution to this dilemma is to collect mixed-mode data. The weaknesses of pure observational data can be overcome when the observations are supplemented by theoretically informed interviews or questionnaires. These are more directly able to address the questions of meanings, emotions, or attitudes that give texture and significance to social action. Many of the chapters discuss the use of mixed-mode data. Mixed-mode data has a number of advantages. The first is that the weaknesses of each mode can be offset by the strengths of another mode. Multiple sources of data convey a more detailed, ‘richer’ understanding of the phenomenon, and each mode may (or may not) validate the results obtained from other modes. Results validated by multiple modes have enhanced confidence and credibility.

Online quantitative data have a related problem. I readily concede that Marc Smith’s NetScan dataset of 1.2 billion cases is far bigger than any dataset I’ve worked with. Hoovering up all possible data is a strategy with valuable advantages. Collecting all these data gives great flexibility, including the ability to address questions that Smith never thought of. Researchers are not interested in the whole 1.2 billion cases, but in various subsets, including studies of change over time. It is impossible to know in advance which subset will be useful.
So Smith needed to collect the full dataset to be able to gain access to the appropriate subset. With a dataset this large, a researcher can select random samples without replacement. A model can be created using one sample and then validated or refined with an entirely separate sample (see Little and Schucking’s chapter on Data Mining for further discussion). If case studies yield dense, rich stories, the dry, technical content of Usenet headers must be the opposite. It is a tribute to Smith and his collaborators that they have shown how much interesting information is available in such sparse data.

The costs of working with these data are significant. First, even with 1.2 billion cases the NetScan dataset has problems with missing data. Specifically, there is an undercount problem. NetScan is not the whole of Usenet. No one knows the extent of the whole Usenet. Like the US decennial census, the size and scope of its undercount problem is not clear. But it is not random. Second, almost any research using the NetScan data requires major data management work to create a dataset in a form that can be analyzed. The data management is required because NetScan is stored in a form designed for flexible, efficient storage and retrieval. Appropriate data have to be extracted from the existing tables and combined or aggregated into a form that software used for statistical analysis or network analysis will accept. This is time-consuming even for skilled people, and few social scientists have data management skills. As noted above, this is a common problem when attempting to use electronic traces for research purposes.

An alternative strategy complements the NetScan approach. It involves the use of sampling with a wireless technology: beepers. Mihaly Csikszentmihalyi’s ‘experience sampling method’ (ESM) stands in sharp contrast to both qualitative and quantitative ‘collect everything’ research strategies. The ESM is a data-gathering technique where subjects are given a beeper and a stack of questionnaires. They are beeped at random times during the day. When beeped, they fill out one copy of the questionnaire. The questionnaire not only asks what they are doing and how they feel about it, but also what it means to them (Hektner et al. (2007) gives details and summarizes the entire research stream). This too has weaknesses, but they are different weaknesses. They are mostly the well-known problems of questionnaire research: response rates, sampling bias, reliability, validity, and others.

The use of the ESM has a variety of advantages. The questionnaires give more direct access to internal states: to attitudes, emotions, and meanings. Questionnaires can be designed to ask about theoretically grounded empirical categories. Finally, the data are based on a random sample of times of the day. The point is, instead of spending the time and money to collect everything, and having to spend more time deciding what you really want, and then throwing away all the data you collected that you decided you don’t need, you can simply collect what you wanted to begin with. Samples are really valuable; they are much faster and easier to collect and to analyze.
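To make the sampling argument concrete, here is a minimal sketch of the split-sample logic mentioned above: draw two non-overlapping random samples (without replacement) from a large collection of records, fit a simple model on one, and check it on the other. The data, the model, and the sample sizes are simulated stand-ins invented purely for illustration; the point is only the logic of estimating on one sample and validating on a separate one.

```python
# Hypothetical sketch: estimate a model on one random sample and validate it
# on a second, non-overlapping sample drawn from the same large dataset.
# The data below are simulated stand-ins for a very large collection of records.
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a very large dataset: one predictor x and one outcome y.
n_total = 1_000_000
x = rng.normal(size=n_total)
y = 2.0 * x + rng.normal(scale=0.5, size=n_total)

# Draw two non-overlapping random samples without replacement.
idx = rng.choice(n_total, size=20_000, replace=False)
fit_idx, val_idx = idx[:10_000], idx[10_000:]

# Fit a simple linear model on the first sample ...
slope, intercept = np.polyfit(x[fit_idx], y[fit_idx], deg=1)

# ... and validate it on the second, entirely separate sample.
pred = slope * x[val_idx] + intercept
rmse = np.sqrt(np.mean((y[val_idx] - pred) ** 2))
print(f"slope={slope:.3f}, intercept={intercept:.3f}, validation RMSE={rmse:.3f}")
```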
You lose very little by employing a sample. I think it is a reasonable methodological question to ask: under what circumstances is there value in collecting more than a random sample? Why not collect only the data you need in the first place? I was born in Missouri, which is called the ‘show-me state’. This nickname supposedly comes from Missourians’ habit of asking people to ‘show me the evidence’. There is something to be said for this. In the context of online methodology, show me the theoretical pay-off. It is hard to find a concept as striking or as influential as Csikszentmihalyi’s (1991) idea of flow, developed out of his studies using the ESM.

ADDITIONAL COMPUTATIONAL RESOURCES

One methodological theme for the past few decades has been the continuing stream of additional computer resources. We have experienced the remarkable impact of small computers on methodology. We are now seeing the effects of electronic networks. Some of these effects are described in Fernandes’ chapter on web- and Grid-based middleware solutions to the problem of access to distributed data resources and in the Crouchley and Allan chapter on Grid-based statistical analysis. Network analysis methods are described in Hogan’s chapter.

It is clear that additional computational resources have made possible all sorts of new developments. Many types of statistical analysis, modeling, and graphics require intensive computational resources. There has been a clear theoretical pay-off from this work. Two examples will make this point clearer.

First, rapid advances in network theory rely heavily on the availability of abundant computational power. Social network theory is the single biggest theoretical and empirical success story of the social sciences over the past decade. Much can be said about it, but I wish to make one particular point here. If you know the history of research into social networks, much of the groundwork was laid in the 1950s and 1960s; but it ran into a crippling limit. The computational power needed to analyze data on real networks simply didn’t exist. This is a case where theoretical development was hindered by lack of appropriate methodological tools (see Watts, 2003). Research continued, but at a slower pace, until major advances in computing began to have an impact. The research pace has quickened in line with increases in available computing power. That restriction has lifted, but it has not completely disappeared. Grid computing (see the Crouchley and Allan chapter) promises to help alleviate this problem. Much of the recent blossoming of work on social networks owes a great debt to rapid increases in the power of computing.

Second, as early as 1966 the Coleman Report on Equality of Educational Opportunity made pioneering social science use of multiple regression (Coleman et al., 1966). Multiple regression allowed Coleman to compare the relative influence of the major factors influencing student school achievement. Coleman found that the dominant influence on school achievement was parents and the home environment that they created. Other factors, like teacher training, dollars spent per pupil, and the quality of the school facilities, had an impact, but even combined they were less important than the influence of parents. This was an unexpected finding. It was a striking, controversial conclusion at the time.
After all, administrators want to believe that if they can only raise teacher salaries, hire more teachers with advanced credentials, improve their science labs and multimedia centers, or improve other inputs, then they will produce a corresponding improvement in student learning outcomes. The multiple regressions allowed Coleman and the researchers who followed to argue that student learning may in fact improve, but not very much. Parents are overwhelmingly more important. This conclusion is of continuing relevance today, over 40 years later. The Crouchley and Allan chapter discusses the impact of contemporary policy initiatives on educational achievement.

The continuing increases in computer speed and storage capacity and the further development of networks promise continuing theoretical pay-offs. Certainly network theory will continue to benefit. Exactly which other theories will benefit is harder to predict. In a general sense it is safe to say that there is a great deal of potential here, but it is hard to say specifically where the potential will be realized.

THE ‘QUALITATIVE ANALYSIS BOTTLENECK’

Although our ability to record social events has increased dramatically, our ability to analyze the recorded data has not expanded nearly so fast. On the one hand, certain types of analysis are much easier today. Statistical analysis, for example, has benefited. Certainly one effect of the additional computational power is that many more models can be examined and model diagnostics are easier. Statistics have always been a way to summarize data. In general, the ideas of central tendency, spread, and other statistical concepts can summarize a large dataset about as effectively as a small one. So as datasets become larger the nature of the statistical summaries does not change. Networks, the Grid, and data archives give researchers convenient access to statistics and data that they could never use before. See Keith Cole et al.’s chapter on archives and secondary analysis. In general, computer power makes possible much more thorough explorations of data using statistical and also graphical techniques. Graphical analysis and visualization have blossomed remarkably with the increase in computing power. Computers draw all kinds of diagrams and plots much faster than they can be drawn by hand. The advances in online research methods have been almost wholly positive for quantitative researchers.

Collection of qualitative data has always been extremely slow and difficult. As a result, past qualitative researchers have typically worked with small amounts of data. They worked intensively on their data. Many recent developments in qualitative research have focused on improving data collection. Fielding and Lee’s chapter on Qualitative eSocial Science/cyber-research highlights these advances. Because most online data – web pages, online role-playing games, e-mails, blogs, video, still images, graphics, etc. – are readily available in electronic form, qualitative researchers have gone from being data-poor to being overwhelmed by rich, new sources of data.
Fielding and Lee’s chapter and Carmichael’s chapter on Secondary Qualitative Analysis describe developments in secondary analysis that further increase the availability of data. There have also been advances in qualitative, non-statistical analysis. Brent’s chapter on Artificial Intelligence and Hindmarsh’s chapter on Distributed Video Analysis point to some of those advances. The process has been improved by the use of qualitative analysis software like NVivo, Atlas.ti, or Qualrus. The software adds reliability and speed.

In spite of these advances in artificial intelligence and in software, qualitative analysis has not changed much. As in the offline world, online data must be coded into theoretically meaningful categories to be analyzed. Coding has always been a time-consuming process requiring highly skilled researchers or carefully trained assistants. Coding remains a labor-intensive, agonizingly slow process. Although Fielding and Lee’s chapter points to interesting developments in qualitative analysis, a central problem remains unsolved. In ordinary language people frequently use synonyms to refer to the same object. To use a simple example, people refer to fraternities not only by their formal name, but also by such names as the ‘Baker Street house’, the ‘boys on the hill’, the ‘guys across from the Student Union’, and more. For this reason, meaning can only be derived from context. Determining context and meaning is not something that computers do well. This has been a crippling limit on the use of automated categorizing of text or other qualitative data. Inexpensive, powerful, small computers have had a much greater impact on statistical analysis than they have had on the analysis of qualitative data.

Considerable work has been done on the automated processing of text. I am aware of projects working with web pages, e-mail, blogs, and online versions of newspapers. The most sophisticated work is proprietary, owned by corporations or governments. Major statistical software companies have produced text mining software; for example, SAS Institute’s Text Miner and SPSS’s LexiQuest products. The academic work is based on qualitative analysis software, such as NVivo, MAXQDA, or Qualrus. There is also software specifically developed to support mixed-mode research, like QDA Miner. Looking at these products it is clear that any sort of truly automated processing of text lies in the future, except in some highly restricted domains where controlled vocabularies can be used.

However, there are situations where synonyms are not common. In those settings, controlled vocabularies are feasible. Many corporations, for example, keep careful track of any time one of their products or a competing product is mentioned in the press. Since products are almost always mentioned by name – e.g. ‘Rice Krispies’ or ‘Prius’ – a controlled vocabulary is fairly obvious. It is then relatively straightforward to write software that will automatically search the websites of major newspapers and magazines and download all articles that contain any word in the vocabulary. Of course a person still has to read the article, but the software has eliminated a major step that was a rote part of the process. Other situations where controlled vocabularies are feasible include research involving names of people, geographic locations, or events.
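A minimal sketch of the core of such controlled-vocabulary filtering is shown below. The vocabulary, the example articles, and the matching rule are invented for illustration; a real project would add the downloading, encoding, and de-duplication steps around this kernel.

```python
# Hypothetical sketch: keep only those articles that mention any term
# from a controlled vocabulary (e.g. product names). The vocabulary and
# the articles are invented for illustration.
import re

vocabulary = ["Rice Krispies", "Prius"]

articles = [
    {"source": "paper-a", "text": "The new Prius model was reviewed yesterday."},
    {"source": "paper-b", "text": "Local council debates parking charges."},
]

# Build one case-insensitive pattern with word boundaries around each term.
pattern = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in vocabulary) + r")\b",
    flags=re.IGNORECASE,
)

# Retain only the articles whose text matches the controlled vocabulary.
matching = [a for a in articles if pattern.search(a["text"])]

for article in matching:
    print(article["source"], "->", pattern.findall(article["text"]))
```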
These situations, however, remain a minor element in the abundance of electronic qualitative data. Qualitative analysts have mostly reacted to their new-found wealth of data by ignoring it. They have used their new computerized analysis possibilities to do more detailed analysis of the same (small) amount of data. Qualitative analysis has not really come to terms with the fact that enormous amounts of qualitative data are now available in electronic form. Analysis techniques have not been developed that would allow researchers to take advantage of this fact. The impact of qualitative online research has been weakened by this analysis bottleneck. As more social scientists see the potential of the Internet for their own research, awareness will grow of the disparity between the vast wealth of data and the difficulty of analyzing even small parts of it.

In analysis there is often a tradeoff between simplicity and depth. There is a continuum. At one end of the continuum, researchers can conduct an intense and time-consuming data reduction effort to create standardized variables that can be analyzed with statistical techniques. At the other extreme, researchers can do an analysis that takes into account the full richness of the data, but they have to accept that this will allow thorough analysis of only a small amount of data. Since only a few case studies can be analyzed, this effort is typically not generalizable to a population and the researcher must make the case for its value on other grounds, as described in the discussion of case studies. Where on this continuum a researcher will be located can’t easily be determined in advance. It will depend on the goals and needs of each individual research project.

The Pandora website (http://www.pandora.com) suggests one interesting solution to the analysis bottleneck. Pandora is software that uses a mathematical formula to characterize and identify music. Users pick artists or songs that they like. Pandora uses the mathematical characteristics of the music to stream similar songs by similar artists to the listener. This supplies listeners with music that they will like without them having to make explicit choices. Discussion of music often focuses on musical tastes categorized by genre, which is not a very fine-grained category system. Pandora seems to be a reliable approach that promises more sensitive, fine-grained categories. This seems to fit well with the extraordinary diversity of music available online. To my knowledge no one is currently conducting research using an approach like Pandora, but it seems like a fruitful area to explore. A creative research project might yield very interesting results. Of course, songs have a number of characteristics that they don’t share with most qualitative data. They are short, discrete, and they can be characterized using musical attributes that have often been well defined for centuries. Nonetheless, the example of Pandora points to a possible way to overcome the qualitative analysis bottleneck. There is other work in this area, part of a research stream on score-matching algorithms (e.g. Dannenberg and Raphael, 2006).
If pattern recognition software similar to Pandora could be written to characterize text, videos, or other qualitative data – even in limited domains – then it may be possible to automate recognition of faces, objects, or actions, in order to automate descriptions in fieldnotes. This software could scan video images and automatically record its observations. This would be a major change and it would make film much more attractive as a research tool. The central value of such software is that it would dramatically speed up the coding of qualitative data.

THE ONLINE WORLD

The online world is new enough that we are currently exploring what it can do. Reading these chapters there is a strong feeling of people struggling to understand its capabilities, struggling to use the new tools that it makes available, and trying to take advantage of its strengths; in short, trying to use it to help them solve their problems. It is exciting to watch and I look forward to the future.