Journal of Scientific Exploration, Vol. 32, No. 2, pp. 255–264, 2018  0892-3310/18

EDITORIAL

DOI: https://doi.org/10.31275/2018/1330

When I first dipped my toe tentatively into the frigid waters of psi research, back in the late 1970s, one of the big issues of the time was whether the ability to replicate experiments distinguishes—or as philosophers often say, demarcates—science from non-science (or pseudoscience). This was a big issue because all too often parapsychological skeptics glibly used that demarcation criterion to bludgeon psi researchers and dismiss them as unscientific. Fortunately, in those days there was some very sensible writing on the subject, particularly from Harry Collins, to whom I was especially indebted when I tackled the topic of replicability myself for the first time.1

The skeptical position on the issue of repeatability struck me as so lame that I even naïvely expected the debate to be settled rather quickly. However, because psi researchers often enter the field having little acquaintance with the work that preceded them, and because many critics of that research likewise fail to master the relevant issues, I suppose I shouldn’t be surprised that the debate over the nature and importance of replicability still rages. Indeed, little (if any) attention is given to the reasonable points that should have put that issue to rest long ago. Instead, researchers and commentators focus relentlessly—and as usual, inconclusively—on the results of meta-analyses. Some of those meta-analyses are indeed worthy of attention,2 but (I would say) only in light of the overlooked considerations I discuss below. So I’d like to review some problems with the still widely held view that the ability to replicate experiments is what demarcates science from non-science or pseudoscience.3 As I see it, that position is both shallow and confused, and the problems with it don’t even have the virtue of being subtle.
First, the skeptical reliance on the demarcation criterion rests on a naïve conception of the actual importance within science of experimental repeatability. Indeed, experimental repeatability plays little if any role in disciplines (including some physical sciences) whose scientific credentials are not in dispute. Second, it seriously misconstrues how the appeal to replicability works even in those physical sciences where it plays a real role. Third, the received view rests on philosophical confusions regarding the nature of similarity—in particular, the flawed idea that there can be formal, context-independent criteria for the similarity of two things. And fourth, it rests on confusions over the nature of human abilities generally, and in particular, the appropriate methodologies for studying them. One could also argue that a fifth problem for this received view is that psi research can in fact point to replicable results. But that last issue must be reserved for another occasion.

The Real Role of Replication in Science

It’s clear enough why some people place great emphasis on the replication of experiments, both in parapsychology and in orthodox science. The familiar, underlying idea is that if an experiment E gives a result which replication attempts are unable to reproduce, we have reason to regard E’s result as scientifically dubious. And if continued attempts to replicate E fail to duplicate E’s result, we have, it would seem, prima facie evidence for taking that result to be due to a flaw in E’s experimental design, or to experimenter negligence or incompetence, or perhaps even to chicanery. As a rule, then, only experiments whose results can be repeated are considered genuine and reliable. This, clearly, is why some consider experimental repeatability to be a demarcation criterion between science and non-science. So let’s consider first the respects in which the received view’s underlying conception of repeatability is naïve.
The replicability criterion is obviously borrowed from the physical sciences—but only from some of them (primarily, physics and chemistry). However, experimental repeatability has very little utility in other physical sciences of impeccable credentials—for example, geology and astronomy. Moreover, the received view seriously misconstrues how the appeal to replicability works even in those physical sciences where it plays a major role. To see this, consider first the abstract question: In what respect(s) can we allow replication attempt E2 to differ from an original experiment E1 and still consider it to be a replication attempt?4 Obviously, the two experiments can’t be alike in all respects; they would then be identical, not different experiments. Clearly, E1 and E2 will at least differ with respect to time and/or place of the experiments. But of course, many other changes will accompany the changes in time and place. These will likely include, for example, differences in the experimental conditions or environment (including inevitable subtle changes in the experimental apparatus required—especially sophisticated, sensitive, and delicate equipment that may continually require fine-tuning), or changes in the actual participants (or just their state of mind). All of these may vary subtly or dramatically from one test to another. But this means that some changes between E1 and E2 must be tolerated. But in that case, how is one supposed to determine, before the results are in, which differences (if any) matter? What is seldom observed (except by Harry Collins) is that in every science in which experimentation plays a role, it is standard practice to tolerate many differences between original experiments and replication attempts. But that means that scientists in these domains are working with a very loose conception of replication.
In fact, scientists who rely on replication attempts don’t—and can’t—decide, until the results are in, whether the inevitable differences between experiments matter. But that means they can’t specify, in advance of conducting a replication attempt, a reliable, much less formal, recipe for replicating the original experiment. Let’s look at this in more detail. Consider first how these observations are true even in the so-called “hard” sciences. In physics, for example, an experiment conducted at laboratory L1 with a certain kind of particle accelerator might be replicated at laboratory L2 with a different design of accelerator. In microbiology, experiments conducted with microorganism M1 in solution S1 might be replicated by studying M1 in a different solution S2 (which may have been more convenient to use, but whose differences are regarded as not making a difference). In fact, even a different microorganism M2 might have been substituted and its difference discounted. And of course, despite the expectations (or at least the hopes) of the replicating scientist, it’s always possible that such differences between experiments lead to differences in experimental outcome. One thing this means is that, as good science is actually practiced, the concepts of similarity of design, agreement of results, or same result are both loose and elastic. (We’ll return to this point when considering the nature of similarity.) For now, the important point is that the inevitable differences between experiments E1 and E2 will be ignored if their results agree, whereas those same differences might be deemed both relevant and critical if the different experiments yield relevantly different results. But that means it’s not decided in advance—on purely formal grounds—whether the inevitable differences in E1 and E2 matter. If E2 gets unexpected or undesired results, only then might scientists consider that they were wrong in assuming that the differences didn’t matter. 
But in that case, if E2’s results are considered different from those of E1, scientists could easily conclude that E2 wasn’t the same experiment as E1, rather than a failed replication. Thus it can be unclear what the difference is between a failed replication and a different experiment. Of course in that case, since differences between E1 and E2 can’t be avoided and may lead to a difference in outcome for two experiments, the failure of E2 to achieve the same results as E1 doesn’t automatically discredit or even cast serious doubt on E1. Granted, the situation changes somewhat when a series of replication attempts fails to produce the results of the original experiment. But since all those experiments will likewise differ from each other, it’s hardly a straightforward matter to tease out what’s responsible for what. Now in parapsychology, where differences in participants (or their state of mind) may be considerable, and where (thanks to the source-of-psi problem) we can’t be sure who might be influencing experimental outcomes (e.g., the “official” subject, experimenter, onlooker, analyzer), the notions of “same experiment” and “same result” seem especially unclear. But even if we ignore source-of-psi complications, psi research demonstrates the same sort of loose conception of repeatability found in the physical sciences. In parapsychology, E2 may differ from E1 with respect to (for example) the method of stimulating or eliciting a subject’s response, providing a subject with feedback, evaluating subjects’ responses, the type of interaction permitted between experimenter and subject (including the words spoken and the inflection of those words), and even in the type of response required of the subject. But this is at most just a difference in degree—not a difference in kind—from what we find in non-behavioral sciences.
It certainly doesn’t justify the claim that parapsychology is a non-science, or that it’s a pseudoscience, or that parapsychology has no repeatable experiments. Methodologically speaking, experimental psi research operates with the same loose conception of repeatability we find in physics, chemistry, and microbiology. And in none of these cases must this reveal a defect in the way the science is practiced. Rather, it’s a simple consequence of the inevitable differences between any experiment E1 and any attempted replication E2.

Philosophical Problems with the Received View

To lend a somewhat broader perspective to this discussion, we should also observe that some difficulties in determining when an experiment has been repeated are not peculiar to the scientific enterprise or to the process of experimentation. Rather, they’re instances of the more general problem of determining when any sort of event has been repeated. These problems, in other words, concern the general concept of recurrence, and even more fundamentally, the concept of similarity. Thus, the question “When does E2 replicate E1?” is at bottom a question about when two experiments count as similar. But the concept of similarity is irreducibly context-dependent. That is, things are not inherently either similar or dissimilar. They must count or be taken as similar or dissimilar relative to some context of inquiry and criteria of relevance. And since no context of inquiry is inherently privileged, that means there’s no privileged answer to questions of the form: Is A similar to B?

Consider for example: Are the movements of an elephant similar to those of a flea? Clearly, there’s no privileged correct answer to that question. Sometimes size matters, and sometimes it doesn’t. Similarly, does a young beginner’s golf swing contain the same movements as the swing of Tiger Woods? Again, there’s no privileged correct answer.
We might say “yes” if we’re comparing golf swings to tennis swings, but not when the focus is on fine differences between the techniques of different golfers. And neither of those perspectives enjoys inherent priority over the other. Or suppose I try to tell the same joke I heard someone tell the day before. Is the joke I told similar or not to the one I heard earlier? Obviously, it depends on what’s relevant to our answering that question, and no criteria of relevance are inherently privileged over the others. Depending on the situation, we might focus on whether my joke made the audience laugh, or whether the words were exactly the same, or delivered at the same speed, or with the same accent or timing, or with the same inflection, or whether my voice had the same timbre as that of my predecessor. The point should be clear: Similarity is not a static two-term relation obtaining inherently between the things taken to be similar. Rather, similarity exists only with respect to variable and shifting criteria of relevance. It can only be a dynamic relation holding between things at a time and within a context of needs and interests.5 Likewise, in scientific experimentation, whether E2 replicates E1 is not strictly a function of the formal, much less antecedently specifiable, features of the two experiments. We’ve already observed that replication attempts will inevitably differ in some respects from the original experiment. But we also noted that in the ordinary course of the established sciences, some of these differences—for example, the mental states of the experimenters, and differences in the equipment used—tend to be discounted when a replication attempt is deemed successful. However, in parapsychology the very same sorts of differences are regarded as potentially relevant to the experimental outcome, although even in parapsychology such differences might also be discounted when replication attempts are deemed successful. 
It’s also worth noting that there are further complications in deciding what counts as the same result. For example, one can raise legitimate questions about what counts as an appropriate level of significance from mean chance expectation. And in parapsychology, a familiar nagging issue is whether extra-chance negative results (psi-missing) can be allowed to replicate positive scoring in an earlier experiment. These complications reinforce the points already made about the loose way in which the concept of replication is inevitably used in science, and they needn’t be considered in more detail here.6

Human Abilities

Because psi experiments presumably study ostensible abilities of their subjects, it obviously matters what sort of endowment psi might be, and whether the methodologies used to study it are appropriate to those endowments. But then, it’s important to note some relevant, but typically unheralded, matters regarding human abilities.7 The first point to note is that the notion of a human ability (like the concept of replication) is extremely loose and elastic, covering an enormously wide terrain. In one appropriate and also very common use of the term, “ability” can stand for rudimentary and more or less universal human (or organic) endowments. For example, we can speak of someone’s ability to laugh, experience fear, express aggression or compassion, or merely breathe, blink, or move the muscles in one’s arm. In this sense of the term, an ability needn’t be any kind of proficiency or skill, or disposition to exhibit such a proficiency. But the term “ability” can also denote various degrees of competence or mastery—for example, when we speak of a person’s ability to learn a new language, carry a tune, hammer a nail, or control pain through self-hypnosis.
And of course it can also denote competencies requiring great mastery, as in the ability to play professional-level tennis, write a string quartet, dock a space capsule, read an orchestral score, or solve quadratic equations. “Ability” in this sense seems nearly synonymous with what we usually mean by “skill.” But if we’re to have a nuanced, general account of human abilities, we must also consider some other endowments, likewise unevenly distributed among humans, but which we would probably not want to label as skills. Consider, for example, the ability to fire an employee, express sensuality, speak in front of an audience, inspire loyalty in others, remain hopeful in the face of adversity, manipulate others through guilt, and laugh at oneself. A moment’s reflection on the examples above should make it clear that the term “ability,” like most ordinary language expressions, has no single and preferred—much less clear and unambiguous—meaning. That’s one reason why there’s no interesting set of properties shared by all the things we consider abilities, and which distinguish abilities from non-abilities. At best, the different senses of “ability” merely identify useful points on a continuum of human endowments (ordered roughly in terms of complexity and refinement). And the reason this matters is that it reveals why laboratory research in parapsychology is almost ludicrously premature. It highlights the fact that researchers have no idea what kind of organic function they’re trying to investigate. Not only are we ignorant of psi’s finer-grained features, we don’t even know what its natural history might be—for example, whether it has an evolutionary role or primary or overall purpose or function (although there is no shortage of speculation on these matters8). Of course, there’s no reason to think that psychic phenomena occur only for parapsychologists, much less only when those parapsychologists set out to look for them.
After all, a major motivation for conducting formal studies is that we have evidence of psi occurring spontaneously in life. But since we’re a very long way from understanding the nature and function of everyday psi, we don’t know whether psychic functioning resembles musical or athletic abilities in its variability, or whether it’s a brute endowment such as the capacity to see or to move one’s limbs. Obviously, then, in the absence of this rudimentary knowledge, we have no idea whether (or to what extent) our experimental procedures are even appropriate to the phenomena. To see this, compare our knowledge and study of psi with our knowledge and study of memory. Memory is something we can study formally to some extent. But we have some idea how to proceed because we’re already very familiar with the many and diverse manifestations of memory in daily life. Or compare our knowledge of psi with our knowledge of the ability to be witty. It’s because we’re familiar with the latter that we know we can’t adequately study it experimentally. Or again, a tennis player’s ability to return serves is something that—unlike everyday psi—we can systematically and easily examine in real-life, relevant settings. In fact, we can study that ability pretty much on demand, and from virtually anyone who claims to have the ability. It should be obvious, then, that different abilities, as a rule, demand different modes of investigation. We wouldn’t examine mechanical aptitude the same way we investigate the ability to produce witty remarks, the ability to baby-sit, the ability to design and install a patio, the ability to learn a new language, the ability to empathize, or the skill of playing football wide receiver or soccer goalkeeper. Similarly, techniques appropriate to studying those abilities will differ from those suitable for examining rudimentary endowments, such as the capacity to blink, swallow, utter sounds, or dream.
Furthermore, for most human abilities, it’s hard to pin down what, exactly, we need to look at. Consider, for example, the ability to compose music. Clearly, that ability can be expressed in many ways. Many composers notate their compositions; others lack that ability. Some composers have absolute pitch, some only relative pitch, and some neither. Some compose directly onto paper, while others need a piano or some other instrument. Some work best with large forms; others don’t. Some write especially well or idiomatically only for certain instruments; others don’t have that limitation. Some have a keen ability to set words to music; others lack that sub-ability. Some are especially adept at harmony, rhythm, or instrumental color, and those specialties likewise take different forms and manifest in different degrees and combinations. But then there should be little temptation to think that compositional ability allows many useful generalizations. And there’s no reason to think this case is unique; the same is obviously true, for example, in the case of athletic ability, or comedic ability. What we do know is that people who possess a general ability may exhibit it in various ways and to varying degrees. The differences have to do with the subsidiary abilities or skills they possess and the manner in which they possess them. The moral here should be obvious: At our current level of ignorance, we’re in no position to say that psychic functioning is an exception to this rule. In fact, one can argue plausibly that the manifestation of psi is as deeply idiosyncratic and variable as any other ability. Psi-conducive conditions may be as personal and individual as the conditions people find amusing, or erotic. Most subjects don’t do their best under intense pressure or when the stakes are high (say, during a live television demonstration), but a few excel under those conditions and even relish the challenge.
And some may be able to demonstrate psi only in the presence of select others—for example, investigators they find especially supportive or agreeable, just as most people can sing or express sensuality only in the presence of those with whom they feel personally safe. Second, the subjective experience of exercising psi varies widely—for example, whether ESP is accompanied by vivid, familiar, or any imagery. And third, the range and specificity of the ability may also vary idiosyncratically. For example, one might be good at psychokinetically influencing small objects but not at affecting random event generators in computerized experiments. Or, one might be good at remote-viewing shapes but not technical details, or colors but not smells, or medium-sized objects but not words on paper (Pat Price, notoriously, was distinctively [if not uniquely] good at this latter task). In fact, this type of ESP variability would parallel a familiar feature of more ordinary perceptual differences. Some are particularly good at (say) discriminating colors but not sounds, detecting subtle differences in wines or chocolates but not in audio components, or noticing eye color but not manipulative behavior. Moreover, many (and perhaps all) abilities are highly context-dependent and can be expressed or studied properly only under quite specific conditions. For example, we can evaluate a tennis player’s ability to return serves only under physically and psychologically challenging game conditions. Similarly, a pianist’s ability to play the “Waldstein” Sonata, or a comedian’s ability to be funny, varies with confidence level, audience attitude, personal distractions, and so on.
So if psychic functioning is analogous to these sorts of organic endowments (as many think and as both experimental evidence and anecdotal reports suggest), then we’d be entitled to say that not everyone is psychic, that some are more psychic than others (enough so to count as “stars” or as gifted), and that not all psychics are psychic in the same way. Needless to say, this can only complicate the process of replicating experiments, both across different subjects, and even with a single subject. But what if psychic functioning is analogous to elementary capacities? In that case, psi might be as uniformly distributed among humans as pulmonary or reproductive functioning, or as reflexive and involuntary as nursing behavior or fear responses. Moreover, although some lack these familiar capacities or possess them only in attenuated forms, most people have no such limitations. Analogously, the capacity to function psychically might be robust in all but a few individuals. It might also be the sort of thing we do all or much of the time, and the processes involved may be as removed from conscious awareness and control as those involved in digestion or breathing. However, even if psi functioning is a largely involuntary universal (or nearly universal) endowment, it may still be situation-sensitive to a degree that frustrates attempted replications. After all, our heart rate and digestion, as well as the capacity to sleep, breathe deeply, or ward off infections, can also vary considerably from one occasion to the next. In fact, if the exercise of psi capacities is need-determined (as some have proposed), then it could be analogous to and as variable as the capacity to increase adrenaline flow, or produce endorphins, or the ability to move or respond quickly, act decisively, or be courageous, or cheerful in difficult times, or selfless when a loved one needs to be protected. 
Clearly, without some solid grounding, prior to experimentation, concerning what sort of human endowment is being investigated, psi researchers can’t expect to know, say, whether replicating an experiment with different subjects is even feasible, or whether it’s feasible only if the same subject is re-tested, and then only under conditions as similar as possible to those in earlier successful experiments (assuming that can even be determined with any confidence). So even though parapsychology’s replication scorecard may not match that of most physical sciences, and even though that does not undermine parapsychology’s status as a legitimate area of scientific inquiry, the field nevertheless remains at an early stage of investigation. Indeed, until we have a more adequate natural history of psi, worrying about replicability may be pointless. In fact, given our current and considerable level of ignorance, one could argue that research emphasis should still be on proper documentation and vetting of spontaneous or semi-experimental cases.

—STEPHEN E. BRAUDE

Notes

1 See Collins (1976, 1978). Also (and later) Collins (1992). My Collins-inspired discussion was in Braude (1979) and (later) Braude (2002).
2 For example, Storm et al. (2017) and Cardeña (2018).
3 Some of what follows I covered in an earlier Editorial—in JSE 27:1. Evidently, that effort had the usual lack of impact, and so I figure the topic merits another try. Intrepid readers might want to consult that earlier opus for several points not made here.
4 I should remind the reader that Collins’ original discussions of this issue (and even mine) are considerably more detailed and nuanced. What follows here are simply a few highlights.
5 For more on the concept of similarity, see Braude (2014:Chapters 1 and 2).
6 But see Collins (1992) and Braude (2002).
7 For an extended discussion of this topic, see Braude (2014).
8 For some of the best, see Eisenbud (1992).

References Cited

Braude, S. E. (1979). ESP and Psychokinesis: A Philosophical Examination. Philadelphia: Temple University Press.
Braude, S. E. (2002). ESP and Psychokinesis: A Philosophical Examination (Revised Edition). Parkland, FL: Brown Walker Press.
Braude, S. E. (2014). Crimes of Reason: On Mind, Nature, & the Paranormal. Lanham, MD: Rowman & Littlefield.
Cardeña, E. (2018, May 24). The Experimental Evidence for Parapsychological Phenomena: A Review. American Psychologist. Advance publication. http://dx.doi.org/10.1037/amp0000236
Collins, H. M. (1976). Upon the Replication of Scientific Findings: A Discussion Illuminated by the Experiences of Researchers into Parapsychology. Paper presented at the First International Conference on Social Studies of Science, Cornell University, November.
Collins, H. M. (1978). Science and the Rule of Replicability. Paper presented at the AAAS Symposium on Replication and Experimenter Influence, Washington, DC, February.
Collins, H. M. (1992). Changing Order: Replication and Induction in Scientific Practice. Chicago: University of Chicago Press.
Eisenbud, J. (1992). Parapsychology and the Unconscious. Berkeley, CA: North Atlantic Books.
Storm, L., Sherwood, S. J., Roe, C. A., et al. (2017). On the correspondence between dream content and target material under laboratory conditions: A meta-analysis of dream-ESP studies, 1966–2016. International Journal of Dream Research, 10(2):120–140.