3

This chapter situates Guba and Lincoln’s chapter within
the broad philosophical debate about the justifiability of
interpretations.

Judging Interpretations
Thomas A. Schwandt

Among the most knotty problems faced by investigators committed to interpretive
practices in disciplines and fields such as sociocultural anthropology,
jurisprudence, literary criticism, historiography, feminist studies, public
administration, policy analysis, planning, educational research, and evalua-
tion are deciding whether an interpretation is credible and truthful and
whether one interpretation is better than another. Several contested episte-
mological issues make this problem particularly complex and very difficult
both to understand and to solve (Bohman, Hiley, and Shusterman, 1991).
First is the claim that interpretation is an omnipresent feature of all human
attempts to understand—in other words, there can be no appeal to some
kind of evidence, experience, or meaning that is somehow outside of inter-
pretation, independent of it, or more basic than it. To put it more directly, an
interpretivist would say there is no such thing as the “interpretation” of
the value of some policy or program, on the one hand, and “evidence” of the
value of that policy or program, on the other hand, with the latter being more
basic than the interpretation or in some way independent of the interpreta-
tion. Although, of course, evidence does matter, the very act of generating
evidence or identifying something as evidence is itself an interpretation.
Second is the interpretivists’ claim that every interpretation is made in some
context or background of beliefs, practices, or traditions. This does not
necessarily mean that every interpretation is, therefore, subjective (that is,
the product of the personal view of the interpreter). In fact, it means just the
opposite—namely, there is always an intersubjective aspect of interpretation;
the investigator cannot help but always be situated relative to (and cannot
escape) social circumstances such as a web of beliefs, practices, standpoints,

NEW DIRECTIONS FOR EVALUATION, no. 114, Summer 2007 © Wiley Periodicals, Inc.
Published online in Wiley InterScience (www.interscience.wiley.com) • DOI: 10.1002/ev.223

and the like that he or she has learned as ways of living and grasping the
world (as expressed by Joseph Rouse, 1987). Third, a consequence of these
two assertions is the notion that if interpretations are always made in a con-
text or background of shared (social) beliefs and practices, it follows that
interpretations are, in an important sense, infused with political and ethical
implications related to matters of power and authority. In other words,
interpretation is not simply an individual cognitive act but a social and
political practice. Clearly, these central principles of a philosophy of inter-
pretivism stand in sharp contrast to what is, more or less, a standard
epistemological account of establishing the objectivity and truthfulness of
claims that we make about the world. On that account, a claim is consid-
ered objective and true to the extent that it is free of any biasing influence
of context or background beliefs and accurately mirrors the way the world
really is.
It is against this backdrop (and fairly fearlessly entering into this
complicated epistemological matter) that Egon Guba and Yvonna Lincoln
offered their thinking on the question of appropriate criteria for judging
evaluations as interpretations. In the chapter that appears on the following
pages (Guba and Lincoln, 1986) and in the two very influential books that
serve as bookends to it (Guba and Lincoln, 1985, 1989), they built an
argument for the way those committed to the interpretive practice of
evaluation could profitably address the difficult problem of demonstrating
the credibility of their interpretations. To their credit, as is apparent in the
following chapters as well as in the aforementioned books, they did not offer
their way of thinking as the last word but rather as an invitation to further
debate and consideration. For those of us who have, in the past twenty
years, subsequently wrestled with the problem of the nature and justifica-
tion of interpretations, their work has remained a touchstone for both
disagreement on the part of some scholars and elaboration and extension
on the part of others in many fields of study. They should be happy that the
invitation they issued has been accepted.
What they describe in the chapter are two approaches to thinking about
the problem of justifying interpretations. One way they characterize as that
of employing trustworthiness criteria, and they describe these criteria as
analogs to “scientific” understandings of conventional notions of internal
validity (credibility), external validity (transferability), reliability (depend-
ability), and objectivity (neutrality). The second way, they argue, is funda-
mentally different, and more aligned with assumptions about interpretations
as socially constructed undertakings with significant implications for the
ways in which we inevitably use those interpretations to continue to go
on with one another (as Wittgenstein might have said)—that is, in making
sense of or understanding one another and subsequently acting with
confidence on those understandings. Thus, they offered a new (and some-
times difficult) language of authenticity criteria—fairness, ontological
authenticity, educative authenticity, and catalytic authenticity.

To my way of thinking, although perhaps not to theirs and others, these
two ways of approaching the knotty problem of justifying interpretations as
credible and truthful are not opposed; in fact, they are complementary.
To illustrate, consider the following “evaluative” case: My brother and I have
an elderly mother whose everyday life is made more stressful and compli-
cated because she is experiencing progressive dementia and living with a
husband who is legally blind. As elderly siblings do with each other, I find
myself in the position of evaluating my brother’s response to that situation:
Is he doing the right thing for our mother and her husband? On the one
hand, anyone looking in on that situation would ask whether I really under-
stand the situation and his interactions with them: Do I have the facts of the
matter straight? How would I make such assurances of the credibility of my
interpretations to an onlooker? I might point out that I have witnessed their
interactions firsthand for quite some time and heard both him and my
parents describe the situation (persistent observation and prolonged engage-
ment), and that I have asked how others privy to their interactions have
sized up his behavior and responses (triangulation), that I have done my
best to find evidence that contradicts my interpretation (search for negative
cases)—all of these actions leading to some assurances of the credibility,
dependability, and confirmability of my interpretations about the value and
appropriateness of his responses to the situation. I might develop a story of
the way he interacts with them and tell it to others who find themselves
in the same situation with elderly parents as a way of demonstrating that
there is something to be learned here (transferability).
Yet surely, on the other hand, anyone looking in on this situation would
ask some other questions about what happened in the process of my form-
ing an interpretation of the value of my brother’s way of dealing with the
situation. For example, they might ask whether I really have a balanced view
of his reactions: Have I taken into account what my parents think? Have
I taken into account my own ways of thinking about what it is right to do
in this situation (fairness)? Second, they might ask whether I actually
discussed my evolving interpretations of the situation with my brother and
whether either of us came to better understand and appreciate (although
not necessarily agree with) each other’s way of thinking and acting (educa-
tive authenticity). Third, an onlooker might ask whether the understand-
ings developed through the sharing of our interpretations of the situation
substantially challenged both his and my self-understandings, and whether
either of us, as a result of our conversation, was not simply moved to some
new or different understanding of the situation at hand but actually began
to act differently (catalytic and tactical authenticity).
I leave it to readers to extend by analogy this way of thinking to a more
complicated evaluative case they have encountered. Guba and Lincoln’s
approaches to how we might profitably think about justifying the credibil-
ity and truthfulness of the interpretations we make in the interpretive prac-
tice of evaluation (and nursing, public administration, planning, cultural
anthropology, and so on) are an extension of the ways we support the truth-
fulness, honesty, correctness, and actionability of our interpretations in
everyday life. To successfully defend our interpretations we appeal to crite-
ria of both trustworthiness and authenticity. Guba and Lincoln name these
ways in shorthand expressions befitting our ways of thinking of social
scientific practices like evaluation; yet, be not misled, more importantly they
have invited us to think more carefully about what judging the credibility
of interpretations actually entails in both our everyday lives and our
professional lives as interpreters of human actions.

References
Bohman, J. F., Hiley, D. R., and Shusterman, R. “Introduction: The Interpretive Turn.”
In D. R. Hiley, J. F. Bohman, and R. Shusterman (eds.), The Interpretive Turn: Philosophy,
Science, Culture (pp. 1–16). Ithaca, N.Y.: Cornell University Press, 1991.
Guba, E. G., and Lincoln, Y. S. Naturalistic Inquiry. Thousand Oaks, Calif.: Sage, 1985.
Guba, E. G., and Lincoln, Y. S. “But Is It Rigorous? Trustworthiness and Authenticity in
Naturalistic Evaluation.” In D. Williams (ed.), Naturalistic Evaluation. New Directions
for Evaluation, no. 30. San Francisco: Jossey-Bass, 1986.
Guba, E. G., and Lincoln, Y. S. Fourth Generation Evaluation. Thousand Oaks, Calif.:
Sage, 1989.
Rouse, J. Knowledge and Power. Ithaca, N.Y.: Cornell University Press, 1987.

THOMAS A. SCHWANDT is professor of education at the University of Illinois,
Urbana-Champaign. He earned his Ph.D. in inquiry methodology at Indiana
University, Bloomington, where Egon Guba was his thesis director.

The emergence of a new paradigm of inquiry (naturalistic)
has, unsurprisingly enough, led to a demand for rigorous
criteria that meet traditional standards of inquiry. Two sets
are suggested, one of which, the “trustworthiness” criteria,
parallels conventional criteria, while the second, “authen-
ticity” criteria, is implied directly by new paradigm
assumptions.

But Is It Rigorous? Trustworthiness and Authenticity in Naturalistic Evaluation
Yvonna S. Lincoln, Egon G. Guba

Until very recently, program evaluation has been conducted almost exclusively
under the assumptions of the conventional, scientific inquiry paradigm using
(ideally) experimentally based methodologies and methods. Under such
assumptions, a central concern for evaluation, which has been considered a
variant of research and therefore subject to the same rules, has been how to
maintain maximum rigor while departing from laboratory control to work in
the “real” world.
The real-world conditions of social action programs have led to increas-
ing relaxation of the rules of rigor, even to the extent of devising studies looser
than quasi-experiments. Threats to rigor thus abound in sections explaining
how, when, and under what conditions the evaluation was conducted so
that the extent of departure from desired levels of rigor might be judged.
Maintaining true experimental or even quasi-experimental designs, meeting
the requirements of internal and external validity, devising valid and reliable
instrumentation, probabilistically and representatively selecting subjects and
assigning them randomly to treatments, and other requirements of sound
procedure have often been impossible to meet in the world of schools
and social action. Design problems aside, the ethics of treatment given and
treatment withheld poses formidable problems in a litigious society (Lincoln
and Guba, 1985b).

We are indebted to Judy Meloy, graduate student at Indiana University, who scoured the
literature for references to fairness and who developed a working paper on which many
of our ideas depend.

D. D. Williams (Ed.). Naturalistic Evaluation.

NEW DIRECTIONS FOR PROGRAM EVALUATION, no. 30. San Francisco: Jossey-Bass, June 1986.

Given the sheer technical difficulties of trying to maintain rigor and
given the proliferation of evaluation reports that conclude with that
ubiquitous finding, “no significant differences,” it is not surprising that the
demand for new evaluation forms has increased. What is surprising—for all
the disappointment with experimental designs—is the continued demand that
new models must demonstrate the ability to meet the same impossible
criteria! Evaluators and clients both have placed on new-paradigm evalua-
tion (Guba and Lincoln, 1981; Lincoln and Guba, 1985a) the expectation
that naturalistic evaluations must be rigorous in the conventional sense,
despite the fact that the basic paradigm undergirding the evaluation approach
has shifted.
Under traditional standards for rigor (which have remained largely
unmet in past evaluations), clients and program funders ask whether natu-
ralistic evaluations are not so subjective that they cannot be trusted. They ask
what roles values and multiple realities can legitimately play in evaluations
and whether a different team of evaluators might not arrive at entirely differ-
ent conclusions and recommendations, operating perhaps from a different set
of values. Thus, the rigor question continues to plague evaluators and clients
alike, and much space and energy is again consumed in the evaluation report
explaining how different and distinct paradigms call forth different evaluative
questions, different issues, and entirely separate and distinct criteria for
determining the reliability and authenticity—as opposed to rigor—of findings
and recommendations.

Rigor in the Conventional Sense

The criteria used to test rigor in the conventional, scientific paradigm
are well known. They include exploring the truth value of the inquiry or
evaluation (internal validity), its applicability (external validity or general-
izability), its consistency (reliability or replicability), and its neutrality
(objectivity). These four criteria, when fulfilled, obviate problems of
confounding, atypicality, instability, and bias, respectively, and they do so,
also respectively, by the techniques of controlling or randomizing possible
sources of confounding, representative sampling, replication, and insulation
of the investigator (Guba, 1981; Lincoln and Guba, 1985a). In fact, to use a
graceful old English cliché, the criteria are honored more in the breach than
in the observance; evaluation is but a special and particularly public instance
of the impossibility of fulfilling such methodological requirements.

Rigor in the Naturalistic Sense: Trustworthiness and Authenticity
Ontological, epistemological, and methodological differences between the
conventional and naturalistic paradigms have been explicated elsewhere
(Guba and Lincoln, 1981; Lincoln and Guba, 1985a; Lincoln and Guba, 1986;
Guba and Lincoln, in press). Only a brief reminder about the axioms that
undergird naturalistic and responsive evaluations is given here.
The axiom concerned with the nature of reality asserts that there is no
single reality on which inquiry may converge, but rather there are multiple
realities that are socially constructed, and that, when known more fully, tend
to produce diverging inquiry. These multiple and constructed realities
cannot be studied in pieces (as variables, for example), but only holistically,
since the pieces are interrelated in such a way as to influence all other
pieces. Moreover, the pieces are themselves sharply influenced by the nature
of the immediate context.
The axiom concerned with the nature of “truth” statements demands
that inquirers abandon the assumption that enduring, context-free truth
statements—generalizations—can and should be sought. Rather, it asserts
that all human behavior is time- and context-bound; this boundedness
suggests that inquiry is incapable of producing nomothetic knowledge but
instead only idiographic “working hypotheses” that relate to a given and
specific context. Applications may be possible in other contexts, but they
require a detailed comparison of the receiving contexts with the “thick
description” it is the naturalistic inquirer’s obligation to provide for the
sending context.
The axiom concerned with the explanation of action asserts, contrary
to the conventional assumption of causality, that action is explainable only
in terms of multiple interacting factors, events, and processes that give shape
to it and are part of it. The best an inquirer can do, naturalists assert, is to
establish plausible inferences about the patterns and webs of such shaping
in any given evaluation. Naturalists utilize the field study in part because it
is the only way in which phenomena can be studied holistically and in situ in
those natural contexts that shape them and are shaped by them.
The axiom concerned with the nature of the inquirer-respondent
relationship rejects the notion that an inquirer can maintain an objective
distance from the phenomena (including human behavior) being studied, sug-
gesting instead that the relationship is one of mutual and simultaneous influ-
ence. The interactive nature of the relationship is prized, since it is only because
of this feature that inquirers and respondents may fruitfully learn together. The
relationship between researcher and respondent, when properly established, is
one of respectful negotiation, joint control, and reciprocal learning.
The axiom concerned with the role of values in inquiry asserts that far
from being value-free, inquiry is value-bound in a number of ways. These
include the values of the inquirer (especially evident in evaluation, for exam-
ple, in the description and judgment of the merit or worth of an evaluand),
the choice of inquiry paradigm (whether conventional or naturalistic, for
example), the choice of a substantive theory to guide an inquiry (for exam-
ple, different kinds of data will be collected and different interpretations
made in an evaluation of a new reading series, depending on whether the eval-
uator follows a skills or a psycholinguistic reading theory), and contextual
values (the values inhering in the context, and which, in evaluation, make a
remarkable difference in how evaluation findings may be accepted and used).
In addition, each of these four value sources will interact with all the others
to produce value resonance or dissonance. To give one example, it would be
equally absurd to evaluate a skills-oriented reading series naturalistically as
it would to evaluate a psycholinguistic series conventionally because of the
essential mismatch in assumptions underlying the reading theories and
the inquiry paradigms.
It is at once clear, as Morgan (1983) has convincingly shown, that the
criteria for judging an inquiry themselves stem from the underlying para-
digm. Criteria developed from conventional axioms and rationally quite
appropriate to conventional studies may be quite inappropriate and even
irrelevant to naturalistic studies (and vice versa). When the naturalistic
axioms just outlined were proposed, there followed a demand for develop-
ing rigorous criteria uniquely suited to the naturalistic approach. Two
approaches for dealing with these issues have been followed.
Parallel Criteria of Trustworthiness. The first response (Guba, 1981;
Lincoln and Guba, 1985a) was to devise criteria that parallel those of the
conventional paradigm: internal validity, external validity, reliability, and
objectivity. Given a dearth of knowledge about how to apply rigor in the
naturalistic paradigm, using the conventional criteria as analogs or
metaphoric counterparts was a possible and useful place to begin. Further-
more, developing such criteria built on the two-hundred-year experience of
positivist social science.
These criteria are intended to respond to four basic questions (roughly,
those concerned with truth value, applicability, consistency, and neutrality),
and they can also be answered within naturalism’s bounds, albeit in different
terms. Thus, we have suggested credibility as an analog to internal validity,
transferability as an analog to external validity, dependability as an analog to
reliability, and confirmability as an analog to objectivity. We shall refer
to these criteria as criteria of trustworthiness (itself a parallel to the term rigor).
Techniques appropriate either to increase the probability that these
criteria can be met or to actually test the extent to which they have been
met have been reasonably well explicated, most recently in Lincoln and
Guba (1985a). They include:

For credibility:

• Prolonged engagement—lengthy and intensive contact with the phenomena
(or respondents) in the field to assess possible sources of distortion
and especially to identify saliencies in the situation
• Persistent observation—in-depth pursuit of those elements found to be
especially salient through prolonged engagement
• Triangulation (cross-checking) of data—by use of different sources, meth-
ods, and at times, different investigators
• Peer debriefing—exposing oneself to a disinterested professional peer to
“keep the inquirer honest,” assist in developing working hypotheses,
develop and test the emerging design, and obtain emotional catharsis
• Negative case analysis—the active search for negative instances relating
to developing insights and adjusting the latter continuously until no
further negative instances are found; assumes an assiduous search
• Member checks—the process of continuous, informal testing of information
by soliciting reactions of respondents to the investigator’s reconstruction of
what he or she has been told or otherwise found out and to the construc-
tions offered by other respondents or sources, and a terminal, formal testing
of the final case report with a representative sample of stakeholders.

For transferability:

• Thick descriptive data—narrative developed about the context so that
judgments about the degree of fit or similarity may be made by others
who may wish to apply all or part of the findings elsewhere (although it
is by no means clear how “thick” a thick description needs to be, as
Hamilton, personal communication, 1984, has pointed out).

For dependability and confirmability:

• An external audit requiring both the establishment of an audit trail and
the carrying out of an audit by a competent external, disinterested auditor
(the process is described in detail in Lincoln and Guba, 1985a). That part
of the audit that examines the process results in a dependability judgment,
while that part concerned with the product (data and reconstructions)
results in a confirmability judgment.

While much remains to be learned about the feasibility and utility
of these parallel criteria, there can be little doubt that they represent a
substantial advance in thinking about the rigor issue. Nevertheless, there
are some major difficulties with them that call out for their augmentation
with new criteria rooted in naturalism rather than simply paralleling those
rooted in positivism.
First, the parallel criteria cannot be thought of as a complete set because
they deal only with issues that loom important from a positivist construction.
The positivist paradigm ignores or fails to take into account precisely those
problems that have most plagued evaluation practice since the mid 1960s:
multiple value structures, social pluralism, conflict rather than consensus,
accountability demands, and the like. Indeed, the conventional criteria refer
only to methodology and ignore the influence of context. They are able to do
so because by definition conventional inquiry is objective and value-free.
Second, intuitively one suspects that if the positivist paradigm did
not exist, other criteria might nevertheless be generated directly from
naturalist assumptions. The philosophical and technical problem might be
phrased thus: Given a relativist ontology and an interactive, value-bounded
epistemology, what might be the nature of the criteria that ought to charac-
terize a naturalistic inquiry? If we reserve the term rigor to refer to posi-
tivism’s criteria and the term reliability to refer to naturalism’s parallel
criteria, we propose the term authenticity to refer to these new, embedded,
intrinsic naturalistic criteria.
Unique Criteria of Authenticity. We must at once disclaim having
solved this problem. What follows are simply some strong suggestions that
appear to be worth following up at this time. One of us (Guba, 1981)
referred to the earlier attempt to devise reliability criteria as “primitive”; the
present attempt is perhaps even more aboriginal. Neither have we as yet
been able to generate distinct techniques to test a given study for adherence
to these criteria. The reader should therefore regard our discussion as spec-
ulative and, we hope, heuristic. We have been able to develop our ideas
of the first criterion, fairness, in more detail than the other four; its longer
discussion ought not to be understood as meaning, however, that fairness
is very much more important than the others.
Fairness. If inquiry is value-bound, and if evaluators confront a situa-
tion of value-pluralism, it must be the case that different constructions will
emerge from persons and groups with differing value systems. The task
of the evaluation team is to expose and explicate these several, possibly
conflicting, constructions and value structures (and of course, the evalua-
tors themselves operate from some value framework).
Given all these differing constructions, and the conflicts that will almost
certainly be generated from them by virtue of their being rooted in value
differences, what can an evaluator do to ensure that they are presented,
clarified, and honored in a balanced, even-handed way, a way that the several
parties would agree is balanced and even-handed? How do evaluators go
about their tasks in such a way that can, while not guaranteeing balance
(since nothing can), at least enhance the probability that balance will be well
approximated?
If every evaluation or inquiry serves some social agenda (and it invari-
ably does), how can one conduct an evaluation to avoid, at least probabilis-
tically, the possibility that certain values will be diminished (and their holders
exploited) while others will be enhanced (and their holders advantaged)?
The problem is that of trying to avoid empowering at the expense of impov-
erishing; all stakeholders should be empowered in some fashion at the
conclusion of an evaluation, and all ideologies should have an equal chance
of expression in the process of negotiating recommendations.
Fairness may be defined as a balanced view that presents all construc-
tions and the values that undergird them. Achieving fairness may be accom-
plished by means of a two-part process. The first step in the provision of
fairness or justice is the ascertaining and presentation of different value and
belief systems represented by conflict over issues. Determination of the
actual belief system that undergirds a position on any given issue is not
always an easy task, but exploration of values when clear conflict is evident
should be part of the data-gathering and data-analysis processes (especially
during, for instance, the content analysis of individual interviews).
The second step in achieving the fairness criterion is the negotiation of
recommendations and subsequent action, carried out with stakeholding
groups or their representatives at the conclusion of the data-gathering,
analysis, and interpretation stages of evaluation effort. These three stages
are in any event simultaneous and interactive within the naturalistic para-
digm. Negotiation has as its basis constant collaboration in the evaluative
effort by all stakeholders; this involvement is continuous, fully informed
(in the consensual sense), and operates between true peers. The agenda for
this negotiation (the logical and inescapable conclusion of a true collabo-
rative evaluation process), having been determined and bounded by all
stakeholding groups, must be deliberated and resolved according to rules of
fairness. Among the rules that can be specified, the following seem to be the
absolute minimum.

1. A negotiation must have the following characteristics:

a. It must be open, that is, carried out in full view of the parties or their
representatives with no closed sessions, secret codicils, or the like
permitted.
b. It must be carried out by equally skilled bargainers. In the real world
it will almost always be the case that one or another group of bar-
gainers will be the more skillful, but at least each side must have
access to bargainers of equal skill, whether they choose to use them
or not. In some instances, the evaluator may have to act not only as
mediator but as educator of those less skilled bargaining parties, offer-
ing additional advice and counsel that enhances their understanding
of broader issues in the process of negotiation. We are aware that this
comes close to an advocacy role, but we have already presumed that
one task of the evaluator is to empower previously impoverished
bargainers; this role should probably not cease at the negotiation stage
of the evaluation.
c. It must be carried out from equal positions of power. The power must
be equal not only in principle but also in practice; the power to sue a
large corporation in principle is very different from the power to sue it
in practice, given the great disparity of resources, risk, and other factors,
including, of course, more skillful and resource-heavy bargainers.
d. It must be carried out under circumstances that allow all sides to pos-
sess equally complete information. There is no such animal, of course,
as “complete information,” but each side should have the same infor-
mation, together with assistance as needed to be able to come to an
equal understanding of it. Low levels of understanding are tantamount
to lack of information.
e. It must focus on all matters known to be relevant.
f. It must be carried out in accordance with rules that were themselves
the product of a pre-negotiation.
2. Fairness requires the availability of appellate mechanisms should one or
another party believe that the rules are not being observed by some.
These mechanisms are another of the products of the pre-negotiation
process.
3. Fairness requires fully informed consent with respect to any evaluation
procedures (see Lincoln and Guba, 1985a, and Lincoln and Guba, 1985b).
This consent is not only obtained prior to an evaluation effort but is also
continually renegotiated and reaffirmed (formally with consent forms
and informally through the establishment and maintenance of trust and
integrity between parties to the evaluation) as the design unfolds, new
data are found, new constructions are made, and new contingencies are
faced by all parties.
4. Finally, fairness requires the constant use of the member-check process,
defined earlier, which includes calls for comments on fairness, and which
is utilized both during and after the inquiry process itself (in the data
collection-analysis-construction stage and later when case studies are
being developed). Vigilant and assiduous use of member-checking should
build confidence in individuals and groups and should lead to a perva-
sive judgment about the extent to which fairness exists.

Fairness as a criterion of adequacy for naturalistic evaluation is less
ambiguous than the following four, and more is known about how to
achieve it. It is not that this criterion is more easily achieved, merely that it
has received more attention from a number of scholars (House, 1976;
Lehne, 1978; Strike, 1982; see also Guba and Lincoln, 1985).
Ontological Authentication. If each person’s reality is constructed and
reconstructed as that person gains experience, interacts with others,
and deals with the consequences of various personal actions and beliefs, an
appropriate criterion to apply is that of improvement in the individual’s (and
group’s) conscious experiencing of the world. What have sometimes been
termed false consciousness (a neo-Marxian term) and divided consciousness are
part and parcel of this concept. The aim of some forms of disciplined inquiry,
including evaluation (Lincoln and Guba, 1985b), ought to be to raise
consciousness, or to unite divided consciousness, likely via some dialectical
process, so that a person or persons (not to exclude the evaluator) can
achieve a more sophisticated and enriched construction. In some instances,
this aim will entail the realization (the “making real”) of contextual shaping
that has had the effect of political, cultural, or social impoverishment; in
others, it will simply mean the increased appreciation of some set of com-
plexities previously not appreciated at all, or appreciated only poorly.
Educative Authentication. It is not enough that the actors in some contexts
achieve, individually, more sophisticated or mature constructions, or those
that are more ontologically authentic. It is also essential that they come to
appreciate (apprehend, discern, understand)—not necessarily like or agree
with—the constructions that are made by others and to understand how those
constructions are rooted in the different value systems of those others. In this
process, it is not inconceivable that accommodations, whether political, strate-
gic, value-based, or even just pragmatic, can be forged. But whether or not that
happens is not at issue here; what the criterion of educative validity implies
is increased understanding of (including possibly a sharing, or sympathy
with) the whats and whys of various expressed constructions. Each stakeholder
in the situation should have the opportunity to become educated about others
of different persuasions (values and constructions), and hence to appreciate
how different opinions, judgments, and actions are evoked. And among those
stakeholders will be the evaluator, not only in the sense that he or she will
emerge with “findings,” recommendations, and an agenda for negotiation that
are professionally interesting and fair but also that he or she will develop a
more sophisticated and complex construction (an emic-etic blending) of both
personal and professional (disciplinary-substantive) kinds.
How one knows whether or not educative authenticity has been reached
by stakeholders is unclear. Indeed, in large-scale, multisite evaluations, it may
not be possible for all—or even for more than a few—stakeholders to achieve
more sophisticated constructions. But the techniques for ensuring that stake-
holders do so even in small-scale evaluations are as yet undeveloped. At a
minimum, however, the evaluator’s responsibility ought to extend to ensuring
that those persons who have been identified during the course of
the evaluation as gatekeepers to various constituencies and stakeholding
audiences have the opportunity to be “educated” in the variety of
perspectives and value systems that exist in a given context.
By virtue of the gatekeeping roles that they already occupy, gatekeep-
ers have influence and access to members of stakeholding audiences.
As such, they can act to increase the sophistication of their respective
constituencies. The evaluator ought at least to make certain that those from
whom he or she originally sought entrance are offered the chance to
enhance their own understandings of the groups they represent. Various
avenues for reporting (slide shows, filmstrips, oral narratives, and the like)
should be explored for their profitability in increasing the consciousness
of stakeholders, but at a minimum the stakeholders’ representatives and
gatekeepers should be involved in the educative process.
Catalytic Authentication. Reaching new constructions, achieving under-
standings that are enriching, and achieving fairness are still not enough.
Inquiry, and evaluations in particular, must also facilitate and stimulate
action. This form of authentication is sometimes known as feedback-action
validity. It is a criterion that might be applied to conventional inquiries and
evaluations as well, although if it were, virtually all positivist social action
inquiries and evaluations would fail on it. The call for getting “theory into
action”; the preoccupation in recent decades with “dissemination” at the
national level; the creation and maintenance of federal laboratories, centers,
and dissemination networks; the non-utilization of evaluations; the notable
inaction subsequent to evaluations that is virtually a national scandal—all
indicate that catalytic authentication has been singularly lacking. The nat-
uralistic posture that involves all stakeholders from the start, that honors
their inputs, that provides them with decision-making power in guiding the
evaluation, that attempts to empower the powerless and give voice to
the speechless, and that results in a collaborative effort holds more promise
for eliminating such hoary distinctions as basic versus applied and theory
versus practice.
Tactical Authenticity. Stimulation to action via catalytic authentication
is in itself no assurance that the action taken will be effective, that is, will
result in a desired change (or any change at all). The evaluation of inquiry
requires other attributes to serve this latter goal. Chief among these is the
matter of whether the evaluation is empowering or impoverishing, and to
whom. The first step toward empowerment is taken by providing all persons
at risk or with something at stake in the evaluation with the opportunity
to control it as well (to move toward creating collaborative negotiation).
It provides practice in the use of that power through the negotiation of
construction, which is joint emic-etic elaboration. It goes without saying that
if respondents are seen simply as “subjects” who must be “manipulated,”
channeled through “treatments,” or even deceived in the interest of some
higher “good” or “objective” truth, an evaluation or inquiry cannot possibly
have tactical authenticity. Such a posture could only be justified from the
bedrock of a realist ontology and an “objective,” value-free epistemology.

Summary
All five of these authenticity criteria clearly require more detailed explica-
tion. Strategies or techniques for meeting and ensuring them largely remain
to be devised. Nevertheless, they represent an attempt to meet a number of
criticisms and problems associated with evaluation in general and natural-
istic evaluation in particular. First, they address issues that have pervaded
evaluation for two decades. As attempts to meet these enduring problems,
they appear to be as useful as anything that has heretofore been suggested
(in any formal or public sense).
Second, they are responsive to the demand that naturalistic inquiry
or evaluation not rely simply on parallel technical criteria for ensuring
reliability. While the set of additional authenticity criteria might not be the
complete set, it does represent what might grow from naturalistic inquiry
were one to ignore (or pretend not to know about) criteria based on the
conventional paradigm. In that sense, authenticity criteria are part of an
inductive, grounded, and creative process that springs from immersion with
naturalistic ontology, epistemology, and methodology (and the concomitant
attempts to put those axioms and procedures into practice).
Third, and finally, the criteria are suggestive of the ways in which new
criteria might be developed; that is, they are addressed largely to ethical and
ideological problems, problems that increasingly concern those involved in
social action and in the schooling process. In that sense, they are confluent
with an increasing awareness of the ideology-boundedness of public life and
the enculturation processes that serve to empower some social groups
and classes and to impoverish others. Thus, while at first appearing to be
radical, they are nevertheless becoming mainstream. An invitation to join
the fray is most cheerfully extended to all comers.

References
Guba, E. G. “Criteria for Assessing the Trustworthiness of Naturalistic Inquiries.”
Educational Communication and Technology Journal, 1981, 29, 75–91.
Guba, E. G., and Lincoln, Y. S. “Do Inquiry Paradigms Imply Inquiry Methodologies?” In
D. L. Fetterman (Ed.), The Silent Scientific Revolution. Beverly Hills, Calif.: Sage, in press.
Guba, E. G., and Lincoln, Y. S. Effective Evaluation: Improving the Usefulness of Evaluation
Results Through Responsive and Naturalistic Approaches. San Francisco: Jossey-Bass, 1981.
Guba, E. G., and Lincoln, Y. S. “The Countenances of Fourth Generation Evaluation:
Description, Judgment, and Negotiation.” Paper presented at Evaluation Network
annual meeting, Toronto, Canada, 1985.
House, E. R. “Justice in Evaluation.” In G. V. Glass (Ed.), Evaluation Studies Review
Annual, no. 1. Beverly Hills, Calif.: Sage, 1976.
Lehne, R. The Quest for Justice: The Politics of School Finance Reform. New York:
Longman, 1978.
Lincoln, Y. S., and Guba, E. G. Naturalistic Inquiry. Beverly Hills, Calif.: Sage, 1985a.
Lincoln, Y. S., and Guba, E. G. “Ethics and Naturalistic Inquiry.” Unpublished manu-
script, University of Kansas, 1985b.
Morgan, G. Beyond Method: Strategies for Social Research. Beverly Hills, Calif.: Sage, 1983.
Strike, K. Educational Policy and the Just Society. Champaign: University of Illinois
Press, 1982.

At the time of publication YVONNA S. LINCOLN was an associate professor of
higher education in the Educational Policy and Administration Department,
School of Education, the University of Kansas. EGON G. GUBA was a professor
of educational inquiry methodology in the Department of Counseling and
Educational Psychology, School of Education, Indiana University. They have
jointly authored two books, Effective Evaluation and Naturalistic Inquiry,
which sketch the assumptional basis for naturalistic inquiry and its application
to the evaluation arena. They have also collaborated with others on a third book,
Organizational Theory and Inquiry, Sage, 1985.