Realist synthesis: an introduction
Ray Pawson
Trisha Greenhalgh
Gill Harvey
Kieran Walshe
Authors:
Ray Pawson
Reader in Social Research Methodology
University of Leeds
Trisha Greenhalgh
Professor of Primary Health Care
University College London
Gill Harvey
Senior Lecturer in Healthcare and Public Sector Management
Manchester Centre for Healthcare Management
Kieran Walshe
Professor of Health Policy and Management
Manchester Centre for Healthcare Management
Contact details:
Ray Pawson.
Tel: 0113 2334419
e-mail: r.d.pawson@leeds.ac.uk
Acknowledgements
This working paper is an introduction to a new method of conducting systematic reviews of
the evidence base – namely, ‘realist synthesis’. The ideas herein have been bubbling away
for a number of years and will be presented in other forms as the brew matures. The aim in
this instance is to pull together in a detailed, accessible and comprehensive form an account
of the principles and practices that underlie the approach.
The authors would like to acknowledge the support of a variety of funders and funding
streams that helped the paper along to the present formulation. Pawson developed the basic
model as part of his fellowship on the ESRC Research Methods Programme, though the
ideas hark back to work he conducted at the ESRC UK Centre for Evidence Based Policy
and Practice. In this paper he joins forces with Greenhalgh, Harvey and Walshe, who
themselves have laboured long in the field of evidence-based healthcare and who provide
the substantive focus to this paper. This collaboration has been possible thanks to further
funding from the NHS Service Delivery Organisation and the Canadian Health Services
Research Foundation.
The paper presents an introductory overview of realist synthesis as applied to the review of
primary research on healthcare systems, and so responds to the requirements of these
assorted commissions. We hope, nevertheless, that the examples will be recognisable
enough to researchers trying to get to grips with the literature in such fields as social care,
welfare, education, environment, urban regeneration and criminal justice. Above all, we
believe that the methodological lessons are generic.
Rallying cries of this ilk have reverberated around the world, across government
departments, down through the corridors of power and onto the desks of managers and
researchers. Nowhere is the challenge of evidence-based policy as vexatious as when it is
called upon to inform and support the delivery of modern health services.
The problem is one of complexity. The health interventions in question are not singular
schemes or finite treatments but concern the design, implementation, management and
regulation of entire services. These services have a multiplicity of goals, many of them
relating to the fulfilment of long-term ambitions. By the same token, the evidence base for
health service decision making is also gargantuan. In getting to grips with so many activities
of so many actors, the seeker of evidence has to call on the entire repertoire of social
science and health services research. A review may thus involve a dissection of
experimental and quasi-experimental trials, process and developmental evaluations,
ethnographic and action research, documentary and content analysis, surveys and opinion
polls. Even this formidable list overlooks the pearls of wisdom to be found in the grey
literature, including administrative records, annual reports, legislative materials, conceptual
critique, personal testimony and so on.
This paper offers a new model of research synthesis that is compatible with the complexities
of modern health service delivery and sympathetic to the usage of a multi-method, multi-
disciplinary evidence base. It is based on the emerging ‘realist’ approach to evaluative
research. It cuts through complexity by focusing on the ‘theories’ that underlie social
interventions. Health service reforms are theories in the sense that they begin in the heads
of policy makers, pass into the hands of practitioners and managers and, sometimes, into
the hearts and minds of users and participants. Realist synthesis, understood at its simplest
level, is the process of gathering together existing evidence on the success (or otherwise) of
this journey.
Complexity is acknowledged throughout in the task of scouring the evidence base. The
success of an intervention theory is not simply a question of the merit of its underlying ideas
but depends, of course, on the individuals, interpersonal relationships, institutions and
infrastructures through which and in which the intervention is delivered. The hard slog of
realist synthesis is about building up a picture of how various combinations of such contexts
and circumstance can amplify or mute the fidelity of the intervention theory.
With its insistence that context is critical and that agents interact with and adapt to policies
and interventions, realist synthesis is sensitive to diversity and change in programme
delivery and development. Its fundamental purpose is to improve the thinking that goes into
service building. And in doing so, it provides a principled steer away from issuing misleading
‘pass/fail’ verdicts of entire families of interventions and away from failed ‘one-size-fits-all’
ways of responding to problems.
It is worth spelling out what we mean by complex social interventions, and why reviews of
their effectiveness requires a different approach. In the main body of the paper, we provide a
detailed worked example of one such intervention – the public disclosure of information
about the performance of healthcare professionals or organisations (‘league tables’). Seven
key characteristics should be considered:
• The intervention is a theory or theories – when performance league tables and the like
are published there is an implicit (and rarely stated) rationale about how they will affect
people and organisations (and hence how they will bring about change).
• The intervention involves the actions of people – so understanding human intentions
and motivations, what stakeholders know and how they reason, is essential to
understanding the intervention.
• The intervention consists of a chain of steps or processes – in our example, the
development of indicators, their publication and dissemination, the creation of sanctions
or incentives, and the response of those being measured. At each stage, the intervention
could work as expected or ‘misfire’ and behave differently.
• These chains of steps or processes are often not linear, and involve negotiation and
feedback at each stage. For example, healthcare organisations and professionals may
have to provide the data for performance measurement, and securing their cooperation
may involve a number of tradeoffs and distorting influences.
• Interventions are embedded in social systems and how they work is shaped by this
context. For example, publishing performance data for cardiac surgeons and for
psychiatrists may produce very different behaviours because of the different nature and
context of those services and specialties.
• Interventions are prone to modification as they are implemented. To attempt to ‘freeze’
the intervention and keep it constant would miss the point that this process of adaptation
and local embedding is an inherent and necessary characteristic. It means that different
applications of the ‘same’ intervention (such as publishing performance league tables)
will often be different in material ways.
• Interventions are open systems and change through learning as stakeholders come
to understand them. For example, once performance measures are put in place and
published, those being measured soon learn to ‘game’ or optimise the way they score,
and the developers of the measures have to respond by changing the system to prevent
such gaming distorting the process and intended effects of measurement.
In short, social interventions are complex systems thrust amidst complex systems. Attempts
to measure ‘whether they work’ using the conventional armoury of the systematic reviewer
will always end up with the homogenised answer ‘to some extent’ and ‘sometimes’, but this
is of little use to policy makers or practitioners because it provides no clue as to why the
interventions sometimes work and sometimes don’t, or in what circumstances or conditions
they are more or less likely to work, or what can be done to maximise their chances of
success and minimise the risk of failure.
Realist review is part of a wider family of ‘theory driven’ approaches to evaluation. The core
principle is that we should make explicit the underlying assumptions about how an
intervention is supposed to work (this is what we call the ‘programme theory’), and should
then go about gathering evidence in a systematic way to test and refine this theory. Rather
than seeking generalisable lessons or universal truths, it recognises and directly addresses
the fact that the ‘same’ intervention never gets implemented identically and never has the
same impact, because of differences in the context, setting, process, stakeholders and
outcomes. Instead, the aim of realist review is explanatory – ‘what works for whom, in what
circumstances, in what respects, and how?’.
The main steps in a realist review are summarised in figure A on the next page, which draws
a contrast between the ‘conventional’ approach (on the left) and the ‘realist’ (on the right).
Four essential characteristics of this approach to review should be highlighted:
• The initial stage in which the scope of the review is defined involves a negotiation
with the commissioners or decision makers intended to ‘unpick’ their reasons for needing
the review and understand how it will be used. It also involves a careful dissection of the
theoretical underpinnings of the intervention, using the literature in the first instance not
to examine the empirical evidence but to map out in broad terms the conceptual and
theoretical territory.
• The subsequent search for and appraisal of evidence is then undertaken to ‘populate’
this theoretical framework with empirical findings, using the theoretical framework as the
construct for locating, integrating, comparing and contrasting empirical evidence. The
search for evidence is a purposive one, and its progress is shaped by what is found.
When theoretical saturation in one area is reached, and no significant new findings are
emerging, searching can stop. It would not be desirable or feasible to attempt to build a
‘census’ of all the evidence that might be relevant.
• The process is, within each stage and between stages, iterative. There is a constant
to-ing and fro-ing as new evidence both changes the direction and focus of searching
and opens up new areas of theory.
• The results of the review and synthesis combine both theoretical thinking and
empirical evidence, and are focused on explaining how the intervention being studied
works in ways that enable decision makers to use this understanding and apply it to their
own particular contexts. The commissioners or decision makers are closely involved in
shaping the conclusions and recommendations to be drawn from the review.
When a realist review is undertaken, the high degree of engagement it involves for policy
makers and decision makers should make the communication of its key findings and
conclusions easier. But the aim is not an instrumental one, that the review should lead to an
immediate change in a given programme. That happens sometimes, but a realist review is
more likely to contribute to policy makers’ and practitioners’ ‘sense-making’ – the way they
understand and interpret the situations they encounter and the interventions they deploy.
The aim is therefore to bring about a longer term and more sustained shift in their thinking, in
which research results play their part alongside other legitimate influences like ideologies
and social values.
Figure A: The main steps in a realist review

Define the scope of the review
  Identify the question:
  • What is the nature and content of the intervention?
  • What are the circumstances or context for its use?
  • What are the policy intentions or objectives?
  • What are the nature and form of its outcomes or impacts?
  • Undertake exploratory searches to inform discussion with review commissioners/decision makers
  Clarify the purpose(s) of the review:
  • Theory integrity – does the intervention work as predicted?
  • Theory adjudication – which theories about the intervention seem to fit best?
  • Comparison – how does the intervention work in different settings, for different groups?
  • Reality testing – how does the policy intent of the intervention translate into practice?
  Find and articulate the programme theories:
  • Search for relevant theories in the literature
  • Draw up ‘long list’ of programme theories
  • Group, categorise or synthesise theories
  • Design a theoretically based evaluative framework to be ‘populated’ with evidence

Search for and appraise the evidence
  Search for the evidence:
  • Decide and define purposive sampling strategy
  • Define search sources, terms and methods to be used (including cited reference searching)
  • Set the thresholds for stopping searching at saturation
  Appraise the evidence:
  • Test relevance – does the research address the theory under test?
  • Test rigour – does the research support the conclusions drawn from it by the researchers or the reviewers?

Extract and synthesise findings
  Extract the results:
  • Develop data extraction forms or templates
  • Extract data to populate the evaluative framework with evidence
  Synthesise findings:
  • Compare and contrast findings from different studies
  • Use findings from studies to address purpose(s) of review
  • Seek both confirmatory and contradictory findings
  • Refine programme theories in the light of evidence

Draw conclusions and make recommendations
  • Involve commissioners/decision makers in review of findings
  • Draft and test out recommendations and conclusions based on findings with key stakeholders
  • Disseminate review with findings, conclusions and recommendations
Introduction
Realist review is a relatively new strategy for synthesising research, which has an
explanatory rather than judgemental focus. Specifically, it seeks to ‘unpack the mechanism’
of how complex programmes work (or why they fail) in particular contexts and settings.
Realism has roots in philosophy, the social sciences, and evaluation, but is as yet largely
untried as an approach to the synthesis of evidence in healthcare and other policy arenas in
which programmes are delivered through an intricate institutional apparatus. We believe that
it fills an important methodological need, long identified by health service decision makers,
for a synthesis method that can cope effectively with management and service delivery
interventions. Compared with the literature on clinical treatments, which are conceptually
simple and typically evaluated in randomised controlled trials, the literature on service interventions is
epistemologically complex and methodologically diverse. As such, it presents additional
challenges for the reviewer. The time is long overdue for developing distinct ways of drawing
the evidence together.
Throughout this paper, we have tried to support our arguments with reference to real policy
issues that raise practical questions for the reviewer. Because realist review is especially
appropriate for multi-component, multi-site, multi-agent interventions, we have deliberately
used complex examples. In particular, we present a detailed ‘work-up’ of a review on the
public disclosure of performance data (Marshall et al, 2000). This example forms a thread
through which the different aspects of the method can be illustrated and (hopefully)
deciphered. It is important to make clear that Marshall and colleagues did not operate from
the realist fold, though they made an important step towards it in unearthing and analysing
the evidence in respect to the theories that underpinned the introduction of hospital league
tables. Our example is, indubitably, a reworking of the original.
New research methods are never invented ab ovo; they never proceed from scratch. Rather,
they codify and formalise methods that are already being used, if somewhat instinctively and
pragmatically. They only have meaning and authority if they carry a sense of recognition in
the minds of those who have wrestled with the everyday practicalities of research. In some
respects, realist review is a way of adding rigour and structure to what has been called the
‘old fashioned narrative review’ which, if approached in a scholarly fashion, was able to
present highly detailed and reasoned arguments about the mechanisms of programme
success or failure and about the apparently conflicting results of ‘similar’ studies. It is for this
reason we are pleased to be able to make use of the Marshall review. Readers interested in
a synthesis conducted squarely and avowedly in realist mode might like to consult Pawson’s
(2004) review of mentoring programmes, which was also carried out under the auspices of
the Research Methods Programme.
It is the pathway leading from the application of realism to evaluation (and the ideas of the
last group of authors) that we will pursue here. But its wealth of applications in the range of
disciplines listed above reinforces the point that realism is not a research technique as such.
Rather, it is a logic of inquiry that generates distinctive research strategies and designs, and
then utilises available research methods and techniques within these.
The quest to understand ‘what works?’ in social interventions is, at root, a matter of trying
to establish causal relationships, and the hallmark of realist inquiry is its distinctive
‘generative’ understanding of causality. This is most easily explained by drawing a contrast
with the ‘successionist’ model, which underpins clinical trials. On the latter account what is
needed to infer causation is the ‘constant conjunction’ of events: when the cause X is
switched on (experiment) effect Y follows, and when the cause is absent (control) no effect is
observed. The generative model calls for a more complex and systemic understanding of
connectivity. It says that to infer a causal outcome (O) between two events (X and Y) one
needs to understand the underlying generative mechanism (M) that connects them and the
context (C) in which the relationship occurs.
To use a physical science example, researchers would not claim that repeated observations
of the application of a spark (X) to gunpowder and the subsequent explosions (Y) was a
sufficient base on which to understand the causal relationship. Rather the connection (O) is
established by what they know about the chemical composition of gunpowder and its
instability when heat is applied (M). They also know that this mechanism is not always fired
and that the explosion depends on other contextual features (C) such as the presence of
oxygen and the absence of dampness.
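This explanatory schema is often condensed in the realist literature into a single summarising formula (a heuristic rather than an algebraic claim):

    context (C) + mechanism (M) = outcome (O)

In the gunpowder illustration: dry powder in the presence of oxygen (C), plus the chemical instability of the compound when heat is applied (M), yields the explosion (O). Remove either the context or the mechanism and the spark produces no regular effect.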
This explanatory formula has been used prospectively (in formative evaluations) and
concurrently (in summative evaluations) and this paper shows how it may be operated
retrospectively (in research synthesis). The realist approach, moreover, has no particular
preference for either quantitative or qualitative methods. Indeed it sees merit in multiple
methods, marrying the quantitative and qualitative, so that both the processes and impacts
of interventions may be investigated. The precise balance of methods to be used is selected
in accordance with the realist hypothesis being tested, and with the available data. A handy,
downloadable overview of the different styles of realist evaluation may be found in Pawson
and Tilley (2004, appendix A). When we come to exploring the details of realist synthesis we
shall see that this same preference for multi-method inquiry is retained.
Realist evaluation is often, and quite properly, associated with the ‘theory-driven’ family of
evaluation methodologies (Chen and Rossi, 1992; Bickman, 1987; Connell et al, 1995;
Weiss, 1997; Rogers et al, 2000). The core principle of the theory-driven approach is to
make explicit the underlying assumptions about how an intervention is supposed to work –
that is, to search out a ‘programme theory’ or mechanism-of-action – and then to use this
theory to guide evaluation.
It is perhaps surprising that reviewers rarely ask themselves about the mechanism by which
they expect the learning about interventions to accumulate. The reviewer’s task is to bring
together the findings from many different inquiries, but what is the line of development? In
what respects does knowledge grow? Equally one could confront the policy maker with a
similar question: What are you expecting to get from a review? What is the nature of the
guidance that you anticipate?
There are in fact quite different ways of contemplating and answering such basic questions.
The most traditional reply from the policy maker might well be ‘A review should tell me the
interventions that work best.’ This pragmatic objective is reflected in some forms of meta-
analysis, which perceive that transferable knowledge is achieved through ‘heterogeneous
replication’ (Shadish et al, 1991, pp363-365). According to this principle, the review should
seek out enduring empirical generalisations so as to discover (with a view to replicating)
those interventions likely to have lasting effect across many different applications and
populations.
An alternative response by the policy maker might be ‘A review should find me a list of
generalisable principles of any effective programme of this kind, and I will then try to design
them into the initiative I am planning’. Transferable knowledge is thus achieved through what
is known as ‘proximal similarity’ (Shadish et al, 1991, pp363-365). This approach
acknowledges much more variability in the implementation of programmes and services.
Accordingly, the goal of the review is to produce a sort of recipe, a list of the vital ingredients
that appear to be needed for an intervention to be successful.
Realist review upholds neither of these goals. The reason for avoiding ‘best buys’ and
‘exemplary cases’ will become clearer in Section 2. Briefly, when it comes to the delivery of
complex programmes and services, the ‘same’ intervention never gets implemented in an
identical manner and even if it did, the particular recipe for success gained in one setting
might not be transferable to a different social and institutional setting. Partly as a reaction to
the unedifying search for policy panaceas, the ultimate realist goal is always explanatory.
Realist evaluation asks of a programme, ‘What works for whom in what circumstances, in
what respects and how?’ Realist review carries exactly the same objective, namely
programme theory refinement. What the policy maker should expect is knowledge of some
of the many choices to be made in delivering a particular service and some insight into why
they have succeeded and/or failed in previous incarnations. Captured as a pithy policy
maker’s demand, the task might be expressed so: ‘Show me the options and explain the
main considerations I should take into account in choosing between them’.
Methods of systematic review and meta-analysis are much better developed for pooling
research results originating from the ‘clinical treatment’ end of this spectrum, and there is
grave danger in assuming that research strategies developed for such syntheses will have
utility elsewhere. We pursue this critique no further here, though the reader might usefully be
referred to Pawson (2002a). The key task is to match review method to subject matter, and
the purpose of the remainder of this section is to capture some of the essential features of
non-clinical, service delivery interventions.
Let us begin with a rather perky example of an intervention hypothesis. Some health
education theories blame the unhealthy lifestyles of adolescents on the influence of
unhealthy role models created by film, soap and rock stars. This has led to the programme
theory of trying to insinuate equally attractive but healthier role models (e.g. sports stars) into
prominent places in the teen media. Such a conjecture, known amongst denizens of health
education as ‘Dishy David Beckham theory’, runs risks in both diagnosis and remedy.
Teenagers are indeed happy to pore over pictures of Beckham and friends, but the evidence
to date suggests that no associated change towards a healthier lifestyle occurs (Mitchell,
1997).
This example illustrates the first principle of realist review. Broadly speaking, we should
expect reviews to pick up, track and evaluate the programme theories that implicitly or
explicitly underlie families of interventions.
And so it is with the vast majority of programme incentives, management strategies, service
delivery changes, and so on. The fact that policy is delivered through active interventions to
active participants has profound implications for research method. In clinical trials, human
volition is seen as a contaminant. The experimental propositions under test relate to whether
the treatment (and the treatment alone) is effective. As well as random allocation of
participants, safeguards such as the use of ‘placebos’ and ‘double blinding’ are utilised to
protect this causal inference. The idea is to remove any shred of human intentionality from
the investigation. Active programmes, by contrast, only work through the stakeholders’
reasoning, and knowledge of that reasoning is integral to understanding their outcomes.
This feature illustrates the second principle of research synthesis. Broadly speaking, we
should expect that in tracking the successes and failures of interventions, reviewers will find
at least part of the explanation in terms of the reasoning and personal choices of different
actors and participants.
Let us introduce our main example: the policy of public disclosure of information on
performance (hospital star ratings, surgeon report cards, and so on). There are several
distinct stages and stakeholders to work through for such an intervention to take effect. The
first stage is ‘problem identification’, in which the performance in question is measured,
rated, and ranked. The second is ‘public disclosure’ in which information on differential
performance is disclosed, published, and disseminated. The third is ‘sanction instigation’ in
which the broader community acts to boycott, censure, reproach or control the under-
performing party. The fourth might be called ‘miscreant response’ in which failing parties are
shamed, chastised, made contrite, and so improve performance in order to be reintegrated.
The key point is that the different theories underlying this series of events are all fallible. The
intended sequence above may misfire at any point, leading to unintended outcomes as
depicted in Figure 1. The initial performance measure may amount to ‘problem
misidentification’ if it is, for instance, not properly risk adjusted. Dissemination may amount
to ‘dissimulation’ if the data presented to the public are oversimplified or exaggerated. Wider
public reactions may take the form of ‘apathy’ or ‘panic’ rather than reproach. And rather
than being shamed into pulling up their socks, named individuals or institutions may attempt
to resist, reject, ignore or actively discredit the official labelling.
This illustrates the third principle of realist review. Broadly speaking, we should expect
reviews to inspect the integrity of the implementation chain, examining which intermediate
outputs need to be in place for successful outcomes to occur, and noting and examining the
flows and blockages and points of contention.
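Reviewers who want to keep such a chain in view while sifting evidence sometimes find it helpful to tabulate each step against its intended and unintended outcomes. The following minimal sketch in Python is purely an organising device – nothing in the realist method prescribes it – and it simply encodes the public disclosure chain described above:

    # Illustrative only: each step of an implementation chain pairs the
    # intended programme theory with its potential 'misfire'.
    from dataclasses import dataclass

    @dataclass
    class ChainStep:
        stage: str      # step in the implementation chain
        intended: str   # what the programme theory expects to happen
        misfire: str    # how the step may behave differently in practice

    public_disclosure_chain = [
        ChainStep("problem identification",
                  "performance measured, rated and ranked",
                  "problem misidentification (e.g. no risk adjustment)"),
        ChainStep("public disclosure",
                  "information published and disseminated",
                  "dissimulation (oversimplified or exaggerated data)"),
        ChainStep("sanction instigation",
                  "community boycotts, censures or restrains under-performers",
                  "apathy or panic rather than reproach"),
        ChainStep("miscreant response",
                  "shamed parties improve in order to be reintegrated",
                  "resistance, rejection or discrediting of the label"),
    ]

    # The reviewer's question at each link: which intermediate outputs must
    # be in place for the next step to fire as intended?
    for step in public_disclosure_chain:
        print(f"{step.stage}: intended -> {step.intended}; misfire -> {step.misfire}")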
There are several modes whereby a top-down intervention becomes, in some respects,
bottom-up. The most obvious is the negotiation between stakeholders at every transaction
within a scheme. If we return to the hospital rating example, we see a struggle between
professional associations and management authorities about the fairness of the indicators
(on the need for risk-adjusted and value-added indicators, etc). The actual intervention takes
shape according to the punching power of the respective parties. We depict this in Figure 2
by applying dotted, double heads to some of the arrows in a typical implementation chain.
This illustrates the fourth principle of realist review. Broadly speaking, we should expect the
review to examine how the relative influence of different parties is able to affect and direct
implementation.
Take, for example, school-based sex education for teenagers, which a policy maker may be
thinking of introducing with the goal of reducing unwanted teenage pregnancy and sexually
transmitted diseases. Any proposed scheme will consist of a theory about how the
intervention is assumed to work – for example, that education provides knowledge about
specific risks and strategies for reducing them – which in turn changes both personal
motivation and risk-taking behaviour, which in turn reduces adverse outcomes. The theory is
presented to stakeholders as a set of new resources: for example, a policy statement
providing the underpinning values and mission; a defined list of knowledge objectives and
skills-based activities; a training programme for the staff intended to deliver this package,
perhaps provided by local public health experts; a reallocation of curriculum time to
accommodate the initiative; and plans for evaluation and audit.
Of course, the theory about how school-based sex education will reduce adverse outcomes
may be fundamentally flawed at a number of the above stages, as we demonstrated in
Section 1.23 in relation to the example of public disclosure of performance data. But even if
it were not, whether the new policy will succeed in practice also depends critically on the
setting into which it will be introduced. The ‘same’ sex education package will unfold very
differently in a progressive suburban arts college than in a single-sex Catholic boarding
school or a ‘failing’ inner city comprehensive with 20% of its staff off sick with stress.
To summarise, as well as the integrity of the programme theory, four additional contextual
factors should be considered:
(a) The individual capacities of the key actors and stakeholders. In the above example,
do the teachers and support staff have the interest, attitudes, capability and credibility
with pupils to play an effective part in the intervention?
(b) The interpersonal relationships required to support the intervention. Are lines of
communication, management and administrative support, union agreements, and
professional contracts supportive or constraining to the delivery of sex education by
teaching staff?
(c) The institutional setting. Do the culture, charter, and ethos of the school support a
sex education intervention (specifically, how well do they chime with the one
proposed)? Is there clear and supportive leadership from top management (in this
case, from the head teacher and board of governors)?
(d) The wider infra-structural and welfare system. Are there political support and funding
resources to support the intervention? Are there influential lobbies – for example from
religious organisations or gay rights campaigners – that will bring pressure to bear on
the implementation of this policy locally? Is sex education legally required or
otherwise sanctioned?
These layers of contextual influence on the efficacy of a programme are depicted in Figure
3. They represent the single greatest challenge to evidence-based policy. Generating
transferable lessons about interventions will always be difficult because they are never
embedded in the same structures.
This illustrates the fifth principle of realist review. Broadly speaking, we should expect the
‘same’ intervention to meet with both success and failure (and all points in between), when
applied in different contexts and settings. The reviewer must contextualise any differences
found between primary studies in terms of (for example) policy timing, organisational culture
and leadership, resource allocation, staffing levels and capabilities, interpersonal
relationships, and competing local priorities and influences.
[Figure 3: nested layers of contextual influence on an intervention – individuals, interpersonal relations, institution and wider infrastructure]
The reason for this is all too obvious. Practitioners and managers implement change and in
the process of doing so, talk to each other. When it comes to putting flesh on the bones of
an intervention strategy, practitioners will consult with colleagues and cross-fertilise ideas,
for example, when teachers from different local schools meet up at formal or informal events
locally. Especially when it comes to ironing out snags, there will be a considerable amount of
‘rubbernecking’ from scheme to scheme as stakeholders compare notes on solutions. For
example, if all local schools are required to implement a sex education package, word might
get around that separating boys from girls smooths the delivery of the intervention, and this
‘good idea’ will be taken up rapidly even if it was not part of the original protocol. More subtle
modifications will also be thrown into a rummage bin of ideas, from which they will be
retrieved and variously adapted by other stakeholders.
[Figure 4: Setting A and Setting B – the ‘same’ intervention unfolding differently across settings]
The key point here is that informal knowledge exchange about a scheme may sometimes
standardise it and may sometimes fragment it, but will always change it. Reviewers must
always beware of so-called ‘label naiveté’ (Øvretveit and Gustafson, 2002). The intervention
to be reviewed will carry a title and that title will speak to a general and abstract programme
theory. But that conjecture may not be the one that practitioners and managers have actually
implemented, nor the one that empirical studies have evaluated.
This illustrates the sixth principle of realist review. Broadly speaking, we should expect the
‘same’ intervention to be delivered in a mutating fashion. The reviewer should consider how
the outcomes are dynamically shaped by refinement, reinvention and adaptation to local
circumstances.
The best known example of this in the evaluation literature is the so-called ‘arms-race’ in
criminal justice interventions. Criminals may be detained or stymied by the introduction of
some new crime prevention device or system. But once they become aware of how it
works (decode the programme theory) they are able to adapt their modus operandi, so that
impact is lost and a fresh intervention is required. Rarely are health service innovations
decoded and resisted to such dramatic effect. There is, however, a modest self-defeating
effect in many interventions. On their first introduction, performance targets and progress
reviews can lead to a significant period of self-reflection on the activities in question. If such
monitoring becomes routinised, various short-cuts and tricks-of-the-trade may also follow,
and the desired introspection on performance can become perfunctory, as is arguably
occurring in relation to the NHS appraisal scheme for senior clinicians (Evans, 2003).
There are other conditions that lead interventions to become self-fulfilling, at least in the
short and medium term. Management innovations tend to work if they curry favour with
existing staff. This transformation can be greatly assisted if recruitment and promotion of
programme-friendly staff is also part of the package. Such a condition remains self-affirming
only in so far as staff restructuring can keep pace with innovation in ideas. Otherwise,
managers are faced with the self-defeating task of teaching new tricks to old dogs. The pre-
conditioning of later outcomes by earlier inputs is illustrated in Figure 5. The long dashed
arrow represents the effect (for example) of appointing staff members with particular
capabilities and predispositions, who at some later stage contribute to an unintended
derailing of what the initiative has become.
This illustrates the seventh principle of realist review. Broadly speaking, we should expect
reviews to anticipate and chart both intended and unintended effects of innovations. The
latter may be reported in the longer-term studies of interventions.
Complexity on this scale places three limits on the reviewer:
(a) A limit on how much territory he or she can cover. An intervention may have multiple
stages, each with its associated theory, and endless permutations of individual,
interpersonal, institutional and infra-structural settings. The reviewer will need to
prioritise the investigation of particular processes and theories in particular settings.
(b) A limit on the nature and quality of the information that he or she can retrieve.
Empirical research studies will probably have focused on formal documentation (such
as policies, guidance, minutes of meetings), tangible processes (such as the
activities of steering groups), and easily measured outcomes (such as attendance
figures or responses to questionnaires). Information about the informal (and
sometimes overtly ‘off the record’) exchange of knowledge, the interpersonal
relationships and power struggles, and the subtle contextual conditions that can
make interventions float or sink in an organisation will be much harder to come by,
and is often frustratingly absent from reports.
(c) A limit on what he or she can expect to deliver in the way of recommendations. The
reviewer will never be able to grasp the totality of the constraints on the effectiveness
of interventions and will certainly not be able to anticipate all the circumstances in
which subsequent schemes might be implemented. This places critical limitations on
the recommendations that flow from a realist review and the certainty with which they
can be put forward.
These theoretical limitations lead to three practical consequences. The consequence of the
first limitation is that much greater emphasis must be placed on articulating the review question
so as to prioritise which aspects of which interventions will be examined. In terms of the
processes identified in Sections 1.21 to 1.27, some will figure more strongly in the fate of
certain interventions than others. For instance, tracking the issue of the perpetual negotiation
of programmes is likely to be more of a priority in ‘user-oriented’ interventions; charting the
constraining effects of organisational culture will be more important when there is
considerable local autonomy in service delivery; tracing the interference of one service with
another will be a priority if different bodies are responsible for the overall provision of the
service, and so on. Different priorities raise quite different questions and hypotheses for the
review and, accordingly, there is no single format entailed in a realist review.
The consequence of the second limitation is that a much greater range of information from a
greater range of primary sources will need to be utilised. Searching for evidence will go far
beyond formal evaluations and may well involve action research, documentary analysis,
administrative records, surveys, legislative analysis, conceptual critique, personal testimony,
thought pieces and so on. Calling upon such a compendium of information also alters the
perspective on how the quality of the existing research should be assessed. Different
aspects of an intervention are uncovered through different modes of inquiry. Accordingly,
there is no simple hierarchy of evidence applicable in sifting the evidence. What is more,
quality checklists just do not exist for assessing legal frameworks, administrative records and
policy thought pieces. The realist review is more likely to have to scavenge for evidence than
pick and choose between different research strategies. All this does not imply that a realist
review is indifferent to notions of research quality, but that decisions about quality require
complex contextualised judgements rather than the application of a generalisable tick-list.
The consequence of the final limitation is that both academics and research commissioners
must change their expectations about what it is possible for research synthesis to deliver in
this context. A necessarily selective and prioritised review will generate qualified and
provisional findings and thus modest and cautious recommendations. Realist reviews do not
seek out ‘best buy’ programmes, nor discover 3-star, 2-star, 1-star and 0-star services. Rather they
attempt to place on the table an account of the workings of complex interventions and an
understanding of how theory may be improved. Commissioners of realist reviews should
thus expect ‘fine tuning’ rather than ‘verdicts’ for their money, and thus have a key role in
shaping the terms of reference for the review. We take up this point more fully in Part III.
[Table 1: Design and sequence of traditional systematic review and realist review – table not reproduced here; see Figure 7 for more details]
The realist approach, too, starts with a sharpening of the question to be posed but the task
goes well beyond the need for operational clarity. The divergence stems from the different
nature of the interventions studied (complex and multiply embedded rather than simple and
discrete) and the different purpose of the review (explanation rather than final judgement).
These differences bite enormously hard at stage one of a realist review, and effectively
break it into several sub-stages. Both reviewers and commissioners should anticipate that
‘focusing the question’ will be a time-consuming and ongoing task, often continuing to the
halfway mark and even beyond in a rapid review. We have previously referred to this stage
of the synthesis of complex evidence as ‘the swamp’, and advised that acknowledging its
uncertain and iterative nature is critical to the success of the review process (Greenhalgh,
2004).
One important aspect of conceptual ground clearing between commissioners and reviewers
is to agree the explanatory basis of the review. A realist review cannot settle with a
commissioner to discover ‘whether’ an intervention works, but trades instead on its ability to
discover ‘why’, ‘when’ and ‘how’ it might succeed. From the outset, the basic orientation is
about shaping and targeting interventions. An explanatory orientation is not a single point of
reference and so will tend to involve a whole range of sub-questions that might be
summarised as: ‘what is it about this kind of intervention that works, for whom, in what
circumstances, in what respects and why?’ Rather than commissioners merely handing over
an unspecified bundle of such questions, and rather than reviewers picking up those sticks
with which they feel most comfortable, both parties should (a) work together on a ‘pre-
review’ stage in which some of these particulars will be negotiated and clarified; and (b)
periodically revisit, and if necessary revise, the focus of the review as knowledge begins to
emerge.
This strategy is discussed in more detail in Sections 2.23 and 2.6 below, in relation to the
public disclosure of information example. The focus is on discovering which of several
competing theories actually operates in raising sanctions against under-performers. The
review can take on the task of uncovering evidence to adjudicate which (or more likely,
which permutation) is the driver. Many interventions are unleashed in the face of some
ambiguity about how they will actually operate and the synthesis can take on the job of
coming to an understanding of how they work. This strategy of using evidence to adjudicate
between theories is the hallmark of realist inquiry and, some would say, of the scientific
method itself (Pawson, 1989).
Although it is essential to clarify at some stage which of these approaches will drive the
review, it may not be possible to make a final decision until the review is well underway.
Certainly, we counsel strongly against the pre-publication of realist review ‘protocols’ in
which both the review question and the purpose of the review must be set in stone before
the real work begins!
An important initial strategy is discussion with commissioners, policy makers and other
stakeholders to tap into ‘official conjecture’ and ‘expert framing’ of the problem. This is likely
to identify certain theories as the ‘pre-given’ subject matter of the review, but at some point
the reviewer must enter the literature with the explicit purpose of searching it for the theories,
the hunches, the expectations, the rationales and the rationalisations for why the intervention
might work. As we have seen, interventions never run smoothly. They are subject to
unforeseen consequences as a result of resistance, negotiation, adaptation, borrowing,
feedback and, above all, context, context, context. The data to be collected here relate not to
the efficacy of the intervention but to the range of prevailing theories and explanations of
how it was supposed to work – and why things ‘went wrong’.
We can demonstrate this idea using the example of interventions based on the public
disclosure of health care information (Figure 6). Public disclosure interventions consist of a
warren of activities (and thus theories), beginning with the production of performance
measures. Classifications are made at the individual and institutional levels and cover
anything from report cards on individual surgeons to hospital star ratings. Such
classifications are not ‘neutral’ or ‘natural’; they are made for a purpose and that purpose is
to identify clearly the difference between good and poor performance. ‘Performance’ covers
a host of feats and a multitude of sins and so the classification has to decide which
configuration of indicators (patient turnover, mortality rates, satisfaction rates, waiting list
lengths and times, cleanliness measures, etc. etc.) constitutes satisfactory levels of
accomplishment.
• Theory two is about the impact of publicity: ‘Sunlight is the best of disinfectant: electric
light the most efficient policeman’ (Brandeis, quoted in Fisse and Braithwaite, 1983,
pvii). The intervention is not meant to work metaphorically, however, and getting the
theory into practice involves multiple choices about what information is released,
through what means, to whom. But data never speak for themselves (especially if they
cover the performance of multiple units on multiple indicators). Some reports offer raw
scores, some compress the data into simple rankings, some include explanations and
justifications, and some draw inferences about what is implied in the records (Marshall
et al, 2000). Dissemination practices also vary widely from the passive (the report is
‘published’ and therefore available) to the active (press conferences, media releases
etc), and rely on further theories (see below).
• Theory three is about actions by recipients of the message and the impact of those
actions. Because information is now in the public domain, a wider group of
stakeholders is encouraged and empowered to have a say on whether the reported
performance is adequate. The ‘broader community’ is thus expected to act on the
disclosure by way of shaming, or reproaching, or boycotting, or restraining, or further
monitoring the failing parties (and taking converse actions with high-flyers). Again,
there are multiple choices to be made: which members of the wider community are
the intended recipients? How are they presumed to marshal a sanction? A range of
contending response theories is possible (Marshall et al, 2000). One idea (3a in
Figure 6) is that the public release is used to support closer regulation of public
services. The performance data provide an expert and dispassionate view of a
problem, which signals the agents and agencies in need of greater supervision and/or
replacement. A second theory (3b) is that disclosure stimulates consumer choice. The
‘informed purchaser’ of health care is able to pick and choose. Lack of demand for
their services drives the subsequent improvement of poor performers. Theory (3c) is a
variant of this supply and demand logic, arguing that the key consumers are not
individuals but institutions (fundholders, managed care organisations, primary care
groups). Theory (3d) reckons that ‘naming and shaming’ is the working mechanism
and the underperformers pull up (their own) socks in response to the jolt of negative
publicity. All these competing theories will need to be explored in the review.
Incidentally, the actual review, when we get that far, might declare on ‘none-of-the-
above’, and discover that theory (3e) about the procrastination, indifference and
apathy of the wider public is the one that tends to hold sway.
• Theory four concerns the actions of those on the receiving end of the disclosure. The
basic expectation is that good performers will react to disclosure by seeking to
maintain position and that miscreants will seek reintegration (Braithwaite, 1989). The
precise nature of the latter’s reaction depends, of course, on the nature of the
sanction applied at Step 3. If they are in receipt of a decline in purchasing choice it is
assumed that they will attempt to improve their product; if they are shamed it is
assumed they will feel contrite; and if they are subject to further regulation it is
assumed that they will take heed of the submissions that flow from tighter inspection
and supervision.
Theories five to seven in Figure 6 arise from another major phase in the initial theory
mapping process. The four theories discussed to date arose from the ‘expert framing’ of the
programme, as described in Figure 1. However, as previously discussed, interventions rarely
run smoothly and are subject to unforeseen consequences due to resistance, negotiation,
adaptation, borrowing, feedback
and, above all, contextual influence. It is highly likely, therefore, that in the initial trawl for the
theories underlying the intervention being studied, the researcher will encounter rival
conjectures about how a scheme might succeed or fail. Three of these rival conjectures are
illustrated by theories five, six and seven on the preliminary theory map in Figure 6.
• Theory six postulates how a process external to the intervention might impinge on its
potential success. To ‘go public’ is to let a cat out of the bag that is not entirely within
the control of those compiling the performance data. The measures are applied with a
specific problem and subsequent course of action in mind (the expert framing of the
issue). Whether these sit easily with ‘media frames’ (Wolfsfeld, 1997) is a moot point.
The presentational conventions for handling such ‘stories’ are likely to revolve around
‘shame’ and ‘failure’ and these rather than the intended ‘reintegration’ message may
be the ones that get heard.
• Theory seven is another potential mechanism for resisting the intervention. This
occurs when the whole measurement apparatus is in place and the public is primed to
apply sanctions. It consists of discovering ways to outmanoeuvre the measures. This
may involve marshalling the troops to optimal performance on the day the tape
measure falls, or concentrating effort on the activities being gauged at the expense of those left
unmonitored. As a result, the reviewer must try to estimate the extent to which any
reported changes under public disclosure are real or ersatz.
These seven theories, which will each require separate testing in the next stage of the
review, are not an exhaustive set of explanations. For instance, a programme that mounted
‘rescue packages’ for ‘failing’ hospitals following the collection and public disclosure of
performance information (for example, the work of the NHS Modernisation Agency’s
Performance Development Team with zero-star NHS trusts), would put in place a whole raft
of further procedures, whose underlying theories could be unpacked and pursued.
In general terms, and given our endless refrain about complexity, it should be clear that the
ideas unearthed in a theory mapping exercise will be many and varied. They might stretch
from macro theories about health inequalities to meso theories about organisational capacity
to micro theories about employee motivation. They might stretch through time, relating, for
example, to the impact of long-term ‘intervention fatigue’. This abundance of ideas provides
the final task for this first stage in a realist review, namely to decide upon which
combinations and which subset of theories are going to feature on the short list. A simple but
key principle is evident: that totally comprehensive reviews are impossible and that the task
is to prioritise and agree on which programme theories are to be inspected.
We have noted that reviews may spend half of their time in the conceptual quagmire, and
the detailed illustrations used here are meant to affirm the importance of refining the
question to be posed in research synthesis. We have demonstrated, in the case of realist
review, that this is more than a matter of conceptual tidiness. Articulating the theories that
are embedded within interventions provides a way of recognising their complexity and then
finding an analytic strategy to cut into that complexity.
Before moving on to the ‘search for papers’ stage, it is worth reflecting that the initial phase
of theory stalking and sifting has utility in its own right. There is a resemblance here to the
strategies of concept mapping in evaluation and the use of logic models in management
(Knox, 1995). Many interventions are built via thumbnail sketches of programme pathways
such as Figure 6. Surfacing, ex post facto, the full range of programme theories in a mature
programme lays bare for managers and policy makers the multitude of decision points in an
intervention and the thinking that has gone into them.
It is also worth reiterating that, as Section 2.22 illustrated, there is no single, formulaic way of
cutting through this complexity and expressing the hypotheses to be explored. The
reviewer’s work will sometimes consist of comparing the intervention in different locations,
and at other times tracking it through its various phases, or arbitrating the views of different
stakeholders. This emphasises again that realist review is not a review technique but a
review logic.
Finally, we hope this section has demonstrated that user participation in the review process
is not mere tokenism. Above all others, this stage of theory mapping and prioritisation is not
a matter of abstract ‘data extraction’. Rather, it requires active and ongoing dialogue with the
people who develop and deliver the interventions, since they are the people who embody
and enact the theories that are to be identified, unpacked and tested.
We exit the swamp (having identified an area of inquiry and rooted out and prioritised the
key theories to review) with the real work still to do. In the next two stages, we will first
gather empirical evidence and then formally test those theories against it.
Searching in a realist review typically involves four kinds of search:
1. A background search to get a ‘feel’ for the literature – what is there, what form it takes,
where it seems to be located, how much there is etc. This is almost the very first thing
the reviewer should do.
2. A search to track the programme theories – locating the administrative thinking, policy
history, legislative background, key points of contention in respect of the intervention,
and so on. This was described in Section 2.23 and forms part of the ‘Clarifying the scope
of the review’ stage.
3. A search for empirical evidence to test a subset of these theories – locating apposite
evidence from a range of primary studies using a variety of research strategies. This is in
some senses the ‘search’ proper, in which the reviewer has moved on from browsing
and for which a formal audit trail should be provided in the write-up.
4. A final search once the synthesis is almost complete, to seek out additional studies that
might further refine the programme theories that have formed the focus of analysis.
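These four passes can usefully be kept distinct in the review’s own records. As a minimal sketch of such bookkeeping, in Python, the following treats each pass as a labelled step; the class, field names and purposes are our illustration rather than part of any established tool, and we assume only the later, formal passes need an audit trail.

```python
from dataclasses import dataclass, field

@dataclass
class SearchPass:
    """One pass of searching, as in the four-pass scheme above (illustrative only)."""
    name: str
    purpose: str
    audit_trail_required: bool
    results: list = field(default_factory=list)

# The four passes, in the order a reviewer would normally run them.
passes = [
    SearchPass("background", "get a feel for the literature", False),
    SearchPass("theory tracking", "locate administrative thinking, policy history and points of contention", False),
    SearchPass("theory testing", "find empirical evidence to test the shortlisted theories", True),
    SearchPass("refinement", "seek additional studies that further refine the theories", True),
]

for p in passes:
    print(f"{p.name}: {p.purpose} (audit trail: {p.audit_trail_required})")
```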
Conventional systematic reviews begin by defining strict inclusion and exclusion criteria,
applying quality filters to the retrieved studies, and often reducing their set of papers from
thousands to a mere handful. This may be an appropriate strategy when testing the
effectiveness of simple interventions, but it is unhelpful in a realist review of a complex
social intervention, for two reasons. First, there is
not a finite set of ‘relevant papers’ which can be defined and then found. There are many
more potentially relevant sources of information than any review could practically cover, and
so some kind of purposive sampling strategy needs to be designed and followed. Second,
excluding all but a tiny minority of relevant studies on the grounds of ‘rigour’ would reduce
rather than increase the validity and generalisability of review findings, since (as explained in
Sections 2.5 and 2.6) different primary studies contribute different elements to the rich
picture that constitutes the overall synthesis of evidence.
Realist reviews, in contrast, use search strategies based on purposive sampling, retrieving
materials in order to answer specific questions or test particular theories. If one thinks of
the aim of the review exercise as identifying, testing out,
and refining programme theories, then an almost infinite set of studies could be relevant.
Consequently, a decision has to be made, not just about which studies are fit for purpose in
identifying, testing out or refining the programme theories, but also about when to stop
looking – when sufficient evidence has been assembled to satisfy the theoretical need or
answer the question. This test of saturation can only be applied iteratively, by asking after
each stage or cycle of searching whether the literature retrieved adds anything new to our
understanding of the intervention, and whether further searching is likely to add new
knowledge.
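A minimal sketch of this stopping rule, assuming two hypothetical helpers supplied by the review team: run_search_cycle, which performs one cycle of searching, and adds_new_insight, the reviewer’s judgement of whether a batch of retrieved documents adds anything new.

```python
def search_until_saturated(run_search_cycle, adds_new_insight, max_cycles=10):
    """Iterative searching with a theoretical-saturation stopping rule.

    run_search_cycle(cycle) -> list of retrieved documents (hypothetical helper)
    adds_new_insight(docs)  -> reviewer judgement: does this batch add anything
                               new to our understanding of the intervention?
    """
    retrieved = []
    for cycle in range(max_cycles):
        docs = run_search_cycle(cycle)
        retrieved.extend(docs)
        if not adds_new_insight(docs):
            break  # saturation: further searching is unlikely to add knowledge
    return retrieved
```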
In summary, realist review uses an approach to searching that is more iterative and
interactive (involving tracking back and forth from the literature retrieved to the research
questions and programme theories) than a traditional systematic review, and the search
strategies and terms used are likely to evolve as understanding grows. Because useful
studies in this respect will often make reference to companion pieces that have explored the
same ideas, searching makes as much use of ‘snowballing’ (pursuing references of
references by hand or by means of citation-tracking databases) as it does of conventional
database searching using terms or keywords. In a recent systematic review conducted along
realist lines, one of us found that 52% of all the quality empirical studies referenced in the
final report were identified through snowballing, compared with only 35% through database
searching and 6% through hand searching (Greenhalgh et al, 2004).
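The snowballing mechanics can be sketched in the same spirit. Here get_references stands in for a citation-tracking database or hand search, and is_relevant for the reviewer’s judgement of fit to the theory under test; both are hypothetical placeholders, not real library calls.

```python
def snowball(seed_papers, get_references, is_relevant, max_rounds=3):
    """Pursue references of references outward from a set of key papers.

    get_references(paper) -> papers cited by (or citing) `paper` (hypothetical)
    is_relevant(paper)    -> reviewer judgement of fit to the theory under test
    """
    found = set(seed_papers)
    frontier = list(seed_papers)
    for _ in range(max_rounds):
        next_frontier = []
        for paper in frontier:
            for ref in get_references(paper):
                if ref not in found and is_relevant(ref):
                    found.add(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
        if not frontier:
            break  # no new relevant papers surfaced in this round
    return found
```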
Purposive approaches to searching do not have the kind of neat, predefined sampling frame
achievable through probability sampling. For instance, the reviewer might choose to venture
across policy domains in picking up useful ideas on a programme theory. Schools, for
example, have undergone a programme of public disclosure of performance data similar to
that for hospitals, and there is no reason why that literature cannot reveal crucial accounts
of intervention theories or even useful comparative tests of certain of those theories.
Purposive sampling is also ‘iterative’ in
that it may need to be repeated as theoretical understanding develops. An understanding,
say, of the media’s influence in distorting publicly disclosed information, may only develop
relatively late in a review and researchers may be forced back to the drawing board and the
search engines to seek further studies to help sort out an evolving proposition.
There is a further important parallel with purposive sampling when it comes to the perennial
question of how many primary studies are needed to complete a realist review.
Methodologically, the reviewer should aim not for encyclopaedic coverage of all possibly
relevant literature but for a concept borrowed from qualitative research, that of theoretical
saturation (Glaser and Strauss, 1967). In other words, the reviewer should stop searching at
the point when no new information is added (that is, the theory under investigation meets no
new challenges) by the accumulation of further ‘cases’ (that is, papers or other primary
evidence). Let us imagine that a realist review is blessed by a thousand user satisfaction
surveys of a particular service and that they all tell much the same tale. The reviewer is likely
to want that information, for instance, to contrast these opinions with the assumptions
of another group of stakeholders. This comparison may then reveal something about the
different respects in which an intervention is working. The useable data is thus about the
content of the different opinions, and in this respect sheer weight of numbers is not the
issue. The synthesis could learn from and be happy to report the apparent consistency of
this material but there would be no need to afford it its corresponding number of column
inches in the synthesis.
As far as the mechanics of searching go, realist review uses index headings, key word
searches, search engines, databases and so forth in the same way as conventional
systematic review. There are some different points of emphasis, however.
(a) Because it deals with the inner workings of interventions, realist review is much more
likely to make use of the administrative ‘grey literature’ rather than relying solely on
formal research in the academic journals.
(b) Because it takes the underpinning mechanism of action rather than any particular
topic area as a key unit of analysis, a much wider breadth of empirical studies may
be deemed relevant, and these will sometimes be drawn from different bodies of
literature. As mentioned above, studies on the public disclosure of performance data
by schools will have important lessons for health care organisations and vice versa.
Hence, a tight restriction on ‘databases to be searched’ is inappropriate.
(c) Because it looks beyond treatments and outcomes, the key words chosen to instigate
a search are more difficult to fix. As a rough approximation one can say that in terms
of their ability to score useful ‘hits’, proper nouns (such as ‘The Lancet’) outstrip
common nouns (such as ‘publications’), which in turn outdo abstract nouns (such as
‘publicity’). Theory building utilises these terms in the opposite proportion.
Accordingly, if, say, one is trying to locate material on ‘shame’ as experienced under
‘publicity’, snowballing is likely to be many times more fruitful than putting specific
words into a Medline or similar search.
Realist review supports the principle that evidence must be appraised for quality, but takes a
different position on how research quality is judged. Systematic review of biomedical
interventions is based firmly on the use of a
‘hierarchy of evidence’ with the randomised controlled trial (RCT) sitting atop, and non-
randomised controlled trials, before and after studies, descriptive case studies, and (lowest
of all) ‘opinion pieces’ underneath. Realist review rejects a hierarchical approach as an
example of the law of the hammer (to a man with a hammer, everything is a nail).
The problem with RCTs in testing complex service interventions is that because service
interventions are always conducted in the midst of (and are therefore influenced by) other
programmes, they are never alike in their different incarnations. Any institution chosen as a
‘match’ in a comparison will also be in the midst of a maelstrom of change. The hallowed
comparison of ‘treatment’ and ‘control’ thus becomes a comparison between a partial and a
complete mystery. One cannot simply finger the intervention light-switch to achieve a clean
‘policy-on’ / ‘policy-off’ comparison. It is of course possible to perform RCTs on service
delivery interventions, but such trials are meaningless because the RCT design is explicitly
constructed to wash out the vital explanatory ingredients. Process evaluations may be
conducted alongside RCTs to enable more detailed explanations, but the basic issue of
standardising interventions remains.
Hence, whereas it is right and proper to demand rigorous, controlled experiments when the
task is to evaluate treatments, it is foolish to privilege this form of evidence when the task is
something quite different. Realist review, in the spirit of true scientific enquiry, seeks to
explore complex areas of reality by tailoring its methods eclectically to its highly diverse
subject matter. A great deal of contemporary effort and thought has gone into producing appraisal
checklists for non-RCT research, such as the Cabinet Office’s framework for assessing
qualitative research (which runs to 16 appraisal questions and 68 potential indicators)
(Spencer et al, 2003). But such checklists are not the ‘answer’ to the complex challenge of
realist review, for three reasons. Firstly, such synthesis calls not merely upon conventional
qualitative and quantitative research designs, but on impact evaluations, process
evaluations, action research, documentary analysis, administrative records, surveys,
legislative analysis, conceptual critique, personal testimony, thought pieces and so on, as
well as an infinite number of hybrids and adaptations of these.
Secondly, the ‘study’ is rarely the appropriate unit of analysis. Very often, realist review will
choose to consider only one element of a primary study in order to test a very specific
hypothesis about the link between context, mechanism and outcome. Whilst an empirical
study must meet minimum criteria of rigour and relevance to be considered, the study as a
whole does not get ‘included’ or ‘excluded’ on the fall of a single quality axe. Finally, appraisal
checklists designed for non-RCT research acknowledge the critical importance of ‘judgement
and discretion’ (Spencer et al, 2003, p110). For instance, a checklist for qualitative research
might include a question on ‘clarity and coherence of the reportage’ (Spencer et al, 2003,
p27). In such cases, the checklist does little more than assign structure and credibility to
what are actually highly subjective judgements. There comes a point when cross-matching
hundreds of primary studies with dozens of ‘appraisal checklists’, often drawing on more
than one checklist per study, brings diminishing returns.
The realist solution is to cut directly to the judgement. As with the search for primary studies,
it is useful to think of quality appraisal as occurring by stages.
(a) Relevance – as discussed in Section 2.1, relevance in realist review is not about
whether the study covered a particular topic, but whether it addressed the theory
under test.
(b) Rigour – that is, whether a particular inference drawn by the original researcher has
sufficient weight to make a methodologically credible contribution to the test of a
particular intervention theory.
In other words, relevance and rigour are not absolute criteria on which a study floats or
sinks, but dimensions of ‘fitness for purpose’ for a particular synthesis. Let us consider an
example. If we were searching for evidence on public disclosure of performance data, we
might well wish to consider the extent to which such records play a part in patients’ decisions
about whether to use a particular service, surgeon or hospital. Research on this might come
in a variety of forms. We might find, for example, self-reported data on how people use
performance data as part of a wider telephone survey on user views. We might find an
investigation testing out respondents’ understanding of the performance tables by asking
them to explain particular scores and ratios. We might find an attempt to track fluctuations in
admissions and discharges against the publication of the report and other contiguous
changes. We might find a quasi-experiment attempting to control the release of information
to some citizens and not others, and following up for differences in usage. Finally, we might
find qualitative studies of the perceptions that led to actual decisions to seek particular
treatments. (See Marshall et al (2000) for a profile of actual studies on this matter).
All of these studies would be both illuminating and flawed. The limitations of one would often
be met with information from another. The results of one might well be explained by the
findings from another. Such a mixed picture is routine in research synthesis and reveals
clearly the perils of using a single hierarchy of evidence. But neither does it require taking on
board all of the evidence uncritically. Good practice in synthesis would weigh up the relative
contribution of each source, and this might involve dismissing some sources as flimsy. The
point is that in good synthesis, one would see this reasoning set out on the page. To
synthesise is to make sense of the different contributions. The analysis, for instance, would
actually spell out the grounds for being cautious about A, because of what we have learned
from B, and what was indicated in C. Such a little chain of reasoning illustrates our final point
in this section and, indeed, the basic realist principle of quality assessment, namely, that the
worth of studies is established in synthesis. True quality appraisal comes at the coup de
grâce and not as a preliminary pre-qualification exercise. Further examination of quality
issues in systematic review from the realist perspective may be found in Pawson (2003).
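One way to keep such judgements on the page is to record them per inference and per theory rather than per study. A minimal sketch of such a record, with field names of our own invention:

```python
from dataclasses import dataclass

@dataclass
class Appraisal:
    """Fitness-for-purpose appraisal of one inference drawn from one study."""
    study: str               # e.g. "Marshall et al 2000"
    theory_under_test: str   # the programme theory this inference speaks to
    inference: str           # the specific claim being weighed
    relevant: bool           # does it address the theory under test?
    rigorous: bool           # does the inference carry enough methodological weight?
    notes: str = ""          # the chain of reasoning to be 'set out on the page'

def usable(a: Appraisal) -> bool:
    # An inference contributes to this synthesis only if it passes both
    # dimensions; the study as a whole is never included or excluded
    # on the fall of a single quality axe.
    return a.relevant and a.rigorous
```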
The realist reviewer may well make use of ‘data extraction forms’ to assist the sifting, sorting
and annotation of primary source materials. But such aids do not take the form of a single,
standard list of questions. Rather, a menu of bespoke forms may be developed and/or the
reviewer may choose to complete different sections for different sources. The need to ‘cut
the data extraction question according to the cloth’ is a consequence of the many-sided
hypothesis that a realist review might tackle and the multiple sources of evidence that might
be taken into account. As discussed in Section 2.23, some primary sources may do no more
than identify potentially relevant concepts and theories; for these, ‘data extraction’ can be
achieved by marking the relevant sentences with a highlighter pen. Even those empirical
studies that are used in ‘testing’ mode are likely to have addressed just one part of the
implementation chain and thus come in quite different shapes and sizes.
Realist reviews thus assimilate information more by note-taking and annotation than by
‘extracting data’ as such. If one is in theory tracking mode, documents are scoured for ideas
on how an intervention is supposed to work. These are highlighted, noted and given an
approximate label. Further documents may reveal neighbouring or rival ideas. These are
mentally bracketed together until a final model is built of the potential pathways of the
intervention’s theories. Empirical studies are treated in a similar manner, being scrutinised
for which programme idea they address, what claims are made with respect to which
theories, and how the apposite evidence is marshalled. These are duly noted and revised
and amended as the testing strategy becomes clarified.
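A minimal sketch of this note-taking discipline, with sources, ideas and theory labels all invented for illustration:

```python
# Annotations pair a source passage with the reviewer's approximate label for
# the programme idea it expresses (sources and ideas invented here).
annotations = [
    ("policy circular, p.2", "publication will shame poor performers into action"),
    ("trial report, p.14", "patients will use ratings to choose providers"),
    ("case study, p.7", "staff will game the indicators under pressure"),
]

# Neighbouring and rival ideas are bracketed together under candidate theories
# until a model of the intervention's potential pathways takes shape.
theory_brackets = {
    "sanction pathways": [annotations[0], annotations[2]],
    "choice pathways": [annotations[1]],
}

for theory, notes in theory_brackets.items():
    print(theory, "<-", [idea for _, idea in notes])
```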
Two further features of the realist reading of evidence are worth noting. The first is that, as
with any mode of research synthesis, one ends up with the inevitable piles of paper on the
floor as one tries to recall which study speaks to which process, and whether a particular
study belongs hither or thither. Just as a conventional review will append a list of studies
consulted and then give an indication of which contributed to the statistical analysis, so too
should a realist review trace the usage and non-usage of primary materials, although the
archaeology of decision making is more complex and thus harder to unearth here. One is
inspecting multiple theories, and specific studies may speak to none, one, more, or all of
them. Nevertheless, as the method develops, the reviewer should expect to develop a
record of the different ways in which studies have been used (and omitted).
The second point to note is that the steps involved in realist review are not in fact linear;
studies are returned to time and again and thus ‘extraction’ occurs all the way down the line.
There always comes a rather ill-defined point in the sifting and sorting of primary models
where one changes from framework building to framework testing and from theory
construction to theory refinement. The reviewer experiences a shift from divergent to
convergent thinking as ideas begin to take shape and the theories underpinning the
intervention gain clarity. Accounts of systematic review which make claims for its
reproducible and thus mechanical nature are being economical with the truth in not
recognising this ineffable point of transformation.
• WHAT is it about this kind of intervention that works, for WHOM, in what
CIRCUMSTANCES, in what RESPECTS and WHY?
In opening up the black box of service implementation, realist review does not claim that it is
possible to get to grips with the full complexity of interventions. Down the line, delivering
programmes and services relies on the activities of Joe and Josephine Bloggs (Section 1.22)
and there is no accounting for the eccentricities of these two. Rather more significantly,
realist synthesis assumes that interventions sit amidst open systems (Section 1.27) so that,
for instance, the early success of an intervention may go on to create new conditions that
lead to its downfall. What we have suggested, therefore, is that a realist synthesis takes a
particular ‘cut’ through key phases of the existing warren of intervention theories (Section
2.22) and tries to improve understanding of various claims, hopes and aspirations at that
point. We consider the different ‘cuts’ introduced earlier in turn below:
Note that this particular approach to synthesis may be especially interested in settings
beyond health care. Public disclosure of performance data occurs not only for hospitals but
also increasingly for schools. The synthesis we have in mind would not be a matter of saying
that theory N works in education but not in health care (or vice versa). The disparities in
intervention efficacy are likely to stem from differences in consumer power, professional
autonomy, payment systems, availability of alternatives, audit familiarity, and so on. These
matters are precisely where learning lies if we are to understand public disclosure, and are
thus vital to the success of synthesis.
These examples show that realist synthesis can take at least four different ‘slants’. What
they have in common is a focus on the programme theory rather than the primary study as
the unit of analysis, and the need to interrogate and refine the theory as synthesis
progresses.
The first form of redemption is for commissioners of reviews to be much more closely involved in the production
of the research synthesis, a state of play that Lomas has called ‘linkage’ (Lomas, 2000).
Researchers can only address themselves to a question, and decision makers can only find
pertinence in the answer, if that question has been adequately honed, refined and left
without major ambiguity. The second form of redemption is for reviewers to bring their
technical expertise closer to the policy question at hand. Research synthesis needs to be
able to locate recommendations in relation to the policy options on the table and this
objective is supported if the research takes cognisance of the practical needs of a range of
stakeholders in the shaping of an intervention. Both requirements place a premium on
avoiding overly technical language in dissemination, cutting instead to the quick and using
the parlance of decision making.
A realist review cannot meaningfully occur in the absence of input from practitioners and policy makers, because it
is their questions and their assumptions about how the world works that form the focus of
analysis. Furthermore, the ‘findings’ of a realist review must be expressed not as universal
scientific truths [such as ‘family intervention for schizophrenia has a mean impact of xxx’
(Pharoah et al 2003)] but in the cautious and contextualised grammar of policy discourse.
What do we mean by this? Realist review initiates a process of thinking through the tortuous
pathways along which a successful programme has to travel. It concludes with reflections
and considerations on how to navigate some significant highways and byways. Accordingly,
what the ‘recommendations’ describe are the main series of decision points through which
an initiative has proceeded, and the findings are put to use in alerting the policy community
to the caveats and considerations that should inform those decisions. For each decision
point, the realist evaluators should be able to proffer the following kind of advice: ‘remember
A’, ‘beware of B’, ‘take care of C’, ‘D can result in both E and F’, ‘Gs and Hs are likely to
interpret I quite differently’, ‘if you try J make sure that K, L and M have also been
considered’, ‘N’s effect tends to be short lived’, ‘O really has quite different components – P,
Q and R’, and ‘S works perfectly well in T but poorly for U’. The review will, inevitably, also
reflect that ‘little is known about V, W, X, Y and Z’.
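The structure of such advice is simply a map from decision points to caveats. A toy sketch in that ‘remember A, beware of B’ grammar, with wholly invented decision points and content:

```python
# Each decision point in the implementation chain carries the caveats and
# considerations the review has surfaced for it (placeholders invented here).
advice = {
    "decide what to disclose": ["remember that raw figures need case-mix adjustment"],
    "decide how to publish": ["beware of media amplification distorting the message"],
    "anticipate responses": ["practitioners may outmanoeuvre the indicators",
                             "little is known about long-run effects on morale"],
}

for decision_point, caveats in advice.items():
    for caveat in caveats:
        print(f"{decision_point}: {caveat}")
```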
Given such an objective it is easy to see why the realist reviewer generally finds that further
linkage with the policy making community at the writing-up stage accelerates rather than
interferes with this task (whereas, for obvious reasons, the producer of a traditional
systematic review generally finds the opposite). We have described the elongated process of
theory mapping that precedes the evidence synthesis. We have also described how, in the
face of complexity, realist review does not take on the full A-to-Z of an implementation
chain but concentrates on a subset of the lexicon. What we have argued for in this respect is
for synthesis to take some strategic ‘cuts’ through the implementation chain. The rationale
for choosing this sequence or that comparison was essentially methodological, namely that a
particular design would forward explanation. The policy maker might well prefer to throw
another desideratum into the design, namely that it should concentrate on the policy levers
that can actually be pulled. We interpret this facility rather broadly; realist review can attend
to any point in the implementation chain and so can focus on the perspective of any
stakeholder.
Such a resource leaves open the question of when the liaison between reviewers and
decision makers should occur. Whilst the popular recommendation is, perhaps, that they
should hold hands throughout the review, this is a prospect that is somewhat unrealistic. The
tryst is surely best located at the beginning of the process. In practice, this means the
commissioner coming to the reviewer with a broad list of questions about an intervention.
The reviewer questions the questions, and suggests further angles that have resonated
through the existing literature. Then there is more negotiation and, eventually, a firm
agreement about which particular lines of inquiry to follow.
As well as this initial meeting of minds, realist review also anticipates that the review itself
will partly reorder expectations about what is important. The realist perspective can hardly
speculate on the likelihood of unintended consequences of interventions without applying
the rule reflexively. This means that room for further rounds of negotiation must be left open
about whether an unforeseen chink in the implementation chain deserves closer inspection.
But, at several points in between, there are long periods when reviewers should be left to
their own devices. They should, for example, be able to apply their expertise on matters
such as the methodological rigour and relevance of the primary research materials.
The analysis and conclusions section of realist review is not a final judgement on ‘what
works’ or ‘size of effect’. Rather, it takes the form of revisions to the initial understanding of
how an intervention was thought to work. Should close assignation between commissioners
and researchers continue at this point? We advocate a precise division of labour. Realist
review has the traditional role of providing an independent and dispassionate assessment of
how and how well an intervention has worked as viewed though the existing research.
Conclusions and recommendations have to reflect this objective and this standpoint.
However, the end product is a more refined theory rather than a final theory. Refinement
may take the form of deducing that theory A provides a better understanding than theory B,
but this leaves uncovered the potential explanatory import of theory C. It may be inferred that
the intervention works better in context D than in context E, but this might leave another
set of circumstances, F, relatively unexplored. The progress made in a review is not one
from ‘ignorance’ to ‘answer’ but from ‘some knowledge’ to ‘some more knowledge’.
Accordingly, there is room for debate about the precise scope of the policy implications of
realist review. Extraordinary care must be taken at the point where findings are transformed
into recommendations, and close involvement with decision makers is once again required in
thrashing this out.
The diagrammatic representation of stages in Figure 7 disguises the fact that research
synthesis is not in fact linear. Realist review is about refining theories, and second thoughts
can occur at any time. Thus, in the course of a review, the researcher may happen on an
unconsidered theory that might improve understanding of the balance of successes and
failures of a programme. Checking out this possibility may involve reinvigorating the search
procedures and dovetailing new information alongside the developing analysis. Ultimately, of
course, other researchers may question the emerging explanations and are free to consider
additional theories and supplementary primary sources in order to understand further what
works for whom in what circumstances and in what respects.
Figure 7: the stages of realist review

Define the scope of the review
  Identify the question
  • What is the nature and content of the intervention?
  • What are the circumstances or context for its use?
  • What are the policy intentions or objectives?
  • What are the nature and form of its outcomes or impacts?
  • Undertake exploratory searches to inform discussion with review commissioners/decision makers
  Clarify the purpose(s) of the review
  • Theory integrity – does the intervention work as predicted?
  • Theory adjudication – which theories about the intervention seem to fit best?
  • Comparison – how does the intervention work in different settings, for different groups?
  • Reality testing – how does the policy intent of the intervention translate into practice?
  Find and articulate the programme theories
  • Search for relevant theories in the literature
  • Draw up a ‘long list’ of programme theories
  • Group, categorise or synthesise theories
  • Design a theoretically based evaluative framework to be ‘populated’ with evidence

Search for and appraise the evidence
  Search for the evidence
  • Decide and define the purposive sampling strategy
  • Define the search sources, terms and methods to be used (including cited reference searching)
  • Set the thresholds for stopping searching at saturation
  Appraise the evidence
  • Test relevance – does the research address the theory under test?
  • Test rigour – does the research support the conclusions drawn from it by the researchers or the reviewers?

Extract and synthesise findings
  Extract the results
  • Develop data extraction forms or templates
  • Extract data to populate the evaluative framework with evidence
  Synthesise findings
  • Compare and contrast findings from different studies
  • Use findings from studies to address the purpose(s) of the review
  • Seek both confirmatory and contradictory findings
  • Refine programme theories in the light of evidence

Draw conclusions and make recommendations
  • Involve commissioners/decision makers in the review of findings
  • Draft and test out recommendations and conclusions based on findings with key stakeholders
  • Disseminate the review with findings, conclusions and recommendations
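Since the figure is essentially a nested checklist, it can be carried into a review protocol directly. A sketch of such an encoding (the structure is ours; the content is taken from the figure):

```python
# The stages and steps of Figure 7 as an ordered skeleton that a review team
# could annotate with progress notes.
protocol = [
    ("Define the scope of the review",
     ["Identify the question",
      "Clarify the purpose(s) of the review",
      "Find and articulate the programme theories"]),
    ("Search for and appraise the evidence",
     ["Search for the evidence",
      "Appraise the evidence"]),
    ("Extract and synthesise findings",
     ["Extract the results",
      "Synthesise findings"]),
    ("Draw conclusions and make recommendations", []),
]

for stage, steps in protocol:
    print(stage)
    for step in steps:
        print("  -", step)
```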
In Section 3.1, we remind the reader of the general realist orientation: that realist review can
bring enlightenment but not final judgement to policy decisions. In Section 3.2, conscious of
having laboured a single example throughout this paper, we zoom out to the task of building
the broader evidence base for policy making – a dynamic and pressurised world in which
dozens of reviews and hundreds of evaluations must be commissioned and made sense of.
Finally, in Section 3.3, we recall realist review’s position as the ‘new kid on the block’ for
evidence synthesis in healthcare, and we see how it measures up to the indigenous locals
(systematic review and meta-analysis) and to other potential newcomers.
The school of theory-based evaluation, of which realist evaluation is a member, has always
described its appointed task as offering ‘enlightenment’ as opposed to technical or partisan
support (Weiss, 1986; Weiss and Bucuvalas, 1980). The metaphor of enlightenment
describes rather well the working relationship between research and policy (slow dawning –
sometimes staccato, sometimes dormant, and sometimes antagonistic). Endless studies of
research utilisation have described these chequered liaisons and the ‘realistic’ assumption
remains that politics, in the last analysis, will always trump research. However,
enlightenment’s positive prospect, for which there is a great deal of empirical evidence (for
example: Deshpande, 1981; Lavis et al, 2002; Dobrow et al, 2004), is that the influence of
research on policy occurs through the medium of ideas rather than of data. This is described
by Weiss (1980) as 'knowledge creep' to illustrate the way research actually makes it into the
decision maker's brain. Research is unlikely to produce the thumping ‘fact’ that changes the
course of policy making. Rather, policies are born out of clash and compromise of ideas and
the key to enlightenment is to insinuate research results into this reckoning (Exworthy et al,
2002).
On this score, realist review has considerable advantages. Policy makers may struggle with
data that reveals, for instance, the respective statistical significance of an array of mediators
and moderators in meta-analysis. They are more likely to be able to interpret and to utilise
an explanation of why a programme mechanism works better in one context than another.
Note that these two research strategies are serving to answer rather similar questions, the
crucial point being that the one that focuses on ‘sense-making’ has the advantage. This is
especially so if the investigation has the task of checking out rival explanations (i.e.
adjudication), which then provide justification for taking one course of action rather than
another (i.e. politics). Here, then, is the positive message on research utilisation. Explanatory
evaluations throw light on the decisions in decision making.
A problem, perhaps, with this vision of research-as-illumination is that it tells us rather more
about the form of the message than its content. If evaluators and reviewers cannot tell policy
makers and practitioners exactly what works in the world of service delivery, how should
their advice proceed? What should we expect a programme of theory-testing to reveal?
What the realist approach contributes is a process of thinking through the tortuous pathways
along which a successful intervention has to travel. What is unearthed in synthesis is a
reproduction of a whole series of decision points through which an initiative has proceeded,
and the findings are put to use in alerting the policy community to the caveats and
considerations that should inform those decisions. Perhaps the best metaphor for the end-
product is to imagine the research process as producing a sort of highway code to
programme building, alerting policy makers to the problems that they might expect to
confront and some of the safest (i.e. best-tried and with widest applications) measures to
deal with these issues. A realist review highway code could never provide the level of
prescription or proscription achieved in the real thing, the point of the parallel being that the
highway code does not tell you how to drive but how to survive the journey by flagging
situations where danger may be lurking and extra vigilance needed.
The three diagrams in Figure 8 show some conventional approaches to policy evaluation.
Figure 8a depicts the standard ‘evaluation of X’ approach in which an evaluation is
commissioned as and when a new intervention is mounted. This approach can be rolled out
geographically (by doing multiple evaluations of X in multiple regional settings) and across
time (by following X through successive phases of process and outcome evaluations)
(Figure 8b). The key linkage remains, however, in as much as evaluation activities are firmly
attached to the current intervention.
This direct connection has loosened somewhat in the current trend towards review
and synthesis in evidence-based policy. One obvious drawback with ‘real-time’ evaluation
(Figures 8a and 8b) is that lessons get learned only after implementation and spending
decisions have been made. But in the fast changing world of modern healthcare policy
making, decision makers try their best to learn from research on previous incarnations of
bygone interventions (Figure 8c). The assumption (which we call the ‘isomorphic’
perspective) is that much the same programmes get tried again and again, and are
repeatedly researched (depicted by the direct linkage of an evaluation to each intervention in
Figure 8c). The crucial assumption is that learning accumulates by pooling together the
findings of primary studies (sometimes literally, into a ‘grand mean’ of effect size, as in
meta-analysis).
[Figure 8: conventional approaches to policy evaluation. 8a ‘Evaluation of X’: a single programme with its attached evaluation. 8b: repeated programmes tracked through repeated process and outcome evaluations. 8c: repeated programmes and evaluations funnelled into a systematic review that informs a future programme.]
Although this manoeuvre gets the evidence horse before the policy cart, there remains an
assumption about the one-to-one relationship between each intervention and each atom of
evidence. In Figure 8a, the logic-in-use is that the evidence relates to ‘a programme’ and by
the time we get to 8c, the working assumption is that evidence relates to ‘a type of
programme’. This dramatic widening of the evidence net is based on three premises that
tend to go unquestioned.
For the realist these are remarkable, not to say foolhardy suppositions. As we have seen,
service delivery interventions are complex systems thrust into complex systems and are
never implemented the same way twice. Non-equivalence is the norm. Realists envision
interventions as whole sequences of mechanisms that produce diverse effects according to
context, so that any particular intervention will have its own particular signature of outputs
and outcomes. Understanding how a particular intervention works requires a study of the
fate of each of its many, many intervention theories.
This disaggregation of a programme into its component theories provides the impetus for a
new look at how the evidence base is constructed, commissioned and drawn upon. The
intervention theory (the basis of any realist evaluation) is retained as the unit of analysis
when it comes to research synthesis and this allows for a more promising strategy for
building an evidence base to cope with the vicissitudes of complex systems. The starting
point, as Section 2 spelt out, is the initial ‘mapping’ of interventions into their component
theories as in the first part of Figure 9. The various flows and feedback lines therein are
intended to represent the negotiation, the borrowing, the leakage, the user involvement, the
self-affirming or self-denying processes and so on that typify programme-level interventions
(see Sections 1.21 – 1.27).
We have suggested that the only way to synthesise the evidence on such programmes is to
review the primary sources not study by study, but programme theory by programme theory.
This change in the unit of analysis is depicted in the lower half of Figure 9. Evidence is
sought and accumulates (to a greater or lesser extent) in respect of each component
process and the respective accumulations are represented by the number of ‘evidence
arrows’. There is no absolute expectation on this score. Sometimes ‘process evidence’ will
outdo ‘outcome evidence’ and vice versa. Sometimes there will be more data about
practitioners than service users, but often this may be reversed. Likewise, the grey literature
and academic research balance will not be stable. Note that with this model, there is no
uniform mode of synthesis (and thus no parallel to the ‘funnelling’ of evidence in
conventional systematic review as shown in Figure 8c).
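The ‘evidence arrows’ idea can be pictured as a simple tally of evidence items against each component theory, with no expectation that the piles will be even. A sketch, with invented theories and evidence kinds:

```python
from collections import Counter

# Evidence items are logged against the component theory they speak to,
# tagged by kind; uneven tallies across theories are expected, not a defect.
# (Theories and kinds below are invented placeholders.)
evidence_log = [
    ("practitioner response theory", "process evaluation"),
    ("practitioner response theory", "grey literature"),
    ("patient choice theory", "survey"),
    ("patient choice theory", "quasi-experiment"),
    ("patient choice theory", "qualitative study"),
    ("media amplification theory", "documentary analysis"),
]

arrows_per_theory = Counter(theory for theory, _ in evidence_log)
print(arrows_per_theory.most_common())
```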
[Figure 9: a programme mapped into its component theories (upper half); evidence sought and synthesised theory by theory, feeding forward into future programmes (lower half).]
A future programme, then, draws on the same stock of programme theories but anticipates
that they will meet a different pattern of negotiation, resistance, bloom and fade.
The reason why a ‘systematic review’ of interventions A, B, C, D and E can inform the
design of an entirely new intervention F is that the synthesis works at the level of ideas.
Since the evidence base is built at the level of programme theories, we suggest that
research synthesis is able to draw in and advise upon a heterogeneous range of
interventions. The relationship between the evidence base and future programmes is thus
‘many-to-many’ and, as such, this method is uniquely linked to the creative design phase of
policy interventions.
Again, the contrast with traditional evidence synthesis should be stressed. This starts with
evaluation’s one-to-one relationship with a particular programme. Although systematic
review enlarges on the circle of studies, this same equivalence is assumed: this body of
evidence relates to this type of programme. Accordingly and traditionally, it has been
supposed that the future design of health service learning collaboratives will be supported by
reviews of existing health service learning collaboratives, that learning about future
healthcare public disclosure initiatives will come from bygone studies of published hospital
league tables and surgeons’ report cards, that a synthesis on the evidence of health advice
and help lines is what is needed to inform a new NHS Direct intervention, and so on.
Rather than assume that review will lead decision makers to imitate schemes lock, stock and
barrel, realist review assumes that what transfers are ideas. So where can one look for ideas
and what are the potential targets for these ideas? The really useful sources and recipients
turn out to be many and varied, the key point being that the policy comparisons made and
drawn upon will benefit from the developing theory.
The same principle can be applied to the utilisation of the products of realist review. What
are passed on are ideas about ideas, and good ideas can take root the world over. The
orthodox ‘implications and recommendations’ of realist review still remain within the same
family of interventions. To take our standard example of the publication of hospital ratings,
the finding that practitioners have often sought to ‘outmanoeuvre’ the key performance
indicators would be of considerable use for those seeking to initiate or improve such a
scheme. But the same idea of ‘indicator resistance’ would probably apply even if the ratings
were not made public and used only for internal regulation and audit.
This example suggests a much more radical principle for constructing the evidence base.
We know that quite diverse interventions share common components. Most obviously, they
all need designing, leading, managing, staffing, monitoring, reviewing, and so on. And they
all suffer from problems of communication, reward systems, staff turnover and resistance,
competing priorities, resource constraints, and so on. At the extreme, there are probably
some common processes (such as how people react to change) and thus generic theories
(e.g. about human nature) that feed their way into all service delivery initiatives. If synthesis
were to concentrate on these underlying mechanisms, then the opportunity for the utilisation
of review materials would be much expanded. A hypothesis that might be tested about
realist review, therefore, is that reviews should align with intervention theories rather than
with specific interventions. Insofar as they concentrate on these very general levers,
reviews can be ‘recycled’ to inform
all manner of future programmes.
An example of how this might work was suggested by a recent review by Greenhalgh et al
(2004), which drew on the principles of realist review. The review topic was the ‘diffusion,
dissemination and sustainability of innovations in health service delivery and organisation’.
Here indeed is a generic topic – ‘how to spread good ideas’ no less. What the review
unearthed was a complex model of the mechanisms and contexts that condition the
transmission and acceptance of new ideas. The entire model cannot be reproduced here,
but successful diffusion was found to rest on the specific attributes of the innovation, the
characteristics and concerns of potential adopters, the lines and processes of
communication and influence, the organisational culture and climate, the inter-organisational
and political backdrop, and so on. The key point for present purposes is that this model
(which had been painstakingly constructed from primary studies on a range of interventions)
was then tested on four quite different and highly diverse interventions, namely, integrated
care pathways, GP fundholding, telemedicine and the UK electronic health record. Despite
extracting the ideas from studies unrelated to these interventions, the authors were able to
make sense of the rather different footprint of outcomes and outputs associated with each of
them. The very heterogeneity of these case studies signals the potential for using a theory-
based explanatory framework to inform the development of upcoming initiatives.
These ideas on programme design represent the most speculative edge of the emerging
realist framework. With that caveat in mind, a further potential advantage (and economy) of
‘generic theory reviews’ can be noted. This concerns the difficult business of getting the
evidence horse before the policy cart. As noted, this is impossible with summative
evaluations, which are conducted after the main bulk of implementation and spending
decisions are made. But even if the chosen instrument for evidence gathering is the
systematic review, there is still the tricky business of choosing which family of programmes
to chew over. Normally this is done in ‘anticipation’ that some new policy surge is brewing
and there is time to fix on an appropriate subject matter for a preliminary review. Realist
review does not require advance notice of the bus timetable. However ‘new’ the next vehicle
that comes along, it is supposed that it will comprise common components, and suffer
similar setbacks, of the kind captured in certain generic theories that, if the evidence base
for policy making were so organised, would be the subject matter of the review library.
However, realist review has important shortcomings that limit its applications. We list three of
these below.
The first is that realist review cannot be reduced to a standardised, reproducible protocol.
In this conviction, we depart sharply from the most ferocious advocates of procedural
uniformity and protocol in research synthesis (Straus and McAlister, 2000; Cochrane
Reviewers’ Handbook, 2004). One of the great themes of the Cochrane and Campbell
collaborations is that in order to rely on reviews they need to be reproducible. This
desideratum is conceived in terms of technical standardisation and clarity, so that by
following the formula it matters not whether team A or team B has carried out the review. It is
the procedure itself that is considered to furnish certainty.
Our objections to the ‘reproducibility principle’ are twofold. The first lies with the sheer
impossibility of making transparent every single decision involved in research synthesis.
When one is reviewing the vast literature associated with complex service delivery
interventions, and if one admits all manner of empirical research, grey literature and even
policy thought pieces as potential evidence, one is faced with an endless task that has at
some stage to be arbitrarily terminated. And that requires judgement. We are inclined to
believe that this happens anyway in all forms of review. When faced with search results that
have generated a thousand documents, one has to rely on a mixture of experience and
sagacity to sift out those with greatest relevance. And, yes, this depends on intuition on such
matters as to whether one can rely on titles and abstracts to make the cut or how much effort
to put into finding that obscure paper that seems beyond retrieval.
Our second objection is more philosophical. We question whether objectivity in science has
ever stemmed from standardisation of procedure. Our preference is for a model of validity
that rests on refutation rather than replication. In the context of research synthesis this does
require ‘showing one’s working’, ‘laying down one’s methodological tracks’, ‘surfacing one’s
reasoning’, but clarity on this model is for the purpose of exposing a developing theory to
criticism. A fundamental principle of realist review is that its findings are fallible. The whole
enterprise is about sifting and sorting theories and coming to a provisional preference for
one explanation. Constant exposure to scrutiny and critique is thus the engine for the
revision and refinement of programme theories. It is based on a system in which reviewers
challenge rather than police each other. In the words of Donald Campbell (after whom the
Campbell Collaboration was named):
‘The objectivity of physical science does not come from turning over the running of
experiments to people who could not care less about outcomes, nor from having a
separate staff to read the meters. It comes from a process that can be called
‘competitive cross-validation’ and from the fact that there are many independent
decision makers capable of rerunning an experiment … The resulting dependability of
the reports comes from a social process rather than from dependence on the honesty
and competence of any single experimenter. Somehow in the social system of science
a systematic norm of distrust, combined with ambitiousness, leads people to monitor
each other for improved validity. Organized distrust produces trustworthy reports.’
(Campbell and Russo, 1999, p.143)
As we have argued in Part I, realist review focuses on analytically defined theories and
mechanisms rather than on lumpy, leaky and incongruent whole programmes. With its
emphasis on contextual contingency and temporal changes in the ways programmes are
implemented and understood by their participants, it is chary about serving up ‘net effects’
conclusions. Enduring empirical generalisation can only be discovered in artificially closed
systems and health service delivery is palpably an open system. Whether this modesty of
the conclusions of realist review is a drawback or a virtue depends on the eye of the
beholder. It certainly should be made clear to policy makers, anxious to help wrest greatest
utility from our precious and finite resources, that it cannot give an easy answer to the
question of how to get more bang for the buck.
A further shortcoming is that realist review is demanding: it takes experience and judgement
to frame the precise questions to be put in the review. It requires know-how in respect of a
range of disciplines, methodologies and literatures to be able to seek out, digest and assess
the appropriate bodies of evidence. It demands the skills of an intellectual generalist rather than
those of a super-specialist. It requires some finesse in respect of research design to be able
to match the developing theory to the available data. And whilst it does not require workaday
familiarity with the precise intervention or service under review, it does trade on the
possession of a general nous about programme implementation. It is not, therefore, a task
that can be handed down to newly doctored research assistants, working to an established
formula.
This ‘experts only’ feature of realist review again contrasts sharply with the cry from the
evidence-based medicine camp that the knowledge embodied in personal expertise is
‘anecdotal’ and not to be trusted. Rather (such protagonists claim), any competent reviewer,
armed with a focused question and a set of rigorously developed checklists, can find the
relevant papers, develop a robust critique of the evidence, and produce a summary with a
clear estimate of effect size and quantified level of confidence. But this is surely only true
when the decision maker is not required to ski off piste. The research literature on expert
decision making finds it to be a rapid, intuitive, and seemingly idiosyncratic process, which
incorporates and makes sense of multiple and complex pieces of data including subtle
contextual evidence. In grey areas, the expert breaks the rules judiciously and justifies
himself reflectively. Novice decision making, on the other hand, is rule-bound, formulaic, and
reductionist. It ignores anything that is seen as ‘complicating factors’ and makes little
concession to context. In grey areas, the novice persists in applying the formula and proves
unable to bend the rules to accommodate the unanticipated (Eraut, 1994).
We are not claiming here that the realist approach is inherently ‘cleverer’ than conventional
systematic review, nor indeed that repeated attempts at the technique will make an
individual good at it. It is because realist review involves so many grey zones (including, but
not confined to, ‘grey literature’), so much off-piste work, so much wallowing in the subtle
and contextual, so much negotiation of meaning with real-world practitioners, that we set so
much store by our ‘novices beware’ warning.
Patience and memory are not, of course, the prime characteristics of decision makers. But
realist review is fundamentally pragmatic, and much can be achieved through the drip, drip,
drip of enlightenment. This metaphor leads us to rather more positive thoughts on the
nimble-fingered and sideways-glancing policy maker. In the days before ‘evidence-based
policy’ we had policy from the seat-of-the-pants of experience. Reasoning went something
like this: ‘we are faced with implementing this new scheme A but it’s rather like the B one we
tried at C, and you may recall that it hit problems in terms of D and E, so we need to watch
out for that again. Come to think of it I’ve just heard they’ve just implemented something
rather like A over in the department of K, so I’ll ask L whether they’ve come up with any new
issues etc etc.’
Not only is realist review equipped to uphold and inform this kind of reasoning (if you like, to
give it an evidence base), it is also well suited to tapping into the kind of informal knowledge-
sharing that is being encouraged through such schemes as the ‘Breakthrough’ quality
improvement collaboratives that are part of the NHS Modernisation Programme and which
explicitly seek to transfer the ‘sticky knowledge’ that makes for success in complex
organisational innovations by bringing policy makers and practitioners together in informal
space (Bate et al, 2002). Realist review supplements this approach to organisational
learning by thinking through the configurations of contexts and mechanisms that need to be
attended to in fine-tuning a programme. With a touch of modernisation, via the importation of
empirical evidence, it may still be the best model.
Throughout the document, we have attempted to contrast the realist approach with the more
traditional approaches to systematic review, within the Cochrane and Campbell traditions.
We have argued for a different methodology and set of methods for the review process to
deal with the complexity of interventions that are considered in the process of health service
policy making and decision making. Whilst acknowledging the emergence of new
developments within the more traditional approaches (for example, ‘mediator
and moderator’ versions of meta-analysis and the integration of qualitative and quantitative
evidence within reviews), we believe the theory-driven and explanatory nature of realist
review offers something new and complementary to existing approaches.
However, in rejecting the standardisation of the review process advocated by the more
traditional approaches (for example, in terms of rigid inclusion and exclusion criteria or the
use of standard data extraction templates), some may question whether and how realist
reviews are different from the old-time literature review. As is well known, the problem with
‘old fashioned’ literature reviews is that no one knows what they are and, methodologically
speaking, they come out differently every time. Indeed, that is one of the main reasons why
the science of systematic review emerged. We believe that to some extent realist review
draws on the strengths of the traditional literature review in that it aims to address a wider
set of questions and is less restrictive about where to look for evidence. However, the
methods outlined in this document bring a logic and a structure to the review process, which
may in fact formalise what the best narrative reviews have done instinctively and ensure that
the process of realist review is transparent and open to critique and challenge by others.
We hope that this paper will encourage others to engage in dialogue and to embark on
realist reviews to refine and develop the approach further.
Endnote
One final point of clarification is due and we are presented with this opportunity thanks to a
query from an anonymous referee of this paper. The question put to us was whether our
subject matter is treated as a ‘complex’ or, merely, as a ‘complicated’ system? In truth, we
have been happy to go along with ordinary langue usage and have also thrown in ‘intricate’
as a further synonym to describe service interventions and innovations. Such distinctions
are, however, crucial if one comes at these issues from a background in complexity theory
(and its sisters such as chaos theory, artificial life, evolutionary computing, etc.) The rule
differentiating the two goes something like this - what distinguishes a complex system from a
merely complicated one is that some behaviours and patterns emerge in complex systems
as a result of patterns of relationships between elements (see, for instance, Mitleton-Kelly
2003)
So the systems to which we refer are indeed complex. Health service delivery is self-transformational. Policy interventions which aim to change it spring unanticipated and emergent leaks all of the time. As we have noted, in 'solving' a problem an intervention can create new conditions that eventually render the solution inoperable. Our interrogator's doubts probably arise from the analytic strategies we put forward for synthesising evidence. These strategies, sure enough, go no further than treating interventions as complicated systems. That is to say, the advice is to break programmes into component theories and review the evidence on those bits. We recognise that this is nothing other than the good old 'analytic method'. Like much else in our proposals, the reasoning here is pragmatic. Whilst we appreciate that any particular theory adjudication will leave some further tortures of chaos and systems theory untouched, we have yet to find any evaluation tools or review methods that are not selective.
References
Abrams P (1984) The uses of British sociology 1831-1981. In: Bulmer M (ed) Essays on the History of British Social Research. New York: Cambridge University Press.
Bhaskar R (1978) A realist theory of science (2nd edition). Brighton: Harvester Press.
Bickman L (ed) (1987) Using program theory in evaluation. San Francisco: Jossey Bass. (New Directions for Evaluation No. 33).
Chen H, Rossi P (eds) (1992) Using theory to improve program and policy evaluations. Westport: Greenwood Press.
Cochrane Reviewers’ Handbook 4.2.0 (updated March 2004). The Cochrane Library.
Department of Health (1998) Our healthier nation: a contract for health: a consultation paper.
London: The Stationery Office (Cm 3852).
Department of Health (2002) Research governance framework for health and social care.
London: The Stationery Office.
Eraut M (1994) Developing professional knowledge and competence. London: Falmer Press.
Fisse B, Braithwaite J (1983) The impact of publicity on corporate offenders. Albany: State
University of New York Press.
Glaser B, Strauss A (1967) The discovery of grounded theory: strategies for qualitative
research. Chicago: Aldine.
Harré R (1978) Social being: a theory for social psychology. Oxford: Blackwell.
Lavis J N, Ross S E, Hurley J E et al (2002) Examining the role of health services research in public policymaking. Milbank Quarterly; 80(1): 125-154.
Layder D (1998) Sociological practice: linking theory and social research. London: Sage.
Lomas J (2000) Using 'linkage and exchange' to move research into policy at a Canadian foundation. Health Affairs; 19(3): 236-240.
McEvoy P, Richards D (2003) Critical realism: a way forward for evaluation research in
nursing? Journal of Advanced Nursing; 43(4): 411-420.
Mitchell K (1997) Encouraging young women to exercise: can teenage magazines play a
role? Health Education Journal; 56(2): 264-273.
Mitleton-Kelly E (2003) Complex systems and evolutionary perspectives on organisations: the application of complexity theory to organisations. Oxford: Elsevier Science.
Norrie A (1993) Crime, reason and history: a critical introduction to criminal law. London:
Weidenfeld and Nicolson.
Pawson R (1989) A measure for measures: a manifesto for empirical sociology. London:
Routledge.
Pawson R (2002b) Does Megan's Law Work? A theory-driven systematic review. London:
ESRC UK Centre for Evidence Based Policy and Practice. (Working Paper 8). Available via:
www.evidencenetwork.org
Pawson R (2003) Assessing the quality of evidence in evidence-based policy: why, how and
when? Working Paper No. 1. ESRC Research Methods Programme. Available at
www.ccsr.ac.uk/methods
Putnam H (1990) Realism with a human face (Conant J, ed). Cambridge, Mass.: Harvard
University Press.
Steinmetz G (1998) Critical realism and historical sociology: a review article. Comparative
Studies in Society & History; 40(1): 170-186.
Weiss C (1980) Knowledge creep and decision accretion. Knowledge: Creation, Diffusion,
Utilization 1(3): 381-404.
Weiss C (1997) Theory-based evaluation: past, present and future. In: Rog D, Fournier D
(eds) Progress and future directions in evaluation: perspectives on theory, practice and
methods. San Francisco: Jossey Bass. (New Directions for Evaluation No. 76).
Weiss C (2000) Which links in which theories shall we evaluate? In: Rogers P, Hacsi T, Petrosino A, Huebner T (eds) Program theory in evaluation: challenges and opportunities.
San Francisco: Jossey Bass. (New Directions for Evaluation No. 87).
Weiss C H, Bucuvalas M J (1980) Social science research and decision-making. New York:
Columbia University Press.
Wolfsfeld G (1997) Media and political conflict: news from the Middle East. Cambridge:
Cambridge University Press.
Figure 6: An initial ‘theory map’ of the public disclosure of health care information.
ESRC Research Methods Programme
CCSR
Faculty of Social Sciences
Crawford House,
University of Manchester,
Manchester M13 9PL
www.ccsr.ac.uk/methods/