Technological Forecasting & Social Change 163 (2021) 120484
Consensus in the Delphi method: What makes a decision change?

Maite Barrios a,b,*, Georgina Guilera a,b, Laura Nuño b,c, Juana Gómez-Benito a,b

a Department of Social Psychology and Quantitative Psychology, University of Barcelona, Barcelona, Spain
b Group on Measurement Invariance and Analysis of Change (GEIMAC), Institute of Neurosciences, University of Barcelona, Barcelona, Spain
c Clinical Institute of Neurosciences (ICN), Hospital Clinic, Barcelona, Spain
ARTICLE INFO

Keywords: Delphi method; Opinion change; Consensus threshold; Group agreement; Feedback effect

ABSTRACT

We examined whether giving feedback to participants in a Delphi study about the level of agreement across the expert panel had an effect on opinion change between rounds. We also considered the potential influence of participants’ sociodemographic and professional characteristics. Five three-round Delphi studies were conducted independently, in which a total of 628 mental health experts responded to all three rounds. In each study, participants had to decide, based on their experience, whether a series of categories were relevant. The percentage of group agreement (i.e., percentage of participants who considered each category as relevant) in round 2 was shown as feedback in round 3, and responses in rounds 2 and 3 were considered to analyze opinion change. Results showed that when the feedback given in round 3 indicated that ≥75% of experts considered a category to be relevant, there was a further shift in opinion towards the group opinion (i.e., the category then yielded even greater consensus), whereas if the feedback indicated <75% group agreement, individual opinions tended to shift against the group opinion (i.e., consensus over the category decreased). Neither sociodemographic nor professional variables had a significant effect in explaining opinion shift. These results show that in Delphi studies, feedback has an influence on individual responses and the achievement of consensus.

1. Introduction

The Delphi method has been widely used in recent decades and is now regarded in academic research as a valuable technique for reaching consensus about specific issues when empirical evidence is scarce or contentious. The method aims to achieve consensus about a specific topic by using several rounds of questionnaires to collect data from a panel of selected experts on the topic of interest (Keeney et al., 2006), and it has been employed to make predictions or help decision-making in numerous disciplines, including medicine (Prinsen et al., 2014; Sinha et al., 2011), nursing (Keeney et al., 2006), psychology (van der Vaart et al., 2014), education (Zawacki-Richter, 2009), and business (El-Gazzar et al., 2016; Jiang et al., 2017). Although a number of variants of the Delphi method have been proposed, inherent characteristics of the technique include anonymity among experts and a controlled feedback process provided in a series of rounds (Linstone and Turoff, 1975; Nowack et al., 2011; Rowe and Wright, 1999). Anonymity is guaranteed since the process is coordinated by a research team using, in most cases, an online platform or e-mail, thus avoiding any interaction between participants. After the first round the research team analyzes and summarizes the responses of the panel of experts in order to provide feedback to participants for the following round.

Whereas anonymity reduces the effect of dominant individuals on participants’ responses, the use of controlled feedback encourages experts to reassess their initial judgments based on the information provided by the research team in each round. Feedback thus allows each participant to generate additional insights about the specific questions or items and, consequently, change his or her responses in light of the group’s opinion. This means that the response given to each item can vary across rounds, thereby favoring the convergence of opinions (Linstone and Turoff, 1975). In this respect, the Delphi method is well suited as a consensus-building technique.

1.1. Consensus in Delphi studies

Consensus is one of the most controversial methodological issues in the Delphi process. Although achieving consensus among participants is a key feature in Delphi studies, what is accepted as consensus, or how it is reached, remains unclear (Boulkedid et al., 2011; Von der Gracht, 2012).
* Corresponding author.
E-mail address: mbarrios@ub.edu (M. Barrios).
https://doi.org/10.1016/j.techfore.2020.120484
Received 4 November 2019; Received in revised form 15 October 2020; Accepted 19 November 2020
Available online 1 December 2020
0040-1625/© 2020 Elsevier Inc. All rights reserved.
In theory the Delphi process can be iterated until consensus is achieved or a minimum degree of stability among panelists’ responses is reached (Von der Gracht, 2012). Indeed, stability, defined as the consistency of experts’ responses between successive rounds of a Delphi study, has been considered a necessary criterion in order to assess consensus (Dajani et al., 1979; Von der Gracht, 2012). However, most Delphi studies tend to conduct a specific number of rounds with the aim of eliciting consensus among participants, without controlling if stability of responses is attained, without a formal definition of what is going to be considered consensus or even without specifying a threshold value that determines when consensus has been achieved (Humphrey-Murto et al., 2017; Boulkedid et al., 2011; Diamond et al., 2014; Foth et al., 2016). These are important issues, since stopping a Delphi process based merely on a specific number of rounds may lead to invalid and meaningless results (Schmidt, 1997).

Although there are several approaches to defining or reporting consensus (Birko et al., 2015; Boulkedid et al., 2011; Keeney et al., 2006; Meijering et al., 2013; Von der Gracht, 2012), the systematic reviews by Diamond et al. and Foth et al. (Diamond et al., 2014; Foth et al., 2016) indicate that consensus is most commonly defined based on the percentage of agreement with a specific response, followed by the percentage of participants who rate items at the upper extremes of the Likert scales used (e.g., items scored as 4 and 5 on a 5-point Likert scale). Thus, if a Delphi study is performed over a specific number of rounds, the study should ideally specify how the degree of agreement reached by the experts at the end of the study will be interpreted. However, a threshold percentage is not always provided a priori in most Delphi studies, and the range reported as an accepted consensus is very wide (50–97%); the median threshold accepted as consensus is 75% agreement among participants (Diamond et al., 2014; Foth et al., 2016). Furthermore, even if a cut-off point is specified a priori, this may seem arbitrary from a conceptual or practical point of view if certain items considered as relevant fall just below the cut-off.

1.2. Change of opinion between Delphi rounds

In addition to the uncertainty about what should be considered consensus, little attention has been paid to what favors the achievement of a consensus response. Notably, the potential effect of the specific characteristics of the expert panel (e.g., perceived expertise, years of experience, gender, age) or the influence of controlled feedback on individual opinion change have scarcely been investigated (Hussler et al., 2011; Makkonen et al., 2016; Meijering and Tobi, 2016, 2018; Rowe et al., 2005).

Although it is accepted that the panel in a Delphi study should be composed of participants with considerable expertise and knowledge in relation to the issue being investigated (Adler and Ziglio, 1996; Powell, 2003; Trevelyan et al., 2015), it is also regarded as beneficial for there to be a certain level of diversity in the demographic characteristics of participants, as well as in aspects related to their professional experience (Förster and Von der Gracht, 2014; Hussler et al., 2011; Powell, 2003). It has been suggested that this diversity allows for greater variability in the skills and points of view of panelists, thus ensuring that opinions come from multiple independent sources (Förster and Von der Gracht, 2014).

Regarding opinion change, and based on the theory of errors, those experts with more experience or greater perceived expertise should, a priori, be less inclined to change their opinion once they have responded to questions in the first round (Dalkey, 1975). In this context, Hussler et al. (2011) compared opinion change between experts and laypersons and found that experts were more confident in their original opinions and were more reluctant to amend their opinions, even if feedback confronted them with contradictory judgments. Indeed, Hussler et al. (2011) found that only 3% of the opinions given by experts were modified in the second round, as compared with 21% of the responses provided by laypersons. However, other studies have not found support for an association between the greater experience of experts and less opinion change (Bolger et al., 2011; Kauko et al., 2014; Makkonen et al., 2016; Rowe et al., 2005). Research is scant regarding the potential influence of inherent characteristics of the expert panel such as gender or age. Hussler et al. (2011) found that although older participants tended to attribute a higher level of importance to the items they needed to rate, the age of participants was unrelated to opinion change. Similarly, and more recently, Makkonen et al. (2016) found no evidence that age and gender had explanatory power with regard to opinion change.

Although controlled feedback is a fundamental characteristic of the Delphi method, Delphi studies differ widely in the kind of feedback provided to participants (Nowack et al., 2011). A broad distinction can be made between statistical summaries, which show the majority opinion, and argumentative feedback, which indicates why experts hold certain opinions (Rowe et al., 2005). It is assumed that the lack of clues as to the sources of the feedback (i.e., the identity of other participants) allows participants to focus upon the content of feedback rather than being distracted by extraneous social information, which may reduce the likelihood of their being influenced in a certain direction (Rowe et al., 2005). However, although the use of anonymous information can reduce the effect of dominant individuals on participants’ responses, this information nonetheless allows participants to rethink items based on the responses of others. In this regard, several studies indicate that feedback may influence opinion change among participants, who tend to follow the view of the majority (Makkonen et al., 2016; Meijering and Tobi, 2018; Rowe et al., 2005; Scheibe et al., 1975), although this influence partly depends on the perceived importance of whom the information comes from (Brookes et al., 2016; Harman et al., 2015; Turnbull et al., 2018). In this respect, Harman et al. found, in a Delphi study which included feedback from different stakeholder groups, that the responses of parents and children and other health professional groups have a different impact on the perceived importance of outcomes than do those of their peer group alone (Harman et al., 2015).

The kind of feedback provided to participants (i.e., argumentative or statistical) has also been associated with the tendency to shift opinion (Bolger et al., 2011; Rowe and Wright, 1996; Turnbull et al., 2018). Specifically, it has been found that argumentative feedback (i.e., reasons and justifications), as opposed to statistical feedback (e.g., median and range of group responses), provokes less opinion change among participants (Rowe and Wright, 1996). However, in a recent study based on the responses of a post-Delphi study questionnaire, participants reported having been influenced to a similar extent by the scores of other participants and by the written comments from other panel members during the re-rating process (Turnbull et al., 2018). At all events, there are still no agreed guidelines about how to provide feedback in a Delphi study, although it has been noted that the most common way is to give a summary of statistics using a mean, a median or the percentage of agreement (Boulkedid et al., 2011).

In light of the above, our goal in this methodological study was to explore the influence of controlled feedback on participants’ responses across Delphi rounds. Given that any consensus threshold is open to question, even if it is commonly used, a better understanding of the impact which controlled feedback can have on the Delphi process could help to determine how consensus is reached and what the most suitable consensus threshold might be. Thus, our specific aim was to examine the influence of controlled feedback on opinion change between two Delphi rounds and how it may favor or hinder the reaching of consensus among participants. To this end, the influence that feedback has on individual opinion change was compared across different levels of group agreement. We also explored the potential influence of participants’ sociodemographic and professional variables.

2. Method

2.1. Participants

The experts included in this study (hereinafter referred to as experts
or participants) were all health professionals (specifically, psychiatrists, psychologists, nurses, occupational therapists, and social workers) with at least one year of experience in the direct treatment of individuals with schizophrenia. They were recruited in the context of five independent Delphi studies (one for each of the aforementioned health professions) conducted between April 2016 and July 2018. Several strategies were used to recruit experts from around the world: international professional associations, universities with healthcare professional training programs, and hospitals were contacted, and we also made use of literature searches, LinkedIn contacts, and personal recommendations. In order to avoid language barriers and to increase the representativeness and participation of experts from around the world, all study materials (i.e., contact letters, questionnaires, etc.) were available in five languages (i.e., Chinese, English, French, Russian, and Spanish) and participants could choose the language in which they wished to respond. All the study materials were initially written in Spanish and were then translated and checked for accuracy by teams of at least two independent native speakers in each language, all of whom had expertise in the field of health sciences. The initial contact included an invitation to take part and a detailed description of the project targets, the Delphi process, and the timeline. Demographic and professional data were also requested. Of the 1555 health professionals who agreed to participate, 777 experts finally participated in the first round.

2.2. Delphi process

The five independent worldwide Delphi studies (Nuño et al., 2018; Nuño et al., 2019; Nuño et al., 2019; Nuño et al., 2021a; Nuño et al., 2021b) were carried out following the same research design so as to ensure a high level of comparability. Each study was conducted in order to identify, from the perspective of one of the five aforementioned health professions, the most common problems presented by people with schizophrenia. All the Delphi studies comprised three rounds, were conducted through an online survey system (www.qualtrics.com), and lasted around three months. For each round, participants always had two weeks to respond, and three reminders were sent per round: the first, one week before the deadline; the second, two days before the deadline; and the third, on the deadline day itself. Participants were able to answer parts of the survey at different times, and the expected completion time for each survey round was about 15 minutes.

Each Delphi study began with six open-ended questions about issues related to functioning in schizophrenia. All the responses collected in this first round were linked to any of the more than 1400 categories of the International Classification of Functioning, Disability, and Health (ICF; World Health Organization, 2001) by two health professionals trained in the use of the ICF and with experience of providing care to individuals with schizophrenia (further details about this process can be consulted in Nuño et al., 2018). Those categories reported by at least 5% of the experts were selected for inclusion in the second Delphi round (Faulks et al., 2016; Selb et al., 2015). In this second round, the experts received a list of the selected ICF categories, along with their respective definitions. Participants were then asked to judge, for each category, whether they thought the category was relevant from their professional perspective to the assessment and/or treatment of individuals with schizophrenia, taking into account that the final list should be as short as possible to be practical but as comprehensive as necessary to capture the most relevant needs of this population. Each participant judged between 160 and 184 categories. In the third round, participants were once again asked to judge each category, but this time they were given feedback (for each category) about the responses of the expert panel as a whole. The feedback provided to participants consisted of the percentage of participants who had considered the category relevant in the second round (i.e., percentage of group agreement feedback), as well as a reminder of their own previous response (i.e., relevant or not relevant). Thus, in this third round, participants had the opportunity to consider the panel’s opinion, to revise their previous responses, and to respond again to the list of categories. The whole Delphi process is summarized in Fig. 1.

Fig. 1. The Delphi process.

2.3. Data analysis

Descriptive statistics were used to describe the sociodemographic and professional characteristics of participants, obtaining measures of central tendency and dispersion (i.e., mean and standard deviation) for quantitative variables and frequency distribution and percentage for qualitative variables. The kappa coefficient and its 95% confidence interval were calculated in order to check the stability of responses between rounds 2 and 3, based on the respective percentages of group agreement for each category (i.e., percentage of participants who selected that category as “relevant” in each round).

In order to examine the effect of feedback on responses in the third round, we calculated the percentage of categories that were rated as “relevant” in the third Delphi round for different intervals of group agreement feedback (e.g., less than 50% of group agreement, 50–54.9% of group agreement, etc.). Group agreement feedback was also considered in order to study the shift in individual opinions between the second and third rounds. Specifically, for each of the intervals of group agreement considered in the previous analysis we calculated the percentage of categories for which the degree of group agreement changed (either increasing or decreasing, based on participants’ individual responses) between rounds 2 and 3. An odds ratio and its 95% confidence interval (95% CI) were then calculated in order to assess whether the percentage of group agreement achieved in round 2 and shown as feedback in the next round had an effect on the degree of agreement achieved in round 3.

We also analyzed what we termed “congruent” and “incongruent” opinion change in round 3 following feedback about the level of group agreement achieved in round 2. An opinion change was considered congruent when a participant changed his/her response in order to make it congruent with the group opinion (e.g., a participant rates a category as “not relevant” in the second round, but after receiving the group agreement feedback indicating that the category is mainly considered “relevant”, the participant decides to change his/her response in the third round and rates the category as “relevant”). By contrast, an incongruent opinion change occurred when a participant changed his/her response in the opposite direction to the group opinion (e.g., a participant rates a category as “relevant” in the second round and the group agreement feedback received in the third round shows high group agreement, indicating that the category is also considered “relevant” by most of the experts; however, in the third round the participant decides to change his/her response and rates the category as “not relevant”). Similarly, a congruent non-change of opinion occurred when a participant did not change his/her response in the third round because it was already congruent with the group opinion (e.g., a participant rates the category as “relevant” in the second round and the group agreement feedback received in the third round shows high group agreement, indicating that the category is also considered “relevant” by most of the experts; thus, congruently, the participant decides to maintain his/her response and again rates the category as “relevant”). Finally, an incongruent non-change of opinion occurred when a participant did not change his/her response in the third round despite it being incongruent with the group opinion (e.g., a participant rates the category as “not relevant” in the second round, but the group agreement feedback received in the third round shows high group agreement, indicating that the category is considered “relevant” by most of the experts; however, the participant, incongruently, decides to maintain his/her response and rates the category as “not relevant”).

To examine the influence of sociodemographic and professional variables on the percentage of opinion change we performed multiple regression analysis using the forced entry method. A total of eight multiple linear regression models were calculated to predict the following eight dependent variables: percentage of congruent and incongruent opinion change and of congruent and incongruent non-change of opinion for each participant based on a specific threshold level of agreement achieved in round 2 (i.e., 75%). Age, gender, profession, years of professional experience, perceived expertise (rated using a 5-point Likert scale), and the participant’s geographical region of origin were included as independent variables in each model. Potential collinearity effects among predictors were also assessed.

3. Results

An overview of the five independent Delphi studies is presented in
Table 1. A total of 777 health professionals completed the first Delphi round, of whom 628 completed the second and third rounds. This implies a response rate across rounds one to three of 80.8%. Data from those participants who completed both the second and the third round were used for the analysis conducted in this study. The sociodemographic and professional data of these participants are shown in Table 2.

Table 1
Overview of the five Delphi studies.

Delphi study | Professional profile | Participants round 1, n (%) | Participants round 3, n (%) | Number of categories rated
1 | Psychiatrists | 352 (45.30) | 303 (48.25) | 166
2 | Psychologists | 175 (22.52) | 137 (21.82) | 176
3 | Social workers | 57 (7.34) | 36 (5.73) | 160
4 | Occupational therapists | 92 (11.84) | 73 (11.62) | 184
5 | Nurses | 101 (13.00) | 79 (12.58) | 177
Total | | 777 | 628 (80.82) | 863

The kappa coefficient was 0.776 (95% CI 0.772–0.780), which according to the guidelines in Landis and Koch (1977) indicates substantial stability in the response of experts between rounds 2 and 3. In terms of participants’ responses regarding the relevance of each category, 95,547 category ratings (88.8%) did not change between the second and the third round, whereas for the remaining 12,038 (11.2%) a change was observed. On average, the proportion of opinion changes from the second to the third round per participant was quite low (Median 0.08; min. 0.0, max. 0.82). This means that 50% of participants changed their opinion for less than 8% of the categories they had to rate.

Data showed that feedback in round 3 about the level of group agreement achieved in round 2 had an effect on category ratings in the third round, causing a shift of opinions. Fig. 2 plots the percentage of categories selected as “relevant” in round 3 for each interval of group agreement feedback.

It can be seen that as the percentage of group agreement (which is also the information given as feedback to participants in round 3) increases, so too does the percentage of categories yielding greater group agreement in round 3 compared with round 2 (i.e., the percentage of experts who agree that a category is relevant increases between rounds 2 and 3). Conversely, when the feedback given in round 3 indicated lower levels of group agreement in round 2, the categories in question were more likely to yield an even lower percentage of group agreement in round 3. More specifically, the figure shows that once the feedback given to participants in round 3 indicated agreement among 75% or more of experts in round 2, then an increasing proportion of categories yield greater agreement in round 3 compared with round 2. The opposite effect can be observed for levels of group agreement feedback below 75%. It can also be seen in Fig. 2 that once feedback in round 3 indicated at least 65% group agreement in round 2, a small proportion of categories showed no change in their relevance rating.

Fig. 3 shows how the difference in the percentage of group agreement between rounds 2 and 3 increases as we move away from the threshold of 75% consensus. Above and below this threshold, group agreement becomes, respectively, progressively stronger and weaker between rounds 2 and 3.

Based on the 75% threshold of group agreement, 514 categories achieved group agreement of at least 75% in round 2 and showed a higher level of agreement in round 3. Conversely, 129 categories achieved group agreement below 75% in round 2 and showed a lower level of agreement in round 3 (75 categories showed the same group agreement in rounds 2 and 3). Calculation of an odds ratio showed that when a category achieved group agreement of at least 75% in round 2 the probability of its group agreement increasing in round 3 was higher than that of a category with group agreement below 75% in round 2 (OR = 15.327; 95% CI = 10.198–23.036; p < .0001). We also calculated the percentage of congruent and incongruent shifts of opinion for each participant. Data showed that when the feedback given in round 3 indicated group agreement below 75% in round 2, the categories in question were associated with a significantly higher percentage of opinion change in the third round (both congruent and incongruent). Conversely, when the feedback in round 3 indicated group agreement of at least 75%, participants were significantly more likely to maintain the same response in the third round (i.e., congruent with the group agreement feedback they had received). Table 3 shows the percentage shift of opinion in each condition based on this 75% threshold.

Multiple regression showed the presence of collinear predictors, and hence perceived expertise was removed from the analysis. Neither sociodemographic nor professional variables had a significant effect in terms of explaining the shift of opinion. Values of R² ranged from 0.03 to 0.07, suggesting that the set of predictor variables accounts for a very low percentage of variance in the dependent variable of each regression model. Table 4 summarizes the results of the regression analysis.

4. Discussion

This methodological study explores the process through which consensus is reached in Delphi studies, focusing specifically on the influence of controlled feedback on participants’ responses across rounds. Based on the data of five empirical Delphi studies, our results indicate that providing participants with feedback about the level of group agreement reached in the previous round has an effect on the level of consensus that is achieved subsequently.

Although the Delphi method is better able to avoid conformity
Table 2
Demographic and professional characteristics of participants in the second and third Delphi rounds.

Characteristic | Total participants | Psychiatrists | Psychologists | Nurses | Occupational therapists | Social workers
Age (years), mean (SD) [min - max] | 45.6 (15.9) [23 - 81] | 47.5 (10.0) [30 - 81] | 42.8 (10.6) [25 - 67] | 47.2 (11.4) [24 - 74] | 38.2 (11.4) [23 - 67] | 42.5 (11.7) [26 - 72]
Gender (Female), n (%) | 302 (48.1) | 86 (28.4) | 85 (62.0) | 48 (60.8) | 60 (82.2) | 23 (63.9)
World region (a), n (%) | | | | | |
Africa | 39 (6.2) | 20 (6.6) | 9 (6.6) | 4 (5.1) | 5 (6.8) | 1 (2.8)
Americas | 142 (22.6) | 62 (20.5) | 37 (27.0) | 20 (25.3) | 14 (19.2) | 9 (25.0)
Eastern Mediterranean | 38 (6.1) | 14 (4.6) | 10 (7.3) | 7 (8.9) | 7 (9.6) | 0 (0.0)
Europe | 201 (32.0) | 74 (24.4) | 55 (40.1) | 22 (27.8) | 38 (52.1) | 12 (33.3)
South-East Asia | 102 (16.2) | 67 (22.1) | 15 (10.9) | 9 (11.4) | 3 (4.1) | 8 (22.2)
Western Pacific | 106 (16.9) | 66 (21.8) | 11 (8.0) | 17 (21.5) | 6 (8.2) | 6 (16.7)
Professional experience (years), mean (SD) [min - max] | 18.8 (10.6) [1 - 55] | 19.1 (9.8) [2 - 55] | 15.4 (8.9) [4 - 32] | 21.8 (12.2) [2 - 52] | 14.3 (10.6) [1 - 40] | 16.0 (10.3) [2 - 46]
Perceived expertise, mean (SD) [min - max] | 4.0 (0.9) [1 - 5] | 4.3 (0.7) [1 - 5] | 3.8 (0.9) [1 - 5] | 3.8 (0.9) [1 - 5] | 3.5 (0.8) [1 - 5] | 3.6 (1.0) [2 - 5]

Note: SD = standard deviation. Perceived expertise: Self-rating of schizophrenia expertise from 1 = limited expertise to 5 = extensive expertise. (a) World regions as defined by the World Health Organization.
Fig. 2. Effect of feedback about the group agreement achieved in round 2 on category ratings in round 3.
Fig. 3. Mean percentage difference between the third and second rounds in the level of consensus achieved, according to each interval of group agreement feedback.
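As a rough sketch of the quantity summarized in Fig. 3, the percentage of group agreement can be computed per category in each round, and the round 3 minus round 2 difference can then be averaged within intervals of round 2 agreement. The function names, the interval bounds, and the toy ratings below are illustrative assumptions; they are not the study's data or analysis code.

```python
from collections import defaultdict


def group_agreement(responses):
    """Percentage of participants rating each category as relevant.

    `responses` is a list of per-participant lists of 0/1 ratings
    (1 = relevant), with one entry per category.
    """
    n_participants = len(responses)
    n_categories = len(responses[0])
    return [
        100.0 * sum(row[c] for row in responses) / n_participants
        for c in range(n_categories)
    ]


def mean_shift_by_interval(round2, round3, lower_bounds):
    """Mean (round 3 - round 2) agreement per interval of round 2 agreement.

    `lower_bounds` are the lower limits of the intervals; a category is
    assigned to the highest lower bound that its round 2 agreement reaches.
    """
    agree2 = group_agreement(round2)
    agree3 = group_agreement(round3)
    shifts = defaultdict(list)
    for a2, a3 in zip(agree2, agree3):
        lower = max(b for b in lower_bounds if a2 >= b)
        shifts[lower].append(a3 - a2)
    return {lower: sum(v) / len(v) for lower, v in sorted(shifts.items())}


# Hypothetical toy data: 4 participants, 3 categories.
round2 = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
round3 = [[1, 0, 1], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
result = mean_shift_by_interval(round2, round3, lower_bounds=[0, 50, 75])
print(result)
```

In this toy example the only category at or above 75% gains agreement, while the two categories below 75% lose agreement, which mirrors the direction of the pattern described for Fig. 3 without reproducing its values.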
pressure in decision making than are other group research techniques such as focus groups (Landeta et al., 2011; Zimmermann et al., 2012), the bandwagon effect (i.e., majority opinion leading people to adopt the majority view) seems to play a role in participants’ responses after specific controlled feedback is given. The power of feedback has been demonstrated by several studies (Bolger et al., 2011; Makkonen et al., 2016; Rowe et al., 2005; Scheibe et al., 1975), and it has also been shown that Delphi participants who change their opinion are more likely to shift towards the majority group opinion (Bolger et al., 2011; Makkonen et al., 2016; Rowe et al., 2005). However, our findings indicate that the effect of controlled feedback about group agreement depends on the level of agreement that is shared with participants as feedback. Specifically, and based on the large number of category ratings analyzed, our study has been able to identify a specific threshold on either side of which the response trend differed. When the feedback given in round 3 indicated that at least 75% group agreement had been achieved in round 2, those participants who had not regarded a category as relevant in the previous round tended to shift their view towards the majority opinion. By contrast, when feedback indicated less than 75% group agreement in round 2, participants who had previously rated a
Table 3
Mean percentage of change of opinion based on the 75% threshold of group agreement.

Change of opinion | Group agreement of 75% or higher, Mean (SD) [95% CI] | Group agreement lower than 75%, Mean (SD) [95% CI]
Non-change (congruent) | 84.4 (15.0) [83.3 – 85.6] | 40.2 (22.4) [38.5 – 42.0]
Non-change (incongruent) | 6.4 (9.2) [5.7 – 7.2] | 39.9 (22.2) [38.2 – 41.7]
Change (congruent) | 5.5 (8.3) [4.8 – 6.1] | 13.7 (16.7) [12.4 – 15.0]
Change (incongruent) | 3.7 (8.5) [3.0 – 4.3] | 6.2 (9.6) [5.4 – 6.9]

Note: SD = Standard deviation. CI = Confidence interval.
category as relevant were more likely to change their response in round
3. Thus, when the percentage of group agreement is shared with experts
as controlled feedback, their views tend to shift towards the majority
opinion only when support is strong among the group as a whole.
Conversely, if the level of group agreement is not perceived as indicating
strong group support, participants are more likely to change their
opinion and stop supporting the majority view, thus hindering the
achievement of consensus. More specifically, our results suggest that
feedback indicating group agreement of at least 75% tends to favor even
greater consensus, whereas below the 75% threshold, feedback may
make consensus less likely (i.e., the percentage of group agreement
decreases). Interestingly, systematic reviews by Diamond et al. (2014)
and Foth et al. (2016) both found that 75% agreement was commonly
accepted as a consensus threshold in Delphi studies.

Table 4
Sociodemographic and professional variables and their relevance in explaining the shift of opinion based on the 75% threshold of group agreement.
Independent variables: Age; Gender; Professional experience; Profession (a): Nurses, Occupational therapists, Psychologists, Social workers; Region (b): Africa, America, Eastern Mediterranean, South-East Asia, Western Pacific.
Dependent variables (B and SE B for each, with R2 and p-value): Non-change (congruent), Non-change (incongruent), Change (congruent), and Change (incongruent), for the ≥75% threshold and for the <75% threshold.
Note: a Psychiatrists was used as the reference category. b Europe was used as the reference category.

Importantly, our data also suggest that feedback about the level of
group agreement may have a progressively greater effect the further
away agreement is from the 75% threshold. Thus, in the Delphi studies
analyzed here, the highest levels of group agreement feedback led to
even greater consensus among the panel of experts. By contrast, as the
level of group agreement decreased further below the 75% threshold,
experts were increasingly likely to change their mind regarding the
relevance of a category. A possible explanation for this finding is that the
experts were specifically told that the final list of categories should be as
short as possible but comprehensive enough to capture the needs of
individuals with schizophrenia. This instruction may have encouraged
some experts to change their mind when the level of group agreement
appeared to fall short of a consensus.
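
The threshold pattern described above can be illustrated with a minimal sketch. It assumes dichotomous ratings stored as booleans (True = category rated as relevant) with panelists in the same order in both rounds; the function names, the data layout, and the toy data are hypothetical and are not taken from the study. It computes the percentage of group agreement fed back from round 2, checks it against the 75% threshold, and reports the round 3 minus round 2 shift in agreement.

```python
from statistics import mean

CONSENSUS_THRESHOLD = 75.0  # percent; the threshold discussed in the text


def agreement_pct(responses):
    """Percentage of panelists who rated a category as relevant (True)."""
    return 100.0 * sum(responses) / len(responses)


def summarize(round2, round3, threshold=CONSENSUS_THRESHOLD):
    """Compare round-2 feedback with round-3 agreement for each category.

    round2 and round3 map a category name to a list of booleans, one per
    panelist, in the same panelist order in both rounds (an assumption).
    """
    rows = []
    for category, r2 in round2.items():
        r3 = round3[category]
        fed_back = agreement_pct(r2)  # value shown as feedback in round 3
        rows.append({
            "category": category,
            "feedback_pct": fed_back,
            "at_or_above_threshold": fed_back >= threshold,
            "shift": agreement_pct(r3) - fed_back,  # round 3 minus round 2
            "changed_pct": 100.0 * sum(a != b for a, b in zip(r2, r3)) / len(r2),
        })
    return rows


def mean_shift_by_side(rows):
    """Mean shift in agreement on each side of the threshold."""
    sides = {True: [], False: []}
    for row in rows:
        sides[row["at_or_above_threshold"]].append(row["shift"])
    return {side: mean(vals) for side, vals in sides.items() if vals}


# Hypothetical toy data: 8 panelists, 2 categories (not from the study).
round2 = {
    "cat_A": [True] * 7 + [False],      # 87.5% agreement
    "cat_B": [True] * 4 + [False] * 4,  # 50.0% agreement
}
round3 = {
    "cat_A": [True] * 8,                # 100.0% agreement
    "cat_B": [True] * 3 + [False] * 5,  # 37.5% agreement
}
rows = summarize(round2, round3)
print(mean_shift_by_side(rows))
```

With this toy data, agreement rises by 12.5 percentage points for the category above the threshold and falls by 12.5 points for the one below it, which mirrors the direction of the pattern reported here without reproducing any of the study's values.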
Our panel of participants showed wide variability with regard to age,
gender, professional experience, region of origin, and health professional
profile. However, in line with some previous studies (Bolger et al.,
2011; Kauko and Palmroos, 2014; Makkonen et al., 2016), we found no
evidence of a relationship between opinion change and sociodemographic
and professional characteristics of participants, including for
variables related to professional experience that have been regarded,
theoretically, as being associated with shifts of opinion among Delphi
participants. This finding suggests that sociodemographic and professional
characteristics of panelists are of no relevance in this kind of
consensus-building process. However, given that our results are contrary
to those of other authors (Mulgrave and Ducanis, 1975; Rowe and
Wright, 1996, 1999; Yaniv and Milyavsky, 2007) who reported a relationship
between high relative expertise and low propensity to change
opinion over rounds, further experimental studies are needed to shed
light on this issue. An important point to note in this context is that even
if the characteristics of the panel are not important in terms of reaching a
consensus, the quality of the Delphi process and its results do depend on
the appropriate recruitment and selection of qualified experts who are
able to provide a truly representative view of the issue under investigation
(Donohoe and Needham, 2009; Keeney et al., 2006; Sobaih et al.,
2012).
Regarding opinion change, it is worth noting that the percentage of
opinion shift between rounds in the Delphi studies analyzed here is low
compared with the number of responses that did not change between the
second and third rounds. It should be noted that across the five Delphi
studies a large number of categories reached the agreed consensus
threshold (i.e., ≥ 75% group agreement) in the second round, and hence
participants did not need subsequently to change their opinion insofar as
their initial opinion was already congruent with the majority view.
Moreover, these were real-world Delphi studies involving participants
with considerable expertise in relation to the issue they were asked to
rate, and this, as Hussler et al. (2011) point out, makes it less likely that
experts will change their original opinions. A further point to consider is
that each of the five Delphi studies followed the recommendation to
provide experts, in the third round, with information about their own
ratings in the previous round (Boulkedid et al., 2011; Keeney et al.,
2006; Murphy et al., 1998), an approach which, according to Meijering
and Tobi (2018), is associated with less opinion change.
A key strength of the present analysis is that it is based on five
real-world Delphi studies involving a large sample of experts from around the
world, each with extensive experience concerning the issue they were
asked to appraise. This ecological validity compensates to some extent
for the lack of experimental control, since as several authors have
pointed out (Rowe and Wright, 1999; Rowe et al., 1991; Meijering and
Tobi, 2018) some experimental Delphi studies derive their findings from
samples of students who are asked to make judgments about topics on
which they cannot be considered experts. A further strength is that the
large sample size and number of category ratings enabled us to identify
response patterns in participants’ item ratings after receiving feedback.
However, there are also several potential weaknesses of our study
that should be mentioned. First, participants’ responses are based on a
dichotomous variable (i.e., they were simply asked to rate whether each
category was relevant or not), whereas several Delphi studies have used
Likert-type scales (Lin et al., 2015). It should be noted, however, that
despite the use of Likert-type scales (ranging from 3 to 10 points) in
these studies some authors subsequently dichotomized the scale since
they noticed that the distribution of responses was bimodal (Hussler
et al., 2011), while others based their definition of agreement on ratings
at the upper end of the scale used (e.g., items scored as 4 and 5 on a
5-point Likert scale) (Foth et al., 2016; Lynch et al., 2020). Ultimately,
and as indicated in a recent study by Lange et al. (2020), a
consensus-oriented Delphi process always ends with a dichotomous
result (i.e., a selected item, a recommendation). Nevertheless, although
we were interested here in examining the effect of feedback on
participants’ responses, it could be that the type of rating scale used may also
influence opinion change and how consensus is achieved. In this respect,
it should be noted that the percentage of opinion change found in other
studies that have used a Likert scale was higher than in the present
study, even though it was still low (Ecken et al., 2011; Makkonen et al.,
2016; Meijering et al., 2018). Moreover, Lange et al. recently found that
different rating scales (i.e., three-point, five-point, and nine-point rating
scales) lead to different percentages of agreement, and consequently,
different numbers of items surpass the consensus threshold (Lange et al.,
2020). Future studies should therefore aim to examine in greater detail
the relationship between the type of rating scale used and opinion
change. A second limitation of our study is that the analysis is based on
the shift of opinion between two rounds, and further studies involving
more rounds are needed to confirm the feedback effect we observed
here. Third, although controlled feedback in the form of information
about the percentage of group agreement is frequently used in Delphi
studies (Diamond et al., 2014), this measure is not exempt from criticism
(von der Gracht, 2012). Our findings therefore need to be corroborated
by studies that examine the effect of other kinds of feedback, for
example, measures of central tendency and dispersion (e.g., mean,
median, range, interquartile range), argumentative feedback (e.g.,
justifications, reasons), or even a mix of both statistical and argumentative
information (e.g., providing a justification in support of the personal
rating of participants). Finally, the present study provides no insight into
why experts changed their opinion, although our findings may be useful
for predicting, based on the feedback given to the participants, whether
a high level of group agreement might be achieved in the next round or,
on the contrary, whether consensus is unlikely.
The results of this study have a number of implications, although
caution should be exercised when seeking to extrapolate them to Delphi
studies with a different design (i.e., in terms of the rating scale used, type
of feedback provided, or lack of stability in panel responses). First,
researchers in any Delphi study need to be aware that controlled feedback
in a given round can, depending on the information provided, have
different effects on participants’ subsequent responses. Thus, although
the feedback effects observed in this study strongly support the use of
the Delphi technique for consensus building, since high group agreement
favors even greater consensus, decision makers who use the
technique for the purposes of forecasting should also consider, when
drawing conclusions and making recommendations based on their results,
that giving controlled feedback has the potential to introduce
desirability bias (Ecken et al., 2011). Consequently, using the Delphi
method in this context does not necessarily mean that forecasting accuracy
will be improved, even if greater consensus is achieved. Further
research is required to assess the effect of controlled feedback when the
Delphi technique is used as a forecasting method.
Another implication of our analysis is that by studying opinion
change across rounds it is possible to determine whether consensus was
present from the outset or was only achieved as a result of feedback
being given. Consideration of these issues is important with regard to the
validity and reliability of the panel’s final decision, and it brings greater
transparency to the decision-making process. This is why it is highly
advisable, when conducting a Delphi study, to establish a priori a
threshold above which consensus is considered to have been reached.
The recommended threshold based on our results would be 75%
agreement, since the pattern of responses differs on either side of this
level of consensus. When feedback indicates group agreement of at least
75% in the previous round, it is likely that even greater consensus (i.e., a
higher percentage of group agreement) will be achieved in the next
round, whereas consensus over a given item will weaken if participants
are told that agreement is below 75%. At all events, further studies using
a Likert-type rating scale are needed to confirm this pattern.

5. Conclusions

Based on our findings in this study we conclude that the likelihood of
opinion change among participants in a Delphi study is influenced by the
controlled feedback they receive. By contrast, the sociodemographic and
professional characteristics of the panel of experts appear to be of no
relevance in this respect. Importantly, the effect of controlled feedback
depends on the level of agreement that is shared as feedback, and thus it
may facilitate or hinder the consensus-building process. When the
feedback given indicates strong agreement among the group as a whole,
participants tend to shift towards the majority opinion, whereas when
the feedback is not perceived as indicating strong group support, opinions
are more likely to change in a way that hinders consensus. Our data
indicate that group agreement of 75% acts as a threshold, since the
pattern of responses observed differs on either side of this level of
consensus. More specifically, consensus among participants increases
when feedback indicates group agreement of at least 75% and decreases
when it is less than 75%. This finding highlights the importance of
looking at the consensus-building process across Delphi rounds in order
to ensure that the decisions made are valid and reliable.

Funding

This work was supported by Spain’s Ministry of Economy and
Competitiveness [grant PSI2015–67984-R; PID2019–109887GB-I00],
and by the Agency for the Management of University and Research
Grants of the Government of Catalonia [grant 2017SGR1681].

References

Adler, M., Ziglio, E., 1996. Gazing Into the Oracle: The Delphi Method and Its Application to Social Policy and Public Health. Jessica Kingsley Publishers, London.
Birko, S., Dove, E.S., Özdemir, V., Dalal, K., 2015. Evaluation of nine consensus indices in Delphi foresight research and their dependency on Delphi survey characteristics: a simulation study and debate on Delphi design and interpretation. PLoS ONE 10 (8), 1–14. https://doi.org/10.1371/journal.pone.0135162.
Bolger, F., Stranieri, A., Wright, G., Yearwood, J., 2011. Does the Delphi process lead to increased accuracy in group-based judgmental forecasts or does it simply induce consensus amongst judgmental forecasters? Technol Forecast Soc Change 78 (9), 1671–1680. https://doi.org/10.1016/j.techfore.2011.06.002.
Boulkedid, R., Abdoul, H., Loustau, M., Sibony, O., Alberti, C., 2011. Using and reporting the Delphi method for selecting healthcare quality indicators: a systematic review. PLoS ONE 6 (6), e20476. https://doi.org/10.1371/journal.pone.0020476.
Brookes, S.T., Macefield, R.C., Williamson, P.R., McNair, A.G., Potter, S., Blencowe, N.S., Blazeby, J.M., 2016. Three nested randomized controlled trials of peer-only or multiple stakeholder group feedback within Delphi surveys during core outcome and information set development. Trials 17 (1), 1–14. https://doi.org/10.1186/s13063-016-1479-x.
Dajani, J.S., Sincoff, M.Z., Talley, W.K., 1979. Stability and agreement criteria for the termination of Delphi studies. Technol Forecast Soc Change 13 (1), 83–90. https://doi.org/10.1016/0040-1625(79)90007-6.
Dalkey, N.C., 1975. Toward a theory of group estimation. In: Linstone, H.A., Turoff, M. (Eds.), The Delphi Method: Techniques and Applications. Addison-Wesley, London.
Diamond, I.R., Grant, R.C., Feldman, B.M., Pencharz, P.B., Ling, S.C., Moore, A.M., Wales, P.W., 2014. Defining consensus: a systematic review recommends methodologic criteria for reporting of Delphi studies. J Clin Epidemiol 67 (4), 401–409. https://doi.org/10.1016/j.jclinepi.2013.12.002.
Donohoe, H., Needham, R.D., 2009. Moving best practice forward: Delphi characteristics, advantages, potential problems, and solutions. International Journal of Tourism Research 11 (5), 415–437.
Ecken, P., Gnatzy, T., Von der Gracht, H.A., 2011. Desirability bias in foresight: consequences for decision quality based on Delphi results. Technol Forecast Soc Change 78 (9), 1654–1670. https://doi.org/10.1016/j.techfore.2011.05.006.
El-Gazzar, R., Hustad, E., Olsen, D.H., 2016. Understanding cloud computing adoption issues: a Delphi study approach. J Systems and Software 118, 64–84. https://doi.org/10.1016/J.JSS.2016.04.061.
Faulks, D., Molina, G., Eschevins, C., Dougall, A., 2016. Child oral health from the professional perspective – a global ICF-CY survey. Int J Paediatric Dentistry 26 (4), 266–280. https://doi.org/10.1111/ipd.12195.
Foth, T., Efstathiou, N., Vanderspank-Wright, B., Ufholz, L.-A., Dütthorn, N., Zimansky, M., Humphrey-Murto, S., 2016. The use of Delphi and Nominal Group Technique in nursing education: a review. Int J Nurs Stud 60, 112–120. https://doi.org/10.1016/j.ijnurstu.2016.04.015.
Förster, B., von der Gracht, H., 2014. Assessing Delphi panel composition for strategic foresight - A comparison of panels based on company-internal and external participants. Technol Forecast Soc Change 84, 215–229. https://doi.org/10.1016/j.techfore.2013.07.012.
Harman, N.L., Bruce, I.A., Kirkham, J.J., Tierney, S., Callery, P., O’Brien, K., Williamson, P.R., 2015. The importance of integration of stakeholder views in core outcome set development: Otitis media with effusion in children with cleft palate. PLoS ONE 10 (6), 1–22. https://doi.org/10.1371/journal.pone.0129514.
Humphrey-Murto, S., Varpio, L., Wood, T.J., Gonsalves, C., Ufholz, L.-A., Mascioli, K., Foth, T., 2017. The use of the Delphi and other consensus group methods in medical education research. Academic Medicine 92 (10), 1491–1498. https://doi.org/10.1097/ACM.0000000000001812.
Hussler, C., Muller, P., Rond, P., 2011. Is diversity in Delphi panelist groups useful? Evidence from a French forecasting exercise on the future of nuclear energy. Technol Forecast Soc Change 78 (9), 1642–1653. https://doi.org/10.1016/j.techfore.2011.07.008.
Jiang, R., Kleer, R., Piller, F.T., 2017. Predicting the future of additive manufacturing: a Delphi study on economic and societal implications of 3D printing for 2030. Technol Forecast Soc Change 117, 84–97. https://doi.org/10.1016/J.TECHFORE.2017.01.006.
Kauko, K., Palmroos, P., 2014. The Delphi method in forecasting financial markets: an experimental study. Int J Forecast 30 (2), 313–327. https://doi.org/10.1016/j.ijforecast.2013.09.007.
Keeney, S., Hasson, F., McKenna, H., 2006. Consulting the oracle: ten lessons from using the Delphi technique in nursing research. J Adv Nurs 53 (2), 205–212. https://doi.org/10.1111/j.1365-2648.2006.03716.x.
Landeta, J., Barrutia, J., 2011. People consultation to construct the future: a Delphi application. Int J Forecast 27 (1), 134–151. https://doi.org/10.1016/j.ijforecast.2010.04.001.
Landis, J.R., Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics 33, 159–174. https://doi.org/10.2307/2529310.
Lange, T., Kopkow, C., Lützner, J., Günther, K.P., Gravius, S., Scharf, H.P., Schmitt, J., 2020. Comparison of different rating scales for the use in Delphi studies: Different scales lead to different consensus and show different test-retest reliability. BMC Medical Research Methodology 20 (1), 1–11. https://doi.org/10.1186/s12874-020-0912-8.
Lin, V.S., Song, H., 2015. A review of Delphi forecasting research in tourism. Current Issues in Tourism 18 (12), 1099–1131. https://doi.org/10.1080/13683500.2014.967187.
Linstone, H.A., Turoff, M., 1975. The Delphi Method: Techniques and Applications. Addison-Wesley, London.
Lynch, T.S., Minkara, A., Aoki, S., Bedi, A., Bharam, S., Clohisy, J., Harris, J., Larson, C., Nepple, J., Nho, S., Philippon, M., Rosneck, J., Safran, M., Stubbs, A.J., Westermann, R., Byrd, J.W.T., 2020. Best practice guidelines for hip arthroscopy in femoroacetabular impingement: results of a Delphi process. The J American Academy of Orthopedic Surgeons 28 (2), 81–89. https://doi.org/10.5435/JAAOS-d-18-00041.
Makkonen, M., Hujala, T., Uusivuori, J., 2016. Policy experts’ propensity to change their opinion along Delphi rounds. Technol Forecast Soc Change 109, 61–68. https://doi.org/10.1016/j.techfore.2016.05.020.
Meijering, J.V., Kampen, J.K., Tobi, H., 2013. Quantifying the development of agreement among experts in Delphi studies. Technol Forecast Soc Change 80 (8), 1607–1614. https://doi.org/10.1016/j.techfore.2013.01.003.
Meijering, J.V., Tobi, H., 2018. The effects of feeding back experts’ own initial ratings in Delphi studies: a randomized trial. Int J Forecast 34 (2), 216–224. https://doi.org/10.1016/j.ijforecast.2017.11.010.
Meijering, J.V., Tobi, H., 2016. The effect of controlled opinion feedback on Delphi features: mixed messages from a real-world Delphi experiment. Technol Forecast Soc Change 103, 166–173. https://doi.org/10.1016/j.techfore.2015.11.008.
Mulgrave, N.W., Ducanis, A.J., 1975. Propensity to change responses in a Delphi round as a function of dogmatism. In: Linstone, H.A., Turoff, M. (Eds.), The Delphi Method: Techniques and Applications. Addison-Wesley, London, pp. 288–290.
Murphy, M.K., Black, N.A., Lamping, D.L., McKee, C.M., Sanderson, C., Askham, J., Marteau, T., 1998. Consensus development methods and their use in creating clinical guidelines: a review. Health Technol Assess (Rockv) 2 (3), 1–88.
Nowack, M., Endrikat, J., Guenther, E., 2011. Review of Delphi-based scenario studies: quality and design considerations. Technol Forecast Soc Change 78 (9), 1603–1615. https://doi.org/10.1016/j.techfore.2011.03.006.
Nuño, L., Barrios, M., Moller, M.D., Calderón, C., Rojo, E., Gómez-Benito, J., Guilera, G., 2019. An international survey of Psychiatric-Mental-Health Nurses on the content validity of the international classification of functioning, disability and health core sets for schizophrenia. Int J Ment Health Nurs. https://doi.org/10.1111/inm.12586.
Nuño, L., Barrios, M., Rojo, E., Gómez-Benito, J., Guilera, G., 2018. Validation of the ICF Core Sets for schizophrenia from the perspective of psychiatrists: an international Delphi study. J Psychiatr Res 103, 134–141. https://doi.org/10.1016/j.jpsychires.2018.05.012.
Nuño, L., Guilera, G., Bell, M., Rojo, E., Gómez-Benito, J., Calderon, C., Barrios, M., 2021. An occupational therapist perspective on the ICF Core Sets for schizophrenia. American J Occupational Therapy. https://doi.org/10.5014/ajot.2021.041509.
Powell, C., 2003. The Delphi technique: myths and realities. J Adv Nurs 41 (4), 376–382. https://doi.org/10.1046/j.1365-2648.2003.02537.x.
Prinsen, C.A.C., Vohra, S., Rose, M.R., King-Jones, S., Ishaque, S., Bhaloo, Z., … Terwee, C.B., 2014. Core Outcome Measures in Effectiveness Trials (COMET) initiative: protocol for an international Delphi study to achieve consensus on how to select outcome measurement instruments for outcomes included in a ‘core outcome set’. Trials 15 (1), 247. https://doi.org/10.1186/1745-6215-15-247.
Rowe, G., Wright, G., 1996. The impact of task characteristics on the performance of structured group forecasting techniques. Int J Forecast 12, 73–89.
Rowe, G., Wright, G., 1999. The Delphi technique as a forecasting tool: issues and analysis. Int J Forecast 15, 353–375.
Rowe, G., Wright, G., McColl, A., 2005. Judgment change during Delphi-like procedures: the role of majority influence, expertise, and confidence. Technol Forecast Soc
Scheibe, M., Skutsch, M., Schofer, J., 1975. Experiments in Delphi methodology. In: Linstone, H.A., Turoff, M. (Eds.), The Delphi Method: Techniques and Applications. Addison-Wesley, Reading, MA, pp. 262–287.
Schmidt, R.C., 1997. Managing Delphi surveys using nonparametric statistical techniques. Decision Sciences 28 (3), 763–774. https://doi.org/10.1111/j.1540-5915.1997.tb01330.x.
Selb, M., Escorpizo, R., Kostanjsek, N., Stucki, G., Üstün, B., Cieza, A., 2015. A guide on how to develop an International Classification of Functioning, Disability and Health Core Set. Eur J Phys Rehabil Med 51 (1), 105–117.
Sinha, I.P., Smyth, R.L., Williamson, P.R., 2011. Using the Delphi technique to determine which outcomes to measure in clinical trials: recommendations for the future based on a systematic review of existing studies. PLoS Med. 8 (1), e1000393. https://doi.org/10.1371/journal.pmed.1000393.
Sobaih, A.E., Ritchie, C., Jones, E., 2012. Consulting the oracle? Applications of modified Delphi technique to qualitative research in the hospitality industry. Int J Contemporary Hospitality Management 24 (6), 886–906.
Trevelyan, E.G., Robinson, N., 2015. Delphi methodology in health research: how to do it? Eur J Integr Med 7 (4), 423–428. https://doi.org/10.1016/j.eujim.2015.07.002.
Turnbull, A.E., Dinglas, V.D., Friedman, L.A., Chessare, C.M., Sepúlveda, K.A., Bingham, C.O., Needham, D.M., 2018. A survey of Delphi panelists after core outcome set development revealed positive feedback and methods to facilitate panel member participation. J Clin Epidemiol 102 (410), 99–106. https://doi.org/10.1016/j.jclinepi.2018.06.007.
Van der Vaart, R., Witting, M., Riper, H., Kooistra, L., Bohlmeijer, E.T., Van Gemert-Pijnen, L.J., 2014. Blending online therapy into regular face-to-face therapy for depression: content, ratio and preconditions according to patients and therapists using a Delphi study. BMC Psychiatry 14 (1), 355. https://doi.org/10.1186/s12888-014-0355-z.
Von der Gracht, H.A., 2012. Consensus measurement in Delphi studies. Review and implications for future quality assurance. Technological Forecasting and Social
Nuño, L., Guilera, G., Coenen, M., Rojo, E., Gómez-Benito, J., Barrios, M., 2019. Functioning in schizophrenia from the perspective of psychologists: a worldwide study. PLoS ONE 14 (6), e0217936. https://doi.org/10.1371/journal.pone.0217936.
Nuño, L., Guilera, G., Solomon, P., Rojo, E., Gómez-Benito, J., Barrios, M., 2021. The perspective of social workers on functioning for individuals with schizophrenia: a Delphi study. J Soc Social Work Res 12 (2).
World Health Organization, 2001. International Classification of Functioning, Disability and Health (ICF). World Health Organization, Geneva.
Yaniv, I., Milyavsky, M., 2007. Using advice from multiple sources to revise and improve judgments. Organ. Behav. Hum. Decis. Process. 103 (1), 104–120.
Zawacki-Richter, O., 2009. Research areas in distance education: a Delphi study. The International Review of Research in Open and Distributed Learning 10 (3). https://doi.org/10.19173/irrodl.v10i3.674.
Zimmermann, M., Darkow, I.L., von der Gracht, H.A., 2012. Integrating Delphi and participatory backcasting in pursuit of trustworthiness - The case of electric mobility in Germany. Technol Forecast Soc Change 79 (9), 1605–1621. https://doi.org/10.1016/j.techfore.2012.05.016.

Maite Barrios, Ph.D., is an associate professor at the Faculty of Psychology of the University of Barcelona, Spain. She is a member of the Research Group on Measurement Invariance and Analysis of Change (GEIMAC). Her current research interests focus on psychometrics, qualitative and quantitative methodologies, systematic reviews, survey studies, and bibliometrics, with a particular focus on their application to different scientific fields. She has participated in numerous competitive research projects that have led to publishing several scientific articles in refereed journals. She is also co-author of several books and texts on statistics, psychometrics and research techniques.

Georgina Guilera, Ph.D., is a professor at the University of Barcelona, Spain. She is a member of the Research Group on Child and Adolescent Victimization (GReVIA) and the Board of Directors of the Institute of Neurosciences (University of Barcelona). Her research interests include the development and adaptation of psychological measurement instruments, bibliometric and meta-analytic techniques and their application in the fields of mental health and developmental victimology. In these areas, she has contributed to several competitive research projects, publications, and conference presentations.

Laura Nuño is a clinical psychologist at the Addictions Unit at Hospital Clinic of Barcelona. She previously held a scholarship in the Department of Personality, Assessment and Psychological Treatment of the University of Barcelona. She has also undertaken a research placement in the Department of Allied Health Sciences of the UCONN (Connecticut, USA), focusing specifically on meta-analytic studies. In the healthcare setting, she focuses on patients with a range of psychiatric disorders, especially substance use disorders. Her main lines of research concern the structure of personality, trauma, test adaptation, meta-analytic studies and cognitive impairment in psychiatric disorders.

Juana Gómez-Benito is a professor of psychometrics at the University of Barcelona, Spain. Her research interests focus on systematic reviews, test development, structural equation models, and cross-cultural research. She has published over 200 scientific articles in refereed journals, focusing on methodological and applied issues in psychological measurement and lack of bias. In the last decade, her research interests have focused on developing strategies to optimize the validity of measurements and their application to health fields.
