Papers by Barbara Mellers
The Amsterdam Symposium
When making most choices, people imagine how they will feel about the consequences. This chapter provides an account of the anticipation process and uses it to predict choice. Decisions from several experiments are consistent with a theory in which people are assumed to evaluate alternatives by making trade-offs between predicted pleasure and pain. Then they choose the alternative with greater expected pleasure. The field of decision making has long benefited from the interdisciplinary contributions of philosophers, economists, and statisticians, among others. These interdisciplinary contributions can be categorized into two camps. One camp specifies how people should make choices if they wish to obey fundamental rules of logic and probability. The other camp focuses on what people actually do when making choices. While rational theories rely on beliefs and utilities, descriptive theories look to psychological processes, including cognitive limitations, social norms, and cultural constraints, to explain actual choices and the reasons behind alleged deviations from rationality. Both camps are well aware that emotions influence choice. Rational theorists have addressed the question of whether emotions should influence choice, and descriptive theorists have explored how emotions influence choice. This chapter presents a descriptive account of decision making that focuses on anticipated pleasure. We propose that, when making a choice, people imagine how they will feel about future consequences. Comparisons of qualitatively different feelings are made in terms of pleasure and pain. That is, people evaluate each alternative by balancing imagined pleasure against imagined pain and select the alternative with greater average pleasure.
International Studies Quarterly, Mar 19, 2018
Scholars, practitioners, and pundits often leave their assessments of uncertainty vague when debating foreign policy, arguing that clearer probability estimates would provide arbitrary detail instead of useful insight. We provide the first systematic test of this claim using a data set containing 888,328 geopolitical forecasts. We find that coarsening numeric probability assessments in a manner consistent with common qualitative expressions, including expressions currently recommended for use by intelligence analysts, consistently sacrifices predictive accuracy. This finding does not depend on extreme probability estimates, short time horizons, particular scoring rules, or individual attributes that are difficult to cultivate. At a practical level, our analysis indicates that it would be possible to make foreign policy discourse more informative by supplementing natural-language descriptions of uncertainty with quantitative probability estimates. More broadly, our findings advance longstanding debates over the nature and limits of subjective judgment when assessing social phenomena, showing how explicit probability assessments are empirically justifiable even in domains as complex as world politics. Before President John F. Kennedy authorized the Bay of Pigs invasion in 1961, he asked the Joint Chiefs of Staff to evaluate the plan. The Joint Chiefs found it unlikely that a group of Cuban exiles could topple Fidel Castro's government. Internally, they agreed that this probability was about 30 percent. But when the Joint Chiefs conveyed this view to the president in writing, they stated only that "[t]his plan has a fair chance of success." The report's author, Brigadier General David Gray, claimed that "[w]e thought other people would think that 'a fair chance' would mean 'not too good.'" Kennedy, by contrast, interpreted the phrase as indicating favorable odds. Gray later concluded that his vague
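The accuracy comparison at the heart of this paper can be made concrete with a small simulation. The sketch below is not the authors' analysis: the bin midpoints, the synthetic forecasts, and the noise level are all invented for illustration. It simply shows how rounding numeric forecasts onto a coarse, qualitative-style scale can be scored against the original numeric values with a Brier-type rule.

```python
# Minimal sketch (not the paper's code): scoring numeric probability forecasts
# against coarsened, qualitative-style versions of the same forecasts.
import random

def brier(p, outcome):
    """Squared-error (Brier) score for a binary event; lower is better."""
    return (p - outcome) ** 2

# Hypothetical bin midpoints loosely inspired by seven-step "words of estimative
# probability" scales; the paper's exact mapping may differ.
BIN_MIDPOINTS = [0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]

def coarsen(p):
    """Map a numeric probability to the nearest bin midpoint."""
    return min(BIN_MIDPOINTS, key=lambda m: abs(m - p))

random.seed(0)
# Synthetic data: noisy forecasts of events with known underlying probabilities.
records = []
for _ in range(10_000):
    truth = random.random()
    forecast = min(1.0, max(0.0, truth + random.gauss(0, 0.1)))
    outcome = 1 if random.random() < truth else 0
    records.append((forecast, outcome))

numeric = sum(brier(f, o) for f, o in records) / len(records)
coarse = sum(brier(coarsen(f), o) for f, o in records) / len(records)
print(f"mean Brier, numeric forecasts:   {numeric:.4f}")
print(f"mean Brier, coarsened forecasts: {coarse:.4f}")  # typically slightly worse
```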
Judgment and Decision Making, 2023
The original version of Figure 2 showed forecasts for both answer options on binary questions, producing a pattern of forced symmetry around the horizontal-axis midpoint (0.50); i.e., data points above 0.50 were mirror images of those below 0.50. The manuscript (Dana et al., 2019) has been updated with a figure based on data that include only forecasts on the first answer option for each question. We believe this version is more informative because it avoids duplicating data points. In the updated figure, most points lie above the diagonal line, denoting a pattern of higher prices relative to the corresponding aggregated beliefs.
Journal of Experimental Social Psychology, Sep 1, 2021
'Crowd wisdom' refers to the surprising accuracy that can be attained by averaging judgments from independent individuals. However, independence is unusual; people often discuss and collaborate in groups. When does group interaction improve vs. degrade judgment accuracy relative to averaging the group's initial, independent answers? Two large laboratory studies explored the effects of 969 face-to-face discussions on the judgment accuracy of 211 teams facing a range of numeric estimation problems, from geographic distances to historical dates to stock prices. Although participants nearly always expected discussions to make their answers more accurate, the actual effects of group interaction on judgment accuracy were decidedly mixed. Importantly, a novel, group-level measure of collective confidence calibration robustly predicted when discussion helped or hurt accuracy relative to the group's initial independent estimates. When groups were collectively calibrated prior to discussion, with more accurate members being more confident in their own judgment and less accurate members less confident, subsequent group interactions were likelier to yield increased accuracy. We argue that collective calibration predicts improvement because groups typically listen to their most confident members. When confidence and knowledge are positively associated across group members, the group's most knowledgeable members are more likely to influence the group's answers.
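One way to make the idea of collective confidence calibration concrete is a simple correlation between each member's self-rated confidence and the accuracy of their initial estimate. The sketch below is an assumed formulation for illustration only: the function names, the negative-absolute-error accuracy measure, and the toy numbers are not from the study.

```python
# Minimal sketch (assumed formulation, not the authors' exact measure):
# quantify a group's pre-discussion calibration as the correlation between
# each member's confidence and the accuracy of their initial estimate.
from statistics import mean

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def collective_calibration(estimates, confidences, truth):
    """Higher values mean more-accurate members were also more confident."""
    accuracies = [-abs(e - truth) for e in estimates]  # negative absolute error
    return pearson(confidences, accuracies)

# Example: a four-person team estimating a distance whose true value is 870 km.
estimates   = [900, 1200, 860, 500]
confidences = [0.8, 0.4, 0.9, 0.3]   # self-rated, 0-1
print(round(collective_calibration(estimates, confidences, truth=870), 2))
```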
International Journal of Forecasting, Oct 1, 2017
Forecasters are typically evaluated via proper scoring rules such as the Brier score. These scoring rules use only the reported forecasts for assessment, neglecting related variables such as the specific questions that a person chose to forecast. In this paper, we study whether information related to question selection influences our estimates of forecaster ability. In other words, do good forecasters tend to select questions in a different way from bad forecasters? If so, can we capitalize on these selections in estimating forecaster ability? To address these questions, we extend a recently developed psychometric model of forecasts to include question selection data. We compare the extended psychometric model to a simpler model, studying its unidimensionality assumption and highlighting the unique information it can provide. We find that the model can make use of the fact that good forecasters select more questions than bad forecasters, and we conclude that question selection data can be beneficial above and beyond reported forecasts. As a side benefit, the resulting model can potentially provide unique incentives for forecaster participation. In many areas of forecasting, question selection is an issue of considerable importance. Does a forecaster look good because he or she chose to forecast only easy questions? Should
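For readers unfamiliar with the scoring rule mentioned above, here is a minimal illustration of a multi-category Brier score. It does not reproduce the paper's psychometric model or its question-selection extension.

```python
# Minimal illustration of the Brier score used to evaluate probabilistic forecasts.
def brier_score(forecast, outcome_index):
    """Multi-category Brier score: sum of squared differences between the
    forecast vector and the 0/1 outcome vector. Lower is better; 0 is perfect."""
    return sum((p - (1 if i == outcome_index else 0)) ** 2
               for i, p in enumerate(forecast))

# A three-option geopolitical question where option 1 occurred.
print(brier_score([0.2, 0.7, 0.1], outcome_index=1))  # 0.2^2 + 0.3^2 + 0.1^2 = 0.14
```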
University of Pennsylvania Press eBooks, Dec 31, 2019
Journal of Forecasting, Oct 12, 2017
The Good Judgment Team, led by psychologists P. Tetlock and B. Mellers of the University of Pennsylvania, was the most successful of five research projects sponsored through 2015 by the Intelligence Advanced Research Projects Activity to develop improved group forecast aggregation algorithms. Each team had at least 10 algorithms under continuous development and evaluation over the 4-year project. The mean Brier score was used to rank the algorithms on approximately 130 questions concerning categorical geopolitical events each year. An algorithm would return aggregate probabilities for each question based on the probabilities provided per question by thousands of individuals who had been recruited by the Good Judgment Team. This paper summarizes the theoretical basis and implementation of one of the two most accurate algorithms at the conclusion of the Good Judgment Project. The algorithm incorporated a number of pre- and post-processing steps and relied upon a minimum-distance robust regression method called L2E. The algorithm was just edged out by a variation of logistic regression, which has been described elsewhere. Work since the official conclusion of the project has led to an even smaller gap.
Judgment and Decision Making, Mar 1, 2019
Psychologists typically measure beliefs and preferences using self-reports, whereas economists are much more likely to infer them from behavior. Prediction markets appear to be a victory for the economic approach, having yielded more accurate probability estimates than opinion polls or experts for a wide variety of events, all without ever asking for self-reported beliefs. We conduct the most direct comparison to date of prediction markets to simple self-reports using a within-subject design. Our participants traded on the likelihood of geopolitical events. Each time they placed a trade, they first had to report their belief that the event would occur on a 0-100 scale. When previously validated aggregation algorithms were applied to self-reported beliefs, they were at least as accurate as prediction-market prices in predicting a wide range of geopolitical events. Furthermore, the combination of approaches was significantly more accurate than prediction-market prices alone, indicating that self-reports contained information that the market did not efficiently aggregate. Combining measurement techniques across the behavioral and social sciences may have greater benefits than previously thought.
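A toy example can make the comparison concrete. The sketch below uses invented numbers and a plain average of self-reports rather than the previously validated aggregation algorithms referenced in the abstract; it shows how a survey aggregate, a market price, and a simple blend of the two can each be scored with the Brier rule once an event resolves.

```python
# Toy illustration (invented numbers, not the study's data): comparing a
# survey-based aggregate, a market price, and a 50/50 blend on one binary event.
def brier(p, outcome):
    return (p - outcome) ** 2

self_reports = [0.55, 0.70, 0.60, 0.65]      # beliefs on a 0-100 scale, rescaled to 0-1
survey_aggregate = sum(self_reports) / len(self_reports)
market_price = 0.58                          # hypothetical market-implied probability
outcome = 1                                  # the event occurred

print(f"survey aggregate Brier: {brier(survey_aggregate, outcome):.4f}")
print(f"market price Brier:     {brier(market_price, outcome):.4f}")
blend = 0.5 * survey_aggregate + 0.5 * market_price
print(f"blended forecast Brier: {brier(blend, outcome):.4f}")
```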
Judgment and Decision Making, Nov 1, 2017
Accountability pressures are a ubiquitous feature of social systems: virtually everyone must answer to someone for something. Behavioral research has, however, warned that accountability, specifically a focus on being responsible for outcomes, tends to produce suboptimal judgments. We qualify this view by demonstrating the long-term adaptive benefits of outcome accountability in uncertain, dynamic environments. More than a thousand randomly assigned forecasters participated in a ten-month forecasting tournament under conditions of control, process, outcome, or hybrid accountability. Accountable forecasters outperformed non-accountable ones. Holding forecasters accountable to outcomes ("getting it right") boosted forecasting accuracy beyond holding them accountable for process ("thinking the right way"). The performance gap grew over time. Process accountability promoted more effective knowledge sharing, improving accuracy among observers. Hybrid (process plus outcome) accountability boosted accuracy relative to process accountability and improved knowledge sharing relative to outcome accountability. Overall, outcome and process accountability appear to make complementary contributions to performance when forecasters confront moderately noisy, dynamic environments where signal extraction requires both knowledge pooling and individual judgments.
Journal of Behavioral Decision Making, Feb 17, 2016
In dynamic task environments, decision makers are vulnerable to two types of errors: sticking too closely to the rules (excessive conformity) or straying too far from them (excessive deviation). We explore the effects of process and outcome accountability on susceptibility to these errors. Using a multiple-cue probability-learning task, we show that process accountability encourages conformity errors and outcome accountability promotes deviation errors. Two additional studies explore the moderating effects of self-focused and other-focused group norms. Self-focused norms reduce the effect of process accountability on excessive conformity. Other-focused norms reduce the effect of outcome accountability on excessive deviation. Our results qualify prevailing claims about the benefits of process over outcome accountability and show that those benefits hinge on prevailing group norms, on the effectiveness of prescribed decision rules, and on the amount of irreducible uncertainty in the prediction task.
Cognition, Jul 1, 2019
People often express political opinions in starkly dichotomous terms, such as "Trump will either trigger a ruinous trade war or save U.S. factory workers from disaster." This mode of communication promotes polarization into ideological in-groups and out-groups. We explore the power of an emerging methodology, forecasting tournaments, to encourage clashing factions to do something odd: to translate their beliefs into nuanced probability judgments and track accuracy over time and questions. In theory, tournaments advance the goals of "deliberative democracy" by incentivizing people to be flexible belief updaters whose views converge in response to facts, thus depolarizing unnecessarily polarized debates. We examine the hypothesis that, in the process of thinking critically about their beliefs, tournament participants become more moderate in their own political attitudes and in those they attribute to the other side. We view tournaments as belonging to a broader class of psychological inductions that increase epistemic humility, a class that also includes asking people to explore alternative perspectives, probing the depth of their cause-effect understanding, and holding them accountable to audiences with difficult-to-guess views.
National Conference on Artificial Intelligence, Oct 19, 2012
Many methods have been proposed for making use of multiple experts to predict uncertain events such as election outcomes, ranging from simple averaging of individual predictions to complex collaborative structures such as prediction markets or structured group decision-making processes. We used a panel of more than 2,000 forecasters to systematically compare the performance of four different collaborative processes on a battery of political prediction problems. We found that teams and prediction markets systematically outperformed averages of individual forecasters, that training forecasters helps, and that the exact form in which predictions are combined has a large effect on overall prediction accuracy.
American Psychologist, Apr 1, 2019
From 2011 to 2015, the U.S. intelligence community sponsored a series of forecasting tournaments that challenged university-based researchers to invent measurably better methods of forecasting political events. Our group, the Good Judgment Project, won these tournaments by balancing the collaboration and competition of members across disciplines. At the outset, psychologists were ahead of economists in identifying individual differences in forecasting skill and developing methods of debiasing forecasts, whereas economists were ahead of psychologists in designing simple market mechanisms that distilled predictive signals from noisy individual-level data. Working closely with statisticians, psychologists eventually beat the markets by funneling top forecasters into elite teams and aggregating their judgments with a log-odds formula tuned to the diversity of the forecasters, which produced better probability estimates. Our research group performed best when team members strove to get as much as possible from their home disciplines but acknowledged their limitations and welcomed help from outsiders.
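The log-odds aggregation mentioned above can be sketched as follows. This is an assumed, simplified form: the extremizing exponent is illustrative, and the project's actual method of tuning that parameter to forecaster diversity is not reproduced here.

```python
# Hedged sketch of a log-odds style aggregation with an extremizing exponent;
# not the Good Judgment Project's actual formula or parameter values.
import math

def log_odds_aggregate(probs, a=2.0, eps=1e-4):
    """Average forecasts in log-odds space, then raise the resulting odds to
    the power `a` to push the aggregate away from 0.5 (extremization)."""
    clipped = [min(1 - eps, max(eps, p)) for p in probs]
    mean_lo = sum(math.log(p / (1 - p)) for p in clipped) / len(clipped)
    odds = math.exp(a * mean_lo)
    return odds / (1 + odds)

team_forecasts = [0.60, 0.70, 0.65, 0.80]
print(round(sum(team_forecasts) / len(team_forecasts), 3))   # simple mean: 0.688
print(round(log_odds_aggregate(team_forecasts), 3))          # extremized, closer to 1
```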
International Journal of Forecasting, Apr 1, 2023
International Journal of Forecasting, 2023
Management Science, Dec 1, 2021
A four-year series of subjective probability forecasting tournaments sponsored by the U.S. intelligence community revealed a host of replicable drivers of predictive accuracy, including experimental interventions such as training in probabilistic reasoning, anti-groupthink teaming, and tracking of talent. Drawing on these data, we propose a Bayesian BIN model (Bias, Information, Noise) for disentangling the underlying processes that enable forecasters and forecasting methods to improve, whether by tamping down bias and noise in judgment or by ramping up the efficient extraction of valid information from the environment. The BIN model reveals that noise reduction plays a surprisingly consistent role across all three methods of enhancing performance. We see the BIN method as useful for focusing managerial interventions on what works, when, and why across a wide range of domains. An R package called BINtools implements our method and is available on the first author's personal website.
Decision Analysis, Jun 1, 2016
We use results from a multiyear geopolitical forecasting tournament to highlight the ability of the contribution-weighted model [Budescu DV, Chen E (2015) Identifying expertise to extract the wisdom of crowds. Management Sci. 61(2):267–280] to capture and exploit expertise. We show that the model performs better when judges gain expertise from manipulations such as training in probabilistic reasoning and collaborative interaction from serving on teams. We document the model's robustness using probability judgments from early, middle, and late phases of the forecasting period and by showing its strong performance in the presence of hypothetical malevolent forecasters. The model is highly cost-effective: it operates well, even with random attrition, as the number of judges shrinks and information on their past performance is reduced.
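The contribution-weighted model's core idea, weighting each judge by how much their inclusion improves the crowd's accuracy, can be sketched in a few lines. The version below is a simplified illustration (binary events, equal question coverage, invented numbers), not the published implementation.

```python
# Minimal sketch of the contribution-weighted idea (after Budescu & Chen, 2015),
# simplified to binary events where every judge answers every question.
def mean_brier(forecasts, outcomes):
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

def crowd(forecast_sets):
    """Unweighted crowd forecast: the mean across judges for each question."""
    return [sum(fs) / len(fs) for fs in zip(*forecast_sets)]

def contribution_weights(forecast_sets, outcomes):
    """A judge's contribution = drop in the crowd's Brier score from including them."""
    base = mean_brier(crowd(forecast_sets), outcomes)
    contribs = []
    for j in range(len(forecast_sets)):
        without = [fs for i, fs in enumerate(forecast_sets) if i != j]
        contribs.append(mean_brier(crowd(without), outcomes) - base)
    positive = [max(c, 0.0) for c in contribs]          # only positive contributors get weight
    total = sum(positive) or 1.0
    return [c / total for c in positive]

# Three judges, four resolved binary questions (illustrative numbers).
judges = [
    [0.8, 0.2, 0.7, 0.9],   # fairly accurate
    [0.6, 0.4, 0.6, 0.7],   # middling
    [0.3, 0.8, 0.4, 0.2],   # often wrong
]
outcomes = [1, 0, 1, 1]
weights = contribution_weights(judges, outcomes)
weighted = [sum(w * f for w, f in zip(weights, fs)) for fs in zip(*judges)]
print([round(w, 2) for w in weights])    # most weight on the accurate judge
print([round(p, 2) for p in weighted])   # contribution-weighted crowd forecasts
```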