System Dynamics at Sixty: The Path Forward
Abstract
The late Jay Forrester founded the field of system dynamics 60 years ago. On this anniversary I
ask what lessons his remarkable life of innovation and impact hold for the field. From a
Nebraska ranch to MIT, servomechanisms, digital computing and system dynamics, Jay lived
his entire life on the frontier, innovating to solve critical problems. Today, more than ever,
humanity faces grave threats arising from our mismanagement of increasingly complex systems.
Fortunately, progress in system dynamics, the natural and social sciences, computing, data
availability, and other modeling disciplines have advanced the frontier in theory and methods
to build reliable, useful knowledge of complex systems. We must therefore ask: What would Jay
do if he were young today? How would he build on the knowledge, technology, data and
methods now available to improve the science and practice of system dynamics? What must we
now do to explore the next frontier?
© 2018 System Dynamics Society
The challenge
The field of system dynamics just marked the 60th anniversary of its found-
ing by the late Jay Forrester, who passed away at the age of 98 in November
2016. The 50th anniversary provided an occasion to look back and celebrate
the many accomplishments in the field (Sterman, 2007). There have been
many important contributions since then, some presented in this special
issue marking the 60th anniversary. These include methodological advances
expanding the system dynamics toolkit to more robust methods to capture
network dynamics (Lamberson, 2018), the interaction of dynamics at
multiple timescales (Ford, 2018), experimental studies of group modeling
(McCardle-Keurentjes et al., 2018) and insights arising from integration of
system dynamics and operations research (Ghaffarzadegan and Larson,
2018), along with applications in medicine (Rogers et al., 2018), epidemiol-
ogy (Tebbens and Thompson, 2018), health care (Minyard et al., 2018),
climate change (Woodruff et al., 2018), environmental policy (Kapmeier and
Gonçalves, 2018), and organizational transformation (Rydzak and
Monus, 2018).
Despite these advances, we live in a time of growing threats to society and
human welfare. On this anniversary we must therefore also look forward.
What must be done now to improve theory and practice, education and
training, so that the next decades see even more progress and impact than
the last? Jay himself recognized the need for significant changes in system
dynamics to realize its full promise (Forrester, 2007a).
I begin with a brief review of Jay’s career and contributions in servomecha-
nisms, digital computation, the management of large-scale projects, and sys-
tem dynamics. I argue that his contributions are widely misunderstood. The
main lesson of Jay’s several careers does not lie in the particular tools or
methods he developed, but in the need for continual innovation to solve
important and difficult problems. Close examination of Jay’s life reveals a
relentless effort to make a difference on real and pressing problems. To do
so, in each of his careers, Jay studied the then-new advances in tools and
methods developed in any discipline relevant to the problem he sought to
address, mastering the state of the art—and then built on those advances.
The failure to appreciate Jay’s real contribution is a significant problem
today in the field of system dynamics. Despite many successes, too many in
the field continue to develop models the way Jay did in the 1950s, 60s and
70s, adhering to outdated methods for model development, testing and com-
munication, and closing themselves off from important developments in other
fields. The result is a gap between best practice and what is often done.
Sixty years after Jay brilliantly synthesized feedback control theory,
numerical methods, and digital computation to create the field of system
dynamics, we must close the “knowing–doing gap” (Pfeffer and Sutton,
2000) that has emerged, then innovate again. I outline developments in other
disciplines we must embrace to close the gap and improve the rigor, reliabil-
ity and impact of system dynamics research and practice.
A bold innovator
Jay’s remarkable life story has been told elsewhere, including by Jay himself
(Richardson, 1991; Forrester, 1992, 2007b; Green, 2010; Lane and Sterman,
2011, 2018). Born and raised on a ranch in western Nebraska where his par-
ents were the original homesteaders, Jay learned early on the value of educa-
tion, self-reliance and the need to solve difficult problems:
Fig. 1. Radar antenna stabilization servo designed by Jay Forrester. Photo: John Sterman
After college, Jay joined MIT and the servomechanisms lab, then newly
founded by Gordon Brown. The war was underway; Jay’s master’s thesis was
to design and test a servomechanism to stabilize the radar antennae on naval
ships (Figure 1). To do so, Jay mastered the state of the art in feedback con-
trol systems, both theory and practice. Then he innovated: the servo would
have to generate large forces to keep the heavy antenna vertical despite high
winds and the pitch, roll and yaw of the ship beneath. Doing so required
very high gain in the control loop while maintaining stability. “Departing
from my training in electrical engineering, the work focused on designing
one; we can’t wait for the production equipment” (Forrester, 1992). After
working well for 9 months:
Footnote 1: I thank Judy, Nathan and Ned Forrester for permission to use the material from Jay’s scrapbook.
room at the Moana Hotel ($3 per night) for a little post-combat rest and
relaxation. Then, on 7 December 1943, he received an urgent message from
Gordon Brown (Figure 4):
Not wanting to interrupt his well-deserved R&R and a planned stop to visit
his family in Nebraska on the way back to MIT, Jay responded (Figure 5):
Gordon responded:
Fig. 5. Forrester’s response to Gordon Brown’s request to return to MIT urgently, 11 December 1943. Source: Forrester Scrapbook
In characteristic fashion, Jay’s response did not lament the stubborn attitude
of the people using and maintaining the servos he had designed. He did not
blame the people in the system. Instead he sent pages of detailed instruc-
tions to improve the design of the servo so it would be more robust and even
easier to operate and maintain, including this telling passage:
Footnote 2: See https://youtu.be/5ZQP4G3Qwb4 for the famous segment of Edward R. Murrow’s See It Now program, in which Forrester demonstrated Whirlwind, including a simulation model, on live television, 16 December 1951. See https://youtu.be/JZLpbhsE72I?t=1s for a 1980 presentation Jay gave on the history of Whirlwind. MIT’s collection of Whirlwind material is described at https://libraries.mit.edu/archives/research/collections/collections-mc/mc665.html.
Fig. 6. “Speed-Storage Diagram of Computer Rating” by Forrester, 18 March 1949, showing the capabilities of existing and planned computers; log–log scale, with “Speed—complete arithmetic operations per second” (x-axis) and “High speed storage—equivalent binary digits” (y-axis). Dotted lines show the frontier of speed and memory required for different tasks, from “Scientific and accounting applications” (done by hand or using punch cards; lower left), through “Automatic machines with program on external tape or plugboard” as in the Mark I, Mark II and ENIAC machines, through “Automatic machines with program in high speed storage” as in the Mark III and EDVAC, to the “Control and simulation applications” Whirlwind I was designed to do. Photo: J. Sterman, from Forrester files
Fig. 7. Two of many hand-written notes Jay took summarizing the literature relevant to computing theory and technology for the Whirlwind project
Managing the program provided first-hand insight into the operation and politics
of complex organizations. When Jay joined the MIT Sloan School of Manage-
ment in 1956 he used these insights in developing what became system dynam-
ics. At MIT Sloan, Jay spent the first year or so exploring the opportunities to
make a distinctive contribution. As he had done before in servomechanisms,
Whirlwind and SAGE, he explored new developments in operations research,
economics, management, organization theory, psychology and other disciplines.
He read widely, attended workshops and seminars, met new people, and learned
the latest work in management science. In the first chapter of Industrial Dynam-
ics (ID; Forrester, 1961, p. 14) he surveyed the state of the art, then identified.
building on them to create new tools where needed to propel the field of sys-
tem dynamics to new frontiers. Fortunately, many in the field are following
Jay’s process, connecting with other disciplines, forging links to other com-
munities, using the tools others have developed and then building
upon them.
Footnote 3: This section is adapted and expanded, with permission, from Rahmandad and Sterman, https://www.systemdynamics.org/assets/docs/sdorabm.pdf.
These comments must not be construed as suggesting that the model builder
should lack interest in the microscopic separate events that occur in a continuous-
flow channel … The study of individual events is one of our richest sources of
information about the way the flow channels of the model should be constructed
… The preceding comments do not imply that discreteness is difficult to represent,
nor that it should forever be excluded from a model. (ID, pp. 65–66).
The effect of the time response [impulse response distribution] of delays may
depend on where the delay is in the system. One should not draw the general
conclusion that the time response does not matter. However, for most of the
incidental delays within a large system we can expect to find that the time
response of the delay is not a critical factor. Third-order exponential delays are
usually a good compromise. (ID, pp. 420–421).
Jay clearly counsels against assuming that the order and specification of
delays does not matter; the third-order delay he used was instead a “compro-
mise” that provided the lowest-order (and thus least computationally demand-
ing) member of the Erlang family with reasonable impulse- and frequency-
response characteristics (specifically, the impulse response of the third-order
delay, like many real-life processes, shows no immediate response to a change
in its input). This compromise was a pragmatic decision dictated by limitations
on computer memory and speed, and on the availability of the data required to
identify and estimate the impulse response distribution. Today, both these con-
straints have been significantly relaxed, and it is often possible to specify the
order, mean delay time and other characteristics of delays that best represent
the process and fit the data (see Sterman, 2000, ch. 11, for examples).
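To make the tradeoff concrete, here is a minimal sketch (not from Industrial Dynamics; the delay times and step size are illustrative assumptions) comparing the impulse responses of first- and third-order exponential (Erlang) material delays, each modeled as a cascade of first-order stocks. It shows why the third-order form, whose output does not respond immediately to a change in its input, is often a reasonable compromise.

import numpy as np

def erlang_delay_impulse(order, mean_delay, dt=0.01, horizon=10.0):
    """Impulse response of an nth-order exponential (Erlang) material delay,
    modeled as a cascade of `order` first-order stocks, each with residence
    time mean_delay / order; Euler integration with step dt."""
    stocks = np.zeros(order)
    stocks[0] = 1.0                                  # unit impulse enters stage 1
    stage_time = mean_delay / order
    response = np.zeros(int(horizon / dt))
    for t in range(len(response)):
        rates = stocks / stage_time                  # outflow from each stage
        response[t] = rates[-1]                      # exit rate of the final stage
        inflows = np.concatenate(([0.0], rates[:-1]))
        stocks += dt * (inflows - rates)
    return response

dt = 0.01
first = erlang_delay_impulse(order=1, mean_delay=2.0, dt=dt)
third = erlang_delay_impulse(order=3, mean_delay=2.0, dt=dt)
# The first-order delay responds immediately (output is highest at time zero),
# while the third-order delay starts at zero and peaks later, as many real
# processes do.
print(f"initial output: first-order {first[0]:.2f}, third-order {third[0]:.2f}")
print(f"third-order response peaks near t = {np.argmax(third) * dt:.2f}")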
System dynamics models can be implemented using a variety of different
simulation architectures. These vary in their representation of time (continuous
or discrete), state variables (continuous or discrete), and uncertainty (stochastic
or deterministic). Ordinary differential equations, stochastic differential equa-
tions, discrete event simulations, agent-based models and dynamic network
models are common computational architectures offering different choices on
these dimensions. Today, many software programs are available to implement
these architectures, and some allow hybrid models—for example, models that
have compartments (aggregated stocks) for some state variables and individual
© 2018 System Dynamics Society
DOI: 10.1002/sdr
J. D. Sterman: System Dynamics at Sixty 19
agents for others. Both compartment and individual-level models can be formu-
lated in continuous or discrete time, with continuous or discrete quantities,
and either can be deterministic or stochastic.
Any mechanism can be specified at various levels of aggregation. For
example, to model the spread of infectious diseases one could use an aggre-
gated compartment model such as the classic SIR model and its variants
(Sterman, 2000, ch. 9), in which all individuals in a community are aggre-
gated into compartments representing the susceptible, infectious and
removed (recovered or deceased) populations. If needed, the model could be
disaggregated into multiple compartments to represent any relevant dimen-
sions of heterogeneity, such as age, gender, immune system status, location
(at multiple scales, from nation or province to postal code or even finer), or
by patterns of activity that determine contacts with others. High-impact
examples include the disaggregated compartment models developed by
Thompson, Tebbens and colleagues and used by WHO and CDC to design
policy to support the Global Polio Eradication Initiative (Thompson and Teb-
bens, 2007, 2008; Thompson et al., 2015; Tebbens and Thompson, 2018, this
issue). Alternatively, one could capture aspects of the contact networks
among individuals by considering their pair-wise interactions
(as demonstrated for system dynamics applications by Lamberson, 2018, this
issue, building on Keeling et al., 1997; see also Lamberson, 2016). Or one
could move to an individual-based model in which each individual is repre-
sented separately. Where the SIR compartment model has three states, the
individual-level SIR model portrays each individual as being either suscepti-
ble, infectious, or removed, and could potentially include heterogeneous
individual-level attributes including location, travel patterns, social net-
works, and others that determine the hazard rates of state transitions (see,
e.g., Waldrop, 2018, and the work of the MIDAS Group (Models of Infectious
Disease Agent Study), https://www.epimodels.org/). Obviously, individual-
level (agent) models include stocks, flows and feedback loops, just as com-
partment models do. Where the stock of susceptible individuals in the com-
partmental SIR model is a single state variable and is reduced by the aggregate
flow of new cases (infection), the stock of susceptible individuals in the agent-
based model is the sum of all those currently susceptible, and the aggregate
flow of new cases is the sum of those becoming infected. In the agent-based
model the hazard rate that each susceptible individual becomes infected is
determined by the rate at which each comes into contact with any infectious
individuals and the probability of infection given contact, closing the reinfor-
cing feedback by which the contagion spreads. In the compartment model the
hazard rate of infection is the same for all individuals, while it can differ
across individuals in the agent-based model. The feedback structure, how-
ever, is the same (see Rahmandad and Sterman, 2008, for an explicit compari-
son of individual and compartment models in epidemiology).
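The structural equivalence can be illustrated with a minimal sketch; this is not the Rahmandad and Sterman (2008) model, and the population size, contact and recovery parameters are assumptions chosen only for illustration. Both versions share the same reinforcing contagion loop; the compartment model applies an aggregate infection flow, while the individual-level model applies the corresponding hazard rate to each susceptible person.

import numpy as np

# Illustrative parameters (assumptions, not taken from the article)
N = 1000                      # population size
I0 = 10                       # initial infectious individuals
beta = 0.5                    # infectious contacts per person per day
gamma = 0.25                  # recovery rate (mean disease duration of 4 days)
dt, days = 0.1, 120

def sir_compartment():
    """Deterministic SIR compartment model integrated by Euler's method."""
    S, I, peak = N - float(I0), float(I0), float(I0)
    for _ in range(int(days / dt)):
        infection = beta * S * I / N          # aggregate flow of new cases
        recovery = gamma * I
        S -= dt * infection
        I += dt * (infection - recovery)
        peak = max(peak, I)
    return peak

def sir_agents(seed=0):
    """Individual-level SIR with the same feedback structure: each susceptible
    person faces a hazard of infection proportional to the infectious fraction."""
    rng = np.random.default_rng(seed)
    state = np.zeros(N, dtype=int)            # 0 susceptible, 1 infectious, 2 removed
    state[:I0] = 1
    peak = I0
    for _ in range(int(days / dt)):
        I = int(np.sum(state == 1))
        p_inf = 1.0 - np.exp(-beta * I / N * dt)   # per-person infection probability
        p_rec = 1.0 - np.exp(-gamma * dt)          # per-person recovery probability
        draws = rng.random(N)
        new_inf = (state == 0) & (draws < p_inf)
        new_rec = (state == 1) & (draws < p_rec)
        state[new_inf] = 1
        state[new_rec] = 2
        peak = max(peak, int(np.sum(state == 1)))
    return peak

print(f"peak infectious, compartment model: {sir_compartment():.0f}")
print(f"peak infectious, individual model (mean of 20 runs): "
      f"{np.mean([sir_agents(s) for s in range(20)]):.0f}")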
Footnote 4: Edwards (2010) provides a compelling history of weather and climate modeling and the coevolution of these models with ever-finer and more extensive data collection.
parameters, and try any experiments they like and receive feedback immedi-
ately, thus enabling them to learn for themselves and assess new proposals
at the cadence required in negotiations and educational experiences (Rooney
Varga et al., 2018).
The capabilities and speed of computers today have greatly relaxed the
constraints on model size and level of aggregation compared to the 1950s,
when Forrester created system dynamics. But constraints still exist, and
always will. No matter how fast computers become, comprehensive models
of a system are impossible. As Mihailo Mesarovic, a developer of early global
simulations, noted: “No matter how many resources one has, one can envi-
sion a complex enough model to render resources insufficient to the task”
(Meadows et al., 1982, p. 197). Modelers should choose the simulation archi-
tecture and level of aggregation most appropriate for the purpose of the
model—that is, best suited to solve the problem. Typically, aggregate models
are easier to represent in differential equation frameworks (compartment
models, whether continuous or discrete, deterministic or stochastic), while
disaggregation to capture important degrees of heterogeneity across members
of the population may call for an individual-based architecture (again, deter-
ministic or stochastic). And just as one should carry out sensitivity analysis
to assess the impact of parameter uncertainty, structural sensitivity tests
assessing the similarities and differences across levels of aggregation and
model architectures are needed. For example, Rahmandad and Sterman
(2008) compared continuous, aggregated compartment models to individual-
based models in the context of epidemiology and the diffusion of infectious
diseases (see also Lamberson, 2018, this issue).
In sum, the goal of dynamic modeling is sometimes to build theoretical
understanding of complex dynamic systems, sometimes to implement policies
for improvement, and often both. To do so, system dynamics modelers seek to
include a broad model boundary that captures important feedbacks relevant to
the problem to be addressed; represent important structures in the system
including accumulations and state variables, delays and nonlinearities; use
behavioral decision rules for the actors and agents, grounded in first-hand
study of the relevant organizations and actors; and use the widest range of
empirical data to specify the model, estimate parameters, and build confidence
in the results. System dynamics models can be implemented using a wide
range of methods, including differential equations, difference equations or dis-
crete event simulation; aggregated or disaggregated compartment models or
individual-based models; deterministic or stochastic models; and so on.
Asking “Should I build a system dynamics or agent-based model?” is like
going to the hospital after an accident and having the doctor say “Would
you like to be treated or do you want surgery?” The choice is not between
treatment in general and any particular treatment, but among the different
treatments available. You seek the best treatment for your condition, and
you expect doctors to learn and use the best options as new ones become available.
Forrester argued early and correctly that data are not only numerical data
and that “soft” (unquantified) variables should be included in our models if
they are important to the purpose. Jay noted that the quantified data are a
tiny fraction of the relevant data needed to develop a model and stressed the
importance of written material and especially the “mental data base” con-
sisting of the mental models, beliefs, perceptions and attitudes of the actors
in the system. His emphasis on the use of unquantified data was, at the time,
unusual, if not unique, among modelers in the management sciences and eco-
nomics, some of whom attacked his position as unscientific. Jay’s response:
“To omit such variables is equivalent to saying they have zero effect—
probably the only value that is known to be wrong!” (ID, p. 57). Omitting
structures or variables known to be important because numerical data are
unavailable to specify and estimate them is less scientific and less accurate
than using expert judgment and other qualitative data elicitation techniques
to estimate their values. Omitting important processes because we lack
numerical data to quantify them leads to narrow model boundaries and biased
results—and may lead to erroneous or harmful policy recommendations.
However, acknowledging the importance of qualitative data begs the ques-
tion of how these variables and effects can be identified and tested. Jay’s
approach emphasized observation and interviews to capture the “mental
data base” of the people in the system:
One can go into a corporation that has serious and widely known difficulties.
The symptom might be a substantial fluctuation of employment with peaks sev-
eral years apart. Or the symptom might be falling market share.... In the process
of finding valuable insights in the mental data store, one talks to a variety of
people in the company, maybe for many days, possibly spread over many
months. The discussion is filtered through one’s catalog of feedback structures
into which the behavioral symptoms and the discussion of structure and policy
might fit. The process converges toward an explicit simulation model. The poli-
cies in the model are those that people assert they are following; in fact, the
emphasis is often on the policies they are following in an effort to alleviate the
great difficulty. (Forrester, 1980, p. 560; emphasis added).
Research since Jay’s pioneering studies shows that mental models are
often unreliable and biased even regarding policy, structure and actual
behavior, and that verbal accounts, even carefully elicited and cross-
checked, often do not reveal the causal structure of a system, the decision
processes of the informant or other actors, or even accurate accounts of
events and behavior. Today we know that many judgments and decisions
arise rapidly, automatically, and unconsciously from so-called “system 1”
neural structures, in contrast to the slow, “system 2” structures underlying
effortful, conscious deliberation (e.g., Sloman, 1996; Kahneman, 2011. Lakeh
and Ghaffarzadegan, 2016, explore implications for system dynamics). Per-
ceptions, judgments and decisions arising from system 1 are often systemati-
cally wrong, with examples ranging from optical illusions to violations of basic
rules of logic to stereotyping and the fundamental attribution error to the dozens
of “heuristics and biases” documented in the behavioral decision-making and
behavioral economics literature (e.g., Gilovich et al., 2002).
Furthermore, when asked to provide explanations for our beliefs, judg-
ments and decisions, we provide accounts that suffer from post hoc rationali-
zation and align with what we consider to be plausible, based on prior, often
implicit, and often erroneous mental models. These errors apply not only to
accounts of unconscious, system 1 processes, but to deliberative processes as
well: we often have “little or no direct introspective access to higher order
cognitive processes”, “telling more than we can know” (Nisbett and Wilson,
1977). Memory is fallible and eyewitness testimony is often unreliable, as
documented in a comprehensive National Academies review, which found:
Gaps in sensory input are filled by expectations that are based on prior
experiences with the world. Prior experiences are capable of biasing the visual
perceptual experience and reinforcing an individual’s conception of what was
seen.... [P]erceptual experiences are stored by a system of memory that is highly
malleable and continuously evolving.... The fidelity of our memories to actual
events may be compromised by many factors at all stages of processing, from
encoding to storage to retrieval. Unknown to the individual, memories are forgot-
ten, reconstructed, updated, and distorted. (National Research Council,
2014, pp. 1–2).
Worse, we are all subject to implicit (unconscious) racial, gender and other
biases (Greenwald and Banaji, 1995). These and other errors and biases
afflict lay people and experts alike. Indeed, experts are often more vulnera-
ble to them. Overconfidence bias provides a telling example: people tend to
be overconfident in their judgments, significantly underestimating
These comments are not to discourage the proper use of the data that are avail-
able nor the making of measurements that are shown to be justified … Lord Kel-
vin’s famed quotation, that we do not really understand until we can measure,
still stands. (ID, p. 59).
The quotation from Lord Kelvin (the physicist William Thomson) to which
Jay refers is:
a first essential step in the direction of learning about any subject is to find prin-
ciples of numerical reckoning and methods for practicably measuring some
quantity connected with it.... when you can measure what you are speaking
about, and express it in numbers, you know something about it; but when you
cannot measure it, when you cannot express it in numbers, your knowledge is
of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but
you have scarcely, in your thoughts, advanced to the stage of science, whatever
the matter may be. (Thomson, 1883, p. 73; emphasis in original).
But before we measure, we should name the quantity, select a scale of measure-
ment, and in the interests of efficiency we should have a reason for wanting to
know. (ID, p. 59).
Fig. 8. Pacemaker recipients, U.S.A., 1960–1980. Source: Homer (1987, 2012)
[Causal loop diagram fragment: Follow-Up Reports, Desired Follow-Up Reports, Adequacy of Reports]
follow-up cases required for reliable inferences about safety and efficacy.
Basic sampling theory suggests that the required sample size for follow-up
studies would scale with the square root of the pacing-eligible population.
Homer found an excellent fit to the data (Figure 11). To estimate the nature
and length of the publication delay, note that the multi-stage nature of the
follow-up supply chain (initiation, approval, funding, execution, and so on)
suggests a high-order delay, but Homer did not merely assume this to be the
case; instead, he experimented with different delay types, finding that a
third-order delay with a mean delay time of 1.25 years fit the data best
(Figure 12).
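Homer's estimation step can be illustrated with a hypothetical sketch, using synthetic data rather than his actual series: a third-order delay with a 1.25-year mean generates the "observed" output, and a simple grid search over Erlang delay orders and mean delay times is scored by sum of squared errors.

import numpy as np
from itertools import product

def erlang_delay_output(inflow, order, mean_delay, dt):
    """Output of an nth-order exponential (Erlang) delay driven by `inflow`."""
    stocks = np.zeros(order)
    stage_time = mean_delay / order
    out = np.zeros(len(inflow))
    for t, x in enumerate(inflow):
        rates = stocks / stage_time
        out[t] = rates[-1]
        stocks += dt * (np.concatenate(([x], rates[:-1])) - rates)
    return out

dt = 0.05
years = np.arange(0, 20, dt)
driver = np.exp(0.15 * years)                        # hypothetical growing input
rng = np.random.default_rng(1)
# Synthetic "observed" series: third-order delay, mean 1.25 years, plus noise
observed = erlang_delay_output(driver, 3, 1.25, dt)
observed = observed * (1 + 0.05 * rng.standard_normal(len(years)))

# Grid search over delay order and mean delay time, scored by sum of squared errors
candidates = product([1, 2, 3, 6], np.arange(0.5, 3.01, 0.25))
order, tau, sse = min(
    ((k, m, np.sum((erlang_delay_output(driver, k, m, dt) - observed) ** 2))
     for k, m in candidates),
    key=lambda r: r[2])
# With this synthetic series the search typically recovers order 3 and ~1.25 years
print(f"best fit: order {order}, mean delay {tau:.2f} years")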
The oscillations in evaluations Homer found are not merely a curiosity but
have significant policy implications. During troughs in publications, follow-
up data became increasingly dated, providing a poor guide to the benefits
and risks as pacemakers improve and new patient populations become eligi-
ble for pacing. During such times, clinicians would be flying blind, perhaps
being too aggressive and placing at risk certain new patients who would ben-
efit little; or perhaps being too conservative and failing to use pacing for others
who could benefit substantially.
Policy analysis with the model led Homer to endorse the use of large clini-
cal registries for new medical technologies as a supplement to randomized
clinical trials and evaluative studies. Registries, typically overseen by federal
agencies such as the National Institutes of Health, require clinicians to con-
tinuously report outcomes for their patients. Registries generate a steady flow
of information and shorten the delay in the evaluation process, thus keeping
up with changes in the technology and its application. They are widely used
today, having proven helpful in the early identification of harmful side
effects and unexpected benefits for certain patient subsets.
The lesson is clear: it would have been easy for Homer to rely on “com-
mon sense” and assume that the publication of follow-up studies followed
the smooth growth of pacing. Instead, his careful empirical work revealed a
surprising phenomenon with important implications (see also Homer, 1996).
Furthermore, the tools for statistical estimation of parameters were not well
developed compared to today and often inappropriate for use in complex
dynamic systems. Given the limitations of computing at the time, multiple
linear regression was the most widely used econometric tool. However, lin-
ear regression and related methods (e.g., ANOVA) impose assumptions that
are routinely violated in complex dynamic systems. These include perfect
specification of the model, no feedback between the dependent and indepen-
dent variables, no correlations among the independent variables, no mea-
surement error, and error terms that are i.i.d. normally distributed. And, of
course, classical multiple regression and similar statistical methods only
revealed correlations among variables and could not identify the genuine
causal relationships modelers sought to capture.
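The consequences of ignoring feedback can be shown in a few lines of simulation; the two-equation system and its parameter values below are purely illustrative. When the "independent" variable also responds to the dependent variable, the regressor is correlated with the error term and ordinary least squares no longer recovers the true effect.

import numpy as np

rng = np.random.default_rng(42)
n = 10_000
b = 0.5          # true effect of x on y
a = 0.8          # feedback: x also responds to y

# Simultaneous system:  y = b*x + u,  x = a*y + v
# Solving for the reduced form used to generate the data:
u = rng.standard_normal(n)
v = rng.standard_normal(n)
x = (a * u + v) / (1 - a * b)
y = (b * v + u) / (1 - a * b)

b_ols = np.polyfit(x, y, 1)[0]    # slope from regressing y on x
print(f"true effect: {b}, OLS estimate: {b_ols:.2f}")
# The estimate is biased upward (about 0.79 here) because the feedback from y
# to x makes x correlated with the error term u.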
Given these limitations, Jay and other early modelers, including econo-
mists, sociologists and others, faced a dilemma: they could estimate parame-
ters and relationships in their models using econometric methods and
provide quantitative measures of model fit to the data, but at the cost of con-
straining the structure of their models to the few constructs for which
numerical data existed and using estimation methods that imposed dubious
assumptions; or they could use qualitative and quantitative data to build
models with broad boundaries, important feedbacks, and nonlinearities, at
the cost of estimating parameters by judgment and expert opinion when sta-
tistical methods were impossible or inappropriate. Forrester and early sys-
tem dynamics practitioners opted for the latter. Much controversy in the
early years surrounded the different paths modelers trained in different tra-
ditions chose when faced with these strong tradeoffs.
Today, data have never been more abundant. Methods to reliably measure
previously unquantified concepts, unavailable to Jay, are now essential. Rig-
orous data collection, both qualitative and quantitative, opens up new
opportunities for formal estimation of model parameters, relationships and
structure, to assess the ability of models to replicate the historical data, and
to characterize the sensitivity of results to uncertainty, providing more reli-
able and useful insights and policy recommendations. Rahmandad et al.
(2015) provide an excellent survey of these methods, with examples and
tutorials.
System dynamics modelers have long sought methods to test models and
assess their ability to replicate historical data (e.g., Barlas, 1989, 1996; Ster-
man, 2000, ch. 21; Saysel and Barlas, 2006; Yücel and Barlas, 2015, provide
an overview of many tests to build confidence in dynamic models, including
their ability to fit historical data, robustness under extreme conditions, and
generalizability). But system dynamics modelers were not the only ones con-
cerned with the limitations of early statistical methods. Engineers, statisti-
cians and econometricians worked to develop new methods to overcome the
limitations of traditional methods. By the late 1970s, control theory methods
designed to handle feedback-rich models with measurement error and
laboratory but include many large-scale field experiments (Duflo and Bane-
rjee, 2017). Many RCTs in social systems today are carried out in difficult
circumstances and provide important insights into policies to reduce poverty
and improve human welfare (see, e.g., the work of MIT’s Poverty Action Lab,
https://www.povertyactionlab.org).
System dynamics has long used experimental methods (Sterman, 1987,
1989), and experimental research in dynamic systems is robust (e.g., Booth
Sweeney and Sterman, 2000; Moxnes, 2001, 2004; Croson and Donohue,
2005; Kopainsky and Sawicka, 2011; Arango et al., 2012; Gonzalez and
Wong, 2012; Kampmann and Sterman, 2014; Lakeh and Ghaffarzadegan,
2015; Özgün and Barlas, 2015; Sterman and Dogan, 2015; Villa et al., 2015;
Gary and Wood, 2016). Many of these studies involve individuals or small
groups and have provided important insights into how people understand,
perform in and learn from experience in dynamic systems. Still, many set-
tings involve group decision-making. Experimental work involving groups
offers important opportunities; for example, McCardle-Keurentjes et al.
(2018, this issue) report an experiment evaluating whether system
dynamics tools such as causal diagrams improved performance of individ-
uals in a group model building workshop. Extending experimental methods
to group behavior and large-scale field settings is a major opportunity to con-
duct real-world tests of the theories and policies emerging in the broad
boundary, feedback-rich models typical in system dynamics.
Despite progress in methods for and the scope of experimental methods,
experimentation remains impossible in many important contexts. Absent rig-
orous RCTs, researchers often seek to estimate causal structure and relation-
ships via econometric methods. Although we cannot randomly assign
murderers to execution versus prison to assess the deterrent effect of capital
punishment, we might estimate the effect by comparing states or countries
where the death penalty is legal to those where it is not. Of course, states dif-
fer from one another on multiple dimensions, including socio-demographic
characteristics of the population (age, race, education, religion, urban
vs. rural, family structure, etc.), economic conditions (income, unemploy-
ment, etc.), climate, cultural attitudes, political orientation, and a host of
others. Such studies are extremely common, typically using panel data and
specifying regression models using fixed effects to control for those differ-
ences deemed to be important. The hope is that controlling for these differ-
ences allows the true impact of capital punishment on the murder rate
(or any other effect of interest) to be estimated. The problem is that one can-
not measure or even enumerate all the possible differences among the states
that could potentially influence the murder rate. If any factors that affect the
murder rate are omitted, the results will be biased, particularly if, as is com-
mon, the omitted factors are correlated with those factors that are included
(omitted variable bias), and especially if a state’s decision to use the death
penalty was affected by any of these conditions, including the murder rate
itself (endogeneity bias).
These are more than theoretical concerns. Leamer (1983), in a provoca-
tively titled article, “Let’s take the ‘con’ out of econometrics”, showed that
choosing different control variables in panel data regressions led to statisti-
cally significant results indicating that a single execution either prevented
many murders or actually increased them. He concluded “that any inference
from these data about the deterrent effect of capital punishment is too fragile
to be believed” (Leamer, 1983, p. 42). The results challenged the economet-
ric community to take identification seriously: if econometric methods can-
not identify causal relationships then the results of such models cannot
provide reliable advice to policymakers.
The result was the “identification revolution”, an explosion in methods
for causal inference using formal estimation (see Angrist and Pischke, 2010,
whose paper is titled “The Credibility Revolution in Empirical Economics:
How Better Research Design Is Taking the Con out of Econometrics”). These
methods include natural experiments (e.g., Taubman et al., 2014) and, when
true random assignment is not possible, quasi-experimental methods,
including regression discontinuity, difference-in-difference and instrumental
variables (see Angrist and Pischke, 2009). Angrist and Pischke (2010, p. 24) acknowledge a fear “that the experimentalist paradigm leads researchers to look for good experiments, regardless of whether the questions they address are important” and note that “There is no shortage of academic triviality.” The debate continues, but also spurs innovation that improves methods for study design, data generation and estimation of impacts that shed light on important questions.
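A minimal difference-in-differences sketch on synthetic panel data can make the quasi-experimental logic concrete; the units, years and effect size are illustrative assumptions, not an analysis of capital punishment. The estimator compares the pre/post change in outcomes for units that adopt a policy to the pre/post change for units that do not.

import numpy as np

rng = np.random.default_rng(7)
n_states, n_years, policy_year = 30, 10, 5
treated = np.arange(n_states) < 15                   # half the units adopt the policy
post = np.arange(n_years) >= policy_year

# Synthetic panel: unit fixed effects + common year shocks + true policy effect of -2
state_fe = rng.normal(10.0, 3.0, n_states)[:, None]
year_fe = rng.normal(0.0, 1.0, n_years)[None, :]
true_effect = -2.0
y = (state_fe + year_fe
     + true_effect * (treated[:, None] & post[None, :])
     + rng.normal(0.0, 1.0, (n_states, n_years)))

# Difference-in-differences: (treated post - treated pre) - (control post - control pre)
def cell_mean(units, periods):
    return y[units][:, periods].mean()

did = ((cell_mean(treated, post) - cell_mean(treated, ~post))
       - (cell_mean(~treated, post) - cell_mean(~treated, ~post)))
print(f"true effect: {true_effect}, difference-in-differences estimate: {did:.2f}")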
Robust methods for parameter estimation, identification, assessment of good-
ness of fit, and parametric and structural sensitivity analysis are now available,
not only in econometrics but also in engineering, artificial intelligence and
machine learning (e.g. Pearl, 2009; Pearl and Mackenzie, 2018. Abdelbari and
Shafi, 2017, offer an application in system dynamics). Following Jay’s example,
system dynamics modelers should master the state of the art and use these
tools, follow new developments as the tools continue to evolve, and innovate to
develop new methods appropriate for the models we build.
The advent of “big data” and analytics including machine learning, vari-
ous artificial intelligence methods and the ability to process data sets of pre-
viously unimaginable size also create unprecedented opportunities for
dynamic modelers. These detailed data sources provide far greater temporal
and spatial resolution, generating insight into empirical issues relevant to
important questions. For example, Rydzak and Monus (2018, this issue) pro-
vide detailed empirical evidence on networks of collaboration among
workers from different departments in industrial facilities and model how
these evolved over time to explain why one facility was successful in
improving maintenance and reliability while another struggled. In the social
realm, the integration of big data sets describing population density, the
location of homes, schools, businesses and other buildings, road networks,
social media and cell phone activity provide the granular data needed to
specify patterns of commuting, communication, travel and social interac-
tions. These are essential in developing individual-level models to build
understanding of and design policies to respond to, for example, natural
disasters, terrorist attacks, and outbreaks of infectious diseases (Eubank
et al., 2004; Venkatramanan et al., 2018; Waldrop, 2018).
Big data also enable important theoretical developments in dynamics. For
example, most work in networks has focused on static networks, while many
important real-world networks are dynamic, with new nodes created and
lost, and links among them forged and broken, over time, both exogenously
and endogenously. Consider two examples. First, the famous “preferential
attachment” algorithm (Barabási and Albert, 1999) generates so-called
“scale-free” networks that closely resemble many real networks because it
embodies a powerful positive feedback loop in which new nodes arising in
the network are more likely to link to existing nodes with more links than to
nodes with few links, further increasing the probability that other nodes will
link to the popular ones. Second, conventional wisdom has held that the
presence of transient (“temporal”) links compromises information diffusion,
exploration, synchronization and other aspects of network performance,
including controllability. In contrast, Li et al. (2017) develop formal models
and simulations based on a range of real dynamic networks, from the yeast
proteome to cell phones, showing that “temporal networks can, compared to
their static counterparts, reach controllability faster, demand orders of mag-
nitude less control energy, and have control trajectories that are considerably
more compact than those characterizing static networks.” System dynamics
modelers who embrace big data, analytics and modern dynamical systems
theory will find tremendous opportunities for rigorous work that can address
critical challenges facing humanity.
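As one small illustration, the preferential-attachment mechanism mentioned above can be sketched in a few lines; the network size and links per node are illustrative assumptions. Each new node links to existing nodes with probability proportional to their current degree, so well-connected nodes attract still more connections and a few hubs emerge.

import numpy as np

def preferential_attachment(n_nodes=2000, links_per_node=2, seed=0):
    """Grow a network one node at a time; each new node attaches to existing
    nodes with probability proportional to their current degree (the positive
    feedback: well-connected nodes attract even more connections)."""
    rng = np.random.default_rng(seed)
    degree = np.zeros(n_nodes, dtype=int)
    m0 = links_per_node + 1
    degree[:m0] = m0 - 1                      # small fully connected seed network
    for new in range(m0, n_nodes):
        existing = np.arange(new)
        p = degree[:new] / degree[:new].sum()
        targets = rng.choice(existing, size=links_per_node, replace=False, p=p)
        degree[targets] += 1
        degree[new] = links_per_node
    return degree

deg = preferential_attachment()
# Heavy tail: a few hubs accumulate far more links than the typical node
print(f"median degree: {np.median(deg):.0f}, max degree: {deg.max()}")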
practice dating to the 1960s even though better choices now exist and some
old methods and tools are now neither effective nor acceptable for research
or practice. Some fail to emphasize, use, or teach basic principles to gather
evidence, test hypotheses and build confidence in models. To illustrate, for
many years the website of the System Dynamics Society, on a prominent
page titled “Introduction to System Dynamics”, described how system
dynamics ought to be done:
worst, policies based on such weakly grounded and poorly tested “insights”
put people at risk of harm.
Despite the rapid progress in methods for parameter estimation, statistical
inference, and causal attribution in complex dynamic systems, some con-
tinue to argue, wrongly, that formal estimation of model parameters is not
possible or not necessary. Today, best practice leverages the rapid progress
in data availability and methods for parameter estimation and causal infer-
ence in complex dynamic systems. Discussing policy based on model simu-
lations without presenting any formal measures of goodness of fit or
evidence to justify the parameters and assess the uncertainty around them
and in results is simply not acceptable.
Some go farther and argue that “quick and dirty” models are good enough,
that expert judgment—often their own—is a sufficient basis for the choice of
model boundary, formulations, and parameter values, that subjective
judgment—again, often their own—of model fit to historical data is suffi-
cient. After all, the alternative is that managers and policymakers will con-
tinue to use their mental models to make consequential decisions. Surely, it
is argued, even a simple model, or even a causal diagram or system arche-
type is a better basis for decision making. Such claims must be tested rigor-
ously. So far, the evidence does not support them (e.g. McCardle-Keurentjes
et al., 2018, this issue). Learning in and about complex systems requires con-
stant iteration between experiments in the virtual world of models, where
costs and risks are low, and interventions in the real world, where experi-
ments are often costly and risky (Sterman, 1994). Failing to test models
against evidence and through experiments in the real world cuts these criti-
cal feedbacks. Replacing a poor mental model with a diagram, archetype, or
simulation that is not grounded in evidence and is poorly tested may create
more harm by providing false confidence and more deeply embedding
flawed mental models.
Even more pernicious is the claim that the methods outlined here are only
relevant for academic work, or that those who advocate for greater scientific
rigor are motivated only to have their work published in academic journals,
instead of working on the grand challenges, as Jay called upon us to
do. Publishing for the sake of publishing is indeed a waste of resources and
talent. Publishing is a means, not an end. We should strive to publish our
work in the best academic journals because peer review provides valuable
feedback, helping us uncover errors and improve our work (Repenning,
2004). We should strive to publish our work because publication enables
others to learn about, use and build on it. We should strive to publish
because, yes, publication in respected journals is how academics gain ten-
ure. Promotion and tenure are, like publishing, not ends in themselves.
They are the means that provide the academic freedom that enables scholars
to work on the important issues, whether they are in vogue or not. Promo-
tion and tenure are the means through which the next generation of
Footnote 7: System dynamics practitioners and consultants play a vital role in applications and real-world impact, but they do not have the resources, infrastructure or incentives to educate and train people to the level required to advance the state of the art or grow the capacity of the field, and, with very few exceptions, have not done so.
Acknowledgements
I thank the Forrester family for permission to use images from Jay’s personal
files, and Yaman Barlas, Jack Homer, David Keith, Hazhir Rahmandad, Nel-
son Repenning, Rogelio Oliva, Jørgen Randers, George Richardson and many
other colleagues for helpful comments and suggestions. All errors are mine.
Biography
References
Glaser B, Strauss A. 2017. The discovery of grounded theory: strategies for qualitative
research. Routledge: Abingdon, U.K.
Gonzalez C, Wong H. 2012. Understanding stocks and flows through analogy. System
Dynamics Review 28(1): 3–27.
Gourieroux C, Monfort A, Renault E. 1993. Indirect inference. Journal of Applied
Econometrics 8(S1): S85–S118.
Green T. 2010. Bright boys: the making of information technology. Taylor & Francis:
Milton Park, U.K.
Greenwald A, Banaji M. 1995. Implicit social cognition: attitudes, self-esteem, and
stereotypes. Psychological Review 102(1): 4–27.
Hoekstra A, Chopard B, Coveney P. 2014. Multiscale modelling and simulation: a
position paper. Philosophical Transactions Series A 372(2021). https://doi.org/10.
1098/rsta.2013.0377.
Holz C, Siegel L, Johnston E, Jones A, Sterman J. 2018. Ratcheting ambition to limit
warming to 1.5 °C: trade-offs between emission reductions and carbon dioxide
removal. Environmental Research Letters 13(6): 1–11. https://doi.org/10.
1088/1748-9326/aac0c1.
Homer J. 1987. A diffusion model with application to evolving medical technologies.
Technological Forecasting and Social Change 31(3): 197–218.
Homer J. 1996. Why we iterate: scientific modeling in theory and practice. System
Dynamics Review 12(1): 1–19.
Homer J. 2012. Partial-model testing as a validation tool for system dynamics. System
Dynamics Review 28(3): 281–294.
Hosseinichimeh N, Rahmandad H, Jalali M, Wittenborn A. 2016. Estimating the
parameters of system dynamics models using indirect inference. System Dynamics
Review 32(2): 156–180.
Jalali M, Rahmandad H, Ghoddusi H. 2015. Using the method of simulated moments
for system identification. In Analytical Methods for Dynamic Modelers,
Rahmandad H, Oliva R, Osgood ND (eds). MIT Press: Cambridge, MA; 39–69.
Janis IL. 1982. Groupthink: psychological studies of policy decisions and fiascoes.
Houghton Mifflin: Boston, MA.
Kahneman D. 2011. Thinking, fast and slow. Farrar, Straus, and Giroux: New York.
Kampmann C. 2012. Feedback loop gains and system behavior. System Dynamics
Review 28(4): 370–395.
Kampmann C, Oliva R. 2009. Analytical methods for structural dominance analysis in sys-
tem dynamics. In Encyclopedia of complexity and systems science, Meyers R (ed).
Springer: Berlin.
Kampmann C, Sterman J. 2014. Do markets mitigate misperceptions of feedback? Sys-
tem Dynamics Review 30(3): 123–160.
Kapmeier F, Gonçalves P. 2018. Wasted paradise? Policies for Small Island States to
manage tourism-driven growth while controlling waste generation: the case of the
Maldives. System Dynamics Review 34(1–2): 172–221.
Keeling MJ, Rand DA, Morris AJ. 1997. Correlation models for childhood epidemics.
Proceedings of the Royal Society B Biological Sciences 264: 1149–1156.
Keith D, Sterman J, Struben J. 2017. Supply constraints and waitlists in new product
diffusion. System Dynamics Review 33(3–4): 254–279.
informed climate action: Evidence from the World Climate simulation. PLoS One
13(8): 1–28. https://doi.org/10.1371/journal.pone.0202877.
Rudolph J, Morrison J, Carroll J. 2009. The dynamics of action-oriented problem solv-
ing: linking interpretation and choice. Academy of Management Review 34(4):
733–758.
Rydzak F, Monus P. 2018. Shaping organizational network structure to enable sus-
tainable transformation. System Dynamics Review 34(1–2): 255–283.
Saleh M, Oliva R, Kampmann CE, Davidsen P. 2010. A comprehensive analytical
approach for policy analysis of system dynamics models. European Journal of
Operational Research 203(3): 673–683.
Sanchez C, Dunning D. 2018. Overconfidence among beginners: Is a little learning a
dangerous thing? Journal of Personality and Social Psychology 114(1): 10–28.
Saysel A, Barlas Y. 2006. Model simplification and validation with indirect structure
validity tests. System Dynamics Review 22(3): 241–262.
Simmons J, Nelson L, Simonsohn U. 2011. False-positive psychology: undisclosed
flexibility in data collection and analysis allows presenting anything as significant.
Psychological Science 22(11): 1359–1366.
Sloman S. 1996. The empirical case for two systems of reasoning. Psychological Bul-
letin 119(1): 3–22.
Sterman J. 1987. Testing behavioral simulation models by direct experiment. Man-
agement Science 33(12): 1572–1592.
Sterman J. 1989. Modeling managerial behavior: misperceptions of feedback in a
dynamic decision making experiment. Management Science 35(3): 321–339.
Sterman J. 1994. Learning in and about complex systems. System Dynamics Review
10(2–3): 291–330.
Sterman J. 2000. Business dynamics: systems thinking and modeling for a complex
world. Irwin/McGraw-Hill: New York.
Sterman J (ed). 2007. Exploring the next great frontier: system dynamics at fifty. Sys-
tem Dynamics Review 23(2–3): 89–93.
Sterman J. 2015. Learning for ourselves: interactive simulations to catalyze science-
based environmental activism. In Science based activism, Stoknes P, Eliassen K
(eds). Fagbokforlaget: Bergen, Norway; 253–279.
Sterman J, Dogan G. 2015. I’m not hoarding, I’m just stocking up before the hoarders
get here: behavioral causes of phantom ordering in supply chains. Journal of Oper-
ations Management 39–40: 6–22.
Sterman J, Fiddaman T, Franck T, Jones A, McCauley S, Rice P, Sawin E, Siegel L.
2012. Climate interactive: the C-ROADS climate policy model. System Dynamics
Review 28(3): 295–305.
Sterman J, Fiddaman T, Franck T, Jones A, McCauley S, Rice P, Sawin E, Siegel L.
2013. Management flight simulators to support climate negotiations. Environmen-
tal Modelling and Software 44: 122–135.
Sterman J, Repenning N, Kofman F. 1997. Unanticipated side effects of successful
quality programs: exploring a paradox of organizational improvement. Manage-
ment Science 43(4): 501–521.
Strogatz S. 2001. Exploring complex networks. Nature 410: 268–276.
Struben J, Sterman J. 2008. Transition challenges for alternative fuel vehicle and
transportation systems. Environment and Planning B 35: 1070–1097.