Efficient Coding and Risky Choice
© The Author(s) 2021. Published by Oxford University Press on behalf of the President and Fellows of Harvard College. All rights reserved. For Permissions, please email: journals.permissions@oup.com
The Quarterly Journal of Economics (2022), 161–213. https://doi.org/10.1093/qje/qjab031.
Advance Access publication on August 19, 2021.
162 THE QUARTERLY JOURNAL OF ECONOMICS
I. INTRODUCTION
In nearly all economic models of risky choice, the decision
maker (DM) is assumed to make a choice based on a precise
5. KLW also provide an extension to their baseline model in which the precision
of noisy coding can flexibly change with the volatility of a particular (log-normal)
prior distribution. Our framework further generalizes this flexibility by deriving
optimal coding rules for any prior distribution.
how the DM’s prior belief shifts after observing a noisy signal of
the true payoff. As we show next, the distribution of the noisy
signal drives the main predictions of our model.
Equations (3) and (4) show that the likelihood functions are
driven by the coding rules, θ (X) and θ (C), which map X and C
into the probability that a neuron emits a value of 1. Intuitively,
if the DM is particularly concerned about perceiving values of X
within a given range, then a good coding rule, θ (X), should be very
sensitive to X over that range.
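To make the mapping from coding rule to likelihood concrete, here is a minimal Python sketch. It assumes, as in the HWP setup, that each of n binary units independently emits a 1 with probability θ(X), so that the signal Rx is binomially distributed; the function names are hypothetical.

```python
from math import comb


def likelihood(r: int, theta: float, n: int) -> float:
    """P(R = r | X) when each of n independent units emits 1 with probability theta = θ(X)."""
    return comb(n, r) * theta**r * (1.0 - theta) ** (n - r)


def likelihood_vector(theta: float, n: int) -> list[float]:
    """The whole distribution f(R | X) over R = 0, ..., n."""
    return [likelihood(r, theta, n) for r in range(n + 1)]


if __name__ == "__main__":
    n = 10
    # Two payoffs with different coding-rule values: their signal distributions shift apart.
    for theta in (0.3, 0.7):
        probs = likelihood_vector(theta, n)
        mode = max(range(n + 1), key=probs.__getitem__)
        print(f"theta={theta}: total={sum(probs):.6f}, most likely R={mode}")
```

Nearby payoffs that a steep coding rule maps to well-separated values of θ produce signal distributions with little overlap, which is what makes them easy to tell apart.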
When n is large, and when pX and C are i.i.d., HWP show that the coding rules that maximize expected financial gain are given by⁹
$$\theta(X) = \left[\sin\left(\frac{\pi}{2}\,\frac{\int_{-\infty}^{X} f(x)^{2/3}\,dx}{\int_{-\infty}^{\infty} f(x)^{2/3}\,dx}\right)\right]^{2} \tag{5}$$
9. The coding rules described in this section are derived when n, which parameterizes the capacity constraint, is sufficiently large. Appendix 7 of HWP shows that the coding rules remain approximately optimal for any finite n ≥ 10. When illustrating the model's implications in Section II, we set n to 10.
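and
$$\theta(C) = \left[\sin\left(\frac{\pi}{2}\,\frac{\int_{-\infty}^{C} f(c)^{2/3}\,dc}{\int_{-\infty}^{\infty} f(c)^{2/3}\,dc}\right)\right]^{2}. \tag{6}$$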
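The integrals in equations (5) and (6) can be evaluated numerically for an arbitrary prior density. The sketch below is our own grid approximation (cumulative trapezoid rule, hypothetical function name), not code from HWP; for a uniform prior it should reproduce the closed form given in equation (10).

```python
import numpy as np


def efficient_coding_rule(grid: np.ndarray, density: np.ndarray) -> np.ndarray:
    """Approximate eq. (5) on a grid that covers the support of the prior.

    theta(x) = sin^2((pi/2) * S(x)), where S(x) is the share of the integral
    of f^(2/3) that lies below x (cumulative trapezoid rule).
    """
    g = np.asarray(density, dtype=float) ** (2.0 / 3.0)
    steps = 0.5 * (g[1:] + g[:-1]) * np.diff(grid)
    cumulative = np.concatenate(([0.0], np.cumsum(steps)))
    share = cumulative / cumulative[-1]
    return np.sin(0.5 * np.pi * share) ** 2


if __name__ == "__main__":
    # Uniform prior on [16, 24]: the rule should coincide with the closed form of eq. (10).
    x = np.linspace(16.0, 24.0, 801)
    theta = efficient_coding_rule(x, np.full_like(x, 1.0 / 8.0))
    closed_form = np.sin(0.5 * np.pi * (x - 16.0) / 8.0) ** 2
    print(np.max(np.abs(theta - closed_form)))
```

For a non-uniform prior, the same function makes θ steepest where the density is highest, which is the sense in which efficient coding allocates sensitivity to frequently encountered payoffs.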
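$$\begin{aligned}\text{Prob}_{\text{accurate}} \equiv{} & \iint \text{Prob}(R_x > R_c \mid \theta(X) > \theta(C)) \cdot \mathbf{1}_{\theta(X) > \theta(C)} \cdot f(X)\, f(C)\, dX\, dC \\ &+ \iint \text{Prob}(R_x < R_c \mid \theta(X) < \theta(C)) \cdot \mathbf{1}_{\theta(X) < \theta(C)} \cdot f(X)\, f(C)\, dX\, dC.\end{aligned} \tag{8}$$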
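Equation (8) can likewise be computed directly once the coding rules are fixed. This sketch assumes binomial signals, replaces the double integral with an equal-weight average over grid points (appropriate for uniform priors), and, following (8) as written, counts a tie Rx = Rc as inaccurate; the function names are hypothetical.

```python
import numpy as np
from math import comb


def binomial_rows(thetas: np.ndarray, n: int) -> np.ndarray:
    """Row i holds P(R = r | theta_i) for r = 0, ..., n."""
    r = np.arange(n + 1)
    coef = np.array([comb(n, k) for k in r], dtype=float)
    t = np.asarray(thetas, dtype=float)[:, None]
    return coef * t**r * (1.0 - t) ** (n - r)


def prob_accurate(theta_x: np.ndarray, theta_c: np.ndarray, n: int) -> float:
    """Eq. (8) averaged over equally weighted grid points; ties count as inaccurate."""
    fx = binomial_rows(theta_x, n)
    fc = binomial_rows(theta_c, n)
    cdf_c = np.cumsum(fc, axis=1)          # P(R_c <= r)
    strictly_below = cdf_c - fc            # P(R_c < r)
    strictly_above = 1.0 - cdf_c           # P(R_c > r)
    p_x_wins = fx @ strictly_below.T       # P(R_x > R_c), shape (len(theta_x), len(theta_c))
    p_c_wins = fx @ strictly_above.T       # P(R_x < R_c)
    tx = np.asarray(theta_x)[:, None]
    tc = np.asarray(theta_c)[None, :]
    accurate = np.where(tx > tc, p_x_wins, 0.0) + np.where(tx < tc, p_c_wins, 0.0)
    return float(accurate.mean())


if __name__ == "__main__":
    k = 200
    u = (np.arange(k) + 0.5) / k            # midpoints of the normalized support [0, 1]
    theta = np.sin(0.5 * np.pi * u) ** 2    # eqs. (10)-(11) when pX and C share a uniform prior
    for n in (5, 10, 40):
        print(n, round(prob_accurate(theta, theta, n), 3))
```

Because pX and C are identically distributed here, both coding rules evaluate to the same grid of θ values; relaxing the capacity constraint (larger n) raises the probability of an accurate choice.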
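are satisfied, the coding rules under all three objectives reduce to
$$\theta(X) = \sin^{2}\left(\frac{\pi}{2}\,\frac{X - X_l}{X_u - X_l}\right) \tag{10}$$
and
$$\theta(C) = \sin^{2}\left(\frac{\pi}{2}\,\frac{C - C_l}{C_u - C_l}\right), \tag{11}$$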
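The contrast shown in Figure I, Panel C can be quantified in a few lines. This sketch uses the uniform-prior rule (10) with n = 10 and the two priors from the figure, assumes binomial signals, and summarizes the overlap of f(Rx|X = 18) and f(Rx|X = 22) by their total-variation distance, a summary statistic we chose for illustration (function names are hypothetical).

```python
import numpy as np
from math import comb


def uniform_coding_rule(x: float, lo: float, hi: float) -> float:
    """Eq. (10): theta(X) = sin^2((pi/2) (X - X_l) / (X_u - X_l))."""
    return float(np.sin(0.5 * np.pi * (x - lo) / (hi - lo)) ** 2)


def signal_distribution(theta: float, n: int) -> np.ndarray:
    """Binomial distribution of R given theta (independent units assumed)."""
    r = np.arange(n + 1)
    coef = np.array([comb(n, k) for k in r], dtype=float)
    return coef * theta**r * (1.0 - theta) ** (n - r)


def separation(x1: float, x2: float, lo: float, hi: float, n: int = 10) -> float:
    """Total-variation distance between f(R | x1) and f(R | x2):
    0 means identical signal distributions, 1 means perfectly distinguishable."""
    p = signal_distribution(uniform_coding_rule(x1, lo, hi), n)
    q = signal_distribution(uniform_coding_rule(x2, lo, hi), n)
    return 0.5 * float(np.abs(p - q).sum())


if __name__ == "__main__":
    for label, (lo, hi) in {"low volatility": (16, 24), "high volatility": (8, 32)}.items():
        print(label, round(separation(18, 22, lo, hi), 3))
```

The same four-unit change in X moves θ much further under the narrow prior, so the two signal distributions are far better separated in the low-volatility environment.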
[Figure I, Panels A, B, and C: plots not reproduced in this text version.]
FIGURE I
Prior Distributions, Coding Rules, and the Optimal Likelihood Functions
Panel A plots two uniform prior distributions for X, one with low volatility (Xl =
16 and Xu = 24) and the other with high volatility (Xl = 8 and Xu = 32). Panel B
plots the coding rule θ (X), defined in equation (10), for both volatility environments.
Panel C plots the implied likelihood function f(Rx |X), defined in equation (12), for
two values, X = 18 and X = 22, and for each of the two prior distributions. The
capacity constraint parameter n is set to 10.
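and
$$E[\tilde{C} \mid R_c] = \frac{\int_{C_l}^{C_u} f(R_c \mid C)\, f(C)\, C\, dC}{\int_{C_l}^{C_u} f(R_c \mid C)\, f(C)\, dC}, \tag{14}$$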
where f(X) and f(C) are the DM’s prior beliefs about X and C, and
the likelihood functions f(Rx |X) and f(Rc |C) are from equation (12).
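Equation (13), and its counterpart (14), is a standard Bayesian posterior mean. As a sketch under our own assumptions (uniform prior, coding rule (10), binomial signals, integrals replaced by sums over a midpoint grid; hypothetical function name):

```python
import numpy as np
from math import comb


def posterior_mean_table(lo: float, hi: float, n: int, grid_size: int = 2000) -> np.ndarray:
    """E[X~ | R] for R = 0, ..., n under a uniform prior on [lo, hi].

    The integrals of eq. (13) become sums over a midpoint grid; the likelihood
    is binomial with the coding rule of eq. (10).
    """
    x = lo + (np.arange(grid_size) + 0.5) * (hi - lo) / grid_size
    theta = np.sin(0.5 * np.pi * (x - lo) / (hi - lo)) ** 2
    r = np.arange(n + 1)
    coef = np.array([comb(n, k) for k in r], dtype=float)
    lik = coef * theta[:, None] ** r * (1.0 - theta[:, None]) ** (n - r)   # f(R | x)
    prior = np.full(grid_size, 1.0 / grid_size)                           # uniform f(x) dx
    weights = lik * prior[:, None]
    return (weights * x[:, None]).sum(axis=0) / weights.sum(axis=0)


if __name__ == "__main__":
    print(np.round(posterior_mean_table(16, 24, n=10), 2))
```

The posterior mean rises with Rx, stays strictly inside [Xl, Xu], and is symmetric about the midpoint of the prior, so extreme signals are shrunk toward the center of the prior.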
Importantly, equation (13) shows that the DM's estimate of X is a random variable, and the randomness comes from Rx. Therefore, the DM faces a distribution of perceived values for each X. We now characterize the mean and standard deviation of this distribution. Specifically, we define the value function, v(X), by
$$v(X) = \sum_{R_x=0}^{n} f(R_x \mid X) \cdot E[\tilde{X} \mid R_x] \tag{15}$$
and
$$\sigma(X) = \left[\sum_{R_x=0}^{n} f(R_x \mid X)\left(E[\tilde{X} \mid R_x]\right)^{2} - v^{2}(X)\right]^{1/2}. \tag{16}$$
Equations (15) and (16) indicate that the curvature of the value
function and the randomness in subjective valuation are jointly
determined by the DM’s prior beliefs and the implied likelihood
functions.
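Combining the likelihood with the posterior means gives v(X) and σ(X) directly. The sketch below keeps the same assumptions (uniform prior, coding rule (10), binomial signals, grid approximation; hypothetical names) and evaluates both environments of the running example at X = 20.

```python
import numpy as np
from math import comb


def binomial_rows(thetas, n):
    """Row i: P(R = r | theta_i) for r = 0, ..., n."""
    r = np.arange(n + 1)
    coef = np.array([comb(n, k) for k in r], dtype=float)
    t = np.atleast_1d(np.asarray(thetas, dtype=float))[:, None]
    return coef * t**r * (1.0 - t) ** (n - r)


def value_and_noise(x: float, lo: float, hi: float, n: int = 10, grid_size: int = 2000):
    """Return (v(X), sigma(X)) from eqs. (15)-(16) for a uniform prior on [lo, hi]."""

    def coding(z):
        # Eq. (10), the efficient rule for a uniform prior.
        return np.sin(0.5 * np.pi * (np.asarray(z, dtype=float) - lo) / (hi - lo)) ** 2

    grid = lo + (np.arange(grid_size) + 0.5) * (hi - lo) / grid_size
    lik = binomial_rows(coding(grid), n)                               # f(R | x) on the grid
    post_mean = (lik * grid[:, None]).sum(axis=0) / lik.sum(axis=0)   # E[X~ | R]; uniform prior cancels
    f_r = binomial_rows(coding(x), n)[0]                               # f(R | X) at the true X
    v = float(f_r @ post_mean)                                         # eq. (15)
    sigma = float(np.sqrt(f_r @ post_mean**2 - v**2))                  # eq. (16)
    return v, sigma


if __name__ == "__main__":
    for label, (lo, hi) in {"low": (16, 24), "high": (8, 32)}.items():
        print(label, [round(z, 3) for z in value_and_noise(20, lo, hi)])
```

For uniform priors the two environments are rescaled copies of each other, so at the common midpoint X = 20 the sketch returns v(X) = 20 in both, while σ(X) is three times larger in the high-volatility environment, whose support is three times wider.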
In keeping with the running example from the previous section, Figure II, Panel A plots, for both the high- and low-volatility environments, the average subjective valuation v(X), as well as its one-standard-deviation bounds v(X) ± σ(X).
The figure shows that randomness in utility, σ(X), is substantially higher in the high-volatility environment. This is driven by the greater overlap of likelihood functions in the high-volatility environment, compared with the low-volatility environment. Because subjective valuation is noisier in the high-volatility environment, the model predicts that choices will also be noisier and hence less sensitive to a given change in payoff values.
[Figure II, Panels A and B: plots not reproduced in this text version.]
FIGURE II
Prior Distributions and Value Functions
Panel A: the upper graph plots two uniform prior distributions for X, one with
low volatility (Xl = 16 and Xu = 24) and the other with high volatility (Xl = 8
and Xu = 32). The lower graph plots the subjective valuations implied by efficient
coding, v(X), and their one standard deviation bounds v(X) ± σ (X). Panel B: the
upper graph plots two prior distributions, one increasing and one decreasing. The increasing distribution is characterized by
$$f^{i}(X; X_l, X_u, X_m^{i}, h, l) = \begin{cases} l, & \text{if } X_l \le X \le X_m^{i} \\ h, & \text{if } X_m^{i} < X \le X_u \end{cases}$$
where Xl = 2, Xu = 8, X_m^i = 4.5, h = 7/25, and l = 1/125. The decreasing distribution is characterized by
$$f^{d}(X; X_l, X_u, X_m^{d}, h, l) = \begin{cases} h, & \text{if } X_l \le X \le X_m^{d} \\ l, & \text{if } X_m^{d} < X \le X_u \end{cases},$$
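$$\begin{aligned}\text{Prob}(\text{risk taking} \mid X, C) ={} & \sum_{R_x=0}^{n}\sum_{R_c=0}^{n} \mathbf{1}_{p \cdot E[\tilde{X} \mid R_x] > E[\tilde{C} \mid R_c]} \cdot f(R_x \mid X) \cdot f(R_c \mid C) \\ &+ \frac{1}{2}\sum_{R_x=0}^{n}\sum_{R_c=0}^{n} \mathbf{1}_{p \cdot E[\tilde{X} \mid R_x] = E[\tilde{C} \mid R_c]} \cdot f(R_x \mid X) \cdot f(R_c \mid C).\end{aligned} \tag{17}$$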
Equation (17) says that the DM chooses the risky lottery over the
certain option when p · E[ X̃|Rx ] > E[C̃|Rc ], and the DM randomly
chooses between the two options when p · E[ X̃|Rx ] = E[C̃|Rc ].
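Equation (17) can be evaluated by enumerating all (Rx, Rc) pairs. The sketch below assumes uniform priors (so the coding rules are (10) and (11)), binomial signals, and posterior means computed on a grid; it uses a floating-point tolerance to detect the tie case, and all function names are hypothetical. The usage compares the two volatility environments of the running example.

```python
import numpy as np
from math import comb


def binomial_rows(thetas, n):
    """Row i: P(R = r | theta_i) for r = 0, ..., n (independent binary units assumed)."""
    r = np.arange(n + 1)
    coef = np.array([comb(n, k) for k in r], dtype=float)
    t = np.atleast_1d(np.asarray(thetas, dtype=float))[:, None]
    return coef * t**r * (1.0 - t) ** (n - r)


def coding(z, lo, hi):
    """Uniform-prior coding rule, eqs. (10)-(11)."""
    return np.sin(0.5 * np.pi * (np.asarray(z, dtype=float) - lo) / (hi - lo)) ** 2


def posterior_means(lo, hi, n, grid_size=2000):
    """E[value | R] for R = 0, ..., n under a uniform prior on [lo, hi] (midpoint grid)."""
    grid = lo + (np.arange(grid_size) + 0.5) * (hi - lo) / grid_size
    lik = binomial_rows(coding(grid, lo, hi), n)
    return (lik * grid[:, None]).sum(axis=0) / lik.sum(axis=0)


def prob_risk_taking(x, c, p, x_range, c_range, n=10):
    """Eq. (17): probability-weighted choice indicator over all (R_x, R_c) pairs."""
    lhs = p * posterior_means(*x_range, n)[:, None]    # p * E[X~ | R_x], one row per R_x
    rhs = posterior_means(*c_range, n)[None, :]        # E[C~ | R_c], one column per R_c
    tie = np.isclose(lhs, rhs)                          # numerical tolerance for the "=" case
    choice = np.where(tie, 0.5, (lhs > rhs).astype(float))
    fx = binomial_rows(coding(x, *x_range), n)[0]
    fc = binomial_rows(coding(c, *c_range), n)[0]
    return float(fx @ choice @ fc)


if __name__ == "__main__":
    environments = {"low": ((16, 24), (8, 12)), "high": ((8, 32), (4, 16))}
    for name, (xr, cr) in environments.items():
        # Raising X from 18 to 22 with C = 10 moves pX - C from -1 to +1.
        slope = prob_risk_taking(22, 10, 0.5, xr, cr) - prob_risk_taking(18, 10, 0.5, xr, cr)
        print(name, round(slope, 3))
```

The increase in the probability of risk taking is larger in the low-volatility environment, which is the pattern plotted in Figure III; at X = 20 and C = 10 the two options are perceived symmetrically and the probability is exactly one-half.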
Figure III plots, for both the high- and low-volatility environments, the probability of risk taking against the difference in expected values between the two options, namely, pX − C. Naturally, a higher value of pX − C increases the attractiveness of the risky lottery and hence increases the probability of risk taking. Note that for an expected-utility maximizer with no background wealth, the probability of risk taking should be a step function of pX − C with a single step at $pU^{-1}\left(\frac{U(C) - (1-p)U(0)}{p}\right) - C$. However, Figure III shows that under noisy coding, the probability of risk taking has an S-shaped relationship with pX − C. More important, under efficient coding, the slope of this function is negatively related to the volatility of the stimulus distribution (for those values of pX − C that do not deliver an extreme probability near 0 or 1). Thus, for a given increase in X, the probability of choosing the risky lottery increases more in the low-volatility environment. This heightened sensitivity in the low-volatility environment can be traced back to the property illustrated in Figure I, Panel C: a given increase in X leads to a larger difference in the distribution of noisy signals in the low-volatility environment, compared with the high-volatility environment.
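The location of the expected-utility step follows from a short rearrangement. Assuming only that U is strictly increasing (so that U^{-1} exists) and that the lottery pays X with probability p and 0 otherwise:

$$
\begin{aligned}
pU(X) + (1-p)U(0) > U(C)
&\iff U(X) > \frac{U(C) - (1-p)U(0)}{p} \\
&\iff X > U^{-1}\!\left(\frac{U(C) - (1-p)U(0)}{p}\right) \\
&\iff pX - C > pU^{-1}\!\left(\frac{U(C) - (1-p)U(0)}{p}\right) - C.
\end{aligned}
$$

With linear utility, U(z) = z, the right-hand side is p · (C/p) − C = 0, so the step sits exactly at pX − C = 0.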
10. When the prior distribution is increasing or decreasing, conditions (9) for
the equivalence of coding rules no longer hold. Figure II, Panel B presents the
subjective valuations based on the coding rule that maximizes the DM’s expected
financial gain. The results are quantitatively similar if the subjective valuations
are instead based on the coding rule that maximizes mutual information.
fixed at 0.5 for all trials. The values of X and C are drawn independently, and we manipulate the distribution of each payoff across two volatility conditions. In the high-volatility condition, X is drawn uniformly from [8, 32], and C is drawn uniformly from [4, 16]. In the low-volatility condition, X is drawn uniformly from [16, 24], and C is drawn uniformly from [8, 12].
We choose these design parameters for two reasons. First, because our goal is to isolate the effect of volatility, we keep the mean of each payoff distribution constant across conditions: the mean of X is fixed at 20, and the mean of C is fixed at 10. Second, our parameter values satisfy conditions (9): the distributions of X and C are independent, and pX and C are identically and uniformly distributed. These conditions imply that the efficient coding rule is robust to changing the performance objective from maximizing expected financial gain to maximizing mutual information or maximizing the probability of an accurate choice. Thus, our design is optimized to test generic predictions of efficient coding.
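The two design properties above can be checked mechanically. A minimal sketch, with hypothetical class and function names; the number of trials per condition is left as a parameter because it is not given in this passage.

```python
from dataclasses import dataclass

import numpy as np

P = 0.5  # probability that the risky lottery pays X, fixed across trials


@dataclass(frozen=True)
class UniformPrior:
    lo: float
    hi: float

    @property
    def mean(self) -> float:
        return 0.5 * (self.lo + self.hi)

    def scaled(self, k: float) -> "UniformPrior":
        """Distribution of k * Z when Z ~ U[lo, hi] and k > 0."""
        return UniformPrior(k * self.lo, k * self.hi)


# (prior of X, prior of C) in each volatility condition, as described in the text.
CONDITIONS = {
    "high": (UniformPrior(8, 32), UniformPrior(4, 16)),
    "low": (UniformPrior(16, 24), UniformPrior(8, 12)),
}


def draw_trials(condition: str, n_trials: int, rng: np.random.Generator) -> np.ndarray:
    """Independent draws of (X, C) for one condition; shape (n_trials, 2)."""
    x_prior, c_prior = CONDITIONS[condition]
    x = rng.uniform(x_prior.lo, x_prior.hi, n_trials)
    c = rng.uniform(c_prior.lo, c_prior.hi, n_trials)
    return np.column_stack([x, c])


if __name__ == "__main__":
    for name, (x_prior, c_prior) in CONDITIONS.items():
        # Same means across conditions, and pX has the same distribution as C.
        print(name, x_prior.mean, c_prior.mean, x_prior.scaled(P) == c_prior)
```

Because pX is an exact rescaling of X, the check that pX and C share a distribution reduces to comparing interval endpoints.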
Figure IV shows a schematic of the task design. Each subject goes through both the high- and low-volatility conditions; the
Notes. The table reports results from mixed-effects linear regressions in which the dependent variable takes the value of 1 if the subject chooses the risky lottery, and 0 otherwise.
The dummy variable, high, takes the value of 1 if the trial belongs to the high-volatility condition, and 0 if it belongs to the low-volatility condition. Only data from common trials are
included. There are random effects on the independent variables X, C, and the intercept. Standard errors of the fixed-effect estimates are clustered at the subject level and reported
in parentheses. ∗ , ∗∗ , and ∗∗∗ indicate significance at the 10%, 5%, and 1% level, respectively.
[Figure V, Panels A and B: plots not reproduced in this text version.]
FIGURE V
Frequency of Risk Taking across Volatility Conditions
Panel A: the graph plots, for each volatility condition, the empirical frequency of
risk taking against the difference in expected values between the risky lottery and
the certain option, namely, pX − C. The frequency of risk taking is computed as
the proportion of trials on which subjects choose the risky lottery. Data are pooled
across subjects over all common trials in the first condition, and thus represent
between-subjects comparisons. For each volatility condition, we bin the running
variable, pX − C, to its nearest integer value, and plot the mean for each bin. The
length of the vertical bar inside each data point denotes two standard errors of the
mean. Standard errors are clustered by subject. Panel B: each point represents one
of the 30 common trials in the first condition. The x-axis measures the frequency of
risk taking in the high-volatility condition, while the y-axis measures the frequency
of risk taking in the low-volatility condition. Inside each data point, the length of
the vertical bar denotes two standard errors of the mean frequency of risk taking in
the low-volatility condition. The length of the horizontal bar denotes two standard
errors of the mean frequency of risk taking in the high-volatility condition.
12. For the high-volatility condition, we designate 90 out of the 340 test trials as common trials. These 90 common trials are created by sampling each element in the low-volatility condition five times. The remaining 250 trials in the high-volatility condition are drawn with 50% probability from a uniform distribution over [31, 55] and with 50% probability from a uniform distribution over [75, 99]. For the low-volatility condition, we designate all 340 test trials as common trials, as they all fall in the range [56, 74]. This procedure ensures that X is drawn according to its population distribution in both conditions.
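The construction in footnote 12 can be sketched as follows. The function name is hypothetical, and the random-number handling and the shuffling of trial order are our assumptions.

```python
import numpy as np

# The 18 integers in [56, 74] excluding 65.
LOW_SET = np.array([x for x in range(56, 75) if x != 65])


def perceptual_test_trials(condition: str, rng: np.random.Generator, n_trials: int = 340):
    """Return (values, is_common) for one condition, following footnote 12.

    Low volatility: every trial draws X uniformly from LOW_SET, so all trials are common.
    High volatility: 90 common trials (each LOW_SET value five times) plus 250 trials
    drawn from [31, 55] or [75, 99] with equal probability.
    """
    if condition == "low":
        values = rng.choice(LOW_SET, size=n_trials)
        return values, np.ones(n_trials, dtype=bool)
    common = np.repeat(LOW_SET, 5)                          # 18 * 5 = 90 trials
    n_other = n_trials - common.size                        # 250 trials
    use_lower = rng.random(n_other) < 0.5
    other = np.where(use_lower,
                     rng.integers(31, 56, n_other),         # uniform on 31..55
                     rng.integers(75, 100, n_other))        # uniform on 75..99
    values = np.concatenate([common, other])
    is_common = np.concatenate([np.ones(common.size, dtype=bool), np.zeros(n_other, dtype=bool)])
    order = rng.permutation(n_trials)                       # randomize trial order
    return values[order], is_common[order]
```

The arithmetic behind the design: [56, 74] without 65 contains 18 integers, and 18 × 5 = 90. The full high-volatility range [31, 99] without 65 contains 68 integers, and 68 × 5 = 340, so each common value appears with exactly its population frequency of 1/68.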
13. Including these two subjects in our subsequent analyses does not affect
any of the main results.
[Figure VIII, Panels A and B: plots not reproduced in this text version.]
FIGURE VIII
Classification Accuracy and Response Times for the Perceptual Choice Task
Panel A: the x-axis denotes the integer X that is presented on each trial, and the
y-axis denotes the proportion of trials on which subjects classified X as greater than
65. Panel B: the y-axis denotes the average response time for subjects to execute
a decision, for trials on which subjects responded correctly. Data are pooled across
subjects over all test trials in the first condition, and thus represent between-subjects comparisons. The length of the vertical bar inside each data point denotes
two standard errors of the mean. Standard errors are clustered by subject.
14. In the perceptual choice task, we incentivize fast responses; therefore, the
coding rule from equation (5) does not necessarily maximize expected financial
gain. At the same time, the coding rule from equation (10) continues to maximize
mutual information. For this reason, we opt for the assumption that the DM
maximizes mutual information in this task.
TABLE II
FREQUENCY OF CLASSIFYING X AS GREATER THAN 65 IN THE PERCEPTUAL CHOICE TASK
Dependent variable: “Classify X as greater than 65”
Sample by column: (1) 56 ≤ X ≤ 74; (2) 60 ≤ X ≤ 69; (3) 56 ≤ X ≤ 59 or 70 ≤ X ≤ 74; (4) 56 ≤ X ≤ 74; (5) 60 ≤ X ≤ 69; (6) 56 ≤ X ≤ 59 or 70 ≤ X ≤ 74
Notes. The table reports results from mixed-effects logistic regressions in which the dependent variable takes the value of 1 if the subject classifies the integer X as larger than 65, and 0 otherwise. The integer X is drawn uniformly from the set [31, 99]\{65} in the high-volatility condition, while it is drawn uniformly from the set [56, 74]\{65} in the low-volatility condition. The dummy variable, high, takes the value of 1 if the trial belongs to the high-volatility condition, and 0 if it belongs to the low-volatility condition. There are random effects on the independent variable X − 65 and the intercept. Standard errors of the fixed-effect estimates are clustered at the subject level and reported in parentheses. ∗, ∗∗, and ∗∗∗ indicate significance at the 10%, 5%, and 1% level, respectively.
then we estimate the model using data from the perceptual choice
task.
1. Estimation of the Risky Choice Task. Recall that the one
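$$LL(n \mid y) = \sum_{t=31}^{300} y_t \cdot \log(\text{Prob}(y_t \mid n)) + (1 - y_t) \cdot \log(1 - \text{Prob}(y_t \mid n)), \tag{20}$$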
where $y \equiv \{y_t\}_{t=31}^{300}$ and $y_t$ denotes the subject's choice on trial t; $y_t = 1$ if the subject chooses the risky lottery, and $y_t = 0$ if the subject chooses the certain option. In addition, $\text{Prob}(y_t \mid n)$ denotes
the model-predicted probability of choosing the risky lottery given
n, Xt , and Ct ; it is computed using equation (17). We maximize the
log likelihood function in equation (20) by searching over integer
values of n in [5, 40]. We find that the average estimate of n,
across subjects, is 8.9 with a standard deviation of 9.7, indicating
substantial heterogeneity.
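The estimation step amounts to a one-dimensional grid search. The sketch below computes Prob(yt|n) from equation (17) under uniform priors with coding rules (10) and (11), evaluates the log likelihood (20), and keeps the best integer n in [5, 40]. The simulated subject, the clipping of probabilities away from 0 and 1, and all function names are our own assumptions for illustration.

```python
import numpy as np
from math import comb


def binomial_rows(thetas, n):
    r = np.arange(n + 1)
    coef = np.array([comb(n, k) for k in r], dtype=float)
    t = np.atleast_1d(np.asarray(thetas, dtype=float))[:, None]
    return coef * t**r * (1.0 - t) ** (n - r)


def coding(z, lo, hi):
    """Uniform-prior coding rule, eqs. (10)-(11)."""
    return np.sin(0.5 * np.pi * (np.asarray(z, dtype=float) - lo) / (hi - lo)) ** 2


def posterior_means(lo, hi, n, grid_size=1000):
    grid = lo + (np.arange(grid_size) + 0.5) * (hi - lo) / grid_size
    lik = binomial_rows(coding(grid, lo, hi), n)
    return (lik * grid[:, None]).sum(axis=0) / lik.sum(axis=0)


def predicted_risk_probs(x, c, p, x_range, c_range, n):
    """Model probability of choosing the lottery on each trial (eq. 17, ties split 50/50)."""
    lhs = p * posterior_means(*x_range, n)[:, None]
    rhs = posterior_means(*c_range, n)[None, :]
    choice = np.where(np.isclose(lhs, rhs), 0.5, (lhs > rhs).astype(float))
    fx = binomial_rows(coding(x, *x_range), n)        # (trials, n + 1)
    fc = binomial_rows(coding(c, *c_range), n)        # (trials, n + 1)
    return np.einsum("ti,ij,tj->t", fx, choice, fc)


def estimate_n(y, x, c, p, x_range, c_range, n_values=range(5, 41)):
    """Grid search over integer n maximizing the Bernoulli log likelihood of eq. (20)."""
    best_n, best_ll = None, -np.inf
    for n in n_values:
        prob = np.clip(predicted_risk_probs(x, c, p, x_range, c_range, n), 1e-12, 1 - 1e-12)
        ll = float(np.sum(y * np.log(prob) + (1 - y) * np.log(1 - prob)))
        if ll > best_ll:
            best_n, best_ll = n, ll
    return best_n, best_ll


if __name__ == "__main__":
    # Simulated subject in the low-volatility condition, 270 trials (t = 31, ..., 300).
    rng = np.random.default_rng(0)
    x = rng.uniform(16, 24, 270)
    c = rng.uniform(8, 12, 270)
    y = (rng.random(270) < predicted_risk_probs(x, c, 0.5, (16, 24), (8, 12), n=12)).astype(float)
    print(estimate_n(y, x, c, 0.5, (16, 24), (8, 12)))
```

Searching over integers rather than a continuous range keeps the problem trivial to solve exactly, since each candidate n requires only one pass over the trials.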
Our baseline model of efficient coding assumes linear utility; as such, the DM's optimal decision rule depends on E[X̃|Rx] and E[C̃|Rc]. However, the model can easily be integrated with standard nonlinear utility functions. To generalize the baseline model, we assume that the DM maximizes expected utility, with a utility function U(·) = (·)^α. Under this assumption, the DM chooses the risky lottery if and only if p · E[(X̃)^α|Rx] > E[(C̃)^α|Rc].¹⁵ The optimal coding rules presented in equations (5) and (6) are then
15. As in equation (17), we also assume that the DM randomly chooses between the risky lottery and the certain option when p · E[(X̃)^α|Rx] = E[(C̃)^α|Rc].
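replaced by¹⁶
$$\theta(X) = \left[\sin\left(\frac{\pi}{2}\,\frac{\int_{X_l}^{X} f(x)^{2/3}\, x^{(\alpha-1)/3}\,dx}{\int_{X_l}^{X_u} f(x)^{2/3}\, x^{(\alpha-1)/3}\,dx}\right)\right]^{2} \tag{21}$$
and
$$\theta(C) = \left[\sin\left(\frac{\pi}{2}\,\frac{\int_{C_l}^{C} f(c)^{2/3}\, c^{(\alpha-1)/3}\,dc}{\int_{C_l}^{C_u} f(c)^{2/3}\, c^{(\alpha-1)/3}\,dc}\right)\right]^{2}. \tag{22}$$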
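$$\begin{aligned}\text{Prob}(\text{risk taking} \mid X, C) ={} & \sum_{R_x=0}^{n}\sum_{R_c=0}^{n} \mathbf{1}_{p \cdot E[(\tilde{X})^{\alpha} \mid R_x] > E[(\tilde{C})^{\alpha} \mid R_c]} \cdot f(R_x \mid X) \cdot f(R_c \mid C) \\ &+ \frac{1}{2}\sum_{R_x=0}^{n}\sum_{R_c=0}^{n} \mathbf{1}_{p \cdot E[(\tilde{X})^{\alpha} \mid R_x] = E[(\tilde{C})^{\alpha} \mid R_c]} \cdot f(R_x \mid X) \cdot f(R_c \mid C).\end{aligned} \tag{23}$$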
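The nonlinear-utility rules (21) and (22), and the posterior expectations that enter (23), can be evaluated with a grid approach. This is a sketch under our own assumptions (trapezoid rule for the coding rule, a Riemann sum on an evenly spaced grid for the posterior, binomial signals, hypothetical names); with α = 1 it reduces to the linear-utility rule.

```python
import numpy as np
from math import comb


def coding_rule_alpha(grid, density, alpha):
    """Eqs. (21)-(22) on a grid: theta = sin^2((pi/2) * S(x)), where S(x) is the share of
    the integral of f^(2/3) * x^((alpha - 1)/3) lying below x (cumulative trapezoid rule)."""
    grid = np.asarray(grid, dtype=float)
    g = np.asarray(density, dtype=float) ** (2.0 / 3.0) * grid ** ((alpha - 1.0) / 3.0)
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(grid))))
    return np.sin(0.5 * np.pi * cum / cum[-1]) ** 2


def posterior_power_means(grid, density, theta, alpha, n):
    """E[X~^alpha | R] for R = 0, ..., n, the quantities compared in eq. (23)."""
    grid = np.asarray(grid, dtype=float)
    r = np.arange(n + 1)
    coef = np.array([comb(n, k) for k in r], dtype=float)
    lik = coef * theta[:, None] ** r * (1.0 - theta[:, None]) ** (n - r)   # f(R | x)
    w = lik * np.asarray(density, dtype=float)[:, None]                    # f(R | x) f(x)
    return (w * grid[:, None] ** alpha).sum(axis=0) / w.sum(axis=0)


if __name__ == "__main__":
    x = np.linspace(16.0, 24.0, 801)          # x[400] is the midpoint, 20
    f = np.full_like(x, 1.0 / 8.0)            # uniform prior on [16, 24]
    for alpha in (1.0, 0.5):
        theta = coding_rule_alpha(x, f, alpha)
        print(alpha, round(float(theta[400]), 4),
              np.round(posterior_power_means(x, f, theta, alpha, 10)[:3], 3))
```

With α < 1 the weight x^{(α−1)/3} declines in x, so the rule devotes more of its sensitivity to small payoffs and θ at the midpoint of a uniform prior exceeds 1/2.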
16. The generalization of the coding rules to allow for nonlinear utility follows
the analysis of this issue in Payzan-LeNestour and Woodford (forthcoming).
$$LL(n \mid z) = \sum_{t=61}^{400} z_t \cdot \log(\text{Prob}(z_t \mid n)) + (1 - z_t) \cdot \log(1 - \text{Prob}(z_t \mid n)), \tag{24}$$
17. The increasing and the decreasing distributions take the form of equations (18) and (19). The parameter values are: Xl = 2, Xu = 8, h = 7/25, l = 1/125, X_m^i = 4.5, and X_m^d = 5.5.
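The parameter values in footnote 17 can be checked with exact rational arithmetic. The sketch below (hypothetical helper names) confirms that both step densities integrate to one and computes the prior mean of X in each condition.

```python
from fractions import Fraction

X_L, X_U = Fraction(2), Fraction(8)
H, L = Fraction(7, 25), Fraction(1, 125)


def step_prior(first_level, second_level, x_m):
    """Two-level step density: first_level on [X_L, x_m], second_level on (x_m, X_U]."""
    return [(X_L, x_m, first_level), (x_m, X_U, second_level)]


def total_mass(pieces):
    """Integral of the density over its support."""
    return sum(level * (b - a) for a, b, level in pieces)


def prior_mean(pieces):
    """Integral of x * f(x), using the closed form level * (b^2 - a^2) / 2 on each piece."""
    return sum(level * (b * b - a * a) / 2 for a, b, level in pieces)


increasing = step_prior(L, H, Fraction("4.5"))   # low density first, then high
decreasing = step_prior(H, L, Fraction("5.5"))   # high density first, then low

if __name__ == "__main__":
    for name, pieces in (("increasing", increasing), ("decreasing", decreasing)):
        print(name, total_mass(pieces), float(prior_mean(pieces)))
```

The mean of X is 6.19 under the increasing prior and 3.81 under the decreasing prior; the two densities are mirror images about X = 5, so the means sum to 10.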
18. Online Appendix A provides a brief proof of this statement.
19. Our common trials focus on large values of X because these values lead
to a difference in risk taking across the two experimental conditions that remains
substantial even when subjects exhibit a strong degree of intrinsic risk aversion.
By contrast, small values of X lead to a difference in risk taking across the two
conditions that diminishes when subjects’ degree of risk aversion is sufficiently
high. Moreover, we fix the value of C at $2.70 so that it has a high density in
both conditions. As a result, the difference in the perception of C across the two
conditions is minimal and only has a small impact on the difference in risk taking
across conditions.
20. The design in this task is similar to that of Payzan-LeNestour and Woodford (forthcoming), who insert a “test trial” every 40 trials, although their design is implemented in a perceptual choice task in which a subject is incentivized to discriminate between shades of gray.
TABLE III
FREQUENCY OF CHOOSING THE RISKY LOTTERY IN EXPERIMENT 2 (SHAPE
MANIPULATION)
Notes. The table reports results from mixed-effects linear regressions in which the dependent variable
takes the value of 1 if the subject chooses the risky lottery, and 0 otherwise. The dummy variable, increasing
prior, takes the value of 1 if the trial belongs to the increasing-prior condition, and 0 if it belongs to the
decreasing-prior condition. Only data from common trials are included. The variable C is constant among
common trials, and therefore is not included in the regressions as a control variable. There are random effects
on the independent variable X and the intercept. Standard errors of the fixed-effect estimates are clustered
at the subject level and reported in parentheses. ∗ , ∗∗ , and ∗∗∗ indicate significance at the 10%, 5%, and 1%
level, respectively.
times, defined as less than 0.5 seconds. After applying these four exclusion criteria, 151 subjects remain with a total of 85,703 trials, of which 2,278 are common trials. All regression results presented below are robust to using the full sample without applying the exclusion criteria.
Our main hypothesis involves testing whether, for a fixed value of C, a subject's appetite for risk is higher when large values of X are more frequent. Table III presents results from mixed-effects linear regressions in which the dependent variable takes the value of 1 if the subject chooses the risky lottery and 0 otherwise. All regressions in Table III include only common trials.
Column (1) shows that the frequency of choosing the risky lottery is 7.5% higher in the context of the increasing distribution, compared with the decreasing distribution (p-value = .001). As in Experiment 1, this result is based on a comparison where the choice sets are fixed and only the context varies across conditions. We emphasize that the significant difference in risk taking cannot be driven by the mere fact that X has a higher mean in the increasing condition, because this effect is exactly offset by a higher
V. DISCUSSION
V.A. Adaptation Dynamics
The theoretical framework in Section II is a static model of risky choice, and hence does not address the important question of how the DM learns the prior distribution. Most empirical tests of efficient coding in sensory perception assume full adaptation to the prior distribution (Laughlin 1981; Wei and Stocker 2015), and this assumption has also been invoked in recent papers on efficient coding in value-based decisions (Rustichini et al. 2017; Polanía, Woodford, and Ruff 2019). Following this literature, we have assumed that subjects in our experiments are fully adapted to the population distribution after completing an initial set of preregistered “adaptation trials.” Yet we emphasize this
VI. CONCLUSION
We have experimentally tested the hypothesis that efficient coding, a core principle from neuroscience, is a driving force in decision making under risk. Our results provide strong evidence that the DM's willingness to take risk depends systematically on the payoff distribution to which she has recently adapted. Our experimental data are consistent with the noisy perception of lottery payoffs; moreover, we find that the noise distribution varies as an optimal response to a change in the environment. In Experiment 1, we show that risky choice becomes noisier as the volatility of the payoff distribution increases. In Experiment 2, we find that the level of risk taking changes with the shape of the payoff distribution, which highlights the role that efficient coding plays in generating perceptual biases. Together, our data indicate that risk taking is systematically unstable across environments, in a manner that closely mimics the instability of sensory perception.
Our results suggest a number of important directions for future work. There is a strong need to understand how the DM adapts to a given environment based on the history of perceived payoffs. This mechanism of course depends on the DM's prior beliefs about payoffs, which we manipulate in our experiments, but it also depends on higher-order priors about the rate at which the environment changes. For example, if the DM expects the environmental distribution to change rapidly, then adaptation will also likely take place at a fast pace (Behrens et al. 2007; Nassar et al. 2012). Theory is already being developed along this direction (Robson and Whitehead 2018; Aridor, Grechi, and Woodford 2020; Młynarski and Hermundstad 2021), but future experimental evidence of the adaptation process will be critical in guiding further development of such theory.
Another important direction for future research is to test the
implications of efficient coding outside the laboratory. A challenge
here is to measure the prior distribution to which the DM has
adapted. A more refined theory of adaptation will be integral for
SUPPLEMENTARY MATERIAL
An Online Appendix for this article can be found at The
Quarterly Journal of Economics online.
DATA AVAILABILITY
Data and code replicating the tables and figures in this article
can be found in Frydman and Jin (2021) in the Harvard Dataverse,
https://doi.org/10.7910/DVN/PYQXAD.
REFERENCES
Alempaki, Despoina, Emina Canic, Timothy L. Mullett, William J. Skylark, Chris
Starmer, Neil Stewart, and Fabio Tufano, “Reexamining How Utility and
Weighting Functions Get Their Shapes: A Quasi-Adversarial Collaboration
Providing a New Interpretation,” Management Science, 65 (2019), 4841–4862.
André, Quentin, and Bart de Langhe, “No Evidence for Loss Aversion Disappearance and Reversal in Walasek and Stewart (2015),” Journal of Experimental Psychology: General (forthcoming), https://doi.org/10.1037/xge0001052.
Aridor, Guy, Francesco Grechi, and Michael Woodford, “Adaptive Efficient Coding:
A Variational Auto-encoder Approach,” Columbia University Working Paper,
2020.
Barberis, Nicholas, “Psychology-Based Models of Asset Prices and Trading Volume,” in Handbook of Behavioral Economics, Vol. 1, Douglas Bernheim, Stefano DellaVigna, and David Laibson, eds. (Amsterdam: North Holland, 2018), Ch. 2, 79–175.
Barlow, Horace, “Possible Principles Underlying the Transformations of Sensory
Messages,” in Sensory Communication, Walter A. Rosenblith, ed. (Cambridge,
MA: MIT Press, 1961), 217–234.
Behrens, Timothy, Mark Woolrich, Mark Walton, and Matthew Rushworth, “Learning the Value of Information in an Uncertain World,” Nature Neuroscience, 10 (2007), 1214–1221.
Khaw, Mel Win, Paul W. Glimcher, and Kenway Louie, “Normalized Value Coding
Explains Dynamic Adaptation in the Human Valuation Process,” Proceedings
of the National Academy of Sciences, 114 (2017), 12696–12701.
Khaw, Mel Win, Ziang Li, and Michael Woodford, “Cognitive Imprecision and
Small-Stakes Risk Aversion,” Review of Economic Studies, 88 (2021), 1979–
Stewart, Neil, Stian Reimers, and Adam J. L. Harris, “On the Origin of Utility,
Weighting, and Discounting Functions: How They Get Their Shapes and How
to Change Their Shapes,” Management Science, 61 (2015), 687–705.
Stigler, George J., “The Economics of Information,” Journal of Political Economy,
69 (1961), 213–225.