SSRN Id4747461

UNTANGLING UNIVERSALITY AND DISPELLING
MYTHS IN MEAN-VARIANCE OPTIMIZATION
JEROME BENVENISTE (a), PETTER N. KOLM (b), AND GORDON RITTER (c)
Abstract. Following Markowitz’s pioneering work on mean-variance
optimization (MVO), such approaches have permeated nearly every facet
of quantitative finance. In the first part of the article, we argue that
their widespread adoption can be attributed to the universality of the
mean-variance paradigm, wherein the maximum expected utility and
mean-variance allocations coincide for a broad range of distributional
assumptions of asset returns. Subsequently, we introduce a formal defini-
tion of mean-variance equivalence, and present a novel and comprehen-
sive characterization of distributions, termed mean-variance-equivalent
(MVE) distributions, wherein expected utility maximization and the
solution of an MVO problem are the same. In the second part of the
article, we address common myths associated with MVO. These myths
include the misconception that MVO necessitates normally-distributed
asset returns, the belief that it is unsuitable for cases with asymmetric
return distributions, the notion that it maximizes errors, and the per-
ception that it underperforms a simple 1/n portfolio in out-of-sample
tests. Furthermore, we address misunderstandings regarding MVO’s
ability to handle signals across different time horizons, its treatment of
transaction costs, its applicability to intraday and high-frequency trading,
and whether quadratic utility accurately represents investor preferences.
Keywords: Investment management; Mean-variance optimization; Mean-variance
equivalent distributions; Portfolio optimization; Portfolio theory;
Robust portfolio management; Trading; Universality
(a) Courant Institute of Mathematical Sciences, New York University, NY, USA. Email:
ejb14@nyu.edu
(b) Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., NY
10012, USA. Email: petter.kolm@nyu.edu
(c) Courant Institute of Mathematical Sciences, NYU Tandon School of Engineering,
Baruch College, Columbia University and General Partner/CIO at Ritter Alpha, LP. Email:
ritter@post.harvard.edu
Date: March 3, 2024.
Electronic copy available at: https://ssrn.com/abstract=4747461

1. Introduction
With hindsight gained over almost 75 years, we are in the enviable position
of being able to place Markowitz’ pioneering work in historical context,
and use subsequent developments to shed new light on his contributions.
Rubinstein (2011) defines 1950 as the boundary between the “ancient” and
“classical” periods in the history of financial economics. The critical juncture
is the publication of Harry Markowitz’s seminal article, “Portfolio Selection”
in 1952 (Markowitz, 1952). In this article, Markowitz addresses a fundamental
question in investment management of how investors should allocate their
capital among a diverse set of investment options. He formally quantifies the
return and risk of assets through their expected returns and their pairwise
return covariances. Then, Markowitz proposes that investors should consider
the trade-off between portfolio risk and return when determining the allocation
across investment alternatives.
His work is groundbreaking for several reasons. First, it represents a
significant departure from traditional financial wisdom by challenging the
notion that investors should solely prioritize assets based on which ones offer
the highest future value relative to their current price, without taking risk
into account. Second, it is arguably the first time a financial decision-making
process is formalized as a mathematical optimization problem. Specifically,
the mean-variance optimization (MVO) (or simply “mean-variance” for short)
problem prescribes that among the infinite number of portfolios achieving
a particular desired return, the investor should select the portfolio with the
lowest risk, measured as portfolio variance. Portfolios with identical target
returns but greater portfolio variances are deemed “inefficient,” owing to their
increased risk.
In some sense, Markowitz’ work came far too early. The optimization
of the risk-reward trade-off in portfolio theory by means of MVO is not
an accident; indeed, mean-variance is a necessary consequence of utility
theory for the broad family of elliptical distributions, as demonstrated by
Chamberlain (1983). In the 1960s and 1970s, the foundations for multi-factor
risk models are laid by Sharpe, Blume, King and Rosenberg (Sharpe, 1963;
Blume, 1972; King, 1966; Rosenberg, 1974). It is now known that mean-
variance optimization remains numerically stable when we employ “reasonable”
multi-factor risk models and expected return forecasts. Therefore, Markowitz’
pioneering work came about 25 years too early to stave off controversies

around stability, and about thirty years too early to avoid questions about
whether MVO is truly optimal for elliptical densities. That said, the evolution
of the many theoretical and practical aspects of mean-variance optimization
(MVO) is undeniably intriguing, and we direct interested readers to references
such as Markowitz (1991), Bernstein (1993), Markowitz (2010), Markowitz
(1990), Rubinstein (2011), Fabozzi et al. (2007), and Kolm et al. (2014).
The outline of the article is as follows. In Section 2, we provide a brief review
of classical MVO, while also introducing some notation that will be used
throughout. Since Markowitz’s pioneering work, mean-variance approaches
have become ubiquitous across nearly every subfield of quantitative finance.
In Section 3, we argue that this widespread adoption can be attributed
to the universality of the mean-variance paradigm, wherein the maximum
expected utility and mean-variance allocations coincide for a broad range of
distributional assumptions of asset returns. After introducing the concept of
mean-variance equivalence, we contribute a new and general characterization
of distributions for which expected utility maximization and the solution of
an MVO problem are the same. In particular, we introduce a new family
of distributions, which we call MVE distributions, and prove the following
result: a distribution is mean-variance equivalent if and only if it belongs to
the family of MVE distributions. In Section 4, we shift our focus to common
myths surrounding MVO. These include beliefs that MVO requires normally-
distributed asset returns, cannot or should not be applied in cases where return
distributions are asymmetric, is an error maximizer, and fails to outperform
a 1/n portfolio out-of-sample. Additionally, there are misunderstandings
regarding MVO’s handling of signals at different horizons, transaction costs,
applicability for intraday and high-frequency trading, and whether quadratic
utility reflects investor preferences. While several of these myths have been
addressed in prior literature (see, for example, Kinlaw et al. (2017, Ch. 3–7)),
we offer new insights and perspectives on them. We relegate mathematical
proofs to Appendix A, which readers can comfortably skip without losing
sight of the article’s main themes. Section 5 concludes.
2. Mean-Variance Optimization
We consider a market with n investable assets and let r denote the
n-dimensional random vector representing the one-period asset returns over the
investment horizon [0, T]. We do not assume that r follows a multivariate
normal distribution, although in some later sections, we will impose restrictions
on its distributional form. For notational convenience, we reserve special
notation for the expected returns and covariance matrix of returns of r,
$$\mu := \mathbb{E}[r] \in \mathbb{R}^n , \quad \text{and} \quad \Sigma := \mathbb{V}[r] \in \mathbb{R}^{n \times n} , \tag{1}$$
which we assume exist.

In classical portfolio theory (Markowitz, 1952), a mean-variance investor
determines their portfolio holdings h = (h1 , . . . , hn )′ in units of dollars, or
whatever currency the investor is using, by solving the MVO problem
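$$\max_{h \in C} \; \mu_p - \frac{\lambda}{2} \sigma_p^2 , \tag{2}$$
where
$$\mu_p := \mathbb{E}[h' r] = h' \mu \quad \text{and} \tag{3}$$
$$\sigma_p^2 := \mathbb{V}[h' r] = h' \Sigma h , \tag{4}$$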
denote the ex ante portfolio return and variance respectively, C ⊆ Rn is the

set of constraints, λ > 0 is the investor’s risk aversion, and prime denotes
the transpose. Constraints may include long-only, trading, exposure and
risk management constraints (see, for example, Fabozzi et al. (2007) and
Kolm et al. (2014) for a discussion of common constraints used by portfolio
managers).
Throughout the article, we assume that there are no perfectly redundant
assets in the market.
Assumption 1. Our marketplace does not contain any non-trivial linear combination of
assets that returns zero in every state of the world.
This assumption can be made without loss of generality, since the problem of
how to allocate between redundant assets is trivial.
Bad forecasts, even when input into an excellent optimizer, should not be
expected to lead to favorable results. In that spirit, we want to focus on the
subset of the portfolios where the model of variances and covariances can be
trusted.
Assumption 2. We require that the covariance matrix Σ is such that h′ Σh

is an admissible forecast of ex ante portfolio variance for all h ∈ C.
If there is a non-trivial vector v ≠ 0 such that Σv = 0, then by extension
there is a long-short portfolio whose forecasted variance is zero. If such a
portfolio truly had zero variance, it would contradict Assumption 1 (no
redundant assets). In the absence of redundant assets, the existence of such a
null vector violates Assumption 2 (zero is not an admissible forecast of
portfolio variance) whenever v falls within the feasible set.
The unconstrained case is defined by the MVO problem (2) with C = Rn ,
and in this setting, Assumption 2 implies in particular that Σ−1 exists, and
moreover, that the optimization problem is strictly convex with the unique
solution given by the well-known formula
$$h^* := (\lambda \Sigma)^{-1} \mu . \tag{5}$$
Thus, in a world satisfying Assumption 2, unconstrained MVO is both as

simple, and as complicated, as understanding the inverse of a symmetric
positive-definite (SPD) matrix.
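
A minimal numerical sketch of formula (5) follows. The three-asset inputs (`mu`, `vols`, `corr`, `risk_aversion`) are made-up illustrative values rather than data from the article, and the helper names are hypothetical. The sketch solves the linear system instead of forming the inverse explicitly, then checks the first-order condition and that random perturbations of h∗ never improve the objective (2).

```python
import numpy as np

def mvo_objective(h, mu, sigma, risk_aversion):
    """Mean-variance objective (2): h'mu - (lambda/2) h'Sigma h."""
    return h @ mu - 0.5 * risk_aversion * (h @ sigma @ h)

def unconstrained_mvo(mu, sigma, risk_aversion):
    """Formula (5): h* = (lambda Sigma)^{-1} mu, computed by a linear solve."""
    return np.linalg.solve(risk_aversion * sigma, mu)

# Illustrative (assumed) inputs: expected returns, volatilities, correlations.
mu = np.array([0.05, 0.08, 0.03])
vols = np.array([0.20, 0.30, 0.15])
corr = np.array([[1.0, 0.3, 0.3],
                 [0.3, 1.0, 0.3],
                 [0.3, 0.3, 1.0]])
sigma = np.outer(vols, vols) * corr
risk_aversion = 2.0

h_star = unconstrained_mvo(mu, sigma, risk_aversion)

# First-order condition of (2): mu - lambda * Sigma h* = 0.
foc_residual = mu - risk_aversion * sigma @ h_star

# Strict concavity: perturbing h* can only lower the objective.
rng = np.random.default_rng(0)
best = mvo_objective(h_star, mu, sigma, risk_aversion)
perturbed = [
    mvo_objective(h_star + 0.1 * rng.standard_normal(3), mu, sigma, risk_aversion)
    for _ in range(1000)
]
print("h* =", h_star)
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical design choice of this sketch; both yield h∗ when Σ is well conditioned.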
3. Universality
In the 75 years since Markowitz’ pioneering work, mean-variance approaches

have appeared in almost every subfield of quantitative finance. Almgren et al.
(1999) introduce the optimal order execution problem, treating the investor’s
positions at different intraday time-slices as independent sources of risk and
applying a mean-variance framework to determine the trade trajectory. The
dynamic (multi-period) portfolio choice model of Gârleanu et al. (2013) and
Gârleanu et al. (2016) and its extensions (Kolm et al., 2015; Moallemi et al.,
2017; Ma et al., 2019; Collin-Dufresne et al., 2020; Baldacci et al., 2021)
employ mean-variance and mean-quadratic variation formulations, resembling
optimal order liquidation but incorporating additional complexities such as
time-varying return predictions (“alpha”) and nonlinear price impact. Multi-
period portfolio optimization models were quite familiar to Markowitz, who
published several articles on the topic (Markowitz et al., 2011; Blay et al.,
2020). Methods of dynamic option hedging in the presence of transaction
costs can be thought of as minimizing the mean loss due to transaction costs,
while simultaneously minimizing variance introduced by imperfect replication
(Almgren et al., 2016; Bank et al., 2017; Aliaga-Diaz et al., 2020; Kolm et al.,
2019; Du et al., 2020; Fu et al., 2024). Other examples include high-frequency
trading with limit orders (Cartea et al., 2014), mean-reversion strategies
subject to nonlinear price impact (Ritter, 2017), and market making (El
Aoud et al., 2015; Bergault et al., 2021), which once again bear similarity
to Almgren et al. (1999) insofar as the objective is mean-variance across
intraday time-slices.

Therefore, it is of great interest to understand why the mean-variance
paradigm is so universal. The best such explanation available at the time
of writing seems to be that, under certain assumptions on the distribution
of asset returns that shall be detailed below, the optimal allocation is the
mean-variance allocation for any “reasonable” utility function. The framework
of modeling investor preferences via utility functions is very well established,
with seminal contributions due to Arrow (1971) and Pratt (1964). Indeed,
the existence of a utility function can be derived from much more basic
assumptions concerning the investor’s preference relation, as we discuss next.
3.1. From Preferences to Utility Functions. The theory of portfolio

selection can be viewed within the broader framework of decision-making
under uncertainty. Selecting a portfolio is a special case of choosing from
among a class of “uncertain prospects.” One can think of probabilities as
objective or subjective. In the approach of von Neumann et al. (1947),
probabilities are objective and there is given some set Z whose elements are
referred to as rewards or prizes. The choice set is effectively X = P, a
suitable space of probability distributions over Z, so when the agent chooses
an uncertain prospect x ∈ X they effectively lock themselves into a certain
probability distribution p over rewards. In the application to portfolio choice,
elements of Z correspond to possible values for the investor’s final wealth
at some future horizon, and the probability distribution of asset returns is
assumed to be objective and known to the agent.
The key result we need from the theory of rational choice is: for Z ⊂ R a
bounded interval, the three axioms of Preference, Substitution, and Continuity
(as formulated in the appendix) are equivalent to the existence of a continuous
function u : Z → R such that the investor always ranks lotteries according to
the value of E[u(z)], where the expectation is computed for each lottery with
respect to the lottery’s given probabilities. This holds both for continuous
and discrete distributions, under appropriate mathematical conditions on
the space of distributions as detailed in the appendix. To aid the reader in
navigating the extensive literature, the appendix provides the precise set of
axioms that give the desired expected-utility representation for preference
relations over probability measures on a bounded interval. Such measures
represent the probability distributions over the investor’s final wealth relevant
for portfolio optimization problems.

An investor whose choices satisfy the axioms of Preference, Substitution,
and Continuity is frequently called a “rational investor.” Informally, every
rational investor has a utility function and behaves as if maximizing expected
utility. This does not require that the investor does so knowingly, nor that
they are able to articulate said utility function.
The domain of the utility function represents the investor’s wealth level.
It could of course be applied to the wealth in a single account, while the
investor has other accounts. We denote the initial wealth level in the account
by w0 , and the wealth one period ahead, in the absence of transaction costs,
by
wT = w0 + h′ r , (6)
where h ∈ Rn are the holdings. We assume the investor has access to
effectively unlimited financing, allowing for negative wealth levels. Thus, the
domain of the utility function is R, encompassing both positive and negative
values.
Assumption 3. We refer to a utility function u : R → R as standard if it is
increasing, strictly concave, and continuously differentiable. Throughout this
article, we assume all utility functions are standard, unless explicitly noted
otherwise.
The properties required by Assumption 3 make economic sense. A utility
function must be increasing. Even the most philanthropic charity would
never optimize in such a way as to intentionally cause losses in the investment
portfolio of its endowment. Concavity implies risk aversion. If u were
allowed to be discontinuous, any point of discontinuity would represent a
particular level of wealth which is perceived by the investor as very different
from nearby wealth levels, even if the difference were by one penny. This is
not the case for most rational investors, and is nonsensical for investments in
financial markets as portfolio values constantly fluctuate.
3.2. Mean-Variance Equivalent Distributions. A quadratic function can

never satisfy Assumption 3 and so for all economically reasonable purposes,
“quadratic utility” is a misnomer. Nevertheless, the function we are truly
interested in, namely the expected utility of h
$$\hat{u}(h) := \mathbb{E}[u(w_0 + h' r)] = \int u(w_0 + h' r) \, p(r) \, dr , \tag{7}$$
may represent an integral that is not available in closed form due to the high
dimension n, the nonlinearity of u, and further nonlinearity of the probability
density function p(r). Therefore, it would be convenient if we could somehow
construct a quadratic function of h whose maximum is known to be located at
the precise vector h∗ which maximizes the expected utility (7). The associated
quadratic function, if it exists, is not a utility function, but is instead referred
to as a mean-variance form of the utility (or simply “mean-variance form” for
short). The existence of a mean-variance form having the same solution as
the original, often intractable, problem sounds almost too good to be true.
But we must remember that due to strict concavity of u, maximizing the
expected utility (7) is almost a convex optimization problem, and we can
exploit a certain symmetry in p(r) along with convexity.
Levy et al. (1979) show the effectiveness of what might be called large-E,
low-V solutions, where E is the expected wealth and V is the variance of
wealth. Later work of Kroll et al. (1984) focuses on empirical studies in which
the goal is to identify some function f, if such a function exists, such that
maximizing f (E, V ) yields almost as favorable results as maximizing expected
utility of wealth. Markowitz (1991) asserts
“Of the various approximations tried in Levy-Markowitz the
one which did best, almost without exception, was essentially
that suggested in Markowitz (1959), namely
f (E, V ) = u(E)+0.5 u′′ (E)V . ”
This choice of f (E, V ) is of course the Taylor series of the expected utility
(7) to second order.
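
The step behind this remark can be spelled out as follows. This is a sketch that assumes u is twice differentiable near E (slightly more than Assumption 3 requires) and that terms beyond second order are neglected. Expanding u around the expected wealth E and taking expectations gives

```latex
\begin{aligned}
u(w_T) &\approx u(E) + u'(E)\,(w_T - E) + \tfrac{1}{2}\,u''(E)\,(w_T - E)^2 , \\
\mathbb{E}[u(w_T)] &\approx u(E) + u'(E)\,\mathbb{E}[w_T - E] + \tfrac{1}{2}\,u''(E)\,\mathbb{E}[(w_T - E)^2] \\
&= u(E) + \tfrac{1}{2}\,u''(E)\,V ,
\end{aligned}
```

where the linear term vanishes because E[wT − E] = 0 and the quadratic term is the variance V by definition. Since u is concave, u′′(E) ≤ 0 wherever it exists, so this approximation decreases in V.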
In empirical work such as Kroll et al. (1984), we must concede some
possibility that these results could be an artifact of the particular dataset
that was chosen for the study. Nevertheless, their work inspires the question:
when is expected utility a function of mean and variance, and what can we
say concerning the shape of such a function?
Definition 1. For two scalars E, V ∈ R, where V > 0, let L(E, V ) denote

the space of allocations under which
E[wT ] = E and V[wT ] = V . (8)
We say expected utility is a function of mean and variance if there exists

some function f (E, V ) such that
E[u(wT )] = f (E, V ) (9)
for all allocations in L(E, V ).


Definition 2. For n-dimensional random vectors possessing continuous
probability density functions (pdfs), an elliptical distribution is one whose
pdf on Rn takes the form
$$f_n(x) = |\Omega|^{-1/2} \, g_n\!\left( (x - \mu)' \Omega^{-1} (x - \mu) \right) , \tag{10}$$
where µ is the median vector, Ω is a positive definite matrix, |Ω| denotes the
determinant of Ω, and gn : R+ → R+ is a generator function (that does not
depend on µ and Ω).
We remark that µ is the mean vector and Ω is proportional to the covariance

matrix, if they exist. For the multivariate normal distribution, the most
well-known distribution within the elliptical family, we have
$$g_n(s) = (2\pi)^{-n/2} \exp(-s/2) . \tag{11}$$
The elliptical class also encompasses several non-normal distributions, such

as the multivariate Student-t, symmetric multivariate stable distribution,
symmetric multivariate Laplace, and symmetric generalized hyperbolic
distributions. These distributions exhibit heavy tails. For example, the multivariate
Student-t distribution with ν degrees of freedom has density of the form (10)
with
$$g_n(s) \propto (\nu + s)^{-(n+\nu)/2} . \tag{12}$$
When ν = 1, we obtain the multivariate Cauchy distribution, although
not all of the remarks below apply to the Cauchy distribution due to the
non-existence of moments of order one and greater.
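
As a hedged illustration of the remark that Ω is proportional to the covariance matrix, the sketch below draws from a multivariate Student-t using the standard normal scale-mixture construction. The parameter values and the function name `sample_multivariate_t` are assumptions made for this example. For ν > 2 the covariance equals ν/(ν − 2) · Ω, and for ν > 4 each coordinate has positive excess kurtosis, which is the heavy-tail property mentioned above.

```python
import numpy as np

def sample_multivariate_t(mu, omega, nu, n_samples, rng):
    """Draw from a multivariate Student-t as a normal scale mixture.

    x = mu + z / sqrt(w / nu), with z ~ N(0, omega) and w ~ chi-square(nu).
    """
    chol = np.linalg.cholesky(omega)
    z = rng.standard_normal((n_samples, len(mu))) @ chol.T
    w = rng.chisquare(nu, size=n_samples)
    return mu + z / np.sqrt(w / nu)[:, None]

rng = np.random.default_rng(1)
mu = np.array([0.01, 0.02])            # illustrative location vector
omega = np.array([[0.04, 0.01],
                  [0.01, 0.09]])       # illustrative scale matrix
nu = 8.0
x = sample_multivariate_t(mu, omega, nu, 400_000, rng)

sample_mean = x.mean(axis=0)
sample_cov = np.cov(x, rowvar=False)
theoretical_cov = nu / (nu - 2.0) * omega   # proportional to omega

# Excess kurtosis of the first coordinate (zero for a normal distribution).
c = x[:, 0] - sample_mean[0]
excess_kurtosis = np.mean(c**4) / np.mean(c**2) ** 2 - 3.0
print(sample_cov, theoretical_cov, excess_kurtosis)
```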
Proposition 1. If the distribution of r is elliptical, and if u is a standard

utility function, then expected utility is a function of mean and variance, and
$$\frac{\partial}{\partial E} f(E, V) \ge 0 \quad \text{and} \quad \frac{\partial}{\partial V} f(E, V) \le 0 . \tag{13}$$
Proposition 1 is due to Chamberlain (1983), whose interest in this problem,

and finance more broadly, was sparked by Michael Rothschild (Graham
et al., 2023). This form of the proposition is proven in Ingersoll (1987,
App. B of Ch. 4). Under the conditions of Proposition 1, let E be the level
of the expected return at the optimal expected-utility solution. Because
∂f (E, V )/∂V ≤ 0, we know that V must be the minimum variance for the
given expected return. In words, if expected utility is a function of mean
and variance (9), and if furthermore, the function f is increasing in E for all
V and decreasing in V for any E, then we can find the optimal solution by
MVO.
However, this result is stronger than we need. Next, we define a weaker
condition, that we call mean-variance equivalence. This condition is merely a
statement about the locations of the respective optima, rather than a broad
statement about the expected utility function associated with any possible
portfolio.
Definition 3. The underlying asset return distribution, p(r), is said to be

mean-variance equivalent (MVE) if, for any standard utility function u at
any initial wealth level w0 , there exists some constant κ > 0, such that
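$$h^* := \operatorname*{argmax}_h \, \mathbb{E}[u(w_T)] = \operatorname*{argmax}_h \left\{ \mathbb{E}[w_T] - \frac{\kappa}{2} \mathbb{V}[w_T] \right\} , \tag{14}$$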
where wT = w0 + h′ r.
We note that when expected utility is a function of the mean and variance,
then the underlying asset return distribution is also MVE. We contend that
a rational expected-utility maximizer cares more about Definition 3 than
about Definition 1. In particular, a utility maximizer cares about their
ability to find the exact solution to the expected utility problem by solving
an equivalent MVO problem, rather than concerning themselves with the
multitude of portfolios that represent sub-optimal expected-utility allocations.
The risk-aversion parameter κ typically depends on w0 and on the shape of
the function u, but in the situation of Definition 3, other nuances of u are not
needed to find the optimum. In order to apply the MVO formulation in (14),
the investor does not need to know everything about their utility function.
They only need to know enough to determine the appropriate κ given their
current wealth. In practice, determining κ entails solving the MVO problem
for all values of κ in some range, and selecting the optimal portfolio that is
compatible with the investor’s risk tolerance.
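
Definition 3 can be checked numerically in one textbook case. The sketch below assumes exponential (CARA) utility u(w) = −exp(−a w), which is standard in the sense of Assumption 3, together with multivariate normal returns. In that case the well-known closed form gives κ = a, independent of w0. All inputs and the function name are illustrative assumptions. Expected utility is maximized directly on simulated returns by Newton's method and compared with the MVO solution.

```python
import numpy as np

def max_expected_cara_utility(returns, risk_aversion, w0=0.0, n_iter=50):
    """Maximize the sample mean of u(w0 + h'r), with u(w) = -exp(-a w), by Newton's method."""
    h = np.zeros(returns.shape[1])
    for _ in range(n_iter):
        # Marginal utility u'(w_T) = a * exp(-a * w_T) for each simulated scenario.
        marginal = risk_aversion * np.exp(-risk_aversion * (w0 + returns @ h))
        grad = (marginal[:, None] * returns).mean(axis=0)
        # Hessian: mean of u''(w_T) r r', where u'' = -a * u'.
        hess = -risk_aversion * (returns * marginal[:, None]).T @ returns / len(returns)
        h = h - np.linalg.solve(hess, grad)
    return h

# Illustrative (assumed) market parameters.
mu = np.array([0.05, 0.08, 0.03])
vols = np.array([0.20, 0.30, 0.15])
corr = np.full((3, 3), 0.3) + 0.7 * np.eye(3)
sigma = np.outer(vols, vols) * corr

rng = np.random.default_rng(2)
returns = rng.multivariate_normal(mu, sigma, size=200_000)

a = 3.0
h_utility = max_expected_cara_utility(returns, a)
h_mvo = np.linalg.solve(a * sigma, mu)   # solution of (14) with kappa = a
print(h_utility, h_mvo)
```

The two allocations agree up to Monte Carlo error. For other standard utilities, κ would generally depend on w0 and on the shape of u, as noted above.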
Then, which distributions are mean-variance equivalent? Tobin (1958)
conjectures that for n > 2, any two-parameter distribution is mean-variance
equivalent. But Feldstein (1969) provides a counterexample to Tobin’s
idea. Many distributions, including heavy-tailed distributions such as the
multivariate Student-t, are mean-variance equivalent, but not all
two-parameter families are.
Figure 1. A mean-variance equivalent distribution. Level curves from the
two-dimensional mean-variance equivalent distribution
$$p(x, y) \propto \left( x^2 \exp(-x^2/2) + \frac{(1 - \exp(-x^2/2))^2}{x^3} \right) \cdot \exp(-y^2/2) .$$

To motivate the following, let us consider a random vector r of one-period
asset returns from a multivariate normal distribution with mean µ and
covariance Σ. Of course, we know that for this distribution, expected utility

is a function of mean and variance. We denote by Z the random variable
that is the return of the Markowitz portfolio, i.e.
$$Z := \frac{r' \Sigma^{-1} \mu}{\mu' \Sigma^{-1} \mu} . \tag{15}$$
It is easy to see that the conditional expected return of r given Z is
E[r|Z] = Zµ . (16)
Now, if we let ε be the residual random vector of returns
ε := r − Zµ , (17)
then the Markowitz portfolio, applied to the residual, has zero return,
$$\frac{\varepsilon' \Sigma^{-1} \mu}{\mu' \Sigma^{-1} \mu} = 0 . \tag{18}$$
Moreover, we observe that all components of the residual return vector
are independent of the return of the Markowitz portfolio. This has an
interpretation in terms of portfolios. While it is always the case that a
portfolio h whose returns are uncorrelated with those of the Markowitz portfolio
has an expected return of zero (since h′ ΣΣ−1 µ = h′ µ = 0), we see that
for the normal distribution such a portfolio must have zero expected return
conditional on the return of the Markowitz portfolio.
Remarkably, it turns out that the existence of a stochastic representation
similar in form to the one above fully characterizes mean-variance equivalent
distributions as per Definition 3. We formulate this new result next, and
refer to the appendix for a formal proof.

We consider distributions of n-dimensional random vectors r with the
stochastic representation
r = µZ + ε , (19)
where µ ∈ Rn , Z is a scalar random variable, and ε ∈ Rn is a random vector
such that
E[Z] = 1 , (20)
ε′ Σ−1 µ = 0 , (21)
E[ε | Z] = 0 , (22)
V[ε] = Σ − V[Z]µµ′ , (23)
where Σ is an n-by-n SPD matrix.

We note that condition (23) is a normalization to ensure that the covariance
of r is Σ.
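
To make the representation (19) concrete, the sketch below simulates one asymmetric example. It rests on several assumptions that are not from the article: Z is a shifted and scaled exponential variable, ε is Gaussian and drawn independently of Z (so that E[ε | Z] = 0 holds trivially), and the market parameters are made up. One observation used in the construction is that (21) implies V[ε]Σ−1µ = 0, so multiplying (23) by Σ−1µ forces V[Z] = 1/(µ′Σ−1µ).

```python
import numpy as np

def sample_mve(mu, sigma, n_samples, rng):
    """Draw r = mu * Z + eps satisfying conditions (20)-(23) (illustrative construction)."""
    w = np.linalg.solve(sigma, mu)              # Sigma^{-1} mu
    s = float(mu @ w)                           # mu' Sigma^{-1} mu
    g = rng.exponential(1.0, size=n_samples)    # mean 1, variance 1, right-skewed
    z = 1.0 + (g - 1.0) / np.sqrt(s)            # E[Z] = 1 and V[Z] = 1/s
    resid_cov = sigma - np.outer(mu, mu) / s    # right-hand side of (23)
    vals, vecs = np.linalg.eigh(resid_cov)
    # Square-root factor: factor @ factor.T reproduces resid_cov (tiny negatives clipped).
    factor = vecs * np.sqrt(np.clip(vals, 0.0, None))
    eps = rng.standard_normal((n_samples, len(mu))) @ factor.T
    return z[:, None] * mu + eps, z, eps

# Illustrative (assumed) market parameters.
mu = np.array([0.05, 0.08, 0.03])
vols = np.array([0.20, 0.30, 0.15])
corr = np.full((3, 3), 0.3) + 0.7 * np.eye(3)
sigma = np.outer(vols, vols) * corr

rng = np.random.default_rng(5)
r, z, eps = sample_mve(mu, sigma, 200_000, rng)

w = np.linalg.solve(sigma, mu)
markowitz_on_eps = eps @ w                      # condition (21): should vanish
sample_mean = r.mean(axis=0)
sample_cov = np.cov(r, rowvar=False)

# Skewness of the Markowitz portfolio return: the distribution is asymmetric.
port = r @ w
centered = port - port.mean()
skewness = np.mean(centered**3) / np.mean(centered**2) ** 1.5
print(sample_mean, skewness)
```

The simulated returns have mean µ and covariance Σ up to sampling error, while the Markowitz portfolio return is visibly skewed, so the example lies outside the elliptical family.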
Proposition 2. A distribution is mean-variance equivalent if and only if it

has the stochastic representation (19) that satisfies the conditions (20)–(23).
To the best of our knowledge, this proposition is the first to characterize

mean-variance equivalence as per Definition 3. This new result motivates the
following definition.
Definition 4. Distributions with the stochastic representation (19) are said

to be mean-variance equivalent distributions.
Figure 1 presents an example of an MVE-distribution. In a forthcoming

article (Benveniste et al., 2024), we examine a number of properties of MVE-
distributions and present several implications for portfolio optimization. It is
important to highlight that the main result in the recent work by Schuhmacher
et al. (2021) and Auer et al. (2023) differs significantly from Proposition 2
above. In particular, they establish that the return distribution of portfolios
that include the risk-free asset is determined by its mean and variance (as per
Definition 1) if and only if asset returns follow a skew-elliptical generalized
location and scale (SEGLS) distribution. Proposition 2 is more general and
establishes mean-variance equivalence (as per Definition 3) for portfolios of
any assets that are MVE-distributed.
Clearly, the above results disprove the prevailing belief that MVO cannot or
should not be applied in asymmetric cases. In particular, these new findings
shed fresh light on the universality of MVO, beyond that of Chamberlain’s
characterization (symmetric distributions) in Proposition 1. The reader may
have noticed that the vector r could represent returns of different assets
on the same time interval, or returns of the same asset over different time
intervals, or a combination of the two. As long as the other assumptions of
the proposition are not broken, it immediately extends to the discrete time
versions of the many multi-period problems discussed earlier.
This completes a chain of ideas leading to Markowitz’ famous result and
its universality:
(i) the axioms of Preference, Substitution, and Continuity lead to ex-
pected utility via rational choice theory by Proposition 3, and
(ii) expected utility leads to MVO for all MVE distributions by Proposi-
tion 2.
4. Myths and Facts
In order to be wrong, a statement must first be formulated precisely

enough that it becomes falsifiable. Rigor and clarity are so important, that
research neglecting them altogether commands a lower status than ideas
which were duly formulated, studied, and falsified by the scientific method or
by mathematical proofs. Rejecting some hypotheses while finding evidence
for others is part of the scientific method. Generating vague or ambiguous
theories is not. Several of the memes that we shall unmask as myths in the
present section, such as “mean-variance optimization is error maximizing” are,
paraphrasing physicist Wolfgang Pauli, not even wrong (Peierls, 1960). Our
contribution will be to first formulate such myths rigorously enough that they
can be falsified, and then, to falsify them or to ascertain some grain of truth.
Myth 1. MVO requires normally-distributed asset returns. MVO cannot or

should not be applied in cases where return distributions are asymmetric.
Proposition 1 debunks a prevalent misconception regarding MVO, which

falsely links it to the assumption of a normal distribution for asset returns.
At most, one could argue that an MVO user is implicitly assuming asset
returns are MVE-distributed (Definition 4), a large class that contains
asymmetric distributions as well as all elliptical families (which are symmetric
distributions).
Myth 2. MVO is an error maximizer. MVO is not robust.

Any optimization method can only be as good as its objective function. As-
sumption 2 demands that when we ask our optimizers to minimize predicted
variance, the predictions we are using simply must be good predictions, or
at least economically reasonable. If we consider variance forecasts of the
form h′ Σh as in (4), then not all SPD matrices Σ are admissible. Specif-
ically, not all SPD matrices can satisfy the more stringent requirement of
always providing reasonable variance forecasts for all portfolios satisfying
the constraints. For instance, when n is comparable to the size of the stock
market, the widely recognized historical sample covariance estimator is rarely
admissible for optimization, as we will illustrate below. This does not imply
that MVO itself is not robust. It only implies the inadmissibility of the
historical sample covariance as an estimator, for the purposes of computing
the optimal portfolio.
Consider covariance matrix estimation as a statistical procedure which has
n(n + 1)/2 free parameters corresponding to the independent elements in a
symmetric matrix. With k historical periods, we have nk data points, hence
2k/(n + 1) data points per parameter. Thus k = n + 1 is the threshold for
having just two data points per parameter, which still implies an extreme loss of
precision in the parameter estimation procedure. Practically, with n = 3000 we need
about twelve years of daily returns to get two data points per parameter!
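
The arithmetic behind this count can be reproduced in a few lines. The figure of 250 trading days per year is an assumption of this sketch, and the variable and function names are illustrative.

```python
n_assets = 3000
trading_days_per_year = 250   # assumed calendar convention

# Free parameters of a symmetric n-by-n matrix: n(n + 1)/2.
n_parameters = n_assets * (n_assets + 1) // 2

def data_points_per_parameter(k, n):
    """n*k observations divided by n(n+1)/2 parameters, i.e. 2k/(n+1)."""
    return n * k / (n * (n + 1) / 2)

k_threshold = n_assets + 1    # smallest k with two data points per parameter
years_needed = k_threshold / trading_days_per_year

print(n_parameters)                                       # 4501500
print(data_points_per_parameter(k_threshold, n_assets))   # 2.0
print(years_needed)                                       # 12.004
```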
We can gain a clearer understanding of the inadmissibility of the sample
covariance estimator by examining the rank of the matrix. Let R be a
k × n matrix having the stock return time series as columns. De-mean each
column of R and rescale by 1/√(k − 1) so that the sample covariance matrix is
R′ R. Due to its inadmissibility as an estimator for the portfolio optimization
problem, we shall refer to R′ R as the naive estimator. We have
rank(R′ R) = rank(R) ≤ min(n, k) , (24)
so if k < n it is impossible that R′ R or Σ is invertible. Again with n = 3000,

and assuming a market with 250 trading days per year, we need twelve
years to get k = 3000 and hence to cross the threshold where Σ−1 begins to
exist. However, sample covariance matrices based on k slightly larger than
n continue to fail Assumption 2 due to their condition number, as is shown
below.
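
The rank bound (24) is easy to observe on synthetic data. In the sketch below the returns are i.i.d. standard normal draws (an assumption chosen purely for illustration), each column is de-meaned and rescaled by 1/√(k − 1) so that R′R coincides with the usual sample covariance, and k < n.

```python
import numpy as np

rng = np.random.default_rng(3)
n_assets, n_periods = 50, 30             # k < n

raw = rng.standard_normal((n_periods, n_assets))          # synthetic k x n returns
R = (raw - raw.mean(axis=0)) / np.sqrt(n_periods - 1)     # de-meaned and rescaled
naive_cov = R.T @ R                                       # the naive estimator

rank_R = np.linalg.matrix_rank(R)
rank_cov = np.linalg.matrix_rank(naive_cov)
smallest_eig = np.linalg.eigvalsh(naive_cov)[0]           # eigenvalues are ascending

print(rank_R, rank_cov, min(n_assets, n_periods))
print(smallest_eig)   # zero up to rounding error, so naive_cov is singular
```

De-meaning removes one further degree of freedom, so in this example the rank falls strictly below k as well as below n.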
By our above assumptions, Σ is an n-by-n SPD matrix, thus allowing us
to express it as
$$\Sigma = Q \Lambda Q' = \sum_{i=1}^{n} \lambda_i q_i q_i' , \tag{25}$$

where Λ := diag(λ1 , . . . , λn ) is a diagonal matrix of real non-negative eigen-
values, and the columns of Q form an orthonormal set of eigenvectors.
The order is not important, so without loss of generality we may assume
λ1 ≥ λ2 ≥ . . . ≥ λn ≥ 0.
If λn = 0, then eigenvector qn is a long-short portfolio with predicted vari-
ance equal to zero. In the absence of redundant assets, such a prediction is not
economically reasonable and fails Assumption 2. Henceforth, we consider the
case where λn > 0, ensuring the existence of Σ−1 = QΛ−1 Q′ and, therefore,
the existence of the unconstrained Markowitz portfolio. The eigenvectors of
Σ−1 are the same as the eigenvectors of Σ, while the eigenvalues of Σ−1 are
1/λi for all i. The largest eigenvalue of Σ−1 is 1/λn , the reciprocal of the
smallest eigenvalue of Σ. The condition number of Σ−1 is
cond(Σ−1 ) = λ1 /λn . (26)
The condition number of a matrix is always at least one, but a matrix is

said to be ill-conditioned if the condition number is much larger than one.
Informally, as a common rule of thumb used by statisticians in regression
analysis, if the condition number is greater than thirty, then multicollinearity
and identifiability become important concerns, and parameter inference is
called into question. As discussed above, if k < n then the naive estimator
R′ R is not invertible, but once we cross the threshold of k > n, it tends to
happen that the smallest eigenvalue λn crosses from 0 to a tiny number, while
λ1 remains bounded away from zero. Hence the condition number λ1 /λn
crosses over from ∞ to an astronomically large number as k crosses from just
below n to just over n (cf. Laloux et al. (1999) and Fan et al. (2008)).
The qi ’s are the normalized principal component portfolios, so that ∥qi ∥ = 1
for all i. The condition number (26) is the ratio of the largest and smallest
principal component portfolio variances. What does Assumption 2 tell us
about this ratio? Let us frame the question in trading language. Let us
consider two portfolios, A and B, that are uncorrelated with each other.
Both portfolios have the sum of their squared holdings equal to one. The
trader asks the risk department to forecast the variance of each, and the
risk department forecasts variance λA , λB respectively, with λB < λA . How
high can the ratio λA /λB become, before the trader responds incredulously
to the risk department that their forecasts cannot possibly make economic
sense, regardless of the actual portfolios’ contents? This would represent
the maximum economically reasonable condition number, according to this
trader. More generally, any trader should be suspicious that a so-called
“perfect hedge” will always continue to work out as such in the future, and
hence should demand of their risk departments to use forecasting methods
that do not predict perfect hedges among companies that are truly distinct
businesses. Therefore, the practical implication of Assumption 2 is that
the condition number should be bounded by the maximum λA /λB deemed
reasonable by any conservative, risk-averse trader.
In summary, matrices with high condition number fail Assumption 2
and for n the size of the equity market, the naive historical estimator will
almost always have high condition number. The procedure of MVO is a
robust procedure, but the naive historical covariance estimator is inadmissible
as an estimator for the purposes of optimization, and should not be used.
Error maximization could occur, for example, if λ1 /λn were huge and the
expected return vector µ had a nonzero component in the direction qn : that
component would then be multiplied by the huge factor 1/λn when computing
Σ−1 µ (see also the discussions in Best et al. (1991) and Palczewski et al.
(2014)). However, this form of error maximization would only arise if we
attempt to apply MVO to a covariance matrix estimator that Assumption 2
would rule out.
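The amplification mechanism can be made concrete with a small sketch. The three-asset covariance matrix, its eigenvalues, and the size of the forecast error below are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-asset covariance with eigenvalues 1.0, 0.5 and 1e-6.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # orthonormal eigenvectors q_1..q_3
lam = np.array([1.0, 0.5, 1e-6])
Sigma = Q @ np.diag(lam) @ Q.T

mu = 0.02 * Q[:, 0]       # forecast lies entirely along q_1
noise = 1e-4 * Q[:, 2]    # small forecast error along q_n

h_clean = np.linalg.solve(Sigma, mu)          # Sigma^{-1} mu
h_noisy = np.linalg.solve(Sigma, mu + noise)  # Sigma^{-1} (mu + noise)

# The error along q_n is scaled by 1/lambda_n when forming Sigma^{-1} mu.
shift = np.linalg.norm(h_noisy - h_clean)
amplification = shift / np.linalg.norm(noise)
print(amplification, 1.0 / lam[-1])
```

The change in holdings is driven entirely by the tiny eigenvalue, which is exactly the situation that Assumption 2 is meant to exclude.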
If the historical covariance R′ R is a naive estimator for optimization,
what then is a proper, admissible estimator? Fortunately, there are readily
available covariance matrix estimators which are admissible for MVO in the
form of factor models. Let ri denote the ex ante return of the i-th asset in the
market, as before. Instead of allowing very complex forms for the multivariate
distribution p(r), representations based on arbitrage pricing theory (APT)
(Ross, 1976; Roll et al., 1980) constrain the distribution’s complexity by
assuming a linear structure such that the return in excess of the risk-free rate
of the i-th asset is given by
ri = xi,1 f1 + xi,2 f2 + · · · + xi,p fp + εi , εi ∼ N (0, σi2 ) , (27)
where fj , j = 1, . . . , p, are factors, xi,j denote the j-th factor loading of the
i-th asset, and εi and σi are the residual return (or idiosyncratic return)
and idiosyncratic volatility of the i-th asset, respectively. We assume (27)
represents a strict factor model, such that the residual returns are mutually
uncorrelated with each other and across time, as well as uncorrelated with
all factors.
Jointly estimating all loadings xi,j and factors fj is challenging, so typically,
one assumes that one of them is known and calculates the other through
regression. In what follows, we focus on the Rosenberg approach (Rosenberg,
1974) where the loadings xi,j are taken to be known (or exogenous), and
the factors fj are hidden (also known as latent) variables. If we denote by
f := (f1 , . . . , fp )′ the random factor process, then we can succinctly express
model (27) as
r = Xf + ε, ε ∼ N (0, D) , (28)
where X is the matrix with the loadings xi,j as elements, and
D := diag(σ12 , . . . , σn2 ) , (29)
is the diagonal matrix of the variances of the residual returns.

Since one cannot directly observe the f -process, we must obtain information
about it through statistical inference. We assume that the f -process has finite
first and second moments given by
µf := E[f ] ∈ Rp , and F := V[f ] ∈ Rp×p . (30)
The model (28)–(30) entails associated reductions of the first and second
moments of the asset returns
µ = Xµf , and Σ = D + XFX′ . (31)
In the usual case p ≪ n, this entails a substantial dimension reduction over

the naive model in which all of the n(n + 1)/2 elements of Σ are treated as
independent parameters. Indeed, the two moments in (30) together with the
volatilities from (29) entail a total of n + p + p(p + 1)/2 parameters. In an
industrial example, the Barra USE4 model is of the above form, and uses
p = 72 in the form of twelve style factors and sixty industry factors. Hence,
with n = 3,000 the naive model entails
n(n + 1)/2 = 4,501,500 parameters , (32)
while the p = 72 model entails
n + p + p(p + 1)/2 = 5,700 parameters . (33)
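The two counts in (32) and (33) follow directly from the formulas in the text; a short check, with function names of our own choosing, is:

```python
def naive_parameter_count(n: int) -> int:
    """Distinct entries of a symmetric n x n covariance matrix."""
    return n * (n + 1) // 2

def factor_parameter_count(n: int, p: int) -> int:
    """n idiosyncratic variances, p factor means, p(p+1)/2 factor covariances."""
    return n + p + p * (p + 1) // 2

n, p = 3000, 72
naive = naive_parameter_count(n)
factor = factor_parameter_count(n, p)
print(naive, factor, naive / factor)
```

The factor model therefore uses roughly 790 times fewer parameters than the naive model in this example.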
Equation (31) implies that Σ is in a diagonal plus low-rank form, which is

quite useful for portfolio construction and analyzing existing portfolios. For
example, for a portfolio with dollar holdings h ∈ Rn , from (31) we have
h′ Σh = h′ Dh + h′ XFX′ h , (34)
which expresses the portfolio’s variance using the idiosyncratic variance
$h' D h = \sum_{i=1}^{n} h_i^2 \sigma_i^2$, and a second term which depends on the portfolio
through the portfolio exposure vector X′ h.
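As an illustration of the decomposition (34), the sketch below builds a small factor covariance matrix from simulated inputs and verifies that idiosyncratic and factor variance add up to total variance. All numerical inputs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 2

X = rng.standard_normal((n, p))        # factor loadings (simulated)
A = rng.standard_normal((p, p))
F = A @ A.T + np.eye(p)                # factor covariance, positive definite
sigma2 = rng.uniform(0.01, 0.04, n)    # idiosyncratic variances
Sigma = np.diag(sigma2) + X @ F @ X.T  # equation (31)

h = rng.standard_normal(n)             # dollar holdings
exposure = X.T @ h                     # portfolio exposure vector X'h

idio_var = np.sum(h**2 * sigma2)       # h'Dh
factor_var = exposure @ F @ exposure   # h'XFX'h
total_var = h @ Sigma @ h              # h'Sigma h

print(idio_var, factor_var, total_var)
```

Note that the factor term only requires the p-dimensional exposure vector, never the full n-by-n matrix.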
Recall the Woodbury matrix-inversion lemma,
\[ (A + UCV)^{-1} = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1} , \tag{35} \]
where A, U, C and V all denote matrices of the correct (conformable) sizes.

The lemma holds in the case when all of the matrix inverses in (35) exist.
Applying (35) with A := D, U := X, V := X′ and C := F, we obtain
\[ \Sigma^{-1} = D^{-1} - D^{-1} X \left( F^{-1} + X' D^{-1} X \right)^{-1} X' D^{-1} . \tag{36} \]
The right hand side of (36) is computationally efficient. Note that D−1
merely involves n scalar reciprocals of the idiosyncratic variances σi2 , and
hence calculating D−1 is O(n) and not O(n3 ) like standard matrix inversion
algorithms. The remaining inverses in (36) all involve p × p matrices where
p ≪ n.
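A minimal sketch of (36) in code follows. The function name `factor_model_inverse` is our own, and for readability the result is returned as a dense n-by-n matrix; in practice one would typically apply the same pieces to a vector without forming that matrix.

```python
import numpy as np

def factor_model_inverse(X, F, sigma2):
    """Inverse of Sigma = diag(sigma2) + X F X' via the Woodbury identity (36)."""
    d_inv = 1.0 / sigma2                    # D^{-1}: n scalar reciprocals
    DinvX = X * d_inv[:, None]              # D^{-1} X, shape (n, p)
    core = np.linalg.inv(F) + X.T @ DinvX   # F^{-1} + X' D^{-1} X, shape (p, p)
    correction = DinvX @ np.linalg.solve(core, DinvX.T)
    return np.diag(d_inv) - correction

# Small simulated example (all inputs hypothetical).
rng = np.random.default_rng(6)
n, p = 8, 2
X = rng.standard_normal((n, p))
A = rng.standard_normal((p, p))
F = A @ A.T + np.eye(p)
sigma2 = rng.uniform(0.01, 0.04, n)

Sigma = np.diag(sigma2) + X @ F @ X.T
Sigma_inv = factor_model_inverse(X, F, sigma2)
print(np.max(np.abs(Sigma_inv @ Sigma - np.eye(n))))
```

Only p-by-p systems are solved here, which is the source of the computational savings described above.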
We can use (36) to obtain a strong upper bound on the eigenvalues of Σ−1
(equivalently, a lower bound on the eigenvalues of Σ), and hence an upper
bound on the condition number. Hence, one of the main
issues with covariance matrices and optimization – namely, the existence
of portfolios with very low forecasted volatility – does not occur when the
covariance matrix comes from a reasonably-constructed APT model. In some
sense, factor models are the perfect complement to Markowitz optimization.
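One way to make the eigenvalue bound explicit is sketched below, under the additional assumption that F is positive semidefinite; the notation \lambda_{\max}(\cdot) for the largest eigenvalue is ours. For any unit vector v,

```latex
\begin{align*}
v' \Sigma v &= v' D v + (X' v)' F (X' v) \;\ge\; v' D v \;\ge\; \min_i \sigma_i^2 ,
  && \text{so } \lambda_n(\Sigma) \ge \min_i \sigma_i^2 , \\
v' \Sigma v &\le \max_i \sigma_i^2 + \lambda_{\max}(X F X') ,
  && \text{so } \lambda_1(\Sigma) \le \max_i \sigma_i^2 + \lambda_{\max}(X F X') , \\
\operatorname{cond}(\Sigma^{-1}) &= \frac{\lambda_1(\Sigma)}{\lambda_n(\Sigma)}
  \;\le\; \frac{\max_i \sigma_i^2 + \lambda_{\max}(X F X')}{\min_i \sigma_i^2} .
\end{align*}
```

In words, no portfolio with unit sum of squared holdings can have forecast variance below the smallest idiosyncratic variance, so the model cannot predict a perfect hedge as long as every \sigma_i is bounded away from zero.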
One of the key assumptions of the Rosenberg approach is that the factor
loadings xi,j represent exposures to common sources of risk. For example,
many stocks have exposure (either positive or negative) to the prices of energy
commodities because they are net producers or consumers of energy, so one
could argue that the price of a bundle of energy represents a common source
of risk. In contrast to this, top stock analysts conduct in-depth analyses of
the companies they cover, often making predictions that cannot be attributed
to any common factor. In so doing, they are effectively predicting the next
realization of the residual εi from (27). More generally, one could take any
expected return prediction vector µ and remove the component spanned by
X, thus forming
α := PX⊥ µ := (I − XX+ )µ , (37)
where X+ is the pseudo-inverse and PX⊥ denotes the orthogonal projection
onto the kernel of X′ . Form an augmented matrix Xα , which has α as the
first column, and X in the remaining columns. If we regress a cross section
r of realized asset returns on Xα , the coefficient of the α column can be
interpreted as the returns on a specific MVO portfolio, so MVO portfolio
returns are equivalent to alpha factor returns in cross-sectional asset return
models (Ritter, 2016; Raponi et al., 2023). Consequently, any instabilities are
the same. Statisticians are comfortable with the instabilities in regression,
and know how to control them.
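The regression interpretation can be sketched as follows. We use ordinary least squares on simulated data, which corresponds to treating the covariance as proportional to the identity; a weighted regression would be needed for a general Σ and is not shown. The data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 3
X = rng.standard_normal((n, p))          # known factor loadings (simulated)
mu = 0.01 * rng.standard_normal(n)       # some expected-return forecast
r = 0.02 * rng.standard_normal(n)        # one cross-section of realized returns

# Equation (37): remove the component of mu spanned by the columns of X.
alpha = mu - X @ (np.linalg.pinv(X) @ mu)

# Augmented loadings with alpha as the first column, then cross-sectional OLS.
X_alpha = np.column_stack([alpha, X])
coef, *_ = np.linalg.lstsq(X_alpha, r, rcond=None)

# Because alpha is orthogonal to X, the alpha coefficient is the realized return
# of the portfolio h = alpha / (alpha' alpha), which has zero factor exposure.
h = alpha / (alpha @ alpha)
portfolio_return = h @ r
print(coef[0], portfolio_return)
```

The portfolio h has unit exposure to alpha, zero exposure to every column of X, and minimal sum of squared holdings among such portfolios, which is why its return coincides with the regression coefficient.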
Myth 3. MVO fails to outperform a 1/n portfolio out-of-sample.
It is hardly surprising that this and the previous myth have sparked significant
controversy and intrigue among practitioners and academics alike. The 1/n
portfolio “debate” is extensively covered in a number of articles, so we will
keep our discussion brief, focusing primarily on the main ideas.
In this battle of ideas, we playfully dub the opposing sides as “team-1/n”
and “team-MVO,” respectively. On the team-1/n side of the debate, several
studies suggest that MVO fails to outperform basic portfolio rules such as
equal weights in out-of-sample tests (see, for example, Jobson et al. (1981),
Jorion (1985), and DeMiguel et al. (2009)). In contrast, on team-MVO’s side,
research such as Kinlaw et al. (2017, Ch. 7), Kritzman et al. (2010) and Allen
et al. (2019) disprove these findings.
So how can the findings of these two teams be so different? We argue
that the majority of team-1/n studies arrive at their conclusions primarily
due to the reliance on using sample estimates of historical returns and
covariances as inputs in MVO. As we saw earlier, the sample covariance
estimator does not satisfy Assumption 2. Moreover, historical return-based
estimates are notoriously inadequate predictors of expected returns (Merton,
1980; Campbell et al., 2008). Rather, it is crucial to incorporate anomalies,
style and risk factors, and other sources of information beyond historical
returns when estimating expected return (Kelly et al., 2013; Pettenuzzo
et al., 2014; Gu et al., 2020). In contrast to those of team-1/n, studies by
team-MVO demonstrate that investors endowed with some forecasting ability
benefit from using MVO in constructing their portfolios, and outperform 1/n
portfolios and other basic portfolio rules.
To implement MVO in practice, portfolio managers need an admissible
covariance matrix and reasonable estimates of expected returns. Interestingly,
Markowitz himself did not address this issue. In his seminal article, Markowitz
(1952) begins with the declaration
“The process of selecting a portfolio may be divided into two
stages. The first stage starts with observation and experience
and ends with beliefs about the future performances of available
securities. The second stage starts with the relevant beliefs
about future performances and ends with the choice of portfolio.
This paper is concerned with the second stage.”
However, on page 91 of the same article, Markowitz writes
“...we must have procedures for finding reasonable µi and σij .
These procedures, I believe, should combine statistical tech-
niques and the judgment of practical men.”
“One suggestion as to tentative µi , σij is to use the observed
µi , σij for some period of the past. I believe that better meth-
ods, which take into account more information, can be found.”
Markowitz’s advice is clear: construct forecasts that incorporate information
beyond historical samples of returns. But when queried about how to construct
these, his retort was: “That’s your job, not mine.” (Sexauer et al., 2024)
Beyond their simplicity and explainability, 1/n portfolios present several
practical issues. First, they employ a “one-size-fits-all” approach, ignoring
investors’ risk tolerances. Second, rebalancing such portfolios involves selling
winners and buying losers, a strategy that may yield profits in mean-reverting
markets but can lead to losses in trending environments. Third, a 1/n
portfolio fails to account for liquidity and trading frictions, such as price
impact and cost to borrow.
Under what circumstances would a 1/n portfolio actually be optimal?
We mention two situations. From equation (5), it is evident that when the
expected returns are given by $\mu_{1/n} := \frac{\lambda}{n} \Sigma e$, where e := (1, . . . , 1)′ ∈ Rn ,
the 1/n portfolio is optimal. This is quite a special case that is unlikely
in practice. Another case where the 1/n portfolio emerges as the optimal
strategy is for an investor who (i) is faced with ambiguous asset return
distributions and, (ii) adopts a worst-case approach considering all measures
within an ambiguity set. That said, such an investor’s preferences represent
a most pessimistic view of the world, one which cannot be formulated by
standard utility functions that satisfy Assumption 3.
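The first situation can be checked numerically. The sketch below assumes that the unconstrained mean-variance solution has the form h∗ = (λΣ)−1 µ with risk-aversion λ, which is our reading of equation (5), not shown in this section; the covariance matrix is simulated.

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 5, 3.0   # lam: risk-aversion parameter (assumed form of equation (5))

A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)   # arbitrary positive-definite covariance
e = np.ones(n)

# Expected returns proportional to Sigma e make equal weights optimal.
mu_equal = (lam / n) * Sigma @ e
h_star = np.linalg.solve(lam * Sigma, mu_equal)
print(h_star)
```

Any forecast that is not proportional to Σe leads to an optimal portfolio other than equal weights, which illustrates how special this case is.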
Myth 4. Quadratic utility does not reflect investor preferences.

As discussed above, the properties required by Assumption 3 make economic
sense, at least for utility functions of wealth that describe choices among
different portfolios of financial assets. We shall not consider utility theory
as applied to life-and-death choices or other such extreme examples. A
quadratic function does not satisfy Assumption 3 and so for all economically
reasonable purposes, “quadratic utility” is a misnomer as already stated above.
Nevertheless, the function we are truly interested in, namely the expected
utility (7), may be such that its maximum is located at the same portfolio as
a certain mean-variance form. The mean-variance form is not itself a utility
function in any economic sense.
Myth 5. MVO does not handle signals at different horizons, or transaction costs.
Baldacci et al. (2021) study the multi-horizon version of the mean-variance

objective, which is known as mean-quadratic-variation (MQV). The authors
show that for any initial portfolio h0 ∈ Rn , there is a unique solution to the
optimization problem
\[ \max_{h \in \mathcal{A};\, h(0) = h_0} \mathbb{E}\left[ \int_0^\infty L(h_t, \dot{h}_t)\, dt \right] , \tag{38} \]
with
\[ L(h_t, \dot{h}_t) := h_t' \mu_t - \frac{1}{2} \dot{h}_t' \Lambda \dot{h}_t - \frac{\kappa}{2} h_t' \Sigma h_t , \tag{39} \]
where A is a large space containing admissible stochastic processes representing
a diverse range of trading strategies¹, and a dot above a variable denotes
the time derivative $\dot{h}_t := \frac{d}{dt} h_t$. The first two terms in (39) represent the
expected portfolio return net of cost, while the third term is the quadratic
variation. The above formulation generalizes the work of Gârleanu et al.
(2016), who also use a mean-quadratic-variation approach to generalize their
earlier discrete time model (Gârleanu et al., 2013) to continuous time.
We emphasize that the maximization in (38) is not simply over static,
predetermined trading trajectories, but over the substantially larger class of
all admissible stochastic processes. A wide variety of optimal trading and
optimal execution models with alphas and/or price impact are special cases of
this model. For example, the continuous-time model of Almgren et al. (1999)
is the special case with zero alpha. The model of Gârleanu et al. (2016) is
a special case where the return-predicting factors follow a Markovian jump
diffusion.
¹ Mathematically, the stochastic processes they consider are square integrable and adapted
to the filtration F = {Ft : t ≥ 0}, representing the available information at time t. This
space encompasses a wide range of processes, including non-smooth and non-Markovian
ones.
Baldacci et al. (2021) show that the unique solution to (38) satisfies the
stochastically-forced ODE system
ḣt = −Γht + bt , (40)
where
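\[ \Gamma := (\kappa \Lambda^{-1} \Sigma)^{1/2} , \tag{41} \]
\[ b_t := \int_t^\infty \exp\big( \Gamma (t - s) \big)\, \Lambda^{-1}\, \mathbb{E}_t[\mu_s]\, ds . \tag{42} \]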
Given that the model (38)–(39) maximizes a mean-quadratic-variation
objective, it should not be surprising that the process bt from (42) is related
in a straightforward way to an integral of expected future Markowitz portfolios
\[ \Gamma^{-1} b_t = \Gamma \int_t^\infty \exp\big( \Gamma (t - s) \big)\, \mathbb{E}_t[M_s]\, ds , \tag{43} \]
where
\[ M_s := (\kappa \Sigma)^{-1} \mu_s . \tag{44} \]
Following Gârleanu et al. (2016), we refer to (43) as the aim portfolio, because
by (40), at optimality, the trading process ḣt is aiming towards it. Hence
the optimal trade aims towards an integral of future expected Markowitz
portfolios. With fond recollections of Markowitz’s work on multi-period
portfolio models (Markowitz et al., 2011; Blay et al., 2020), we hope that, if
he were still here, he would look upon equations (43) and (44) with approval.
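To see (40)–(44) at work, consider the simplest case of a constant forecast µ, for which the integral in (42) reduces to bt = Γ−1 Λ−1 µ and the aim portfolio equals the static Markowitz portfolio (κΣ)−1 µ. The sketch below assumes a diagonal cost matrix Λ so that the matrix square root can be computed from a symmetric eigendecomposition; all numerical inputs, the Euler step size, and the helper `sym_sqrt` are our own illustrative choices.

```python
import numpy as np

def sym_sqrt(S):
    """Square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

rng = np.random.default_rng(5)
n, kappa = 3, 2.0
A = rng.standard_normal((n, n))
Sigma = A @ A.T / n + 0.1 * np.eye(n)
cost = np.array([0.5, 1.0, 2.0])       # diagonal of Lambda (assumed diagonal)
mu = np.array([0.02, -0.01, 0.03])     # constant forecast (assumption)

# Gamma = (kappa Lambda^{-1} Sigma)^{1/2}, computed through the symmetric matrix
# S = kappa Lambda^{-1/2} Sigma Lambda^{-1/2}: Gamma = Lambda^{-1/2} S^{1/2} Lambda^{1/2}.
L_half = np.sqrt(cost)
S = kappa * Sigma / np.outer(L_half, L_half)
Gamma = (sym_sqrt(S) * L_half) / L_half[:, None]

b = np.linalg.solve(Gamma, mu / cost)          # (42) with constant mu
aim = np.linalg.solve(Gamma, b)                # aim portfolio Gamma^{-1} b
markowitz = np.linalg.solve(kappa * Sigma, mu) # (44)

# Forward-Euler integration of (40), starting from an empty portfolio.
h, dt = np.zeros(n), 0.01
for _ in range(10_000):
    h = h + dt * (-Gamma @ h + b)

print(aim, markowitz, h)
```

Starting from any initial portfolio, the trading path decays toward the aim portfolio at a rate governed by Γ, which balances risk aversion against trading costs.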
Myth 6. MVO is not suitable for intraday and high-frequency trading.
We distinguish between ultra-high-frequency trading (UHFT) and everything

else. There is a class of trading which attempts to render markets more
efficient by removing pure arbitrages created by simple physics such as the
speed of light between cities. As such, UHFT strategies do not represent risk-
taking in the usual sense as described by modern portfolio theory. Hardware
or communication problems aside, such trading is effectively riskless. While
important to ensure markets continue to function smoothly and efficiently,
this form of trading is a niche.
Once we look beyond the niche of UHFT latency arbitrage, we enter the
realm of strategies that are intraday, with holding periods of minutes to hours,
or longer. For strategies at these slightly longer timescales, we can use the
mean-quadratic-variation model (38) to determine a continuous-time trading
path by solving (40). A given strategy can then use individual orders, which
are necessarily discrete, to minimize deviations from the continuous-time
optimal path.
The model of Baldacci et al. (2021) admits an explicit form for the
value function, namely the following quadratic value function
\[ V_s(p) := \sup_{h \in \mathcal{A};\, h(s) = p} \mathbb{E}\left[ \int_s^\infty L(h_t, \dot{h}_t)\, dt \right] = -\frac{1}{2} p' \Lambda \Gamma p + p' b_s + C_s , \tag{45} \]
where Cs is a constant. Economically, the number Vs (p) represents the
value to a rational MQV-maximizer at time s of the arbitrary portfolio p.
This directly leads to a practically implementable algorithm for applying
continuous-time MQV in a trading strategy. If trading were truly possible in
continuous time, we would simply trade to ht that solves (40). For simplicity,
let us denote the current time by t = 0 in the following. Since real trading is
discrete, we choose an interval of length τ minutes, over which the next wave
of child orders will execute (say, 30 minutes). We observe that
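\[ \max \mathbb{E}\left[ \int_0^\infty L(h_t, \dot{h}_t)\, dt \right] = \max \left\{ \mathbb{E}\left[ \int_0^\tau L(h_t, \dot{h}_t)\, dt \right] + \mathbb{E}\left[ \int_\tau^\infty L(h_t, \dot{h}_t)\, dt \right] \right\} . \tag{46} \]
Since any sub-part of an optimal path is optimal with respect to its endpoints,
we can replace the second term in (46) with Vτ (hτ ) for optimal paths. Let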
δ be the list of orders we intend to trade over the first τ minutes. The first
term of (46) then contains our expected costs, short-term alpha, and risk
for the trade list δ, while the second term is the value function Vτ (h0 + δ).
Therefore, we obtain the trade list by solving

\[ \max_{\delta \in \mathbb{R}^n} \; f(\delta) + \mathbb{E}_{t=0}\left[ V_\tau(h_0 + \delta) \right] , \tag{47} \]
where f (δ) is an appropriate representation of the expected cost, alpha, and
risk over the short interval [0, τ ]. The number of decision variables in (47)
is equal to the number of assets, which makes this formulation significantly
more attractive than standard approaches that work by discretizing time
into many sub-intervals and suffering from a corresponding “blowup” in the
number of decision variables. Colloquially, we refer to this formulation as the
“from here to eternity” method since the second term in (46) extends from
our intended target to infinity.
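A minimal sketch of problem (47) follows, under strong simplifying assumptions of our own: the short-horizon term is taken to be a concave quadratic, f(δ) = a′δ − ½ δ′Cδ, the matrix G stands for ΛΓ in (45), and b stands for the time-0 expectation of bτ. Under these assumptions the trade list solves a single n-by-n linear system. The function names and all numerical inputs are hypothetical.

```python
import numpy as np

def next_trade_list(h0, G, b, a, C):
    """Maximize f(d) + V(h0 + d) for quadratic f and V.

    f(d) = a'd - 0.5 d'C d      (hypothetical short-horizon alpha, cost and risk)
    V(p) = -0.5 p'G p + p'b     (shape of (45); the constant term is dropped)
    """
    G_sym = 0.5 * (G + G.T)  # only the symmetric part enters the quadratic form
    # First-order condition: a - C d - G_sym (h0 + d) + b = 0.
    return np.linalg.solve(C + G_sym, a + b - G_sym @ h0)

def objective(d, h0, G, b, a, C):
    p = h0 + d
    return a @ d - 0.5 * d @ C @ d - 0.5 * p @ G @ p + p @ b

# Hypothetical inputs for three assets.
h0 = np.array([1.0, -2.0, 0.5])
G = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.5, 0.2],
              [0.0, 0.2, 1.0]])
b = np.array([0.4, -0.1, 0.3])
a = np.array([0.05, 0.0, -0.02])
C = 0.5 * np.eye(3)

delta = next_trade_list(h0, G, b, a, C)
print(delta, objective(delta, h0, G, b, a, C))
```

The decision variable has one entry per asset regardless of how far the horizon extends, which reflects the dimensionality advantage described above.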

5. Conclusion
Since Markowitz’s pioneering work on mean-variance optimization (MVO),

mean-variance approaches have permeated nearly every facet of quantitative
finance. In this article, we argued that their widespread adoption can be
attributed to the universality of the mean-variance paradigm, wherein the
maximum expected utility and mean-variance allocations coincide for a broad
range of distributional assumptions of asset returns.
We introduced a formal definition of mean-variance equivalence and pro-
vided a novel and comprehensive characterization of distributions, termed
mean-variance-equivalent (MVE) distributions, wherein expected utility max-
imization and the solution of an MVO problem are the same.
In the second part of the article, we shifted focus toward common myths
associated with MVO. The myths we debunked include the misconception
that MVO necessitates normally-distributed asset returns, the belief that it
is unsuitable for cases with asymmetric return distributions, the notion that
it maximizes errors, and the perception that it underperforms a simple 1/n
portfolio in out-of-sample tests. In addition, we addressed misunderstandings
regarding MVO’s ability to handle signals across different time horizons,
its treatment of transaction costs, its applicability to intraday and high-
frequency trading, and whether quadratic utility accurately represents investor
preferences.
While Harry Markowitz will forever be remembered for his contributions
to financial economics and investment management, he was first and foremost
a philosopher. Descartes and Hume were two of his favorite philosophers. In
an interview, he shared
“(But) the greatest philosopher ever, in my mind, is Aristotle.
He spoke of eudaimonia, which means living a good life, so
that people think well of you after you are dead.” (Powell,
2023)
In conclusion, we continue to be deeply inspired by Harry Markowitz’s
contributions, which have left an indelible mark on us and countless others.
We believe that by prioritizing the cultivation of virtues, the pursuit of
meaningful goals, and the promotion of well-being, we can lead lives full of
fulfillment and purpose, echoing his remarkable legacy.
Appendix A. Mathematical Proofs
In this appendix, we provide the mathematical details of the article.

A.1. Representing Preferences through Utility Functions. This sec-
tion serves to justify the assertions made in the main text, concerning rational
investors and the existence of a utility function. Our favorite reference for this
is Kreps (1988); see also Jonathan Levin’s Stanford lectures.² In the theory of
rational choice, with or without uncertainty, a decision-maker (or agent) must
decide between elements of a set X. The agent’s choices are modeled by means
of a relation ≻, where x ≻ y means that the agent definitely prefers x to y.
The relation is called a preference relation if it satisfies x ≻ y ⟹ y ⊁ x
(asymmetry) and x ≻ z ⟹ ∀y, x ≻ y or y ≻ z (negative transitivity).
If X is countable, then a binary relation ≻ is a preference relation if and
only if there is a function p̃ : X → R, which we call a preference function to
distinguish it from the utility functions to come later, such that
x ≻ y ⇐⇒ p̃(x) > p̃(y). (48)
Also, if X is any subset of a separable metric space, and ≻ is a continuous

preference relation, then once again there exists a function p̃ : X → R such
that (48) holds (Kreps, 1988, Prop. 3.3, Thm. 3.7).
This is about as abstract and as general as one can get; the elements of
X could represent almost any conceivable kind of choice. To apply abstract
rational choice theory to decision-making under uncertainty, we need to
introduce probabilities in some way, and to think of an element x ∈ X as an
“uncertain prospect.” One can think of probabilities as objective or subjective.
In the approach of von Neumann et al. (1947), probabilities are objective
and there is given some set Z whose elements are referred to as rewards or
prizes. The choice set is effectively X = P, a suitable space of probability
distributions over Z, so when the agent chooses an uncertain prospect x ∈ X,
they effectively lock themselves into a specific probability distribution p over
the rewards. In the application to portfolio choice, elements of Z correspond
to possible values for the investor’s final wealth at some future horizon, and
the probability distribution of asset returns is assumed to be objective and
known to the agent. We shall discuss models in which Z is finite, or models
in which Z is a continuous subset of R, with the finite case being simpler
mathematically.
First assume Z ⊂ R is finite, and P is the space of all probability distribu-
tions on Z, which is just the standard simplex. Consider the following three
axioms (5.1 - 5.3 from Kreps (1988)):
² Available at: https://web.stanford.edu/~jdlevin/teaching.html.

Axiom 1 (Preference). The binary relation ≻ is a preference relation.
Axiom 2 (Substitution). For all p, q, r ∈ P and a ∈ (0, 1],
p ≻ q ⟹ ap + (1 − a)r ≻ aq + (1 − a)r . (49)
Axiom 3 (Continuity). For all p, q, r ∈ P, if p ≻ q ≻ r, then there exist

a, b ∈ (0, 1) such that
ap + (1 − a)r ≻ q ≻ bp + (1 − b)r . (50)
The von Neumann-Morgenstern (NM) theorem states that a binary relation

on P satisfies Preference, Substitution, and Continuity if and only if there
exists a function u : Z → R such that
p ≻ q ⇐⇒ Ep [u(z)] > Eq [u(z)] , (51)
where Ep [u(z)] is defined as $\sum_{z \in Z} p(z) u(z)$. Hence, while Axiom 1 implies the
existence of a preference function, the NM-theorem identifies the preference
function as expected utility of the reward.
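For a finite reward set, the representation (51) is easy to exercise in code. In the sketch below the reward levels, the logarithmic utility, and the three lotteries are hypothetical choices made only for illustration.

```python
import numpy as np

rewards = np.array([50.0, 100.0, 150.0])  # finite prize set Z (hypothetical wealth levels)
u = np.log                                # illustrative concave utility (an assumption)

def expected_utility(p):
    """E_p[u(z)] = sum over z in Z of p(z) u(z)."""
    return float(np.dot(p, u(rewards)))

def prefers(p, q):
    """p is strictly preferred to q exactly when its expected utility is larger."""
    return expected_utility(p) > expected_utility(q)

p = np.array([0.0, 1.0, 0.0])   # 100 for sure
q = np.array([0.5, 0.0, 0.5])   # 50 or 150 with equal probability
r = np.array([0.2, 0.3, 0.5])   # an arbitrary third lottery
a = 0.4                         # mixing weight

print(prefers(p, q))
print(prefers(a * p + (1 - a) * r, a * q + (1 - a) * r))
```

Because expected utility is linear in the probabilities, mixing both lotteries with the same third lottery preserves the ranking, which is the content of the Substitution axiom.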
To represent wealth levels, we are particularly interested in the case where
Z ⊂ R is a bounded, closed interval which we consider to contain all practical
portfolio selection problems. Let P be the space of all Borel probability
measures on Z. Full mathematical details addressing general probability
measures can be found in Fishburn (1979, Ch. 10). However, for bounded
intervals, the problem simplifies significantly. We only need to replace Axiom 3
with weak continuity, a notion which is covered in any standard graduate-level
probability course.
Axiom 4 (Weak Continuity). Relation ≻ is continuous in the weak topology.
The version of the continuous NM-theorem that is arguably the most

relevant to modeling of portfolio selection is the following.
Proposition 3. Let Z = [−M, N ] ⊂ R be a bounded, closed interval and

P the space of Borel probability measures on Z. A binary relation ≻ on P
satisfies Axioms 1, 2, and 4 if and only if there exists a continuous
function u : Z → R such that
p ≻ q ⇐⇒ Ep [u(z)] > Eq [u(z)]. (52)
We emphasize that Proposition 3 is a direct consequence of the theorems

in Fishburn (1979, Ch. 10) and Kreps (1988, Cor. 5.22).
In summary, it is perhaps not surprising that expected-utility theory is,
paraphrasing Jonathan Levin, “the workhorse of modern economics.” The role
played by expected-utility as the preference-function for uncertain prospects
is equivalent to Preference, Substitution, and Continuity in both continuous
and discrete contexts.
A.2. Proof of Proposition 2. Before proving the proposition, as a sanity

check, we verify that elliptical distributions satisfy the conditions of the
proposition. Let r be an n-dimensional elliptically distributed random variable.
Note that the class of MVE distributions, characterized by (19) and (20)–(23),
is closed under changes of variables given by invertible linear transformations.
Without loss of generality, then, we can assume that µ = e1 (the first standard
basis vector) and Σ = I. Then
r = e1 + RU , (53)
where U is uniformly distributed on the sphere S n−1 , and R is a positive

random variable independent of U. Setting
Z := e′1 r , (54)
ε := r − Ze1 , (55)
we clearly have E[Z] = 1 and e′1 ε = 0, so conditions (20) and (21) are satisfied.
To verify condition (22), by writing U = (U1 , . . . , Un )′ , we have that
Z = 1 + RU1 , (56)
ε = (0, RU2 , . . . , RUn )′ . (57)
The conclusion now follows from
E[Uj |Z] = E[Uj |U1 ] = 0 , (58)
for j = 2, . . . , n.
Now, we turn to the proof of Proposition 2. In the following, we denote by
π the probability distribution of random vectors r that satisfy (19)–(22) of
the proposition.
Proof. (“if”): Let $h^* := \operatorname{argmax}_h \mathbb{E}[u(h' r)]$. We can decompose h∗ into a
component along the Markowitz portfolio and an orthogonal component q
(in the inner product given by Σ) such that
h∗ = cΣ−1 µ + q, (59)
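with q′ µ = 0. Then, we have
\begin{align}
\mathbb{E}[u(h' r)] &= \mathbb{E}\left[ u\left( (c \Sigma^{-1} \mu + q)' (\mu Z + \varepsilon) \right) \right] \tag{60} \\
&= \mathbb{E}\left[ u\left( c \mu' \Sigma^{-1} \mu Z + q' \varepsilon \right) \right] \tag{61} \\
&= \mathbb{E}\left[ \mathbb{E}\left[ u\left( c \mu' \Sigma^{-1} \mu Z + q' \varepsilon \right) \,\middle|\, Z \right] \right] \tag{62} \\
&\le \mathbb{E}\left[ u\left( \mathbb{E}\left[ c \mu' \Sigma^{-1} \mu Z + q' \varepsilon \,\middle|\, Z \right] \right) \right] \tag{63} \\
&= \mathbb{E}\left[ u\left( c \mu' \Sigma^{-1} \mu Z \right) \right] \tag{64} \\
&= \mathbb{E}\left[ u\left( (c \Sigma^{-1} \mu)' r \right) \right] , \tag{65}
\end{align}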
where we used Jensen’s inequality in (63). This shows that the expected
utility of wealth of the portfolio h∗ is at most that of the mean-variance
portfolio, as desired.
(“only if”): Let h∗ := Σ−1 µ. By assumption, for every utility function u
there is a scalar c such that
\[ c\, h^* = \operatorname{argmax}_h \mathbb{E}[u(h' r)] . \tag{66} \]
The first-order conditions of (66) imply

\[ \int r\, u'\big( c\, (h^*)' r \big)\, d\pi(r) = 0 . \tag{67} \]
Applying this to the family of utility functions − exp(−x) + bx, b > 0, we
observe that for every b, there is a c such that
\[ \int r \exp\big( -c\, (h^*)' r \big)\, d\pi(r) + b \mu = 0 . \tag{68} \]
By varying b, it follows that for uncountably many c, the vector
\[ \int r \exp\big( -c\, (h^*)' r \big)\, d\pi(r) \tag{69} \]
is proportional to µ. In other words, for uncountably many c, for any h such
that h′ µ = 0, we obtain
\[ f_h(c) := \int h' r \exp\big( -c\, (h^*)' r \big)\, d\pi(r) = 0 . \tag{70} \]
The function fh (c) is analytic in c. Therefore, by the identity theorem for
analytic functions, fh (c) = 0 for all c in its domain in C,
which includes the strip {−δ < ℜ(c) < δ}, δ > 0. In particular, this implies
that for every t ∈ R, and for any h such that h′ µ = 0,
\[ \int h' r \exp\big( i t\, (h^*)' r \big)\, d\pi(r) = 0 . \tag{71} \]
Therefore, by standard arguments, we have
E[h′ r|(h∗ )′ r] = 0 . (72)
Let us define
\[ Z := \frac{(h^*)' r}{\mathbb{E}[(h^*)' r]} , \tag{73} \]
\[ \varepsilon := r - \mu Z , \tag{74} \]
\[ h_\alpha := \alpha - \frac{\alpha' \mu}{(h^*)' \mu}\, h^* , \tag{75} \]
where α is an arbitrary vector. Then, (h∗ )′ ε = 0 and α′ ε = (hα )′ r. Since
(hα )′ µ = 0, we see from (72) that
E[α′ ε|Z] = 0 . (76)
Therefore, as (76) is true for any vector α, we obtain
E[ε|Z] = 0 , (77)
which completes the proof. □
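For completeness, the two identities used in the last step of the proof can be verified directly from definitions (73)–(75), using only the fact that E[(h∗)′r] = (h∗)′µ:

```latex
\begin{align*}
\alpha' \varepsilon
  &= \alpha' r - (\alpha' \mu)\, Z
   = \alpha' r - \frac{\alpha' \mu}{(h^*)' \mu}\, (h^*)' r
   = \Big( \alpha - \frac{\alpha' \mu}{(h^*)' \mu}\, h^* \Big)' r
   = h_\alpha' r , \\
h_\alpha' \mu
  &= \alpha' \mu - \frac{\alpha' \mu}{(h^*)' \mu}\, (h^*)' \mu
   = 0 .
\end{align*}
```

The second identity is what allows (72) to be applied with h = hα, since (72) holds for every h satisfying h′µ = 0.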
References
Aliaga-Diaz, R., G. Renzi-Ricci, A. Daga, and H. Ahluwalia (2020). “Portfolio

Optimization with Active, Passive, and Factors: Removing the Ad Hoc
Step”. In: The Journal of Portfolio Management 46.4, pp. 39–51.
Allen, D., C. Lizieri, and S. Satchell (2019). “In Defense of Portfolio Opti-
mization: What If We Can Forecast?” In: Financial Analysts Journal 75.3,
pp. 20–38.
Almgren, R. and N. Chriss (1999). “Value under Liquidation”. In: Risk 12.12,
pp. 61–63.
Almgren, R. and T. M. Li (2016). “Option Hedging with Smooth Market
Impact”. In: Market Microstructure and Liquidity 2.01, p. 1650002.
Arrow, K. J. (1971). Essays in the Theory of Risk-Bearing. North-Holland
Amsterdam.
Auer, B. R., F. Schuhmacher, and H. Kohrs (2023). “Rehabilitating Mean-
Variance Portfolio Selection: Theory and Evidence”. In: The Journal of
Portfolio Management.
Baldacci, B., J. Benveniste, and G. Ritter (2021). “Optimal Turnover, Liquid-
ity, and Autocorrelation”. In: arXiv preprint arXiv:2110.03810.
Bank, P., H. M. Soner, and M. Voß (2017). “Hedging with Temporary Price
Impact”. In: Mathematics and Financial Economics 11, pp. 215–239.
Benveniste, J., P. N. Kolm, and G. Ritter (2024). “Mean-Variance Equivalent
Distributions and Expected Utility Maximization”. In: In preparation.
Bergault, P., D. Evangelista, O. Guéant, and D. Vieira (2021). “Closed-Form
Approximations in Multi-Asset Market Making”. In: Applied Mathematical
Finance 28.2, pp. 101–142.
Bernstein, P. L. (1993). Capital Ideas: The Improbable Origins of Modern
Wall Street. Simon and Schuster.
Best, M. J. and R. R. Grauer (1991). “On the Sensitivity of Mean-Variance-
Efficient Portfolios to Changes in Asset Means: Some Analytical and
Computational Results”. In: The Review of Financial Studies 4.2, pp. 315–
342.
Blay, K., A. Gosh, S. Kusiak, H. Markowitz, N. Savoulides, and Q. Zheng
(2020). “Multiperiod Portfolio Selection: A Practical Simulation-Based
Framework”. In: Journal of Investment Management 18.4, pp. 94–129.
Blume, M. E. (1972). “On the Assessment of Risk”. In: Journal of Finance
26.1.
Campbell, J. Y. and S. B. Thompson (2008). “Predicting Excess Stock Returns
Out of Sample: Can Anything Beat the Historical Average?” In: The Review
of Financial Studies 21.4, pp. 1509–1531.
Cartea, Á., S. Jaimungal, and J. Ricci (2014). “Buy Low, Sell High: A
High Frequency Trading Perspective”. In: SIAM Journal on Financial
Mathematics 5.1, pp. 415–444.
Chamberlain, G. (1983). “A Characterization of the Distributions That Imply
Mean-Variance Utility Functions”. In: Journal of Economic Theory 29.1,
pp. 185–201.
Collin-Dufresne, P., K. Daniel, and M. Sağlam (2020). “Liquidity Regimes and
Optimal Dynamic Asset Allocation”. In: Journal of Financial Economics
136.2, pp. 379–406.
DeMiguel, V., L. Garlappi, and R. Uppal (2009). “Optimal Versus Naive
Diversification: How Inefficient is the 1/N Portfolio Strategy?” In: The
Review of Financial Studies 22.5, pp. 1915–1953.
Du, J., M. Jin, P. N. Kolm, G. Ritter, Y. Wang, and B. Zhang (2020). “Deep
Reinforcement Learning for Option Replication and Hedging”. In: The
Journal of Financial Data Science 2.4, pp. 44–57.
El Aoud, S. and F. Abergel (2015). “A Stochastic Control Approach to Option
Market Making”. In: Market Microstructure and Liquidity 1.01, p. 1550006.
Fabozzi, F. J., P. N. Kolm, D. A. Pachamanova, and S. M. Focardi (2007).
Robust Portfolio Optimization and Management. John Wiley & Sons.
Fan, J., Y. Fan, and J. Lv (2008). “High Dimensional Covariance Matrix
Estimation Using a Factor Model”. In: Journal of Econometrics 147.1,
pp. 186–197.
Feldstein, M. S. (1969). “Mean-Variance Analysis in the Theory of Liquidity
Preference and Portfolio Selection”. In: The Review of Economic Studies
36.1, pp. 5–12.
Fishburn, P. C. (1979). Utility Theory for Decision Making. Krieger.
Fu, H., B. Hientzsch, P. N. Kolm, J. Pan, and S. Xu (2024). “Dynamic Hedging
of Multi-Stock Options under Price Impact: A Deep Learning Approach”.
Working Paper.
Gârleanu, N. and L. H. Pedersen (2013). “Dynamic Trading with Predictable
Returns and Transaction Costs”. In: The Journal of Finance 68.6, pp. 2309–
2340.
– (2016). “Dynamic Portfolio Choice with Frictions”. In: Journal of Economic
Theory 165, pp. 487–516.
Graham, B., K. Hirano, and G. Imbens (2023). “The ET Interview: Professor
Gary Chamberlain”. In: Econometric Theory 39.1, pp. 1–26.
Gu, S., B. Kelly, and D. Xiu (2020). “Empirical Asset Pricing via Machine
Learning”. In: The Review of Financial Studies 33.5, pp. 2223–2273.
Ingersoll, J. E. (1987). Theory of Financial Decision Making. Vol. 3. Rowman
& Littlefield.
Jobson, J. D. and R. M. Korkie (1981). “Putting Markowitz Theory to Work”.
In: The Journal of Portfolio Management 7.4, pp. 70–74.
Jorion, P. (1985). “International Portfolio Diversification with Estimation Risk”.
In: Journal of Business, pp. 259–278.
Kelly, B. and S. Pruitt (2013). “Market Expectations in the Cross-Section of
Present Values”. In: The Journal of Finance 68.5, pp. 1721–1756.
King, B. F. (1966). “Market and Industry Factors in Stock Price Behavior”.
In: The Journal of Business 39.1, pp. 139–190.
Kinlaw, W., M. P. Kritzman, and D. Turkington (2017). A Practitioner’s
Guide to Asset Allocation. John Wiley & Sons.
Kolm, P. N. and G. Ritter (2015). “Multiperiod Portfolio Selection and
Bayesian Dynamic Models”. In: Risk 28.3, pp. 50–54.
Kolm, P. N. and G. Ritter (2019). “Dynamic Replication and Hedging: A
Reinforcement Learning Approach”. In: The Journal of Financial Data
Science 1.1, pp. 159–171.
Kolm, P. N., R. Tütüncü, and F. J. Fabozzi (2014). “60 Years of Portfolio
Optimization: Practical Challenges and Current Trends”. In: European
Journal of Operational Research 234.2, pp. 356–371.
Kreps, D. (1988). Notes on the Theory of Choice. Westview Press.
Kritzman, M., S. Page, and D. Turkington (2010). “In Defense of Optimization:
The Fallacy of 1/N”. In: Financial Analysts Journal 66.2, pp. 31–39.
Kroll, Y., H. Levy, and H. M. Markowitz (1984). “Mean-Variance versus
Direct Utility Maximization”. In: The Journal of Finance 39.1, pp. 47–61.
Laloux, L., P. Cizeau, J.-P. Bouchaud, and M. Potters (1999). “Noise Dressing
of Financial Correlation Matrices”. In: Physical Review Letters 83.7, p. 1467.
Levy, H. and H. M. Markowitz (1979). “Approximating Expected Utility by a
Function of Mean and Variance”. In: The American Economic Review 69.3,
pp. 308–317.
Ma, G., C. C. Siu, and S.-P. Zhu (2019). “Dynamic Portfolio Choice with
Return Predictability and Transaction Costs”. In: European Journal of
Operational Research 278.3, pp. 976–988.
Markowitz, H. (1952). “Portfolio Selection”. In: The Journal of Finance 7.1,
pp. 77–91.
– (1990). Foundations of Portfolio Theory: Nobel Lecture. url:
https://www.nobelprize.org/uploads/2018/06/markowitz-lecture.pdf.
Markowitz, H. M. (1991). “Foundations of Portfolio Theory”. In: The Journal
of Finance 46.2, pp. 469–477.
– (2010). “Portfolio Theory: As I Still See It”. In: Annual Review of Financial
Economics 2.1, pp. 1–23.
Markowitz, H. M. and E. L. van Dijk (2011). “Single-Period Mean-Variance
Analysis in a Changing World”. In: Stochastic Programming: The State of
the Art In Honor of George B. Dantzig, pp. 213–237.
Merton, R. C. (1980). “On Estimating the Expected Return on the Market:
An Exploratory Investigation”. In: Journal of Financial Economics 8.4,
pp. 323–361.
Moallemi, C. C. and M. Sağlam (2017). “Dynamic Portfolio Choice with Linear
Rebalancing Rules”. In: Journal of Financial and Quantitative Analysis
52.3, pp. 1247–1278.
Palczewski, A. and J. Palczewski (2014). “Theoretical and Empirical
Estimates of Mean-Variance Portfolio Sensitivity”. In: European Journal of
Operational Research 234.2, pp. 402–410.
Peierls, R. E. (1960). “Wolfgang Ernst Pauli, 1900–1958”. In: Biographical
Memoirs of Fellows of the Royal Society 5, pp. 174–192.
Pettenuzzo, D., A. Timmermann, and R. Valkanov (2014). “Forecasting Stock
Returns Under Economic Constraints”. In: Journal of Financial Economics
114.3, pp. 517–553.
Powell, R. (2023). Lessons in Life and Investing from the Late Harry Markowitz.
url: https://www.evidenceinvestor.com/lessons-in-life-and-investing-from-harry-markowitz/.
Pratt, J. W. (1964). “Risk Aversion in the Small and in the Large”. In:
Econometrica: Journal of the Econometric Society, pp. 122–136.
Raponi, V., R. Uppal, and P. Zaffaroni (2023). “Robust Portfolio Choice”.
Available at SSRN 3933063.
Ritter, G. (2016). “Stable Linear-Time Optimization in Arbitrage Pricing
Theory Models”. In: Risk Magazine.
– (2017). “Machine Learning for Trading”. In: Risk 30.10, pp. 84–89.
Roll, R. and S. A. Ross (1980). “An Empirical Investigation of the Arbitrage
Pricing Theory”. In: The Journal of Finance 35.5, pp. 1073–1103.
Rosenberg, B. (1974). “Extra-Market Components of Covariance in Security
Returns”. In: Journal of Financial and Quantitative Analysis 9.2, pp. 263–
274.
Ross, S. A. (1976). “The Arbitrage Theory of Capital Asset Pricing”. In:
Journal of Economic Theory 13.3, pp. 341–360.
Rubinstein, M. (2011). A History of the Theory of Investments: My Annotated
Bibliography. Vol. 335. John Wiley & Sons.
Schuhmacher, F., H. Kohrs, and B. R. Auer (2021). “Justifying Mean-Variance
Portfolio Selection When Asset Returns Are Skewed”. In: Management
Science 67.12, pp. 7812–7824.
Sexauer, S. C. and L. B. Siegel (2024). “Harry Markowitz and the Philosopher’s
Stone”. In: Financial Analysts Journal 80.1, pp. 1–11.
Sharpe, W. F. (1963). “A Simplified Model for Portfolio Analysis”. In:
Management Science 9.2, pp. 277–293.
Tobin, J. (1958). “Liquidity Preference as Behavior Towards Risk”. In: The
Review of Economic Studies 25.2, pp. 65–86.
von Neumann, J. and O. Morgenstern (1947). Theory of Games and Economic
Behavior. Princeton University Press.
Electronic copy available at: https://ssrn.com/abstract=4747461