SSRN Id4747461
(a) Courant Institute of Mathematical Sciences, New York University, NY, USA. Email:
ejb14@nyu.edu
(b) Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., NY
10012, USA. Email: petter.kolm@nyu.edu
(c) Courant Institute of Mathematical Sciences, NYU Tandon School of Engineering,
Baruch College, Columbia University and General Partner/CIO at Ritter Alpha, LP. Email:
ritter@post.harvard.edu
Date: March 3, 2024.
With hindsight gained over almost 75 years, we are in the enviable position
of being able to place Markowitz’ pioneering work in historical context,
and use subsequent developments to shed new light on his contributions.
Rubinstein (2011) defines 1950 as the boundary between the “ancient” and
“classical” periods in the history of financial economics. The critical juncture
is the publication of Harry Markowitz’s seminal article, “Portfolio Selection”
in 1952 (Markowitz, 1952). In this article, Markowitz addresses a fundamental
question in investment management: how investors should allocate their
capital among a diverse set of investment options. He formally quantifies the
return and risk of assets through their expected returns and their pairwise
return covariances. Then, Markowitz proposes that investors should consider
the trade-off between portfolio risk and return when determining the allocation
across investment alternatives.
His work is groundbreaking for several reasons. First, it represents a
significant departure from traditional financial wisdom by challenging the
notion that investors should solely prioritize assets based on which ones offer
the highest future value relative to their current price, without taking risk
into account. Second, it is arguably the first time a financial decision-making
process is formalized as a mathematical optimization problem. Specifically,
the mean-variance optimization (MVO) (or simply “mean-variance” for short)
problem prescribes that among the infinite number of portfolios achieving
a particular desired return, the investor should select the portfolio with the
lowest risk, measured as portfolio variance. Portfolios with identical target
returns but greater portfolio variances are deemed “inefficient,” owing to their
increased risk.
In some sense, Markowitz’ work came far too early. The optimization
of the risk-reward trade-off in portfolio theory by means of MVO is not
an accident; indeed, mean-variance is a necessary consequence of utility
theory for the broad family of elliptical distributions, as demonstrated by
Chamberlain (1983). In the 1960s and 1970s, the foundations for multi-factor
risk models are laid by Sharpe, Blume, King and Rosenberg (Sharpe, 1963;
Blume, 1972; King, 1966; Rosenberg, 1974). It is now known that mean-
variance optimization remains numerically stable when we employ “reasonable”
multi-factor risk models and expected return forecasts. Therefore, Markowitz’
pioneering work came about 25 years too early to stave off controversies
2. Mean-Variance Optimization
This assumption can be made without loss of generality, since the problem of
how to allocate between redundant assets is trivial.
Bad forecasts, even when input into an excellent optimizer, should not be
expected to lead to favorable results. In that spirit, we want to focus on the
subset of the portfolios where the model of variances and covariances can be
trusted.
$$ h^* := (\lambda \Sigma)^{-1} \mu . \tag{5} $$
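As a minimal numerical sketch of (5), assume it is the unconstrained maximizer of $h'\mu - \frac{\lambda}{2} h'\Sigma h$ (the objective is not shown in the surrounding text, so this is an assumption). The function name `mean_variance_holdings` and the two-asset inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def mean_variance_holdings(mu, sigma, risk_aversion):
    """Return h* = (lambda * Sigma)^{-1} mu via a linear solve (no explicit inverse)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return np.linalg.solve(risk_aversion * sigma, mu)

# Hypothetical two-asset inputs (illustrative values, not from the paper).
mu = np.array([0.05, 0.03])
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
lam = 2.0

h_star = mean_variance_holdings(mu, sigma, lam)

# Assumed first-order condition: mu - lambda * Sigma h = 0 at the optimum.
gradient = mu - lam * sigma @ h_star
print(h_star, gradient)
```

Using a linear solve rather than forming the inverse of $\lambda\Sigma$ is a common numerical choice; it is not something the text itself prescribes.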
3. Universality
may represent an integral that is not available in closed form due to the high
dimension n, the nonlinearity of u, and further nonlinearity of the probability
where $\mu$ is the median vector, $\Omega$ is a positive definite matrix, $|\Omega|$ denotes the
determinant of $\Omega$, and $g_n : \mathbb{R}_+ \to \mathbb{R}_+$ is a generator function (that does not
depend on $\mu$ and $\Omega$).
We note that when expected utility is a function of the mean and variance,
then the underlying asset return distribution is also MVE. We contend that
a rational expected-utility-maximizer cares more about Definition 3 than
that of Definition 1. Particularly, a utility-maximizer cares about their
ability to find the exact solution to the expected utility problem by solving
an equivalent MVO problem, rather than concerning themselves with the
multitude of portfolios that represent sub-optimal expected-utility allocations.
The risk-aversion parameter κ typically depends on w0 and on the shape of
the function u, but in the situation of Definition 3, other nuances of u are not
needed to find the optimum. In order to apply the MVO formulation in (14),
the investor does not need to know everything about their utility function.
They only need to know enough to determine the appropriate κ given their
current wealth. In practice, determining κ entails solving the MVO problem
for all values of κ in some range, and selecting the optimal portfolio that is
compatible with the investor’s risk tolerance.
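The sweep over κ described above can be sketched as follows. The sketch assumes the unconstrained form $h(\kappa) = \Sigma^{-1}\mu / \kappa$ for the MVO solution and represents "risk tolerance" as a hypothetical cap on portfolio volatility. Both are illustrative assumptions, as are the names `sweep_risk_aversion` and `max_volatility`.

```python
import numpy as np

def sweep_risk_aversion(mu, sigma, kappas, max_volatility):
    """Solve the (assumed unconstrained) MVO problem on a grid of kappa values and
    return the least risk-averse (kappa, holdings, volatility) within the cap."""
    base = np.linalg.solve(sigma, mu)          # Sigma^{-1} mu, shared by all kappa
    feasible = []
    for kappa in kappas:
        h = base / kappa                       # MVO solution for this kappa
        vol = float(np.sqrt(h @ sigma @ h))    # forecast portfolio volatility
        if vol <= max_volatility:
            feasible.append((float(kappa), h, vol))
    if not feasible:
        raise ValueError("no kappa in the grid satisfies the volatility cap")
    return min(feasible, key=lambda item: item[0])

# Hypothetical inputs, for illustration only.
mu = np.array([0.05, 0.03])
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
kappa, h, vol = sweep_risk_aversion(mu, sigma, np.arange(1.0, 6.0), max_volatility=0.10)
print(kappa, h, vol)
```

With constraints, each grid point would require a separate optimizer call instead of a rescaling; the selection step would be unchanged.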
Then, which distributions are mean-variance equivalent? Tobin (1958)
conjectures that for n > 2, any two-parameter distribution is mean-variance
equivalent. However, Feldstein (1969) provides a counterexample to Tobin’s
conjecture. Many distributions, including heavy-tailed distributions such as the
multivariate Student-t, are mean-variance equivalent, but not all
two-parameter families are.
To motivate the following, let us consider a random vector r of one period
asset returns from a multivariate normal distribution with mean µ and
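$$ \mathbb{E}[r \mid Z] = Z \mu . \tag{16} $$
$$ \varepsilon := r - Z \mu , \tag{17} $$
$$ \mathbb{E}[Z] = 1 , \tag{20} $$
$$ \varepsilon' \Sigma^{-1} \mu = 0 , \tag{21} $$
$$ \mathbb{E}[\varepsilon \mid Z] = 0 , \tag{22} $$
$$ \mathbb{V}[\varepsilon] = \Sigma - \mathbb{V}[Z] \mu \mu' , \tag{23} $$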
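One way to see how (23) follows from (17), (20) and (22) is sketched below. It assumes that $\Sigma$ denotes $\mathbb{V}[r]$ and that all second moments are finite; neither assumption is stated in the lines shown here.

```latex
\begin{align*}
\mathbb{E}[\varepsilon]
  &= \mathbb{E}\big[\mathbb{E}[\varepsilon \mid Z]\big] = 0, \\
\operatorname{Cov}(Z, \varepsilon)
  &= \mathbb{E}\big[Z\,\mathbb{E}[\varepsilon \mid Z]\big]
     - \mathbb{E}[Z]\,\mathbb{E}[\varepsilon] = 0, \\
\Sigma = \mathbb{V}[r]
  &= \mathbb{V}[Z\mu + \varepsilon]
   = \mathbb{V}[Z]\,\mu\mu' + \mathbb{V}[\varepsilon], \\
\mathbb{V}[\varepsilon]
  &= \Sigma - \mathbb{V}[Z]\,\mu\mu'.
\end{align*}
```

The cross terms vanish in the third line because $Z$ and $\varepsilon$ are uncorrelated by the second line.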
where $f_j$, $j = 1, \ldots, p$, are factors, $x_{i,j}$ denotes the $j$-th factor loading of the
$i$-th asset, and $\varepsilon_i$ and $\sigma_i$ are the residual return (or idiosyncratic return)
and idiosyncratic volatility of the $i$-th asset, respectively. We assume (27)
represents a strict factor model, such that the residual returns are mutually
uncorrelated with each other and across time, as well as uncorrelated with
all factors.
The model (28)–(30) entails associated reductions of the first and second
moments of the asset returns
$$ h' \Sigma h = h' D h + h' X F X' h , \tag{34} $$
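The decomposition (34) can be checked numerically with a small sketch. It assumes $\Sigma = D + XFX'$ with $D$ diagonal; the dimensions, random inputs and the function name `factor_model_variance` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2                                   # hypothetical asset and factor counts
X = rng.normal(size=(n, p))                   # factor loadings
A = rng.normal(size=(p, p))
F = A @ A.T + np.eye(p)                       # positive definite factor covariance
idio_var = rng.uniform(0.01, 0.05, size=n)    # diagonal of D
h = rng.normal(size=n)                        # portfolio holdings

def factor_model_variance(h, X, F, idio_var):
    """Return (h' D h, h' X F X' h) without forming the n x n covariance matrix."""
    exposures = X.T @ h                               # p-vector of factor exposures
    idio_part = float(np.sum(idio_var * h ** 2))      # h' D h
    factor_part = float(exposures @ F @ exposures)    # h' X F X' h
    return idio_part, factor_part

idio_part, factor_part = factor_model_variance(h, X, F, idio_var)

# Dense reference computation, used only to check the decomposition.
sigma = np.diag(idio_var) + X @ F @ X.T
dense_variance = float(h @ sigma @ h)
print(idio_part + factor_part, dense_variance)
```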
The right hand side of (36) is computationally efficient. Note that $D^{-1}$
merely involves $n$ scalar reciprocals of the idiosyncratic variances $\sigma_i^2$, and
hence calculating $D^{-1}$ is $O(n)$ and not $O(n^3)$ like standard matrix inversion
algorithms. The remaining inverses in (36) all involve $p \times p$ matrices where
$p \ll n$.
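Equation (36) is not reproduced in the lines above. The sketch below assumes it is the standard Woodbury identity for $\Sigma = D + XFX'$, namely $\Sigma^{-1} = D^{-1} - D^{-1}X(F^{-1} + X'D^{-1}X)^{-1}X'D^{-1}$. The function name `woodbury_solve` and the random inputs are hypothetical.

```python
import numpy as np

def woodbury_solve(X, F, idio_var, b):
    """Return Sigma^{-1} b for Sigma = diag(idio_var) + X F X',
    using only elementwise reciprocals and p x p linear algebra."""
    d_inv = 1.0 / idio_var                    # D^{-1}: n scalar reciprocals
    dinv_b = d_inv * b                        # D^{-1} b
    dinv_x = X * d_inv[:, None]               # D^{-1} X, shape (n, p)
    core = np.linalg.inv(F) + X.T @ dinv_x    # p x p matrix F^{-1} + X' D^{-1} X
    return dinv_b - dinv_x @ np.linalg.solve(core, X.T @ dinv_b)

rng = np.random.default_rng(0)
n, p = 200, 3                                 # hypothetical sizes with p much smaller than n
X = rng.normal(size=(n, p))
A = rng.normal(size=(p, p))
F = A @ A.T + np.eye(p)                       # positive definite factor covariance
idio_var = rng.uniform(0.01, 0.05, size=n)
b = rng.normal(size=n)

x = woodbury_solve(X, F, idio_var, b)

# Dense check, for verification only; it defeats the purpose at realistic n.
sigma = np.diag(idio_var) + X @ F @ X.T
print(np.max(np.abs(sigma @ x - b)))
```

Applying the inverse to a vector, rather than materializing the $n \times n$ inverse, keeps the cost at roughly $O(np^2)$ in this sketch.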
We can use (36) to obtain a strong lower bound on eigenvalues of $\Sigma^{-1}$,
and hence an upper bound on the condition number. Hence, one of the main
issues with covariance matrices and optimization – namely, the existence
of portfolios with very low forecasted volatility – does not occur when the
covariance matrix comes from a reasonably-constructed APT model. In some
sense, factor models are the perfect complement to Markowitz optimization.
One of the key assumptions of the Rosenberg approach is that the factor
loadings $x_{i,j}$ represent exposures to common sources of risk. For example,
many stocks have exposure (either positive or negative) to the prices of energy
commodities because they are net producers or consumers of energy, so one
could argue that the price of a bundle of energy represents a common source
of risk. In contrast to this, top stock analysts conduct in-depth analyses of
the companies they cover, often making predictions that cannot be attributed
to any common factor. In so doing, they are effectively predicting the next
realization of the residual $\varepsilon_i$ from (27). More generally, one could take any
expected return prediction vector µ and remove the component spanned by
X, thus forming
$$ \alpha := P_{X^\perp} \mu := (I - X X^+) \mu , \tag{37} $$
where $X^+$ is the pseudo-inverse and $P_{X^\perp}$ denotes the orthogonal projection
onto the kernel of $X'$. Form an augmented matrix $X_\alpha$, which has $\alpha$ as the
first column, and $X$ in the remaining columns. If we regress a cross section
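The projection in (37) can be illustrated with a short sketch. The random loadings and forecast vector are hypothetical, and `residual_alpha` is an illustrative name; `numpy.linalg.pinv` computes the Moore-Penrose pseudo-inverse.

```python
import numpy as np

def residual_alpha(mu, X):
    """Return (I - X X^+) mu, the part of mu orthogonal to the columns of X."""
    return mu - X @ (np.linalg.pinv(X) @ mu)

rng = np.random.default_rng(1)
n, p = 8, 3                          # hypothetical asset and factor counts
X = rng.normal(size=(n, p))          # factor loadings
mu = rng.normal(size=n)              # expected return forecast

alpha = residual_alpha(mu, X)

# alpha lies in the kernel of X', so it has no factor exposure.
print(X.T @ alpha)

# Augmented loadings with alpha as the first column, as described in the text.
X_alpha = np.column_stack([alpha, X])
print(X_alpha.shape)
```

Because the projection is idempotent, applying `residual_alpha` to `alpha` again returns it unchanged.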
It is hardly surprising that this and the previous myth have sparked significant
controversy and intrigue among practitioners and academics alike. The 1/n
portfolio “debate” is extensively covered in a number of articles, so we will
keep our discussion brief, focusing primarily on the main ideas.
In this battle of ideas, we playfully dub the opposing sides as “team-1/n”
and “team-MVO,” respectively. On the team-1/n side of the debate, several
studies suggest that MVO fails to outperform basic portfolio rules such as
equal weights in out-of-sample tests (see, for example, Jobson et al. (1981),
Jorion (1985), and DeMiguel et al. (2009)). In contrast, on team-MVO’s side,
research such as Kinlaw et al. (2017, Ch. 7), Kritzman et al. (2010) and Allen
et al. (2019) disprove these findings.
So how can the findings of these two teams be so different? We argue
that the majority of team-1/n studies arrive at their conclusions primarily
because they rely on sample estimates of historical returns and
covariances as inputs to MVO. As we saw earlier, the sample covariance
estimator does not satisfy Assumption 2. Moreover, historical return-based
estimates are notoriously inadequate predictors of expected returns (Merton,
1980; Campbell et al., 2008). Rather, it is crucial to incorporate anomalies,
style and risk factors, and other sources of information beyond historical
returns when estimating expected return (Kelly et al., 2013; Pettenuzzo
et al., 2014; Gu et al., 2020). In contrast to those of team-1/n, studies by
team-MVO demonstrate that investors endowed with some forecasting ability
benefit from using MVO in constructing their portfolios, and outperform 1/n
portfolios and other basic portfolio rules.
To implement MVO in practice, portfolio managers need an admissible
covariance matrix and reasonable estimates of expected returns. Interestingly,
Markowitz himself did not address this issue. In his seminal article, Markowitz
(1952) begins with the declaration
where
Following Gârleanu et al. (2016), we refer to (43) as the aim portfolio, because
by (40), at optimality, the trading process $\dot{h}_t$ is aiming towards it. Hence
the optimal trade aims towards an integral of future expected Markowitz
portfolios. With fond recollections of Markowitz’s work on multi-period
portfolio models (Markowitz et al., 2011; Blay et al., 2020), we hope that, if
he were still here, he would look upon equations (43) and (44) with approval.
$$ r = e_1 + R U , \tag{53} $$
$$ Z := e_1' r , \tag{54} $$
$$ \varepsilon := r - Z e_1 , \tag{55} $$
we clearly have $\mathbb{E}[Z] = 1$ and $e_1' \varepsilon = 0$, so conditions (20) and (21) are satisfied.
To verify condition (22), by writing $U = (U_1, \ldots, U_n)'$, we have that
$$ Z = 1 + R U_1 , \tag{56} $$
$$ \varepsilon = (0, R U_2, \ldots, R U_n)' . \tag{57} $$
for $j = 2, \ldots, n$.
Now, we turn to the proof of Proposition 2. In the following, we denote by
π the probability distribution of random vectors r that satisfy (19)–(22) of
the proposition.
$$ h^* = c \Sigma^{-1} \mu + q , \tag{59} $$
where we used Jensen’s inequality in (63). This shows that the expected
utility of wealth of the portfolio h∗ is at most that of the mean-variance
portfolio, as desired.
(“only if”): Let $h^* := \Sigma^{-1} \mu$. By assumption, for every utility function $u$
there is a scalar $c$ such that
$$ c\, h^* = \operatorname*{argmax}_h \, \mathbb{E}[u(h' r)] . \tag{66} $$
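Let us define
$$ Z := \frac{(h^*)' r}{\mathbb{E}[(h^*)' r]} , \tag{73} $$
$$ \varepsilon := r - \mu Z , \tag{74} $$
$$ h_\alpha := \alpha - \frac{\alpha' \mu}{(h^*)' \mu} h^* , \tag{75} $$
where $\alpha$ is an arbitrary vector. Then, $(h^*)' \varepsilon = 0$ and $\alpha' \varepsilon = h_\alpha' r$. Since
$(h_\alpha)' \mu = 0$, we see that
$$ \mathbb{E}[\alpha' \varepsilon \mid Z] = \mathbb{E}[h_\alpha' r \mid Z] = 0 . \tag{76} $$
Therefore, as (76) is true for any vector $\alpha$, we obtain
$$ \mathbb{E}[\varepsilon \mid Z] = 0 , \tag{77} $$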
References