Forecasting Prices From Level-I Quotes in The Presence of Hidden Liquidity
Abstract
Bid and ask sizes at the top of the order book provide information on short-term price
moves. Drawing from classical descriptions of the order book in terms of queues and order-arrival rates (Smith et al. (2003)), we consider a diffusion model for the evolution of the
best bid/ask queues. We compute the probability that the next price move is upward,
conditional on the best bid/ask sizes, the hidden liquidity of the market and the correlation
between changes in the bid/ask sizes. The model can be useful, among other things, to rank
trading venues in terms of the information content of their quotes and to estimate the
hidden liquidity in a market based on high-frequency data. We illustrate the approach with
an empirical study of a few liquid stocks using quotes from various exchanges.
Contents
1 Introduction
2.1 Hidden liquidity
2.2
3 Diffusion approximation
3.1 Probability of the ask queue depleting
3.2
3.3 Solution
4 Data analysis
4.1 Data description
4.2 Estimation procedure
4.3 Results
5 Conclusions
1 Introduction
The term order book (OB) is generally used to describe the bid and ask prices and sizes
in continuous-auction exchanges, such as NYSE-ARCA, BATS or NASDAQ. A distinction
is often made between Level I quotes, i.e. the best bid/ask prices and sizes, and Level II
quotes, which consist of all prices and sizes available in the order book. In either case, the
OB provides information on market depth, allowing traders to estimate the impact of their
trades. A question of obvious interest, given the high degree of transparency of OB data, is
whether the order book provides any information on short-term price moves.
The role of private information on the behavior of agents in financial markets has been a
central theme of the market microstructure literature. In recent years, with the emergence
of competing electronic trading venues (ECNs and Dark Pools) for the same asset and
algorithmic trading, questions related to the quality, speed and transparency of information
on various exchanges have become ever more relevant for regulators and practitioners alike.
Parlour and Seppi offer a comprehensive review of the theoretical microstructure literature
in light of these, now prevalent, limit order markets.
In the empirical microstructure literature, the focus is on the net informational content
of measurable quantities in the limit order book, rather than the information content of
particular agents in the market. For instance, the influence of variables such as the sizes at
the best quotes on future price moves has been shown to be statistically significant (see
Harris and Panchapagesan, Hellström and Simonsen, Cao, Hansch and Wang). All these
studies show that this effect is particularly strong at short time intervals, on the order of
1-5 minutes. In fact, Hasbrouck [6] shows how one may compare the relative information
content of various time series variables on the efficient price of an asset. Hasbrouck used these
econometric methods to analyze the initial stages of the US stock market fragmenting into
regional exchanges, and at the time found that the preponderance of the price discovery
takes place in the New York Stock Exchange (NYSE) (a median 92.7 percent information
share). Using similar techniques, Cao, Hansch and Wang find that the order book beyond
the first step (best quotes) is informative: its information share is about 30%.
We propose here a modeling approach that allows one to measure and compare the
information content of order books as well as make short term price forecasts (on the order
of seconds). We ask a simple, fundamental, question about the OB. Can we forecast the
direction of the next price movement, based on bid and ask sizes? The degree to which this
can be done given the OB could be called the information content. For example, if the sizes
of queues do not provide information, then, if $\Delta P$ denotes the next price move,
\[
\mathrm{Prob}\{\Delta P > 0 \mid OB\} = 0.5.
\]
In the Markov model of Cont, Stoikov and Talreja [5] (CST), the OB has two distinguished queues representing the sizes
at best bid and the best ask levels, which are separated by the minimum tick size. Market,
limit and cancellation orders arrive at both queues according to Poisson processes. One of
the following two events must then happen first:
1. The ask queue is depleted, the best ask price goes up by one tick, and the price
moves up.
2. The bid queue is depleted, the best bid price goes down by one tick, and the price
moves down.
The dynamics leading to a price change may thus be viewed as a race to the bottom: the
queue that hits zero first causes the price to move in that direction.
As it turns out, the predictions of such models are not consistent with market observations. If they were, this would imply that if the best ask size becomes much smaller than
the best bid size, the probability that the next price move is upward should approach 100%.
However, empirical analysis (see Section 4) shows that this probability does not increase to
unity as the ask size goes to zero.
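To see why the naive race predicts a probability close to one when the ask queue is small, it can help to simulate it. The Python sketch below is a deliberately simplified stand-in for the CST dynamics, not the model itself: orders have unit size, and each event changes either the bid or the ask queue (chosen with equal probability) by plus or minus one share. The function name, the step cap and the seed are illustrative assumptions.

```python
import random


def simulate_up_probability(bid, ask, n_paths=2000, max_steps=100_000, seed=0):
    """Estimate the probability that the ask queue empties before the bid queue.

    Simplifying assumptions (not taken from the paper): unit order size, and
    at each event one queue, picked at random, moves up or down by one share.
    """
    rng = random.Random(seed)
    ups = 0
    finished = 0
    for _ in range(n_paths):
        x, y = bid, ask
        for _ in range(max_steps):
            step = 1 if rng.random() < 0.5 else -1
            if rng.random() < 0.5:
                x += step  # event at the bid queue
            else:
                y += step  # event at the ask queue
            if x == 0 or y == 0:
                finished += 1
                ups += (y == 0)  # ask depleted first: the price ticks up
                break
        # paths that hit the step cap are left out of the estimate
    return ups / finished


if __name__ == "__main__":
    print(simulate_up_probability(5, 5))  # balanced book
    print(simulate_up_probability(9, 1))  # small ask queue
```

By symmetry, equal queues give an estimate near 0.5, while a bid queue of 9 against an ask queue of 1 gives an estimate much closer to one. This is the behavior that the empirical probabilities of Section 4 do not display.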
2.1 Hidden liquidity
We hypothesize that this happens for two reasons: first, markets are fragmented; liquidity
is typically posted on various exchanges. In the U.S. stock markets, for example, Reg NMS
requires that all market orders be routed to the venue with the best price. Moreover, limit
orders that could be immediately executed at their limit price on another market need to be
rerouted to those venues. Thus, one needs to consider the possibility that once the best ask
2.2
Adopting the language of queuing theory, we refer to the number of shares offered at the
lowest ask price as the ask queue. Similarly, the number of shares bid at the highest bid
price is called the bid queue. In the spirit of CST, we view these queues as following a
continuous time Markov chain (CTMC) where time is continuous and share quantities are
discrete, consistent with a minimum order size. We adopt the following notation,
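\[
\begin{aligned}
h &= \text{order size, i.e. the increment by which a queue changes},\\
\lambda &= \text{arrival rate of limit orders at the bid or at the ask},\\
\mu &= \text{arrival rate of market orders or cancellations at the bid or at the ask},\\
\eta &= \text{arrival rate of simultaneous cancellations at the bid and limit orders at the ask (or vice versa).}
\end{aligned}
\qquad (2.1)
\]
Writing $X_t$ for the bid queue and $Y_t$ for the ask queue, the increments $\Delta X = X_{t+\Delta t} - X_t$ and $\Delta Y = Y_{t+\Delta t} - Y_t$ over a short interval $\Delta t$ satisfy
\[
\begin{aligned}
E[\Delta X] &= h(\lambda - \mu)\,\Delta t + o(\Delta t),\\
E[(\Delta X)^2] &= h^2(\lambda + \mu + 2\eta)\,\Delta t + o(\Delta t),\\
E[\Delta X\,\Delta Y] &= h^2(-2\eta)\,\Delta t + o(\Delta t),
\end{aligned}
\qquad (2.2)
\]
and similarly for $\Delta Y$.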
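The moments in (2.2) are easy to evaluate numerically. The Python sketch below assumes the reading of (2.2) in which the three rates are a limit-order rate (`lam`), a market-order or cancellation rate (`mu`) and a rate of simultaneous opposite moves on the two queues (`eta`), with a negative cross term because such events move the queues in opposite directions. The function and argument names are illustrative.

```python
def queue_moments(lam, mu, eta, h=1.0):
    """Per-unit-time drift, variance and bid/ask correlation of queue increments.

    Assumed reading of (2.2): drift h(lam - mu), variance h^2(lam + mu + 2 eta),
    covariance -2 h^2 eta.
    """
    drift = h * (lam - mu)
    variance = h ** 2 * (lam + mu + 2.0 * eta)
    covariance = -2.0 * h ** 2 * eta
    correlation = covariance / variance
    return drift, variance, correlation


if __name__ == "__main__":
    # With lam == mu the drift vanishes and the correlation is -eta / (lam + eta).
    print(queue_moments(lam=1.0, mu=1.0, eta=1.0))  # (0.0, 4.0, -0.5)
```

Under these assumptions, setting `eta` to zero gives uncorrelated queues, while letting `eta` dominate the other rates pushes the correlation toward -1.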
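If we assume, for simplicity, that $\lambda = \mu$, the drifts, variances and correlations of queue
sizes simplify to
\[
m_X = m_Y = 0, \qquad \sigma_X^2 = \sigma_Y^2 = 2h^2(\lambda + \eta), \qquad \rho = -\frac{\eta}{\lambda + \eta}. \qquad (2.3)
\]

3 Diffusion approximation

3.1 Probability of the ask queue depleting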
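Let $\langle X \rangle$ and $\langle Y \rangle$ denote, respectively, the median sizes of the queues $X_t$ and $Y_t$. We define
the coarse-grained variables
\[
x_t = \frac{X_t}{\langle X \rangle}, \qquad y_t = \frac{Y_t}{\langle Y \rangle}.
\]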
Footnote: An alternative approach would be to keep the 4-point template and make the transition rates state-dependent.
We chose a simple diagonal transition model instead. The latter can be viewed as describing transitions observed
after two time units instead of one, for example. Such microstructural distinctions are not essential at the
macroscopic level, i.e. when we fit the model to transaction data.
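The process $(x_t, y_t)$ can then be approximated by the diffusion
\[
dx_t = \sigma\, dW_t^{(1)}, \qquad dy_t = \sigma\, dW_t^{(2)}, \qquad E\left( dW_t^{(1)}\, dW_t^{(2)} \right) = \rho\, dt, \qquad (3.1)
\]
where
\[
\sigma^2 = \frac{2h^2(\lambda + \eta)}{\langle X \rangle^2}. \qquad (3.2)
\]
Let $u(x, y)$ denote the probability that the ask queue is depleted before the bid queue, given the current queue sizes $(x, y)$. Then $u$ satisfies
\[
\sigma^2 \left( u_{xx} + 2\rho\, u_{xy} + u_{yy} \right) = 0, \qquad x > 0,\ y > 0, \qquad (3.3)
\]
or, simply,
\[
u_{xx} + 2\rho\, u_{xy} + u_{yy} = 0 \quad \text{for } x > 0,\ y > 0. \qquad (3.4)
\]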
If we assume, naively, that the order book fully represents the liquidity in the market at
a particular price level, then the mid-price will move up once the ask queue is depleted, i.e.
when $y_t = 0$ for the first time, since no more sellers are present at that level. In this case, the
probability that the price will increase corresponds to the probability that the diffusion (3.1)
exits the quadrant $\{(x, y) : x > 0,\ y > 0\}$ through the x-axis. The corresponding boundary
conditions for $u(x, y)$ are therefore
\[
u(0, y) = 0 \ \text{ for } y > 0, \qquad u(x, 0) = 1 \ \text{ for } x > 0. \qquad (3.5)
\]
Note that the processes $X_t$ and $Y_t$ do not have a drift and therefore do not have a
stationary distribution. A drift could be introduced, as in CST, by modeling market orders
and cancellations separately, resulting in queue sizes that revert around an equilibrium
3.2
What is essential, however, is to make a distinction between an ask queue being depleted and
a genuine price move where a new bid order appears at price. We know that an upward
price move might not take place when the ask queue is depleted, due to additional liquidity
at that level, which we call hidden liquidity. This hidden liquidity can be attributed either
to iceberg orders or to a Reg-NMS-type mechanism in which other markets
still post liquidity on the ask side at the same level, which must be honored before
the mid price can move up.
A simple way to model this is to assume that there is an additional amount of liquidity,
denoted by H, representing the fraction of average book size (< X > or < Y >) which is
hidden or absent from the book. A true price transition takes place if the hidden liquidity
is exhausted. In other words, we observe queues of size x or y but the true sizes of the
queues are x + H and y + H. Thus, if we denote by p(x, y; H) the probability of an upward
price move conditional on the observed queue sizes (x, y) and the hidden liquidity parameter
H, we have
\[
\sigma^2 \left( p_{xx} + 2\rho\, p_{xy} + p_{yy} \right) = 0, \qquad x > -H,\ y > -H,
\]
\[
p(-H, y) = 0 \ \text{ for } y > -H, \qquad p(x, -H) = 1 \ \text{ for } x > -H.
\]
In other words we can solve the problem with boundary conditions at zero and use the
relation
p(x, y; H) = u(x + H, y + H),
(3.6)
where u(x, y) satisfies the diffusion equation (3.4) on the first quadrant of the (x, y) plane
with boundary conditions (3.5). One could also obtain an effect similar to that of our hidden
liquidity parameter by considering mixed Neumann/Dirichlet boundary conditions at
zero or bid and ask processes with jumps. Our modeling choice is motivated by the simplicity
of our closed form solution.
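3.3 Solution

Theorem 3.1. The probability of an upward move in the mid price is given by
\[
p(x, y; H) = u(x + H, y + H), \qquad (3.7)
\]
where
\[
u(x, y) = \frac{1}{2} \left( 1 - \frac{\operatorname{Arctan}\left( \sqrt{\frac{1+\rho}{1-\rho}}\; \frac{y - x}{y + x} \right)}{\operatorname{Arctan}\left( \sqrt{\frac{1+\rho}{1-\rho}} \right)} \right). \qquad (3.8)
\]
In the case $\rho = 0$, formula (3.8) reduces to
\[
u(x, y) = \frac{2}{\pi} \operatorname{Arctan}\left( \frac{x}{y} \right). \qquad (3.9)
\]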
4. As $\rho$ approaches $-1$, the numerator and denominator in (3.8) both tend to zero. The
limit as $\rho \to -1$ is
\[
u(x, y) = \frac{x}{x + y}. \qquad (3.10)
\]
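5. If we consider the sector $y < x$, i.e. the sector for which the ask queue is smaller than
the bid queue, $u(x, y)$ is an increasing function of $\rho$. In fact, setting $\xi = \frac{y - x}{y + x}$
and $\beta = \sqrt{\frac{1+\rho}{1-\rho}}$, which is increasing in $\rho$, differentiating (3.8) gives
\[
\frac{\partial u}{\partial \beta} = \frac{1}{2 \operatorname{Arctan}(\beta)^2} \left( \frac{\operatorname{Arctan}(\beta \xi)}{1 + \beta^2} - \frac{\xi \operatorname{Arctan}(\beta)}{1 + \beta^2 \xi^2} \right).
\]
This is a positive quantity when $\xi$ is negative, i.e. in the sector $y < x$. Therefore, the assumption $\rho = -1$
will underestimate the probability of an up-tick if the true correlation is higher than $-1$.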
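The closed-form expressions of this section are short enough to check numerically. The Python sketch below evaluates the arctangent formula of Theorem 3.1 for the probability of an upward move, assuming the reconstruction of (3.8) in which the argument is $\sqrt{(1+\rho)/(1-\rho)}\,(y-x)/(y+x)$, and uses the limit (3.10) when $\rho = -1$. The function names `u` and `p_up` and the argument names are illustrative.

```python
import math


def u(x, y, rho):
    """Probability that the ask queue y empties before the bid queue x.

    Uses the arctangent formula for -1 < rho < 1 and the limit
    x / (x + y) for rho = -1. Requires x >= 0, y >= 0, not both zero.
    """
    if x <= 0:
        return 0.0  # boundary condition u(0, y) = 0
    if y <= 0:
        return 1.0  # boundary condition u(x, 0) = 1
    if rho <= -1.0:
        return x / (x + y)
    if rho >= 1.0:
        raise ValueError("rho must be smaller than 1")
    beta = math.sqrt((1.0 + rho) / (1.0 - rho))
    xi = (y - x) / (y + x)
    return 0.5 * (1.0 - math.atan(beta * xi) / math.atan(beta))


def p_up(x, y, hidden, rho=-1.0):
    """Up-move probability with hidden liquidity: p(x, y; H) = u(x + H, y + H)."""
    return u(x + hidden, y + hidden, rho)


if __name__ == "__main__":
    print(u(2.0, 1.0, 0.0), 2.0 / math.pi * math.atan(2.0))  # rho = 0 case
    print(u(2.0, 1.0, -0.999999), 2.0 / 3.0)                 # rho near -1
    print(u(2.0, 1.0, -0.5) < u(2.0, 1.0, 0.5))              # increasing in rho when y < x
```

The three printed comparisons correspond to the uncorrelated case (3.9), the limit (3.10) and the monotonicity in $\rho$ discussed in item 5.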
4 Data analysis
In this section, we study the information content of the best quotes for the tickers QQQQ,
XLF, JPM, and AAPL, over the first five trading days in 2010 (i.e. Jan 4-8). All four tickers
are traded on various exchanges, and this allows us to compare the information content of
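these venues. In other words, we compute the probability of an upward price move conditional on the
quoted sizes at each venue. We use the perfectly negatively correlated case $\rho = -1$, for which (3.6) and (3.10) give
\[
p(x, y; H) = \frac{x + H}{x + y + 2H}, \qquad (4.1)
\]
with a hidden liquidity parameter $H$ which we estimate by minimizing square errors with respect to the empirical probabilities.
We make this choice because the parameter $H$ has a more important impact on $p$ than $\rho$,
and optimization routines often converged towards the $\rho = -1$ case in our data set.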
In practice, when performing our data analysis, we find it easier to bucket the data in
deciles of queue sizes, rather than normalizing by the mean queue size, as we did in Section 3.
The implied hidden liquidity parameter we compute in the sequel can therefore be interpreted
as a fraction of the maximum observed queue size.
4.1 Data description
The data comes from the WRDS database, more specifically the consolidated quotes of the
NYSE-TAQ data set. Each row has a timestamp (between the hours of 10:00 and 16:00,
rounded to the nearest second), a bid price, an ask price, a bid size, an ask size and an
exchange flag, indicating if the quote was on NASDAQ (T), NYSE-ARCA (P) or BATS (Z),
see Table 1 for a sample of the data. There are other regional exchanges, but for the purpose
of this study, we focus on these venues as they have significantly more than one quote per
second.
In Table 2, we present some summary statistics for the tickers QQQQ, XLF, JPM and
AAPL, across the three exchanges. The tickers QQQQ, XLF and JPM are ideal candidates,
because their bid-ask spread is almost always one tick (or one cent) wide, as assumed in our
stochastic model. We also pick AAPL, whose spread is most often 3 cents (three ticks)
wide, due to AAPL's relatively high stock price. Though our model does not strictly
consider spreads greater than one tick, we fit it to AAPL conditional on the spread, i.e.
OB = (x, y, s) where s is the spread in cents.
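As a minimal illustration of how the conditioning variables OB = (x, y, s) could be assembled from raw quote rows, the Python sketch below groups quotes by exchange flag and attaches the spread in cents. The dictionary keys follow the column names of the sample data (bid, ask, bsize, asize, exchange); the function names are hypothetical, and skipping rows with a zero or negative spread mirrors the cleaning step of the estimation procedure.

```python
from collections import defaultdict


def spread_in_cents(bid, ask):
    """Bid-ask spread rounded to the nearest cent."""
    return int(round((ask - bid) * 100))


def order_book_states(quotes):
    """Map each exchange flag to a list of (bid size, ask size, spread) tuples."""
    states = defaultdict(list)
    for quote in quotes:
        s = spread_in_cents(quote["bid"], quote["ask"])
        if s <= 0:
            continue  # drop locked or crossed quotes
        states[quote["exchange"]].append((quote["bsize"], quote["asize"], s))
    return dict(states)


if __name__ == "__main__":
    sample = [
        {"bid": 46.32, "ask": 46.33, "bsize": 258, "asize": 242, "exchange": "T"},
        {"bid": 46.32, "ask": 46.33, "bsize": 210, "asize": 271, "exchange": "P"},
    ]
    print(order_book_states(sample))
```

Rounding the spread to an integer number of cents avoids spurious floating-point differences when conditioning on s.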
4.2 Estimation procedure
1. We split the data set into three subsets, one for each exchange. Items 2-6 are repeated
separately for each exchange and each ticker.
2. We remove zero and negative spreads.
3. We bucket the bid and ask sizes, by taking deciles of the bid and ask size and
symbol  date        time      bid    ask    bsize  asize  exchange
QQQQ    2010-01-04  09:30:23  46.32  46.33  258    242    T
QQQQ    2010-01-04  09:30:23  46.32  46.33  260    242    T
QQQQ    2010-01-04  09:30:23  46.32  46.33  264    242    T
QQQQ    2010-01-04  09:30:24  46.32  46.33  210    271    P
QQQQ    2010-01-04  09:30:24  46.32  46.33  210    271    P
QQQQ    2010-01-04  09:30:24  46.32  46.33  161    271    P

Table 1: A sample of the raw data.
Ticker  Exchange  num quotes  quotes/sec  avg(spread)  avg(bsize+asize)  avg(price)
XLF     NASDAQ    0.7M        7           0.010        8797              15.02
XLF     NYSE      0.4M        4           0.010        10463             15.01
XLF     BATS      0.4M        4           0.011        7505              14.99
QQQQ    NASDAQ    2.7M        25          0.010        1455              46.30
QQQQ    NYSE      4.0M        36          0.011        1152              46.27
QQQQ    BATS      1.6M        15          0.011        1055              46.28
JPM     NASDAQ    1.2M        11          0.011        87                43.81
JPM     NYSE      0.7M        6           0.012        47                43.77
JPM     BATS      0.6M        5           0.014        39                43.82
AAPL    NASDAQ    1.3M        13          0.034        9.1               212.50
AAPL    NYSE      0.4M        4           0.046        5.7               212.66
AAPL    BATS      0.6M        6           0.054        4.5               212.43

Table 2: Summary statistics by ticker and exchange.
\[
\min_H \left[ \sum_{i,j} \left( u_{ij} - \frac{i + H}{i + j + 2H} \right)^2 d_{ij} \right] \qquad (4.2)
\]
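A least-squares fit of this kind can be sketched in a few lines of Python. The sketch assumes that $u_{ij}$ is the empirical up-move probability in the bucket with bid decile $i$ and ask decile $j$, and that $d_{ij}$ is a non-negative weight for that bucket (uniform weights are used when none are given). The grid search, its bounds and the function names are illustrative choices, not the optimization routine used for the reported results.

```python
def model_probability(i, j, hidden):
    """Model up-move probability (i + H) / (i + j + 2H) for decile levels i, j."""
    return (i + hidden) / (i + j + 2.0 * hidden)


def weighted_squared_error(hidden, deciles, empirical, weights):
    """Objective of the fit: weighted sum of squared errors over all buckets."""
    total = 0.0
    for a, i in enumerate(deciles):      # rows: bid size decile
        for b, j in enumerate(deciles):  # columns: ask size decile
            error = empirical[a][b] - model_probability(i, j, hidden)
            total += weights[a][b] * error ** 2
    return total


def fit_hidden_liquidity(deciles, empirical, weights=None, h_max=2.0, step=0.001):
    """Grid search for the H in [0, h_max] that minimizes the objective."""
    if weights is None:
        weights = [[1.0] * len(deciles) for _ in deciles]
    n_steps = int(round(h_max / step))
    candidates = [k * step for k in range(n_steps + 1)]
    return min(
        candidates,
        key=lambda h: weighted_squared_error(h, deciles, empirical, weights),
    )


if __name__ == "__main__":
    deciles = [k / 10 for k in range(1, 10)]
    # Synthetic "empirical" table generated from the model itself with H = 0.15.
    synthetic = [[model_probability(i, j, 0.15) for j in deciles] for i in deciles]
    print(fit_hidden_liquidity(deciles, synthetic))
```

A grid search is used here only because the problem is one-dimensional and the objective is cheap to evaluate; any scalar minimizer would serve the same purpose.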
4.3 Results
We first illustrate the predictions of our model for the ticker XLF on the Nasdaq exchange
(T). We report the empirical probabilities of an up move, given the bid and ask sizes, in
Table 3, as well as the model probabilities, given by equation (4.1) with H estimated with
the procedure described above. Notice that even for very large bid sizes and small ask sizes
(say the 90th percentile of sizes at the bid and the 10th percentile of sizes at the ask) the
empirical probability of the mid price moving upward is high (0.85) but not arbitrarily close
to one. The same is true of our model, which assumes there is a hidden liquidity H behind
both quotes. We interpret H as a measure of the information content of the bid and ask
sizes: the smaller H is, the more size matters. The larger the H, the closer all probabilities
will be to 0.5, even for drastic size imbalances.
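The effect of H on the model probabilities can be checked with a short calculation. The Python sketch below evaluates formula (4.1) at the 90th bid percentile and 10th ask percentile for several values of H; the function name and the values of H other than 0.15 are illustrative.

```python
def model_probability(i, j, hidden):
    """Model up-move probability (i + H) / (i + j + 2H)."""
    return (i + hidden) / (i + j + 2.0 * hidden)


if __name__ == "__main__":
    # Large bid (i = 0.9) against small ask (j = 0.1) for increasing H.
    for hidden in (0.0, 0.15, 1.0, 10.0):
        print(hidden, round(model_probability(0.9, 0.1, hidden), 2))
    # prints 0.9, 0.81, 0.63 and 0.52 for the four values of H
```

With H = 0.15 the value 0.81 matches the corresponding model entry of Table 3, and the probabilities drift toward 0.5 as H grows.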
In Table 4, we display the hidden liquidity H for the four tickers and three exchanges.
These results indicate that size is most important for XLF on NASDAQ, for QQQQ on
NYSE-ARCA and for JPM on BATS. Finally we calculate H for AAPL, for different values
of the bid-ask spread (s = 1, 2 and 3 cents). We find that sizes of AAPL are more
informative on NASDAQ, and that they matter most when the spread is small.
Empirical probabilities (rows: bid size decile i; columns: ask size decile j, with the corresponding size levels in parentheses)

         0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
       (1250)  (1958)  (2753)  (3841)  (4835)  (5438)  (5820)  (6216)  (6742)
0.1     0.50    0.38    0.25    0.25    0.32    0.26    0.23    0.23    0.15
0.2     0.61    0.50    0.47    0.41    0.36    0.40    0.38    0.27    0.20
0.3     0.75    0.53    0.50    0.43    0.39    0.37    0.43    0.39    0.28
0.4     0.74    0.58    0.57    0.50    0.42    0.42    0.47    0.46    0.37
0.5     0.68    0.64    0.61    0.58    0.50    0.51    0.48    0.49    0.41
0.6     0.74    0.60    0.63    0.58    0.49    0.50    0.50    0.50    0.49
0.7     0.78    0.62    0.57    0.53    0.52    0.50    0.50    0.60    0.53
0.8     0.77    0.73    0.61    0.54    0.51    0.50    0.40    0.50    0.42
0.9     0.85    0.79    0.72    0.63    0.60    0.51    0.47    0.57    0.50

Model probabilities (same layout)

         0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
       (1250)  (1958)  (2753)  (3841)  (4835)  (5438)  (5820)  (6216)  (6742)
0.1     0.50    0.42    0.36    0.31    0.28    0.25    0.23    0.21    0.19
0.2     0.58    0.50    0.44    0.39    0.35    0.32    0.29    0.27    0.25
0.3     0.64    0.56    0.50    0.45    0.41    0.37    0.35    0.32    0.30
0.4     0.69    0.61    0.55    0.50    0.46    0.42    0.39    0.37    0.34
0.5     0.72    0.65    0.59    0.54    0.50    0.46    0.43    0.41    0.38
0.6     0.75    0.68    0.63    0.58    0.54    0.50    0.47    0.44    0.42
0.7     0.77    0.71    0.65    0.61    0.57    0.53    0.50    0.47    0.45
0.8     0.79    0.73    0.68    0.63    0.59    0.56    0.53    0.50    0.47
0.9     0.81    0.75    0.70    0.66    0.62    0.58    0.55    0.53    0.50

Table 3: Empirical vs. model probabilities of an upward move (XLF), on
Nasdaq (T). Rows represent bid size percentiles (i), columns represent ask size percentiles (j).
The model is given by p(i, j) = (i + H)/(i + j + 2H) with H = 0.15.
Modeling stocks with larger spreads may require more sophisticated models of the order
book, possibly including Level II information. Since a majority of US equities trade at
average spreads of several cents, we consider this avenue worthy of future research.
5 Conclusions
Based on a diffusion model of the liquidity at the top of the order book, we proposed
closed-form solutions for the probability of a price uptick conditional on Level-I quotes. The
Ticker       NASDAQ  NYSE  BATS
XLF          0.15    0.17  0.17
QQQQ         0.21    0.04  0.18
JPM          0.17    0.17  0.11
AAPL s = 1   0.16    0.90  0.65
AAPL s = 2   0.31    0.60  0.64
AAPL s = 3   0.31    0.69  0.63

Table 4: Implied hidden liquidity H across tickers and exchanges.
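Appendix

Solution of the PDE for general $\rho$

Proposition 1 Let $\Phi(X, Y)$ be a harmonic function. Let us set
\[
v(\zeta, \nu) = \Phi\left( \frac{\zeta}{\alpha_1}, \frac{\nu}{\alpha_2} \right).
\]
Then,
\[
\alpha_1^2\, v_{\zeta\zeta} + \alpha_2^2\, v_{\nu\nu} = 0. \qquad (5.1)
\]

Proposition 2 The function
\[
u(x, y) = \Phi\left( \frac{y + x}{\sqrt{2}\,\alpha_1}, \frac{y - x}{\sqrt{2}\,\alpha_2} \right), \qquad \alpha_1 = \sqrt{1 + \rho},\ \ \alpha_2 = \sqrt{1 - \rho}, \qquad (5.2)
\]
satisfies
\[
u_{xx} + 2\rho\, u_{xy} + u_{yy} = 0. \qquad (5.3)
\]

Proof: Let $\alpha_1 = \sqrt{1 + \rho}$, $\alpha_2 = \sqrt{1 - \rho}$ and set
\[
\zeta = \frac{y + x}{\sqrt{2}}, \qquad \nu = \frac{y - x}{\sqrt{2}}.
\]
Clearly, by Proposition 1, $v(\zeta, \nu) \equiv \Phi(\zeta/\alpha_1, \nu/\alpha_2)$ satisfies
\[
\alpha_1^2\, v_{\zeta\zeta} + \alpha_2^2\, v_{\nu\nu} = 0.
\]
Since $u(x, y) = v\left( \frac{y + x}{\sqrt{2}}, \frac{y - x}{\sqrt{2}} \right)$, we have, after differentiating twice the function $u$,
\[
\begin{aligned}
u_{xx} &= \tfrac{1}{2} v_{\zeta\zeta} + \tfrac{1}{2} v_{\nu\nu} - v_{\zeta\nu},\\
u_{yy} &= \tfrac{1}{2} v_{\zeta\zeta} + \tfrac{1}{2} v_{\nu\nu} + v_{\zeta\nu},\\
u_{xy} &= \tfrac{1}{2} v_{\zeta\zeta} - \tfrac{1}{2} v_{\nu\nu}.
\end{aligned}
\qquad (5.4)
\]
Adding the first two terms and then adding the third one multiplied by $2\rho$ gives
\[
\begin{aligned}
u_{xx} + 2\rho\, u_{xy} + u_{yy}
&= \left( \tfrac{1}{2} v_{\zeta\zeta} + \tfrac{1}{2} v_{\nu\nu} - v_{\zeta\nu} \right)
 + 2\rho \left( \tfrac{1}{2} v_{\zeta\zeta} - \tfrac{1}{2} v_{\nu\nu} \right)
 + \left( \tfrac{1}{2} v_{\zeta\zeta} + \tfrac{1}{2} v_{\nu\nu} + v_{\zeta\nu} \right)\\
&= v_{\zeta\zeta} + \rho \left( v_{\zeta\zeta} - v_{\nu\nu} \right) + v_{\nu\nu}\\
&= (1 + \rho)\, v_{\zeta\zeta} + (1 - \rho)\, v_{\nu\nu}\\
&= \alpha_1^2\, v_{\zeta\zeta} + \alpha_2^2\, v_{\nu\nu}\\
&= 0.
\end{aligned}
\qquad (5.5)
\]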
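The computation above can be sanity-checked numerically. The Python sketch below approximates the second derivatives of the arctangent solution by central finite differences and evaluates the left-hand side of the PDE at an interior point. The step size `eps`, the evaluation point and the function names are illustrative assumptions, and the formula for `u` follows the reconstruction of Theorem 3.1.

```python
import math


def u(x, y, rho):
    """Arctangent solution of Theorem 3.1, valid for x, y > 0 and -1 < rho < 1."""
    beta = math.sqrt((1.0 + rho) / (1.0 - rho))
    xi = (y - x) / (y + x)
    return 0.5 * (1.0 - math.atan(beta * xi) / math.atan(beta))


def pde_residual(x, y, rho, eps=1e-3):
    """Central-difference estimate of u_xx + 2 rho u_xy + u_yy at (x, y)."""
    center = u(x, y, rho)
    u_xx = (u(x + eps, y, rho) - 2.0 * center + u(x - eps, y, rho)) / eps ** 2
    u_yy = (u(x, y + eps, rho) - 2.0 * center + u(x, y - eps, rho)) / eps ** 2
    u_xy = (
        u(x + eps, y + eps, rho)
        - u(x + eps, y - eps, rho)
        - u(x - eps, y + eps, rho)
        + u(x - eps, y - eps, rho)
    ) / (4.0 * eps ** 2)
    return u_xx + 2.0 * rho * u_xy + u_yy


if __name__ == "__main__":
    for rho in (-0.5, 0.0, 0.7):
        print(rho, pde_residual(1.0, 2.0, rho))  # each residual should be close to zero
```

The residuals are not exactly zero because of discretization and rounding error, but they should be several orders of magnitude smaller than the individual second derivatives.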
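Theorem 3.1 The probability of an upward move in the mid price is given by
\[
p(x, y; H) = u(x + H, y + H), \qquad (5.6)
\]
where
\[
u(x, y) = \frac{1}{2} \left( 1 - \frac{\operatorname{Arctan}\left( \sqrt{\frac{1+\rho}{1-\rho}}\; \frac{y - x}{y + x} \right)}{\operatorname{Arctan}\left( \sqrt{\frac{1+\rho}{1-\rho}} \right)} \right). \qquad (5.7)
\]
Proof: By the change of variables above, applied to a harmonic function of the form $a + b \operatorname{Arctan}(Y/X)$, the function
\[
u(x, y) = \frac{1}{2} \left( 1 - \frac{\operatorname{Arctan}\left( \sqrt{\frac{1+\rho}{1-\rho}}\; \frac{y - x}{y + x} \right)}{\operatorname{Arctan}\left( \sqrt{\frac{1+\rho}{1-\rho}} \right)} \right) \qquad (5.8)
\]
satisfies (3.4). It also satisfies the boundary conditions (3.5), since $\frac{y - x}{y + x}$ equals $1$ on $x = 0$ and $-1$ on $y = 0$. The formula for $p$ then follows from (3.6).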
References
[1] P. Billingsley, Convergence of Probability Measures, John Wiley and Sons, 1999, New
York.
[2] G. Burghardt, J. Hanweck, and L. Lei (2006) Measuring Market Impact and Liquidity,
The Journal of Trading, Fall 2006, Vol. 1, No. 4, pp. 70-84.
[3] R. Cont and A. de Larrard (2011) Price Dynamics in a Markovian Limit Order Market,
working paper.
[4] R. Cont (2011) Statistical Modeling of High Frequency Financial Data: Facts, Models
and Challenges, working paper.
[5] R. Cont, S. Stoikov, R. Talreja (2010) A Stochastic Model for Order Book Dynamics,
Operations Research, Vol. 58, No. 3, May-June 2010, pp. 549-563.
[6] J. Hasbrouck (1995) One security, many markets: determining the contributions to price
discovery, Journal of Finance, Vol. 50, No. 4, pp. 1175-1199.
[7] E. Smith, J. D. Farmer, L. Gillemot, and S. Krishnamurthy, (2003), Statistical Theory
of the Continuous Double Auction, Quantitative Finance, Vol. 3, pp. 481-514.