A High Performance Pair Trading Application
Jieren Wang
Department of Mathematics
University of British Columbia
Vancouver, British Columbia
cwang@math.ubc.ca
Camilo Rostoker and Alan Wagner
Department of Computer Science
University of British Columbia
Vancouver, British Columbia
{rostokec,wagner}@cs.ubc.ca
Abstract— This paper describes a high-frequency pair trading strategy that exploits the power of MarketMiner, a high-performance analytics platform that enables a real-time, market-wide search for short-term correlation breakdowns across multiple markets and asset classes. The main theme of this paper is to
discuss the computational requirements of model formulation and
back-testing, and how a scalable solution built using a modular,
MPI-based infrastructure can assist quantitative model and
strategy developers by increasing the scale of their experiments
or decreasing the time it takes to thoroughly test different
parameters. We describe our work to date, namely the design of a canonical pair trading algorithm, illustrating how fast and efficient backtesting can be performed using MarketMiner.
Preliminary results are given based on a small set of stocks,
parameter sets and correlation measures.
I. INTRODUCTION
Pair trading is a popular quantitative method of statistical
arbitrage that has been widely used in the financial industry for
over twenty years [1]. The essence of pairs trading is to exploit pairs of stocks whose prices tend to move together.
When the co-movement deteriorates, the strategy is to buy the
under-performer and sell the over-performer, anticipating that
the co-movement will recover and gains can be made. If the
co-movement does recover, the positions are reversed yielding
arbitrage profits from the spread of the two stock prices.
In the last few years the trading industry has seen an
explosive growth in low-latency networks and infrastructure.
While this has enhanced many aspects of the trading lifecycle,
it has also led to an increase in data volume and frequency,
posing additional challenges for processing and analyzing
data in the context of designing and backtesting trading
strategies. Software platforms that assist in the process of data
acquisition and management, quantitative analysis, backtesting
and deployment are broadly referred to as alpha generation
platforms [2]. There are a number of sophisticated alpha
generation platforms in the market [3], [4], [5], with one
notable open-source project [6], but the problem with all these
solutions is that they are not inherently parallel and thus their
ability to scale across a large number of securities or over a
large amount of data, particularly intra-day tick data, is limited.
Financial engineering practitioners and researchers have begun
to address this problem by parallelizing computationally intensive aspects of the analysis pipeline such as Monte Carlo for
options pricing [7], [8] or by developing more generic analysis
frameworks [9]. Given the current meltdown in the financial markets, we should expect the next generation of models and strategies to be faster, smarter, and able to take market-wide dependencies into account.
This paper will describe how a canonical intra-day pure
statistical pair trading strategy can be quickly and efficiently
backtested using the MarketMiner analytics platform [10].
Since we are backtesting a strategy based on correlation, we
wanted to compare various correlation measures to determine
which one performs better and under what circumstances. To
eliminate the potential bias of selecting specific pairs, our
goal is to take a brute-force approach by backtesting over as
many pairs as possible to determine the relative performance
of the strategy under different correlation measures. We elaborate on the challenges we encountered while designing and backtesting the strategy in Matlab, even when the correlation matrices had been computed beforehand, and then
describe how we plan to integrate the strategy directly into
MarketMiner in order to speed up the backtesting process
by overcoming the main bottleneck, the computation of all
pair-wise correlations. Section II provides an overview of the
MarketMiner system, followed by Section III which provides
details on our canonical pair trading strategy. Section IV describes the usual backtesting process, and how ours differs
given we want to analyze more data and more pairs than
ever before, while Section V will provide some preliminary
performance results. Section VI concludes the paper and
suggests interesting avenues for future work.
II. OVERVIEW
Pair trading can be broadly categorized into three forms:
fundamental, statistical, and risk. A fundamental pair is a pair that has been highly correlated over a historical period, usually a few years or more, and whose stocks often belong to the same industry or sector. A few well-known pairs are Exxon/Chevron,
UPS/Fedex and Wal-Mart/Target. A risk pair occurs when
a company is about to merge with or acquire another one,
and thus the two securities will become highly correlated in
anticipation of the adjusted price levels. A statistical pair refers to a pair that may or may not be fundamentally linked, but has been found to be highly correlated over a given historical period, with a high degree of statistical certainty. Intra-day statistical pairs trading is a high-turnover strategy that uses only
very recent data to determine correlations (e.g., the last few
hours or days at most). This means that the strategy makes
no assumptions that a pair will remain correlated next year or
month, but has a certain level of confidence that the pair will
remain correlated for the next few hours or days. Generally
speaking, we would expect the correlations to remain stable for
approximately the same amount of time used in the correlation
calculation.
The usual routine for a fundamental pair trader is to first
identify a number of candidate pairs. Each pair is then backtested over a given set of data and parameter sets before being
promoted to a live trading environment. The exact method
used to identify and backtest pairs differs from trader to trader.
Some traders may employ a rigorous statistical analysis, while
others simply “eye-ball” two charts to determine the degree of
correlation. In live trading, the number of pairs monitored per
trader can range from a few to a thousand or more; once the
number of pairs exceeds what a human can watch, software
for monitoring the pairs must be utilized.
In today’s fast-paced trading environments, it is increasingly
true that to out-compete the competition, we must out-compute
them [11]. The explosive trend toward automated trading and
the availability of tick data at sub-millisecond rates introduces
new demands and opportunities which require quick online
analysis and decision processing. MarketMiner is an ongoing
research project that addresses this data analysis problem
by supporting the computational workload associated with
performing market-wide backtesting of trading strategies. The
original design of MarketMiner was a basic MPI-enabled
pipeline for processing quote data [12], and has since been
extended to support arbitrary directed acyclic graph (DAG)
stream processing workflows. One of the strengths of MPI is
that it is the de-facto standard for message-passing parallel
programming and there are a large number of high quality
open-source numerical libraries available.
Given the requirements of a pair trading strategy, the enabling feature of MarketMiner is its ability to handle a large
amount of market-wide, high frequency “tick” data from a live
feed or from a historical database, and use this data to produce
large correlation matrices in an online fashion. One obvious
challenge in working with high-frequency data is due to its
sheer volume: a single day’s worth of uncompressed Trade and Quote (TAQ) data¹ typically consumes over 50 gigabytes
of disk space! While research using high-frequency data
appears to be gaining momentum, the ability to incorporate the
data in a market-wide backtesting context has been limited due
to the lack of a scalable solution for processing and analyzing
such data.
It is well-known that the quality of high-frequency real-time stock quote data is low, making it difficult to use in measuring correlation. Due to its high frequency it may contain a large
proportion of transmission or human errors. Indeed, traditional
correlation measures are quite sensitive to outliers and this
presents a major challenge. Traditionally, traders use a variety
of data filters to filter out the ‘bad’ data, and then use
the standard Pearson definition to find correlation. This way
correlation is somewhat more robust and reliable. However,
this approach still has potential bias due to choice of filter. The
MarketMiner system has the ability to use a robust correlation
¹ TAQ data is a consolidated dataset of all equity trades and quotes from the NYSE, NASDAQ and AMEX.
measure, Maronna correlation, which is much less sensitive
to outliers and smooths the underlying timeseries used for
computing correlations [13]. Despite these advantages, the
robust method is computationally expensive and thus not
commonly used in statistical software packages, especially
those that operate on real-time data. The MarketMiner system
overcomes this difficulty by implementing a parallel algorithm
for computing robust correlation matrices [14]. The original
work investigated its scalability as an offline algorithm, and
more recently in an online setting [12].
III. A CANONICAL PAIR TRADING STRATEGY
Our high-frequency pair trading strategy exploits the power of MarketMiner to perform a real-time search for short-term
correlation divergences. Unlike other pair trading approaches
described in the literature [15], [16], [17], [18], we are able
to take a brute-force approach looking over all possible pairs
and combinations of parameters. Table I describes the strategy
parameters and typical values we use within our experimental
framework. Where these parameters arise in our pairs trading
algorithm is discussed below. All time-based parameters are in time units, defined by the time window ∆s and indexed by s = {0, . . . , smax}, where smax defines the total number of ∆s intervals in the analysis. For example, there are exactly 23400 seconds in a typical trading day, and if ∆s = 30 seconds, then there will be smax = 23400/30 = 780 intervals.
We let K denote the set of parameter sets under consideration, and use k to index a particular parameter set. Thus, for instance {∆s = 30, Ctype = Pearson, A = 0.1, M = 100, W = 60, Y = 10, d = 0.01, ℓ = 2/3, RT = 60, HP = 30, ST = 20} is one element of the set K.
parameters gives rise to a unique pair trading strategy. Using
the MarketMiner system we are able to backtest a trading
strategy for each pair p ∈ Φ, with Φ denoting the set of all
pairs under consideration, and for each parameter vector k ∈
K over the given time period. In our high-frequency analysis
we use the bid-ask midpoint (BAM) as an approximation to
the stock price, and from that calculate the 1-period return.
The bid price is the highest price someone is willing to pay for a stock, and the ask price is the lowest price at which someone is willing to sell it. We choose to use the BAM instead of
just the actual price as it allows for a closer approximation
to the actual price level between trades (e.g., as opposed to
using regression), which is especially useful for stocks which
trade infrequently. Quote data is much higher in frequency
and volume than trade data, which makes processing and
analyzing the data more challenging, and thus a particularly
well-suited problem for an HPC solution. A small sample of
intra-day quote data is shown in Table II. Raw tick TAQ data
contains every raw quote, not just the best offer, so there
can be many spurious ticks originating from various sources,
some human typing errors but mainly from electronic trading
systems generating test quotes (e.g., when testing a new
feature) or far-out limit orders which have little probability
of getting filled. Raw data, whether from a database or a live
stream, needs to be cleaned before being analyzed and used in
Parameter | Description                                                                         | Values
∆s        | Time window                                                                         | 30 sec
Ctype     | Type of correlation measure                                                         | Pearson, Maronna, Combined
A         | Minimum correlation for trading                                                     | 0.1
M         | Time window for correlation calculation                                             | 50, 100, 200
W         | Time window of average correlation calculation                                      | 60, 120
Y         | Time window over which divergences from the correlation average are considered      | 10, 20
d         | Divergence level from correlation average required to trigger a trade               | 0.01%, 0.02%, 0.03%, 0.04%, 0.05%, 0.10%
ℓ         | Retracement level for determining when to reverse a position                        | 1/3, 2/3
RT        | Time window for measuring the spread level (used in calculating retracement level)  | 60
HP        | Maximum holding period for any position                                             | 30, 40
ST        | Minimum time before market close required to open a new position                    | 20

TABLE I
STRATEGY PARAMETER DESCRIPTIONS AND VALUES
Timestamp | Symbol | Bid Price | Ask Price | Bid Size | Ask Size
09:30:04  | NVDA   | 16.38     | 20.1      | 3        | 3
09:30:04  | NVDA   | 18.23     | 18.26     | 3        | 3
09:30:04  | NVDA   | 18.24     | 18.26     | 1        | 4
09:30:04  | ORCL   | 19.56     | 19.59     | 2        | 104
09:30:04  | ORCL   | 19.58     | 19.62     | 1        | 1
09:30:04  | SLB    | 82.81     | 83.11     | 1        | 1
09:30:04  | TWX    | 14.01     | 14.2      | 18       | 5
09:30:04  | TWX    | 14.01     | 14.65     | 2        | 6
09:30:04  | BK     | 41.11     | 42.1      | 41       | 1
09:30:04  | BK     | 41.13     | 41.5      | 1        | 1
09:30:04  | BK     | 41.11     | 42.1      | 38       | 1
09:30:04  | BK     | 41.13     | 41.5      | 3        | 1

TABLE II
SAMPLE DATA FROM THE NYSE TAQ DATASET.
a financial model or strategy. There are many techniques used
in practice to clean tick data [19], [20], each having its own
advantages and disadvantages. The exact method of cleaning will vary depending on the particular task at hand, and trade-offs between the quality of cleaning and delay need to be managed; e.g., in a real-time environment the cleaning process needs to be fast and efficient. Our approach is to use a very
simple but effective TCP-like filter to eliminate prices that are
more than a few standard deviations from their corresponding
moving average and deviation. The remaining outliers will be
gracefully down-weighted by the robust correlation method
implemented in MarketMiner.
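The filtering step described above can be sketched as follows. This is a minimal illustration, not MarketMiner's actual implementation: it assumes a TCP-style exponentially weighted moving average and deviation (the smoothing constants mirror TCP's RTT estimator), and the cutoff multiplier is chosen loosely for illustration.

```python
# Minimal sketch of a "TCP-like" tick filter: maintain an exponentially
# weighted moving average (avg) and mean absolute deviation (dev) of the
# price, and reject ticks more than k deviations away from the average.
# All constants here are illustrative assumptions.

def make_tick_filter(alpha=0.125, beta=0.25, k=10.0):
    avg, dev = None, 0.0

    def accept(price):
        nonlocal avg, dev
        if avg is None:                     # first tick seeds the estimator
            avg = price
            return True
        err = price - avg
        if dev > 0 and abs(err) > k * dev:
            return False                    # outlier: reject, leave estimator as-is
        avg += alpha * err                  # update moving average
        dev += beta * (abs(err) - dev)      # update moving deviation
        return True

    return accept

accept = make_tick_filter()
ticks = [20.00, 20.01, 19.99, 20.02, 95.00, 20.01]   # 95.00 is a spurious tick
clean = [p for p in ticks if accept(p)]
```

Any outliers that survive a filter of this kind are then gracefully down-weighted by the robust correlation method, as noted above.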
The enabling aspect of this market-wide strategy is the
ability to quickly compute a large correlation matrix using
a sliding window of recent data points. The input to each pair-wise correlation calculation at time s is a pair of vectors Xi(s) and Xj(s), containing the last M log-returns for stocks i and j respectively. Each element xi ∈ Xi(s) is defined as xi = log(ri(s)), where log(·) is the natural logarithm operator and ri(s) = Pi(s)/Pi(s−1) is the 1-period return, with Pi(s) and Pj(s) the prices of stocks i and j at time s.
The reason for using log-returns instead of the raw prices is
twofold: taking the difference of the returns yields a stationary
process, while taking the log of the differences results in a
(log) normal distribution; both results are necessary in order
to utilize statistics which assume stationarity and normality.
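The bid-ask midpoint, log-return, and correlation definitions above can be sketched as follows. This is a small self-contained illustration using the standard Pearson definition (the Maronna robust variant used by MarketMiner is more involved); the price quotes are hypothetical.

```python
import math

def bam(bid, ask):
    """Bid-ask midpoint, used as a price proxy between trades."""
    return 0.5 * (bid + ask)

def log_returns(prices):
    """1-period log-returns: x(s) = log(P(s) / P(s-1))."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def pearson(x, y):
    """Plain Pearson correlation of two equal-length return vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Two perfectly co-moving (hypothetical) quote series:
p_i = [bam(b, a) for b, a in [(30.00, 30.02), (30.10, 30.12), (30.05, 30.07)]]
p_j = [bam(b, a) for b, a in [(60.00, 60.04), (60.20, 60.24), (60.10, 60.14)]]
c = pearson(log_returns(p_i), log_returns(p_j))
```

In the real system this calculation is repeated over a sliding window of the last M returns for every pair, which is exactly the workload the parallel correlation engine takes over.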
The following pseudo-code outlines a canonical statistical
pair trading strategy, defined by the particular set of stocks Φ
and parameter set k ∈ K over a given trading day t. We want
to choose Φ as the full set of stocks which may potentially
be chosen for backtesting, so as to optimize the strategy to
perform well under that set of stocks. If there are n stocks then |Φ| = n(n−1)/2. If our goal was to backtest over all US stocks, of which there are approximately 8000, this would require our strategy to support backtesting on over 32 million pairs!
While many stocks are not liquid enough (too few trades) to be
considered in our style of pair trading, the number of potential
pairs is still so large that a parallel algorithm is essential for
real-time trading.
1) At time s, calculate the average correlation over the last W time intervals as

   C̄i,j(s) = ( Σ_{σ=s−W+1}^{s} Ci,j(σ) ) / W,

   where Ci,j(σ) is the correlation coefficient calculated using the log-return vectors Xi(σ) and Xj(σ).
2) Check to see if C̄i,j (s) is greater than threshold A, and
the current correlation coefficient at time s has diverged
more than d% from C̄i,j (s) within the last Y time
intervals. We refer to d as the divergence threshold. Note
that typical divergence levels for pair traders with longer
time horizons tend to be larger, due to the fact that the
volatility will also be greater. With our intra-day strategy
we use a smaller divergence level to account for lower
volatility.
3) If no divergence is detected or C̄i,j (s) ≤ A, move on to
the next pair. If a divergence is detected, trigger a pair
trade. Go long on the stock that has “under-performed” and short the one which has “over-performed”. The over-performer is simply the one which has a higher W-period return relative to the other.
4) To choose a long/short ratio, we choose a ratio that keeps
us as close to cash-neutral as possible, but just slightly
on the long side. For example, if we are buying MSFT
at $30 and selling IBM at $130, a ratio of 5:1 would
give us an allocation of $150 long and $130 short. To
be more specific, suppose we have two prices Pi > Pj, and we want to long stock i and short j; then we want the ratio of long/short shares for stocks i and j to be 1:x, where

   x = ⌊Pi/Pj⌋.

Similarly, if we short i and long j, then

   x = ⌈Pi/Pj⌉.
5) The next step is to decide when to reverse the positions.
We reverse the position when we have reached a retracement level L, or if a given amount of time has elapsed
since we entered the position. The retracement level is
calculated in the following way. Let Sl, Sh and S̄ be the low, high and average of the spread during the last M time intervals, and Se be the spread of the two stock prices at the time we opened the position. If Se ≤ S̄,
then
L = Sl + ℓ(Sh − Sl ),
and if Se ≥ S̄, then
L = Sh − ℓ(Sh − Sl )
where 1 > ℓ > 0 is the retracement parameter. For example, if the high of a MSFT-IBM spread is $100, and the low $80, and we opened the position when the spread was around $80, and ℓ = 1/3, then we reverse when the spread has reached the retracement level

   L = $80 + (1/3)($100 − $80) = $80 + (1/3)$20 = $86.67.

Similarly, if we opened the position when the spread was around $100, then

   L = $100 − (1/3)($100 − $80) = $100 − (1/3)$20 = $93.33
and we will reverse the position when the spread is lower
than L. We also need to add a time-based reversal trigger
in case the retracement level is never reached. Therefore,
we choose not to hold a position longer than HP time
periods. Thus after HP time periods the position is
reversed, regardless of the situation. Finally, we should
reverse all positions at the end of the trading day. We
note here that the key to a good strategy is to mitigate
losses and control risk. Thus, we point out, but do not
consider any further, several other reversal conditions.
The first is an absolute stop-loss: If the spread continues
to drop rapidly, we want to exit and minimize our loss.
The second is correlation reversion: If the correlation returns to within the average range (i.e., [C̄(1 − d), C̄]),
then we reverse the positions. The reasoning behind
correlation reversion is that the prices may have adjusted
to new levels and watching for spread reversion may not
give us this information.
6) Once the position is reversed, we calculate the return Ri,j for the pair of stocks over both the long and short positions:

   Ri,j = πi,j / (Pi Ni + Pj Nj)

   where πi,j is the profit/loss of the trade (in dollars), Pi and Pj are the prices and Ni and Nj the number of shares held for stocks i and j respectively. For example, suppose a trade was to long MSFT at $30 and short IBM at $130 with the ratio of MSFT to IBM 5:1. If when we reverse the position MSFT is $29 and IBM is $120, then we profit ($29 − $30)(5) + ($120 − $130)(−1) = $5 from this trade. The total cost, not including transaction costs, is 5($30) + 1($130) = $280, and thus the return is $5/$280 ≈ 1.8%.
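The position-sizing, retracement, and return formulas from steps 4 to 6 can be sketched together as follows. This is a minimal illustration using the worked MSFT/IBM numbers from the examples above; the function names and the assumed spread average of $90 are hypothetical.

```python
import math

def share_ratio(p_i, p_j, long_i):
    """Step 4: the x in the 1:x share ratio, keeping the position roughly
    cash-neutral but slightly long. Assumes p_i > p_j; x counts shares of
    the cheaper stock j: floor(p_i/p_j) when longing i, ceil when shorting i."""
    return math.floor(p_i / p_j) if long_i else math.ceil(p_i / p_j)

def retracement_level(s_low, s_high, s_entry, s_avg, ell):
    """Step 5: reversal level L inside the recent spread range [s_low, s_high],
    measured from the side of the range we entered on."""
    if s_entry <= s_avg:
        return s_low + ell * (s_high - s_low)
    return s_high - ell * (s_high - s_low)

def trade_return(pnl, p_i, n_i, p_j, n_j):
    """Step 6: R = pnl / (P_i*N_i + P_j*N_j), return on gross capital."""
    return pnl / (p_i * n_i + p_j * n_j)

# Worked numbers from the text: long 5 MSFT at $30, short 1 IBM at $130.
x = share_ratio(130.0, 30.0, long_i=False)            # shares of the cheap leg
L = retracement_level(80.0, 100.0, 80.0, 90.0, 1 / 3) # entered near the low
pnl = (29.0 - 30.0) * 5 + (120.0 - 130.0) * (-1)      # reverse at $29 / $120
R = trade_return(pnl, 30.0, 5, 130.0, 1)
```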
IV. BACKTESTING OF TRADING STRATEGIES
The next natural question is to ask which configuration
of parameters results in the best performance. One way to
compare them is to test on historical data and measure the
performance of each. This procedure is called backtesting.
Backtesting a pair trading strategy on a particular pair of stocks
involves choosing a suitable set of historical data H, running
the strategy on H and noting wins and losses of each trade and
computing some measure of performance, such as cumulative
returns. For comparison, one can do backtesting on alternative
configurations of a given pair trading strategy on the same
data H and compare the relative performance results. This
basic procedure can be done across a variety of strategies,
pairs, sets of historical data and performance measures to help
identify the best overall trading strategy. In our experiments
we focused on testing the performance of trading strategies
where the major difference was in the method of correlation.
The raw data used in the experiments are TAQ bid-ask
data for 61 highly liquid US stocks frequently traded by
professional pair traders. Since we examine all pairs for a given set of stocks, the results presented here are based on (61 choose 2) = 1830 pairs. Our strategy works on high-frequency time
frames, and thus the total dataset we consider here is limited
to one month (March 2008) which consists of 20 trading
days. While designing our market-wide pair trading strategy
we performed some preliminary experiments using Matlab to
get a feel for the different parameters and range of values
they would take. These values are given in Table I. While
this approach worked reasonably well for a small dataset of
61 stocks, we are aware that this solution will not scale.
In the following paragraphs we briefly describe our initial
experiments using Matlab and how the need for scalability
motivated us to consider a design more tightly integrated into
MarketMiner.
Approach 1: Using Matlab to read in MarketMiner’s precomputed correlation matrices. In this approach we turn
sets of pre-computed correlation matrices Ωt (s) generated by
MarketMiner into 1830 time series {Cpt (s) : M ≤ s ≤ smax }
for each trading day t, by picking out the relevant entry of
each correlation matrix. We soon abandoned this approach as
we were unable to read in multiple matrices due to memory
constraints. Each matrix Ωt(s) is 61 × 61, and when we use ∆s = 30 and M = 100, we need to read in 680 such matrices in order to define a particular time series {Cpt(s) : M ≤ s ≤ smax}, and that is for just one day t out of the 20 under consideration!
Approach 2: Using Matlab to re-create the correlation
timeseries, not utilizing MarketMiner correlation matrices.
In this approach we did not use MarketMiner’s correlation
matrices, but rather re-created all correlation timeseries in
Matlab. The caveat here is that calculating the Maronna
correlation coefficients independently no longer assures the
resulting matrix is positive semi-definite (PSD). Nonetheless,
using Matlab to generate our correlations directly proved to
be more efficient than reading the pre-computed matrices into
Matlab. We were able to produce a daily return vector Rpt,k for
a given pair p, day t and parameter vector k in approximately 2
seconds, depending on the specific pair and parameters, using
an openSUSE Linux PC with a dual-core Intel Pentium 4
2.80 GHz processor. Even when using a Sun Grid Engine
scheduler (SGE) to distribute jobs, it is clear the computations
are prohibitively slow. With the need to produce 1830 (number
of pairs) · 20 (number of business days in March, 2008) · 42
(number of parameter sets) daily return vectors to track returns
over a given month, a rough estimate for the computation
time on a single computer is 854 hours. Using this same
scenario but backtesting over a year would take about 445
days, and even worse, scaling up to 1000 stocks over just one month would take an estimated 19425 days, or 53 years! We
were able to reduce the computation time by creating scripts
which sent out independent Matlab jobs to a Sun Grid Engine
scheduler. This solution still has problems as the matrices are
still not PSD, and more importantly does not allow for a tight
interaction between independent pairs throughout the course
of a trading day, which can be used to optimize certain aspects
of the strategy.
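The back-of-envelope scaling arithmetic above is easy to reproduce. The sketch below assumes the quoted figure of roughly 2 seconds per daily return vector (one pair, day, and parameter set) measured on the test machine; the helper name is hypothetical.

```python
# Back-of-envelope estimate of the sequential Matlab backtesting cost,
# assuming ~2 seconds to produce one daily return vector per (pair, day,
# parameter set) triple, as reported for the test machine.

SECONDS_PER_VECTOR = 2.0

def backtest_hours(n_pairs, n_days, n_param_sets):
    return n_pairs * n_days * n_param_sets * SECONDS_PER_VECTOR / 3600.0

pairs_61 = 61 * 60 // 2                           # 1830 pairs from 61 stocks
month_hours = backtest_hours(pairs_61, 20, 42)    # 20 trading days, 42 sets
```

At this rate one month of data already costs roughly 854 CPU-hours, which is what motivates moving the computation into MarketMiner's parallel correlation engine.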
Approach 3: The integrated MarketMiner solution. Given
the challenging task of analyzing market-wide correlation
matrices, it seems apparent that a custom implementation
integrated directly with the MarketMiner platform is necessary
to achieve the desired scale and timing objectives. Figure 1
illustrates a potential pair trading system using MarketMiner
to power a pair trading strategy with a particular set of parameters. The advantage of a tight integration with MarketMiner
is that the outputs from each strategy (trade decisions) can be gathered by a master process to perform additional tasks such as risk management and liquidity provisioning. Also, aggregating the results into a single basket, as opposed to many individual trade orders, allows the trading system to utilize a sophisticated list-based algorithm to optimize the actual execution of the trades.
Evaluating a Trading Strategy
The way in which an intraday strategy is evaluated differs from that of strategies which make trades only occasionally
(e.g., every few days or even just once a month, in the case of
a pension or mutual fund). Since we have many trades each
day, we want to evaluate how the strategy performs within the
day, but also over multiple days through a given period of time.
We adapt some of the trading model evaluation measurements
from the high frequency finance literature [21]. In a given trading day t, for each pair p and parameter vector k, a set Rp^{t,k} of returns is generated. Therefore the total set of returns for the trading period is just the union of each day's returns:

   Rp^k = ∪_{t=1}^{T} Rp^{t,k}.        (1)
The following analysis uses three key performance metrics
commonly used to assess the performance of a trading strategy: cumulative returns, maximum draw-down and win-loss
ratio. These performance measures can be defined either over a
given pair p and parameter set k, or summarized over all pairs
or over all parameter sets. Each of the three variants provides
a different view of the results. For example summarizing the
results over all pairs but for a given parameter set indicates
which parameters are most effective, while summarizing over
all parameter sets but with a given pair indicates that the pair may be a particularly good candidate for pair trading and less sensitive to the choice of parameters. The formulas for each of the three performance measures are given below.
1) Cumulative Returns: Cumulative returns measures the
equity growth of a particular strategy. This measure is
appropriate when we assume that the strategy always
reinvests the total available capital at the start of each
period. The daily cumulative return for pair p and
parameter vector k on day t is defined as
   rp^{t,k} = Π_{q=1}^{|Rp^{t,k}|} (r_{p,q}^{t,k} + 1) − 1        (2)

where r_{p,q}^{t,k} is the qth return on day t. The total cumulative return rp^k over the entire trading period, again for pair p and parameter set k, is calculated as

   rp^k = Π_{t=1}^{T} (rp^{t,k} + 1) − 1.        (3)
Both the daily and total cumulative returns can be further
summarized by aggregating the returns over all pairs
using a given parameter set, or over all parameter sets
but for a particular pair. These measures can be used to
test the effects of pairs choice and parameter choice on
returns of trading strategies, and thus help with refining
trading strategies. For example, the total cumulative
return over all pairs on day t using parameter set k is
   r^{t,k} = Π_{p∈Φ} (rp^{t,k} + 1) − 1        (4)

and similarly, the total cumulative return for pair p on day t over all parameter sets is

   rp^t = Π_{k∈K} (rp^{t,k} + 1) − 1.        (5)
The same summary calculations can be applied to daily
cumulative returns.
2) Maximum Drawdown: Maximum drawdown is a measure of the riskiness of a trading strategy. It can be interpreted as the “worst peak to valley drop” for the pair p:

   MDDp = max_{k∈K} ( r_{p,q_a}^k − r_{p,q_b}^k : q_a, q_b ∈ Rp^k, q_a ≤ q_b ),        (6)

where r_{p,q_a}^k and r_{p,q_b}^k are the total returns for pair p using parameter set k from trade number 1 to q_a and q_b, respectively. Note that we could also define maximum drawdown MDD^k for a given parameter set k. Moreover, we can define the two variants of maximum
drawdown on a daily basis for pair p and parameter set k, which is:

   MDDp^k = max( r_{p,t_a}^k − r_{p,t_b}^k : t_a, t_b ∈ T, t_a ≤ t_b ).        (7)

Fig. 1. MarketMiner enabling rapid backtesting or live execution of a pair trading strategy. [Figure: data adapters (a Live Collector for live data feeds, a File Collector for custom TAQ files, and a DB Collector for a MySQL DB) feed quotes and prices into the analytics components: an OHLC bar accumulator (∆s = 15 sec), technical analysis producing 15-sec returns, and a parallel correlation engine (M = 100, correlation over 25 mins), which drive the pair trading strategy; order requests are emitted with or without human confirmation. Components are linked together using MPI-based middleware.]
3) Win-Loss Ratio: The win over loss trades ratio provides
information on the type of strategy used by the model.
Its definition is

   Wp^k / Lp^k = |{r_{p,q}^k : r_{p,q}^k > 0, q ∈ Rp^k}| / |{r_{p,q}^k : r_{p,q}^k < 0, q ∈ Rp^k}|,        (8)

where {r_{p,q}^k : r_{p,q}^k > 0} is the set of trades that give positive returns, and {r_{p,q}^k : r_{p,q}^k < 0} is the set of trades with negative returns. The numerator corresponds to the number of winning trades and the denominator is the number of losing trades over the same period. If we are interested in the difference in performance of strategies with different parameter values, we can use

   W^k / L^k = |{r_{p,q}^k : r_{p,q}^k > 0, p ∈ Φ, q ∈ Rp^k}| / |{r_{p,q}^k : r_{p,q}^k < 0, p ∈ Φ, q ∈ Rp^k}|,        (9)

where again Φ denotes the set of all pairs under consideration.
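The three performance measures can be sketched directly from the definitions above. This is an illustrative stand-alone version operating on one list of per-trade returns; the drawdown variant compounds an equity curve in the spirit of Equations (6) and (7) rather than reproducing the paper's exact cumulative-return differences, and the sample returns are hypothetical.

```python
def cumulative_return(returns):
    """Eqs. (2)/(3): compound a sequence of simple returns."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0

def max_drawdown(returns):
    """Worst peak-to-valley drop of the compounded equity curve,
    in the spirit of Eqs. (6)/(7)."""
    equity, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        mdd = max(mdd, peak - equity)
    return mdd

def win_loss_ratio(returns):
    """Eqs. (8)/(9): winning trades over losing trades.
    Assumes at least one losing trade."""
    wins = sum(1 for r in returns if r > 0)
    losses = sum(1 for r in returns if r < 0)
    return wins / losses

trades = [0.01, -0.005, 0.02, -0.01, 0.015]   # hypothetical per-trade returns
cum = cumulative_return(trades)
```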
V. RESULTS
The results presented here focus on some preliminary
performance data from trading 61 stocks using our Matlab
implementation. The purpose is to demonstrate how a wide
range of parameters can be backtested to find configurations
that result in different strategies which can be matched to
particular risk profiles.
Strategy Performance Results
Performance comparisons of two different trading strategies can be done across several dimensions: the type of correlation measure used, the choices of parameters and pairs, etc. We
focus attention on differences in performance arising from
different choices of correlation type. With a large set of
returns data and their corresponding performance measures
we may ask whether this information can help to shed some
light on which strategies are more effective: those using Pearson, Maronna or Combined correlation. We analyze three performance measures (cumulative monthly return as defined in Equation (3), maximum daily drawdown (7), and the win-loss ratio (9)) and aggregate the data by taking an average over different parameter sets.
Here are the specific details for our analysis. We may
consider Pearson, Maronna and Combined correlations as our
treatments, which are applied to our 1830 pairs of stocks,
with other factors (not considered part of our treatment)
consisting of the remaining elements in our parameter sets:
{∆s , A, M, W, Y, d, ℓ, RT, HP, ST }. We run the experiments
on different levels of these factors to account for bias of
choosing any one level. Each pair of stocks receives each
treatment at each level of the remaining factors.
The response from each treatment is one of our three
performance measures - cumulative monthly return, maximum
daily draw down, and win-loss ratio. We discuss in detail the
case of cumulative monthly returns, but the other cases are
similar. Recall our notation that $r_p^k$ is the total cumulative return of pair p using parameter vector k over the period of one month. To highlight the fact that there are three treatments we let $r_p^{C_{type},k'}$ denote the return with a specified correlation type Ctype, with $k' \in K'$ representing the 14 different parameter vectors of the form $\{\Delta_s, M, W, d, \ell, RT, HP, ST, Y\}$. Thus there are 14 levels of non-treatment factors, and each pair has a response $r_p^{C_{type},k'}$ for each of these levels. Our approach is to average these responses over the different factor levels to get a single estimate of the performance of pair p using correlation type Ctype. Thus, the sample observations from our populations are average cumulative returns over the month:

$$\bar{r}_p^{C_{type}} = \frac{\sum_{k' \in K'} r_p^{C_{type},k'}}{|K'|}$$
where the average is over the set of alternate parameter vectors $K'$. We see that $\bar{r}_p^{C_{type}}$ is a measure of returns for pair p when using Ctype as the type of correlation. We define average maximum daily drawdown and win-loss ratio for each pair of stocks and each correlation measure analogously, again where the average is over the 14 different levels of the non-treatment factors {∆s, A, M, W, Y, d, ℓ, RT, HP, ST}. Tables III, IV and V contain descriptive statistics for each performance measure with respect to the different correlation types. The "best" value for each measurement is shown in bold. In Table III we also show the Sharpe ratio, a measure of risk-adjusted return defined as

$$SR = \frac{\bar{r}}{\sqrt{\hat{\sigma}^2}}$$

where $\bar{r}$ is the average return and $\hat{\sigma}^2$ is the variance of the return around its mean.
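As a concrete illustration of this averaging scheme, the following Python sketch computes the per-pair average return over the parameter sets and the resulting Sharpe ratio for each correlation type. This is not the paper's code (the implementation is in Matlab); the array names and the synthetic returns data are purely hypothetical.

```python
import numpy as np

# Hypothetical sketch with synthetic data; names and numbers are illustrative.
rng = np.random.default_rng(0)
n_pairs, n_param_sets = 1830, 14  # 61 stocks -> 1830 pairs, |K'| = 14

# returns[c][p, j] = cumulative monthly return of pair p under parameter set k'_j
returns = {c: 1.0 + 0.1 * rng.standard_normal((n_pairs, n_param_sets))
           for c in ("Maronna", "Pearson", "Combined")}

for ctype, r in returns.items():
    r_bar = r.mean(axis=1)          # average over K': one observation per pair
    mean, var = r_bar.mean(), r_bar.var(ddof=1)
    sharpe = mean / np.sqrt(var)    # SR = mean return / sqrt(variance)
    print(f"{ctype}: mean={mean:.4f}, SR={sharpe:.2f}")
```

Averaging over `axis=1` collapses the 14 non-treatment levels, leaving one observation per pair per correlation type, exactly as in the definition above.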
TABLE III
AVERAGE CUMULATIVE MONTHLY RETURNS

                       Correlation type Ctype
                     Maronna    Pearson    Combined
Mean                  1.1473     1.1521     1.1098
Median                1.1204     1.1278     1.0979
Standard Deviation    0.1235     0.1085     0.0747
Sharpe Ratio          9.2899    10.6184    14.8568
Skewness              2.8484     1.9281     1.4871
Kurtosis             16.6541     9.4091     7.1706
TABLE IV
AVERAGE MAXIMUM DAILY DRAWDOWN

                       Correlation type Ctype
                     Maronna    Pearson    Combined
Mean                 1.6662%    1.5433%    1.5666%
Median               1.2446%    1.1533%    1.1702%
Standard Deviation    1.5481     1.4606     1.4668
Skewness              3.4443     3.5005     3.8890
Kurtosis             21.5922    21.5295    27.3131
TABLE V
AVERAGE WIN-LOSS RATIO

                       Correlation type Ctype
                     Maronna    Pearson    Combined
Mean                  1.2697     1.2724     1.2787
Median                1.2652     1.2688     1.2689
Standard Deviation    0.1263     0.1269     0.1356
Skewness              0.2897     0.2521     0.3002
Kurtosis              3.0781     3.0665     3.0991
In addition to these tables, box plots are included to give a qualitative appreciation of the data; see Fig. 2. On each
box, the central mark is the median of the distribution, the
edges of the box are the 25th and 75th percentiles (or first
and third quartiles), the whiskers extend to the most extreme
data points not considered outliers, and outliers are plotted
individually. From these plots we see that the distributions
contain a significant number of outliers with some abnormally
high values.
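The box-plot conventions described above can be computed directly. The following Python sketch (illustrative only, not the paper's Matlab code) derives the median, quartiles, Tukey whiskers and outliers for a small made-up sample:

```python
import numpy as np

# Box-plot statistics under the Tukey convention: whiskers extend to the most
# extreme points within 1.5*IQR of the quartiles; points beyond are outliers.
def boxplot_stats(x):
    x = np.sort(np.asarray(x, dtype=float))
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = x[(x >= lo) & (x <= hi)]
    outliers = x[(x < lo) | (x > hi)]
    return {"median": med, "q1": q1, "q3": q3,
            "whiskers": (inliers.min(), inliers.max()),
            "outliers": outliers}

# Made-up sample: the abnormally high value 2.5 is flagged as an outlier.
stats = boxplot_stats([1.0, 1.1, 1.1, 1.2, 1.2, 1.3, 2.5])
print(stats["median"], stats["whiskers"], stats["outliers"])
```

This mirrors how abnormally high return values show up as individually plotted points above the upper whisker in Fig. 2.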
The data presented here lead to several interesting observations. For simplicity, we will call any trading strategy using correlation type Ctype a Ctype strategy. First, Pearson strategies have a higher mean cumulative return than Maronna and Combined strategies, but also a higher standard deviation than Combined. As a result, the Sharpe ratio of Combined strategies is much higher than that of both Maronna and Pearson.
The average maximum drawdown of Pearson strategies is lower than that of both Maronna and Combined. This means that if we used the automated trading strategies based on the three correlation measures to trade for a month, Pearson strategies would have the smallest average "worst peak-to-valley" drop. Indeed, Combined and Pearson strategies have the same average absolute peak-to-valley drop, but Pearson has a higher peak return value, which results in a smaller maximum drawdown. Since we would prefer a low maximum drawdown, Maronna strategies appear to be the less favorable choice according to this performance measure.
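The "worst peak-to-valley drop" can be sketched in a few lines of Python. This assumes the standard relative-drawdown definition (worst drop from a running peak, expressed as a percentage), which may differ in detail from the paper's Equation (7); the function name and sample path are hypothetical.

```python
import numpy as np

def max_drawdown(cum_returns):
    """Worst peak-to-valley drop of a cumulative return path, in percent."""
    cum = np.asarray(cum_returns, dtype=float)
    running_peak = np.maximum.accumulate(cum)   # highest value seen so far
    drawdowns = (running_peak - cum) / running_peak
    return 100.0 * drawdowns.max()

path = [1.00, 1.05, 1.02, 1.10, 1.01, 1.08]     # made-up daily path
print(f"max drawdown: {max_drawdown(path):.2f}%")  # prints "max drawdown: 8.18%"
```

Here the worst drop is from the peak 1.10 down to 1.01, i.e. 0.09/1.10 ≈ 8.18%, even though the path later recovers.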
As for win-loss ratios, we find that the results for all three strategies are fairly similar, with Combined strategies having a small advantage in terms of both mean and standard deviation. Taken together with the fact that Combined strategies generate much lower cumulative returns over the month on average, this suggests that Combined correlation is more conservative but generates lower returns, whereas Pearson can generate higher returns but bears more risk. Maronna strategies lie somewhere in between.
Also included are measures of skewness and kurtosis, the third and fourth standardized moments of a distribution respectively. Generally speaking, skewness measures the lack of symmetry of a distribution around its mean. A better strategy would have a higher degree of skew to the right (a large positive skewness statistic), which means a greater proportion of samples lie above the mean than below it. Kurtosis measures the degree to which a distribution is more or less "peaked" than a normal distribution. A greater kurtosis means there is a relatively greater probability of an observed value being either close to the mean or far from the mean. From the summary data in the tables we find that the cumulative returns of Maronna trading strategies are more skewed to the right and have fatter tails than the others, which suggests that more trades yield unusually high returns compared to, say, Pearson trading strategies. This can also be clearly seen in the box plots, and suggests that Maronna strategies yield high returns for select pairs. Identifying which pairs perform well is worthy of further investigation.
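These two moments are easy to compute directly. The sketch below (illustrative Python, not the paper's code) uses the convention of the tables, where a normal distribution has kurtosis 3 rather than excess kurtosis 0, and contrasts a symmetric sample with a right-skewed, fat-tailed one:

```python
import numpy as np

# Sample skewness and kurtosis as the third and fourth standardized moments.
# Note: kurtosis here is NOT excess kurtosis; a normal sample gives ~3.
def moments(x):
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean(), (z ** 4).mean()   # (skewness, kurtosis)

rng = np.random.default_rng(1)
normal = rng.standard_normal(100_000)
right_skewed = np.exp(rng.standard_normal(100_000))  # lognormal: long right tail

print(moments(normal))        # roughly (0, 3)
print(moments(right_skewed))  # large positive skew, kurtosis well above 3
```

The lognormal sample illustrates the Maronna pattern described above: most values sit below the mean while a small fraction are unusually large, producing positive skew and fat tails.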
We should also note that the mean is affected by the skewness of the distribution, and so we also give a robust estimate of central tendency, the median, which is less affected by a lack of symmetry in the population distribution. Notice that a simple comparison of medians leads to the same conclusions as the comparison of means.
It is important to stress that all of these simple comparisons
between values in the tables need to be examined on a more
rigorous standard of statistical significance in order to be
truly meaningful. To do so we may consider a few simple
inferential statistical tests. We discuss the ideas of a basic
scheme for designing tests on cumulative monthly returns as
one example of the type of analysis we are interested in,
and the other performance measures can be analyzed in a
similar fashion.

Fig. 2. Box plots for the three performance metrics: (a) average cumulative monthly returns, (b) average maximum daily drawdown, and (c) average win-loss ratio, for the Maronna, Pearson and Combined correlation types.

To be clear about the underlying statistical model on which we are basing our analysis, we can consider
our tests with respect to three populations. One population
is cumulative monthly returns of pairs averaged over the
14 different parameter sets using Pearson correlation in the
trading strategy amongst all ‘highly’ correlated pairs in the
market. The other populations are similarly defined, where
instead we use Maronna and Combined correlations. The
averaged cumulative monthly returns of the 1830 pairs yields
1830 sample data points per population.
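One simple possibility for such a test, since each of the 1830 pairs contributes one observation to each population, is a paired t-test on the per-pair differences. This is only a sketch of the idea and is not the paper's analysis; the data below are synthetic stand-ins, not the actual experimental returns.

```python
import numpy as np

# Paired t-statistic on per-pair differences between two correlation types.
# Compare against a t-distribution with n-1 degrees of freedom for a p-value.
def paired_t(a, b):
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    return d.mean() / (d.std(ddof=1) / np.sqrt(n))

# Synthetic stand-ins for the 1830 averaged cumulative monthly returns.
rng = np.random.default_rng(2)
pearson = 1.15 + 0.11 * rng.standard_normal(1830)
combined = pearson - 0.04 + 0.05 * rng.standard_normal(1830)

print(f"t-statistic: {paired_t(pearson, combined):.2f}")
```

Pairing by stock pair removes the large pair-to-pair variation, giving a far more powerful comparison than treating the three populations as independent samples.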
Details of this more rigorous statistical approach are not included in this paper, and will be the subject of further studies. The end goal of these future studies is to give a clearer picture of which correlation measure performs best under different scenarios. These initial results shed some light on the question and lead the way to further explorations.
VI. CONCLUSIONS
To design a pair trading strategy using high-frequency data,
and to backtest and compare with existing strategies, requires
intensive computational resources. This paper demonstrates
the limitations of using Matlab to meet this computational
challenge, and the promise of using parallel systems like that
of MarketMiner. MarketMiner can accelerate the data analysis
process to consider real-time trading across the whole market,
and gives the promise of fast and accurate backtesting across
many alternate strategies.
We have also presented some preliminary work on comparing pair trading strategies using three different measures of
correlation - Maronna, Pearson and Combined. Preliminary results suggest that there are some important differences among
these measures for trading, each with different strengths and
weaknesses in terms of their risk vs. return profiles. These
preliminary results motivate further investigation into determining the characteristics of each correlation measure. Further
experiments will include considering more parameter sets,
identification of optimal parameter sets for a given correlation
measure, longer time frames (more than one month) and a
larger universe of stocks.
Future studies would also benefit from considering various
“implementation shortfalls” that occur in practice such as
transaction costs, moving the market (on big orders) and lost
opportunity (inability to fill an order).
Acknowledgments: We would like to thank Darren Clifford
of PairCo for his valuable feedback and suggestions on our
pair trading strategy.
REFERENCES
[1] E. Gatev, W. N. Goetzmann, and K. G. Rouwenhorst, “Pairs Trading:
Performance of a Relative Value Arbitrage Rule,” SSRN eLibrary, 2006.
[2] “The world according to quants: Enter alpha generation platforms,”
2008.
[3] “ClariFi ModelStation.” [Online]. Available: http://www.clarifi.com
[4] “Alphacet Discovery.” [Online]. Available: http://www.alphacet.com
[5] “OpenQuant.” [Online]. Available: http://www.smartquant.com/openquant.php
[6] “Marketcetera Trading Platform.” [Online]. Available: http://www.
marketcetera.com/
[7] K. Kola, A. Chhabra, R. K. Thulasiram, and P. Thulasiraman, A Software
Architecture Framework for On-Line Option Pricing. Springer Berlin
/ Heidelberg, 2006, pp. 747–759.
[8] G. Pauletto, “Parallel Monte Carlo methods for derivative security pricing,” in Computing in Economics and Finance 2000. Springer-Verlag, 2000, pp. 650–657.
[9] M.-P. Leong, C.-C. Cheung, C.-W. Cheung, P. P. M. Wan, I. K. H.
Leung, W. M. M. Yeung, W.-S. Yuen, K. S. K. Chow, K.-S. Leung,
and P. H. W. Leong, “Cpe: A parallel library for financial engineering
applications,” Computer, vol. 38, no. 10, pp. 70–77, 2005.
[10] “MarketMiner analytics platform,” 2008, Scalable Analytics Inc. [Online]. Available: http://www.scalableanalytics.com
[11] “Third annual high performance computing users conference report,” 2006, Council on Competitiveness.
[12] C. Rostoker, A. Wagner, and H. H. Hoos, “A parallel workflow for real-time correlation and clustering of high-frequency stock market data,” in IPDPS. IEEE, 2007, pp. 1–10.
[13] R. Maronna, “Robust M-estimators of multivariate location and scatter,” Annals of Statistics, vol. 4, no. 1, pp. 51–67, 1976.
[14] J. Chilson, R. Ng, A. Wagner, and R. Zamar, “Parallel computation of
high-dimensional robust correlation and covariance matrices,” Algorithmica, vol. 45, no. 3, pp. 403–431, 2006.
[15] B. Do, R. Faff, and K. Hamza, “A new approach to modeling and estimation for pairs trading,” 2006. [Online]. Available: www.fma.org/Stockholm/Papers/PairsTrading_BinhDo.pdf
[16] P. Nath, “High Frequency Pairs Trading with U.S. Treasury Securities:
Risks and Rewards for Hedge Funds,” SSRN eLibrary, 2003.
[17] M. S. Perlin, “M of a Kind: A Multivariate Approach at Pairs Trading,”
SSRN eLibrary, 2007.
[18] ——, “Evaluation of Pairs Trading Strategy at the Brazilian Financial
Market,” SSRN eLibrary, 2007.
[19] H. Green, B. Schmidt, and K. Reher, “Algorithms for filtering of market price data,” Computational Intelligence for Financial Engineering
(CIFEr), 1997., Proceedings of the IEEE/IAFE 1997, pp. 227–231, Mar
1997.
[20] T. N. Falkenberry, “High frequency data filtering,” 2002. [Online].
Available: http://www.tickdata.com/FilteringWhitePaper.pdf
[21] M. M. Dacorogna, R. Gencay, U. Muller, R. B. Olsen, and O. V. Pictet, An Introduction to High Frequency Finance. Academic Press, New York, 2001.