Submission 3 M7 Algorithmic Trading Strategy PDF
Write up all the results from the analyses required in this project into a well-structured formal
report with introduction, comments, code, and conclusion sections.
Introduction
A standard pairs trading strategy involves a long-short pair of equities (such as stocks). Two
companies in the same sector are likely to be exposed to similar market factors. Occasionally
their relative stock prices diverge due to certain events, but they are expected to revert to the
long-run mean.
This strategy was pioneered by Gerry Bamberger and Nunzio Tartaglia at Morgan Stanley
in the 1980s, and many hedge funds still rely on it today.
Other examples include a commodity and the stocks of its producers, such as gold and
gold-mining stocks, or oil and the stocks of oil producers. Of course, the strategy requires some
correlation between the commodity and the related stocks.
Correlation in R
# install.packages("...")
library('tseries'); library('quantmod'); library('PerformanceAnalytics'); library('urca'); library('roll');
## CORRELATION BETWEEN A PAIR OF ASSETS
# Pepsi and Coca-Cola stocks
my_portfolio <- c("PEP", "KO")
stocks <- lapply(my_portfolio,getSymbols,auto.assign=FALSE)
names(stocks) <- my_portfolio
# get the adjusted closing prices
pep <- stocks[[my_portfolio[1]]][,6]
ko <- stocks[[my_portfolio[2]]][,6]
# get the daily returns
return_pep = dailyReturn(pep,type='arithmetic')
return_ko = dailyReturn(ko,type='arithmetic')
colnames(return_pep) <- 'Pepsi Returns'
colnames(return_ko) <- 'Coca-Cola Returns'
# calculating the correlation
cor(return_ko,return_pep)
# plotting the cumulative returns
chart.CumReturns(cbind(return_ko,return_pep),main="PEPSI-COLA Cumulative Returns",legend.loc="bottomright")
# We work with Pepsi (PEP) and Coca-Cola (KO) stocks. The aim is to calculate the
# correlation between these assets.
# We need the adjusted closing prices, which is why we take the 6th column. Then we
# calculate the daily returns and their correlation, and finally plot the cumulative returns.
According to the plot of cumulative returns, the two assets tend to move together.
The correlation of their daily returns is 0.6816855. Because this value is positive and fairly
close to 1, we can say that there is a fairly strong positive correlation between Pepsi and
Coca-Cola stocks.
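As a cross-check of what the correlation step computes, the following is a minimal Python sketch of arithmetic daily returns and their sample Pearson correlation. The price lists are hypothetical illustrative values, not real PEP or KO data, and the helper names are assumptions made for this example.

```python
import math


def arithmetic_returns(prices):
    """Simple daily returns: r_t = p_t / p_{t-1} - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def pearson_correlation(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical adjusted closing prices (illustrative only, not PEP/KO data).
prices_a = [100.0, 101.0, 100.5, 102.0, 103.0, 102.5]
prices_b = [50.0, 50.4, 50.3, 50.9, 51.6, 51.2]

rho = pearson_correlation(arithmetic_returns(prices_a),
                          arithmetic_returns(prices_b))
print(round(rho, 3))
```

The correlation is computed on returns rather than on price levels, which matches the order of operations in the R listing.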
Cointegration in R
## COINTEGRATION
# We use the Pepsi (xt) adjusted closing prices and the Coca-Cola (yt) adjusted closing
# prices in the regression.
# Because the correlation is fairly high and positive (~0.7), we expect some linear
# relationship between the prices.
# The β value is the result of the regression. With it we can calculate the spread s based on
# the formula above.
# For a cointegrated pair, the spread oscillates around some mean value.
# Such a pair is convenient to trade because the spread is expected to return to its
# average value with high probability.
# Now let’s use the Engle-Granger cointegration test. It tests whether the residuals of the
# linear regression between the two asset prices (that is, the spread itself) are stationary.
# The individual stock prices (Pepsi and Coca-Cola) are not stationary. To use a pairs
# trading strategy we first have to make sure that the spread is stationary.
# Since β is known from the linear regression, we can test the spread for stationarity
# with, for example, the Dickey-Fuller test or the Phillips-Perron test.
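The two steps described above (regress one price on the other, then test the residual spread for a unit root) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the report's R implementation: the data are synthetic, the spread includes the regression intercept, and the Dickey-Fuller regression has no constant and no lagged differences. The threshold of about -3.34 is the approximate asymptotic 5% Engle-Granger critical value for two series and is quoted only as a rough guide.

```python
import math
import random


def ols_fit(x, y):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


def dickey_fuller_t(series):
    """t-statistic of gamma in  d s_t = gamma * s_{t-1} + e_t  (no constant, no lags)."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series, series[1:])]
    s_ll = sum(v * v for v in lagged)
    gamma = sum(l * d for l, d in zip(lagged, diffs)) / s_ll
    resid = [d - gamma * l for l, d in zip(lagged, diffs)]
    sigma2 = sum(e * e for e in resid) / (len(diffs) - 1)
    return gamma / math.sqrt(sigma2 / s_ll)


# Synthetic data (hypothetical, not PEP/KO): x is a random walk and
# y = 2 + 1.5 * x + stationary AR(1) noise, so the pair is cointegrated by construction.
rng = random.Random(42)
x, noise = [50.0], [0.0]
for _ in range(999):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
    noise.append(0.5 * noise[-1] + rng.gauss(0.0, 1.0))
y = [2.0 + 1.5 * xi + ni for xi, ni in zip(x, noise)]

alpha, beta = ols_fit(x, y)                          # step 1: estimate beta
spread = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
t_spread = dickey_fuller_t(spread)                   # step 2: unit-root test on the spread

print(f"beta = {beta:.3f}, Dickey-Fuller t on spread = {t_spread:.2f}")
```

A t-statistic well below the critical value suggests a stationary spread. In practice a library routine with proper p-values and lag selection would be preferable to this hand-rolled statistic.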
We check the p-values: if they are below 0.05, we reject the null hypothesis of a unit root at
the 5% significance level and treat the spread as stationary.
In this case the p-values are greater than 0.05, so we cannot reject non-stationarity. This means
that this pair (Pepsi and Coca-Cola) is not suitable for statistical arbitrage and pairs trading.
Let us take another example: iShares MSCI Australia and iShares MSCI Canada. These are two
ETFs that track the Australian and Canadian equity markets, respectively. Their ticker symbols
are EWA and EWC. We will test the pair for cointegration. First, we define the timeframe and
download the data from Yahoo Finance.
"""
iShares MSCI Australia (EWA) and iShares MSCI Canada (EWC) are two ETFs that track the
Australian and Canadian equity markets, respectively. We test the pair for cointegration.
First, we define the timeframe and download the data from Yahoo Finance.
"""
import datetime
import matplotlib.pyplot as plt
import pandas_datareader.data as web
from scipy import stats
start = datetime.datetime(2003, 1, 1)
end = datetime.datetime(2008, 1, 27)
ewa_prices = web.DataReader("EWA", 'yahoo', start, end)
ewc_prices = web.DataReader("EWC", 'yahoo', start, end)
#
ewa_close=ewa_prices['Adj Close']
ewc_close=ewc_prices['Adj Close']
#
ewa_close.plot(label='ewa', legend=True)
ewc_close.plot(label='ewc', legend=True)
plt.show()
#
#
plt.scatter(ewa_close, ewc_close)
plt.show()
# The scatter plot looks roughly linear, so we fit a line.
slope, intercept, r_value, p_value, std_err = stats.linregress(ewa_close, ewc_close)
print("slope: " + str(slope) +
We got a p-value of 0.0129, so the result is statistically significant at the 5% level. We could
therefore go on to build a trading strategy that trades on the regression error (the spread).
Notably, the result is significant only because the timeframe was chosen appropriately.
For other timeframes the cointegration relationship does not hold.
###########CODE##############
require(quantstrat)
require(IKTrading)
require(DSTrading)
require(knitr)
require(PerformanceAnalytics)
require(tseries)
require(roll)
require(ggplot2)
# Full test
initDate="2003-01-01"
from="2003-01-01"
to="2019-12-31"
adj1 = unclass(EWA$EWA.Adjusted)
adj2 = unclass(EWC$EWC.Adjusted)
## Ratio (EWC/EWA)
ratio = adj2/adj1
## Rolling regression
window = 21
lm = roll_lm(adj2,adj1,window)
## Plot beta
rollingbeta <- fortify.zoo(lm$coefficients[,2],melt=TRUE)
ggplot(rollingbeta) + geom_line(aes(x=Index,y=Value)) + labs(x="time", y="beta") + theme_bw()
## Combine columns and turn into xts (time series), remove unnecessary columns
close = sprd
date = as.data.frame(dates[22:4278])
data = cbind(date, close)
dfdata = as.data.frame(data)
xtsData = xts(dfdata, order.by=as.Date(dfdata$date))
xtsData$close = as.numeric(xtsData$close)
xtsData$dum = vector(length = 4257)
xtsData$dum = NULL
xtsData$dates.22.4278. = NULL
z=(x-avg)/std
return(z)
}
## Backtest
currency('USD')
Sys.setenv(TZ="UTC")
stock(symbols, currency="USD", multiplier=1)
type = "enter")
#apply strategy
t1 <- Sys.time()
out <- applyStrategy(strategy=strategy.st,portfolios=portfolio.st)
t2 <- Sys.time()
print(t2-t1)
# Time difference of 5.136077 secs from last execution in Macbook Pro 2019
#set up analytics
updatePortf(portfolio.st)
dateRange <- time(getPortfolio(portfolio.st)$summary)[-1]
updateAcct(portfolio.st,dateRange)
updateEndEq(account.st)
#Stats
tStats <- tradeStats(Portfolios = portfolio.st, use="trades", inclZeroDays=FALSE)
tStats[,4:ncol(tStats)] <- round(tStats[,4:ncol(tStats)], 2)
print(data.frame(t(tStats[,-c(1,2)])))
#Std.Err.Trade.PL 15.69
#Percent.Positive 97.76
#Percent.Negative 2.24
#Profit.Factor 255.81
#Avg.Win.Trade 238.67
#Med.Win.Trade 201.01
#Avg.Losing.Trade -40.74
#Med.Losing.Trade -21.17
#Avg.Daily.PL 232.95
#Med.Daily.PL 195.85
#Std.Dev.Daily.PL 182.15
#Std.Err.Daily.PL 15.79
#Ann.Sharpe 20.30
#Max.Drawdown -1269.75
#Profit.To.Max.Draw 24.53
#Avg.WinLoss.Ratio 5.86
#Med.WinLoss.Ratio 9.49
#Max.Equity 31144.69
#Min.Equity -13.74
#End.Equity 31144.19
#Averages
(aggPF <- sum(tStats$Gross.Profits)/-sum(tStats$Gross.Losses))
#Aggregate profit factor 255.7999
(aggCorrect <- mean(tStats$Percent.Positive))
#Average percentage of positive trades 97.76
(numTrades <- sum(tStats$Num.Trades))
#Number of trades 134
(meanAvgWLR <- mean(tStats$Avg.WinLoss.Ratio))
#Average winLoss ratio 5.86
#portfolio cash PL
portPL <- .blotter$portfolio.EWA_EWC$summary$Net.Trading.PL
## Sharpe Ratio
(SharpeRatio.annualized(portPL, geometric=FALSE))
##### Net.Trading.PL
#####Annualized Sharpe Ratio (Rf=0%) 3.070985
We could then improve our trading strategies by looking at an example outlined by Ernest Chan
in his book, “Quantitative Trading: How to Build Your Own Algorithmic Trading Business”, in
which he examines two cases of a simple mean-reverting model taken from a paper by Amir
Khandani and Andrew Lo at MIT. Their strategy goes long the stocks that had the worst previous
one-day returns and shorts the ones that had the best previous one-day returns. Note that their
strategy worked quite well under the assumption of no transaction costs. We can therefore
subtract about 5 basis points per trade and compare the outcome with the case of no
transaction costs.
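To make the long-losers, short-winners rule concrete, here is a minimal Python sketch of one common formulation of the contrarian weights, w_i = -(r_i - r_mkt) / N, where r_mkt is the equal-weighted average return of the universe on the previous day. The function names and the four-stock return vectors are hypothetical and serve only as an illustration.

```python
def contrarian_weights(prev_returns):
    """Weights w_i = -(r_i - r_mkt) / N from the previous day's returns.

    Stocks below the cross-sectional mean get positive weights, stocks above
    it get negative weights, and the weights sum to zero (dollar neutral).
    """
    n = len(prev_returns)
    market = sum(prev_returns) / n
    return [-(r - market) / n for r in prev_returns]


def daily_pnl(weights, todays_returns):
    """P&L of holding yesterday's weights over today's returns."""
    return sum(w * r for w, r in zip(weights, todays_returns))


# Hypothetical one-day returns for four stocks (illustrative only).
yesterday = [0.02, -0.01, 0.00, -0.03]
today = [-0.01, 0.005, 0.00, 0.02]

w = contrarian_weights(yesterday)
pnl = daily_pnl(w, today)
print(w, pnl)
```

In this toy example yesterday's best performer is shorted and yesterday's worst performer is bought, so the position profits when both partially reverse.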
Here, we are using the file input from the S&P 500 stock universe.
%% MATLAB Code
clear;
inputFile='Export.txt';
outputFile='SPX 20071123';
tday=unique(mytday);
op=NaN(length(tday), length(stocks));
hi=NaN(length(tday), length(stocks));
lo=NaN(length(tday), length(stocks));
cl=NaN(length(tday), length(stocks));
vol=NaN(length(tday), length(stocks));
for s=1:length(stocks)
stk=stocks{s};
op(idxtB, s)=myop(idxA(idxtA));
hi(idxtB, s)=myhi(idxA(idxtA));
lo(idxtB, s)=mylo(idxA(idxtA));
cl(idxtB, s)=mycl(idxA(idxtA));
vol(idxtB, s)=myvol(idxA(idxtA));
end
clear;
startDate=20060101;
endDate=20061231;
% daily returns
dailyret=(cl-lag1(cl))./lag1(cl);
weights(~isfinite(cl) | ~isfinite(lag1(cl)))=0;
dailypnl=smartsum(lag1(weights).*dailyret, 2);
Results case 1:
The original paper by Amir Khandani and Andrew Lo reports a Sharpe ratio of 4.47. However,
following Ernest Chan's implementation of the strategy, the Sharpe ratio drops to 0.25 for the
period in question (2006). This is because the backtest was performed on a universe of
large-capitalization stocks in the S&P 500, instead of the small and microcap stocks used in the
original paper.
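For reference, an annualized Sharpe ratio of the kind quoted here can be computed from daily returns as in the sketch below. The 252 trading days per year, the zero risk-free rate, and the sample returns are assumptions chosen for this illustration, not values taken from the backtest.

```python
import math
import statistics


def annualized_sharpe(daily_returns, periods_per_year=252, risk_free_daily=0.0):
    """sqrt(periods) * mean(excess return) / sample std(excess return)."""
    excess = [r - risk_free_daily for r in daily_returns]
    return (math.sqrt(periods_per_year)
            * statistics.mean(excess) / statistics.stdev(excess))


# Hypothetical daily strategy returns (illustrative only).
sample_returns = [0.001, -0.002, 0.003, 0.0, 0.002]
sharpe = annualized_sharpe(sample_returns)
print(round(sharpe, 2))
```

Because the ratio scales the mean by the standard deviation, a small per-trade cost that lowers the mean daily return can move the Sharpe ratio sharply, even into negative territory.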
"smartsum.m"
function y = smartsum(x, dim)
% y = smartsum(x, dim)
% Sum along dimension dim, ignoring NaN.
hasData=isfinite(x);
x(~hasData)=0;
y=sum(x,dim);
y(all(~hasData, dim))=NaN;
"smartmean.m"
function y = smartmean(x, dim)
% y = smartmean(x, dim)
% Mean value along dimension dim, ignoring NaN.
hasData=isfinite(x);
x(~hasData)=0;
y=sum(x,dim)./sum(hasData, dim);
y(all(~hasData, dim))=NaN; % set y to NaN if all entries are NaN’s.
"smartstd.m"
function y = smartstd(x, dim)
%y = smartstd(x, dim)
% std along dimension dim, ignoring NaN and Inf
hasData=isfinite(x);
x(~hasData)=0;
y=std(x);
y(all(~hasData, dim))=NaN;
Case 2. Considering Transaction Costs: Let’s deduct a 5-basis-point transaction cost per trade.
Results Case 2: As we can see, if we adjust for the existence of transaction costs the Sharpe
ratio plummets to -3.19 indicating that the strategy now is largely unprofitable.
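One way to apply such a cost in a daily backtest is to charge the one-way cost on the absolute change in each position weight, as in the sketch below. This is an assumption about how the deduction could be modelled, not a reproduction of the MATLAB listing, and the weights and returns are hypothetical.

```python
ONE_WAY_COST = 0.0005  # 5 basis points per unit of turnover


def net_daily_pnl(prev_weights, new_weights, todays_returns, cost=ONE_WAY_COST):
    """Gross P&L of yesterday's weights minus the cost of rebalancing today."""
    gross = sum(w * r for w, r in zip(prev_weights, todays_returns))
    turnover = sum(abs(n - p) for p, n in zip(prev_weights, new_weights))
    return gross - cost * turnover


# Hypothetical two-stock example (illustrative only): the book flips sides.
prev_w = [0.25, -0.25]
new_w = [-0.25, 0.25]
rets = [0.01, -0.01]

gross_pnl = net_daily_pnl(prev_w, new_w, rets, cost=0.0)
net_pnl = net_daily_pnl(prev_w, new_w, rets)
print(gross_pnl, net_pnl)
```

A strategy that rebalances every day has high turnover, so the cost term is paid daily and can outweigh a small gross edge.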
Conclusion
In this work, R and Python were used to build a trading system based on the pairs
trading strategy. We first studied the pair of Pepsi and Coca-Cola stock prices using R, with
data downloaded from Yahoo Finance. The correlation test yielded a fairly high positive
correlation of 0.6816855, which indicates that the returns of the two stocks tend to move
together. However, the cointegration tests (Engle-Granger, using the Dickey-Fuller or
Phillips-Perron test on the spread) yielded large p-values, so we could not reject the null
hypothesis of a non-stationary spread (no cointegration). Although the two stocks are positively
correlated, they are not suitable for pairs trading because they are not cointegrated.
The second pair we studied is a pair of ETFs: iShares MSCI Australia and iShares MSCI
Canada (EWA and EWC). Again, we used data from Yahoo Finance, but this time the analysis
was implemented in Python. We tested correlation and cointegration as for the first pair. The
results indicated positive correlation and cointegration (p-value 0.0129) over the chosen
timeframe. Therefore, we used the EWA and EWC pair in the trading algorithm.
We built the algorithmic trading strategy for the EWA and EWC pair in R. The hedge ratio is
estimated with a rolling linear regression over a 21-day window, and the z-score of the spread is
used as the indicator for entering long or short positions. It is calculated as
z = (x - mu)/sigma, where x is the spread between the two prices, mu is the average spread over
21 days and sigma is the standard deviation of the spread. When |z| exceeds 1 we short the
outperforming asset and go long the underperforming one, with the direction of the trade given
by the sign of z. We backtested the strategy from 2003-01-01 to 2019-12-31 and compared its
performance with SPY. The results show that the proposed strategy yields a higher cumulative
profit than SPY.
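The z-score rule summarized above can be sketched in Python as follows. The sign convention is an assumption made for this illustration: z > 1 is read as the spread being too high (short the spread) and z < -1 as too low (buy the spread). The function names and the toy spread series are hypothetical, and the actual entry rules of the R backtest may differ.

```python
import statistics


def rolling_zscores(spread, window=21):
    """z_t = (x_t - mean) / std over the trailing `window` observations."""
    zs = [None] * len(spread)
    for t in range(window - 1, len(spread)):
        chunk = spread[t - window + 1 : t + 1]
        sigma = statistics.stdev(chunk)
        if sigma > 0:
            zs[t] = (spread[t] - statistics.mean(chunk)) / sigma
    return zs


def signal(z, entry=1.0):
    """+1 = buy the spread, -1 = short the spread, 0 = no entry."""
    if z is None:
        return 0
    if z > entry:
        return -1
    if z < -entry:
        return 1
    return 0


# Toy spread (hypothetical): flat for 20 days, then a jump on day 21.
toy_spread = [0.0] * 20 + [1.0]
zs = rolling_zscores(toy_spread)
print(zs[-1], signal(zs[-1]))
```

The first 20 entries are undefined because a full 21-day window is not yet available, which mirrors the warm-up period of a rolling indicator.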
transaction costs. The backtest shows that, in the first case, when the strategy is run on
large-capitalization S&P 500 stocks instead of the small and microcap stocks used in the
original paper, the Sharpe ratio drops from the 4.47 reported in the paper to 0.25. Then, after
additionally deducting 5 basis points per trade, the Sharpe ratio falls into negative territory
(-3.19). This indicates that the profitability of a trading strategy can only be judged when
transaction costs and market microstructure are taken into account.
Furthermore, we could in principle improve the strategy by using more robust statistical
measures of correlation and more robust regression methods. The reason is that the Pearson
correlation coefficient used here is a symmetric, non-robust and linear statistical measure, so it
can fail to detect the dependence present in real-world data.
References