L1 Financial Data and Their Properties
Data Analysis
Week1-4: FINANCIAL DATA AND THEIR PROPERTIES
Course Structure
• This course aims to provide basic knowledge of financial data
analysis, introduce a useful statistical tool for analyzing
financial data (the R programming language), and give you experience
in financial applications.
• By the end of this course, you should be able to use R to build
and test trading and research ideas in finance.
Part I: Basics in Programming in R and
Financial Asset Properties
• 1. Basic programming knowledge for understanding stock and
portfolio performance by visualizing financial data
• Two keys to investing: return and risk, and the trade-off between them
• Return:
• Single asset returns, multiple asset returns
• Factor methods for evaluating whether your investment beats the market: CAPM
and the Fama-French model
• Risk:
• Variance, VaR Method, Expected Shortfall
• Balance between risk and return:
• Mean-variance efficient portfolio optimization.
Part II: Cross-sectional Portfolio
Construction
• 1. Financial data are usually time series, e.g. price, volume, and returns
all have a timestamp.
• 2. Predict time series by understanding the classic properties of
time series: random walk, trend, AR, etc.
• 3. Statistical methods to judge whether financial data are
predictable
• 4. How to find potential arbitrage opportunities
Introduction to R Programming
What is R?
• R is a programming language designed for statistical computing
and the graphical display of data.
• R is derived from S, a programming language originally created at Bell
Laboratories (then part of AT&T). The base code
of R was developed by two academics, Ross Ihaka and Robert
Gentleman, resulting in the programming platform we have today. For
anyone curious about the name, the letter R was chosen because it is
the shared first letter of its creators' first names.
• On the business side, large and established companies, such as Google
and Microsoft, have adopted R as an internal language for data
analysis.
Why R Programming?
• 1. R is available for most operating systems.
• 2. R is a mature, stable platform, continuously supported and intensively
used in the industry.
• 3. Many researchers have developed useful packages for analyzing financial
data, e.g. tidyr, dplyr, TTR, timeSeries, quantmod, etc.
• 4. Learning R is easy. It is friendly to users without a professional programming
background.
• 5. Powerful statistical analysis compared with other programming languages, e.g.
Python
• 6. RStudio, an interface for R, is very friendly and productive.
Why R Programming?
• 7. The graphical interface provided by RStudio facilitates the use of R
and increases productivity. Plotting and interactive tools include base R plot, ggplot, plotly, and shiny:
• https://plot.ly/r/ohlc-charts/,
• https://shiny.rstudio.com/gallery/#demos
• 8. R is compatible with different operating systems and it can interface
with different programming languages. If you need to use code written in
another language, such as C++, Python, or LaTeX, it is easy to
integrate it with R. Therefore, the user is not restricted to a single
language and can use features and functions from other platforms.
• 9. R is free!!!!
What Can You Do With R and RStudio?
• Import, export, process, and store financial data based on local files or the internet
• Replace and improve data-intensive tasks currently done in spreadsheet-like software
• Develop routines for managing and controlling investment portfolios and executing
financial orders
• Carry out empirical research with statistical tools,
such as econometric models and hypothesis testing
• Create dynamic websites with the Shiny package, allowing anyone in the world to
use a financial tool created by you
• Automate the production of technical financial reports with the
knitr package
• Write a technical book with bookdown
• Write and publish a blog about finance with blogdown
R Resources
• The CRAN website (https://cran.r-project.org/web/views/Finance.html) offers
a Task Views panel for the topic of Finance. On this page, you can find the
main packages available to perform specific operations in Finance. These
include importing financial data from the internet, estimating econometric
models, and calculating different risk estimates, among many other possibilities.
Reading this page and knowing these packages is essential for
those who intend to work in Finance. It is worth noting, however, that this list
contains only the main items. The complete list of packages related to
Finance is much larger than shown in Task Views.
• R-Bloggers (https://www.r-bloggers.com/) is a website that aggregates R-related
blogs in a single place, making it easier for anyone to access and participate.
• Stack Overflow (https://stackoverflow.com/) is a question and answer site for
professional and enthusiast programmers.
R and RStudio Installation
• Refer to video
• “L1.1 How to Install R and RStudio”
Explaining the RStudio Screen
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
• The use of double quotes (" ") or single quotes (' ') defines objects of the class character.
• Numbers are defined by the value itself.
• Each object in R has a class and each class has a different behavior.
• After sending the previous commands to R, the history tab has been updated.
• The print function is one of the main functions for displaying values in the prompt of R. The text
displayed as [1] is the index of the first element shown on that line of output.
• # marks comments in R
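The points above can be tried directly at the R prompt. The following is a minimal sketch in base R; the object names `ticker` and `price` and the value 110.73 are illustrative assumptions, not taken from the course material.

```r
# Text in quotes creates an object of class "character"
ticker <- "AAPL"

# A bare number creates an object of class "numeric"
price <- 110.73  # illustrative value, not real data

class(ticker)
class(price)

# Behavior depends on the class: arithmetic works on numbers, not on text
doubled <- price * 2
# ticker * 2 would stop with an error, because ticker is not numeric

# print() displays a value at the prompt; the [1] at the start of the line
# is the index of the first element shown on that line
print(doubled)

# A longer vector wraps over several lines, and each line starts with
# the index of its first element
print(101:130)
```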
• The WRDS database contains a huge amount of data. The
WRDS data are stored in SQL database format.
• Call a function
myval = FunctionName(x=price, argument1=50, argument2=200)
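As a hedged illustration of the calling pattern above, the sketch below defines a small function and calls it with named arguments. `ma_spread`, its arguments `short` and `long`, and the simulated `price` vector are all hypothetical stand-ins for `FunctionName`, `argument1`, `argument2`, and real price data.

```r
# Hypothetical function: average of the last `short` prices minus
# the average of the last `long` prices
ma_spread <- function(x, short, long) {
  mean(tail(x, short)) - mean(tail(x, long))
}

# Simulated prices (a random walk), not real market data
set.seed(1)
price <- 100 + cumsum(rnorm(250))

# Call the function with named arguments, as in the pattern above
myval <- ma_spread(x = price, short = 50, long = 200)

# With named arguments, the order in which they are written does not matter
same <- ma_spread(long = 200, x = price, short = 50)
identical(myval, same)
```

Naming the arguments makes the call easier to read and protects against passing values in the wrong order.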
Volatility: Bollinger Bands
• The Bollinger Bands have three components:
• 1. The first component is a 20-day simple moving average (SMA).
• 2. The second component is an upper band, which is two standard
deviations above the 20-day SMA.
• 3. The third component is a lower band, which is two standard deviations
below the 20-day SMA.
• Bollinger Bands are considered volatility indicators because the
bands widen (narrow) as the stock's volatility increases
(decreases).
• When the bands narrow, it may be used as an indication that
volatility is about to rise.
Steps of Bollinger Bands
• Step 1: Obtain Closing Prices of AAPL
• Step 2: Calculate Rolling 20-Day Mean and Standard Deviation
• We use the rollmean command with k=20 to calculate the 20-day moving average.
• For the standard deviation, we use the rollapply command, which allows us to apply a
function on a rolling basis.
• The FUN=sd tells R that the function we want to apply is the standard deviation and the
width=20 tells R that it is a 20-day standard deviation that we want to calculate.
• Step 3: Subset to 2016
• Step 4: Calculate the Bollinger Bands
• The bands lie two standard deviations above and below the moving average. Using January 4, 2016
as an example, the upper Bollinger Band is equal to 119.8524
(= 110.726 + 2*4.563188), and the lower Bollinger Band is equal to 101.5996
(= 110.726 - 2*4.563188).
• Step 5: Plot the Bollinger Bands
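The five steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's own code: the prices are simulated rather than downloaded AAPL data, the dates are calendar days rather than trading days, and a small base-R helper `roll()` (a hypothetical name) stands in for `rollmean` and `rollapply` so that the sketch runs without extra packages.

```r
# Step 1 (stand-in): simulated closing prices instead of real AAPL data
set.seed(42)
dates <- seq(as.Date("2015-11-01"), by = "day", length.out = 120)
close <- 100 + cumsum(rnorm(120))

# Helper: apply FUN to each trailing window of `width` observations.
# The first width - 1 entries stay NA because the window is incomplete.
roll <- function(x, width, FUN) {
  out <- rep(NA_real_, length(x))
  for (i in seq(width, length(x))) {
    out[i] <- FUN(x[(i - width + 1):i])
  }
  out
}

# Step 2: rolling 20-day mean and standard deviation
bb <- data.frame(
  date  = dates,
  close = close,
  avg   = roll(close, 20, mean),
  sd    = roll(close, 20, sd)
)

# Step 3: subset to 2016
bb <- bb[format(bb$date, "%Y") == "2016", ]

# Step 4: bands two standard deviations above and below the average
bb$upper <- bb$avg + 2 * bb$sd
bb$lower <- bb$avg - 2 * bb$sd

# Step 5: plot the closing price, the average, and the two bands
plot_bands <- function(bb) {
  plot(bb$date, bb$close, type = "l", ylim = range(bb$lower, bb$upper),
       xlab = "Date", ylab = "Price",
       main = "Bollinger Bands (simulated prices)")
  lines(bb$date, bb$avg, lty = 2)
  lines(bb$date, bb$upper, col = "red")
  lines(bb$date, bb$lower, col = "red")
}
if (interactive()) plot_bands(bb)

# Check of the January 4, 2016 arithmetic quoted in Step 4
worked <- round(c(upper = 110.726 + 2 * 4.563188,
                  lower = 110.726 - 2 * 4.563188), 4)
worked
```

Subsetting to 2016 only after computing the rolling statistics means the first 2016 rows already have a full 20-day window behind them.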
Interpreting the Bollinger Bands
• Assuming a normal distribution, two standard deviations in either
direction from the mean should cover about 95% of
the data.
• Most of the closing prices fell within the Bollinger Bands.
• When the stock price was right around the upper band, this may
be taken as an indication that the stock is overbought.
• Conversely, when the stock price moved right around the lower
band, this may be taken as an indication that the stock is oversold.
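The share of data that falls within two standard deviations under a normal distribution can be computed directly with base R's `pnorm`. This is only a rough guide, since actual prices need not be normally distributed; the variable name `coverage` is an illustrative choice.

```r
# Probability that a normal variable lies within 2 standard deviations of its mean
coverage <- pnorm(2) - pnorm(-2)
round(coverage, 4)  # 0.9545
```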
Momentum: Relative Strength Index
• A common technical analysis momentum indicator is the Relative
Strength Index (RSI).
• The RSI compares the magnitude of a stock's recent gains to the
magnitude of its recent losses and turns that information into a
number that ranges from 0 to 100.
• The RSI is used in conjunction with an overbought line and an oversold
line. The overbought line is typically set at a level of 70 and the
oversold line is typically set at a level of 30.
• A buy signal is created when the RSI rises from below the oversold line
and crosses the oversold line.
• A sell signal is created when the RSI falls from above the overbought
line and crosses the overbought line.
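The buy and sell rules above can be sketched as a crossing test on consecutive RSI values. The `rsi` vector below is made up for illustration, and treating a value exactly on the line as a cross is an assumption of this sketch.

```r
# Hypothetical RSI readings on consecutive days (not computed from real prices)
rsi <- c(45, 32, 28, 25, 31, 50, 68, 72, 75, 69, 55)

oversold   <- 30
overbought <- 70

prev <- head(rsi, -1)  # RSI on the previous day
curr <- tail(rsi, -1)  # RSI on the current day

# Buy: RSI was below the oversold line and rises back through it
buy <- c(FALSE, prev < oversold & curr >= oversold)

# Sell: RSI was above the overbought line and falls back through it
sell <- c(FALSE, prev > overbought & curr <= overbought)

which(buy)   # 5
which(sell)  # 10

data.frame(day = seq_along(rsi), rsi = rsi, buy = buy, sell = sell)
```

The leading `FALSE` is there because the first day has no previous value to compare against.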
Momentum: RSI
• The RSI is calculated as