Introduction to
Python for Finance
INTRODUCTION TO PYTHON FOR FINANCE
Adina Howe
Instructor
Why Python for Finance?
Easy to Learn and Flexible
General purpose
Dynamic
High-level language
Integrates with other languages
Open source
Accessible to anyone
Python Shell
In [1]:
Calculations in IPython
In [1]: 1 + 1
Common mathematical operators
Operator Meaning
+ Add
- Subtract
* Multiply
/ Divide
% Modulus (remainder of division)
** Exponent
Common mathematical operators
In [1]: 8 + 4
Out[1]: 12
In [2]: 8 / 4
Out[2]: 2.0
Let's practice!
Comments and
variables
Adina Howe
Instructor
Any comments?
# Example, do not modify!
print(8 / 2)
print(2**2)
# Put code below here
print(1.0 + 0.10)
Outputs in IPython vs. script.py
IPython Shell
In [1]: 1 + 1
Out[1]: 2
In [2]: print(1 + 1)
2
script.py
1 + 1
# No output
print(1 + 1)
<script.py> output:
2
Variables
Variable names
Names can contain upper or lower case letters, digits, and underscores
Variable names cannot start with a digit
Some names are reserved keywords in Python (e.g., class) and cannot be used; built-in names (e.g., type) can be reassigned but should be avoided
Variable example
# Correct
day_2 = 5
# Incorrect, variable name starts with a digit
2_day = 5
Using variables to evaluate stock trends
Price-to-earnings ratio = Market price / Earnings per share
price = 200
earnings = 5
pe_ratio = price / earnings
print(pe_ratio)
40.0
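The same calculation can be wrapped in a small function. This is only a sketch: the name `pe_ratio_of` and the decision to return None when earnings are zero are assumptions made for illustration, not part of the course material.

```python
def pe_ratio_of(price, earnings_per_share):
    """Return price divided by earnings per share, or None if earnings are zero."""
    if earnings_per_share == 0:
        # The ratio is undefined when there are no earnings (assumed handling)
        return None
    return price / earnings_per_share


pe_ratio = pe_ratio_of(200, 5)
print(pe_ratio)
print(pe_ratio_of(200, 0))
```

Because `/` always performs float division in Python 3, the result is a float even when both inputs are integers.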
Let's practice!
Variable Data Types
Adina Howe
Instructor
Python Data Types
Variable Types Example
Strings 'hello world'
Integers 40
Floats 3.1417
Booleans True or False
Variable Types
Variable Types Example Abbreviations
Strings 'Tuesday' str
Integers 40 int
Floats 3.1417 float
Booleans True or False bool
What data type is a variable: type()
To identify the type, we can use the function type() :
type(variable_name)
pe_ratio = 40
print(type(pe_ratio))
<class 'int'>
Booleans
operators descriptions
== equal
!= does not equal
> greater than
< less than
Boolean Example
print(1 == 1)
True
print(type(1 == 1))
<class 'bool'>
Variable manipulations
x = 5
print(x * 3)
15
print(x + 3)
8
y = 'stock'
print(y * 3)
stockstockstock
print(y + 3)
TypeError: must be str, not int
Changing variable types
pi = 3.14159
print(type(pi))
<class 'float'>
pi_string = str(pi)
print(type(pi_string))
<class 'str'>
print('I love to eat ' + pi_string + '!')
I love to eat 3.14159!
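Conversion also works in the other direction, which is useful when numbers arrive as text. The sketch below assumes a price and a share count read in as strings; the variable names and values are hypothetical.

```python
price_text = '238.11'   # hypothetical value read from a text source
shares_text = '10'      # hypothetical share count, also as text

price = float(price_text)   # str -> float
shares = int(shares_text)   # str -> int

total = price * shares
# Convert back to str before concatenating with other text
message = 'Total cost: ' + str(round(total, 2))
print(message)
```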
Let's practice!
Lists in Python
Adina Howe
Instructor
Lists - square brackets [ ]
months = ['January', 'February', 'March', 'April', 'May', 'June']
Python is zero-indexed
Subset lists
months = ['January', 'February', 'March', 'April', 'May', 'June']
months[0]
'January'
months[2]
'March'
Negative indexing of lists
months = ['January', 'February', 'March', 'April', 'May', 'June']
months[-1]
'June'
months[-2]
'May'
Subsetting multiple list elements with slicing
Slicing syntax
# Includes the start and up to (but not including) the end
mylist[startAt:endBefore]
Example
months = ['January', 'February', 'March', 'April', 'May', 'June']
months[2:5]
['March', 'April', 'May']
months[-4:-1]
['March', 'April', 'May']
Extended slicing with lists
months = ['January', 'February', 'March', 'April', 'May', 'June']
months[3:]
['April', 'May', 'June']
months[:3]
['January', 'February', 'March']
Slicing with Steps
# Includes the start and up to (but not including) the end
mylist[startAt:endBefore:step]
months = ['January', 'February', 'March', 'April', 'May', 'June']
months[0:6:2]
['January', 'March', 'May']
months[0:6:3]
['January', 'April']
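The start, end, and step can each be omitted, and negative values count from the end. A short sketch combining these rules on the same list (the result variable names are illustrative):

```python
months = ['January', 'February', 'March', 'April', 'May', 'June']

last_three = months[-3:]       # from the third-to-last element to the end
every_other = months[::2]      # whole list, taking every second element
reversed_months = months[::-1] # a negative step walks the list backwards

print(last_three)
print(every_other)
print(reversed_months)
```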
Let's practice!
Lists in Lists
Adina Howe
Instructor
Lists in Lists
Lists can contain various data types, including lists themselves.
Example: a nested list describing the month and its associated consumer price index
cpi = [['Jan', 'Feb', 'Mar'], [238.11, 237.81, 238.91]]
Subsetting Nested Lists
months = ['Jan', 'Feb', 'Mar']
print(months[1])
Feb
cpi = [['Jan', 'Feb', 'Mar'], [238.11, 237.81, 238.91]]
print(cpi[1])
[238.11, 237.81, 238.91]
More on Subsetting Nested Lists
How would one subset out a specific price index?
cpi = [['Jan', 'Feb', 'Mar'], [238.11, 237.81, 238.91]]
print(cpi[1])
[238.11, 237.81, 238.91]
print(cpi[1][0])
238.11
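Rather than hard-coding the position, the month name can be used to find it. A minimal sketch, assuming the two inner lists stay aligned by position (the names `month_names`, `index_values`, and `position` are illustrative):

```python
cpi = [['Jan', 'Feb', 'Mar'], [238.11, 237.81, 238.91]]

month_names = cpi[0]    # first inner list: month labels
index_values = cpi[1]   # second inner list: CPI values

# Find where 'Mar' sits, then read the value at the same position
position = month_names.index('Mar')
mar_cpi = index_values[position]
print(mar_cpi)
```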
Let's practice!
Methods and
functions
Adina Howe
Instructor
Methods vs. Functions
Methods
All methods are functions
List methods are a subset of built-in functions in Python
Used on an object
prices.sort()
Functions
Not all functions are methods
Requires an input of an object
type(prices)
List Methods - sort
Lists have several built-in methods that can help retrieve and manipulate data
Methods can be accessed as list.method()
list.sort() sorts list elements in ascending order
prices = [238.11, 237.81, 238.91]
prices.sort()
print(prices)
[237.81, 238.11, 238.91]
Adding to a list with append and extend
list.append() adds a single element to a list
months = ['January', 'February', 'March']
months.append('April')
print(months)
['January', 'February', 'March', 'April']
list.extend() adds each element of another list to the end of a list
months.extend(['May', 'June', 'July'])
print(months)
['January', 'February', 'March', 'April', 'May', 'June', 'July']
Useful list methods - index
list.index(x) returns the lowest index where the element x appears
months = ['January', 'February', 'March']
prices = [238.11, 237.81, 238.91]
months.index('February')
1
print(prices[1])
237.81
More functions ...
min(list) : returns the smallest element
max(list) : returns the largest element
Find the month with smallest CPI
months = ['January', 'February', 'March']
prices = [238.11, 237.81, 238.91]
# Identify min price
min_price = min(prices)
# Identify min price index
min_index = prices.index(min_price)
# Identify the month with min price
min_month = months[min_index]
print(min_month)
February
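An alternative sketch of the same lookup pairs each price with its month using the built-in `zip()` and lets `min()` compare the pairs. This is a different technique from the index-based steps above, shown only for comparison.

```python
months = ['January', 'February', 'March']
prices = [238.11, 237.81, 238.91]

# zip() yields (price, month) pairs; min() compares pairs by price first
min_price, min_month = min(zip(prices, months))

print(min_month)
print(min_price)
```

If two months shared the same minimum price, this version would pick the alphabetically first month, whereas `prices.index()` picks the earliest position.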
Let's practice!
Arrays
Adina Howe
Instructor
Installing packages
pip3 install package_name_here
pip3 install numpy
Importing packages
import numpy
NumPy and Arrays
import numpy
my_array = numpy.array([0, 1, 2, 3, 4])
print(my_array)
[0 1 2 3 4]
print(type(my_array))
<class 'numpy.ndarray'>
Using an alias
import package_name
package_name.function_name(...)
import numpy as np
my_array = np.array([0, 1, 2, 3, 4])
print(my_array)
[0 1 2 3 4]
Why use an array for financial analysis?
Arrays can handle very large datasets efficiently
Computationally and memory efficient
Faster calculations and analysis than lists
Diverse functionality (many functions in Python packages)
What's the difference?
NumPy arrays
my_array = np.array([3, 'is', True])
print(my_array)
['3' 'is' 'True']
Lists
my_list = [3, 'is', True]
print(my_list)
[3, 'is', True]
Array operations
Arrays
import numpy as np
array_A = np.array([1, 2, 3])
array_B = np.array([4, 5, 6])
print(array_A + array_B)
[5 7 9]
Lists
list_A = [1, 2, 3]
list_B = [4, 5, 6]
print(list_A + list_B)
[1, 2, 3, 4, 5, 6]
Array indexing
import numpy as np
months_array = np.array(['Jan', 'Feb', 'March', 'Apr', 'May'])
print(months_array[3])
Apr
print(months_array[2:5])
['March' 'Apr' 'May']
Array slicing with steps
import numpy as np
months_array = np.array(['Jan', 'Feb', 'March', 'Apr', 'May'])
print(months_array[0:5:2])
['Jan' 'March' 'May']
Let's practice!
Two Dimensional
Arrays
Adina Howe
Instructor
Two-dimensional arrays
import numpy as np
months = [1, 2, 3]
prices = [238.11, 237.81, 238.91]
cpi_array = np.array([months, prices])
print(cpi_array)
[[ 1. 2. 3. ]
[ 238.11 237.81 238.91]]
Array Methods
print(cpi_array)
[[ 1. 2. 3. ]
[ 238.11 237.81 238.91]]
.shape gives you dimensions of the array
print(cpi_array.shape)
(2, 3)
.size gives you total number of elements in the array
print(cpi_array.size)
6
Array Functions
import numpy as np
prices = [238.11, 237.81, 238.91]
prices_array = np.array(prices)
np.mean() calculates the mean of an input
print(np.mean(prices_array))
238.27666666666667
np.std() calculates the standard deviation of an input
print(np.std(prices_array))
0.46427960923946671
The `arange()` function
numpy.arange() creates an array from start up to (but not including) end, in increments of step
import numpy as np
months = np.arange(1, 13)
print(months)
[ 1 2 3 4 5 6 7 8 9 10 11 12]
months_odd = np.arange(1, 13, 2)
print(months_odd)
[ 1 3 5 7 9 11]
The `transpose()` function
numpy.transpose() switches rows and columns of a numpy array
print(cpi_array)
[[ 1. 2. 3. ]
[ 238.11 237.81 238.91]]
cpi_transposed = np.transpose(cpi_array)
print(cpi_transposed)
[[ 1. 238.11]
[ 2. 237.81]
[ 3. 238.91]]
Array Indexing for 2D arrays
print(cpi_array)
[[ 1. 2. 3. ]
[ 238.11 237.81 238.91]]
# row index 1, column index 2
cpi_array[1, 2]
238.91
# all row slice, third column
print(cpi_array[:, 2])
[ 3. 238.91]
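Row and column slices can be combined with the array functions shown earlier. A brief sketch that rebuilds the same `cpi_array` and summarizes only its price row (the result variable names are illustrative):

```python
import numpy as np

cpi_array = np.array([[1, 2, 3], [238.11, 237.81, 238.91]])

prices_row = cpi_array[1, :]         # second row, all columns
first_two_months = cpi_array[:, :2]  # all rows, first two columns

# Summarize only the price row
mean_price = np.mean(prices_row)

print(prices_row)
print(first_two_months.shape)
print(mean_price)
```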
Let's practice!
Using Arrays for
Analyses
Adina Howe
Instructor
Indexing Arrays
import numpy as np
months_array = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'])
indexing_array = np.array([1, 3, 5])
months_subset = months_array[indexing_array]
print(months_subset)
['Feb' 'Apr' 'Jun']
More on indexing arrays
import numpy as np
months_array = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'])
negative_index = np.array([-1, -2])
print(months_array[negative_index])
['Jun' 'May']
Boolean arrays
import numpy as np
months_array = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'])
boolean_array = np.array([True, True, True, False, False, False])
print(months_array[boolean_array])
['Jan' 'Feb' 'Mar']
More on Boolean arrays
prices_array = np.array([238.11, 237.81, 238.91])
# Create a Boolean array
boolean_array = (prices_array > 238)
print(boolean_array)
[ True False True]
print(prices_array[boolean_array])
[ 238.11 238.91]
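A Boolean array built from one array can subset a different array of the same length. The sketch below assumes the months and prices are aligned by position; the name `above_238` is illustrative.

```python
import numpy as np

months_array = np.array(['Jan', 'Feb', 'Mar'])
prices_array = np.array([238.11, 237.81, 238.91])

# Build the filter from prices, then apply it to months
above_238 = prices_array > 238
months_above = months_array[above_238]

print(months_above)
```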
Let's practice!
Visualization in
Python
Adina Howe
Instructor
Matplotlib: A visualization package
matplotlib.pyplot - diverse plotting functions
import matplotlib.pyplot as plt
matplotlib.pyplot - diverse plotting functions
plt.plot()
takes arguments that describe the data to be plotted
plt.show()
displays plot to screen
Plotting with pyplot
import matplotlib.pyplot as plt
plt.plot(months, prices)
plt.show()
Plot result
Red solid line
import matplotlib.pyplot as plt
plt.plot(months, prices, color = 'red')
plt.show()
Plot result
Dashed line
import matplotlib.pyplot as plt
plt.plot(months, prices, color = 'red', linestyle = '--')
plt.show()
Plot result
Colors and linestyles
color
'green' green
'red' red
'cyan' cyan
'blue' blue
linestyle
'-' solid line
'--' dashed line
'-.' dashed dot line
':' dotted line
More documentation on colors and line styles can be found in the Matplotlib documentation.
Adding Labels and Titles
import matplotlib.pyplot as plt
plt.plot(months, prices, color = 'red', linestyle = '--')
# Add labels
plt.xlabel('Months')
plt.ylabel('Consumer Price Indexes, $')
plt.title('Average Monthly Consumer Price Indexes')
# Show plot
plt.show()
Plot result
Adding additional lines
import matplotlib.pyplot as plt
plt.plot(months, prices, color = 'red', linestyle = '--')
# adding an additional line
plt.plot(months, prices_new, color = 'green', linestyle = '--')
plt.xlabel('Months')
plt.ylabel('Consumer Price Indexes, $')
plt.title('Average Monthly Consumer Price Indexes')
plt.show()
Plot result
Scatterplots
import matplotlib.pyplot as plt
plt.scatter(x = months, y = prices, color = 'red')
plt.show()
Scatterplot result
Let's practice!
Histograms
Adina Howe
Instructor
Why histograms for financial analysis?
Histograms and Data
Is your data skewed?
Is your data centered around the average?
Do you have any abnormal data points (outliers) in your data?
Histograms and matplotlib.pyplot
import matplotlib.pyplot as plt
plt.hist(x=prices, bins=3)
plt.show()
Changing the number of bins
import matplotlib.pyplot as plt
plt.hist(prices, bins=6)
plt.show()
Normalizing histogram data
import matplotlib.pyplot as plt
plt.hist(prices, bins=6, density=True)
plt.show()
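To see what normalization does to the numbers behind the bars, the bin heights can be computed without drawing anything. The sketch below uses NumPy's `histogram` function with made-up prices; in a normalized histogram the bar areas (height times bin width) sum to 1.

```python
import numpy as np

# Hypothetical prices, for illustration only
prices = np.array([238.11, 237.81, 238.91, 239.10, 238.50, 237.90])

# Raw counts per bin, plus the bin edges
counts, edges = np.histogram(prices, bins=3)

# density=True is NumPy's option for normalizing the bin heights
density, _ = np.histogram(prices, bins=3, density=True)

widths = np.diff(edges)                 # width of each bin
area = float(np.sum(density * widths))  # total area under the bars

print(counts)
print(area)
```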
Layering histograms on a plot
import matplotlib.pyplot as plt
plt.hist(x=prices, bins=6, density=True)
plt.hist(x=prices_new, bins=6, density=True)
plt.show()
Histogram result
Alpha: Changing transparency of histograms
import matplotlib.pyplot as plt
plt.hist(x=prices, bins=6, density=True, alpha=0.5)
plt.hist(x=prices_new, bins=6, density=True, alpha=0.5)
plt.show()
Histogram result
Adding a legend
import matplotlib.pyplot as plt
plt.hist(x=prices, bins=6, density=True, alpha=0.5, label="Prices 1")
plt.hist(x=prices_new, bins=6, density=True, alpha=0.5, label="Prices New")
plt.legend()
plt.show()
Histogram result
Let's practice!
Introducing the
dataset
Adina Howe
Instructor
Overall Review
Python shell and scripts
Variables and data types
Lists
Arrays
Methods and functions
Indexing and subsetting
Matplotlib
S&P 100 Companies
Standard and Poor's S&P 100:
made up of major companies that span multiple industry groups
used to measure stock performance of large companies
S&P 100 Case Study
Sectors of Companies within the S&P 100 in 2017
The data
Price to Earnings Ratio
Price-to-earnings ratio = Market price / Earnings per share
A ratio for valuing a company that measures its current share price relative to its per-share earnings
In general, a higher P/E ratio indicates higher growth expectations
Your mission
GIVEN
Lists of data describing the S&P 100: names, prices, earnings, sectors
OBJECTIVE PART I
Explore and analyze the S&P 100 data, specifically the P/E ratios of S&P 100 companies
Step 1: examine the lists
In [1]: my_list = [1, 2, 3, 4, 5]
# first element
In [2]: print(my_list[0])
1
# last element
In [3]: print(my_list[-1])
5
# range of elements
In [4]: print(my_list[0:3])
[1, 2, 3]
Step 2: Convert lists to arrays
# Convert lists to arrays
import numpy as np
my_array = np.array(my_list)
Step 3: Elementwise array operations
# Elementwise array operations
array_ratio = array1 / array2
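Applied to the P/E ratio, the elementwise division might look like the sketch below. The prices and earnings are made-up values, not S&P 100 data, and the array names are illustrative.

```python
import numpy as np

# Hypothetical share prices and earnings per share for three companies
prices = np.array([100.0, 250.0, 50.0])
earnings = np.array([5.0, 10.0, 5.0])

# Division pairs up elements by position: prices[i] / earnings[i]
pe_ratios = prices / earnings

print(pe_ratios)
```

Both arrays must have the same length (or be broadcastable) for the division to pair elements this way.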
Let's analyze!
A closer look at the
sectors
Adina Howe
Instructor
Your mission
GIVEN
NumPy arrays of data describing the S&P 100: names, prices, earnings, sectors
OBJECTIVE PART II
Explore and analyze sector-specific P/E ratios within companies of the S&P 100
Step 1: Create a boolean filtering array
stock_prices = np.array([100, 200, 300])
filter_array = (stock_prices >= 150)
print(filter_array)
[ False True True]
Step 2: Apply filtering array to subset another array
stock_prices = np.array([100, 200, 300])
filter_array = (stock_prices >= 150)
print(stock_prices[filter_array])
[200 300]
Step 3: Summarize P/E ratios
Calculate the average and standard deviation of these sector-specific P/E ratios
import numpy as np
average_value = np.mean(my_array)
std_value = np.std(my_array)
Let's practice!
Visualizing trends
Adina Howe
Instructor
Your mission - outlier?
Step 1: Make a histogram
import matplotlib.pyplot as plt
plt.hist(hist_data, bins = 8)
plt.show()
Step 2: Identify the Outlier
Identify the outlier P/E ratio
Create a boolean array filter to subset this company
Filter out this company information from the provided datasets
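The three steps above can be sketched as follows. The company names, P/E values, and the cutoff of 100 are all hypothetical; in practice the cutoff would be read off the histogram.

```python
import numpy as np

# Hypothetical data: names and P/E ratios aligned by position
names = np.array(['A', 'B', 'C', 'D'])
pe = np.array([18.5, 22.1, 310.0, 15.3])

# Step 1: flag the outlier with an assumed cutoff
is_outlier = pe > 100

# Step 2: subset the company that matches the filter
outlier_names = names[is_outlier]

# Step 3: keep everything else by inverting the filter with ~
names_kept = names[~is_outlier]
pe_kept = pe[~is_outlier]

print(outlier_names)
print(names_kept)
print(pe_kept)
```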
Let's practice!
Representing time
with datetimes
INTERMEDIATE PYTHON FOR FINANCE
Kennedy Behrman
Data Engineer, Author, Founder
Datetimes
from datetime import datetime
black_monday = datetime(1987, 10, 19)
print(black_monday)
1987-10-19 00:00:00
Datetime now
datetime.now()
datetime.datetime(2019, 11, 6, 3, 48, 30, 886713)
Datetime from string
black_monday_str = "Monday, October 19, 1987. 9:30 am"
format_str = "%A, %B %d, %Y. %I:%M %p"
datetime.strptime(black_monday_str, format_str)
datetime.datetime(1987, 10, 19, 9, 30)
Datetime from string
Year
%y Without century (00, 01, ..., 98, 99)
%Y With century (0001, 0002, ..., 1998, 1999, ..., 9999)
Month
%b Abbreviated names (Jan, Feb, ..., Nov, Dec)
%B Full names (January, February, ... November, December)
%m As numbers (01, 02, ..., 11, 12)
Day of Month
%d (01, 02, ..., 30, 31)
Datetime from string
Weekday
%a Abbreviated name (Sun, ... Sat)
%A Full name (Sunday, ... Saturday)
%w Number (0, ..., 6)
Hour
%H 24 hour (00, 01, ... 23)
%I 12 hour (01, 02, ... 12)
Minute
%M (00, 01, ..., 59)
Datetime from string
Seconds
%S (00, 01, ... 59)
Micro-seconds
%f (000000, 000001, ... 999999)
AM/PM
%p (AM, PM)
Datetime from string
%m Months
%M Minutes
Datetime from string
"1837-05-10"
%Y
%m
%d
"%Y-%m-%d"
Datetime from string
"Friday, 17 May 01"
%A
%d
%B
%y
"%A, %d %B %y"
String from datetime
dt.strftime(format_string)
String from datetime
great_depression_crash = datetime(1929, 10, 29)
great_depression_crash
datetime.datetime(1929, 10, 29, 0, 0)
great_depression_crash.strftime("%a, %b %d, %Y")
'Tue, Oct 29, 1929'
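Parsing and formatting are inverse operations, so a string can be read with one format and written out with another. A minimal sketch, assuming `datetime` is imported from the `datetime` module as in the earlier slides (the variable names are illustrative):

```python
from datetime import datetime

text = "1987-10-19"

# Parse: string -> datetime, using a format that matches the text
dt = datetime.strptime(text, "%Y-%m-%d")

# Format: datetime -> string, using a different layout
label = dt.strftime("%d %B %Y")

# Formatting with the original layout recovers the original text
round_trip = dt.strftime("%Y-%m-%d")

print(dt)
print(label)
print(round_trip)
```

Month and weekday names produced by `strftime` depend on the locale in use.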
Let's practice!
Working with
datetimes
Kennedy Behrman
Data Engineer, Author, Founder
Datetime attributes
now.year 2019
now.month 11
now.day 13
now.hour 22
now.minute 34
now.second 56
Comparing datetimes
equals ==
less than <
more than >
Comparing datetimes
from datetime import datetime
asian_crisis = datetime(1997, 7, 2)
world_mini_crash = datetime(1997, 10, 27)
asian_crisis > world_mini_crash
False
asian_crisis < world_mini_crash
True
Comparing datetimes
asian_crisis = datetime(1997, 7, 2)
world_mini_crash = datetime(1997, 10, 27)
text = "10/27/1997"
format_str = "%m/%d/%Y"
sell_date = datetime.strptime(text, format_str)
sell_date == world_mini_crash
True
Difference between datetimes
Compare with <, >, or ==.
Subtraction returns a timedelta object.
timedelta arguments: weeks, days, hours, minutes, seconds, microseconds
timedelta attributes: days, seconds, microseconds
Difference between datetimes
delta = world_mini_crash - asian_crisis
type(delta)
datetime.timedelta
delta.days
117
Creating relative datetimes
dt
datetime.datetime(2019, 1, 14, 0, 0)
datetime(dt.year, dt.month, dt.day - 7)
datetime.datetime(2019, 1, 7, 0, 0)
datetime(dt.year, dt.month, dt.day - 15)
ValueError Traceback (most recent call last)
<ipython-input-28-804001f45cdb> in <module>()
-> 1 datetime(dt.year, dt.month, dt.day - 15)
ValueError: day is out of range for month
Creating relative datetimes
delta = world_mini_crash - asian_crisis
type(delta)
datetime.timedelta
Creating relative datetimes
from datetime import timedelta
offset = timedelta(weeks = 1)
offset
datetime.timedelta(days=7)
dt - offset
datetime.datetime(2019, 1, 7, 0, 0)
Creating relative datetimes
offset = timedelta(days=16)
dt - offset
datetime.datetime(2018, 12, 29, 0, 0)
cur_week = last_week + timedelta(weeks=1)
# Do some work with date
# set last week variable to cur week and repeat
last_week = cur_week
source_dt = event_dt - timedelta(weeks=4)
# Use source datetime to look up market factors
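The week-by-week pattern above can be sketched as a loop that collects the start of each week. The start date and the count of four weeks are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2019, 1, 14)   # hypothetical first week
one_week = timedelta(weeks=1)

week_starts = []
current = start
for _ in range(4):
    week_starts.append(current)
    # Adding a timedelta rolls over month boundaries correctly
    current = current + one_week

for week in week_starts:
    print(week)
```

Unlike building a new datetime from `dt.day - 7`, adding or subtracting a timedelta never produces an out-of-range day.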
Let's practice!
Dictionaries
Kennedy Behrman
Data Engineer, Author, Founder
Lookup by index
my_list = ['a','b','c','d']
0 1 2 3
['a','b','c','d']
my_list[0]
'a'
my_list.index('c')
2
Lookup by key
Dictionaries
Representation
{ 'key-1':'value-1', 'key-2':'value-2', 'key-3':'value-3'}
Creating dictionaries
my_dict = {}
my_dict
{}
my_dict = dict()
my_dict
{}
Creating dictionaries
ticker_symbols = {'AAPL':'Apple', 'F':'Ford', 'LUV':'Southwest'}
print(ticker_symbols)
{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest'}
ticker_symbols = dict([['AAPL','Apple'],['F','Ford'],['LUV','Southwest']])
print(ticker_symbols)
{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest'}
Adding to dictionaries
ticker_symbols['XON'] = 'Exxon'
ticker_symbols
{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest', 'XON': 'Exxon'}
ticker_symbols['XON'] = 'Exxon OLD'
ticker_symbols
{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest', 'XON': 'Exxon OLD'}
Accessing values
ticker_symbols['F']
'Ford'
Accessing values
ticker_symbols['XOM']
KeyError Traceback (most recent call last)
<ipython-input-6-782fbf617bf7> in <module>()
-> 1 ticker_symbols['XOM']
KeyError: 'XOM'
Accessing values
company = ticker_symbols.get('LUV')
print(company)
Southwest
company = ticker_symbols.get('XOM')
print(company)
None
company = ticker_symbols.get('XOM', 'MISSING')
print(company)
MISSING
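The default argument of `.get()` is also handy for counting, because a key that is not yet present can be treated as zero. A small sketch with a made-up list of sector labels:

```python
# Hypothetical sector labels, for illustration only
sectors = ['Tech', 'Energy', 'Tech', 'Finance', 'Tech']

counts = {}
for sector in sectors:
    # .get() returns 0 the first time a sector is seen, avoiding a KeyError
    counts[sector] = counts.get(sector, 0) + 1

print(counts)
```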
Deleting from dictionaries
ticker_symbols
{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest', 'XON': 'Exxon OLD'}
del(ticker_symbols['XON'])
ticker_symbols
{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest'}
Let's practice!
Comparison
operators
Kennedy Behrman
Data Engineer, Author, Founder
Python comparison operators
Equality: == , !=
Order: < , > , <= , >=
Equality operator vs assignment
Test equality: ==
Assign value: =
Equality operator vs assignment
13 == 13
True
count = 13
print(count)
13
Equality comparisons
datetimes
numbers (floats, ints)
dictionaries
strings
almost anything else
Comparing datetimes
date_close_high = datetime(2019, 11, 27)
date_intra_high = datetime(2019, 11, 27)
print(date_close_high == date_intra_high)
True
Comparing dictionaries
d1 = {'high':56.88, 'low':33.22, 'closing':56.88}
d2 = {'high':56.88, 'low':33.22, 'closing':56.88}
print(d1 == d2)
True
d1 = {'high':56.88, 'low':33.22, 'closing':56.88}
d2 = {'high':56.88, 'low':33.22, 'closing':12.89}
print(d1 == d2)
False
Comparing different types
print(3 == 3.0)
True
print(3 == '3')
False
Not equal operator
print(3 != 4)
True
print(3 != 3)
False
Order operators
Less than <
Less than or equal <=
Greater than >
Greater than or equal >=
Less than operator
print(3 < 4)
True
print(3 < 3.6)
True
print('a' < 'b')
True
Less than operator
date_close_high = datetime(2019, 11, 27)
date_intra_high = datetime(2019, 11, 27)
print(date_close_high < date_intra_high)
False
Less than or equal operator
print(1 <= 4)
True
print(1.0 <= 1)
True
print('e' <= 'a')
False
Greater than operator
print(6 > 5)
print(4 > 4)
True
False
Greater than or equal operator
print(6 >= 5)
print(4 >= 4)
True
True
Order comparison across types
print(3.45454 < 90)
True
print('a' < 23)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
...
TypeError: '<' not supported between instances of 'str' and 'int'
Let's practice!
Boolean operators
Kennedy Behrman
Data Engineer, Author, Founder
Boolean logic
What are Boolean operations?
1. and
2. or
3. not
Object evaluation
Evaluates as False
Constants:
False
None
Numeric zero:
0
0.0
Length of zero:
""
[]
{}
Evaluates as True
Almost everything else
The AND operator
True and True
True
True and False
False
The OR operator
False or True
True
True or True
True
False or False
False
Short circuit.
is_current() and is_investment()
False
is_current() or is_investment()
True
The NOT operator
not True
False
not False
True
Order of operations with NOT
True == False
False
not True == False
True
Object evaluation
"CUSIP" and True
True
Object evaluation
[] or False
False
Object evaluation
not {}
True
Returning objects
"Federal" and "State"
"State"
[] and "State"
[]
Returning objects.
13 or "account number"
13
0.0 or {"balance": 2200}
{"balance": 2200}
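Because `or` returns the first object that evaluates as True, it is sometimes used to supply a fallback value. The sketch below uses a hypothetical helper, `display_name`, and also shows a pitfall: a legitimate zero is treated as missing.

```python
def display_name(name):
    """Return the name, or a placeholder when the name is empty or None."""
    return name or "UNKNOWN"


print(display_name("Ford"))
print(display_name(""))
print(display_name(None))

# Pitfall: 0 evaluates as False, so a real zero balance is replaced too
balance = 0
shown = balance or "n/a"
print(shown)
```

When zero is a valid value, an explicit check such as `if balance is None` is a safer choice than `or`.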
Let's practice!
If statements
Kennedy Behrman
Data Engineer, Author, Founder
Printing sales only
trns = { 'symbol': 'TSLA', 'type':'BUY', 'amount': 300}
print(trns['amount'])
300
Compound statements
control statement
    statement 1
    statement 2
    statement 3
Control Statement
if <expression> :
if x < y:
if x in y:
if x and y:
if x:
Code blocks
if <expression>:
    statement
    statement
    statement
if <expression>: statement;statement;statement
Printing sales only
trns = { 'symbol': 'TSLA', 'type':'BUY', 'amount': 300}
if trns['type'] == 'SELL':
    print(trns['amount'])
trns['type'] == 'SELL'
False
Printing sales only.
trns = { 'symbol': 'AAPL', 'type':'SELL', 'amount': 200}
if trns['type'] == 'SELL':
    print(trns['amount'])
200
Else
if x in y:
    print("I found x in y")
else:
    print("No x in y")
Elif
if x == y:
    print("equals")
elif x < y:
    print("less")
Elif
if x == y:
    print("equals")
elif x < y:
    print("less")
elif x > y:
    print("more")
elif x == 0:
    print("zero")
Else with elif
if x == y:
    print("equals")
elif x < y:
    print("less")
elif x > y:
    print("more")
elif x == 0:
    print("zero")
else:
    print("None of the above")
Let's practice!
For and while loops
Kennedy Behrman
Data Engineer, Author, Founder
Repeating a code block
CUSIP SYMBOL
037833100 AAPL
17275R102 CSCO
68389X105 ORCL
Loops.
For loop While loop
Statement components
<Control Statement>
<Code Block>
execution 1
execution 2
execution 3
For loops
for <variable> in <sequence>:
for x in [0, 1, 2]:
d = {'key': 'value1'}
for x in d:
for x in "ORACLE":
List example
for x in [0, 1, 2]:
    print(x)
0
1
2
Dictionary example
symbols = {'037833100': 'AAPL',
           '17275R102': 'CSCO',
           '68389X105': 'ORCL'}
for k in symbols:
    print(symbols[k])
AAPL
CSCO
ORCL
String example
for x in "ORACLE":
    print(x)
O
R
A
C
L
E
While control statements
while <expression>:
While example
x = 0
while x < 5:
    print(x)
    x = (x + 1)
0
1
2
3
4
Infinite loops
x = 0
while x <= 5:
    print(x)
Skipping with continue
for x in [0, 1, 2, 3]:
    if x == 2:
        continue
    print(x)
0
1
3
Stopping with break.
while True:
    transaction = get_transaction()
    if transaction['symbol'] == 'ORCL':
        print('The current symbol is ORCL, break now')
        break
    print('Not ORCL')
Not ORCL
Not ORCL
Not ORCL
The current symbol is ORCL, break now
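The slide does not show how get_transaction() is defined. A runnable sketch, assuming a hypothetical stand-in that hands out queued transactions one at a time, reproduces the same pattern of looping until the break is hit:

```python
# Hypothetical stand-in for get_transaction(): returns queued transactions one by one
queue = iter([{'symbol': 'AAPL'}, {'symbol': 'CSCO'},
              {'symbol': 'TSLA'}, {'symbol': 'ORCL'}])

def get_transaction():
    return next(queue)

messages = []
while True:
    transaction = get_transaction()
    if transaction['symbol'] == 'ORCL':
        messages.append('The current symbol is ORCL, break now')
        break          # leave the loop immediately
    messages.append('Not ORCL')

print('\n'.join(messages))
```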
INTERMEDIATE PYTHON FOR FINANCE
Let's practice 'for' and 'while' loops!
INTERMEDIATE PYTHON FOR FINANCE
Creating a DataFrame
INTERMEDIATE PYTHON FOR FINANCE
Kennedy Behrman
Data Engineer, Author, Founder
Pandas
import pandas as pd
print(pd)
<module 'pandas' from '.../pandas/__init__.py'>
INTERMEDIATE PYTHON FOR FINANCE
Pandas DataFrame
pd.DataFrame()
INTERMEDIATE PYTHON FOR FINANCE
Pandas DataFrame
Col 1 Col 2 Col 3
0 v1 a 00
1 v2 b 01
2 v3 c 13.02
INTERMEDIATE PYTHON FOR FINANCE
From dict
data = {'Bank Code': ['BA', 'AAD', 'BA'],
'Account#': ['ajfdk2', '1234nmk', 'mm3d90'],
'Balance':[1222.00, 390789.11, 13.02]}
df = pd.DataFrame(data=data)
Bank Code Account# Balance
0 BA ajfdk2 1222.00
1 AAD 1234nmk 390789.11
2 BA mm3d90 13.02
INTERMEDIATE PYTHON FOR FINANCE
From list of dicts
data = [{'Bank Code': 'BA', 'Account#': 'ajfdk2', 'Balance': 1222.00},
{'Bank Code': 'AAD', 'Account#': '1234nmk', 'Balance': 390789.11},
{'Bank Code': 'BA', 'Account#': 'mm3d90', 'Balance': 13.02}]
df = pd.DataFrame(data=data)
Bank Code Account# Balance
0 BA ajfdk2 1222.00
1 AAD 1234nmk 390789.11
2 BA mm3d90 13.02
INTERMEDIATE PYTHON FOR FINANCE
From list of lists
data = [['BA', 'ajfdk2', 1222.00],
['AAD', '1234nmk', 390789.11],
['BA', 'mm3d90', 13.02]]
df = pd.DataFrame(data=data)
0 1 2
0 BA ajfdk2 1222.00
1 AAD 1234nmk 390789.11
2 BA mm3d90 13.02
INTERMEDIATE PYTHON FOR FINANCE
From list of lists with column names
data = [['BA', 'ajfdk2', 1222.00],
['AAD', '1234nmk', 390789.11],
['BA', 'mm3d90', 13.02]]
columns = ['Bank Code', 'Account#', 'Balance']
df = pd.DataFrame(data=data, columns=columns)
Bank Code Account# Balance
0 BA ajfdk2 1222.00
1 AAD 1234nmk 390789.11
2 BA mm3d90 13.02
INTERMEDIATE PYTHON FOR FINANCE
Reading data
Excel pd.read_excel
JSON pd.read_json
HTML pd.read_html
Pickle pd.read_pickle
Sql pd.read_sql
Csv pd.read_csv
INTERMEDIATE PYTHON FOR FINANCE
CSV
Comma separated values
client id,trans type, amount
14343,buy,23.0
0574,sell,2000
7093,dividend,2234
INTERMEDIATE PYTHON FOR FINANCE
Reading a csv file
df = pd.read_csv('/data/daily/transactions.csv')
client id trans type amount
14343 buy 23.0
0574 sell 2000
7093 dividend 2234
INTERMEDIATE PYTHON FOR FINANCE
Non-comma csv
client id|trans type| amount
14343|buy|23.0
0574|sell|2000
7093|dividend|2234
INTERMEDIATE PYTHON FOR FINANCE
Non-comma csv
df = pd.read_csv('/data/daily/transactions.csv', sep='|')
client id trans type amount
14343 buy 23.0
0574 sell 2000
7093 dividend 2234
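To try the sep argument without a file on disk, the sketch below feeds the same pipe-delimited text to pd.read_csv through an in-memory buffer. The dtype argument is an addition not shown on the slide: without it, pandas would parse the client ids as integers and the leading zero in 0574 would be lost. The header is written without the stray space before amount.

```python
import io
import pandas as pd

# In-memory stand-in for the pipe-delimited file shown on the slide
raw = io.StringIO(
    "client id|trans type|amount\n"
    "14343|buy|23.0\n"
    "0574|sell|2000\n"
    "7093|dividend|2234\n"
)

# dtype keeps client ids as text so the leading zero in '0574' survives
df = pd.read_csv(raw, sep='|', dtype={'client id': str})
print(df)
```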
INTERMEDIATE PYTHON FOR FINANCE
Let's practice!
INTERMEDIATE PYTHON FOR FINANCE
Accessing Data
INTERMEDIATE PYTHON FOR FINANCE
Kennedy Behrman
Data Engineer, Author, Founder
Account Balance
INTERMEDIATE PYTHON FOR FINANCE
Introducing lesson data
Bank Code Account# Balance
a BA ajfdk2 1222.00
b AAD 1234nmk 390789.11
c BA mm3d90 13.02
accounts
INTERMEDIATE PYTHON FOR FINANCE
Access column using brackets
accounts['Balance']
a 1222.00
b 390789.11
c 13.02
Name: Balance, dtype: float64
INTERMEDIATE PYTHON FOR FINANCE
Access column using dot-syntax
accounts.Balance
a 1222.00
b 390789.11
c 13.02
Name: Balance, dtype: float64
INTERMEDIATE PYTHON FOR FINANCE
Access multiple columns
accounts[['Bank Code', 'Account#']]
Bank Code Account#
a BA ajfdk2
b AAD 1234nmk
c BA mm3d90
INTERMEDIATE PYTHON FOR FINANCE
Access rows using brackets
accounts[0:2]
Bank Code Account# Balance
a BA ajfdk2 1222.00
b AAD 1234nmk 390789.11
INTERMEDIATE PYTHON FOR FINANCE
Access rows using brackets
accounts[[True, False, True]]
Bank Code Account# Balance
a BA ajfdk2 1222.00
c BA mm3d90 13.02
INTERMEDIATE PYTHON FOR FINANCE
loc and iloc
loc access by name
iloc access by position
INTERMEDIATE PYTHON FOR FINANCE
loc
accounts.loc['b']
Bank Code AAD
Account# 1234nmk
Balance 390789
Name: b, dtype: object
INTERMEDIATE PYTHON FOR FINANCE
loc
accounts.loc[['a','c']]
Bank Code Account# Balance
a BA ajfdk2 1222.00
c BA mm3d90 13.02
accounts.loc[[True, False, True]]
Bank Code Account# Balance
a BA ajfdk2 1222.00
c BA mm3d90 13.02
INTERMEDIATE PYTHON FOR FINANCE
Columns with loc
accounts.loc['a':'c','Balance']
accounts.loc['a':'c', ['Balance','Account#']]
accounts.loc['a':'c',[True,False,True]]
accounts.loc['a':'c','Bank Code':'Balance']
INTERMEDIATE PYTHON FOR FINANCE
Columns with loc
accounts.loc['a':'c',['Bank Code', 'Balance']]
Bank Code Balance
a BA 1222.00
b AAD 390789.11
c BA 13.02
INTERMEDIATE PYTHON FOR FINANCE
iloc
accounts.iloc[0:2, [0,2]]
Bank Code Balance
a BA 1222.00
b AAD 390789.11
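One detail worth checking by hand is that label slices with loc include the end label, while position slices with iloc stop before the end position. The sketch below rebuilds the lesson's accounts table (the construction itself is an assumption, since the slides only show the finished table) and selects the same two rows both ways:

```python
import pandas as pd

accounts = pd.DataFrame(
    {'Bank Code': ['BA', 'AAD', 'BA'],
     'Account#': ['ajfdk2', '1234nmk', 'mm3d90'],
     'Balance': [1222.00, 390789.11, 13.02]},
    index=['a', 'b', 'c'])

by_label = accounts.loc['a':'b', ['Bank Code', 'Balance']]   # label slice includes 'b'
by_position = accounts.iloc[0:2, [0, 2]]                     # position slice stops before 2

print(by_label.equals(by_position))
```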
INTERMEDIATE PYTHON FOR FINANCE
Setting a single value
Bank Code Account# Balance
a BA ajfdk2 1222.00
b AAD 1234nmk 390789.11
c BA mm3d90 13.02
accounts.loc['a', 'Balance'] = 0
INTERMEDIATE PYTHON FOR FINANCE
Setting a single value
Bank Code Account# Balance
a BA ajfdk2 0.00
b AAD 1234nmk 390789.11
c BA mm3d90 13.02
accounts.loc['a', 'Balance'] = 0
INTERMEDIATE PYTHON FOR FINANCE
Setting multiple values
Bank Code Account# Balance
a BA ajfdk2 1222.00
b AAD 1234nmk 390789.11
c BA mm3d90 13.02
accounts.iloc[:2, 1:] = 'NA'
INTERMEDIATE PYTHON FOR FINANCE
Setting multiple columns
Bank Code Account# Balance
a BA NA NA
b AAD NA NA
c BA mm3d90 13.02
accounts.iloc[:2, 1:] = 'NA'
INTERMEDIATE PYTHON FOR FINANCE
Let's practice!
INTERMEDIATE PYTHON FOR FINANCE
Aggregating and summarizing
INTERMEDIATE PYTHON FOR FINANCE
Kennedy Behrman
Data Engineer, Author, Founder
DataFrame methods
.count() .sum()
.min() .prod()
.max() .mean()
.first() .median()
.last() .std()
.var()
INTERMEDIATE PYTHON FOR FINANCE
Axis
Rows: default, axis=0, axis='rows'
Columns: axis=1, axis='columns'
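A short sketch of the two axis settings, using the price table from the aggregation slides (the DataFrame construction and the names per_column and per_row are assumptions made for illustration). The default works down the rows and returns one value per column; axis=1 works across the columns and returns one value per row.

```python
import pandas as pd

df = pd.DataFrame(
    {'AAD': [300.22, 301.49, 300.00, 302.90],
     'GDDL': [75.32, 79.99, 80.00, 82.92],
     'IMA': [39.90, 44.99, 45.33, 49.00]},
    index=['2020-10-03', '2020-10-04', '2020-10-05', '2020-10-07'])

per_column = df.sum()        # default axis=0: one total per column
per_row = df.sum(axis=1)     # axis=1: one total per row (date)

print(per_column)
print(per_row)
```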
INTERMEDIATE PYTHON FOR FINANCE
Count
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00
df.count()
AAD 4
GDDL 4
IMA 4
dtype: int64
INTERMEDIATE PYTHON FOR FINANCE
Sum
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00
df.sum(axis=1)
2020-10-03 415.44
2020-10-04 426.47
2020-10-05 425.33
2020-10-07 434.82
dtype: float64
INTERMEDIATE PYTHON FOR FINANCE
Product
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00
df.prod(axis='columns')
2020-10-03 9.022416e+05
2020-10-04 1.084987e+06
2020-10-05 1.087920e+06
2020-10-07 1.230707e+06
dtype: float64
INTERMEDIATE PYTHON FOR FINANCE
Mean
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00
df.mean()
AAD 301.1525
GDDL 79.5575
IMA 44.8050
dtype: float64
INTERMEDIATE PYTHON FOR FINANCE
Median
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00
df.median()
AAD 300.855
GDDL 79.995
IMA 45.160
dtype: float64
INTERMEDIATE PYTHON FOR FINANCE
Standard deviation
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00
df.std()
AAD 1.337345
GDDL 3.143548
IMA 3.740183
dtype: float64
INTERMEDIATE PYTHON FOR FINANCE
Variance
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00
df.var()
AAD 1.788492
GDDL 9.881892
IMA 13.988967
dtype: float64
INTERMEDIATE PYTHON FOR FINANCE
Columns and rows
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00
df.loc[:,'AAD'].max()
302.9
df.iloc[0].min()
39.9
INTERMEDIATE PYTHON FOR FINANCE
Let's practice!
INTERMEDIATE PYTHON FOR FINANCE
Extending and manipulating data
INTERMEDIATE PYTHON FOR FINANCE
Kennedy Behrman
Data Engineer, Author, Founder
PCE
Personal consumption expenditures (PCE)
PCE =
INTERMEDIATE PYTHON FOR FINANCE
PCE
Personal consumption expenditures (PCE)
PCE = PCDG
Durable goods
1 By cactus cowboy, Open Clipart, CC0, https://commons.wikimedia.org/w/index.php?curid=64953673
INTERMEDIATE PYTHON FOR FINANCE
PCE
Personal consumption expenditures (PCE)
PCE = PCDG + PCNDG
Non-durable goods
1 By Smart Servier, https://smart.servier.com/, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=74765623
INTERMEDIATE PYTHON FOR FINANCE
PCE
Personal consumption expenditures (PCE)
PCE = PCDG + PCNDG + PCESV
Services
1 By Clip Art by Vector Toons, Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=65937611
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing columns
DATE PCDGA
1929-01-01 9.829
1930-01-01 7.661
1931-01-01 5.911
1932-01-01 3.959
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing columns
pce['PCND'] = [33.941,
               30.503,
               25.798,
               20.169]
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing columns
pce
DATE PCDG PCND
1929-01-01 9.829 33.941
1930-01-01 7.661 30.503
1931-01-01 5.911 25.798
1932-01-01 3.959 20.169
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing columns
pce pcesv
DATE PCDG PCND PCESV
1929-01-01 9.829 33.941 0 33.613
1930-01-01 7.661 30.503 1 31.972
1931-01-01 5.911 25.798 2 28.963
1932-01-01 3.959 20.169 3 24.587
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing columns
pce['PCESV'] = pcesv
pce
DATE PCDG PCND PCESV
1929-01-01 9.829 33.941 33.613
1930-01-01 7.661 30.503 31.972
1931-01-01 5.911 25.798 28.963
1932-01-01 3.959 20.169 24.587
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing columns
pce['PCE'] = pce['PCDG'] + pce['PCND'] + pce['PCESV']
DATE PCDG PCND PCESV PCE
1929-01-01 9.829 33.941 33.613 77.383
1930-01-01 7.661 30.503 31.972 70.136
1931-01-01 5.911 25.798 28.963 60.672
1932-01-01 3.959 20.169 24.587 48.715
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing columns
pce.drop(columns=['PCDG', 'PCND', 'PCESV'],
axis=1,
inplace=True)
DATE PCE
1929-01-01 77.383
1930-01-01 70.136
1931-01-01 60.672
1932-01-01 48.715
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing rows
new_row
DATE PCE
1933-01-01 45.945
pce.append(new_row)
DATE PCE
1929-01-01 77.383
1930-01-01 70.136
1931-01-01 60.672
1932-01-01 48.715
1933-01-01 45.945
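DataFrame.append was deprecated and then removed in pandas 2.0, so the call above fails on recent installations. A minimal sketch of the same step with pd.concat follows; the way pce and new_row are built here is an assumption, since the slides only show their contents.

```python
import pandas as pd

pce = pd.DataFrame({'PCE': [77.383, 70.136, 60.672, 48.715]},
                   index=['1929-01-01', '1930-01-01', '1931-01-01', '1932-01-01'])
pce.index.name = 'DATE'

new_row = pd.DataFrame({'PCE': [45.945]}, index=['1933-01-01'])
new_row.index.name = 'DATE'

# pd.concat returns a new DataFrame; pce itself is left unchanged
extended = pd.concat([pce, new_row])
print(extended)
```

Passing every piece to a single pd.concat call also avoids copying the growing table once per row, which is what a loop of repeated appends does.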
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing rows
Adding multiple rows
new_rows = [row1, row2, row3]
for row in new_rows:
    pce = pce.append(row)
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing rows
Adding multiple rows
for row in new_rows:
    pce = pce.append(row)
DATE PCE
1929-01-01 77.383
1930-01-01 70.136
1931-01-01 60.672
1932-01-01 48.715
1933-01-01 45.945
1934-01-01 51.461
1935-01-01 55.933
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing rows
pce.drop(['1934-01-01',
          '1935-01-01',
          '1936-01-01',
          '1937-01-01',
          '1938-01-01',
          '1939-01-01'],
         inplace=True)
DATE PCE
1929-01-01 77.383
1930-01-01 70.136
1931-01-01 60.672
1932-01-01 48.715
1933-01-01 45.945
INTERMEDIATE PYTHON FOR FINANCE
PCE - adding and removing rows
all_rows = [row1, row2, row3, pce]
pd.concat(all_rows)
DATE PCE
1929-01-01 77.383
1930-01-01 70.136
1931-01-01 60.672
1932-01-01 48.715
1933-01-01 45.945
1934-01-01 51.461
1935-01-01 55.933
INTERMEDIATE PYTHON FOR FINANCE
PCE - operations on DataFrames
ec = 0.88
pce * ec
DATE PCE
1934-01-01 45.28568
1935-01-01 49.22104
1936-01-01 54.72544
1937-01-01 58.81832
INTERMEDIATE PYTHON FOR FINANCE
PCE - map
def convert_to_euro(x):
    return x * 0.88
pce['EURO'] = pce['PCE'].map(convert_to_euro)
DATE PCE EURO
1934-01-01 51.461 45.28568
1935-01-01 55.933 49.22104
1936-01-01 62.188 54.72544
INTERMEDIATE PYTHON FOR FINANCE
Gross Domestic Product (GDP)
GDP = PCE + GE + GPDI + NE
PCE: Personal Consumption Expenditures
GE: Government Expenditures
GPDI: Gross Private Domestic Investment
NE: Net Exports
INTERMEDIATE PYTHON FOR FINANCE
GDP - apply
map - Elements in a column (series)
apply - Across rows or columns
INTERMEDIATE PYTHON FOR FINANCE
GDP - apply
GCE GPDI NE PCE
DATE
1929-01-01 9.622 17.170 0.383 77.383
1930-01-01 10.273 11.428 0.323 70.136
1931-01-01 10.169 6.549 0.001 60.672
1932-01-01 8.946 1.819 0.043 48.715
INTERMEDIATE PYTHON FOR FINANCE
GDP - apply
gdp.apply(np.sum, axis=1)
INTERMEDIATE PYTHON FOR FINANCE
GDP - apply
gdp['GDP'] = gdp.apply(np.sum, axis=1)
GCE GPDI NE PCE GDP
DATE
1929-01-01 9.622 17.170 0.383 77.383 104.558
1930-01-01 10.273 11.428 0.323 70.136 92.160
1931-01-01 10.169 6.549 0.001 60.672 77.391
1932-01-01 8.946 1.819 0.043 48.715 59.523
INTERMEDIATE PYTHON FOR FINANCE
Let's practice!
INTERMEDIATE PYTHON FOR FINANCE
Peeking at data with head, tail, and describe
INTERMEDIATE PYTHON FOR FINANCE
Kennedy Behrman
Data Engineer, Author, Founder
Understanding your data
Data is loaded correctly
Understand the data's shape
INTERMEDIATE PYTHON FOR FINANCE
First look at data
aapl
Price Volume Trend
Date
03/27/2020 247.74 51054150 Down
03/26/2020 258.44 63140170 Up
03/25/2020 245.52 75900510 Down
03/24/2020 246.88 71882770 Up
INTERMEDIATE PYTHON FOR FINANCE
Head
aapl.head()
Price Volume Trend
Date
03/27/2020 247.74 51054150 Down
03/26/2020 258.44 63140170 Up
03/25/2020 245.52 75900510 Down
03/24/2020 246.88 71882770 Up
03/23/2020 224.37 84188210 Down
INTERMEDIATE PYTHON FOR FINANCE
Head
aapl.head(3)
Price Volume Trend
Date
03/27/2020 247.74 51054150 Down
03/26/2020 258.44 63140170 Up
03/25/2020 245.52 75900510 Down
INTERMEDIATE PYTHON FOR FINANCE
Tail
aapl.tail()
Price Volume Trend
Date
03/05/2020 292.92 46893220 Down
03/04/2020 302.74 54794570 Up
03/03/2020 289.32 79868850 Down
03/02/2020 298.81 85349340 Up
02/28/2020 273.36 106721200 Down
INTERMEDIATE PYTHON FOR FINANCE
Describe
aapl.describe()
Price Volume
count 21.000000 2.100000e+01
mean 263.715714 7.551468e+07
std 23.360598 1.669757e+07
min 224.370000 4.689322e+07
25% 246.670000 6.409497e+07
50% 258.440000 7.505841e+07
75% 285.340000 8.418821e+07
max 302.740000 1.067212e+08
INTERMEDIATE PYTHON FOR FINANCE
Include
aapl.describe(include='object')
Trend
count 21
unique 2
top Down
freq 14
INTERMEDIATE PYTHON FOR FINANCE
Include
aapl.describe(include='all')
Price Volume Trend
count 21.000000 2.100000e+01 21
unique NaN NaN 2
top NaN NaN Down
freq NaN NaN 14
mean 263.715714 7.551468e+07 NaN
std 23.360598 1.669757e+07 NaN
min 224.370000 4.689322e+07 NaN
25% 246.670000 6.409497e+07 NaN
INTERMEDIATE PYTHON FOR FINANCE
aapl.describe(include=['float', 'object'])
Price Trend
count 21.000000 21
unique NaN 2
top NaN Down
freq NaN 14
mean 263.715714 NaN
std 23.360598 NaN
min 224.370000 NaN
25% 246.670000 NaN
50% 258.440000 NaN
75% 285.340000 NaN
max 302.740000 NaN
INTERMEDIATE PYTHON FOR FINANCE
Percentiles
aapl.describe(percentiles=[.1, .5, .9])
Price Volume
count 21.000000 2.100000e+01
mean 263.715714 7.551468e+07
std 23.360598 1.669757e+07
min 224.370000 4.689322e+07
10% 242.210000 5.479457e+07
50% 258.440000 7.505841e+07
90% 292.920000 1.004233e+08
max 302.740000 1.067212e+08
INTERMEDIATE PYTHON FOR FINANCE
Exclude
aapl.describe(exclude='float')
Volume Trend
count 2.100000e+01 21
unique NaN 2
top NaN Down
freq NaN 14
mean 7.551468e+07 NaN
std 1.669757e+07 NaN
min 4.689322e+07 NaN
25% 6.409497e+07 NaN
INTERMEDIATE PYTHON FOR FINANCE
Let's practice!
INTERMEDIATE PYTHON FOR FINANCE
Filtering data
INTERMEDIATE PYTHON FOR FINANCE
Kennedy Behrman
Data Engineer, Author, Founder
Introducing the data
prices.head()
Date Symbol High
0 2020-04-03 AAPL 245.70
1 2020-04-02 AAPL 245.15
2 2020-04-01 AAPL 248.72
3 2020-03-31 AAPL 262.49
4 2020-03-30 AAPL 255.52
INTERMEDIATE PYTHON FOR FINANCE
Introducing the data
prices.describe()
High
count 378.000000
mean 881.593138
std 720.771922
min 227.490000
max 2185.950000
INTERMEDIATE PYTHON FOR FINANCE
Introducing the data
prices.describe(include='object')
Symbol
count 378
unique 3
top AMZN
freq 126
INTERMEDIATE PYTHON FOR FINANCE
Comparison operators
< <= > >= == !=
INTERMEDIATE PYTHON FOR FINANCE
Column comparison
prices.High > 2160
0 False
1 False
2 False
3 False
4 False
...
374 False
375 False
376 False
377 False
INTERMEDIATE PYTHON FOR FINANCE
Column comparison
prices.Symbol == 'AAPL'
0 True
1 True
2 True
3 True
4 True
...
374 False
375 False
376 False
377 False
INTERMEDIATE PYTHON FOR FINANCE
Masking by symbol
mask_symbol = prices.Symbol == 'AAPL'
aapl = prices.loc[mask_symbol]
aapl.describe(include='object')
Symbol
count 126
unique 1
top AAPL
freq 126
INTERMEDIATE PYTHON FOR FINANCE
Masking by price
mask_high = prices.High > 2160
big_price = prices.loc[mask_high]
INTERMEDIATE PYTHON FOR FINANCE
Masking by price
big_price.describe()
High
count 6.000000
mean 2177.406567
std 7.999334
min 2166.070000
max 2185.950000
INTERMEDIATE PYTHON FOR FINANCE
Pandas Boolean operators
And &
Or |
Not ~
INTERMEDIATE PYTHON FOR FINANCE
Combining conditions
mask_prices = prices['Symbol'] != 'AMZN'
mask_date = prices['Date'] > datetime(2020, 4, 1)
mask_amzn = mask_prices & mask_date
prices.loc[mask_amzn]
INTERMEDIATE PYTHON FOR FINANCE
Combining conditions
Date Symbol High
0 2020-04-03 AAPL 245.7000
1 2020-04-02 AAPL 245.1500
252 2020-04-03 TSLA 515.4900
253 2020-04-02 TSLA 494.2599
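The combined filter can be tried on a tiny table. In this sketch the rows are made up for illustration (they are not the course dataset), and the names mask_not_amzn and recent_non_amzn are chosen here for clarity:

```python
from datetime import datetime
import pandas as pd

# Small made-up sample in the same shape as `prices`
prices = pd.DataFrame({
    'Date': pd.to_datetime(['2020-04-03', '2020-04-01', '2020-04-03', '2020-04-02']),
    'Symbol': ['AAPL', 'AAPL', 'AMZN', 'TSLA'],
    'High': [245.70, 248.72, 1900.00, 494.26],
})

mask_not_amzn = prices['Symbol'] != 'AMZN'
mask_date = prices['Date'] > datetime(2020, 4, 1)

# & keeps only the rows where both masks are True
recent_non_amzn = prices.loc[mask_not_amzn & mask_date]
print(recent_non_amzn)
```

When the comparisons are written inline rather than stored in variables, each one needs its own parentheses, because & binds more tightly than the comparison operators.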
INTERMEDIATE PYTHON FOR FINANCE
Let's practice!
INTERMEDIATE PYTHON FOR FINANCE
Plotting data
INTERMEDIATE PYTHON FOR FINANCE
Kennedy Behrman
Data Engineer, Author, Founder
Look at your data
INTERMEDIATE PYTHON FOR FINANCE
Introducing the data
exxon.head()
Date High Volume Month
0 2015-05-01 90.089996 198924100 May
1 2015-06-01 85.970001 238808600 Jun
2 2015-07-01 83.529999 274029000 Jul
3 2015-08-01 79.290001 387523600 Aug
4 2015-09-01 75.470001 316644500 Sep
INTERMEDIATE PYTHON FOR FINANCE
Matplotlib
my_dataframe.plot()
INTERMEDIATE PYTHON FOR FINANCE
Line plot
exxon.plot(x='Date',
y='High' )
INTERMEDIATE PYTHON FOR FINANCE
Rotate
exxon.plot(x='Date',
y='High',
rot=90 )
INTERMEDIATE PYTHON FOR FINANCE
Title
exxon.plot(x='Date',
y='High',
rot=90,
title='Exxon Stock Price')
INTERMEDIATE PYTHON FOR FINANCE
Index
exxon.set_index('Date', inplace=True)
exxon.plot(y='High',
rot=90,
title='Exxon Stock Price')
INTERMEDIATE PYTHON FOR FINANCE
Plot types
line density
bar area
barh pie
hist scatter
box hexbin
kde
INTERMEDIATE PYTHON FOR FINANCE
Bar
exxon2018.plot(x='Month',
y='Volume',
kind='bar',
title='Exxon 2018')
INTERMEDIATE PYTHON FOR FINANCE
Hist
exxon.plot(y='High',kind='hist')
INTERMEDIATE PYTHON FOR FINANCE
Let's practice!
INTERMEDIATE PYTHON FOR FINANCE
Wrapping up
INTERMEDIATE PYTHON FOR FINANCE
Kennedy Behrman
Data Engineer, Author, Founder
Chapter 1
Representing time
datetime
Mapping data
dict()
INTERMEDIATE PYTHON FOR FINANCE
Chapter 2
Comparison operators
< <= > >=
Equality operators
== !=
Boolean operators
and or not
If statements
if a < b:
    print(a)
Loops
while a < b:
    a = a + 1
for a in c:
    print(a)
INTERMEDIATE PYTHON FOR FINANCE
Chapter 3
Creating a DataFrame
DataFrame(data=data)
pd.read_csv('/data.csv')
Accessing data
stocks.loc['a', 'Values']
stocks.iloc[2:22, 12]
Aggregating, summarizing
stocks.mean()
stocks.median()
Extending, manipulating
pce['PCESV'] = pcesv
gdp.apply(np.sum, axis=1)
INTERMEDIATE PYTHON FOR FINANCE
Chapter 4
Peeking
aapl.head()
aapl.tail()
aapl.describe()
Filtering
mask = prices.High > 2160
prices.loc[mask]
Plotting
exxon.plot(x='Date',
           y='High')
INTERMEDIATE PYTHON FOR FINANCE
Congratulations!
INTERMEDIATE PYTHON FOR FINANCE
Fundamental financial concepts
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
Course objectives
The Time Value of Money
Compound Interest
Discounting and Projecting Cash Flows
Making Rational Economic Decisions
Mortgage Structures
Interest and Equity
The Cost of Capital
Wealth Accumulation
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Calculating Return on Investment (% Gain)
$\text{Return (\% Gain)} = \frac{v_{t_2} - v_{t_1}}{v_{t_1}} = r$
$v_{t_1}$: The initial value of the investment at time $t_1$
$v_{t_2}$: The final value of the investment at time $t_2$
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Example
You invest $10,000 at time = year 1
At time = year 2, your investment is worth $11,000
($11,000 − $10,000) / $10,000 ∗ 100 = 10% annual return (gain) on your investment
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Calculating Return on Investment (Dollar Value)
$v_{t_2} = v_{t_1} \cdot (1 + r)$
$v_{t_1}$: The initial value of the investment at time $t_1$
$v_{t_2}$: The final value of the investment at time $t_2$
$r$: The rate of return of the investment per period $t$
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Example
Annual rate of return = 10% = 10/100
You invest $10,000 at time = year 1
$10,000 ∗ (1 + 10/100) = $11,000
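The two formulas above are inverses of each other, which a few lines of Python can confirm. This is a minimal sketch; the function names percent_return and final_value are chosen here for illustration.

```python
def percent_return(v_t1, v_t2):
    """Fractional gain between an initial value v_t1 and a final value v_t2."""
    return (v_t2 - v_t1) / v_t1

def final_value(v_t1, r):
    """Value after one period at rate of return r."""
    return v_t1 * (1 + r)

r = percent_return(10_000, 11_000)
print(f"{r:.0%}")
print(round(final_value(10_000, r), 2))
```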
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Cumulative growth (or depreciation)
r: The investment's expected rate of return (growth rate)
t: The lifespan of the investment (time)
$v_{t_0}$: The initial value of the investment at time 0
$\text{Investment Value} = v_{t_0} \cdot (1 + r)^t$
If the growth rate r is negative, the investment's value will
depreciate (shrink) over time.
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Discount factors
$df = \frac{1}{(1 + r)^t}$
$v = fv \cdot df$
$df$: Discount factor
$r$: The rate of depreciation per period $t$
$t$: Time periods
$v$: Initial value of the investment
$fv$: Future value of the investment
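As a quick check of the relationship between growth and discounting, the sketch below grows a value forward and then multiplies by the discount factor to recover the starting amount. The function names and the 5% rate are illustrative assumptions.

```python
def discount_factor(r, t):
    """df = 1 / (1 + r) ** t"""
    return 1 / (1 + r) ** t

def future_value(v, r, t):
    """Grow an initial value v for t periods at rate r."""
    return v * (1 + r) ** t

fv = future_value(100, 0.05, 3)      # grow 100 for 3 periods at 5%
v = fv * discount_factor(0.05, 3)    # discounting undoes the growth
print(round(fv, 4), round(v, 4))
```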
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Compound interest
$\text{Investment Value} = v_{t_0} \cdot \left(1 + \frac{r}{c}\right)^{t \cdot c}$
$r$: The investment's annual expected rate of return (growth rate)
$t$: The lifespan of the investment
$v_{t_0}$: The initial value of the investment at time 0
$c$: The number of compounding periods per year
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
The power of compounding returns
Consider a $1,000 investment with a 10% annual return, compounded quarterly (every 3 months, 4 times per year):
$1,000 ∗ (1 + 0.10/4)^(1∗4) = $1,103.81
Compare this with compounding only once per year:
$1,000 ∗ (1 + 0.10/1)^(1∗1) = $1,100.00
Notice the extra $3.81 due to the quarterly compounding?
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Exponential growth
Compounded Quarterly Over 30 Years:
$1,000 ∗ (1 + 0.10/4)^(30∗4) = $19,358.15
Compounded Annually Over 30 Years:
$1,000 ∗ (1 + 0.10/1)^(30∗1) = $17,449.40
Compounding quarterly generates an extra $1,908.75 over 30 years
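The compounding figures can be reproduced with a direct translation of the compound interest formula. A minimal sketch, with compound_value as an illustrative function name:

```python
def compound_value(v_t0, r, t, c):
    """Value of v_t0 after t years at annual rate r, compounded c times per year."""
    return v_t0 * (1 + r / c) ** (t * c)

quarterly_1y = compound_value(1000, 0.10, 1, 4)
annual_1y = compound_value(1000, 0.10, 1, 1)
quarterly_30y = compound_value(1000, 0.10, 30, 4)
annual_30y = compound_value(1000, 0.10, 30, 1)

print(round(quarterly_1y, 2), round(annual_1y, 2))
print(round(quarterly_30y, 2), round(annual_30y, 2))
print(round(quarterly_30y - annual_30y, 2))
```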
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's practice!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Present and future value
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
The non-static value of money
Situation 1
Option A: $100 in your pocket today
Option B: $100 in your pocket tomorrow
Situation 2
Option A: $10,000 dollars in your pocket today
Option B: $10,500 dollars in your pocket one year from now
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Time is money
Your Options
A: Take the $10,000, stash it in the bank at 1% interest per
year, risk free
B: Invest the $10,000 in the stock market and earn an
average 8% per year
C: Wait 1 year, take the $10,500 instead
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Comparing future values
A: 10,000 * (1 + 0.01) = 10,100 future dollars
B: 10,000 * (1 + 0.08) = 10,800 future dollars
C: 10,500 future dollars
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Present value in Python
Calculate the present value of $100 received 3 years from now at a 1.0% inflation rate.
import numpy as np
np.pv(rate=0.01, nper=3, pmt=0, fv=100)
-97.05
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Future value in Python
Calculate the future value of $100 invested for 3 years at a
5.0% average annual rate of return.
import numpy as np
np.fv(rate=0.05, nper=3, pmt=0, pv=-100)
115.76
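In recent NumPy releases the financial helpers such as np.pv and np.fv are no longer part of NumPy itself and live in the separate numpy-financial package. The sketch below therefore uses plain arithmetic for the single-payment case (pmt=0). Unlike the NumPy helpers, which return the opposite sign of the cash flow passed in, these illustrative functions return positive magnitudes.

```python
def present_value(fv, rate, nper):
    """Value today of a single amount fv received nper periods from now."""
    return fv / (1 + rate) ** nper

def future_value(pv, rate, nper):
    """Value after nper periods of a single amount pv invested today."""
    return pv * (1 + rate) ** nper

pv = present_value(100, 0.01, 3)
fv = future_value(100, 0.05, 3)
print(round(pv, 2), round(fv, 2))
```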
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's practice!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Net present value and cash flows
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
Cash flows
Cash flows are a series of gains or losses from an investment over time.
Year Project 1 Cash Flows Project 2 Cash Flows
0 -$100 $100
1 $100 $100
2 $125 -$100
3 $150 $200
4 $175 $300
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Assume a 3% discount rate
Year  Cash Flows  Formula                                 Present Value
0     -$100       pv(rate=0.03, nper=0, pmt=0, fv=-100)   -100
1     $100        pv(rate=0.03, nper=1, pmt=0, fv=100)    97.09
2     $125        pv(rate=0.03, nper=2, pmt=0, fv=125)    117.82
3     $150        pv(rate=0.03, nper=3, pmt=0, fv=150)    137.27
4     $175        pv(rate=0.03, nper=4, pmt=0, fv=175)    155.49
Sum of all present values = 407.67
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Arrays in NumPy
Example:
import numpy as np
array_1 = np.array([100,200,300])
print(array_1*2)
[200 400 600]
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Net Present Value
Project 1
import numpy as np
np.npv(rate=0.03, values=np.array([-100, 100, 125, 150, 175]))
407.67
Project 2
import numpy as np
np.npv(rate=0.03, values=np.array([100, 100, -100, 200, 300]))
552.40
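The NPV figures follow directly from discounting each cash flow by its year and summing. A minimal plain-Python sketch, assuming the first cash flow occurs at year 0 as in the table above (the function name npv is illustrative and is not the NumPy helper):

```python
def npv(rate, values):
    """Discount each cash flow by its year index (first flow at t = 0) and sum."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(values))

project_1 = [-100, 100, 125, 150, 175]
project_2 = [100, 100, -100, 200, 300]

print(round(npv(0.03, project_1), 2))
print(round(npv(0.03, project_2), 2))
```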
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's practice!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
A tale of two project proposals
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
Common profitability analysis methods
Net Present Value (NPV)
Internal Rate of Return (IRR)
Equivalent Annual Annuity (EAA)
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Net Present Value (NPV)
NPV is equal to the sum of all discounted cash flows:
$NPV = \sum_{t=1}^{T} \frac{C_t}{(1+r)^t} - C_0$
$C_t$: Cash flow $C$ at time $t$
$r$: Discount rate
NPV is a simple cash flow valuation measure that does not allow for the comparison of projects of different sizes or lengths.
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Internal Rate of Return (IRR)
The internal rate of return must be computed by solving for IRR in the NPV equation when set equal to 0.
$NPV = \sum_{t=1}^{T} \frac{C_t}{(1+IRR)^t} - C_0 = 0$
$C_t$: Cash flow $C$ at time $t$
$IRR$: Internal Rate of Return
IRR can be used to compare projects of different sizes and lengths but requires an algorithmic solution and does not measure total value.
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
IRR in NumPy
You can use the NumPy function .irr(values) to compute the
internal rate of return of an array of values.
Example:
import numpy as np
project_1 = np.array([-100,150,200])
np.irr(project_1)
1.35
Project 1 has an IRR of 135%
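To illustrate what an algorithmic solution means here, the sketch below finds the IRR by bisection: it repeatedly halves an interval of candidate rates until the NPV at the midpoint is close to zero. This is one simple root-finding approach chosen for illustration, not necessarily the method a library uses, and the search bounds are assumptions that suit this example.

```python
def npv(rate, values):
    """Sum of cash flows discounted by their year index (first flow at t = 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(values))

def irr(values, low=0.0, high=10.0, tol=1e-9):
    """Bisection search for the rate where NPV crosses zero.

    Assumes NPV is positive at `low` and negative at `high`.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if npv(mid, values) > 0:
            low = mid      # rate too small, NPV still positive
        else:
            high = mid     # rate too large, NPV negative or zero
    return (low + high) / 2

project_1 = [-100, 150, 200]
rate = irr(project_1)
print(round(rate, 2))
```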
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's practice!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
The Weighted Average Cost of Capital (WACC)
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
What is WACC?
$WACC = F_{Equity} \cdot C_{Equity} + F_{Debt} \cdot C_{Debt} \cdot (1 - TR)$
$F_{Equity}$: The proportion (%) of a company's financing via equity
$F_{Debt}$: The proportion (%) of a company's financing via debt
$C_{Equity}$: The cost of a company's equity
$C_{Debt}$: The cost of a company's debt
$TR$: The corporate tax rate
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Proportion of financing
The proportion (%) of financing can be calculated as follows:
$F_{Equity} = \frac{M_{Equity}}{M_{Total}}$
$F_{Debt} = \frac{M_{Debt}}{M_{Total}}$
$M_{Total} = M_{Debt} + M_{Equity}$
$M_{Debt}$: Market value of a company's debt
$M_{Equity}$: Market value of a company's equity
$M_{Total}$: Total value of a company's financing
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Example:
Calculate the WACC of a company with a 12% cost of debt, 14% cost of equity, 20% debt financing and 80% equity financing. Assume a 35% effective corporate tax rate.
percent_equity = 0.80
percent_debt = 0.20
cost_equity = 0.14
cost_debt = 0.12
tax_rate = 0.35
wacc = (percent_equity*cost_equity) + (percent_debt*cost_debt) * (1 - tax_rate)
print(wacc)
0.1276
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Discounting using WACC
Example:
Calculate the NPV of a project that produces $100 in cash flow every year for 5 years. Assume a WACC of 13%.
cf_project1 = np.repeat(100, 5)
npv_project1 = np.npv(0.13, cf_project1)
print(npv_project1)
397.45
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's practice!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Comparing two projects of different life spans
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
Different NPVs and IRRs
Year Project 1 Project 2
1 -$100 -$125
2 $200 $100
3 $300 $100
4 N/A $100
5 N/A $100
6 N/A $100
7 N/A $100
8 N/A $100
Project comparison
Project NPV IRR Length
#1 362.58 200% 3
#2 453.64 78.62% 8
Notice how you could undertake multiple Project 1's over 8 years? Are the NPVs fair to compare?
Assume a 5% discount rate for both projects
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Equivalent Annual Annuity (EAA) can be used to compare two
projects of different lifespans in present value terms.
Apply the EAA method to the previous two projects using the
computed NPVs * -1:
import numpy as np
npv_project1 = 362.58
npv_project2 = 453.64
np.pmt(rate=0.05, nper=3, pv=-1*npv_project1, fv=0)
133.14
np.pmt(rate=0.05, nper=8, pv=-1*npv_project2, fv=0)
70.18
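The EAA is the level annual payment whose present value equals the project's NPV. A minimal sketch of that annuity formula in plain Python follows; the helper name `equivalent_annual_annuity` is hypothetical, and payments are assumed to fall at the end of each year.

```python
def equivalent_annual_annuity(npv, rate, n_years):
    """Level end-of-year payment over n_years with present value equal to npv."""
    return npv * rate / (1 - (1 + rate) ** -n_years)

eaa_project1 = equivalent_annual_annuity(362.58, 0.05, 3)
eaa_project2 = equivalent_annual_annuity(453.64, 0.05, 8)
print(round(eaa_project1, 2), round(eaa_project2, 2))
```

Project 1 has the lower NPV but the higher annual equivalent, so it compares better once the different lifespans are taken into account.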
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's practice!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Mortgage basics
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
Taking out a mortgage
A mortgage is a loan that covers the remaining cost of a home
after paying a percentage of the home value as a down
payment.
A typical down payment in the US is at least 20% of the
home value
A typical US mortgage loan is paid off over 30 years
Example:
$500,000 house
20% down ($100,000)
$400,000 remaining as a 30 year mortgage loan
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Converting from an annual rate
To convert from an annual rate to a periodic rate:
$R_{Periodic} = (1 + R_{Annual})^{\frac{1}{N}} - 1$
R: Rate of Return (or Interest Rate)
N: Number of Payment Periods Per Year
Example:
Convert a 12% annual interest rate to the equivalent monthly rate.
$(1 + 0.12)^{\frac{1}{12}} - 1 = 0.949\%$ monthly
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Mortgage loan payments
You can use the NumPy function .pmt(rate, nper, pv) to
compute the periodic mortgage loan payment.
Example:
Calculate the monthly mortgage payment of a $400,000 30
year loan at 3.8% interest:
import numpy as np
monthly_rate = ((1+0.038)**(1/12) - 1)
np.pmt(rate=monthly_rate, nper=12*30, pv=400000)
-1849.15
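The payment calculation can also be written out by hand. The sketch below is an illustration in plain Python with hypothetical helper names (`periodic_rate`, `loan_payment`). It returns the payment as a positive number, whereas the NumPy-style result above is negative because it represents a cash outflow.

```python
def periodic_rate(annual_rate, periods_per_year):
    """Convert an annual rate to the equivalent compounded periodic rate."""
    return (1 + annual_rate) ** (1 / periods_per_year) - 1

def loan_payment(rate, nper, principal):
    """Level end-of-period payment that fully repays principal over nper periods."""
    return principal * rate / (1 - (1 + rate) ** -nper)

monthly_rate = periodic_rate(0.038, 12)
monthly_payment = loan_payment(monthly_rate, 12 * 30, 400_000)
print(round(monthly_payment, 2))
```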
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's practice!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Amortization,
interest and
principal
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
Amortization
Principal (Equity): The amount of your mortgage paid that counts towards the value of the house itself
Interest Payment ($IP_{Periodic}$) $= RMB \cdot R_{Periodic}$
Principal Payment ($PP_{Periodic}$) $= MP_{Periodic} - IP_{Periodic}$
PP: Principal Payment
MP: Mortgage Payment
IP: Interest Payment
R: Mortgage Interest Rate (Periodic)
RMB: Remaining Mortgage Balance
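Applying the interest and principal formulas period by period gives an amortization schedule. The following is a minimal sketch using the $400,000, 30 year, 3.8% loan from the earlier mortgage example; the variable names and the tuple layout of `schedule` are choices made for this illustration.

```python
monthly_rate = (1 + 0.038) ** (1 / 12) - 1
n_payments = 12 * 30
balance = 400_000.0  # remaining mortgage balance (RMB)
payment = balance * monthly_rate / (1 - (1 + monthly_rate) ** -n_payments)

schedule = []
for period in range(1, n_payments + 1):
    interest = balance * monthly_rate   # IP = RMB * R
    principal = payment - interest      # PP = MP - IP
    balance -= principal                # reduce the remaining balance
    schedule.append((period, interest, principal, balance))

first_period, first_interest, first_principal, _ = schedule[0]
print(round(first_interest, 2), round(first_principal, 2))
print(round(schedule[-1][3], 2))  # remaining balance after the last payment
```

Early payments are mostly interest; the principal share grows as the remaining balance falls.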
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Accumulating values via for loops in Python
Example:
accumulator = 0
for i in range(3):
    if i == 0:
        accumulator = accumulator + 3
    else:
        accumulator = accumulator + 1
    print(str(i)+": Loop value: "+str(accumulator))
0: Loop value: 3
1: Loop value: 4
2: Loop value: 5
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's practice!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Home ownership,
equity and
forecasting
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
Ownership
To calculate the percentage of the home you actually own
(home equity):
$\text{Percent Equity Owned}_t = P_{Down} + \frac{E_{Cumulative,t}}{V_{Home}}$
$E_{Cumulative,t} = \sum_{t=1}^{T} P_{Principal,t}$
$E_{Cumulative,t}$ : Cumulative home equity at time t
$P_{Principal,t}$ : Principal payment at time t
$V_{Home}$ : Total home value
$P_{Down}$ : Initial down payment (as a fraction of the home value)
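A minimal sketch of the ownership formula in plain Python follows. The three principal payments are made-up illustrative numbers, and the down payment is assumed to be expressed as a fraction of the home value so that both terms are percentages.

```python
from itertools import accumulate

home_value = 500_000.0
down_payment_fraction = 0.20                 # P_Down as a fraction of V_Home
principal_payments = [604.0, 606.0, 608.0]   # illustrative values only

# E_Cumulative,t: running sum of principal payments
cumulative_equity = list(accumulate(principal_payments))

# Percent Equity Owned_t = P_Down + E_Cumulative,t / V_Home
percent_owned = [down_payment_fraction + e / home_value
                 for e in cumulative_equity]
print(percent_owned)
```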
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Underwater mortgage
An underwater mortgage is when the remaining amount you
owe on your mortgage is actually higher than the value of the
house itself.
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Cumulative operations in NumPy
Cumulative Sum
import numpy as np
np.cumsum(np.array([1, 2, 3]))
array([1, 3, 6])
Cumulative Product
import numpy as np
np.cumprod(np.array([1, 2, 3]))
array([1, 2, 6])
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Forecasting cumulative growth
Example:
What is the cumulative growth factor at each point in time of an
investment that grows by 3% in period 1, then 3% again in
period 2, and then by 5% in period 3? Multiplying each factor by
an initial $100 gives the value of a $100 investment at that point.
import numpy as np
np.cumprod(1 + np.array([0.03, 0.03, 0.05]))
array([ 1.03, 1.0609, 1.113945])
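The same cumulative product can be sketched without NumPy and scaled by the initial investment. This uses only the standard library; the variable names are illustrative.

```python
from itertools import accumulate
import operator

growth_rates = [0.03, 0.03, 0.05]

# Running product of (1 + rate), the plain-Python analogue of a cumulative product
growth_factors = list(accumulate((1 + g for g in growth_rates), operator.mul))

# Value of an initial $100 at each point in time
values = [100 * f for f in growth_factors]
print([round(v, 4) for v in values])
```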
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's practice!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Budgeting project
proposal
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
Project proposal
Your budget will have to take into account the following:
Rent
Food expenses
Entertainment expenses
Emergency fund
You will have to adjust for the following:
Taxes
Salary growth
Inflation (for all expenses)
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Constant cumulative growth forecast
What is the cumulative growth of an investment that grows by
3% per year for 3 years?
import numpy as np
np.cumprod(1 + np.repeat(0.03, 3)) - 1
array([ 0.03, 0.0609, 0.0927])
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Forecasting values from growth rates
Compute the value at each point in time of an initial $100
investment that grows by 3% per year for 3 years.
import numpy as np
100*np.cumprod(1 + np.repeat(0.03, 3))
array([ 103, 106.09, 109.27])
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's build it!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Net worth and
valuation in your
personal financial
life
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
Net Worth
Net Worth = Assets - Liabilities = Equity
This is the basis of modern accounting
A point in time measurement
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Valuation
NPV(discount rate, cash flows)
Take into account future cash flows, salary and expenses
Adjust for inflation
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Reaching financial goals
Saving will only earn you a low rate of return
Inflation will destroy most of your savings over time if you let
it
The best way to combat inflation is to invest
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
The basics of investing
Investing is a risk-reward tradeoff
Diversify
Plan for the worst
Invest as early as possible
Invest continuously over time
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's simulate it!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
The power of time
and compound
interest
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
The power of time
Goal: Save $1.0 million over 40 years. Assume an average 7%
rate of return per year.
import numpy as np
np.pmt(rate=((1+0.07)**(1/12) - 1), nper=12*40, pv=0, fv=1000000)
-404.61
What if your investments only returned 5% on average?
import numpy as np
np.pmt(rate=((1+0.05)**(1/12) - 1), nper=12*40, pv=0, fv=1000000)
-674.53
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
The power of time
Goal: Save $1.0 million over 25 years. Assume an average 7%
rate of return per year.
import numpy as np
np.pmt(rate=((1+0.07)**(1/12) - 1), nper=12*25, pv=0, fv=1000000)
-1277.07
What if your investments only returned 5% on average?
import numpy as np
np.pmt(rate=((1+0.05)**(1/12) - 1), nper=12*25, pv=0, fv=1000000)
-1707.26
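The savings targets can be checked with a short plain-Python sketch. It assumes deposits are made at the end of each month, and the helper name `required_monthly_saving` is hypothetical. Results are positive amounts, whereas the outputs above are negative because they represent cash outflows.

```python
def required_monthly_saving(goal, annual_return, years):
    """End-of-month deposit needed to reach goal after the given number of years."""
    # The exponent must be parenthesized: x ** (1 / 12), not x ** 1 / 12
    monthly_rate = (1 + annual_return) ** (1 / 12) - 1
    nper = 12 * years
    return goal * monthly_rate / ((1 + monthly_rate) ** nper - 1)

for years in (40, 25):
    for annual_return in (0.07, 0.05):
        saving = required_monthly_saving(1_000_000, annual_return, years)
        print(years, annual_return, round(saving, 2))
```

Starting 15 years later roughly triples the required monthly deposit at a 7% return.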
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Inflation adjusting
Assume an average rate of inflation of 3% per year
import numpy as np
np.fv(rate=-0.03, nper=25, pv=-1000000, pmt=0)
466974.70
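The inflation adjustment above compounds a rate of -3% per year. A minimal sketch of that calculation follows, next to the alternative of dividing by (1 + inflation) each year. The second variant is an addition for comparison, not something the slide computes, and the variable names are illustrative.

```python
future_nominal = 1_000_000.0
inflation = 0.03
years = 25

# Approach used above: compound a negative growth rate
approx_real = future_nominal * (1 - inflation) ** years

# Alternative (assumption, for comparison): discount by (1 + inflation) per year
deflated_real = future_nominal / (1 + inflation) ** years

print(round(approx_real, 2), round(deflated_real, 2))
```

The two approaches give similar but not identical results, so it helps to state which one a forecast uses.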
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Let's practice!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Financial concepts
in your daily life
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Dakota Wixom
Quantitative Finance Analyst
Congratulations
The Time Value of Money
Compound Interest
Discounting and Projecting Cash Flows
Making Rational Economic Decisions
Mortgage Structures
Interest and Equity
The Cost of Capital
Wealth Accumulation
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
Congratulations!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
How to use dates &
times with pandas
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Stefan Jansen
Founder & Lead Data Scientist at
Applied Artificial Intelligence
Date & time series functionality
At the root: data types for date & time information
Objects for points in time and periods
Attributes & methods reflect time-related details
Sequences of dates & periods:
Series or DataFrame columns
Index: convert object into Time Series
Many Series/DataFrame methods rely on time information in
the index to provide time-series functionality
MANIPULATING TIME SERIES DATA IN PYTHON
Basic building block: pd.Timestamp
import pandas as pd # assumed imported going forward
from datetime import datetime # To manually create dates
time_stamp = pd.Timestamp(datetime(2017, 1, 1))
pd.Timestamp('2017-01-01') == time_stamp
True # Understands dates as strings
time_stamp # type: pandas.tslib.Timestamp
Timestamp('2017-01-01 00:00:00')
MANIPULATING TIME SERIES DATA IN PYTHON
Basic building block: pd.Timestamp
Timestamp object has many attributes to store time-specific
information
time_stamp.year
2017
time_stamp.day_name()
'Sunday'
MANIPULATING TIME SERIES DATA IN PYTHON
More building blocks: pd.Period & freq
period = pd.Period('2017-01')
period # default: month-end
Period('2017-01', 'M')
Period object has freq attribute to store frequency info
period.asfreq('D') # convert to daily
Period('2017-01-31', 'D')
Convert pd.Period() to pd.Timestamp() and back
period.to_timestamp().to_period('M')
Period('2017-01', 'M')
MANIPULATING TIME SERIES DATA IN PYTHON
More building blocks: pd.Period & freq
Frequency info enables basic date arithmetic
period + 2
Period('2017-03', 'M')
pd.Timestamp('2017-01-31', 'M') + 1
Timestamp('2017-02-28 00:00:00', freq='M')
MANIPULATING TIME SERIES DATA IN PYTHON
Sequences of dates & times
pd.date_range : start , end , periods , freq
index = pd.date_range(start='2017-1-1', periods=12, freq='M')
index
DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', ...,
'2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31'],
dtype='datetime64[ns]', freq='M')
pd.DatetimeIndex : sequence of Timestamp objects with
frequency info
MANIPULATING TIME SERIES DATA IN PYTHON
Sequences of dates & times
index[0]
Timestamp('2017-01-31 00:00:00', freq='M')
index.to_period()
PeriodIndex(['2017-01', '2017-02', '2017-03', '2017-04', ...,
'2017-11', '2017-12'], dtype='period[M]', freq='M')
MANIPULATING TIME SERIES DATA IN PYTHON
Create a time series: pd.DatetimeIndex
pd.DataFrame({'data': index}).info()
RangeIndex: 12 entries, 0 to 11
Data columns (total 1 columns):
data 12 non-null datetime64[ns]
dtypes: datetime64[ns](1)
MANIPULATING TIME SERIES DATA IN PYTHON
Create a time series: pd.DatetimeIndex
np.random.random :
Random numbers: [0, 1)
12 rows, 2 columns
data = np.random.random(size=(12, 2)) # import numpy as np
pd.DataFrame(data=data, index=index).info()
DatetimeIndex: 12 entries, 2017-01-31 to 2017-12-31
Freq: M
Data columns (total 2 columns):
0 12 non-null float64
1 12 non-null float64
dtypes: float64(2)
MANIPULATING TIME SERIES DATA IN PYTHON
Frequency aliases & time info
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Indexing &
resampling time
series
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Stefan Jansen
Founder & Lead Data Scientist at
Applied Artificial Intelligence
Time series transformation
Basic time series transformations include:
Parsing string dates and converting to datetime64
Selecting & slicing for specific subperiods
Setting & changing DatetimeIndex frequency
Upsampling vs Downsampling
MANIPULATING TIME SERIES DATA IN PYTHON
Getting GOOG stock prices
google = pd.read_csv('google.csv') # import pandas as pd
google.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 504 entries, 0 to 503
Data columns (total 2 columns):
date 504 non-null object
price 504 non-null float64
dtypes: float64(1), object(1)
google.head()
date price
0 2015-01-02 524.81
1 2015-01-05 513.87
2 2015-01-06 501.96
3 2015-01-07 501.10
4 2015-01-08 502.68
MANIPULATING TIME SERIES DATA IN PYTHON
Converting string dates to datetime64
pd.to_datetime() :
Parse date string
Convert to datetime64
google.date = pd.to_datetime(google.date)
google.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 504 entries, 0 to 503
Data columns (total 2 columns):
date 504 non-null datetime64[ns]
price 504 non-null float64
dtypes: datetime64[ns](1), float64(1)
MANIPULATING TIME SERIES DATA IN PYTHON
Converting string dates to datetime64
.set_index() :
Date into index
inplace :
don't create copy
google.set_index('date', inplace=True)
google.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
Data columns (total 1 columns):
price 504 non-null float64
dtypes: float64(1)
MANIPULATING TIME SERIES DATA IN PYTHON
Plotting the Google stock time series
google.price.plot(title='Google Stock Price')
plt.tight_layout(); plt.show()
MANIPULATING TIME SERIES DATA IN PYTHON
Partial string indexing
Selecting/indexing using strings that parse to dates
google['2015'].info() # Pass string for part of date
DatetimeIndex: 252 entries, 2015-01-02 to 2015-12-31
Data columns (total 1 columns):
price 252 non-null float64
dtypes: float64(1)
google['2015-3': '2016-2'].info() # Slice includes last month
DatetimeIndex: 252 entries, 2015-03-02 to 2016-02-29
Data columns (total 1 columns):
price 252 non-null float64
dtypes: float64(1)
memory usage: 3.9 KB
MANIPULATING TIME SERIES DATA IN PYTHON
Partial string indexing
google.loc['2016-6-1', 'price'] # Use full date with .loc[]
734.15
MANIPULATING TIME SERIES DATA IN PYTHON
.asfreq(): set frequency
.asfreq('D') :
Convert DatetimeIndex to calendar day frequency
google.asfreq('D').info() # set calendar day frequency
DatetimeIndex: 729 entries, 2015-01-02 to 2016-12-30
Freq: D
Data columns (total 1 columns):
price 504 non-null float64
dtypes: float64(1)
MANIPULATING TIME SERIES DATA IN PYTHON
.asfreq(): set frequency
Upsampling:
Higher frequency implies new dates => missing data
google.asfreq('D').head()
price
date
2015-01-02 524.81
2015-01-03 NaN
2015-01-04 NaN
2015-01-05 513.87
2015-01-06 501.96
MANIPULATING TIME SERIES DATA IN PYTHON
.asfreq(): reset frequency
.asfreq('B') :
Convert DatetimeIndex to business day frequency
google = google.asfreq('B') # Change to business day frequency
google.info()
DatetimeIndex: 521 entries, 2015-01-02 to 2016-12-30
Freq: B
Data columns (total 1 columns):
price 504 non-null float64
dtypes: float64(1)
MANIPULATING TIME SERIES DATA IN PYTHON
.asfreq(): reset frequency
google[google.price.isnull()] # Select missing 'price' values
price
date
2015-01-19 NaN
2015-02-16 NaN
...
2016-11-24 NaN
2016-12-26 NaN
Business days that were not trading days
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Lags, changes, and
returns for stock
price series
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Stefan Jansen
Founder & Lead Data Scientist at
Applied Artificial Intelligence
Basic time series calculations
Typical Time Series manipulations include:
Shift or lag values back or forward in time
Get the difference in value for a given time period
Compute the percent change over any number of periods
pandas built-in methods rely on pd.DatetimeIndex
MANIPULATING TIME SERIES DATA IN PYTHON
Getting GOOG stock prices
Let pd.read_csv() do the parsing for you!
google = pd.read_csv('google.csv', parse_dates=['date'], index_col='date')
google.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
Data columns (total 1 columns):
price 504 non-null float64
dtypes: float64(1)
MANIPULATING TIME SERIES DATA IN PYTHON
Getting GOOG stock prices
google.head()
price
date
2015-01-02 524.81
2015-01-05 513.87
2015-01-06 501.96
2015-01-07 501.10
2015-01-08 502.68
MANIPULATING TIME SERIES DATA IN PYTHON
.shift(): Moving data between past & future
.shift() :
defaults to periods=1
1 period into future
google['shifted'] = google.price.shift() # default: periods=1
google.head(3)
price shifted
date
2015-01-02 524.81 NaN
2015-01-05 513.87 524.81
2015-01-06 501.96 513.87
MANIPULATING TIME SERIES DATA IN PYTHON
.shift(): Moving data between past & future
.shift(periods=-1) :
lagged data
1 period back in time
google['lagged'] = google.price.shift(periods=-1)
google[['price', 'lagged', 'shifted']].tail(3)
price lagged shifted
date
2016-12-28 785.05 782.79 791.55
2016-12-29 782.79 771.82 785.05
2016-12-30 771.82 NaN 782.79
MANIPULATING TIME SERIES DATA IN PYTHON
Calculate one-period percent change
$x_t / x_{t-1}$
google['change'] = google.price.div(google.shifted)
google[['price', 'shifted', 'change']].head(3)
price shifted change
Date
2017-01-03 786.14 NaN NaN
2017-01-04 786.90 786.14 1.000967
2017-01-05 794.02 786.90 1.009048
MANIPULATING TIME SERIES DATA IN PYTHON
Calculate one-period percent change
google['return'] = google.change.sub(1).mul(100)
google[['price', 'shifted', 'change', 'return']].head(3)
price shifted change return
date
2015-01-02 524.81 NaN NaN NaN
2015-01-05 513.87 524.81 0.98 -2.08
2015-01-06 501.96 513.87 0.98 -2.32
MANIPULATING TIME SERIES DATA IN PYTHON
.diff(): built-in time-series change
Difference in value for two adjacent periods
$x_t - x_{t-1}$
google['diff'] = google.price.diff()
google[['price', 'diff']].head(3)
price diff
date
2015-01-02 524.81 NaN
2015-01-05 513.87 -10.94
2015-01-06 501.96 -11.91
MANIPULATING TIME SERIES DATA IN PYTHON
.pct_change(): built-in time-series % change
Percent change for two adjacent periods
$\frac{x_t}{x_{t-1}} - 1$
google['pct_change'] = google.price.pct_change().mul(100)
google[['price', 'return', 'pct_change']].head(3)
price return pct_change
date
2015-01-02 524.81 NaN NaN
2015-01-05 513.87 -2.08 -2.08
2015-01-06 501.96 -2.32 -2.32
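The equivalence between the manual calculation and the built-in method can be checked on the first three prices shown above. This is a minimal sketch that builds a small Series by hand instead of reading google.csv.

```python
import pandas as pd

price = pd.Series(
    [524.81, 513.87, 501.96],
    index=pd.to_datetime(['2015-01-02', '2015-01-05', '2015-01-06']),
    name='price',
)

# Manual: divide by the previous value, subtract 1, express in percent
manual = price.div(price.shift()).sub(1).mul(100)

# Built-in: .pct_change() performs the same calculation
builtin = price.pct_change().mul(100)

print(pd.concat([manual, builtin], axis=1, keys=['manual', 'builtin']).round(2))
```

The first row is NaN in both columns because there is no prior value to compare against.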
MANIPULATING TIME SERIES DATA IN PYTHON
Looking ahead: Get multi-period returns
google['return_3d'] = google.price.pct_change(periods=3).mul(100)
google[['price', 'return_3d']].head()
price return_3d
date
2015-01-02 524.81 NaN
2015-01-05 513.87 NaN
2015-01-06 501.96 NaN
2015-01-07 501.10 -4.517825
2015-01-08 502.68 -2.177594
Percent change for two periods, 3 trading days apart
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Compare time series
growth rates
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Stefan Jansen
Founder & Lead Data Scientist at
Applied Artificial Intelligence
Comparing stock performance
Stock price series: hard to compare at different levels
Simple solution: normalize price series to start at 100
Divide all prices by first in series, multiply by 100
Same starting point
All prices relative to starting point
Difference to starting point in percentage points
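The normalization steps listed above can be sketched on a tiny hand-built DataFrame. The two rows use the AAPL and GOOG prices that appear later in this deck; the DataFrame is constructed inline here rather than read from a file.

```python
import pandas as pd

prices = pd.DataFrame(
    {'AAPL': [30.57, 30.63], 'GOOG': [313.06, 311.68]},
    index=pd.to_datetime(['2010-01-04', '2010-01-05']),
)

# Divide every row by the first row, then scale so each series starts at 100
normalized = prices.div(prices.iloc[0]).mul(100)
print(normalized.round(2))
```

After normalization, a value of 99.56 reads directly as "0.44 percentage points below the starting level".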
MANIPULATING TIME SERIES DATA IN PYTHON
Normalizing a single series (1)
google = pd.read_csv('google.csv', parse_dates=['date'], index_col='date')
google.head(3)
price
date
2010-01-04 313.06
2010-01-05 311.68
2010-01-06 303.83
first_price = google.price.iloc[0] # int-based selection
first_price
313.06
first_price == google.loc['2010-01-04', 'price']
True
MANIPULATING TIME SERIES DATA IN PYTHON
Normalizing a single series (2)
normalized = google.price.div(first_price).mul(100)
normalized.plot(title='Google Normalized Series')
MANIPULATING TIME SERIES DATA IN PYTHON
Normalizing multiple series (1)
prices = pd.read_csv('stock_prices.csv',
parse_dates=['date'],
index_col='date')
prices.info()
DatetimeIndex: 1761 entries, 2010-01-04 to 2016-12-30
Data columns (total 3 columns):
AAPL 1761 non-null float64
GOOG 1761 non-null float64
YHOO 1761 non-null float64
dtypes: float64(3)
prices.head(2)
AAPL GOOG YHOO
Date
2010-01-04 30.57 313.06 17.10
2010-01-05 30.63 311.68 17.23
MANIPULATING TIME SERIES DATA IN PYTHON
Normalizing multiple series (2)
prices.iloc[0]
AAPL 30.57
GOOG 313.06
YHOO 17.10
Name: 2010-01-04 00:00:00, dtype: float64
normalized = prices.div(prices.iloc[0])
normalized.head(3)
AAPL GOOG YHOO
Date
2010-01-04 1.000000 1.000000 1.000000
2010-01-05 1.001963 0.995592 1.007602
2010-01-06 0.985934 0.970517 1.004094
.div() : automatic alignment of Series index & DataFrame
columns
MANIPULATING TIME SERIES DATA IN PYTHON
Comparing with a benchmark (1)
index = pd.read_csv('benchmark.csv', parse_dates=['date'], index_col='date')
index.info()
DatetimeIndex: 1826 entries, 2010-01-01 to 2016-12-30
Data columns (total 1 columns):
SP500 1762 non-null float64
dtypes: float64(1)
prices = pd.concat([prices, index], axis=1).dropna()
prices.info()
DatetimeIndex: 1761 entries, 2010-01-04 to 2016-12-30
Data columns (total 4 columns):
AAPL 1761 non-null float64
GOOG 1761 non-null float64
YHOO 1761 non-null float64
SP500 1761 non-null float64
dtypes: float64(4)
MANIPULATING TIME SERIES DATA IN PYTHON
Comparing with a benchmark (2)
prices.head(1)
AAPL GOOG YHOO SP500
2010-01-04 30.57 313.06 17.10 1132.99
normalized = prices.div(prices.iloc[0]).mul(100)
normalized.plot()
MANIPULATING TIME SERIES DATA IN PYTHON
Plotting performance difference
tickers = ['GOOG', 'YHOO', 'AAPL']
diff = normalized[tickers].sub(normalized['SP500'], axis=0)
GOOG YHOO AAPL
2010-01-04 0.000000 0.000000 0.000000
2010-01-05 -0.752375 0.448669 -0.115294
2010-01-06 -3.314604 0.043069 -1.772895
.sub(..., axis=0) : Subtract a Series from each DataFrame
column by aligning indexes
MANIPULATING TIME SERIES DATA IN PYTHON
Plotting performance difference
diff.plot()
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Changing the time
series frequency:
resampling
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Stefan Jansen
Founder & Lead Data Scientist at
Applied Artificial Intelligence
Changing the frequency: resampling
DatetimeIndex : set & change freq using .asfreq()
But frequency conversion affects the data
Upsampling: fill or interpolate missing data
Downsampling: aggregate existing data
pandas API:
.asfreq() , .reindex()
.resample() + transformation method
MANIPULATING TIME SERIES DATA IN PYTHON
Getting started: quarterly data
dates = pd.date_range(start='2016', periods=4, freq='Q')
data = range(1, 5)
quarterly = pd.Series(data=data, index=dates)
quarterly
2016-03-31 1
2016-06-30 2
2016-09-30 3
2016-12-31 4
Freq: Q-DEC, dtype: int64 # Default: year-end quarters
MANIPULATING TIME SERIES DATA IN PYTHON
Upsampling: quarter => month
monthly = quarterly.asfreq('M') # to month-end frequency
2016-03-31 1.0
2016-04-30 NaN
2016-05-31 NaN
2016-06-30 2.0
2016-07-31 NaN
2016-08-31 NaN
2016-09-30 3.0
2016-10-31 NaN
2016-11-30 NaN
2016-12-31 4.0
Freq: M, dtype: float64
Upsampling creates missing values
monthly = monthly.to_frame('baseline') # to DataFrame
MANIPULATING TIME SERIES DATA IN PYTHON
Upsampling: fill methods
monthly['ffill'] = quarterly.asfreq('M', method='ffill')
monthly['bfill'] = quarterly.asfreq('M', method='bfill')
monthly['value'] = quarterly.asfreq('M', fill_value=0)
MANIPULATING TIME SERIES DATA IN PYTHON
Upsampling: fill methods
bfill : backfill
ffill : forward fill
baseline ffill bfill value
2016-03-31 1.0 1 1 1
2016-04-30 NaN 1 2 0
2016-05-31 NaN 1 2 0
2016-06-30 2.0 2 2 2
2016-07-31 NaN 2 3 0
2016-08-31 NaN 2 3 0
2016-09-30 3.0 3 3 3
2016-10-31 NaN 3 4 0
2016-11-30 NaN 3 4 0
2016-12-31 4.0 4 4 4
MANIPULATING TIME SERIES DATA IN PYTHON
Add missing months: .reindex()
dates = pd.date_range(start='2016',
                      periods=12,
                      freq='M')
DatetimeIndex(['2016-01-31',
               '2016-02-29',
               ...,
               '2016-11-30',
               '2016-12-31'],
              dtype='datetime64[ns]', freq='M')
quarterly.reindex(dates)
2016-01-31 NaN
2016-02-29 NaN
2016-03-31 1.0
2016-04-30 NaN
2016-05-31 NaN
2016-06-30 2.0
2016-07-31 NaN
2016-08-31 NaN
2016-09-30 3.0
2016-10-31 NaN
2016-11-30 NaN
2016-12-31 4.0
.reindex() :
conform DataFrame to new index
same filling logic as .asfreq()
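A minimal, self-contained sketch of the upsampling fill options follows. It deliberately uses quarter-start and month-start frequencies ('QS', 'MS') instead of the month-end aliases on the slides, an assumption made so the example does not depend on how a given pandas version spells the month-end alias.

```python
import pandas as pd

dates = pd.date_range(start='2016-01-01', periods=4, freq='QS')
quarterly = pd.Series(range(1, 5), index=dates)

# Upsample to monthly: new dates have no data unless a fill rule is given
monthly = quarterly.asfreq('MS').to_frame('baseline')
monthly['ffill'] = quarterly.asfreq('MS', method='ffill')   # carry last value forward
monthly['bfill'] = quarterly.asfreq('MS', method='bfill')   # pull next value backward
monthly['value'] = quarterly.asfreq('MS', fill_value=0)     # constant for new dates
print(monthly)
```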
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Upsampling &
interpolation with
.resample()
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Stefan Jansen
Founder & Lead Data Scientist at
Applied Artificial Intelligence
Frequency conversion & transformation methods
.resample() : similar to .groupby()
Groups data within resampling period and applies one or
several methods to each group
New date determined by offset - start, end, etc
Upsampling: fill from existing or interpolate values
Downsampling: apply aggregation to existing data
MANIPULATING TIME SERIES DATA IN PYTHON
Getting started: monthly unemployment rate
unrate = pd.read_csv('unrate.csv', parse_dates=['Date'], index_col='Date')
unrate.info()
DatetimeIndex: 208 entries, 2000-01-01 to 2017-04-01
Data columns (total 1 columns):
UNRATE 208 non-null float64 # no frequency information
dtypes: float64(1)
unrate.head()
UNRATE
DATE
2000-01-01 4.0
2000-02-01 4.1
2000-03-01 4.0
2000-04-01 3.8
2000-05-01 4.0
Reporting date: 1st day of month
MANIPULATING TIME SERIES DATA IN PYTHON
Resampling Period & Frequency Offsets
Resample creates new date for frequency offset
Several alternatives to calendar month end
Frequency Alias Sample Date
Calendar Month End M 2017-04-30
Calendar Month Start MS 2017-04-01
Business Month End BM 2017-04-28
Business Month Start BMS 2017-04-03
MANIPULATING TIME SERIES DATA IN PYTHON
Resampling logic
MANIPULATING TIME SERIES DATA IN PYTHON
Assign frequency with .resample()
unrate.asfreq('MS').info()
DatetimeIndex: 208 entries, 2000-01-01 to 2017-04-01
Freq: MS
Data columns (total 1 columns):
UNRATE 208 non-null float64
dtypes: float64(1)
unrate.resample('MS') # creates Resampler object
DatetimeIndexResampler [freq=<MonthBegin>, axis=0, closed=left,
label=left, convention=start, base=0]
MANIPULATING TIME SERIES DATA IN PYTHON
Assign frequency with .resample()
unrate.asfreq('MS').equals(unrate.resample('MS').asfreq())
True
.resample() : returns data only when calling another method
MANIPULATING TIME SERIES DATA IN PYTHON
Quarterly real GDP growth
gdp = pd.read_csv('gdp.csv')
gdp.info()
DatetimeIndex: 69 entries, 2000-01-01 to 2017-01-01
Data columns (total 1 columns):
gpd 69 non-null float64 # no frequency info
dtypes: float64(1)
gdp.head(2)
gpd
DATE
2000-01-01 1.2
2000-04-01 7.8
MANIPULATING TIME SERIES DATA IN PYTHON
Interpolate monthly real GDP growth
gdp_1 = gdp.resample('MS').ffill().add_suffix('_ffill')
gpd_ffill
DATE
2000-01-01 1.2
2000-02-01 1.2
2000-03-01 1.2
2000-04-01 7.8
MANIPULATING TIME SERIES DATA IN PYTHON
Interpolate monthly real GDP growth
gdp_2 = gdp.resample('MS').interpolate().add_suffix('_inter')
gpd_inter
DATE
2000-01-01 1.200000
2000-02-01 3.400000
2000-03-01 5.600000
2000-04-01 7.800000
.interpolate() : finds points on straight line between
existing data
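Forward fill and linear interpolation can be compared on the two quarterly GDP values shown above. This sketch builds the two-row Series by hand instead of reading gdp.csv, and the Series name `gdp` is used only for illustration.

```python
import pandas as pd

gdp = pd.Series(
    [1.2, 7.8],
    index=pd.to_datetime(['2000-01-01', '2000-04-01']),
    name='gdp',
)

# Forward fill repeats the last known quarterly value
filled = gdp.resample('MS').ffill()

# Linear interpolation places the new months on a straight line between quarters
interpolated = gdp.resample('MS').interpolate()

print(pd.concat([filled, interpolated], axis=1, keys=['ffill', 'inter']))
```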
MANIPULATING TIME SERIES DATA IN PYTHON
Concatenating two DataFrames
df1 = pd.DataFrame([1, 2, 3], columns=['df1'])
df2 = pd.DataFrame([4, 5, 6], columns=['df2'])
pd.concat([df1, df2])
df1 df2
0 1.0 NaN
1 2.0 NaN
2 3.0 NaN
0 NaN 4.0
1 NaN 5.0
2 NaN 6.0
MANIPULATING TIME SERIES DATA IN PYTHON
Concatenating two DataFrames
pd.concat([df1, df2], axis=1)
df1 df2
0 1 4
1 2 5
2 3 6
axis=1 : concatenate horizontally
MANIPULATING TIME SERIES DATA IN PYTHON
Plot interpolated real GDP growth
pd.concat([gdp_1, gdp_2], axis=1).loc['2015':].plot()
MANIPULATING TIME SERIES DATA IN PYTHON
Combine GDP growth & unemployment
pd.concat([unrate, gdp_2], axis=1).plot();
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Downsampling &
aggregation
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Stefan Jansen
Founder & Lead Data Scientist at
Applied Artificial Intelligence
Downsampling & aggregation methods
So far: upsampling, fill logic & interpolation
Now: downsampling
hour to day
day to month, etc
How to represent the existing values at the new date?
Mean, median, last value?
MANIPULATING TIME SERIES DATA IN PYTHON
Air quality: daily ozone levels
ozone = pd.read_csv('ozone.csv',
parse_dates=['date'],
index_col='date')
ozone.info()
DatetimeIndex: 6291 entries, 2000-01-01 to 2017-03-31
Data columns (total 1 columns):
Ozone 6167 non-null float64
dtypes: float64(1)
ozone = ozone.resample('D').asfreq()
ozone.info()
DatetimeIndex: 6300 entries, 2000-01-01 to 2017-03-31
Freq: D
Data columns (total 1 columns):
Ozone 6167 non-null float64
dtypes: float64(1)
MANIPULATING TIME SERIES DATA IN PYTHON
Creating monthly ozone data
ozone.resample('M').mean().head()
Ozone
date
2000-01-31 0.010443
2000-02-29 0.011817
2000-03-31 0.016810
2000-04-30 0.019413
2000-05-31 0.026535
.resample().mean() : Monthly average, assigned to end of calendar month
ozone.resample('M').median().head()
Ozone
date
2000-01-31 0.009486
2000-02-29 0.010726
2000-03-31 0.017004
2000-04-30 0.019866
2000-05-31 0.026018
MANIPULATING TIME SERIES DATA IN PYTHON
Creating monthly ozone data
ozone.resample('M').agg(['mean', 'std']).head()
Ozone
mean std
date
2000-01-31 0.010443 0.004755
2000-02-29 0.011817 0.004072
2000-03-31 0.016810 0.004977
2000-04-30 0.019413 0.006574
2000-05-31 0.026535 0.008409
.resample().agg() : List of aggregation functions like
groupby
MANIPULATING TIME SERIES DATA IN PYTHON
Plotting resampled ozone data
ozone = ozone.loc['2016':]
ax = ozone.plot()
monthly = ozone.resample('M').mean()
monthly.add_suffix('_monthly').plot(ax=ax)
MANIPULATING TIME SERIES DATA IN PYTHON
Resampling multiple time series
data = pd.read_csv('ozone_pm25.csv',
parse_dates=['date'],
index_col='date')
data = data.resample('D').asfreq()
data.info()
DatetimeIndex: 6300 entries, 2000-01-01 to 2017-03-31
Freq: D
Data columns (total 2 columns):
Ozone 6167 non-null float64
PM25 6167 non-null float64
dtypes: float64(2)
MANIPULATING TIME SERIES DATA IN PYTHON
Resampling multiple time series
data = data.resample('BM').mean()
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 207 entries, 2000-01-31 to 2017-03-31
Freq: BM
Data columns (total 2 columns):
ozone 207 non-null float64
pm25 207 non-null float64
dtypes: float64(2)
MANIPULATING TIME SERIES DATA IN PYTHON
Resampling multiple time series
df.resample('M').first().head(4)
Ozone PM25
date
2000-01-31 0.005545 20.800000
2000-02-29 0.016139 6.500000
2000-03-31 0.017004 8.493333
2000-04-30 0.031354 6.889474
df.resample('MS').first().head()
Ozone PM25
date
2000-01-01 0.004032 37.320000
2000-02-01 0.010583 24.800000
2000-03-01 0.007418 11.106667
2000-04-01 0.017631 11.700000
2000-05-01 0.022628 9.700000
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Rolling window
functions with
pandas
M A N I P U L AT I N G T I M E S E R I E S D ATA I N P Y T H O N
Stefan Jansen
Founder & Lead Data Scientist at
Applied Artificial Intelligence
Window functions in pandas
Windows identify sub periods of your time series
Calculate metrics for sub periods inside the window
Create a new time series of metrics
Two types of windows:
Rolling: same size, sliding (this video)
Expanding: contain all prior values (next video)
MANIPULATING TIME SERIES DATA IN PYTHON
Calculating a rolling average
data = pd.read_csv('google.csv', parse_dates=['date'], index_col='date')
DatetimeIndex: 1761 entries, 2010-01-04 to 2016-12-30
Data columns (total 1 columns):
price 1761 non-null float64
dtypes: float64(1)
MANIPULATING TIME SERIES DATA IN PYTHON
Calculating a rolling average
# Integer-based window size
data.rolling(window=30).mean() # fixed # observations
DatetimeIndex: 1761 entries, 2010-01-04 to 2017-05-24
Data columns (total 1 columns):
price 1732 non-null float64
dtypes: float64(1)
window=30 : # business days
min_periods : choose value < 30 to get results for first days
MANIPULATING TIME SERIES DATA IN PYTHON
Calculating a rolling average
# Offset-based window size
data.rolling(window='30D').mean() # fixed period length
DatetimeIndex: 1761 entries, 2010-01-04 to 2017-05-24
Data columns (total 1 columns):
price 1761 non-null float64
dtypes: float64(1)
30D : # calendar days
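The difference between an integer window and an offset window shows up clearly on a short synthetic series. The sketch below uses ten business days of made-up values, not the Google price data.

```python
import pandas as pd

# Ten business days starting Monday 2021-01-04, with values 0.0 .. 9.0
idx = pd.date_range('2021-01-04', periods=10, freq='B')
prices = pd.Series(range(10), index=idx, dtype='float64')

# Integer window: always the last 3 observations; the first 2 rows are NaN
by_count = prices.rolling(window=3).mean()

# Offset window: all observations within the last 3 calendar days
by_offset = prices.rolling(window='3D').mean()

# min_periods lets the integer window return results for the first rows
with_min = prices.rolling(window=3, min_periods=1).mean()

print(pd.concat([by_count, by_offset, with_min], axis=1,
                keys=['count_3', 'offset_3D', 'count_3_min1']))
```

On the second Monday the 3-day offset window spans the weekend and contains only that day's value, while the integer window still averages three observations.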
MANIPULATING TIME SERIES DATA IN PYTHON
90 day rolling mean
r90 = data.rolling(window='90D').mean()
data.join(r90.add_suffix('_mean_90')).plot()
MANIPULATING TIME SERIES DATA IN PYTHON
90 & 360 day rolling means
data['mean90'] = r90
r360 = data['price'].rolling(window='360D').mean()
data['mean360'] = r360; data.plot()
MANIPULATING TIME SERIES DATA IN PYTHON
Multiple rolling metrics (1)
r = data.price.rolling('90D').agg(['mean', 'std'])
r.plot(subplots = True)
MANIPULATING TIME SERIES DATA IN PYTHON
Multiple rolling metrics (2)
rolling = data.price.rolling('360D')
q10 = rolling.quantile(0.1).to_frame('q10')
median = rolling.median().to_frame('median')
q90 = rolling.quantile(0.9).to_frame('q90')
pd.concat([q10, median, q90], axis=1).plot()
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
MANIPULATING TIME SERIES DATA IN PYTHON
Expanding window functions with pandas
MANIPULATING TIME SERIES DATA IN PYTHON
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Expanding windows in pandas
From rolling to expanding windows
Calculate metrics for periods up to current date
New time series reflects all historical values
Useful for running rate of return, running min/max
Two options with pandas:
.expanding() - just like .rolling()
.cumsum(), .cumprod(), .cummin() / .cummax()
MANIPULATING TIME SERIES DATA IN PYTHON
The basic idea
df = pd.DataFrame({'data': range(5)})
df['expanding sum'] = df.data.expanding().sum()
df['cumulative sum'] = df.data.cumsum()
df
data expanding sum cumulative sum
0 0 0.0 0
1 1 1.0 1
2 2 3.0 3
3 3 6.0 6
4 4 10.0 10
MANIPULATING TIME SERIES DATA IN PYTHON
Get data for the S&P 500
data = pd.read_csv('sp500.csv', parse_dates=['date'], index_col='date')
DatetimeIndex: 2519 entries, 2007-05-24 to 2017-05-24
Data columns (total 1 columns):
SP500 2519 non-null float64
MANIPULATING TIME SERIES DATA IN PYTHON
How to calculate a running return
Single period return $r_t$: current price over last price minus 1:
$r_t = \frac{P_t}{P_{t-1}} - 1$
Multi-period return: product of $(1 + r_t)$ for all periods, minus 1:
$R_T = (1 + r_1)(1 + r_2) \dots (1 + r_T) - 1$
For the period return: .pct_change()
For basic math .add() , .sub() , .mul() , .div()
For cumulative product: .cumprod()
MANIPULATING TIME SERIES DATA IN PYTHON
Running rate of return in practice
pr = data.SP500.pct_change() # period return
pr_plus_one = pr.add(1)
cumulative_return = pr_plus_one.cumprod().sub(1)
cumulative_return.mul(100).plot()
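As a quick sanity check on the chain above, the cumulative product of (1 + period return) minus 1 should match the last price divided by the first price minus 1. The sketch below uses four hypothetical prices, not the S&P 500 data; `prices` and `direct` are illustrative names.

```python
import pandas as pd

# Hypothetical price series (not the S&P 500 data)
prices = pd.Series([100.0, 110.0, 99.0, 118.8])

period_returns = prices.pct_change()                        # first value is NaN
cumulative_return = period_returns.add(1).cumprod().sub(1)  # running return

# Direct calculation: each price relative to the first price, minus 1
direct = prices.div(prices.iloc[0]).sub(1)

print(pd.concat({'cumprod': cumulative_return, 'direct': direct}, axis=1))
```

The two columns agree from the second row onward; the first row of the `cumprod` column is NaN because `.pct_change()` has no prior price to compare against.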
MANIPULATING TIME SERIES DATA IN PYTHON
Getting the running min & max
data['running_min'] = data.SP500.expanding().min()
data['running_max'] = data.SP500.expanding().max()
data.plot()
MANIPULATING TIME SERIES DATA IN PYTHON
Rolling annual rate of return
def multi_period_return(period_returns):
return np.prod(period_returns + 1) - 1
pr = data.SP500.pct_change() # period return
r = pr.rolling('360D').apply(multi_period_return)
data['Rolling 1yr Return'] = r.mul(100)
data.plot(subplots=True)
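A small sketch of the same rolling `.apply()` pattern on a hypothetical six-day series with a 3-day window, so the result can be checked by hand. The missing first return is dropped before rolling, and `raw=True` (which passes plain NumPy arrays to the function) is a choice made here, not something the slide prescribes.

```python
import numpy as np
import pandas as pd

def multi_period_return(period_returns):
    # Compound the period returns inside one window
    return np.prod(period_returns + 1) - 1

# Hypothetical daily prices (not the S&P 500 data)
idx = pd.date_range('2021-01-01', periods=6, freq='D')
prices = pd.Series([100, 102, 101, 105, 107, 104], index=idx, dtype=float)

pr = prices.pct_change().dropna()   # period returns without the leading NaN
rolling_3d = pr.rolling('3D').apply(multi_period_return, raw=True)

print(rolling_3d.mul(100).round(2))
```

The last value compounds the returns of 4, 5 and 6 January, which equals the price on 6 January divided by the price on 3 January, minus 1.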
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
MANIPULATING TIME SERIES DATA IN PYTHON
Case study: S&P500 price simulation
MANIPULATING TIME SERIES DATA IN PYTHON
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Random walks & simulations
Daily stock returns are hard to predict
Models often assume they are random in nature
Numpy allows you to generate random numbers
From random returns to prices: use .cumprod()
Two examples:
Generate random returns
Randomly selected actual SP500 returns
MANIPULATING TIME SERIES DATA IN PYTHON
Generate random numbers
from numpy.random import normal, seed
from scipy.stats import norm
seed(42)
random_returns = normal(loc=0, scale=0.01, size=1000)
sns.distplot(random_returns, fit=norm, kde=False)
MANIPULATING TIME SERIES DATA IN PYTHON
Create a random price path
return_series = pd.Series(random_returns)
random_prices = return_series.add(1).cumprod().sub(1)
random_prices.mul(100).plot()
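A variation on the slide, sketched under two assumptions: it uses NumPy's `default_rng` generator instead of the legacy `seed` function, and it scales the path by a hypothetical starting price of 100 so the result reads as a price level rather than a cumulative return.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded generator for reproducible draws

# 1,000 normally distributed daily returns: mean 0, standard deviation 1%
random_returns = pd.Series(rng.normal(loc=0, scale=0.01, size=1000))

start_price = 100.0  # hypothetical starting level
random_prices = random_returns.add(1).cumprod().mul(start_price)

print(random_prices.describe())
```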
MANIPULATING TIME SERIES DATA IN PYTHON
S&P 500 prices & returns
data = pd.read_csv('sp500.csv', parse_dates=['date'], index_col='date')
data['returns'] = data.SP500.pct_change()
data.plot(subplots=True)
MANIPULATING TIME SERIES DATA IN PYTHON
S&P return distribution
sns.distplot(data.returns.dropna().mul(100), fit=norm)
MANIPULATING TIME SERIES DATA IN PYTHON
Generate random S&P 500 returns
from numpy.random import choice
sample = data.returns.dropna()
n_obs = data.returns.count()
random_walk = choice(sample, size=n_obs)
random_walk = pd.Series(random_walk, index=sample.index)
random_walk.head()
DATE
2007-05-29 -0.008357
2007-05-30 0.003702
2007-05-31 -0.013990
2007-06-01 0.008096
2007-06-04 0.013120
MANIPULATING TIME SERIES DATA IN PYTHON
Random S&P 500 prices (1)
start = data.SP500.first('D')
DATE
2007-05-25 1515.73
Name: SP500, dtype: float64
sp500_random = pd.concat([start, random_walk.add(1)]) # Series.append was removed in pandas 2.0
sp500_random.head()
DATE
2007-05-25 1515.730000
2007-05-29 0.998290
2007-05-30 0.995190
2007-05-31 0.997787
2007-06-01 0.983853
dtype: float64
MANIPULATING TIME SERIES DATA IN PYTHON
Random S&P 500 prices (2)
data['SP500_random'] = sp500_random.cumprod()
data[['SP500', 'SP500_random']].plot()
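The whole resampling procedure can be sketched end to end on a hypothetical six-day price series. This version draws from the observed returns with a seeded NumPy generator and joins the starting price with `pd.concat`; the data and variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical prices on six business days (not the S&P 500 data)
idx = pd.bdate_range('2020-01-01', periods=6)
prices = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 101.0], index=idx)

returns = prices.pct_change().dropna()

# Draw from the observed returns with replacement
rng = np.random.default_rng(0)
random_walk = pd.Series(rng.choice(returns.to_numpy(), size=len(returns)),
                        index=returns.index)

# First actual price, followed by (1 + random return) for the remaining days
start = prices.iloc[[0]]
random_path = pd.concat([start, random_walk.add(1)]).cumprod()

print(pd.concat({'actual': prices, 'random': random_path}, axis=1))
```

The cumulative product works because the first element is a price and every later element is a growth factor, so each step multiplies the previous simulated price by one randomly selected (1 + return).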
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
MANIPULATING TIME SERIES DATA IN PYTHON
Relationships between time series: correlation
MANIPULATING TIME SERIES DATA IN PYTHON
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Correlation & relations between series
So far, focus on characteristics of individual variables
Now: characteristic of relations between variables
Correlation: measures linear relationships
Financial markets: important for prediction and risk
management
pandas & seaborn have tools to compute & visualize
MANIPULATING TIME SERIES DATA IN PYTHON
Correlation & linear relationships
Correlation coefficient: how similar is the pairwise movement of two variables around their averages?
Varies between -1 and +1
$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{s_x s_y}$
MANIPULATING TIME SERIES DATA IN PYTHON
Importing five price time series
data = pd.read_csv('assets.csv', parse_dates=['date'],
index_col='date')
data = data.dropna() # .info() returns None, so call it separately
data.info()
DatetimeIndex: 2469 entries, 2007-05-25 to 2017-05-22
Data columns (total 5 columns):
sp500 2469 non-null float64
nasdaq 2469 non-null float64
bonds 2469 non-null float64
gold 2469 non-null float64
oil 2469 non-null float64
MANIPULATING TIME SERIES DATA IN PYTHON
Visualize pairwise linear relationships
daily_returns = data.pct_change()
sns.jointplot(x='sp500', y='nasdaq', data=daily_returns);
MANIPULATING TIME SERIES DATA IN PYTHON
Calculate all correlations
correlations = daily_returns.corr()
correlations
bonds oil gold sp500 nasdaq
bonds 1.000000 -0.183755 0.003167 -0.300877 -0.306437
oil -0.183755 1.000000 0.105930 0.335578 0.289590
gold 0.003167 0.105930 1.000000 -0.007786 -0.002544
sp500 -0.300877 0.335578 -0.007786 1.000000 0.959990
nasdaq -0.306437 0.289590 -0.002544 0.959990 1.000000
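To connect `.corr()` with the correlation coefficient, the sketch below computes the value by hand for two hypothetical return columns. It assumes the standard deviations are sample standard deviations (the pandas default), in which case the sum of cross-products is divided by N - 1 as well.

```python
import pandas as pd

# Hypothetical daily returns for two assets
returns = pd.DataFrame({'a': [0.010, -0.020, 0.015, 0.003, -0.007],
                        'b': [0.012, -0.018, 0.010, 0.001, -0.010]})

x, y = returns['a'], returns['b']
n = len(returns)

# Sum of cross-deviations, scaled by (N - 1) and both sample standard deviations
manual_r = ((x - x.mean()) * (y - y.mean())).sum() / ((n - 1) * x.std() * y.std())

# Same number taken from the pandas correlation matrix
pandas_r = returns.corr().loc['a', 'b']

print(manual_r, pandas_r)
```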
MANIPULATING TIME SERIES DATA IN PYTHON
Visualize all correlations
sns.heatmap(correlations, annot=True)
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
MANIPULATING TIME SERIES DATA IN PYTHON
Select index components & import data
MANIPULATING TIME SERIES DATA IN PYTHON
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Market value-weighted index
Composite performance of various stocks
Components weighted by market capitalization
Share Price x Number of Shares => Market Value
Larger components get higher percentage weightings
Key market indexes are value-weighted:
S&P 500 , NASDAQ , Wilshire 5000 , Hang Seng
MANIPULATING TIME SERIES DATA IN PYTHON
Build a cap-weighted Index
Apply new skills to construct value-weighted index
Select components from exchange listing data
Get component number of shares and stock prices
Calculate component weights
Calculate index
Evaluate performance of components and index
MANIPULATING TIME SERIES DATA IN PYTHON
Load stock listing data
nyse = pd.read_excel('listings.xlsx', sheet_name='nyse',
na_values='n/a')
nyse.info()
RangeIndex: 3147 entries, 0 to 3146
Data columns (total 7 columns):
Stock Symbol 3147 non-null object # Stock Ticker
Company Name 3147 non-null object
Last Sale 3079 non-null float64 # Latest Stock Price
Market Capitalization 3147 non-null float64
IPO Year 1361 non-null float64 # Year of listing
Sector 2177 non-null object
Industry 2177 non-null object
dtypes: float64(3), object(4)
MANIPULATING TIME SERIES DATA IN PYTHON
Load & prepare listing data
nyse.set_index('Stock Symbol', inplace=True)
nyse.dropna(subset=['Sector'], inplace=True)
nyse['Market Capitalization'] /= 1e6 # in Million USD
Index: 2177 entries, DDD to ZTO
Data columns (total 6 columns):
Company Name 2177 non-null object
Last Sale 2175 non-null float64
Market Capitalization 2177 non-null float64
IPO Year 967 non-null float64
Sector 2177 non-null object
Industry 2177 non-null object
dtypes: float64(3), object(3)
MANIPULATING TIME SERIES DATA IN PYTHON
Select index components
components = nyse.groupby(['Sector'])['Market Capitalization'].nlargest(1)
components.sort_values(ascending=False)
Sector Stock Symbol
Health Care JNJ 338834.390080
Energy XOM 338728.713874
Finance JPM 300283.250479
Miscellaneous BABA 275525.000000
Public Utilities T 247339.517272
Basic Industries PG 230159.644117
Consumer Services WMT 221864.614129
Consumer Non-Durables KO 183655.305119
Technology ORCL 181046.096000
Capital Goods TM 155660.252483
Transportation UPS 90180.886756
Consumer Durables ABB 48398.935676
Name: Market Capitalization, dtype: float64
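A minimal sketch of the same selection on a hypothetical four-stock listing, showing that `.groupby().nlargest(1)` returns a Series with a two-level index (sector, ticker) from which the tickers can be extracted. The tickers and values are made up for illustration.

```python
import pandas as pd

# Hypothetical listing data (not the NYSE file)
listings = pd.DataFrame({
    'Stock Symbol': ['AAA', 'BBB', 'CCC', 'DDD'],
    'Sector': ['Tech', 'Tech', 'Energy', 'Energy'],
    'Market Capitalization': [50.0, 80.0, 30.0, 20.0],
}).set_index('Stock Symbol')

# Largest company by market cap within each sector
components = listings.groupby('Sector')['Market Capitalization'].nlargest(1)

# The result is indexed by (Sector, Stock Symbol); keep only the ticker level
tickers = components.index.get_level_values('Stock Symbol').tolist()

print(components)
print(tickers)
```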
MANIPULATING TIME SERIES DATA IN PYTHON
Import & prepare listing data
tickers = components.index.get_level_values('Stock Symbol')
tickers
Index(['PG', 'TM', 'ABB', 'KO', 'WMT', 'XOM', 'JPM', 'JNJ', 'BABA', 'T',
'ORCL', 'UPS'], dtype='object', name='Stock Symbol')
tickers.tolist()
['PG',
'TM',
'ABB',
'KO',
'WMT',
...
'T',
'ORCL',
'UPS']
MANIPULATING TIME SERIES DATA IN PYTHON
Stock index components
columns = ['Company Name', 'Market Capitalization', 'Last Sale']
component_info = nyse.loc[tickers, columns]
pd.options.display.float_format = '{:,.2f}'.format
Company Name Market Capitalization Last Sale
Stock Symbol
PG Procter & Gamble Company (The) 230,159.64 90.03
TM Toyota Motor Corp Ltd Ord 155,660.25 104.18
ABB ABB Ltd 48,398.94 22.63
KO Coca-Cola Company (The) 183,655.31 42.79
WMT Wal-Mart Stores, Inc. 221,864.61 73.15
XOM Exxon Mobil Corporation 338,728.71 81.69
JPM J P Morgan Chase & Co 300,283.25 84.40
JNJ Johnson & Johnson 338,834.39 124.99
BABA Alibaba Group Holding Limited 275,525.00 110.21
T AT&T Inc. 247,339.52 40.28
ORCL Oracle Corporation 181,046.10 44.00
UPS United Parcel Service, Inc. 90,180.89 103.74
MANIPULATING TIME SERIES DATA IN PYTHON
Import & prepare listing data
data = pd.read_csv('stocks.csv', parse_dates=['Date'],
index_col='Date').loc[:, tickers.tolist()]
data.info()
DatetimeIndex: 252 entries, 2016-01-04 to 2016-12-30
Data columns (total 12 columns):
ABB 252 non-null float64
BABA 252 non-null float64
JNJ 252 non-null float64
JPM 252 non-null float64
KO 252 non-null float64
ORCL 252 non-null float64
PG 252 non-null float64
T 252 non-null float64
TM 252 non-null float64
UPS 252 non-null float64
WMT 252 non-null float64
XOM 252 non-null float64
dtypes: float64(12)
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
MANIPULATING TIME SERIES DATA IN PYTHON
Build a market-cap weighted index
MANIPULATING TIME SERIES DATA IN PYTHON
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Build your value-weighted index
Key inputs:
number of shares
stock price series
Normalize index to start
at 100
MANIPULATING TIME SERIES DATA IN PYTHON
Stock index components
components
Company Name Market Capitalization Last Sale
Stock Symbol
PG Procter & Gamble Company (The) 230,159.64 90.03
TM Toyota Motor Corp Ltd Ord 155,660.25 104.18
ABB ABB Ltd 48,398.94 22.63
KO Coca-Cola Company (The) 183,655.31 42.79
WMT Wal-Mart Stores, Inc. 221,864.61 73.15
XOM Exxon Mobil Corporation 338,728.71 81.69
JPM J P Morgan Chase & Co 300,283.25 84.40
JNJ Johnson & Johnson 338,834.39 124.99
BABA Alibaba Group Holding Limited 275,525.00 110.21
T AT&T Inc. 247,339.52 40.28
ORCL Oracle Corporation 181,046.10 44.00
UPS United Parcel Service, Inc. 90,180.89 103.74
MANIPULATING TIME SERIES DATA IN PYTHON
Number of shares outstanding
shares = components['Market Capitalization'].div(components['Last Sale'])
Stock Symbol
PG 2,556.48 # Outstanding shares in million
TM 1,494.15
ABB 2,138.71
KO 4,292.01
WMT 3,033.01
XOM 4,146.51
JPM 3,557.86
JNJ 2,710.89
BABA 2,500.00
T 6,140.50
ORCL 4,114.68
UPS 869.30
dtype: float64
Market Capitalization = Number of Shares x Share Price
MANIPULATING TIME SERIES DATA IN PYTHON
Historical stock prices
data = pd.read_csv('stocks.csv', parse_dates=['Date'],
index_col='Date').loc[:, tickers.tolist()]
market_cap_series = data.mul(shares) # price x number of shares, per ticker
market_cap_series.info()
DatetimeIndex: 252 entries, 2016-01-04 to 2016-12-30
Data columns (total 12 columns):
ABB 252 non-null float64
BABA 252 non-null float64
JNJ 252 non-null float64
JPM 252 non-null float64
...
TM 252 non-null float64
UPS 252 non-null float64
WMT 252 non-null float64
XOM 252 non-null float64
dtypes: float64(12)
MANIPULATING TIME SERIES DATA IN PYTHON
From stock prices to market value
market_cap_series.iloc[[0, -1]] # first and last trading day
ABB BABA JNJ JPM KO ORCL \
Date
2016-01-04 37,470.14 191,725.00 272,390.43 226,350.95 181,981.42 147,099.95
2016-12-30 45,062.55 219,525.00 312,321.87 307,007.60 177,946.93 158,209.60
PG T TM UPS WMT XOM
Date
2016-01-04 200,351.12 210,926.33 181,479.12 82,444.14 186,408.74 321,188.96
2016-12-30 214,948.60 261,155.65 175,114.05 99,656.23 209,641.59 374,264.34
MANIPULATING TIME SERIES DATA IN PYTHON
Aggregate market value per period
agg_mcap = market_cap_series.sum(axis=1) # Total market cap
agg_mcap.plot(title='Aggregate Market Cap')
MANIPULATING TIME SERIES DATA IN PYTHON
Value-based index
index = agg_mcap.div(agg_mcap.iloc[0]).mul(100) # Divide by 1st value
index.plot(title='Market-Cap Weighted Index')
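The steps from prices to index can be sketched on a hypothetical two-stock example that is small enough to verify by hand. The tickers, prices, and share counts below are made up for illustration.

```python
import pandas as pd

# Hypothetical prices for two stocks over three days
prices = pd.DataFrame({'AAA': [10.0, 11.0, 12.0],
                       'BBB': [20.0, 19.0, 21.0]},
                      index=pd.date_range('2016-01-04', periods=3))

# Hypothetical number of shares per ticker
shares = pd.Series({'AAA': 100.0, 'BBB': 50.0})

market_cap_series = prices.mul(shares)           # aligns shares on column names
agg_mcap = market_cap_series.sum(axis=1)         # total market cap per day
index = agg_mcap.div(agg_mcap.iloc[0]).mul(100)  # normalize to start at 100

print(index)
```

Total market value moves from 2,000 to 2,050 to 2,250, so the index reads 100, 102.5 and 112.5.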
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
MANIPULATING TIME SERIES DATA IN PYTHON
Evaluate index performance
MANIPULATING TIME SERIES DATA IN PYTHON
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Evaluate your value-weighted index
Index return:
Total index return
Contribution by component
Performance vs Benchmark
Total period return
Rolling returns for sub periods
MANIPULATING TIME SERIES DATA IN PYTHON
Value-based index - recap
agg_market_cap = market_cap_series.sum(axis=1)
index = agg_market_cap.div(agg_market_cap.iloc[0]).mul(100)
index.plot(title='Market-Cap Weighted Index')
MANIPULATING TIME SERIES DATA IN PYTHON
Value contribution by stock
agg_market_cap.iloc[-1] - agg_market_cap.iloc[0]
315,037.71
MANIPULATING TIME SERIES DATA IN PYTHON
Value contribution by stock
change = market_cap_series.iloc[[0, -1]] # first and last trading day
change.diff().iloc[-1].sort_values() # or: .loc['2016-12-30']
TM -6,365.07
KO -4,034.49
ABB 7,592.41
ORCL 11,109.65
PG 14,597.48
UPS 17,212.08
WMT 23,232.85
BABA 27,800.00
JNJ 39,931.44
T 50,229.33
XOM 53,075.38
JPM 80,656.65
Name: 2016-12-30 00:00:00, dtype: float64
MANIPULATING TIME SERIES DATA IN PYTHON
Market-cap based weights
market_cap = components['Market Capitalization']
weights = market_cap.div(market_cap.sum())
weights.sort_values().mul(100)
Stock Symbol
ABB 1.85
UPS 3.45
TM 5.96
ORCL 6.93
KO 7.03
WMT 8.50
PG 8.81
T 9.47
BABA 10.55
JPM 11.50
XOM 12.97
JNJ 12.97
Name: Market Capitalization, dtype: float64
MANIPULATING TIME SERIES DATA IN PYTHON
Value-weighted component returns
index_return = (index.iloc[-1] / index.iloc[0] - 1) * 100
14.06
weighted_returns = weights.mul(index_return)
weighted_returns.sort_values().plot(kind='barh')
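A short sketch of what the weighting does: market-cap weights sum to 1, so multiplying them by the total index return splits that return across components in proportion to their size. The three market caps are hypothetical; the 14.06 figure is the index return shown on the slide.

```python
import pandas as pd

# Hypothetical market capitalizations
market_cap = pd.Series({'AAA': 300.0, 'BBB': 100.0, 'CCC': 100.0})

weights = market_cap.div(market_cap.sum())  # each component's share of the total
index_return = 14.06                        # total index return in percent (from the slide)

# Allocate the index return in proportion to each weight
weighted_returns = weights.mul(index_return)

print(weights.mul(100))
print(weighted_returns.sort_values())
```

Note that this attributes return by size only; it does not measure how each stock actually performed over the period.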
MANIPULATING TIME SERIES DATA IN PYTHON
Performance vs benchmark
data = index.to_frame('Index') # Convert pd.Series to pd.DataFrame
data['SP500'] = pd.read_csv('sp500.csv', parse_dates=['Date'],
index_col='Date')
data.SP500 = data.SP500.div(data.SP500.iloc[0], axis=0).mul(100)
MANIPULATING TIME SERIES DATA IN PYTHON
Performance vs benchmark: 30D rolling return
def multi_period_return(r):
return (np.prod(r + 1) - 1) * 100
data.pct_change().rolling('30D').apply(multi_period_return).plot()
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
MANIPULATING TIME SERIES DATA IN PYTHON
Index correlation & exporting to Excel
MANIPULATING TIME SERIES DATA IN PYTHON
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Some additional analysis of your index
Daily return correlations:
Calculate among all components
Visualize the result as heatmap
Write results to Excel using .xls and .xlsx formats:
Single worksheet
Multiple worksheets
MANIPULATING TIME SERIES DATA IN PYTHON
Index components - price data
data = DataReader(tickers, 'google', start='2016', end='2017')['Close']
data.info()
DatetimeIndex: 252 entries, 2016-01-04 to 2016-12-30
Data columns (total 12 columns):
ABB 252 non-null float64
BABA 252 non-null float64
JNJ 252 non-null float64
JPM 252 non-null float64
KO 252 non-null float64
ORCL 252 non-null float64
PG 252 non-null float64
T 252 non-null float64
TM 252 non-null float64
UPS 252 non-null float64
WMT 252 non-null float64
XOM 252 non-null float64
MANIPULATING TIME SERIES DATA IN PYTHON
Index components: return correlations
daily_returns = data.pct_change()
correlations = daily_returns.corr()
ABB BABA JNJ JPM KO ORCL PG T TM UPS WMT XOM
ABB 1.00 0.40 0.33 0.56 0.31 0.53 0.34 0.29 0.48 0.50 0.15 0.48
BABA 0.40 1.00 0.27 0.27 0.25 0.38 0.21 0.17 0.34 0.35 0.13 0.21
JNJ 0.33 0.27 1.00 0.34 0.30 0.37 0.42 0.35 0.29 0.45 0.24 0.41
JPM 0.56 0.27 0.34 1.00 0.22 0.57 0.27 0.13 0.49 0.56 0.14 0.48
KO 0.31 0.25 0.30 0.22 1.00 0.31 0.62 0.47 0.33 0.50 0.25 0.29
ORCL 0.53 0.38 0.37 0.57 0.31 1.00 0.41 0.32 0.48 0.54 0.21 0.42
PG 0.34 0.21 0.42 0.27 0.62 0.41 1.00 0.43 0.32 0.47 0.33 0.34
T 0.29 0.17 0.35 0.13 0.47 0.32 0.43 1.00 0.28 0.41 0.31 0.33
TM 0.48 0.34 0.29 0.49 0.33 0.48 0.32 0.28 1.00 0.52 0.20 0.30
UPS 0.50 0.35 0.45 0.56 0.50 0.54 0.47 0.41 0.52 1.00 0.33 0.45
WMT 0.15 0.13 0.24 0.14 0.25 0.21 0.33 0.31 0.20 0.33 1.00 0.21
XOM 0.48 0.21 0.41 0.48 0.29 0.42 0.34 0.33 0.30 0.45 0.21 1.00
MANIPULATING TIME SERIES DATA IN PYTHON
Index components: return correlations
sns.heatmap(correlations, annot=True)
plt.xticks(rotation=45)
plt.title('Daily Return Correlations')
MANIPULATING TIME SERIES DATA IN PYTHON
Saving to a single Excel worksheet
correlations.to_excel(excel_writer= 'correlations.xls',
sheet_name='correlations',
startrow=1,
startcol=1)
MANIPULATING TIME SERIES DATA IN PYTHON
Saving to multiple Excel worksheets
data.index = data.index.date # Keep only date component
with pd.ExcelWriter('stock_data.xlsx') as writer:
correlations.to_excel(excel_writer=writer, sheet_name='correlations')
data.to_excel(excel_writer=writer, sheet_name='prices')
data.pct_change().to_excel(writer, sheet_name='returns')
MANIPULATING TIME SERIES DATA IN PYTHON
Let's practice!
MANIPULATING TIME SERIES DATA IN PYTHON
Congratulations!
MANIPULATING TIME SERIES DATA IN PYTHON
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Congratulations!
MANIPULATING TIME SERIES DATA IN PYTHON
Reading, inspecting, and cleaning data from CSV
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stefan Jansen
Instructor
Import and clean data
Ensure that the pd.DataFrame matches the CSV source file
Stock exchange listings: amex-listings.csv
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
How pandas stores data
Each column has its own data format ( dtype )
dtype affects your calculation and visualization
pandas dtype    Column characteristics
object          Text, or a mix of text and numeric data
int64           Numeric: whole numbers - 64 bits (≤ 2^64)
float64         Numeric: decimals, or whole numbers with missing values
datetime64      Date and time information
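A small sketch illustrating the table, in particular the row on whole numbers with missing values. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'text': ['a', 'b'],                    # strings
    'whole': [1, 2],                       # whole numbers, nothing missing
    'with_missing': [1, np.nan],           # whole numbers plus a missing value
    'when': pd.to_datetime(['2017-04-26', '2017-04-25']),
})

# Map each column name to the name of its dtype
dtypes = df.dtypes.astype(str).to_dict()
print(dtypes)
```

The `with_missing` column is stored as `float64` even though its only number is whole, because `np.nan` is itself a floating-point value.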
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Import & inspect
import pandas as pd
amex = pd.read_csv('amex-listings.csv')
amex.info() # To inspect table structure & data types
RangeIndex: 360 entries, 0 to 359
Data columns (total 8 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
5 Sector 238 non-null object
6 Industry 238 non-null object
7 Last Update 360 non-null object
dtypes: float64(3), object(5)
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Dealing with missing values
# Replace 'n/a' with np.nan
amex = pd.read_csv('amex-listings.csv', na_values='n/a')
amex.info()
RangeIndex: 360 entries, 0 to 359
Data columns (total 8 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
5 Sector 238 non-null object
6 Industry 238 non-null object
7 Last Update 360 non-null object
dtypes: float64(3), object(5)
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Properly parsing dates
amex = pd.read_csv('amex-listings.csv',
na_values='n/a',
parse_dates=['Last Update'])
amex.info()
RangeIndex: 360 entries, 0 to 359
Data columns (total 8 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
5 Sector 238 non-null object
6 Industry 238 non-null object
7 Last Update 360 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(3), object(4)
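The effect of `na_values` and `parse_dates` can be tried without the listings file by reading a few hypothetical rows from an in-memory string. The rows below are made up and only mimic the column names used above.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for amex-listings.csv
csv_text = (
    "Stock Symbol,Last Sale,Last Update\n"
    "XXII,1.33,2017-04-26\n"
    "FAX,n/a,2017-04-25\n"
)

listings = pd.read_csv(io.StringIO(csv_text),
                       na_values='n/a',              # treat 'n/a' as missing
                       parse_dates=['Last Update'])  # convert text to datetime64

print(listings.dtypes)
print(listings)
```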
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Showing off the result
amex.head(2) # Show first n rows (default: 5)
Stock Symbol Company Name
0 XXII 22nd Century Group, Inc
1 FAX Aberdeen Asia-Pacific Income Fund Inc
Last Sale Market Capitalization IPO Year
0 1.3300 1.206285e+08 NaN
1 5.0000 1.266333e+09 1986.0
Sector Industry Last Update
0 Non-Durables Farming/Seeds/Milling 2017-04-26
1 NaN NaN 2017-04-25
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Read data from Excel worksheets
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stefan Jansen
Instructor
Import data from Excel
pd.read_excel(file, sheet_name=0)
Select first sheet by default with sheet_name=0
Select by name with sheet_name='amex'
Import several sheets with list such as sheet_name=['amex', 'nasdaq']
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Import data from one sheet
amex = pd.read_excel('listings.xlsx',
sheet_name='amex',
na_values='n/a')
amex.info()
RangeIndex: 360 entries, 0 to 359
Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Import data from two sheets
listings = pd.read_excel('listings.xlsx',
sheet_name=['amex', 'nasdaq'], # keys = sheet name
na_values='n/a') # values = DataFrame
listings['nasdaq'].info()
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 3167 non-null object
1 Company Name 3167 non-null object
2 Last Sale 3165 non-null float64
3 Market Capitalization 3167 non-null float64
4 IPO Year 1386 non-null float64
...
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Get sheet names
xls = pd.ExcelFile('listings.xlsx') # pd.ExcelFile object
exchanges = xls.sheet_names
exchanges
['amex', 'nasdaq', 'nyse']
nyse = pd.read_excel(xls,
sheet_name=exchanges[2],
na_values='n/a')
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Get sheet names
nyse.info()
RangeIndex: 3147 entries, 0 to 3146
Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 3147 non-null object
1 Company Name 3147 non-null object
... ...
6 Industry 2177 non-null object
dtypes: float64(3), object(4)
memory usage: 172.2+ KB
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Combine data from multiple worksheets
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stefan Jansen
Instructor
Combine DataFrames
Concatenate or "stack" a list of pd.DataFrames
Syntax: pd.concat([amex, nasdaq, nyse])
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Concatenate two DataFrames
amex = pd.read_excel('listings.xlsx',
sheet_name='amex',
na_values=['n/a'])
nyse = pd.read_excel('listings.xlsx',
sheet_name='nyse',
na_values=['n/a'])
pd.concat([amex, nyse]).info()
Int64Index: 3507 entries, 0 to 3146
Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 3507 non-null object
...
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Add a reference column
amex['Exchange'] = 'AMEX' # Add column to reference source
nyse['Exchange'] = 'NYSE'
listings = pd.concat([amex, nyse])
listings.head(2)
Stock Symbol ... Exchange
0 XXII ... AMEX
1 FAX ... AMEX
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Combine three DataFrames
xls = pd.ExcelFile('listings.xlsx')
exchanges = xls.sheet_names
# Create empty list to collect DataFrames
listings = []
for exchange in exchanges:
listing = pd.read_excel(xls, sheet_name=exchange)
# Add reference col
listing['Exchange'] = exchange
# Add DataFrame to list
listings.append(listing)
# List of DataFrames
combined_listings = pd.concat(listings)
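The loop can be sketched without an Excel file by substituting a hypothetical dictionary of small DataFrames for the worksheets. It also shows why the combined index ends at 3146 rather than at the total row count: `pd.concat` keeps each sheet's original row labels unless `ignore_index=True` is passed.

```python
import pandas as pd

# Hypothetical stand-ins for the worksheets, keyed by sheet name
sheets = {
    'amex': pd.DataFrame({'Stock Symbol': ['XXII', 'FAX']}),
    'nyse': pd.DataFrame({'Stock Symbol': ['JNJ', 'XOM', 'JPM']}),
}

listings = []
for exchange, listing in sheets.items():
    listing = listing.copy()
    listing['Exchange'] = exchange   # reference column for the source sheet
    listings.append(listing)

combined = pd.concat(listings)                           # keeps original row labels
combined_reset = pd.concat(listings, ignore_index=True)  # new labels 0..n-1

print(combined)
print(combined_reset)
```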
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Combine three DataFrames
combined_listings.info()
Int64Index: 6674 entries, 0 to 3146
Data columns (total 8 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 6674 non-null object
1 Company Name 6674 non-null object
2 Last Sale 6590 non-null float64
3 Market Capitalization 6674 non-null float64
4 IPO Year 2852 non-null float64
5 Sector 5182 non-null object
6 Industry 5182 non-null object
7 Exchange 6674 non-null object
dtypes: float64(3), object(5)
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
The DataReader: Access financial data online
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stefan Jansen
Instructor
pandas_datareader
Easy access to various financial internet data sources
Little code needed to import into a pandas DataFrame
Available sources include:
IEX and Yahoo! Finance (including derivatives)
Federal Reserve
World Bank, OECD, Eurostat
OANDA
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stock prices: Yahoo! Finance
from pandas_datareader.data import DataReader
from datetime import date # Date & time functionality
start = date(2015, 1, 1) # Default: Jan 1, 2010
end = date(2016, 12, 31) # Default: today
ticker = 'GOOG'
data_source = 'yahoo'
stock_data = DataReader(ticker, data_source, start, end)
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stock prices: Yahoo! Finance
stock_data.info()
DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
Data columns (total 6 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 High 504 non-null float64 # Highest price
1 Low 504 non-null float64 # Lowest price
2 Open 504 non-null float64 # First price
3 Close 504 non-null float64 # Last price
4 Volume 504 non-null float64 # No shares traded
5 Adj Close 504 non-null float64 # Adj. price
dtypes: float64(6)
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stock prices: Yahoo! Finance
pd.concat([stock_data.head(3), stock_data.tail(3)])
High Low Open Close Volume Adj Close
Date
2015-01-02 26.49 26.13 26.38 26.17 28951268 26.17
2015-01-05 26.14 25.58 26.09 25.62 41196796 25.62
2015-01-06 25.74 24.98 25.68 25.03 57998800 25.03
2016-12-28 39.71 39.16 39.69 39.25 23076000 39.25
2016-12-29 39.30 38.95 39.17 39.14 14886000 39.14
2016-12-30 39.14 38.52 39.14 38.59 35400000 38.59
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stock prices: Visualization
import matplotlib.pyplot as plt
stock_data['Close'].plot(title=ticker)
plt.show()
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Economic data from the Federal Reserve
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stefan Jansen
Instructor
Economic data from FRED
Federal Reserve Economic Data
500,000 series covering a range of categories:
Economic growth & employment
Monetary & fiscal policy
Demographics, industries, commodity prices
Daily, monthly, annual frequencies
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Get data from FRED
1 https://fred.stlouisfed.org/
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Interest rates
from pandas_datareader.data import DataReader
from datetime import date
series_code = 'DGS10' # 10-year Treasury Rate
data_source = 'fred' # FED Economic Data Service
start = date(1962, 1, 1)
data = DataReader(series_code, data_source, start)
data.info()
DatetimeIndex: 15754 entries, 1962-01-02 to 2022-05-20
Data columns (total 1 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 DGS10 15083 non-null float64
dtypes: float64(1)
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stock prices: Visualization
.rename(columns={old_name: new_name})
series_name = '10-year Treasury'
data = data.rename(columns={series_code: series_name})
data.plot(title=series_name); plt.show()
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Combine stock and economic data
start = date(2000, 1, 1)
series = 'DCOILWTICO' # West Texas Intermediate Oil Price
oil = DataReader(series, 'fred', start)
ticker = 'XOM' # Exxon Mobil Corporation
stock = DataReader(ticker, 'yahoo', start)
data = pd.concat([stock[['Close']], oil], axis=1)
data.info()
DatetimeIndex: 5841 entries, 2000-01-03 to 2022-05-23
Data columns (total 2 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Close 5634 non-null float64
1 DCOILWTICO 5615 non-null float64
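The differing non-null counts above come from how `pd.concat(..., axis=1)` aligns the two sources: it keeps every date found in either one and fills the gaps with NaN. A minimal offline sketch with hypothetical values in place of the downloaded series:

```python
import pandas as pd

# Hypothetical stock closes and oil prices on partly different dates
stock = pd.DataFrame({'Close': [10.0, 11.0, 12.0]},
                     index=pd.to_datetime(['2000-01-03', '2000-01-04', '2000-01-05']))
oil = pd.DataFrame({'DCOILWTICO': [25.0, 26.0]},
                   index=pd.to_datetime(['2000-01-04', '2000-01-06']))

# Column-wise concatenation: union of the dates, NaN where a source has no value
data = pd.concat([stock[['Close']], oil], axis=1)

print(data)
```

Only 2000-01-04 has both values here; `data.dropna()` would keep just the dates on which both series are available.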
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Combine stock and economic data
data.columns = ['Exxon', 'Oil Price']
data.plot()
plt.show()
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Select stocks and get data from Yahoo! Finance
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stefan Jansen
Instructor
Select stocks based on criteria
Use the listing information to select specific stocks
As criteria:
Stock Exchange
Sector or Industry
IPO Year
Market Capitalization
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Get ticker for largest company
nyse = pd.read_excel('listings.xlsx',sheet_name='nyse', na_values='n/a')
nyse = nyse.sort_values('Market Capitalization', ascending=False)
nyse[['Stock Symbol', 'Company Name']].head(3)
Stock Symbol Company Name
1586 JNJ Johnson & Johnson
1125 XOM Exxon Mobil Corporation
1548 JPM J P Morgan Chase & Co
largest_by_market_cap = nyse.iloc[0] # 1st row
largest_by_market_cap['Stock Symbol'] # Select row label
'JNJ'
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Get ticker for largest company
nyse = nyse.set_index('Stock Symbol') # Stock ticker as index
nyse.info()
Index: 3147 entries, JNJ to EAE
Data columns (total 6 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Company Name 3147 non-null object
1 Last Sale 3079 non-null float64
2 Market Capitalization 3147 non-null float64
...
nyse['Market Capitalization'].idxmax() # Index of max value
'JNJ'
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Get ticker for largest tech company
nyse['Sector'].unique() # Unique values as numpy array
array(['Technology', 'Health Care', ...], dtype=object)
tech = nyse.loc[nyse.Sector == 'Technology']
tech['Company Name'].head(2)
Stock Symbol Company Name
ORCL Oracle Corporation
TSM Taiwan Semiconductor Manufacturing
nyse.loc[nyse.Sector=='Technology', 'Market Capitalization'].idxmax()
'ORCL'
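The filter-then-idxmax pattern can be tried without the Excel file. A small sketch on a hypothetical listing table (tickers and market caps are invented for illustration):

```python
import pandas as pd

# Hypothetical listing data; tickers and market caps are made up
listings = pd.DataFrame(
    {
        "Stock Symbol": ["AAA", "BBB", "CCC", "DDD"],
        "Sector": ["Technology", "Health Care", "Technology", "Energy"],
        "Market Capitalization": [5.0e9, 9.0e9, 7.5e9, 1.0e9],
    }
).set_index("Stock Symbol")

# The boolean mask selects the rows, the column label selects the Series
is_tech = listings["Sector"] == "Technology"
tech_caps = listings.loc[is_tech, "Market Capitalization"]

# idxmax returns the index label (here the ticker) of the largest value
largest_tech = tech_caps.idxmax()
print(largest_tech)  # CCC
```

Because the ticker is the index, idxmax hands back the symbol directly, with no need to sort first.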
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Get data for largest tech company with 2017 IPO
ticker = nyse.loc[(nyse.Sector=='Technology') & (nyse['IPO Year']==2017),
'Market Capitalization'].idxmax()
data = DataReader(ticker, 'yahoo') # Start: 2010/1/1
data = data.loc[:, ['Close', 'Volume']]
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Visualize price and volume on two axes
import matplotlib.pyplot as plt
data.plot(title=ticker, secondary_y='Volume')
plt.tight_layout(); plt.show()
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
Get several stocks & manage a MultiIndex
Stefan Jansen
Instructor
Get data for several stocks
Use the listing information to select multiple stocks
E.g. largest 3 stocks per sector
Use Yahoo! Finance to retrieve data for several stocks
Learn how to manage a pandas MultiIndex, a powerful tool to deal with more complex data sets
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Load prices for top 5 companies
nasdaq = pd.read_excel('listings.xlsx', sheet_name='nasdaq', na_values='n/a')
nasdaq.set_index('Stock Symbol', inplace=True)
top_5 = nasdaq['Market Capitalization'].nlargest(n=5) # Top 5
top_5.div(1000000) # Market Cap in million USD
AAPL 740024.467000
GOOG 569426.124504
... ...
Name: Market Capitalization, dtype: float64
tickers = top_5.index.tolist() # Convert index to list
['AAPL', 'GOOG', 'MSFT', 'AMZN', 'FB']
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Load prices for top 5 companies
df = DataReader(tickers, 'yahoo', start=date(2020, 1, 1))
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 712 entries, 2020-01-02 to 2022-10-27
Data columns (total 30 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 (Adj Close, AAPL) 712 non-null float64
1 (Adj Close, GOOG) 712 non-null float64
2 (Adj Close, MSFT) 712 non-null float64
...
28 (Volume, AMZN) 712 non-null float64
29 (Volume, FB) 253 non-null float64
dtypes: float64(30)
memory usage: 172.4 KB
df = df.stack()
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Load prices for top 5 companies
df.info()
MultiIndex: 3101 entries, (Timestamp('2020-01-02 00:00:00'), 'AAPL') to (Timestamp('
Data columns (total 6 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Adj Close 3101 non-null float64
...
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Reshape your data: .unstack()
unstacked = df['Close'].unstack()
unstacked.info()
DatetimeIndex: 712 entries, 2020-01-02 to 2022-10-27
Data columns (total 5 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 AAPL 712 non-null float64
1 GOOG 712 non-null float64
2 MSFT 712 non-null float64
3 AMZN 712 non-null float64
4 FB 253 non-null float64
dtypes: float64(5)
memory usage: 33.4 KB
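The stack/unstack round trip is easier to follow on a tiny frame. A sketch with a hand-built two-level column index (the tickers AAA and BBB and all numbers are assumptions):

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-02", "2020-01-03"])
# Two-level column index (price field, ticker), mimicking a multi-ticker download
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA", "BBB"]])
wide = pd.DataFrame(
    [[10.0, 20.0, 100.0, 200.0],
     [11.0, 21.0, 110.0, 210.0]],
    index=dates,
    columns=columns,
)

# stack() moves the innermost column level (the ticker) into the row index
long_df = wide.stack()
print(long_df.index.nlevels)  # 2 levels: (date, ticker)

# Selecting one column and unstacking turns the tickers back into columns
close = long_df["Close"].unstack()
print(close)
```

Long format has one row per (date, ticker) pair; wide format has one row per date and one column per ticker.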
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
From long to wide format
unstacked = df['Close'].unstack() # Results in DataFrame
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Stock prices: Visualization
unstacked.plot(subplots=True)
plt.tight_layout(); plt.show()
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
Summarize your data with descriptive stats
Stefan Jansen
Instructor
Be on top of your data
Goal: Capture key quantitative characteristics
Important angles to look at:
Central tendency: Which values are "typical"?
Dispersion: Are there outliers?
Overall distribution of individual variables
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Central tendency
Mean (average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median: 50% of values smaller/larger
Mode: most frequent value
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Calculate summary statistics
nasdaq = pd.read_excel('listings.xlsx', sheet_name='nasdaq', na_values='n/a')
market_cap = nasdaq['Market Capitalization'].div(10**6)
market_cap.mean()
3180.7126214953805
market_cap.median()
225.9684285
market_cap.mode()
0.0
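A small sketch of how the three measures react to skewed data, using invented market caps in million USD. Note that Series.mode() returns a Series rather than a single number, since several values can tie:

```python
import pandas as pd

# Hypothetical market caps in million USD: mostly small firms plus one giant
caps = pd.Series([0.0, 0.0, 50.0, 120.0, 300.0, 10000.0])

print(caps.mean())    # 1745.0, pulled upward by the single large value
print(caps.median())  # 85.0, the middle of the sorted values
# mode() returns a Series because several values can tie for most frequent
print(caps.mode().tolist())  # [0.0]
```

As with the NASDAQ figures, the mean sits far above the median when a few very large companies dominate.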
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Dispersion
Variance: Sum all of the squared differences from mean and divide by n − 1
$\mathrm{var} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Standard deviation: Square root of variance
$\mathrm{sd} = \sqrt{\mathrm{var}}$
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Calculate variance and standard deviation
variance = market_cap.var()
print(variance)
648773812.8182
np.sqrt(variance)
25471.0387
market_cap.std()
25471.0387
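To connect the formula with the pandas calls, the sketch below computes the sample variance by hand and compares it with .var() and .std(). The input values are arbitrary:

```python
import numpy as np
import pandas as pd

values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Manual sample variance: squared deviations from the mean, divided by n - 1
deviations = values - values.mean()
manual_var = (deviations ** 2).sum() / (len(values) - 1)

print(manual_var, values.var())           # pandas divides by n - 1 by default (ddof=1)
print(np.sqrt(manual_var), values.std())  # standard deviation is the square root
# NumPy divides by n by default (ddof=0), so pass ddof=1 to match pandas
numpy_var = np.var(values.to_numpy(), ddof=1)
print(numpy_var)
```

The ddof difference between pandas and NumPy is a common source of small mismatches when results are compared across libraries.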
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
Describe the distribution of your data with quantiles
Stefan Jansen
Instructor
Describe data distributions
First glance: Central tendency and standard deviation
How to get a more granular view of the distribution?
Calculate and plot quantiles
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
More on dispersion: quantiles
Quantiles: Groups with equal share of observations
Quartiles: 4 groups, 25% of data each
Deciles: 10 groups, 10% of data each
Interquartile range: 3rd quartile - 1st quartile
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Quantiles with pandas
market_cap = nasdaq['Market Capitalization'].div(10**6)
median = market_cap.quantile(.5)
median == market_cap.median()
True
quantiles = market_cap.quantile([.25, .75])
0.25 43.375930
0.75 969.905207
quantiles[.75] - quantiles[.25] # Interquartile Range
926.5292771575
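The same calls on five evenly spaced numbers make the results easy to check by hand. A sketch, relying on the linear interpolation that pandas uses by default:

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Passing a list returns a Series indexed by the requested quantiles
quartiles = values.quantile([0.25, 0.75])
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]

print(quartiles.loc[0.25], quartiles.loc[0.75])  # 2.0 4.0
print(iqr)                                       # 2.0
print(values.quantile(0.5) == values.median())   # True
```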
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Quantiles with pandas & numpy
deciles = np.arange(start=.1, stop=.91, step=.1)
deciles
array([ 0.1, 0.2, 0.3, 0.4, ..., 0.7, 0.8, 0.9])
market_cap.quantile(deciles)
0.1 4.884565
0.2 26.993382
0.3 65.714547
0.4 124.320644
0.5 225.968428
0.6 402.469678
...
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Visualize quantiles with bar chart
title = 'NASDAQ Market Capitalization (million USD)'
market_cap.quantile(deciles).plot(kind='bar', title=title)
plt.tight_layout(); plt.show()
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
All statistics in one go
market_cap.describe()
count 3167.000000
mean 3180.712621
std 25471.038707
min 0.000000
25% 43.375930 # 1st quartile
50% 225.968428 # Median
75% 969.905207 # 3rd quartile
max 740024.467000
Name: Market Capitalization
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
All statistics in one go
market_cap.describe(percentiles=np.arange(.1, .91, .1))
count 3167.000000
mean 3180.712621
std 25471.038707
min 0.000000
10% 4.884565
20% 26.993382
30% 65.714547
40% 124.320644
50% 225.968428
60% 402.469678
70% 723.163197
80% 1441.071134
...
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
Visualize the distribution of your data
Stefan Jansen
Instructor
Always look at your data!
Identical metrics can represent very different data
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Introducing seaborn plots
Many attractive and insightful statistical plots
Based on matplotlib
Swiss Army knife: seaborn.distplot()
Histogram
Kernel Density Estimation (KDE)
Rugplot
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
10 year treasury: trend and distribution
ty10 = DataReader('DGS10', 'fred', date(1962, 1, 1))
ty10.info()
DatetimeIndex: 15754 entries, 1962-01-02 to 2022-05-20
Data columns (total 1 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 DGS10 15083 non-null float64
ty10.describe()
DGS10
mean 6.291073
std 2.851161
min 1.370000
25% 4.190000
50% 6.040000
...
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
10 year treasury: time series trend
ty10.dropna(inplace=True) # Avoid creation of copy
ty10.plot(title='10-year Treasury'); plt.tight_layout()
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
10 year treasury: historical distribution
import seaborn as sns
sns.distplot(ty10)
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
10 year treasury: trend and distribution
ax = sns.distplot(ty10)
ax.axvline(ty10['DGS10'].median(), color='black', ls='--')
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
Summarize categorical variables
Stefan Jansen
Instructor
From categorical to quantitative variables
So far, we have analyzed quantitative variables
Categorical variables require a different approach
Concepts like average don't make much sense
Instead, we'll rely on their frequency distribution
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Categorical listing information
amex = pd.read_excel('listings.xlsx', sheet_name='amex',
na_values=['n/a'])
amex.info()
RangeIndex: 360 entries, 0 to 359
Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
5 Sector 238 non-null object
6 Industry 238 non-null object
dtypes: float64(3), object(4)
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Categorical listing information
amex['Sector'].nunique()
12
apply(): call function on each column
lambda: "anonymous function", receives each column as argument x
amex.apply(lambda x: x.nunique())
Stock Symbol 360
Company Name 326
Last Sale 323
Market Capitalization 317
...
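A compact sketch of both calls on a hypothetical four-row listing table (column values are invented). Missing values are not counted as a distinct value:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Stock Symbol": ["AAA", "BBB", "CCC", "DDD"],
        "Sector": ["Energy", "Energy", "Finance", None],
        "IPO Year": [2002.0, None, 2002.0, 2015.0],
    }
)

# Series.nunique() counts distinct non-missing values in one column
sector_count = df["Sector"].nunique()
print(sector_count)  # 2

# DataFrame.apply passes each column to the function in turn
per_column = df.apply(lambda x: x.nunique())
print(per_column)
```

DataFrame.nunique() gives the same per-column counts without the lambda; the apply form is shown because it generalizes to any column-wise function.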
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
How many observations per sector?
amex['Sector'].value_counts()
Health Care 49 # Mode
Basic Industries 44
Energy 28
Consumer Services 27
Capital Goods 24
Technology 20
Consumer Non-Durables 13
Finance 12
Public Utilities 11
Miscellaneous 5
...
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
How many IPOs per year?
amex['IPO Year'].value_counts()
2002.0 19 # Mode
2015.0 11
1999.0 9
1993.0 7
2014.0 6
2013.0 5
2017.0 5
...
2009.0 1
1990.0 1
1991.0 1
Name: IPO Year, dtype: int64
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Convert IPO Year to int
ipo_by_yr = amex['IPO Year'].dropna().astype(int).value_counts()
ipo_by_yr
2002 19
2015 11
1999 9
1993 7
2014 6
2004 5
2003 5
2017 5
...
1987 1
Name: IPO Year, dtype: int64
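The order of operations matters here: a float column containing NaN cannot be cast to int, so the missing values are dropped first. A sketch with invented years:

```python
import pandas as pd

ipo_year = pd.Series(
    [2002.0, 2015.0, None, 2002.0, 1999.0, None, 2002.0], name="IPO Year"
)

# Drop the gaps first, then cast, then count occurrences of each year
counts = ipo_year.dropna().astype(int).value_counts()
print(counts)

# value_counts sorts by frequency, so the first label is the mode
print(counts.index[0])  # 2002
```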
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Convert IPO Year to int
ipo_by_yr.plot(kind='bar', title='IPOs per Year')
plt.xticks(rotation=45)
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
Aggregate your data by category
Stefan Jansen
Instructor
Summarize numeric data by category
So far: Summarize individual variables
Compute descriptive statistics like mean, quantiles
Split data into groups, then summarize groups
Examples:
Largest company by exchange
Median market capitalization per IPO year
Average market capitalization per sector
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Group your data by sector
nasdaq.info()
RangeIndex: 3167 entries, 0 to 3166
Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- --- -------------- -----
0 Stock Symbol 3167 non-null object
1 Company Name 3167 non-null object
2 Last Sale 3165 non-null float64
3 Market Capitalization 3167 non-null float64
4 IPO Year 1386 non-null float64
5 Sector 2767 non-null object
6 Industry 2767 non-null object
dtypes: float64(3), object(4)
memory usage: 173.3+ KB
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Group your data by sector
nasdaq['market_cap_m'] = nasdaq['Market Capitalization'].div(1e6)
nasdaq = nasdaq.drop('Market Capitalization', axis=1) # Drop column
nasdaq_by_sector = nasdaq.groupby('Sector') # Create groupby object
for sector, data in nasdaq_by_sector:
print(sector, data.market_cap_m.mean())
Basic Industries 724.899933858
Capital Goods 1511.23737278
Consumer Durables 839.802606627
Consumer Non-Durables 3104.05120552
...
Public Utilities 2357.86531507
Technology 10883.4342135
Transportation 2869.66000673
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Keep it simple and skip the loop
mcap_by_sector = nasdaq_by_sector.market_cap_m.mean()
mcap_by_sector
Sector
Basic Industries 724.899934
Capital Goods 1511.237373
Consumer Durables 839.802607
Consumer Non-Durables 3104.051206
Consumer Services 5582.344175
Energy 826.607608
Finance 1044.090205
Health Care 1758.709197
...
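The loop and the direct call give the same numbers. A sketch on a hypothetical five-row table confirms this:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sector": ["Energy", "Finance", "Energy", "Finance", "Technology"],
        "market_cap_m": [100.0, 300.0, 200.0, 500.0, 900.0],
    }
)
by_sector = df.groupby("Sector")

# Iterating over a groupby object yields (group label, sub-DataFrame) pairs
loop_means = {sector: group["market_cap_m"].mean() for sector, group in by_sector}

# The same result without the loop
direct_means = by_sector["market_cap_m"].mean()

print(loop_means)
print(direct_means)
```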
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Visualize category summaries
title = 'NASDAQ = Avg. Market Cap by Sector'
mcap_by_sector.plot(kind='barh', title=title)
plt.xlabel('USD mn')
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Aggregate summary for all numeric columns
nasdaq_by_sector.mean()
Last Sale IPO Year market_cap_m
Sector
Basic Industries 21.597679 2000.766667 724.899934
Capital Goods 26.188681 2001.324675 1511.237373
Consumer Durables 24.363391 2003.222222 839.802607
Consumer Non-Durables 25.749565 2000.609756 3104.051206
Consumer Services 34.917318 2004.104575 5582.344175
Energy 15.496834 2008.034483 826.607608
Finance 29.644242 2010.321101 1044.090205
Health Care 19.462531 2009.240409 1758.709197
Miscellaneous 46.094369 2004.333333 3445.655935
Public Utilities 18.643705 2006.040000 2357.865315
...
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
More ways to aggregate your data
Stefan Jansen
Instructor
Many ways to aggregate
Last segment: Group by one variable and aggregate
More detailed ways to summarize your data:
Group by two or more variables
Apply multiple aggregations
Examples
Median market cap by sector and IPO year
Mean & standard deviation of stock price by year
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Several aggregations by category
nasdaq['market_cap_m'] = nasdaq['Market Capitalization'].div(1e6)
by_sector = nasdaq.groupby('Sector')
by_sector.market_cap_m.agg(['size', 'mean']).sort_values('size')
Sector size mean
Transportation 52 2869.660007
Energy 66 826.607608
Public Utilities 66 2357.865315
Basic Industries 78 724.899934
...
Consumer Services 348 5582.344175
Technology 433 10883.434214
Finance 627 1044.090205
Health Care 645 1758.709197
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Several aggregations plus new labels
(by_sector.market_cap_m.agg(['size', 'mean'])
    .rename(columns={'size': '#Obs', 'mean': 'Average'}))
Sector #Obs Average
Basic Industries 78 724.899934
Capital Goods 172 1511.237373
Consumer Durables 88 839.802607
Consumer Non-Durables 103 3104.051206
Consumer Services 348 5582.344175
...
Health Care 645 1758.709197
Miscellaneous 89 3445.655935
Public Utilities 66 2357.865315
Technology 433 10883.434214
Transportation 52 2869.660007
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Different statistics by column
by_sector.agg({'market_cap_m': 'size', 'IPO Year':'median'})
Sector market_cap_m IPO Year
Basic Industries 78 1972.0
Capital Goods 172 1972.0
Consumer Durables 88 1983.0
Consumer Non-Durables 103 1972.0
Consumer Services 348 1981.0
...
Health Care 645 1981.0
Miscellaneous 89 1987.0
Public Utilities 66 1981.0
Technology 433 1972.0
Transportation 52 1986.0
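The three aggregation styles (a list of functions, renamed labels, and a per-column dictionary) can be compared on one small frame. A sketch with invented data; the multi-line chain is wrapped in parentheses so Python reads it as one expression:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sector": ["Energy", "Energy", "Finance", "Finance", "Technology"],
        "market_cap_m": [100.0, 200.0, 300.0, 500.0, 900.0],
        "IPO Year": [2000.0, 2010.0, None, 2012.0, 2015.0],
    }
)
by_sector = df.groupby("Sector")

# Several statistics for one column, then friendlier column labels
summary = (
    by_sector["market_cap_m"]
    .agg(["size", "mean"])
    .rename(columns={"size": "#Obs", "mean": "Average"})
)

# A dict maps each column to its own statistic
mixed = by_sector.agg({"market_cap_m": "size", "IPO Year": "median"})

print(summary)
print(mixed)
```

'size' counts rows per group including missing values, while 'median' skips them, so the Finance group reports 2 observations but a median IPO year based on one value.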
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Aggregate by two categories
by_sector_year = nasdaq.groupby(['Sector', 'IPO Year'])
by_sector_year.market_cap_m.mean()
Sector IPO Year
Basic Industries 1972.0 877.240005
1973.0 1445.697371
1986.0 1396.817381
...
Transportation 1986.0 1176.179710
1991.0 6646.778622
1992.0 56.074572
...
2009.0 552.445919
2011.0 3711.638317
2013.0 125.740421
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Select from MultiIndex
mcap_sect_year = by_sector_year.market_cap_m.mean()
mcap_sect_year.loc['Basic Industries']
IPO Year
1972.0 877.240005
1973.0 1445.697371
1986.0 1396.817381
1988.0 24.847526
...
2012.0 381.796074
2013.0 22.661533
2015.0 260.075564
2016.0 81.288336
Name: market_cap_m, dtype: float64
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Select from MultiIndex
mcap_sect_year.loc[['Basic Industries', 'Transportation']]
Sector IPO Year
Basic Industries 1972.0 877.240005
1973.0 1445.697371
1986.0 1396.817381
...
Transportation 1986.0 1176.179710
1991.0 6646.778622
1992.0 56.074572
...
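A sketch of the selection options on a small two-level result (sectors, years and values are assumptions). A single label drops the outer level, a list of labels keeps it, and a tuple addresses one cell:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sector": ["Energy", "Energy", "Finance", "Finance", "Technology"],
        "IPO Year": [2000, 2010, 2000, 2000, 2015],
        "market_cap_m": [100.0, 200.0, 300.0, 500.0, 900.0],
    }
)
means = df.groupby(["Sector", "IPO Year"])["market_cap_m"].mean()

energy = means.loc["Energy"]                    # Series indexed by IPO Year only
subset = means.loc[["Energy", "Technology"]]    # both index levels are kept
single = means.loc[("Finance", 2000)]           # one value: mean of 300 and 500

print(energy)
print(subset)
print(single)  # 400.0
```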
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
Summary statistics by category with seaborn
Stefan Jansen
Instructor
Categorical plots with seaborn
Specialized ways to plot combinations of categorical and numerical variables
Visualize estimates of summary statistics per category
Understand how categories impact numerical variables
Compare using key metrics of distributional characteristics
Example: Mean Market Cap per Sector or IPO Year with indication of dispersion
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
The basics: countplot
sns.countplot(x='Sector', data=nasdaq)
plt.xticks(rotation=45)
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
countplot, sorted
sector_size = nasdaq.groupby('Sector').size()
order = sector_size.sort_values(ascending=False)
order.head()
Sector
Health Care 645
Finance 627
Technology 433
...
order = order.index.tolist()
['Health Care', 'Finance', ..., 'Energy', 'Transportation']
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
countplot, sorted
sns.countplot(x='Sector', data=nasdaq, order=order)
plt.xticks(rotation=45)
plt.title('# Observations per Sector')
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
countplot, multiple categories
recent_ipos = nasdaq[nasdaq['IPO Year'] > 2014].copy() # Copy avoids SettingWithCopyWarning
recent_ipos['IPO Year'] = recent_ipos['IPO Year'].astype(int)
sns.countplot(x='Sector', hue='IPO Year', data=recent_ipos)
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Compare stats with PointPlot
nasdaq['IPO'] = nasdaq['IPO Year'].apply(lambda x: 'After 2000' if x > 2000 else 'Before 2000')
sns.pointplot(x='Sector', y='market_cap_m', hue='IPO', data=nasdaq)
plt.xticks(rotation=45); plt.title('Mean Market Cap')
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
Distributions by category with seaborn
Stefan Jansen
Instructor
Distributions by category
Last segment: Summary statistics
Number of observations, mean per category
Now: Visualize distribution of a variable by levels of a categorical variable to facilitate comparison
Example: Distribution of Market Cap by Sector or IPO Year
More detail than summary stats
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Clean data: removing outliers
nasdaq = pd.read_excel('listings.xlsx', sheet_name='nasdaq',
na_values='n/a')
nasdaq['market_cap_m'] = nasdaq['Market Capitalization'].div(1e6)
nasdaq = nasdaq[nasdaq.market_cap_m > 0] # Active companies only
outliers = nasdaq.market_cap_m.quantile(.9) # Outlier threshold
nasdaq = nasdaq[nasdaq.market_cap_m < outliers] # Remove outliers
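The two filters can be traced on a handful of numbers. A sketch with invented market caps; the 90th percentile cutoff is the slide's choice of threshold, not a general rule for what counts as an outlier:

```python
import pandas as pd

caps = pd.DataFrame(
    {"market_cap_m": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]}
)

# Keep active companies only (market cap above zero)
active = caps[caps.market_cap_m > 0]

# Use the 90th percentile as the outlier threshold, then keep values below it
threshold = active.market_cap_m.quantile(0.9)
trimmed = active[active.market_cap_m < threshold]

print(threshold)
print(len(active), len(trimmed))  # 10 9
```

The threshold is computed after the zero-cap rows are removed, so the order of the two filters affects which rows survive.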
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Boxplot: quartiles and outliers
import seaborn as sns
sns.boxplot(x='Sector', y='market_cap_m', data=nasdaq)
plt.xticks(rotation=75);
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
A variation: SwarmPlot
sns.swarmplot(x='Sector', y='market_cap_m', data=nasdaq)
plt.xticks(rotation=75)
plt.show()
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Let's practice!
Congratulations!
Stefan Jansen
Instructor
What you learned
Import data from Excel and online sources
Combine datasets
Summarize and aggregate data
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON
Keep learning!
Welcome to Portfolio Analysis!
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Charlotte Werger
Data Scientist
Hi! My name is Charlotte
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
What is a portfolio
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Why do we need portfolio analysis
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Portfolio versus fund versus index
Portfolio: a collection of investments (stocks, bonds, commodities, other funds) often owned by an individual
Fund: a pool of investments that is managed by a professional fund manager. Individual
investors buy "units" of the fund and the manager invests the money
Index: A smaller sample of the market that is representative of the whole, e.g. S&P500,
Nasdaq, Russell 2000, MSCI World Index
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Active versus passive investing
Passive investing: following a benchmark as closely as possible
Active investing: taking active "bets" that are different from a benchmark
Long only strategies: small deviations from a benchmark
Hedge funds: no benchmark but 'total return strategies'
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Diversification
1. Single stock investments expose you to: a sudden change in management, disappointing financial performance, weak economy, an industry slump, etc.
2. Good diversification means combining stocks that are different: risk, cyclical, counter-cyclical, industry, country
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Typical portfolio strategies
Equal weighted portfolios
Market-cap weighted portfolios
Risk-return optimized portfolios
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Portfolio returns
Charlotte Werger
Data Scientist
What are portfolio weights?
Weight is the percentage composition of a particular asset in a portfolio
All weights together have to sum up to 100%
Weights and diversification (few large investments versus many small investments)
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Calculating portfolio weights
Calculate by dividing the value of a security by total value of the portfolio
Equal weighted portfolio, or market cap weighted portfolio
Weights determine your investment strategy, and can be set to optimize risk and expected
return
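A minimal sketch of the weight calculation, using hypothetical position values in USD. Market-cap weighting follows the same division, with company market caps in place of position values:

```python
# Hypothetical position values in USD
positions = {"AAA": 5000.0, "BBB": 3000.0, "CCC": 2000.0}

# Weight = value of the security / total value of the portfolio
total_value = sum(positions.values())
weights = {ticker: value / total_value for ticker, value in positions.items()}

# Equal weighting ignores position size: each asset gets 1 / number of assets
equal_weights = {ticker: 1 / len(positions) for ticker in positions}

print(weights)                # {'AAA': 0.5, 'BBB': 0.3, 'CCC': 0.2}
print(sum(weights.values()))  # weights add up to 1 (100%)
```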
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Portfolio returns
Changes in value over time
$\text{Return}_t = \frac{V_t - V_{t-1}}{V_{t-1}}$
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Portfolio returns
$\text{Return}_t = \frac{V_t - V_{t-1}}{V_{t-1}}$
Historic average returns often used to calculate expected return
Warning for confusion: average return, cumulative return, active return, and annualized return
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Calculating returns from pricing data
df.head(2)
AAPL AMZN TSLA
date
2018-03-25 13.88 114.74 92.48
2018-03-26 13.35 109.95 89.79
# Calculate returns over each day
returns = df.pct_change()
returns.head(2)
AAPL AMZN TSLA
date
2018-03-25 NaN NaN NaN
2018-03-26 -0.013772 0.030838 0.075705
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Calculating returns from pricing data
weights = np.array([0.25, 0.50, 0.25])
# Calculate average return for each stock
meanDailyReturns = returns.mean()
# Calculate portfolio return
portReturn = np.sum(meanDailyReturns*weights)
print (portReturn)
0.05752375881537723
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Calculating cumulative returns
# Calculate daily portfolio returns
returns['Portfolio']= returns.dot(weights)
# Let's see what it looks like
returns.head(3)
AAPL AMZN TSLA Portfolio
date
2018-03-23 -0.020974 -0.026739 -0.029068 -0.025880
2018-03-26 -0.013772 0.030838 0.075705 0.030902
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Calculating cumulative returns
# Compound the percentage returns over time
daily_cum_ret=(1+returns).cumprod()
# Plot your cumulative return
daily_cum_ret.Portfolio.plot()
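The whole chain from prices to cumulative portfolio return fits in a few lines. A sketch with made-up prices for two hypothetical tickers and assumed 50/50 weights; applying fixed weights to every daily return implicitly assumes the portfolio is rebalanced each day:

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0], "BBB": [50.0, 50.0, 55.0]},
    index=pd.to_datetime(["2018-03-22", "2018-03-23", "2018-03-26"]),
)
weights = np.array([0.5, 0.5])  # assumed weights, summing to 1

# Daily returns; the first row has no previous price and is dropped
returns = prices.pct_change().dropna()

# Weighted sum of asset returns gives the portfolio return for each day
portfolio = returns.dot(weights)

# Compounding: growth of 1 unit invested at the start
cumulative = (1 + portfolio).cumprod()

print(portfolio)
print(cumulative)
```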
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Cumulative return plot
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Measuring risk of a portfolio
Charlotte Werger
Data Scientist
Risk of a portfolio
Investing is risky: individual assets will go up or down
Expected return is a random variable
Returns spread around the mean is measured by the variance $\sigma^2$ and is a common measure of volatility
$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Variance
Variance of an individual asset varies: some
have more or less spread around the mean
Variance of the portfolio is not simply the
weighted variances of the underlying assets
Because returns of assets are correlated, it
becomes complex
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
How do variance and correlation relate to portfolio
risk?
The correlation between asset 1 and 2 is denoted by $\rho_{1,2}$, and tells us to what extent assets move together
The portfolio variance takes into account the individual assets' variances ($\sigma_1^2$, $\sigma_2^2$, etc.), the weights of the assets in the portfolio ($w_1$, $w_2$), as well as their correlation to each other
The standard deviation ($\sigma$) is equal to the square root of variance ($\sigma^2$); both are a measure of volatility
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Calculating portfolio variance
$\rho_{1,2}\sigma_1\sigma_2$ is called the covariance between asset 1 and 2
The covariance can also be written as $\sigma_{1,2}$
This lets us write:
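The formula image for this slide did not survive extraction. For reference, the standard two-asset portfolio variance, written with the symbols defined above (a reconstruction from those definitions, not a copy of the slide), is:

```latex
\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2\,w_1 w_2\,\rho_{1,2}\,\sigma_1\sigma_2
           = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2\,w_1 w_2\,\sigma_{1,2}
```

The second line only substitutes the covariance $\sigma_{1,2}$ for $\rho_{1,2}\sigma_1\sigma_2$.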
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Re-writing the portfolio variance shorter
This can be rewritten in matrix notation, which you can use more easily in code:
In words, what we need to calculate in Python is: Portfolio variance = Weights transposed x (Covariance matrix x Weights)
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Portfolio variance in python
price_data.head(2)
ticker AAPL FB GE GM WMT
date
2018-03-21 171.270 169.39 13.88 37.58 88.18
2018-03-22 168.845 164.89 13.35 36.35 87.14
# Calculate daily returns from prices
daily_returns = price_data.pct_change()
# Construct a covariance matrix for the daily returns data
cov_matrix_d = daily_returns.cov()
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Portfolio variance in python
# Annualize the daily covariance matrix (250 trading days per year)
cov_matrix_a = (daily_returns.cov())*250
print (cov_matrix_a)
AAPL FB GE GM WMT
AAPL 0.053569 0.026822 0.013466 0.018119 0.010798
FB 0.026822 0.062351 0.015298 0.017250 0.008765
GE 0.013466 0.015298 0.045987 0.021315 0.009513
GM 0.018119 0.017250 0.021315 0.058651 0.011894
WMT 0.010798 0.008765 0.009513 0.011894 0.041520
weights = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Portfolio variance in python
# Calculate the variance with the formula
port_variance = np.dot(weights.T, np.dot(cov_matrix_a, weights))
print (port_variance)
0.022742232726360567
# Just converting the variance float into a percentage
print(str(np.round(port_variance, 3) * 100) + '%')
2.3%
port_stddev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix_a, weights)))
print(str(np.round(port_stddev, 3) * 100) + '%')
15.1%
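As a sanity check on the matrix form, the sketch below builds a two-asset covariance matrix from assumed volatilities and an assumed correlation, then compares the matrix product with the expanded two-asset formula:

```python
import numpy as np

# Assumed annualized volatilities, correlation and weights for two assets
sigma_1, sigma_2, rho = 0.20, 0.10, 0.3
w = np.array([0.6, 0.4])

# Covariance = correlation * sigma_1 * sigma_2
cov_12 = rho * sigma_1 * sigma_2
cov_matrix = np.array([[sigma_1**2, cov_12],
                       [cov_12, sigma_2**2]])

# Matrix form: weights transposed x (covariance matrix x weights)
matrix_var = w.T @ cov_matrix @ w

# Expanded two-asset form
expanded_var = (w[0]**2 * sigma_1**2 + w[1]**2 * sigma_2**2
                + 2 * w[0] * w[1] * cov_12)

port_vol = np.sqrt(matrix_var)
print(matrix_var, expanded_var)
print(port_vol)
```

With a correlation below 1, the portfolio volatility comes out lower than the weighted average of the two volatilities (0.16 here), which is the diversification effect.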
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Annualized returns
Charlotte Werger
Data Scientist
Comparing returns
1. Annual Return: Total return earned over a period of one calendar year
2. Annualized return: Yearly rate of return inferred from any time period
3. Average Return: Total return realized over a longer period, spread out evenly over the (shorter) periods.
4. Cumulative (compounding) return: A return that includes the compounded results of reinvesting interest, dividends, and capital gains.
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Why annualize returns?
Average return = (100 - 50) / 2 = 25%
Actual return = 0% so average return is not
a good measure for performance!
How to compare portfolios with different time lengths?
How to account for compounding effects over time?
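The 25% versus 0% example can be reproduced directly: a gain of 100% followed by a loss of 50% averages to 25% but leaves the investor where they started.

```python
# Two periods: +100% followed by -50%
period_returns = [1.00, -0.50]

# Arithmetic average of the period returns
average_return = sum(period_returns) / len(period_returns)

# Compounded (actual) return: multiply the growth factors
growth = 1.0
for r in period_returns:
    growth *= 1 + r
actual_return = growth - 1

print(average_return)  # 0.25
print(actual_return)   # 0.0
```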
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Calculating annualized returns
Convert any time length to an annual rate:
N in years: $\text{rate} = (1 + \text{Return})^{1/N} - 1$
N in months: $\text{rate} = (1 + \text{Return})^{12/N} - 1$
Return is the total return you want to annualize.
N is number of periods so far.
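Both formulas are the same expression with a different number of periods per year. A sketch with a hypothetical helper, annualize, whose name and signature are illustrative only:

```python
def annualize(total_return, periods, periods_per_year=1):
    """Convert a total return over `periods` into a yearly rate."""
    return (1 + total_return) ** (periods_per_year / periods) - 1

# 33.1% over 3 years is 10% per year, since 1.1 ** 3 = 1.331
three_year = annualize(0.331, 3)

# 21% over 24 months is also 10% per year, since 1.1 ** 2 = 1.21
two_years_in_months = annualize(0.21, 24, periods_per_year=12)

print(round(three_year, 6), round(two_years_in_months, 6))
```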
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Annualized returns in python
# Check the start and end of timeseries
apple_price.head(1)
date
2015-01-06 105.05
Name: AAPL, dtype: float64
apple_price.tail(1)
date
2018-03-29 99.75
Name: AAPL, dtype: float64
# Assign the number of months
months = 38
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Annualized returns in python
# Calculate the total return
total_return = (apple_price.iloc[-1] - apple_price.iloc[0]) / apple_price.iloc[0]
print (total_return)
0.5397420653068692
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Annualized returns in python
# Calculate the annualized returns over months
annualized_return=((1 + total_return)**(12/months))-1
print (annualized_return)
0.14602501482708763
# Select three year period
apple_price = apple_price.loc['2015-01-01':'2017-12-31']
apple_price.tail(3)
date
2017-12-27 170.60
2017-12-28 171.08
2017-12-29 169.23
Name: AAPL, dtype: float64
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Annualized return in Python
# Calculate annualized return over 3 years
annualized_return = ((1 + total_return)**(1/3))-1
print (annualized_return)
0.1567672968419047
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Risk adjusted returns
Charlotte Werger
Data Scientist
Choose a portfolio
Portfolio 1 Portfolio 2
Annual return of 14% Annual return of 6%
Volatility (standard deviation) is 8% Volatility is 3%
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Risk adjusted return
It defines an investment's return by measuring how much risk is involved in producing that return
It's usually a ratio
Allows you to objectively compare across different investment options
Tells you whether the return justifies the underlying risk
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Sharpe ratio
Sharpe ratio is the most commonly used risk adjusted return ratio
It's calculated as follows:
$\text{Sharpe ratio} = \frac{R_p - R_f}{\sigma_p}$
Where: $R_p$ is the portfolio return, $R_f$ is the risk-free rate and $\sigma_p$ is the portfolio standard deviation
Remember the formula for the portfolio $\sigma_p$?
$\sigma_p = \sqrt{\text{Weights}^{T} \cdot (\text{Covariance matrix} \cdot \text{Weights})}$
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Annualizing volatility
Annualized standard deviation is calculated as follows: $\sigma_a = \sigma_m \cdot \sqrt{T}$
$\sigma_m$ is the measured standard deviation
$\sigma_a$ is the annualized standard deviation
$T$ is the number of data points per year
Alternatively, when using variance instead of standard deviation: $\sigma_a^2 = \sigma_m^2 \cdot T$
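The portfolio standard deviation formula and the annualization rule above can be chained together before computing a Sharpe ratio. The sketch below assumes a hypothetical two-asset portfolio; the weights, the daily covariance matrix, the 8% annual return and the 1% risk-free rate are all made-up inputs for illustration.

```python
import numpy as np

# Hypothetical inputs: two assets, daily covariance matrix, portfolio weights
weights = np.array([0.6, 0.4])
daily_cov = np.array([[0.0001, 0.00002],
                      [0.00002, 0.00004]])
annual_return = 0.08     # assumed annualized portfolio return
risk_free = 0.01         # assumed annual risk-free rate
periods_per_year = 250   # trading days per year, as used in these slides

# sigma_p = sqrt(w^T . Cov . w), measured on daily data
daily_variance = weights @ daily_cov @ weights
daily_vol = np.sqrt(daily_variance)

# sigma_a = sigma_m * sqrt(T); equivalently, annualize the variance first
annual_vol = daily_vol * np.sqrt(periods_per_year)
annual_vol_from_var = np.sqrt(daily_variance * periods_per_year)

sharpe_ratio = (annual_return - risk_free) / annual_vol
print(annual_vol, annual_vol_from_var, sharpe_ratio)
```

Both routes to the annualized volatility give the same number, which is why either the standard deviation or the variance form can be used.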
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Calculating the Sharpe Ratio
import numpy as np
# Calculate the annualized standard deviation
annualized_vol = apple_returns.std()*np.sqrt(250)
print(annualized_vol)
0.2286248397870068
# Define the risk free rate
risk_free = 0.01
# Calculate the Sharpe ratio
sharpe_ratio = (annualized_return - risk_free) / annualized_vol
print(sharpe_ratio)
0.6419569149994251
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Which portfolio did you choose?
                                   Portfolio 1    Portfolio 2
Annual return                      14%            6%
Volatility (standard deviation)    8%             3%
Sharpe ratio                       1.75           2
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Non-normal distribution of returns
Charlotte Werger
Data Scientist
In a perfect world returns are distributed normally
1 Source: Distribution of monthly returns from the S&P500 from evestment.com
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
But using mean and standard deviations can be deceiving
1 Source: “An Introduction to Omega”, Con Keating and William Shadwick, The Finance Development Center, 2002
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Skewness: leaning towards the negative
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Pearson’s Coefficient of Skewness
$\text{Skewness} = \frac{3(\text{mean} - \text{median})}{\sigma}$
Rule of thumb:
Skewness < −1 or Skewness > 1 ⇒ Highly skewed distribution
−1 < Skewness < −0.5 or 0.5 < Skewness < 1 ⇒ Moderately skewed distribution
−0.5 < Skewness < 0.5 ⇒ Approximately symmetric distribution
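A short sketch of Pearson's coefficient and the rule of thumb above, using only the standard library. The helper names and the sample returns are hypothetical. Note that pandas' `.skew()` uses a moment-based estimator rather than this mean-versus-median formula, so the two numbers need not match.

```python
import statistics


def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / std."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return 3.0 * (mean - median) / stdev


def describe_skew(skewness):
    """Apply the rule of thumb to a skewness value."""
    if abs(skewness) > 1.0:
        return "highly skewed"
    if abs(skewness) > 0.5:
        return "moderately skewed"
    return "approximately symmetric"


# Hypothetical monthly returns with one large loss
returns = [0.02, 0.01, 0.03, 0.02, -0.15, 0.01, 0.02]
skew = pearson_skewness(returns)
print(skew, describe_skew(skew))
```

The single large loss pulls the mean below the median, so the coefficient comes out negative, which is the "leaning towards the negative" case.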
1 Source: https://brownmath.com/stat/shape.htm
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Kurtosis: Fat tailed distribution
1 Source: Pimco
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Interpreting kurtosis
“Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as
opposed to frequent modestly sized deviations.”
A normal distribution has kurtosis of exactly 3 and is called mesokurtic.
A distribution with kurtosis <3 is called platykurtic: tails are shorter and thinner, and the central
peak is lower and broader.
A distribution with kurtosis >3 is called leptokurtic: tails are longer and fatter, and the central
peak is higher and sharper (fat tailed).
1 Source: https://brownmath.com/stat/shape.htm
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Calculating skewness and kurtosis
apple_returns=apple_price.pct_change()
apple_returns.head(3)
date
2015-01-02 NaN
2015-01-05 -0.028172
2015-01-06 0.000094
Name: AAPL, dtype: float64
apple_returns.hist()
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Calculating skewness and kurtosis
print("mean : ", apple_returns.mean())
print("vol : ", apple_returns.std())
print("skew : ", apple_returns.skew())
# pandas reports excess kurtosis (a normal distribution scores 0, not 3)
print("kurt : ", apple_returns.kurtosis())
mean : 0.0006855391415724799
vol : 0.014459504468360529
skew : -0.012440851735057878
kurt : 3.197244607586669
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Alternative measures of risk
Charlotte Werger
Data Scientist
Looking at downside risk
A good risk measure should focus on potential losses i.e. downside risk
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Sortino ratio
Similar to the Sharpe ratio, just with a different standard deviation
$\text{Sortino ratio} = \frac{R_p - R_f}{\sigma_d}$
$\sigma_d$ is the standard deviation of the downside.
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Sortino ratio in python
import pandas as pd
# Define risk free rate and target return of 0
rfr = 0
target_return = 0
# Calculate the daily returns from price data
apple_returns=pd.DataFrame(apple_price.pct_change())
# Select the negative returns only
negative_returns = apple_returns.loc[apple_returns['AAPL'] < target_return]
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
# Calculate expected return and std dev of downside returns
expected_return = apple_returns['AAPL'].mean()
down_stdev = negative_returns['AAPL'].std()
# Calculate the Sortino ratio
sortino_ratio = (expected_return - rfr)/down_stdev
print(sortino_ratio)
0.07887683763760528
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Maximum draw-down
The largest percentage loss from a market peak to trough
Dependent on the chosen time window
The recovery time: time it takes to get back to break-even
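To make the peak-to-trough definition concrete, here is a minimal sketch on a hypothetical price series. It tracks the running peak over the whole history with `cummax()` rather than a rolling window, which is one way to see how the result depends on the chosen window.

```python
import pandas as pd

# Hypothetical price series, standing in for a real price history
prices = pd.Series([100.0, 120.0, 90.0, 110.0, 80.0, 130.0])

# Running peak over the whole history (no rolling window)
running_peak = prices.cummax()

# Draw-down at each point: percentage below the running peak
drawdown = prices / running_peak - 1.0

# The maximum draw-down is the most negative value of the series
max_drawdown = drawdown.min()
print(max_drawdown)
```

Here the largest loss runs from the peak of 120 to the trough of 80, and the series only gets back to break-even when the price reaches 130.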
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Maximum daily draw-down in Python
import matplotlib.pyplot as plt
# Calculate the maximum price over a rolling 250-day window
roll_max = apple_price.rolling(min_periods=1,window=250).max()
# Calculate daily draw-down from rolling max
daily_drawdown = apple_price/roll_max - 1.0
# Calculate maximum daily draw-down
max_daily_drawdown = daily_drawdown.rolling(min_periods=1,window=250).min()
# Plot the results
daily_drawdown.plot()
max_daily_drawdown.plot()
plt.show()
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Maximum draw-down of Apple
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Comparing against a benchmark
Charlotte Werger
Data Scientist
Active investing against a benchmark
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Active return for an actively managed portfolio
Active return is the performance of an (active) investment, relative to the investment's
benchmark.
Calculated as the difference between the portfolio's actual return and the benchmark return.
Active return is achieved by "active" investing, i.e. taking overweight and underweight
positions relative to the benchmark.
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Tracking error for an index tracker
Passive investment funds, or index trackers, don't use active return as a measure of
performance.
Tracking error is the name used for the difference between the portfolio and the benchmark for a passive
investment fund.
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Active weights
1 Source: Schwab Center for Financial Research.
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Active return in Python
# Inspect the data
portfolio_data.head()
mean_ret var pf_w bm_w GICS Sector
Ticker
A 0.146 0.035 0.002 0.005 Health Care
AAL 0.444 0.094 0.214 0.189 Industrials
AAP 0.242 0.029 0.000 0.000 Consumer Discretionary
AAPL 0.225 0.027 0.324 0.459 Information Technology
ABBV 0.182 0.029 0.026 0.010 Health Care
1 Global Industry Classification Standard (GICS)
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Active return in Python
# Calculate mean portfolio return
total_return_pf = (portfolio_data['pf_w']*portfolio_data['mean_ret']).sum()
# Calculate mean benchmark return
total_return_bm = (portfolio_data['bm_w']*portfolio_data['mean_ret']).sum()
# Calculate active return
active_return = total_return_pf - total_return_bm
print("Simple active return: ", active_return)
Simple active return: 6.5764
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Active weights in Python
# Group dataframe by GICS sectors
grouped_df=portfolio_data.groupby('GICS Sector').sum()
# Calculate active weights of portfolio (column names as shown by portfolio_data.head())
grouped_df['active_weight']=grouped_df['pf_w']-grouped_df['bm_w']
print(grouped_df['active_weight'])
GICS Sector
Consumer Discretionary 20.257
Financials -2.116
...etc
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Risk factors
Charlotte Werger
Data Scientist
What is a factor?
Factors in portfolios are like nutrients in food
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Factors in portfolios
Different types of factors:
Macro factors: interest rates, currency, country, industry
Style factors: momentum, volatility, value and quality
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Using factor models to determine risk exposure
1 Source: https://invesco.eu/investment-campus/educational-papers/factor-investing
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Factor exposures
df.head()
date portfolio volatility quality
2015-01-05 -1.827811 1.02 -1.76
2015-01-06 -0.889347 0.41 -0.82
2015-01-07 1.162984 1.07 1.39
2015-01-08 1.788828 0.31 1.93
2015-01-09 -0.840381 0.28 -0.77
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Factor exposures
df.corr()
portfolio volatility quality
portfolio 1.000000 0.056596 0.983416
volatility 0.056596 1.000000 0.092852
quality 0.983416 0.092852 1.000000
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Correlations change over time
# Rolling correlation
df['corr']=df['portfolio'].rolling(30).corr(df['quality'])
# Plot results
df['corr'].plot()
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Rolling correlation with quality
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Factor models
Charlotte Werger
Data Scientist
Using factors to explain performance
Factors are used for risk management.
Factors are used to help explain performance.
Factor models help you relate factors to portfolio returns
Empirical factor models exist that have been tested on historic data.
Fama French 3 factor model is a well-known factor model.
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Fama French Multi Factor model
$R_{pf} = \alpha + \beta_m \, MKT + \beta_s \, SMB + \beta_h \, HML$
MKT is the excess return of the market, i.e. $R_m - R_f$
SMB (Small Minus Big) is a size factor
HML (High Minus Low) is a value factor
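A model of this form can be estimated by ordinary least squares, with alpha as the intercept. The sketch below is only an illustration on synthetic data: the factor series, the "true" alpha and betas, and the noise level are all invented so the fit can be checked against known values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 500

# Synthetic factor returns (stand-ins for MKT, SMB and HML)
factors = rng.normal(0.0, 0.01, size=(n_obs, 3))

# Build portfolio returns from a known alpha and known betas plus noise
true_alpha = 0.0002
true_betas = np.array([1.1, 0.3, -0.2])
noise = rng.normal(0.0, 0.001, size=n_obs)
portfolio = true_alpha + factors @ true_betas + noise

# A column of ones lets the regression estimate alpha as the intercept
design = np.column_stack([np.ones(n_obs), factors])
coefficients, *_ = np.linalg.lstsq(design, portfolio, rcond=None)
alpha, beta_m, beta_s, beta_h = coefficients
print(alpha, beta_m, beta_s, beta_h)
```

The estimated betas should land close to the values used to generate the data. With real returns there is no known truth to compare against, which is where regression summary statistics come in.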
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Regression model refresher
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Difference between beta and correlation
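One standard way to state the distinction: correlation measures how tightly two return series move together, while beta is the regression slope, i.e. how much the portfolio moves per unit move in the factor. The sketch below uses hypothetical data in which the portfolio always moves exactly twice as much as the factor.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical factor returns and a portfolio that moves exactly 2x the factor
factor = [0.01, -0.02, 0.015, 0.005, -0.01]
portfolio = [2.0 * f for f in factor]

var_factor = covariance(factor, factor)
var_portfolio = covariance(portfolio, portfolio)
cov_pf = covariance(portfolio, factor)

correlation = cov_pf / math.sqrt(var_factor * var_portfolio)
beta = cov_pf / var_factor  # slope of the regression of portfolio on factor
print(correlation, beta)
```

The two are linked by beta = correlation × (σ of portfolio / σ of factor), so a perfectly correlated pair can still have any beta depending on the relative volatilities.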
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Regression model in Python
import statsmodels.api as sm
# Define the model
model = sm.OLS(factor_data['sp500'],
factor_data[['momentum','value']]).fit()
# Get the model predictions
predictions = model.predict(factor_data[['momentum','value']])
b1, b2 = model.params
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
The regression summary output
# Print out the summary statistics
model.summary()
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Obtaining betas quickly
# Get just beta coefficients from linear regression model
import statsmodels.api as sm
b1, b2 = sm.OLS(df['returns'], df[['F1', 'F2']]).fit().params
# Print the coefficients
print('Sensitivities of active returns to factors:\nF1: %.4f\nF2: %.4f' % (b1, b2))
Sensitivities of active returns to factors:
F1: -0.0381
F2: 0.9858
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Portfolio analysis tools
Charlotte Werger
Data Scientist
Professional portfolio analysis tools
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Back-testing your strategy
Back-testing: run your strategy on historic data and see how it would have performed
A strategy that works on historic data is not guaranteed to work well on future data, because
markets change
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Quantopian's pyfolio tool
1 GitHub: https://github.com/quantopian/pyfolio
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Performance and risk analysis in Pyfolio
# Install the package
!pip install pyfolio
# Import the packages
import pandas as pd
import pyfolio as pf
# Read the data as a pandas Series (assumes the first CSV column holds the dates)
returns = pd.read_csv('pf_returns.csv', index_col=0).iloc[:, 0]
returns.index = pd.to_datetime(returns.index)
# Create a tear sheet on returns
pf.create_returns_tear_sheet(returns)
# If you have backtest and live data
pf.create_returns_tear_sheet(returns, live_start_date='2018-03-01')
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Pyfolio's tear sheet
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Holdings and exposures in Pyfolio
# define our sector mappings
sect_map = {'COST': 'Consumer Goods',
'INTC': 'Technology',
'CERN': 'Healthcare',
'GPS': 'Technology',
'MMM': 'Construction',
'DELL': 'Technology',
'AMD': 'Technology'}
pf.create_position_tear_sheet(returns, positions,
sector_mappings=sect_map)
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Exposure tear sheet results
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Modern portfolio theory
Charlotte Werger
Data Scientist
Creating optimal portfolios
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
What is Portfolio Optimization?
Meet Harry Markowitz
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
The optimization problem: finding optimal weights
In words:
Minimize the portfolio variance, subject to:
The expected mean return is at least some
target return
The weights sum up to 100%
At least some weights are positive
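In symbols, the first two constraints can be written as follows. The notation is an assumption made here rather than taken from the slides: $w$ is the vector of portfolio weights, $\Sigma$ the covariance matrix of returns, $\mu$ the vector of expected returns and $r^{*}$ the target return.

```latex
\begin{aligned}
\min_{w} \quad & w^{T} \Sigma \, w \\
\text{subject to} \quad & \mu^{T} w \ge r^{*}, \\
& \sum_{i} w_i = 1
\end{aligned}
```

A long-only version of the problem would add $w_i \ge 0$ for every asset. That is a stricter condition than the last bullet above, so it is left out of the sketch.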
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Varying target returns leads to the Efficient Frontier
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
PyPortfolioOpt for portfolio optimization
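import pandas as pd
from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models
from pypfopt import expected_returns
# Load the prices with the dates as index (assumes the CSV has a 'date' column)
df = pd.read_csv('portfolio.csv', index_col='date', parse_dates=True)
df.head(2)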
XOM RRC BBY MA PFE
date
2010-01-04 54.068794 51.300568 32.524055 22.062426 13.940202
2010-01-05 54.279907 51.993038 33.349487 21.997149 13.741367
# Calculate expected annualized returns and sample covariance
mu = expected_returns.mean_historical_return(df)
Sigma = risk_models.sample_cov(df)
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Get the Efficient Frontier and portfolio weights
# Calculate expected annualized returns and risk
mu = expected_returns.mean_historical_return(df)
Sigma = risk_models.sample_cov(df)
# Obtain the EfficientFrontier
ef = EfficientFrontier(mu, Sigma)
# Select a chosen optimal portfolio
ef.max_sharpe()
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Different optimizations
# Select the maximum Sharpe portfolio
ef.max_sharpe()
# Select an optimal return for a target risk
ef.efficient_risk(2.3)
# Select a minimal risk for a target return
ef.efficient_return(1.5)
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Calculate portfolio risk and performance
# Obtain the performance numbers
ef.portfolio_performance(verbose=True, risk_free_rate = 0.01)
Expected annual return: 21.3%
Annual volatility: 19.5%
Sharpe Ratio: 0.98
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's optimize a portfolio!
Maximum Sharpe vs. minimum volatility
Charlotte Werger
Data Scientist
Remember the Efficient Frontier?
Efficient frontier: all portfolios with an optimal risk and return trade-off
Maximum Sharpe portfolio: the highest Sharpe ratio on the EF
Minimum volatility portfolio: the lowest level of risk on the EF
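For intuition about the minimum volatility portfolio, there is a closed-form answer in the simplest setting, where the only constraint is that the weights sum to one: $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^{T}\Sigma^{-1}\mathbf{1})$. The sketch below applies it to a hypothetical three-asset covariance matrix. This is not how PyPortfolioOpt computes its result; by default that library also bounds each weight between 0 and 1, so its answer can differ.

```python
import numpy as np

# Hypothetical annualized covariance matrix for three assets
Sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.090, 0.010],
                  [0.004, 0.010, 0.160]])
ones = np.ones(len(Sigma))

# Minimum-variance weights when the only constraint is sum(w) == 1:
# w = Sigma^-1 1 / (1^T Sigma^-1 1)
inv_times_ones = np.linalg.solve(Sigma, ones)
min_vol_weights = inv_times_ones / inv_times_ones.sum()

# Compare the resulting volatility with an equally weighted portfolio
min_vol = np.sqrt(min_vol_weights @ Sigma @ min_vol_weights)
equal_weights = ones / len(ones)
equal_vol = np.sqrt(equal_weights @ Sigma @ equal_weights)
print(min_vol_weights, min_vol, equal_vol)
```

The minimum volatility weights tilt towards the least volatile asset, and the resulting volatility is lower than that of the equally weighted mix and of any single asset held alone.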
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Adjusting PyPortfolioOpt optimization
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Maximum Sharpe portfolio
Maximum Sharpe portfolio: the highest Sharpe ratio on the EF
from pypfopt.efficient_frontier import EfficientFrontier
# Calculate the Efficient Frontier with mu and Sigma
ef = EfficientFrontier(mu, Sigma)
raw_weights = ef.max_sharpe()
# Get interpretable weights
cleaned_weights = ef.clean_weights()
print(cleaned_weights)
{'GOOG': 0.01269,'AAPL': 0.09202,'FB': 0.19856,
'BABA': 0.09642,'AMZN': 0.07158,'GE': 0.02456,...}
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Maximum Sharpe portfolio
# Get performance numbers
ef.portfolio_performance(verbose=True)
Expected annual return: 33.0%
Annual volatility: 21.7%
Sharpe Ratio: 1.43
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Minimum Volatility Portfolio
Minimum volatility portfolio: the lowest level of risk on the EF
# Calculate the Efficient Frontier with mu and Sigma
ef = EfficientFrontier(mu, Sigma)
raw_weights = ef.min_volatility()
# Get interpretable weights
cleaned_weights = ef.clean_weights()
print(cleaned_weights)
{'GOOG': 0.05664, 'AAPL': 0.087, 'FB': 0.1591,
'BABA': 0.09784, 'AMZN': 0.06986, 'GE': 0.0123,...}
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Minimum Volatility Portfolio
ef.portfolio_performance(verbose=True)
Expected annual return: 17.4%
Annual volatility: 13.2%
Sharpe Ratio: 1.28
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's have another look at the Efficient Frontier
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Maximum Sharpe versus Minimum Volatility
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Alternative portfolio optimization
Charlotte Werger
Data Scientist
Expected risk and return based on historic data
Mean historic returns and the historic portfolio variance are not perfect estimates
of mu and Sigma
Weights from portfolio optimization are therefore not guaranteed to work well on
future data
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Historic data
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Exponentially weighted returns
Need better measures for risk and return
Exponentially weighted risk and return assigns more importance to the most recent data
Exponential moving average in the graph: most weight on t-1 observation
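The weighting idea can be sketched without any library. The helper below is hypothetical; it uses the common span convention, alpha = 2 / (span + 1), and gives each observation a weight that decays geometrically with its age. It is meant to illustrate the principle, not to reproduce any particular library's output.

```python
def ewm_mean(values, span):
    """Exponentially weighted mean; the most recent value gets the largest weight."""
    alpha = 2.0 / (span + 1.0)  # smoothing factor implied by the span
    n = len(values)
    # Weight (1 - alpha)^k for the observation k steps back from the latest one
    weights = [(1.0 - alpha) ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical daily returns: weak early on, stronger recently
returns = [-0.01, -0.01, 0.00, 0.01, 0.02, 0.02]
simple_mean = sum(returns) / len(returns)
weighted_mean = ewm_mean(returns, span=3)
print(simple_mean, weighted_mean)
```

Because the recent returns are higher than the early ones, the exponentially weighted mean comes out above the simple mean.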
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Exponentially weighted covariance
The exponential covariance matrix: gives more weight to recent data
In the graph: exponentially weighted volatility in black follows real volatility
better than standard volatility in blue
1 Source: https://systematicinvestor.github.io/Exponentially-Weighted-Volatility-RCPP
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Exponentially weighted returns
from pypfopt import expected_returns
# Exponentially weighted moving average
mu_ema = expected_returns.ema_historical_return(df,
span=252, frequency=252)
print(mu_ema)
symbol
XOM 0.103030
BBY 0.394629
PFE 0.186058
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Exponentially weighted covariance
from pypfopt import risk_models
# Exponentially weighted covariance
Sigma_ew = risk_models.exp_cov(df, span=180, frequency=252)
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Using downside risk in the optimization
Remember the Sortino ratio: it uses the variance of negative returns only
PyPortfolioOpt allows you to use semicovariance in the optimization; this is a measure of
downside risk.
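As a rough sketch of what a semicovariance measures, the helper below follows one common definition: returns above the benchmark are clipped to zero, and the remaining shortfalls are multiplied pairwise, averaged and annualized. The function and the sample data are hypothetical, and PyPortfolioOpt's own implementation may differ in detail.

```python
def semicovariance(returns_a, returns_b, benchmark=0.0, frequency=252):
    """Annualized co-movement of two assets, counting only below-benchmark returns."""
    # Returns above the benchmark are clipped to zero so only the downside remains
    drops_a = [min(r - benchmark, 0.0) for r in returns_a]
    drops_b = [min(r - benchmark, 0.0) for r in returns_b]
    n = len(drops_a)
    return sum(a * b for a, b in zip(drops_a, drops_b)) / n * frequency


# Hypothetical daily returns for two assets
asset_x = [0.01, -0.02, 0.015, -0.01, 0.005]
asset_y = [0.02, -0.01, -0.005, -0.02, 0.01]

semi_var_x = semicovariance(asset_x, asset_x)   # downside variance of asset X
semi_cov_xy = semicovariance(asset_x, asset_y)  # joint downside of X and Y
print(semi_var_x, semi_cov_xy)
```

Days on which an asset gains contribute nothing, so two assets only show a large semicovariance when they lose money at the same time.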
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Semicovariance in PyPortfolioOpt
Sigma_semi = risk_models.semicovariance(df,
benchmark=0, frequency=252)
print(Sigma_semi)
XOM BBY MA PFE
XOM 0.018939 0.008505 0.006568 0.004058
BBY 0.008505 0.016797 0.009133 0.004404
MA 0.006568 0.009133 0.018711 0.005373
PFE 0.004058 0.004404 0.005373 0.008349
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Let's practice!
Recap
Charlotte Werger
Data Scientist
Chapter 1: Calculating risk and return
A portfolio as a collection of weights and assets
Diversification
Mean returns versus cumulative returns
Variance, standard deviation, correlations and the covariance matrix
Calculating portfolio variance
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Chapter 2: Diving deep into risk measures
Annualizing returns and risk to compare over different periods
Sharpe ratio as a measure of risk adjusted returns
Skewness and Kurtosis: looking beyond mean and variance of a distribution
Maximum draw-down, downside risk and the Sortino ratio
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Chapter 3: Breaking down performance
Compare to benchmark with active weights and active returns
Investment factors: explain returns and sources of risk
Fama French 3 factor model to break down performance into explainable factors and alpha
Pyfolio as a portfolio analysis tool
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Chapter 4: Finding the optimal portfolio
Markowitz' portfolio optimization: efficient frontier, maximum Sharpe and minimum volatility
portfolios
Exponentially weighted risk and return, semicovariance
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
Continued learning
DataCamp course on Portfolio Risk Management in Python
Quantopian's lecture series: https://www.quantopian.com/lectures
Learning by doing: Pyfolio and PyPortfolioOpt
INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON
End of this course