#Big Data Analytics With Numpy in Python
Big Data Analytics in Finance
Big Data Analytics with Numpy in Python
1061BDAF07
MIS EMBA (M2322) (8605)
Thu 12,13,14 (19:20-22:10) (D503)
Min-Yuh Day (戴敏育)
Assistant Professor
Dept. of Information Management, Tamkang University
http://mail.tku.edu.tw/myday/
2017-11-09
Syllabus
Week  Date        Subject/Topics
1     2017/09/21  Course Orientation for Big Data Analytics in Finance
2     2017/09/28  Business Models of Fintech
3     2017/10/05  Artificial Intelligence for Investment Analysis and Robo-Advisors
4     2017/10/12  Conversational Commerce and Intelligent Chatbots for Fintech
5     2017/10/19  Event Study
6     2017/10/26  Case Study on Big Data Analytics in Finance I
7     2017/11/02  Foundations of Finance Big Data Analytics in Python
8     2017/11/09  Big Data Analytics with Numpy in Python
9     2017/11/16  Finance Big Data Analytics with Pandas in Python
10    2017/11/23  Midterm Project Report
11    2017/11/30  Text Mining Techniques and Natural Language Processing
12    2017/12/07  Deep Learning with Keras in Python
13    2017/12/14  Case Study on Big Data Analytics in Finance II
14    2017/12/21  Deep Learning with TensorFlow
15    2017/12/28  Deep Learning for Finance Big Data
16    2018/01/04  Social Network Analysis
17    2018/01/11  Final Project Presentation I
18    2018/01/18  Final Project Presentation II
Big Data Analytics with Numpy in Python
NumPy: Base N-dimensional array package
NumPy is the fundamental package for scientific computing with Python.
Source: http://www.numpy.org/
Wes McKinney (2012),
Python for Data Analysis: Data Wrangling with
Pandas, NumPy, and IPython, O'Reilly Media
Source: http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793/
The Quant Finance PyData Stack
Source: http://nbviewer.jupyter.org/format/slides/github/quantopian/pyfolio/blob/master/pyfolio/examples/overview_slides.ipynb#/5
NumPy
http://www.numpy.org/
Python
Source: https://www.python.org/community/logos/
Anaconda-Navigator (Launchpad)
Jupyter Notebook: New > Python 3
print("hello, world")
Text input and output
print("Hello World")
x = 3
print(x)
x = 2
y = 3
print(x, ' ', y)
x = int(input("What is x? "))
Source: http://pythonprogramminglanguage.com/text-input-and-output/
Variables
x = 2
price = 2.5
word = 'Hello'
word = 'Hello'
word = "Hello"
word = '''Hello'''
x = 2
x = x + 1
x = 5
Source: http://pythonprogramminglanguage.com/
Python Basic Operators
print('7 + 2 =', 7 + 2)
print('7 - 2 =', 7 - 2)
print('7 * 2 =', 7 * 2)
print('7 / 2 =', 7 / 2)
print('7 // 2 =', 7 // 2)
print('7 % 2 =', 7 % 2)
print('7 ** 2 =', 7 ** 2)
BMI Calculator in Python
height_m = height_cm/100
BMI = (weight_kg/(height_m**2))
Source: http://code.activestate.com/recipes/580615-bmi-code/
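The two BMI lines above use height_cm and weight_kg without defining them. A minimal runnable sketch follows; the function name bmi and the sample inputs (70 kg, 175 cm) are illustrative assumptions, not values from the slides.

```python
def bmi(weight_kg, height_cm):
    """Return the body mass index for a weight in kg and a height in cm."""
    height_m = height_cm / 100          # convert centimetres to metres
    return weight_kg / (height_m ** 2)  # BMI = kg / m^2

# Hypothetical example values (not taken from the slides)
example_bmi = bmi(weight_kg=70, height_cm=175)
print(round(example_bmi, 2))
```

Wrapping the formula in a function keeps the unit conversion in one place, so callers pass centimetres directly.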
Future value of a specified principal amount, rate of interest, and a number of years
Source: https://www.w3resource.com/python-exercises/python-basic-exercise-39.php
Future Value (FV)
print(100 * 1.1 ** 7)
# output = 194.87
Source: https://www.w3resource.com/python-exercises/python-basic-exercise-39.php
Future Value (FV)
pv = 100
r = 0.1
n = 7
fv = pv * ((1 + (r)) ** n)
print(round(fv, 2))
Future Value (FV)
amount = 100
interest = 10  # 10% = 0.01 * 10
years = 7
future_value = amount * ((1 + (0.01 * interest)) ** years)
print(round(future_value, 2))
Source: https://www.w3resource.com/python-exercises/python-basic-exercise-39.php
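The closed-form expression pv * (1 + r) ** n is the result of compounding once per year. The sketch below, with function names chosen here for illustration, compares the closed form with an explicit year-by-year loop, using the same inputs as the slides (pv = 100, r = 0.1, n = 7).

```python
def future_value(pv, r, n):
    """Closed form: principal compounded once per period for n periods."""
    return pv * (1 + r) ** n

def future_value_by_year(pv, r, n):
    """Same quantity, built up one year at a time."""
    balance = pv
    for _ in range(n):
        balance = balance * (1 + r)  # add one year of interest
    return balance

fv_closed = future_value(100, 0.1, 7)
fv_loop = future_value_by_year(100, 0.1, 7)
print(round(fv_closed, 2))
print(round(fv_loop, 2))
```

Both calls should agree up to floating-point rounding, which is why the results are rounded before printing.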
if statements
> greater than
< less than
>= greater than or equal to
== equals
!= not equal to
score = 80
if score >= 60:
    print("Pass")
else:
    print("Fail")
Source: http://pythonprogramminglanguage.com/
if elif else
score = 90
grade = ""
if score >= 90:
    grade = "A"
elif score >= 80:
    grade = "B"
elif score >= 70:
    grade = "C"
elif score >= 60:
    grade = "D"
else:
    grade = "E"
print(grade)
# Output: A
http://pythontutor.com/visualize.html
https://goo.gl/E6w5ph
Source: http://pythonprogramminglanguage.com/
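The same if/elif/else chain can be wrapped in a function so several scores can be checked at once. This is a small sketch; the name letter_grade and the list of test scores are assumptions made for illustration.

```python
def letter_grade(score):
    """Map a numeric score to a letter grade using the thresholds above."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    elif score >= 60:
        return "D"
    else:
        return "E"

# Hypothetical scores, one per grade band
for s in [95, 85, 75, 65, 50]:
    print(s, letter_grade(s))
```

Because the conditions are tested from the top, each elif only runs when every earlier condition was false, so no upper bounds are needed.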
for loops
for i in range(1, 11):
    print(i)
Output:
1
2
3
4
5
6
7
8
9
10
Source: http://pythonprogramminglanguage.com/
for loops
for i in range(1, 10):
    for j in range(1, 10):
        print(i, '*', j, '=', i*j)
Output (last nine lines):
9 * 1 = 9
9 * 2 = 18
9 * 3 = 27
9 * 4 = 36
9 * 5 = 45
9 * 6 = 54
9 * 7 = 63
9 * 8 = 72
9 * 9 = 81
Source: http://pythonprogramminglanguage.com/
while loops
age = 10
while age < 20:
    print(age)
    age = age + 1
Output:
10
11
12
13
14
15
16
17
18
19
Source: https://learnpython.trinket.io/learn-python-part-8-loops#/while-loops/about-while-loops
Functions
def convertCMtoM(xcm):
    m = xcm/100
    return m
cm = 180
m = convertCMtoM(cm)
print(str(m))
Output:
1.8
Lists
x = [60, 70, 80, 90]
print(len(x))
print(x[0])
print(x[1])
print(x[-1])
4
60
70
90
Tuples
A tuple in Python is a collection that cannot be modified.
A tuple is defined using parentheses.
x = (10, 20, 30, 40, 50)
print(x[0])   # 10
print(x[1])   # 20
print(x[2])   # 30
print(x[-1])  # 50
Source: http://pythonprogramminglanguage.com/tuples/
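To see what "cannot be modified" means in practice, the short sketch below tries to assign to a tuple element and catches the resulting error. The variable names are illustrative.

```python
x = (10, 20, 30, 40, 50)

modified = False
try:
    x[0] = 99          # tuples do not support item assignment
    modified = True
except TypeError as err:
    print("TypeError:", err)

# A list built from the tuple can be changed; the tuple itself stays the same
y = list(x)
y[0] = 99
print(x[0], y[0])
```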
Dictionary
k = { 'EN':'English', 'FR':'French' }
print(k['EN'])
English
Source: http://pythonprogramminglanguage.com/dictionary/
Sets
animals = {'cat', 'dog'}
Source: http://cs231n.github.io/python-numpy-tutorial/
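The slide shows only the set literal. A brief sketch of the operations a set is typically used for (membership tests, adding elements, and dropping duplicates) follows; the extra values such as 'fish' and the score list are assumptions for illustration.

```python
animals = {'cat', 'dog'}

has_cat = 'cat' in animals    # membership test
has_fish = 'fish' in animals

animals.add('fish')           # adds a new element
animals.add('cat')            # already present, so the set is unchanged
print(len(animals))

# Converting a list to a set removes duplicate values
unique_scores = set([60, 70, 70, 80])
print(sorted(unique_scores))
```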
File Input / Output
with open('myfile.txt', 'w') as file:
    file.write('Hello World\nThis is Python File Input Output')
Source: https://github.com/TiesdeKok/LearnPythonforResearch/blob/master/0_python_basics.ipynb
File Input / Output
with open('myfile.txt', 'a+') as file:
    file.write('\n' + 'New line')
Source: https://github.com/TiesdeKok/LearnPythonforResearch/blob/master/0_python_basics.ipynb
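The write ('w') and append ('a+') steps above can be checked by reading the file back. The sketch below is one way to do that; writing into a temporary directory (via tempfile) is an assumption added here so that no existing myfile.txt is overwritten.

```python
import os
import tempfile

# Temporary directory so the example does not touch existing files
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'myfile.txt')

with open(path, 'w') as file:      # 'w' creates or overwrites the file
    file.write('Hello World\nThis is Python File Input Output')

with open(path, 'a+') as file:     # 'a+' appends to the end of the file
    file.write('\n' + 'New line')

with open(path, 'r') as file:      # 'r' reads the file back
    lines = file.read().splitlines()

print(lines)
```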
NumPy
• NumPy provides a multidimensional array object to store homogeneous or heterogeneous data; it also provides optimized functions/methods to operate on this array object.
Source: Yves Hilpisch (2014), Python for Finance: Analyze Big Financial Data, O'Reilly
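As a small illustration of the array object and its optimized operations, the sketch below stores a few prices in an ndarray and computes simple returns without an explicit loop. The price values and variable names are hypothetical, chosen only for the example.

```python
import numpy as np

# Hypothetical closing prices (not real market data)
prices = np.array([100.0, 101.5, 99.0, 102.0])

# Vectorized: each price divided by the previous one, minus 1
returns = prices[1:] / prices[:-1] - 1

# All elements share one dtype; mixing ints and floats upcasts to float64
mixed = np.array([1, 2.5, 3])

print(prices.dtype, mixed.dtype)
print(returns)
print(prices.mean())
```

The slicing expressions prices[1:] and prices[:-1] line up each price with its predecessor, so the division is applied element by element.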
NumPy ndarray
One-dimensional Array (1-D Array)
index:  0   1   ...      n-1
value:  1   2   3   4   5
Two-dimensional Array (2-D Array)
        0   1   ...      n-1
0       1   2   3   4   5
1       6   7   8   9   10
...     11  12  13  14  15
m-1     16  17  18  19  20
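NumPy
v = list(range(1, 6))  # in Python 3, range() must be converted to a list first
print(v)               # [1, 2, 3, 4, 5]
2 * v                  # repeats the list: [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
import numpy as np
v = np.arange(1, 6)
v                      # array([1, 2, 3, 4, 5])
2 * v                  # multiplies each element: array([ 2,  4,  6,  8, 10])
Source: Yves Hilpisch (2014), Python for Finance: Analyze Big Financial Data, O'Reilly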
NumPy Create Array
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a * b  # element-wise multiplication
c          # array([ 4, 10, 18])
Source: Yves Hilpisch (2014), Python for Finance: Analyze Big Financial Data, O'Reilly
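Multiplying two arrays with * works element by element, which differs from a dot product. The sketch below contrasts the two using the same arrays a and b; the result variable names are chosen here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

elementwise_sum = a + b        # [1+4, 2+5, 3+6]
elementwise_product = a * b    # [1*4, 2*5, 3*6]
dot_product = a.dot(b)         # 1*4 + 2*5 + 3*6
scaled = 2 * a                 # every element multiplied by 2

print(elementwise_sum, elementwise_product, dot_product, scaled)
```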
NumPy
Source: http://cs231n.github.io/python-numpy-tutorial/
Numpy Quickstart Tutorial
https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
import numpy as np
a = np.arange(15).reshape(3, 5)
a.shape       # (3, 5)
a.ndim        # 2
a.dtype.name  # e.g. 'int64' (platform dependent)
Source: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
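Building on the same 3 x 5 array, the following sketch shows how the shape drives other attributes and axis-wise operations. It uses only standard ndarray attributes and methods; the variable names are illustrative.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)   # values 0..14 arranged in 3 rows, 5 columns

n_elements = a.size               # total number of elements (3 * 5)
transposed_shape = a.T.shape      # transpose swaps rows and columns
column_sums = a.sum(axis=0)       # sum down each column
row_sums = a.sum(axis=1)          # sum across each row

print(a)
print(n_elements, transposed_shape)
print(column_sums, row_sums)
```

axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row).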
Matrix
Source: https://simple.wikipedia.org/wiki/Matrix_(mathematics)
NumPy ndarray: Multidimensional Array Object
import numpy as np
a = np.array([1,2,3,4,5])
One-dimensional Array (1-D Array)
index:  0   1   ...      n-1
value:  1   2   3   4   5
a = np.array([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20]])
Two-dimensional Array (2-D Array)
        0   1   ...      n-1
0       1   2   3   4   5
1       6   7   8   9   10
...     11  12  13  14  15
m-1     16  17  18  19  20
import numpy as np
a = np.array([[0, 1, 2, 3],
              [10, 11, 12, 13],
              [20, 21, 22, 23]])
a
 0  1  2  3
10 11 12 13
20 21 22 23
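With the 3 x 4 array a defined as above, elements are addressed as a[row, column], and slices select whole rows, columns, or blocks. A short sketch, with variable names chosen for illustration:

```python
import numpy as np

a = np.array([[0, 1, 2, 3],
              [10, 11, 12, 13],
              [20, 21, 22, 23]])

element = a[1, 2]        # row 1, column 2
first_row = a[0]         # the whole first row
second_col = a[:, 1]     # column 1 from every row
sub_block = a[1:, 2:]    # rows 1 onward, columns 2 onward

print(element)
print(first_row)
print(second_col)
print(sub_block)
```

Indices start at 0, so row 1 is the second row and column 2 is the third column.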
NumPy Basics: Arrays and Vectorized Computation
Source: https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html
NumPy Array
Source: https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html
Wes McKinney (2017), "Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython", 2nd Edition, O'Reilly Media.
https://github.com/wesm/pydata-book
Source: https://github.com/wesm/pydata-book/blob/2nd-edition/ch04.ipynb
Python Pandas for Finance
pandas
http://pandas.pydata.org/
pandas: Python Data Analysis Library, providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Source: http://pandas.pydata.org/
Creating pd.DataFrame
     a   b   c
1    4   7   10
2    5   8   11
3    6   9   12
Source: https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf
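The table above (columns a, b, c and index 1, 2, 3) can be built from a dictionary of columns. The sketch below is one way to do so with the standard pd.DataFrame constructor.

```python
import pandas as pd

# Each dictionary key becomes a column; index supplies the row labels
df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=[1, 2, 3],
)

print(df)
print(type(df))
print(df.loc[2, "b"])   # value in the row labelled 2, column b
```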
Pandas DataFrame
type(df)
conda install pandas-datareader
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
print('Hello Pandas')
s = pd.Series([1,3,5,np.nan,6,8])
s
dates = pd.date_range('20170301', periods=6)
dates
Source: http://pandas.pydata.org/pandas-docs/stable/10min.html
df = pd.DataFrame(np.random.randn(6,4),
                  index=dates, columns=list('ABCD'))
df
df = pd.DataFrame(np.random.randn(4,6),
                  index=['student1','student2','student3','student4'],
                  columns=list('ABCDEF'))
df
df2 = pd.DataFrame({ 'A' : 1.,
                     'B' : pd.Timestamp('20170322'),
                     'C' : pd.Series(2.5,index=list(range(4)),dtype='float32'),
                     'D' : np.array([3] * 4,dtype='int32'),
                     'E' : pd.Categorical(["test","train","test","train"]),
                     'F' : 'foo' })
df2
df2.dtypes
Yahoo Finance Symbols: AAPL
Apple Inc. (AAPL)
http://finance.yahoo.com/q?s=AAPL
Apple Inc. (AAPL) -NasdaqGS
http://finance.yahoo.com/quote/AAPL?p=AAPL
Yahoo Finance Charts: Apple Inc. (AAPL)
http://finance.yahoo.com/chart/AAPL
Apple Inc. (AAPL) Historical Data
http://finance.yahoo.com/q/hp?s=AAPL+Historical+Prices
Yahoo Finance Historical Prices
Apple Inc. (AAPL)
http://finance.yahoo.com/quote/AAPL/history
Yahoo Finance Historical Prices
Apple Inc. (AAPL)
http://finance.yahoo.com/quote/AAPL/history?period1=345398400&period2=1490112000&interval=1d&filter=history&frequency=1d
Yahoo Finance Historical Prices
http://ichart.finance.yahoo.com/table.csv?s=AAPL
Yahoo Finance Charts
Alphabet Inc. (GOOG)
http://finance.yahoo.com/echarts?s=GOOG+Interactive#{"showArea":false,"showLine":false,"showCandle":true,"lineType":"candle","range":"5y","allowChartStacking":true}
Dow Jones Industrial Average (^DJI)
http://finance.yahoo.com/chart/^DJI
TSEC weighted index (^TWII) - Taiwan
http://finance.yahoo.com/chart/^TWII
Taiwan Semiconductor Manufacturing Company Limited (2330.TW)
http://finance.yahoo.com/q?s=2330.TW
Yahoo Finance Charts
TSMC (2330.TW)
import pandas as pd
import pandas_datareader.data as web
df = web.DataReader('AAPL', data_source='yahoo',
                    start='1/1/2010', end='3/21/2017')
df.to_csv('AAPL.csv')
df.tail()
df = web.DataReader('GOOG', data_source='yahoo',
                    start='1/1/1980', end='3/21/2017')
df.head(10)
df.tail(10)
df.count()
df.loc['2015-12-31']  # .loc selects the row by its date label (.ix is deprecated)
df.to_csv('2330.TW.Yahoo.Finance.Data.csv')
Python Pandas for Finance
Source: https://mapattack.wordpress.com/2017/02/12/using-python-for-stocks-1/
import pandas as pd
import pandas_datareader.data as web
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
%matplotlib inline
Source: https://mapattack.wordpress.com/2017/02/12/using-python-for-stocks-1/
Python Pandas for Finance
# Read stock data from Yahoo Finance
end = dt.datetime.now()
#start = dt.datetime(end.year-2, end.month, end.day)
start = dt.datetime(2015, 1, 1)
df = web.DataReader("AAPL", 'yahoo', start, end)
df.to_csv('AAPL.csv')
# Read the saved file back (pd.read_csv replaces the deprecated DataFrame.from_csv)
df = pd.read_csv('AAPL.csv', index_col=0, parse_dates=True)
df.tail()
Source: https://mapattack.wordpress.com/2017/02/12/using-python-for-stocks-1/
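Finance Data from Quandl
import pandas as pd
import quandl
df = quandl.get("WIKI/AAPL", start_date="2015-01-01", end_date="2017-10-31")
df.to_csv('AAPL.csv')
# Read the saved file back (pd.read_csv replaces the deprecated DataFrame.from_csv)
df = pd.read_csv('AAPL.csv', index_col=0, parse_dates=True)
df.tail()
Source: https://www.quandl.com/tools/python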
Python Pandas for Finance
df['Adj Close'].plot(legend=True, figsize=(12, 8),
                     title='AAPL', label='Adj Close')
Source: https://mapattack.wordpress.com/2017/02/12/using-python-for-stocks-1/
Python Pandas for Finance
plt.figure(figsize=(12,9))
top = plt.subplot2grid((12,9), (0, 0), rowspan=10, colspan=9)
bottom = plt.subplot2grid((12,9), (10,0), rowspan=2, colspan=9)
top.plot(df.index, df['Adj Close'], color='blue')  # df.index gives the dates
bottom.bar(df.index, df['Volume'])
plt.figure(figsize=(12,9))
sns.distplot(df['Adj Close'].dropna(), bins=50, color='purple')
# Assumes sSymbol is already set (e.g. "AAPL" or "2330.TW")
import io
import requests
import pandas as pd
# sURL = "http://ichart.finance.yahoo.com/table.csv?s=AAPL"
# sBaseURL = "http://ichart.finance.yahoo.com/table.csv?s="
sURL = "http://ichart.finance.yahoo.com/table.csv?s=" + sSymbol
#req = requests.get("http://ichart.finance.yahoo.com/table.csv?s=2330.TW")
#req = requests.get("http://ichart.finance.yahoo.com/table.csv?s=AAPL")
req = requests.get(sURL)
sText = req.text
#print(sText)
#df = web.DataReader(sSymbol, 'yahoo', starttime, endtime)
#df = web.DataReader("2330.TW", 'yahoo')
sPath = "data/"
sPathFilename = sPath + sSymbol + ".csv"
print(sPathFilename)
# Save the downloaded CSV text to a file
with open(sPathFilename, 'w') as f:
    f.write(sText)
# Parse the same text into a DataFrame, using the first column (dates) as the index
sIOdata = io.StringIO(sText)
df = pd.read_csv(sIOdata, index_col=0, parse_dates=True)
df.head(5)
df.tail(5)
import io
import requests
import pandas as pd
sSymbol = "AAPL"
# sURL = "http://ichart.finance.yahoo.com/table.csv?s=AAPL"
sURL = "http://ichart.finance.yahoo.com/table.csv?s=" + sSymbol
#req = requests.get("http://ichart.finance.yahoo.com/table.csv?s=AAPL")
req = requests.get(sURL)
sText = req.text
#print(sText)
sPath = "data/"
sPathFilename = sPath + sSymbol + ".csv"
print(sPathFilename)
# Save the downloaded CSV text to a file
with open(sPathFilename, 'w') as f:
    f.write(sText)
# Parse the same text into a DataFrame, using the first column (dates) as the index
sIOdata = io.StringIO(sText)
df = pd.read_csv(sIOdata, index_col=0, parse_dates=True)
df.head(5)
import datetime
import pandas_datareader.data as web

def getYahooFinanceData(sSymbol, starttime, endtime, sDir):
    # GetMarketFinanceData_From_YahooFinance
    # Example symbols: "^TWII", "000001.SS", "AAPL", "SHA:000016", "600000.SS", "2330.TW"
    #sSymbol = "^TWII"
    # The two lines below are commented out so that the arguments passed in are used
    #starttime = datetime.datetime(2000, 1, 1)
    #endtime = datetime.datetime(2015, 12, 31)
    sPath = sDir
    #sPath = "data/financedata/"
    df_YahooFinance = web.DataReader(sSymbol, 'yahoo', starttime, endtime)
    #df_01 = web.DataReader("2330.TW", 'yahoo')
    # Replace characters that are awkward in file names
    sSymbol = sSymbol.replace(":","_")
    sSymbol = sSymbol.replace("^","_")
    sPathFilename = sPath + sSymbol + "_Yahoo_Finance.csv"
    df_YahooFinance.to_csv(sPathFilename)
    #df_YahooFinance.head(5)
    return sPathFilename
#End def getYahooFinanceData(sSymbol, starttime, endtime, sDir):
sSymbol = "AAPL"
starttime = datetime.datetime(2000, 1, 1)
endtime = datetime.datetime(2015, 12, 31)
sDir = "data/financedata/"
https://www.quantopian.com/
References
• Wes McKinney (2012), Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O'Reilly Media
• Wes McKinney (2017), Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, 2nd Edition, O'Reilly Media, https://github.com/wesm/pydata-book
• Yves Hilpisch (2014), Python for Finance: Analyze Big Financial Data, O'Reilly
• Yves Hilpisch (2015), Derivatives Analytics with Python: Data Analysis, Models, Simulation, Calibration and Hedging, Wiley
• Michael Heydt (2015), Mastering Pandas for Finance, Packt Publishing
• Michael Heydt (2015), Learning Pandas - Python Data Discovery and Analysis Made Easy, Packt Publishing
• James Ma Weiming (2015), Mastering Python for Finance, Packt Publishing
• Fabio Nelli (2015), Python Data Analytics: Data Analysis and Science using PANDAs, matplotlib and the Python Programming Language, Apress
• Ivan Idris (2015), Numpy Beginner's Guide, Third Edition, Packt Publishing
• Wes McKinney (2013), 10-minute tour of pandas, https://vimeo.com/59324550
• Jason Wirth (2015), A Visual Guide To Pandas, https://www.youtube.com/watch?v=9d5-Ti6onew
• Edward Schofield (2013), Modern scientific computing and big data analytics in Python, PyCon Australia, https://www.youtube.com/watch?v=hqOsfS3dP9w
• Avinash Jain (2017), Introduction To Python Programming, Udemy, https://www.udemy.com/pythonforbeginnersintro/
• Alfred Essa (2015), Awesome Data Science: 1.0 Jupyter Notebook Tour, https://www.youtube.com/watch?v=e9cSF3eVQv0
• Ties de Kok (2017), Learn Python for Research, https://github.com/TiesdeKok/LearnPythonforResearch
• Python Programming, https://pythonprogramming.net/
• Python, https://www.python.org/
• Python Programming Language, http://pythonprogramminglanguage.com/
• Numpy, http://www.numpy.org/
• Numpy Tutorial, https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
• Pandas, http://pandas.pydata.org/