Stock Market Data And Analysis In Python
Python For Trading
Aug 06, 2019
15 min read
By Ishan Shah
In this article, you will learn how to get stock market data such as price, volume
and fundamental data using Python packages, and how to analyze it.
When backtesting your strategies or analyzing their performance, one of the first
hurdles is getting the right stock market data in the right format, isn't it?
Don't worry.
After reading this, you will be able to:
Fetch the open, high, low, close, and volume data.
Get data at a custom frequency such as 1 minute, 7 minutes or 2 hours
Perform analysis of your portfolio
Get the earnings data, balance sheet data, cash flow statements and
various key ratios such as price to earnings (PE) and price to book value
(PB)
Get the futures and options data for the Indian stock market
Web sources are generally unstable, so you will learn to get the stock market
data from multiple web sources.
For easy navigation, this article is divided as below.
1. Price Volume Daily Data
2. Intraday Data
3. Fundamental Data
4. Futures and Options Data
5. Visualization and Analysis
Price Volume Daily Data
Yahoo Finance
One of the first sources from which you can get daily price-volume stock market
data is Yahoo Finance. You can use the pandas_datareader or the yfinance
module to get the data.
In [ ]:
!pip install pandas_datareader==0.7.0
In [22]:
# Import pandas datareader
import pandas_datareader
pandas_datareader.__version__
Out[22]:
'0.7.0'
In [7]:
# Yahoo has recently become an unstable data source.
# If it gives an error, you may run the cell again, or try yfinance
import pandas as pd
# Import the module under an alias so that the dataframe named
# `data` below does not overwrite it when the cell is run again
from pandas_datareader import data as pdr
# Set the start and end date
start_date = '1990-01-01'
end_date = '2019-02-01'
# Set the ticker
ticker = 'AMZN'
# Get the data
data = pdr.get_data_yahoo(ticker, start_date, end_date)
data.head()
Out[7]:
Date        High      Low       Open      Close     Volume      Adj Close
1997-05-15  2.500000  1.927083  2.437500  1.958333  72156000.0  1.958333
1997-05-16  1.979167  1.708333  1.968750  1.729167  14700000.0  1.729167
1997-05-19  1.770833  1.625000  1.760417  1.708333   6106800.0  1.708333
1997-05-20  1.750000  1.635417  1.729167  1.635417   5467200.0  1.635417
1997-05-21  1.645833  1.375000  1.635417  1.427083  18853200.0  1.427083
To visualize the adjusted close price data, you can use the matplotlib library and
plot method as shown below.
In [9]:
import matplotlib.pyplot as plt
%matplotlib inline
data['Adj Close'].plot()
plt.show()
Let us improve the plot by resizing, giving appropriate labels and adding grid
lines for better readability.
In [10]:
# Plot the adjusted close price
data['Adj Close'].plot(figsize=(10, 7))
# Define the label for the title of the figure
plt.title("Adjusted Close Price of %s" % ticker, fontsize=16)
# Define the labels for x-axis and y-axis
plt.ylabel('Price', fontsize=14)
plt.xlabel('Year', fontsize=14)
# Plot the grid lines
plt.grid(which="major", color='k', linestyle='-.', linewidth=0.5)
# Show the plot
plt.show()
Advantages
1. Adjusted close price stock market data is available
2. Most recent stock market data is available
3. Doesn't require API key to fetch the stock market data
Disadvantages
1. It is not a stable source to fetch the stock market data
If fetching the stock market data from Yahoo Finance fails with
pandas_datareader, you can use the yfinance package to fetch the
data instead.
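The fallback described above can be sketched as a small helper that tries each source in turn. This is only an illustration: `fetch_with_fallback` and the two wrapper functions are hypothetical names, not part of either package, and the wrappers import their package lazily so that a missing install only disables that one source. The `yf.download` call assumes the yfinance package is installed.

```python
import pandas as pd


def fetch_pandas_datareader(ticker, start, end):
    # Imported lazily so a missing package only disables this source
    from pandas_datareader import data as pdr
    return pdr.get_data_yahoo(ticker, start, end)


def fetch_yfinance(ticker, start, end):
    # Imported lazily for the same reason
    import yfinance as yf
    return yf.download(ticker, start=start, end=end)


def fetch_with_fallback(ticker, start, end, fetchers) -> pd.DataFrame:
    """Try each fetcher in order and return the first non-empty dataframe."""
    errors = []
    for fetch in fetchers:
        try:
            df = fetch(ticker, start, end)
        except Exception as exc:  # web sources can fail in many ways
            errors.append(f"{fetch.__name__}: {exc}")
            continue
        if df is not None and not df.empty:
            return df
        errors.append(f"{fetch.__name__}: returned no rows")
    raise RuntimeError("All sources failed: " + "; ".join(errors))


# Example call (needs network access, so it is left commented out):
# data = fetch_with_fallback('AMZN', '1990-01-01', '2019-02-01',
#                            [fetch_pandas_datareader, fetch_yfinance])
```

Keeping the list of fetchers as an argument makes it easy to add another source later without touching the retry logic.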
Quandl
Quandl offers many data sources for different types of data. Some of them are
free and some are paid. Wiki is Quandl's free data source for the end-of-day
prices of 3000+ US equities.
It is curated by the Quandl community and also provides information about
dividends and splits.
To get the stock market data, you need to first install the quandl module if it is
not already installed using the pip command as shown below.
In [ ]:
!pip install quandl
You need to get your own API key from Quandl to get the stock market data
using the code below. If you are facing issues in getting the API key, you can
refer to this link.
After you get your key, assign it to the variable QUANDL_API_KEY. Then set
the start date, end date and the ticker of the asset whose stock market data
you want to fetch.
The quandl get method takes the ticker, the date range and the API key as input
and returns the open, high, low, close, volume, adjusted values and other
information.
In [1]:
# Import quandl
import quandl
# To get your API key, sign up for a free Quandl account.
# Then, you can find your API key on the Quandl account settings page.
QUANDL_API_KEY = 'REPLACE-THIS-TEXT-WITH-A-REAL-API-KEY'
# This is to prompt you to change the Quandl key
if QUANDL_API_KEY == 'REPLACE-THIS-TEXT-WITH-A-REAL-API-KEY':
    raise Exception("Please provide a valid Quandl API key!")
# Set the start and end date
start_date = '1990-01-01'
end_date = '2018-03-01'
# Set the ticker name
ticker = 'AMZN'
# Fetch the data
data = quandl.get('WIKI/'+ticker, start_date=start_date,
                  end_date=end_date, api_key=QUANDL_API_KEY)
# Print the first 5 rows of the data
data.head()
Out[1]:
Date        Open   High   Low    Close  Volume     Ex-Dividend  Split Ratio
1997-05-16  22.38  23.75  20.50  20.75  1225000.0  0.0          1.0
1997-05-19  20.50  21.25  19.50  20.50   508900.0  0.0          1.0
1997-05-20  20.75  21.00  19.63  19.63   455600.0  0.0          1.0
1997-05-21  19.25  19.75  16.50  17.13  1571100.0  0.0          1.0
1997-05-22  17.25  17.38  15.75  16.75   981400.0  0.0          1.0
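The Close values in this output are roughly 12 times the Yahoo Finance closes shown earlier for the same dates. The short check below illustrates one likely explanation: the Yahoo figures are split-adjusted, while the raw Quandl Close column is not. The factor of 12 is an assumption based on AMZN's 2-for-1, 3-for-1 and 2-for-1 splits in 1998 and 1999. The numbers are copied by hand from the two outputs, so no API key is needed to run it.

```python
import pandas as pd

dates = pd.to_datetime(["1997-05-16", "1997-05-19", "1997-05-20", "1997-05-21"])

# Raw close and volume copied from the Quandl output above
quandl_raw = pd.DataFrame(
    {"Close": [20.75, 20.50, 19.63, 17.13],
     "Volume": [1225000.0, 508900.0, 455600.0, 1571100.0]},
    index=dates,
)

# Close and volume copied from the Yahoo Finance output earlier
yahoo = pd.DataFrame(
    {"Close": [1.729167, 1.708333, 1.635417, 1.427083],
     "Volume": [14700000.0, 6106800.0, 5467200.0, 18853200.0]},
    index=dates,
)

# Assumption: cumulative split factor of 2 * 3 * 2 = 12
SPLIT_FACTOR = 2 * 3 * 2

# A split divides the price and multiplies the share count
adjusted = pd.DataFrame({
    "Close": quandl_raw["Close"] / SPLIT_FACTOR,
    "Volume": quandl_raw["Volume"] * SPLIT_FACTOR,
})

close_gap = (adjusted["Close"] - yahoo["Close"]).abs().max()
volume_gap = (adjusted["Volume"] - yahoo["Volume"]).abs().max()

print(adjusted.round(6))
print("Largest close difference:", round(close_gap, 6))
print("Largest volume difference:", volume_gap)
```

The small remaining difference in the close comes from the Quandl prices being rounded to two decimals. This is why the plots use the adjusted close column rather than the raw close.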
In [3]:
# Define the figure size for the plot
plt.figure(figsize=(10, 7))
# Plot the adjusted close price
data['Adj. Close'].plot()
# Define the label for the title of the figure
plt.title("Adjusted Close Price of %s" % ticker, fontsize=16)
# Define the labels for x-axis and y-axis
plt.ylabel('Price', fontsize=14)
plt.xlabel('Year', fontsize=14)
# Plot the grid lines
plt.grid(which="major", color='k', linestyle='-.', linewidth=0.5)
plt.show()
Get stock market data for multiple tickers
To get the stock market data of multiple stock tickers, you can create a list of
tickers and call the quandl get method for each stock ticker.[1]
For simplicity, I have created a dataframe data to store the adjusted close
price of the stocks.
In [4]:
# Import pandas
import pandas as pd
# Define the ticker list
tickers_list = ['AAPL', 'IBM', 'MSFT', 'WMT']
# Create an empty dataframe with one column per ticker
data = pd.DataFrame(columns=tickers_list)
# Fetch the data
for ticker in tickers_list:
    data[ticker] = quandl.get('WIKI/' + ticker, start_date=start_date,
                              end_date=end_date, api_key=QUANDL_API_KEY)['Adj. Close']
# Print first 5 rows of the data
data.head()
Out[4]:
Date AAPL IBM MSFT WMT
1990-01-02 1.118093 14.138144 0.410278 4.054211
1990-01-03 1.125597 14.263656 0.412590 4.054211
1990-01-04 1.129499 14.426678 0.424702 4.033561
1990-01-05 1.133101 14.390611 0.414300 3.990541
1990-01-08 1.140605 14.480057 0.420680 4.043886
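One detail of the loop above is worth knowing: the first ticker assigned to the empty dataframe sets its date index, and every later ticker is aligned to those dates. If the tickers have different listing dates, rows outside the first ticker's history are dropped. A possible alternative, sketched below, is to collect the series in a dictionary and join them with `pd.concat`, which keeps the union of all dates. The helper name `combine_adjusted_close` and its `fetch` argument are hypothetical and only used for illustration.

```python
import pandas as pd


def combine_adjusted_close(tickers, fetch):
    """Return one dataframe with a column per ticker.

    `fetch` is a hypothetical callable that takes a ticker and returns
    a date-indexed pandas Series, for example the 'Adj. Close' column.
    """
    series = {ticker: fetch(ticker) for ticker in tickers}
    # axis=1 places each series in its own column; the default outer
    # join keeps every date that appears in at least one series
    return pd.concat(series, axis=1).sort_index()


# Example with the Quandl call used in this article (needs an API key,
# so it is left commented out):
# data = combine_adjusted_close(
#     tickers_list,
#     lambda t: quandl.get('WIKI/' + t, start_date=start_date,
#                          end_date=end_date,
#                          api_key=QUANDL_API_KEY)['Adj. Close'])
```

Dates on which a ticker has no data appear as NaN in that ticker's column.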
In [5]:
# Plot all the close prices
data.plot(figsize=(10, 7))
# Show the legend
plt.legend()
# Define the label for the title of the figure
plt.title("Adjusted Close Price", fontsize=16)
# Define the labels for x-axis and y-axis
plt.ylabel('Price', fontsize=14)
plt.xlabel('Year', fontsize=14)
# Plot the grid lines
plt.grid(which="major", color='k', linestyle='-.', linewidth=0.5)
plt.show()
Advantages
1. It is free of cost
2. Has split and dividend-adjusted stock market data
Disadvantages
1. Only available till 27-March-2018
Intraday Data
Alpha Vantage
Alpha Vantage can be used to get minute-level stock market data. You need to
sign up on Alpha Vantage to get the free API key.
In [ ]:
# Install the alpha_vantage if not already installed
!pip install alpha_vantage
Assign your API key to ALPHA_VANTAGE_API_KEY in the code below.
In [12]:
# Import TimeSeries class
from alpha_vantage.timeseries import TimeSeries
ALPHA_VANTAGE_API_KEY = 'REPLACE-THIS-TEXT-WITH-A-REAL-API-KEY'
# This is to prompt you to change the ALPHA_VANTAGE Key
if ALPHA_VANTAGE_API_KEY == 'REPLACE-THIS-TEXT-WITH-A-REAL-API-KEY':
    raise Exception("Please provide a valid Alpha Vantage API key!")
# Initialize the TimeSeries class with key and output format
ts = TimeSeries(key=ALPHA_VANTAGE_API_KEY, output_format='pandas')
# Get pandas dataframe with the intraday data and information of the data
intraday_data, data_info = ts.get_intraday(
    'GOOGL', outputsize='full', interval='1min')
# Print the information of the data
data_info
Out[12]:
{'1. Information': 'Intraday (1min) open, high, low, close prices and volume',
 '2. Symbol': 'GOOGL',
 '3. Last Refreshed': '2019-08-01 16:00:00',
 '4. Interval': '1min',
 '5. Output Size': 'Full size',
 '6. Time Zone': 'US/Eastern'}
This gives information about the stock market data which is returned. The
information includes the type of data returned such as open, high, low and
close, the symbol or ticker of the stock, last refresh time of the data, frequency
of the stock market data and the time zone.
In [13]:
# Print the intraday data
intraday_data.head()
Out[13]:
Date                 Open       High     Low        Close      Volume
2019-07-26 09:31:00  1228.2300  1232.49  1228.0000  1230.7898  407037.0
2019-07-26 09:32:00  1230.9200  1235.13  1230.4301  1233.0000  111929.0
2019-07-26 09:33:00  1233.0000  1237.90  1232.7500  1237.9000   86564.0
2019-07-26 09:34:00  1237.4449  1241.90  1237.0000  1241.9000  105884.0
2019-07-26 09:35:00  1241.9399  1244.49  1241.3500  1243.1300   74444.0
In [19]:
intraday_data['4. close'].plot(figsize=(10, 7))
# Define the label for the title of the figure
plt.title("Close Price", fontsize=16)
# Define the labels for x-axis and y-axis
plt.ylabel('Price', fontsize=14)
plt.xlabel('Time', fontsize=14)
# Plot the grid lines
plt.grid(which="major", color='k', linestyle='-.', linewidth=0.5)
plt.show()
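The plotting cell above selects the close price with the column name '4. close', while the printed table shows plain Open, High, Low, Close and Volume headers. If you prefer the plain names in your own code, a small rename step is one option. The sketch below assumes the remaining columns follow the same numbered pattern ('1. open', '2. high' and so on). `strip_numeric_prefix` is a hypothetical helper, not part of the alpha_vantage package.

```python
import pandas as pd


def strip_numeric_prefix(df):
    """Return a copy with '4. close'-style column names changed to 'Close'."""
    # Split once on '. ' and keep the part after the number;
    # names without a prefix are only capitalised
    renamed = {col: col.split('. ', 1)[-1].title() for col in df.columns}
    return df.rename(columns=renamed)


# Example with the dataframe fetched above (left commented out because
# it depends on the Alpha Vantage call):
# intraday_data = strip_numeric_prefix(intraday_data)
# intraday_data['Close'].plot(figsize=(10, 7))
```

The function returns a new dataframe, so the original column names stay available if other cells still rely on them.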
Get data at a custom frequency
During strategy modelling, you may need to work with stock market data at a
custom frequency such as 7 minutes or 35 minutes. Candles at such custom
frequencies are not provided by data vendors or web sources.
In this case, you can use the pandas resample method to convert the stock
market data to the frequency of your choice. The implementation is shown
below, where 1-minute frequency data is converted to 10-minute frequency
data.
The first step is to define the dictionary with the conversion logic. For example,
to get the open value the first value will be used, to get the high value the