Introduction to Python
Data Analytics
Course Outline
• Session 1: Basic Concepts
• Session 2: Demonstration of Python Data Analytics
• Session 3: Hands-On Practice
Goal & Scope of This Course
We’re going to cover only the key
concepts in Python data analytics
• Python basics for data analytics
• Python data analytics libraries
• Jupyter Notebook
Quick Survey on Prior Experience
• Python
• I have experience with Python
• I have experience with programming, but not with Python
• I have no experience with programming
• Data analytics
• I have experience with data analytics
• I have no experience with data analytics
What Is Data Analytics?
Data analytics is the process and
methodology of analyzing data
to draw meaningful insight
from the data
Why Is It So Popular?
We now see the limitless potential
for gaining critical insight
by applying data analytics
Typical Process of Data Analytics
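[Figure: process diagram. Stages shown: Requirement Understanding, Data Understanding, Data Preparation, Data Exploration, Modeling & Evaluation, Deployment, Insight Development, Decision Making. The diagram distinguishes the “Decision Making Problem” from the “Modeling Problem” and labels one part “the most time-consuming part” and another “the most exciting part”.]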
Types of Data Analytics
Data analytics comes in three types:
• Descriptive analytics
• What has happened or is happening?
• “How has the population been changing?”
• Predictive analytics
• What could happen in the future?
• “How will the population change over the next ten years?”
• Prescriptive analytics
• What should we do to make that happen or not happen?
• “What actions should be taken in order to avoid the demographic cliff?”
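To make the first two questions concrete, here is a minimal sketch using numpy. The population figures are made up for illustration, and the straight-line fit is only one simple assumption about how a forecast could be produced. Prescriptive analytics is not covered by this sketch.

```python
import numpy as np

# Made-up yearly population figures (in millions), for illustration only.
years = np.array([2019, 2020, 2021, 2022, 2023])
population = np.array([50.0, 50.5, 51.0, 51.5, 52.0])

# Descriptive: "How has the population been changing?"
# Average change from one year to the next in the observed data.
average_change = np.diff(population).mean()

# Predictive: "How will the population change over the next ten years?"
# Fit a straight line to the observed years (offset from the first year
# to keep the numbers small) and extrapolate it to 2033.
offsets = years - years[0]
slope, intercept = np.polyfit(offsets, population, deg=1)
forecast_2033 = slope * (2033 - years[0]) + intercept

print(f"Average yearly change: {average_change:.2f} million")
print(f"Forecast for 2033: {forecast_2033:.1f} million")
```

A real forecast would need a model chosen and validated for the data at hand; the line fit here only shows where prediction differs from description.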
Confusion – Machine Learning vs. Data Analytics
Machine Learning vs. Data Analytics
Data analytics depends heavily on machine learning
[Figure: diagram relating AI, Machine Learning, and Deep Learning]
Confusion – AI vs. Data Analytics
AI vs. Data Analytics
The goals are different!
• AI: intelligence
• Data analytics: insight
[Figure: diagram relating AI, Machine Learning, and Deep Learning]
Python as a Programming Language
Python is a general-purpose
high-level programming language
• Web development
• Networking
• Scientific computing
• Data analytics
• …
Python as a Data Analytics Tool
The nature of Python makes it
a good fit for data analytics
• Easy to learn
• Readable
• Scalable
• Extensive set of libraries
• Easy integration with other apps
• Active community & ecosystem
Popular Python Data Analytics Libraries
Library                              Usage
numpy, scipy                         Scientific & technical computing
pandas                               Data manipulation & aggregation
mlpy, scikit-learn                   Machine learning
theano, tensorflow, keras            Deep learning
statsmodels                          Statistical analysis
nltk, gensim                         Text processing
networkx                             Network analysis & visualization
bokeh, matplotlib, seaborn, plotly   Visualization
beautifulsoup, scrapy                Web scraping
IPython & Jupyter Notebook
IPython is a command shell for
interactive computing in Python
Jupyter Notebook (formerly IPython
Notebook) is a web-based interactive
data analysis environment that
runs Python code through an IPython kernel
Comparison – R vs. Python
• The comparison between R and Python has long been one of the most debated
topics in data science communities
R vs. Python
R came from the statistics community,
whereas Python came from the computer science community
Python is often described as a challenger to R, but overall neither is a clear winner
It’s up to you to choose the one that best fits your needs
For detailed comparison, refer to https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis
Start Jupyter Notebook
jupyter notebook
Loading Python Libraries
# Import Python libraries
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import seaborn as sns
Press Shift+Enter to execute the current Jupyter cell
Reading data using pandas
In [ ]: #Read csv file
df = pd.read_csv("http://rcs.bu.edu/examples/python/data_analysis/Salaries.csv")
Note: The above command has many optional arguments to fine-tune the data import process.
There are a number of pandas functions for reading other data formats:
pd.read_excel('myfile.xlsx',sheet_name='Sheet1', index_col=None, na_values=['NA'])
pd.read_stata('myfile.dta')
pd.read_sas('myfile.sas7bdat')
pd.read_hdf('myfile.h5','df')
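As a minimal sketch of those optional arguments, the example below reads a small CSV with a non-default delimiter. The CSV text, its column names, and its values are made up for illustration and stand in for a real file or URL; `sep`, `na_values`, and `usecols` are existing `pd.read_csv` parameters.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk or at a URL.
csv_text = """name;dept;salary
Ann;Math;91000
Bob;Physics;NA
Cho;Math;105000
"""

df = pd.read_csv(
    io.StringIO(csv_text),       # a path, URL, or file-like object
    sep=";",                     # column delimiter (the default is ",")
    na_values=["NA"],            # strings to treat as missing values
    usecols=["name", "salary"],  # load only these columns
)

print(df)
```

Wrapping the text in `io.StringIO` keeps the sketch self-contained; with a real file, the first argument would simply be its path.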
Exploring data frames
In [3]: #List first 5 records
df.head()
Out[3]: [table showing the first five rows of df]
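A few other quick ways to get a first look at a data frame are sketched below. The records are made up for illustration and are not the contents of Salaries.csv.

```python
import pandas as pd

# Made-up records for illustration only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cho", "Dev", "Eve", "Fay", "Gus"],
    "dept": ["Math", "Physics", "Math", "Physics", "Math", "Math", "Physics"],
    "salary": [91000, 87000, 105000, 99000, 120000, 83000, 95000],
})

first_five = df.head()   # first 5 rows by default
last_two = df.tail(2)    # last 2 rows

print(first_five)
print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column names
print(df.dtypes)            # data type of each column
```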
Data Frame Methods
Unlike attributes, Python methods are called with parentheses.
All attributes and methods can be listed with the dir() function: dir(df)
df.method()            Description
head([n]), tail([n])   first/last n rows
describe()             generate descriptive statistics (numeric columns only, by default)
max(), min()           return max/min values for all numeric columns
mean(), median()       return mean/median values for all numeric columns
std()                  standard deviation
sample([n])            return a random sample of the data frame
dropna()               drop all the records with missing values
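The sketch below tries several of these methods on a small, made-up data frame. Passing `numeric_only=True` is an addition here: it restricts the statistics to numeric columns, which recent pandas versions require when the frame also contains text columns.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; one salary is missing on purpose.
df = pd.DataFrame({
    "dept": ["Math", "Math", "Physics", "Physics"],
    "salary": [90000.0, 110000.0, np.nan, 100000.0],
    "years": [5, 12, 3, 8],
})

print(df.shape)       # attribute: no parentheses
print(df.describe())  # method: called with parentheses

# Statistics over the numeric columns; missing values are skipped by default.
means = df.mean(numeric_only=True)
medians = df.median(numeric_only=True)
print(means["salary"])    # 100000.0
print(medians["years"])   # 6.5

complete = df.dropna()                     # keep only rows without missing values
one_row = df.sample(n=1, random_state=0)   # random sample, seeded for repeatability
```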
Summary
• Typical Python data analytics process for beginners
1. Identify the dataset of interest from a file/database/web
2. Load the dataset into a pandas DataFrame
3. Check the column names and see the first few rows
4. Derive additional columns if needed and handle missing data
5. Do analysis with visualization or apply advanced data analytics techniques
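The five steps above can be sketched end to end as follows. The inline CSV and its column names are made up for illustration, and the group average in step 5 is just one simple choice of analysis.

```python
import io

import pandas as pd

# Step 1: identify the dataset (a made-up inline CSV stands in for a file).
csv_text = """dept,salary,years
Math,90000,5
Math,120000,12
Physics,,3
Physics,100000,8
"""

# Step 2: load the dataset into a pandas DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Step 3: check the column names and look at the first few rows.
print(df.columns.tolist())
print(df.head())

# Step 4: handle missing data, then derive an additional column.
df = df.dropna(subset=["salary"]).copy()
df["salary_per_year"] = df["salary"] / df["years"]

# Step 5: analyze, here with the average salary per department.
summary = df.groupby("dept")["salary"].mean()
print(summary)
```

With matplotlib installed, `summary.plot(kind="bar")` would turn the result of step 5 into a bar chart.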
Any Questions?
Thank You!