Python for Data Science
ibm200620@yahoo.com
FK4AW1IM38
This file is meant for personal use by ibm200620@yahoo.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Agenda
1. Pop Quiz
2. Common Python Libraries for Data Science
3. NumPy and Pandas
4. Common NumPy functions
5. Common Pandas functions
6. Merge vs Join in Pandas
7. Example of Join
Pop Quiz
1. What are the data types in Python?
2. What are some of the common Python libraries for Data Science?
3. Can you list some of the common functions in Pandas?
4. What are the applications of functions like groupby, merge, and join?
Common Python Libraries for Data Science
Library               Use
NumPy                 Handling multi-dimensional arrays
SciPy                 Scientific computation package
Matplotlib, Seaborn   Data visualisation
Pandas                Handling tabular data
Scikit-learn          Machine learning
NumPy
● Stands for Numerical Python
● It is one of the fundamental packages for mathematical, logical, and statistical operations with
Python
● It contains
○ Powerful N-dimensional array object, called ndarray
○ Large set of functions for creating, manipulating, and transforming ndarrays
● ndarrays can only contain data of a single datatype
● Useful for linear algebra, vector calculus, random number generation, etc.
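The points above can be illustrated with a minimal sketch. The array values and variable names are made up for illustration; only the NumPy calls themselves are standard.

```python
import numpy as np

# A 2-D ndarray built from nested lists; every element shares one dtype.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.ndim, matrix.shape)  # number of axes, and the size along each axis

# Mixing ints and floats upcasts the whole array to a single float dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# Element-wise arithmetic and a matrix product (linear algebra).
doubled = matrix * 2
product = matrix @ matrix.T  # (2, 3) @ (3, 2) -> (2, 2)

# Random numbers from a seeded generator, so repeated runs give the same values.
rng = np.random.default_rng(seed=0)
samples = rng.random(3)
print(product)
print(samples)
```

Because an ndarray holds a single datatype, NumPy converts mixed inputs to a common type rather than storing each element separately.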
Pandas
● Pandas is one of the fundamental packages for analysis and manipulation of tabular data
● Offers two major data structures: Series and DataFrame
● A pandas dataframe can be thought of as an Excel spreadsheet that stores data in rows and
columns.
● A pandas dataframe is made up of several pandas series
○ Each column of a dataframe is a series.
● Pandas dataframes can contain data of multiple datatypes
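A small sketch of these ideas follows. The column names and values are hypothetical and exist only to show that a column is a series and that columns can differ in datatype.

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes one column.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "age": [31, 25, 40],
    "height_m": [1.62, 1.80, 1.75],
})

# Selecting a single column returns a Series.
ages = df["age"]
print(type(ages))

# Each column keeps its own dtype, so one dataframe can mix types.
print(df.dtypes)
```

Here `age` is stored as integers and `height_m` as floats, while `name` holds text, all in the same dataframe.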
Common NumPy Functions
Function         Description
np.array()       Create an array
np.arange()      Return evenly spaced values within a given interval (spacing set by a step size)
np.linspace()    Return evenly spaced numbers over a specified interval (spacing set by the number of points)
np.zeros()       Create an array of zeros
np.ones()        Create an array of ones
np.transpose()   Permute array dimensions
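The functions in this table can be tried out with a short sketch like the one below. The input values are arbitrary and chosen only to keep the results easy to read.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # array from a nested list, shape (2, 3)

steps = np.arange(0, 10, 2)        # start, stop (excluded), step
points = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced values, both endpoints included

zeros = np.zeros((2, 3))  # 2 x 3 array filled with 0.0
ones = np.ones(4)         # 1-D array of four 1.0 values

a_t = np.transpose(a)     # axes reversed: shape (2, 3) becomes (3, 2)

print(steps)
print(points)
print(a_t.shape)
```

The main difference between `np.arange()` and `np.linspace()` is what you specify: a step size for the former, a number of points for the latter.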
Common NumPy Functions
Function              Description
np.random.rand()      Create an array of the specified shape filled with random values drawn uniformly from [0, 1)
np.random.randint()   Return random integers from low (inclusive) to high (exclusive)
np.random.randn()     Return a sample (or samples) from the "standard normal" distribution
np.concatenate()      Join a sequence of arrays along an existing axis
np.save()             Save an array to a binary file in .npy format
np.savez()            Save several arrays into a single file in uncompressed .npz format
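A minimal sketch of these functions is shown below. The file names, the seed, and the use of a temporary directory are assumptions made so the example cleans up after itself; they are not part of the original material.

```python
import os
import tempfile

import numpy as np

np.random.seed(0)  # seed the legacy global generator so runs are repeatable

uniform = np.random.rand(2, 3)          # uniform values in [0, 1), shape (2, 3)
ints = np.random.randint(1, 7, size=5)  # integers from 1 to 6 (7 is excluded)
normal = np.random.randn(4)             # draws from the standard normal distribution

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
stacked = np.concatenate([a, b], axis=0)  # rows of b appended below a

# Write to a temporary directory (hypothetical file names), then read back.
with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "stacked.npy")
    np.save(npy_path, stacked)   # one array per .npy file
    restored = np.load(npy_path)

    npz_path = os.path.join(tmp, "arrays.npz")
    np.savez(npz_path, first=a, second=b)  # several named arrays in one .npz file
    with np.load(npz_path) as archive:
        first_back = archive["first"]
        second_back = archive["second"]

print(stacked)
```

Arrays saved with `np.savez()` are retrieved by the keyword names given when saving, here `first` and `second`.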
Common Pandas Functions
Function        Description
pd.read_csv()   Read a comma-separated values (csv) file into a DataFrame
df.loc[]        Access a group of rows and columns by label(s)
df.iloc[]       Purely integer-location based indexing for selection by position
df.drop()       Drop specified labels from rows or columns
pd.concat()     Concatenate pandas objects along an axis
pd.merge()      Merge pandas dataframes on common columns or indexes
df.groupby()    Split the data into groups, apply a function to each group, and combine the results
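The sketch below runs each of these functions on a tiny, made-up sales dataset. The CSV text, column names, and variable names are hypothetical; reading from an in-memory string stands in for reading a file from disk.

```python
import io

import pandas as pd

# Hypothetical CSV text; in practice pd.read_csv usually receives a file path.
csv_text = """city,product,units
Pune,pen,10
Pune,book,4
Delhi,pen,7
Delhi,book,9
"""
sales = pd.read_csv(io.StringIO(csv_text))

# Label-based selection: rows labelled 0 through 1 (inclusive), two named columns.
by_label = sales.loc[0:1, ["city", "units"]]

# Position-based selection: first two rows (end excluded), first column.
by_position = sales.iloc[0:2, 0]

# Drop a column, or drop a row by its index label.
without_product = sales.drop(columns=["product"])
without_first_row = sales.drop(index=[0])

# Stack two dataframes vertically and renumber the index.
extra = pd.DataFrame({"city": ["Goa"], "product": ["pen"], "units": [3]})
combined = pd.concat([sales, extra], ignore_index=True)

# Merge on the shared key column "city".
regions = pd.DataFrame({"city": ["Pune", "Delhi"], "region": ["West", "North"]})
with_region = pd.merge(sales, regions, on="city")

# Split by city, apply sum to each group, combine into one Series.
units_per_city = sales.groupby("city")["units"].sum()
print(units_per_city)
```

Note that slicing with `df.loc[]` includes the end label, while slicing with `df.iloc[]` excludes the end position.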
Common Pandas Functions
Function              Description
df.value_counts()     Count how often each distinct row (or, on a series, each distinct value) occurs
df['col'].unique()    Get the unique values of a column (unique() is a series method)
df.dtypes             Get the data type of each column
df.shape              Get the shape (number of rows and columns)
df.head()             Get the first rows
df.tail()             Get the last rows
df.describe()         Get a quick statistical summary
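These inspection helpers can be sketched on a small, hypothetical dataframe as follows. Note that `unique()` is a method of a series (a single column), and the dataframe attribute for column types is `dtypes`.

```python
import pandas as pd

# Hypothetical data used only for illustration.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Goa", "Pune", "Delhi"],
    "units": [10, 7, 4, 3, 8, 9],
})

city_counts = df["city"].value_counts()  # occurrences of each distinct city
unique_cities = df["city"].unique()      # distinct values, in order of first appearance
column_types = df.dtypes                 # one dtype per column
n_rows, n_cols = df.shape                # (rows, columns)

first_rows = df.head(2)  # first 2 rows (default is 5)
last_rows = df.tail(2)   # last 2 rows (default is 5)

summary = df.describe()  # count, mean, std, min, quartiles, max for numeric columns
print(city_counts)
print(summary)
```

By default `df.describe()` summarises only the numeric columns, so `city` does not appear in `summary` here.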
Merge vs Join
• Join: The join method works best when we are joining dataframes on their indexes (though you
can specify another column to join on for the left dataframe).
• Merge: The merge method is more versatile and allows us to specify columns besides the index
to join on for both dataframes.
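The difference between the two methods can be sketched as below. The customer and order data, and all column names, are hypothetical.

```python
import pandas as pd

# Hypothetical data: customers are indexed by customer_id.
customers = pd.DataFrame(
    {"name": ["Asha", "Ben", "Chen"]},
    index=pd.Index([101, 102, 103], name="customer_id"),
)
cities = pd.DataFrame(
    {"city": ["Pune", "Goa"]},
    index=pd.Index([101, 103], name="customer_id"),
)
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "buyer_id": [101, 103, 103],
    "amount": [250, 90, 40],
})

# join: aligns the two dataframes on their indexes (a left join by default),
# so customer 102 is kept and gets a missing city.
joined_on_index = customers.join(cities)

# join with on=: match a column of the LEFT dataframe against the index of the right.
orders_named = orders.join(customers, on="buyer_id")

# merge: choose the key column on each side; neither key has to be an index.
customers_flat = customers.reset_index()
merged = pd.merge(orders, customers_flat, left_on="buyer_id", right_on="customer_id")

print(joined_on_index)
print(merged)
```

With `join`, the right dataframe is always matched on its index; `merge` is the more general tool when the keys are ordinary columns on both sides.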
Join type                      how=      Rows kept
Natural join (intersection)    'inner'   Only the rows that match in both dataframes
Full outer join (union)        'outer'   All rows from both dataframes
Left outer join                'left'    All rows of dataframe x and only those from y that match
Right outer join               'right'   All rows of dataframe y and only those from x that match
Example of Join
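As a minimal sketch, the four join types (inner, outer, left, right) can be compared on two small dataframes. The dataframes `x` and `y`, their `key` column, and the values are hypothetical; they share keys b and c only.

```python
import pandas as pd

# Hypothetical dataframes: key "a" exists only in x, key "d" only in y.
x = pd.DataFrame({"key": ["a", "b", "c"], "x_val": [1, 2, 3]})
y = pd.DataFrame({"key": ["b", "c", "d"], "y_val": [20, 30, 40]})

inner = pd.merge(x, y, on="key", how="inner")  # keys b, c
outer = pd.merge(x, y, on="key", how="outer")  # keys a, b, c, d
left = pd.merge(x, y, on="key", how="left")    # keys a, b, c
right = pd.merge(x, y, on="key", how="right")  # keys b, c, d

for name, result in [("inner", inner), ("outer", outer), ("left", left), ("right", right)]:
    print(name)
    print(result)
```

Rows that have no match on the other side are filled with missing values (NaN), for example `y_val` for key a in the left join.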
Happy Learning!