Introducing Python Pandas
Based on CBSE Curriculum
Class -11
By-
Neha Tyagi
PGT CS
KV 5 Jaipur II Shift
Jaipur Region
Neha Tyagi, KV 5 Jaipur II Shift
Introduction
• Pandas or Python Pandas is a library of Python which is used
for data analysis.
• The term Pandas is derived from “panel data”, which
is an econometrics term for multidimensional, structured data
sets.
• Nowadays, Pandas has become a popular option for data
analysis.
• Pandas provides various tools for data analysis in simpler
form.
• Pandas is an open-source, BSD-licensed library built for the Python
programming language.
• Pandas offers high-performance, easy-to-use data structures
and data analysis tools.
• The main author of Pandas is Wes McKinney.
• In this chapter, we will learn about Pandas.
Installing Pandas
• The “pip” command is used to install Pandas. To run it, open the
location where pip is stored in the command prompt (cmd): go to the
folder in Windows where the pip file is stored.
In Windows, after reaching the location, hold Shift and
right-click to get the option
“Open Command Window Here”. Clicking it
opens the command prompt at the same
path.
Installing Pandas
• In the command window, run the command “pip install pandas”.
• When the command finishes, pip reports that Pandas was installed successfully.
Using Pandas
• Before proceeding, we first need to import Pandas.
The help(pandas) command then gives you all the information
about the Pandas module.
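A minimal sketch of the import step, assuming Pandas is already installed. The alias pd is a common convention and is optional; the variable name version is only illustrative.

```python
import pandas as pd  # pd is a conventional, optional alias for pandas

# Reading the version string is a quick check that the import worked.
version = pd.__version__
print(version)

# help(pd) prints the full module documentation. It is very long,
# so the call is left commented out in this sketch.
# help(pd)
```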
Features of Pandas
• Pandas is the most popular library in the Scientific Python
ecosystem for doing data analysis. Pandas is capable of
many tasks, including-
1. It can read or write in many different data formats (integer, float, double,
etc.).
2. It can calculate in all the ways data is organized, i.e. across rows and down columns.
3. It can easily select subsets of data from bulky data sets and even
combine multiple datasets together.
4. It has functionality to find and fill missing data.
5. It allows you to apply operations to independent groups within the data.
6. It supports reshaping of data into different forms.
7. It supports advanced time-series functionality (time-series forecasting is the use of a model
to predict future values based on previously observed values).
8. It supports visualization by integrating libraries such as matplotlib and seaborn.
Pandas is best at handling huge tabular data sets comprising different data
formats.
NumPy Arrays
• Before proceeding to Pandas’ data structures, let us have a brief
review of NumPy arrays because-
1. Some Pandas functions return their result in the form of a NumPy array.
2. It will give you a jumpstart with data structures.
• NumPy (“Numerical Python” or “Numeric Python”) is an open source
module of Python that provides functions for fast mathematical
computation on arrays and matrices.
• To use NumPy, it must first be imported. The syntax for that is-
>>> import numpy as np
(here np is an alias for numpy, which is optional)
• NumPy arrays come in two forms-
• 1-D arrays – also known as vectors.
• Multidimensional arrays – also known as matrices.
See the difference between a list and an array.
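The difference between a list and a 1-D array can be seen with a short sketch. The names numbers and vector are illustrative only.

```python
import numpy as np

numbers = [1, 2, 3, 4]        # an ordinary Python list
vector = np.array(numbers)    # a 1-D NumPy array built from the list

print(type(numbers))   # <class 'list'>
print(type(vector))    # <class 'numpy.ndarray'>
print(vector.ndim)     # 1, i.e. a one-dimensional array (vector)
```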
2D NumPy Arrays
The example creates a 2D array with the help of a list, prints the array, accesses array elements with an index, and shows the type and the shape of the array (using different functions).
NumPy arrays are also known as ndarray (n-dimensional array).
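A small sketch of these 2D array steps, using made-up values. The names matrix and element are illustrative and not taken from the original slide.

```python
import numpy as np

# Create a 2D array from a list of lists (2 rows, 3 columns).
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix)              # printing the array
element = matrix[1, 2]     # row 1, column 2 (both counted from 0)
print(element)             # 6
print(type(matrix))        # <class 'numpy.ndarray'>

# The shape is available as an attribute and through a function.
print(matrix.shape)        # (2, 3)
print(np.shape(matrix))    # (2, 3)
```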
NumPy Arrays Vs Python Lists
• Although a NumPy array holds elements like a Python list,
NumPy arrays are a different data structure from Python
lists. The key differences are-
• Once a NumPy array is created, you cannot change its size;
you have to create a new array or overwrite the existing
one.
• A NumPy array contains elements of a homogeneous type, unlike
Python lists.
• An equivalent NumPy array occupies much less space than a
Python list.
• NumPy arrays support vectorized operations, i.e. an operation is
applied to every item at once without writing a loop, which is not
possible with a list.
The same arithmetic expression can generate an error on a list
but is executed element by element on an
array.
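A sketch of this difference, using division as the example operation (the choice of operation and all names are assumptions for illustration):

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array(values_list)

# Dividing a list by a number is not defined and raises TypeError.
try:
    values_list / 2
    list_error = None
except TypeError as exc:
    list_error = exc

# The same expression on an array is applied to every element.
halved = values_array / 2
print(halved)        # [0.5 1.  1.5]
print(list_error)
```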
NumPy Data Types
NumPy supports the following data types-
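A few commonly used NumPy types, shown as a small sketch. The selection is illustrative and is not the complete list; the variable names are assumptions.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)      # 32-bit integers
floats = np.array([1, 2, 3], dtype=np.float64)  # 64-bit floating point
flags = np.array([1, 0, 1], dtype=np.bool_)     # booleans

# itemsize is the number of bytes used by one element.
print(ints.dtype, ints.itemsize)      # int32 4
print(floats.dtype, floats.itemsize)  # float64 8
print(flags)                          # [ True False  True]
```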
Ways to Create NumPy Arrays
• The empty() function can be used to create an empty, i.e.
uninitialized, array of a specified shape and dtype.
numpy.empty(shape, [dtype=<datatype>,] [order = ‘C’ or ‘F’])
Where: dtype: is a Python or NumPy data type for the elements.
shape: is the dimensions of the array.
order: ‘C’ means the data is arranged row-wise (C means C-like).
order: ‘F’ means the data is arranged column-wise (F means Fortran-like).
Here, the array is of all zeros.
Here, the array is of all garbage
values and of the default type
“float”.
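A minimal sketch of empty(), with made-up shapes. Because empty() does not initialize memory, its contents are unpredictable; numpy.zeros() is used here as an assumption to show an array that is guaranteed to hold zeros.

```python
import numpy as np

# Uninitialized 2x3 array; the default dtype is float64 and the
# values are whatever happened to be in memory ("garbage").
garbage = np.empty((2, 3))
print(garbage.dtype)     # float64

# zeros() guarantees that every element is 0.
zeros = np.zeros((2, 3), dtype=int)
print(zeros)

# order='F' stores the data column-wise (Fortran-like).
fortran = np.empty((2, 3), dtype=np.int32, order="F")
print(fortran.flags["F_CONTIGUOUS"])   # True
```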
Ways to Create NumPy Arrays
1. The arange() function is used to create an array from a range.
<arrayname> = numpy.arange([start,] stop, [step,] [dtype])
Here, only the stop value is
passed.
Here, values run from 1 up to (but not including) 7 in steps of
2.
2. The linspace() function can be used to create an array of evenly spaced values over a range.
<arrayname> = numpy.linspace(start, stop, [num,] [dtype])
Here, an array of 6 values is created between
the values 2 and 3.
Here, an array of 8 values is created between the values 2.5 and 8.
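The four cases described above can be sketched as follows. The variable names are illustrative.

```python
import numpy as np

only_stop = np.arange(5)         # start defaults to 0
print(only_stop)                 # [0 1 2 3 4]

stepped = np.arange(1, 7, 2)     # from 1, below 7, in steps of 2
print(stepped)                   # [1 3 5]

# linspace takes the number of values, not a step, and includes
# both end points by default.
six_values = np.linspace(2, 3, 6)
print(six_values)

eight_values = np.linspace(2.5, 8, 8)
print(eight_values)
```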
Pandas Data Structure
A data structure is a particular way of storing and organizing
data in a computer so that it can be accessed and worked with in
appropriate ways. For example-
- If you want to store similar types of data items together and
process them in an identical way, an array is the solution.
- If you want to store data in such a way that you get access
to the very last data item you inserted, a stack is the solution.
- If you want to store data in such a way that the data item inserted
first gets accessed first, a queue is the solution.
There are many other types of data structures suited to
different types of functionality.
Next, we will learn about the Series and DataFrame data
structures of Pandas.
Series Data Structure
– Series is a data structure of pandas. It represents a 1D array
of indexed data.
– It has two main components-
• An array of actual data.
• An associated array of indexes or data labels.
– Both components are 1D arrays with the same length.
Index  Data      Index  Data      Index  Data
0      21        Jan    31        ‘A’    91
1      23        Feb    28        ‘B’    81
2      18        Mar    31        ‘C’    71
3      25        Apr    30        ‘D’    61
Examples of Series type objects.
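The three example objects in the table can be built as a sketch like this. The variable names marks, days and scores are assumptions chosen for illustration.

```python
import pandas as pd

# Default index: 0, 1, 2, 3
marks = pd.Series([21, 23, 18, 25])

# Month names as index labels
days = pd.Series([31, 28, 31, 30], index=["Jan", "Feb", "Mar", "Apr"])

# Single letters as index labels
scores = pd.Series([91, 81, 71, 61], index=["A", "B", "C", "D"])

print(marks)
print(days)
print(scores)
```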
Creation of Series Objects
– There are many ways to create a Series type object.
1. Using Series()-
<Series Object> = pandas.Series() creates an empty series.
2. Non-empty series creation–
import pandas as pd
<Series Object> = pd.Series(data, index=idx) where data can be a
Python sequence, ndarray, Python dictionary or scalar value.
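A short sketch of both forms with made-up data. The dtype passed to the empty series is an assumption, added because some Pandas versions warn when an empty series is created without one.

```python
import numpy as np
import pandas as pd

# Empty series (explicit dtype avoids a warning in some versions).
empty_series = pd.Series(dtype="float64")

# From a Python sequence, with explicit index labels.
from_list = pd.Series([10, 20, 30], index=["a", "b", "c"])

# From an ndarray, with the default index 0, 1, 2.
from_array = pd.Series(np.arange(3, 6))

print(len(empty_series))   # 0
print(from_list)
print(from_array)
```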
Series Objects creation
1. Creation of a series with a dictionary-
The keys of the dictionary become the index.
2. Creation of a series with a scalar value-
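Both cases as a sketch with made-up values. With a scalar, an index has to be supplied so that Pandas knows how many times to repeat the value.

```python
import pandas as pd

# Dictionary: the keys become the index, the values become the data.
from_dict = pd.Series({"Jan": 31, "Feb": 28, "Mar": 31})
print(from_dict)

# Scalar: the single value is repeated for every index label.
from_scalar = pd.Series(10, index=["x", "y", "z"])
print(from_scalar)
```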
Creation of Series Objects –Additional functionality
1. When a series with missing values needs to be created, the
missing data can be filled with a NaN (“Not a
Number”) value.
2. Index can also be given as-
A loop is used to give the index.
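A sketch of both ideas with made-up data. numpy.nan supplies the NaN value, and a loop (written here as a list comprehension) generates the index labels.

```python
import numpy as np
import pandas as pd

# A missing entry is represented by NaN.
with_gap = pd.Series([6.5, np.nan, 2.34])
print(with_gap)

# The index labels 'a'..'e' are produced by looping over a string.
labelled = pd.Series(range(1, 15, 3), index=[x for x in "abcde"])
print(labelled)
```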
Creation of Series Objects –Additional functionality
3. A dtype can also be passed along with the data and index.
Important: indices need not be unique, but looking up a
duplicated index label returns every matching entry instead of a
single value, and operations such as reindexing raise an error.
4. A mathematical function/expression can also be used-
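These points can be sketched as follows with made-up data. All names are illustrative.

```python
import numpy as np
import pandas as pd

# Passing a dtype together with the data and index.
typed = pd.Series([1, 2, 3], index=["a", "b", "c"], dtype=np.float64)
print(typed.dtype)      # float64

# Using an expression to compute the data: squares of 9..12.
base = np.arange(9, 13)
squares = pd.Series(data=base ** 2, index=base)
print(squares)

# Duplicate index labels are allowed; a lookup returns all matches.
dupes = pd.Series([1, 2, 3], index=["a", "a", "b"])
matches = dupes["a"]
print(matches)
```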
Series Object Attributes
Some common attributes-
<series object>.<AttributeName>
Attribute         Description
Series.index      Returns the index of the series
Series.values     Returns the data as an ndarray
Series.dtype      Returns the dtype object of the underlying data
Series.shape      Returns a tuple of the shape of the underlying data
Series.nbytes     Returns the number of bytes of the underlying data
Series.ndim       Returns the number of dimensions
Series.size       Returns the number of elements
Series.itemsize   Returns the size of the dtype in bytes (removed in Pandas 1.0; use Series.dtype.itemsize)
Series.hasnans    Returns True if there are any NaN values
Series.empty      Returns True if the series object is empty
Series Object Attributes
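A sketch that reads several of these attributes from a small series with made-up data. The element size is read through the dtype, which works across Pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, np.nan, 3.0], index=["a", "b", "c"])

print(s.index)           # the index labels
print(s.values)          # the data as an ndarray
print(s.dtype)           # float64
print(s.shape)           # (3,)
print(s.nbytes)          # 24 (3 elements x 8 bytes)
print(s.ndim)            # 1
print(s.size)            # 3
print(s.dtype.itemsize)  # 8
print(s.hasnans)         # True
print(s.empty)           # False
```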
Accessing Series Object
The examples show printing the object's values, printing an individual value, and object slicing.
For object slicing, use the following syntax-
<objectName>[<start>:<stop>:<step>]
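A sketch of these access patterns on a series with made-up data. Slicing with integers works by position, even when the index labels are strings.

```python
import pandas as pd

s = pd.Series([31, 28, 31, 30, 31],
              index=["Jan", "Feb", "Mar", "Apr", "May"])

print(s)                  # printing the whole object

single = s["Feb"]         # an individual value, by index label
print(single)             # 28

first_three = s[0:3]      # slice: positions 0, 1 and 2
print(first_three)

every_other = s[::2]      # slice with a step of 2
print(every_other)
```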
Operations on Series Object
1. Elements modification-
<series object>[index] = <new_data_value>
This form changes an individual value; using a slice as the index changes the values in a
certain slice.
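Both kinds of modification as a sketch with made-up data:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

s["b"] = 25     # change an individual value by its index label
s[2:4] = 0      # change every value in a slice (positions 2 and 3)

print(s)
```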
Operations on Series Object
2. It is possible to change the indexes-
<series object>.index = <new_index_array>
Here, the indexes get
changed.
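A sketch of re-labelling a series with made-up data. The new index must have the same length as the series.

```python
import pandas as pd

s = pd.Series([10, 20, 30])     # default index 0, 1, 2
s.index = ["x", "y", "z"]       # assign new index labels

print(s)
print(s["y"])                   # 20
```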
head() and tail() Functions
1. The head(<n>) function fetches the first n rows from a Pandas object. If
you do not provide a value for n, it returns the first 5 rows.
2. The tail(<n>) function fetches the last n rows from a Pandas object. If
you do not provide a value for n, it returns the last 5 rows.
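A sketch of both functions on a ten-element series with made-up data:

```python
import pandas as pd

s = pd.Series(range(10))

first_five = s.head()      # no argument: first 5 rows
first_two = s.head(2)      # first 2 rows
last_three = s.tail(3)     # last 3 rows

print(first_five)
print(first_two)
print(last_three)
```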
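Series Objects - Vector Operations
An operation applied to a Series object acts on every element; all these are
vector operations.
Series Objects - Arithmetic Operations
Arithmetic operations between two objects are performed on
elements with the same index; an index without a match
results in NaN.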
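A sketch of both behaviours with two made-up series that share only some index labels:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Vector operations: applied to every element.
plus_two = a + 2
doubled = a * 2
print(plus_two)
print(doubled)

# Arithmetic between two series: values are matched by index label.
# 'y' and 'z' exist in both; 'x' and 'w' do not, so they become NaN.
total = a + b
print(total)
```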
We can also store these results in other objects.
Entries Filtering
<seriesObject>[<boolean expression on the series>]
Other feature
A value can also be deleted by its
index.
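A sketch of storing results, filtering and deleting, with made-up data. drop() is used here as one way to delete an entry by its index label; it returns a new series and leaves the original unchanged.

```python
import pandas as pd

s = pd.Series([5, 12, 8, 20], index=["a", "b", "c", "d"])

# Store the result of an operation in another object.
result = s * 2

# Filtering: a boolean expression selects the matching entries.
large = s[s > 8]
print(large)

# Delete the entry with index label 'c'.
trimmed = s.drop("c")
print(trimmed)
```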
Difference between NumPy Arrays and Series Objects
1. In the case of ndarrays, a vector operation is possible only
when the ndarrays have the same (or a broadcast-compatible) shape. In the case
of series objects, values are aligned by matching
index, and NaN is returned where there is no match.
2. In an ndarray, the index always starts from 0 and is always
numeric. In a series, the index can be of any
type, including numbers, and need not start
from 0.
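Both differences can be sketched with made-up data. The names are illustrative.

```python
import numpy as np
import pandas as pd

# ndarrays of different shapes cannot be added.
x = np.array([1, 2, 3])
y = np.array([10, 20])
try:
    x + y
    shape_error = None
except ValueError as exc:
    shape_error = exc
print(shape_error)

# Series of different lengths are aligned by index; the unmatched
# index 2 gives NaN.
s1 = pd.Series([1, 2, 3])
s2 = pd.Series([10, 20])
aligned = s1 + s2
print(aligned)

# A series index can be any labels and need not start from 0.
shifted = pd.Series([1, 2, 3], index=[5, 6, 7])
print(shifted[6])    # 2
```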
Thank you
Please follow us on our blog
www.pythontrends.wordpress.com