Unit - III
Pandas
Introducing Pandas Objects
Pandas objects are enhanced versions of NumPy structured arrays in which the rows and
columns are identified with labels rather than simple integer indices.
Three fundamental Pandas data structures: the Series, DataFrame, and Index.
import numpy as np
import pandas as pd
The Pandas Series Object
A Pandas Series is a one-dimensional array of indexed data.
It can be created from a list or array as follows:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data
0 0.25
1 0.50
2 0.75
3 1.00
dtype: float64
data.values
array([ 0.25, 0.5 , 0.75, 1. ])
data.index
RangeIndex(start=0, stop=4, step=1)
data[1]
0.5
data[1:3]
1 0.50
2 0.75
dtype: float64
Series as generalized NumPy array
We can use strings as an index:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=['a', 'b', 'c', 'd'])
data
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
data['b']
0.5
We can even use non-contiguous or non-sequential indices:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=[2, 5, 3, 7])
data
2 0.25
5 0.50
3 0.75
7 1.00
dtype: float64
data[5]
0.5
Series as specialized dictionary
population_dict = {'California': 38332521,
'Texas': 26448193,
'New York': 19651127,
'Florida': 19552860,
'Illinois': 12882135}
population = pd.Series(population_dict)
population
California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
dtype: int64
population['California']
38332521
population['California':'Illinois']
California 38332521
Florida 19552860
Illinois 12882135
dtype: int64
Constructing Series objects
>>> pd.Series(data, index=index)
where index is an optional argument and data can be one of many entities (for example a list, a scalar, or a dictionary, as the examples below show).
pd.Series([2, 4, 6])
0 2
1 4
2 6
dtype: int64
pd.Series(5, index=[100, 200, 300])
100 5
200 5
300 5
dtype: int64
pd.Series({2:'a', 1:'b', 3:'c'})
1 b
2 a
3 c
dtype: object
pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])
3 c
2 a
dtype: object
The Pandas DataFrame Object
If a Series is an analog of a one-dimensional array with flexible indices, a
DataFrame is an analog of a two-dimensional array with both flexible row indices and
flexible column names.
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area
California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
dtype: int64
states = pd.DataFrame({'population': population,
'area': area})
states
            area    population
California  423967  38332521
Florida     170312  19552860
Illinois    149995  12882135
New York    141297  19651127
Texas       695662  26448193
states.index
Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')
states.columns
Index(['area', 'population'], dtype='object')
DataFrame as specialized dictionary
states['area']
California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
Name: area, dtype: int64
Constructing DataFrame objects
A Pandas DataFrame can be constructed in a variety of ways.
From a single Series object
A DataFrame is a collection of Series objects, and a single-column DataFrame can be
constructed from a single Series:
pd.DataFrame(population, columns=['population'])
            population
California  38332521
Florida     19552860
Illinois    12882135
New York    19651127
Texas       26448193
From a list of dicts
data = [{'a': i, 'b': 2 * i}
for i in range(3)]
pd.DataFrame(data)
a b
0 0 0
1 1 2
2 2 4
From a dictionary of Series objects
pd.DataFrame({'population': population,
'area': area})
            area    population
California  423967  38332521
Florida     170312  19552860
Illinois    149995  12882135
New York    141297  19651127
Texas       695662  26448193
From a two-dimensional NumPy array
pd.DataFrame(np.random.rand(3, 2),
columns=['foo', 'bar'],
index=['a', 'b', 'c'])
foo bar
a 0.865257 0.213169
b 0.442759 0.108267
c 0.047110 0.905718
From a NumPy structured array
A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A
array([(0, 0.0), (0, 0.0), (0, 0.0)],
dtype=[('A', '<i8'), ('B', '<f8')])
pd.DataFrame(A)
A B
0 0 0.0
1 0 0.0
2 0 0.0
The Pandas Index Object
Both the Series and DataFrame objects contain an explicit index that lets you reference
and modify data.
The Index object is an interesting structure in itself, and it can be thought of either as
an immutable array or as an ordered set.
ind = pd.Index([2, 3, 5, 7, 11])
ind
Int64Index([2, 3, 5, 7, 11], dtype='int64')
Index as immutable array
ind[1]
3
ind[::2]
Int64Index([2, 5, 11], dtype='int64')
print(ind.size, ind.shape, ind.ndim, ind.dtype)
5 (5,) 1 int64
One difference between Index objects and NumPy arrays is that indices are immutable;
that is, they cannot be modified via the normal means:
ind[1] = 0   # raises a TypeError: Index does not support mutable operations
Index as ordered set
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
indA & indB # intersection
Int64Index([3, 5, 7], dtype='int64')
indA | indB # union
Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')
indA ^ indB # symmetric difference
Int64Index([1, 2, 9, 11], dtype='int64')
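In newer versions of Pandas, the operator forms above are deprecated in favor of explicit set methods; a sketch of the equivalent calls (the results match the outputs shown above):
indA.intersection(indB)          # [3, 5, 7]
indA.union(indB)                 # [1, 2, 3, 5, 7, 9, 11]
indA.symmetric_difference(indB)  # [1, 2, 9, 11]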
Data Selection in Series
Indexers: loc, iloc, and ix
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
data
1 a
3 b
5 c
dtype: object
# explicit index when indexing
data[1]
'a'
# implicit index when slicing
data[1:3]
3 b
5 c
dtype: object
First, the loc attribute allows indexing and slicing that always references the explicit
index:
data.loc[1]
'a'
data.loc[1:3]
1 a
3 b
dtype: object
The iloc attribute allows indexing and slicing that always references the implicit Python-
style index:
data.iloc[1]
'b'
data.iloc[1:3]
3 b
5 c
dtype: object
A third indexing attribute, ix, is a hybrid of the two, and for Series objects is equivalent to
standard []-based indexing.
The purpose of the ix indexer will become more apparent in the context
of DataFrame objects.
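For reference, a minimal sketch (reusing the states DataFrame built earlier; exact results depend on its row and column order) of how loc and iloc carry over to DataFrames. Note that in recent versions of Pandas the ix indexer has been removed, so loc and iloc are the recommended indexers:
states.loc['California', 'area']   # explicit row and column labels
states.iloc[0, 0]                  # implicit integer positions: row 0, column 0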
Operating on Data in Pandas
One of the essential pieces of NumPy is the ability to perform quick element-wise
operations, both with basic arithmetic (addition, subtraction, multiplication, etc.) and with
more sophisticated operations (trigonometric functions, exponential and logarithmic
functions, etc.).
Pandas inherits much of this functionality from NumPy, and the ufuncs are key to this.
Pandas includes a couple of useful twists, however (both are illustrated below):
i. For unary operations like negation and trigonometric functions, these ufuncs
will preserve index and column labels in the output.
ii. For binary operations such as addition and multiplication, Pandas will
automatically align indices when passing the objects to the ufunc.
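A minimal sketch (example values chosen here for illustration) of both behaviors:
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
np.exp(s)        # unary ufunc: the result keeps the index ['a', 'b', 'c']

x = pd.Series([2, 4, 6], index=['a', 'b', 'c'])
y = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
x + y            # indices are aligned; 'a' and 'd' have no match and become NaN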
Handling Missing Data
The difference between data found in many tutorials and data in the real world is that
real-world data is rarely clean and homogeneous.
In particular, many interesting datasets will have some amount of data missing.
Different data sources may indicate missing data in different ways.
Here we will look at some general considerations for missing data, see how Pandas chooses
to represent it, and demonstrate some built-in Pandas tools for handling missing data in Python.
We refer to missing data in general as null, NaN, or NA values.
There are a number of schemes that have been developed to indicate the presence of
missing data in a table or DataFrame.
There are two general strategies: using a mask that globally indicates missing values, or choosing
a sentinel value (such as -9999, NaN, or None) that indicates a missing entry, as in the
sketch below.
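A minimal sketch (values chosen for illustration) of the sentinel approach Pandas takes: both None and NaN are treated as missing, and integer data containing them is upcast to floating point:
s = pd.Series([1, np.nan, 2, None])
s    # None is converted to NaN; the integers are upcast to float64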
Operating on Null Values
Pandas treats None and NaN as essentially interchangeable for indicating missing or null
values.
To facilitate this convention, there are several useful methods for detecting, removing,
and replacing null values in Pandas data structures (illustrated after the list). They are:
isnull(): Generate a boolean mask indicating missing values
notnull(): Opposite of isnull()
dropna(): Return a filtered version of the data
fillna(): Return a copy of the data with missing values filled or imputed
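A brief illustration of these methods on a small Series (example values chosen here for illustration):
data = pd.Series([1, np.nan, 'hello', None])

data.isnull()          # Boolean mask: True where values are null
data[data.notnull()]   # select only the non-null entries
data.dropna()          # return the data with null entries removed
data.fillna(0)         # return a copy with nulls replaced by 0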
Hierarchical Indexing
Pandas provides objects that can handle three-dimensional and four-dimensional data.
A more common pattern in practice, however, is to make use of hierarchical indexing (also known
as multi-indexing) to incorporate multiple index levels within a single index.
In this way, higher-dimensional data can be compactly represented within the familiar
one-dimensional Series and two-dimensional DataFrame objects.
Pandas MultiIndex
import pandas as pd
import numpy as np
index = [('California', 2000), ('California', 2010),
('New York', 2000), ('New York', 2010),
('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956,
18976457, 19378102,
20851820, 25145561]
pop = pd.Series(populations, index=index)
pop
(California, 2000) 33871648
(California, 2010) 37253956
(New York, 2000) 18976457
(New York, 2010) 19378102
(Texas, 2000) 20851820
(Texas, 2010) 25145561
dtype: int64
index = pd.MultiIndex.from_tuples(index)
index
MultiIndex(levels=[['California', 'New York', 'Texas'], [2000, 2010]],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
The MultiIndex contains multiple levels of indexing: in this case, the state names and the
years, as well as multiple labels for each data point which encode these levels.
If we re-index our series with this MultiIndex, we see the hierarchical representation of
the data:
pop = pop.reindex(index)
pop
California 2000 33871648
2010 37253956
New York 2000 18976457
2010 19378102
Texas 2000 20851820
2010 25145561
dtype: int64
pop[:, 2010]
California 37253956
New York 19378102
Texas 25145561
dtype: int64
MultiIndex as extra dimension
The unstack() method will quickly convert a multiply indexed Series into a
conventionally indexed DataFrame:
pop_df = pop.unstack()
pop_df
            2000      2010
California  33871648  37253956
New York    18976457  19378102
Texas       20851820  25145561
The stack() method provides the opposite operation:
pop_df.stack()
California 2000 33871648
2010 37253956
New York 2000 18976457
2010 19378102
Texas 2000 20851820
2010 25145561
dtype: int64
pop_df = pd.DataFrame({'total': pop,
'under18': [9267089, 9284094,
4687374, 4318033,
5906301, 6879014]})
pop_df
                  total     under18
California  2000  33871648  9267089
            2010  37253956  9284094
New York    2000  18976457  4687374
            2010  19378102  4318033
Texas       2000  20851820  5906301
            2010  25145561  6879014
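As a quick illustration of why this extra-dimension view is useful (a sketch based on the pop_df data above), we can compute the fraction of the population under 18 and unstack it into a states-by-years table:
f_u18 = pop_df['under18'] / pop_df['total']
f_u18.unstack()   # rows: states, columns: years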
Methods of MultiIndex Creation
The most straightforward way to construct a multiply indexed Series or DataFrame is to
simply pass a list of two or more index arrays to the constructor.
df = pd.DataFrame(np.random.rand(4, 2),
index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
columns=['data1', 'data2'])
df
      data1     data2
a  1  0.554233  0.356072
   2  0.925244  0.219474
b  1  0.441759  0.610054
   2  0.171495  0.886688
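Similarly, if you pass a dictionary with appropriate tuples as keys, Pandas will automatically recognize this and use a MultiIndex by default: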
data = {('California', 2000): 33871648,
('California', 2010): 37253956,
('Texas', 2000): 20851820,
('Texas', 2010): 25145561,
('New York', 2000): 18976457,
('New York', 2010): 19378102}
pd.Series(data)
California 2000 33871648
2010 37253956
New York 2000 18976457
2010 19378102
Texas 2000 20851820
2010 25145561
dtype: int64
Explicit MultiIndex constructors
pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]])
MultiIndex(levels=[['a', 'b'], [1, 2]],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)])
MultiIndex(levels=[['a', 'b'], [1, 2]],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
MultiIndex level names
Sometimes it is convenient to name the levels of the MultiIndex. This can be
accomplished by passing the names argument to any of the
above MultiIndex constructors.
pop.index.names = ['state', 'year']
pop
state year
California 2000 33871648
2010 37253956
New York 2000 18976457
2010 19378102
Texas 2000 20851820
2010 25145561
dtype: int64
Indexing and Slicing a MultiIndex
pop
state year
California 2000 33871648
2010 37253956
New York 2000 18976457
2010 19378102
Texas 2000 20851820
2010 25145561
dtype: int64
pop.loc['California':'New York']
state year
California 2000 33871648
2010 37253956
New York 2000 18976457
2010 19378102
dtype: int64
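A few more access patterns for this multiply indexed Series (a sketch; the results follow from the data above):
pop['California', 2000]   # index with both levels: a single value
pop['California']         # partial indexing: all years for one state
pop[:, 2000]              # all states for the year 2000
pop[pop > 22000000]       # Boolean masking also works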
Combining Datasets: Concat and Append
import pandas as pd
import numpy as np
x = [1, 2, 3]
y = [4, 5, 6]
z = [7, 8, 9]
np.concatenate([x, y, z])
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
x = [[1, 2],
[3, 4]]
np.concatenate([x, x], axis=1)
array([[1, 2, 1, 2],
[3, 4, 3, 4]])
Simple Concatenation with pd.concat
ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
ser2 = pd.Series(['D', 'E', 'F'], index=[4, 5, 6])
pd.concat([ser1, ser2])
1 A
2 B
3 C
4 D
5 E
6 F
dtype: object
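The examples below use make_df, a small helper (not part of Pandas) for building example DataFrames, and a display helper that simply shows several objects side by side; in a plain script, print() works as well. A minimal sketch of make_df, following the convention that each cell combines the column name and the row label:
def make_df(cols, ind):
    """Quickly make a DataFrame whose cells combine column name and row label."""
    data = {c: [str(c) + str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, ind)

# make_df('AB', [1, 2]) produces:
#     A   B
# 1  A1  B1
# 2  A2  B2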
df1 = make_df('AB', [1, 2])
df2 = make_df('AB', [3, 4])
display('df1', 'df2', 'pd.concat([df1, df2])')
df3 = make_df('AB', [0, 1])
df4 = make_df('CD', [0, 1])
display('df3', 'df4', "pd.concat([df3, df4], axis=1)")
Concatenation with joins
df5 = make_df('ABC', [1, 2])
df6 = make_df('BCD', [3, 4])
display('df5', 'df6', 'pd.concat([df5, df6])')
display('df5', 'df6',
"pd.concat([df5, df6], join='inner')")
The append() method
display('df1', 'df2', 'df1.append(df2)')
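Note that in recent versions of Pandas the append() method has been deprecated and later removed; pd.concat([df1, df2]) produces the same result.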
Merge and Join
Categories of Joins
The pd.merge() function implements a number of types of joins: the one-to-one, many-to-
one, and many-to-many joins.
All three types of joins are accessed via an identical call to the pd.merge() interface; the
type of join performed depends on the form of the input data.
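A minimal sketch (example data chosen here for illustration) of a one-to-one join, where the two DataFrames share an 'employee' column that pd.merge() uses as the key:
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})
df3 = pd.merge(df1, df2)   # joins on the shared 'employee' column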