Pandas_Data_Analytics
Series are:
one-dimensional
labeled arrays
of any data type
Or, said differently...
a sequence of values
with associated labels
dtype('O')
0 A
2 C
dtype: object
List argument:
Dict argument:
pd.Series(data=0) pd.Series(data='weather')
argument
pd.Series(data=students)
parameter
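A minimal sketch of the list and dict constructor calls named above. The contents of `students` and the dict are hypothetical, since the slides do not show them.

```python
import pandas as pd

# Hypothetical data; the slides do not show what `students` holds.
students = ["Andrew", "Brie", "Kanika"]

# List argument: labels default to 0, 1, 2, ...
from_list = pd.Series(data=students)

# Dict argument: keys become the labels, values become the data.
from_dict = pd.Series(data={"Andrew": 27, "Brie": 19, "Kanika": 22})

# `data` is the parameter (the name in the signature);
# `students` is the argument (the value passed in for it).
print(from_list)
print(from_dict)
```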
Indexing With Callables
Approach      Example                 Comment
[] indexing   series['label']         slices, callables, boolean masks
.loc[]        series.loc['label']     slices, callables, boolean masks
dot access    series.label            no slice or boolean mask support
.get()        series.get('label')     no slice support; provides default; forgiving
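The four label-based approaches can be sketched side by side. The series, its labels and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical series; labels and values are illustrative only.
series = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_brackets = series["b"]              # [] indexing
by_loc = series.loc["b"]               # .loc[]
by_dot = series.b                      # dot access; label must be a valid identifier
by_get = series.get("z", default=-1)   # .get() returns the default instead of raising

label_slice = series.loc["a":"b"]            # label slices include both endpoints
masked = series.loc[series > 15]             # boolean mask
via_callable = series.loc[lambda s: s > 15]  # the callable receives the series
```

The callable form is mostly useful in method chains, where the intermediate series has no name to refer to.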
Selection By Position
Approach      Example            Comment
[] indexing   series[0]          slices, callables, boolean masks
.iloc[]       series.iloc[0]     slices, callables, boolean masks
dot access    series.0           not valid Python syntax; no slice or boolean mask support
.get()        series.get(0)      no slice support; provides default; forgiving
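A short sketch of the position-based approaches, assuming a series with the default integer labels, so that label 0 and position 0 coincide. The values are hypothetical.

```python
import pandas as pd

# Hypothetical data with the default labels 0, 1, 2.
series = pd.Series([10, 20, 30])

by_brackets = series[0]       # with default labels, label 0 is also position 0
by_iloc = series.iloc[0]      # .iloc[] is always positional
first_two = series.iloc[0:2]  # positional slices exclude the stop position
by_get = series.get(0)        # returns the value when the key exists
fallback = series.get(99, default=-1)  # returns the default when it does not
```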
What is a csv?
COMMA-SEPARATED VALUES (.CSV) FILE
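As a small illustration, a CSV is plain text with one record per line and commas between values. The sketch below reads an in-memory CSV; the column names and rows are invented for the example.

```python
import io

import pandas as pd

# A tiny in-memory CSV; the columns and rows are made up for illustration.
csv_text = "name,age\nAda,36\nLinus,28\n"

# The first line is read as the header; commas separate the values.
frame = pd.read_csv(io.StringIO(csv_text))
print(frame)
```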
True 1
False 0
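Because True counts as 1 and False as 0, a boolean series can be summed or averaged directly. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical values; the comparison yields a boolean series.
mask = pd.Series([5, 12, 8, 20]) > 7

count_true = mask.sum()   # number of True entries
share_true = mask.mean()  # fraction of True entries
```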
ser.diff(periods=1)
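A minimal sketch of what `diff(periods=1)` computes, using made-up values: each element minus the element one position earlier.

```python
import pandas as pd

# Hypothetical values.
ser = pd.Series([10, 13, 19, 18])

# Same result as ser - ser.shift(1); the first element has no predecessor,
# so it becomes NaN.
step = ser.diff(periods=1)
```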
Dropping Or Filling NAs
.dropna(): excludes NAs from the series
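Dropping and filling can be compared on a small series. The values are hypothetical; both calls return a new series and leave the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value.
ser = pd.Series([1.0, np.nan, 3.0])

dropped = ser.dropna()  # excludes the NA; the remaining labels are kept
filled = ser.fillna(0)  # replaces the NA with a chosen value
```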
unless
[Figure: sequential vs. vectorized application of func()]
Series Accounting
.size: number of elements in the series
series.size # 193
ser.value_counts(sort=True,
                 ascending=False,
                 dropna=True,
                 normalize=False)
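A minimal sketch of `.size` and `value_counts()` on hypothetical data with one missing value. Note that `.size` counts every element, missing or not.

```python
import pandas as pd

# Hypothetical data; None stands in for a missing value.
ser = pd.Series(["a", "b", "a", None, "a"])

n_elements = ser.size  # counts all elements, including the missing one

# The keyword values below are the defaults shown on the slide.
counts = ser.value_counts(sort=True, ascending=False,
                          dropna=True, normalize=False)

# normalize=True returns shares of the non-missing values instead of counts.
shares = ser.value_counts(normalize=True)
```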
Variance
the average of squared differences from the mean
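\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
where \mu is the mean and \sum is the sum of the squared differences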
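The definition can be checked by hand on a few made-up values. One caveat worth stating: `Series.var()` divides by N - 1 by default (`ddof=1`), so it matches the "average of squared differences" only when `ddof=0` is passed.

```python
import pandas as pd

# Hypothetical values with mean 5.0.
ser = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# The average of squared differences from the mean.
manual = ((ser - ser.mean()) ** 2).mean()

population_var = ser.var(ddof=0)  # divides by N, matches `manual`
sample_var = ser.var()            # default ddof=1, divides by N - 1
```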
DATAFRAMES
FIRST KEY CONCEPT
each column in a
dataframe is a series
DATAFRAMES
THIRD KEY CONCEPT
unlike series,
dataframes can be
heterogeneous
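Both key concepts above can be seen on a small, hypothetical dataframe: selecting one column gives a series, and different columns can hold different data types.

```python
import pandas as pd

# Hypothetical dataframe with three differently typed columns.
frame = pd.DataFrame({
    "name": ["Ada", "Linus"],
    "age": [36, 28],
    "active": [True, False],
})

age_column = frame["age"]  # a single column is a Series
dtypes = frame.dtypes      # one dtype per column
```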
DF.DROPNA()
removes columns or rows with
missing values
SUBSET
restricts or localizes the method
application to specific
orthogonal labels
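A minimal sketch of `dropna()` with and without `subset`, on hypothetical data. When rows are being dropped (`axis=0`), `subset` takes column labels, which is the "orthogonal" axis.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with missing values in two columns.
frame = pd.DataFrame({
    "name": ["Ada", "Linus", "Grace"],
    "age": [36, np.nan, 41],
    "club": [np.nan, "X", "Y"],
})

rows_dropped = frame.dropna()                # drop rows with any NA
cols_dropped = frame.dropna(axis=1)          # drop columns with any NA
age_only = frame.dropna(subset=["age"])      # only NAs in 'age' count
```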
MORE WAYS TO DATAFRAME
column-wise
DF.APPLY()
Is aggregation required?
yes: DF.AGG()
no: DF.TRANSFORM()
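The decision above can be illustrated on a hypothetical dataframe: `agg()` reduces each column to one value, `transform()` returns a result with the original shape, and `apply()` accepts either kind of function and works column-wise by default.

```python
import pandas as pd

# Hypothetical numeric dataframe.
frame = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Aggregation: one value per column.
totals = frame.agg("sum")

# Transformation: same shape as the input.
centered = frame.transform(lambda col: col - col.mean())

# apply() is column-wise by default (axis=0).
ranges = frame.apply(lambda col: col.max() - col.min())
```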
Binary (or bitwise) Operators
OPERATOR   WHAT IT IS   EXAMPLE
~          complement   ~True -> -2
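A short sketch of why `~True` gives -2 in plain Python while `~` on a boolean series gives a logical NOT. The series values are hypothetical.

```python
import pandas as pd

# In plain Python, True behaves as the integer 1, and ~1 is -2.
flipped_int = ~int(True)

# On a boolean Series, ~ inverts element-wise: True <-> False.
mask = pd.Series([True, False, True])
inverted = ~mask
```

This is why `~` is used on pandas masks, while `not` is used on plain Python booleans.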
Comparators
COMPARISON OPERATOR PANDAS METHOD
≤ .le()
> .gt()
≥ .ge()
== .eq()
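Each method in the table gives the same result as its operator. A minimal sketch with hypothetical ages:

```python
import pandas as pd

# Hypothetical values.
ages = pd.Series([17, 18, 30])

at_most = ages.le(18)   # same as ages <= 18
above = ages.gt(18)     # same as ages > 18
at_least = ages.ge(18)  # same as ages >= 18
equal = ages.eq(18)     # same as ages == 18
```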
WHAT COUNTS AS A DUPLICATE?
players.duplicated(
subset=['name', 'age'],
subset parameter: DEFAULT or CUSTOM
keep parameter: DEFAULT or CUSTOM
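A minimal sketch of how `subset` and `keep` change what counts as a duplicate. The `players` rows here are hypothetical stand-ins for the dataset used on the slides.

```python
import pandas as pd

# Hypothetical stand-in for the `players` dataframe.
players = pd.DataFrame({
    "name": ["Ann", "Ann", "Bob", "Ann"],
    "age": [30, 30, 25, 31],
    "club": ["X", "Y", "Z", "X"],
})

# Default subset: all columns must match.
all_columns = players.duplicated()

# Custom subset: only 'name' and 'age' must match.
by_name_age = players.duplicated(subset=["name", "age"])

# keep='first' (default) flags later repeats; 'last' flags earlier ones;
# False flags every member of a duplicated group.
keep_last = players.duplicated(subset=["name", "age"], keep="last")
keep_none = players.duplicated(subset=["name", "age"], keep=False)
```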
fillna() axes and methods
FILL DIRECTIONS
AXIS=0, METHOD=FFILL: fills downward
AXIS=0, METHOD=BFILL: fills upward
AXIS=1, METHOD=FFILL: fills to the right
AXIS=1, METHOD=BFILL: fills to the left
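The four fill directions can be sketched on a hypothetical dataframe. The slides use `fillna(method=...)`; recent pandas releases deprecate that spelling in favor of `.ffill()` and `.bfill()`, so this sketch uses the latter.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with gaps.
frame = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 5.0, np.nan],
})

down = frame.ffill(axis=0)   # carry the last valid value down each column
up = frame.bfill(axis=0)     # pull the next valid value up each column
right = frame.ffill(axis=1)  # carry values left to right along each row
left = frame.bfill(axis=1)   # pull values right to left along each row
```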
lookup(): another way to fancy index
players.lookup([450], ['age'])
array([30])
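`DataFrame.lookup()` was deprecated in pandas 1.2 and removed in 2.0. One possible replacement, sketched here on a hypothetical `players` dataframe, converts the row and column labels to positions and indexes the underlying array.

```python
import pandas as pd

# Hypothetical stand-in for `players`, with row label 450 present.
players = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [30, 25]},
    index=[450, 451],
)

row_labels, col_labels = [450], ["age"]

# Translate labels into integer positions on each axis.
row_pos = players.index.get_indexer(row_labels)
col_pos = players.columns.get_indexer(col_labels)

# Pairwise (row, column) selection on the underlying array.
values = players.to_numpy()[row_pos, col_pos]
```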
pandas
layout 3 cols
9 cols 7 cols
to pop() or not...
players.pop('age')
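A minimal sketch of what `pop()` does, on a hypothetical `players` dataframe: it returns the column as a series and removes it from the dataframe in place.

```python
import pandas as pd

# Hypothetical stand-in for `players`.
players = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

# Returns the 'age' column and mutates `players`.
ages = players.pop("age")
```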
3 POINTS TO CONSIDER
slicing: players.loc[0:2]
fancy indexing: players.loc[[0, 132], ['name', 'market_value']]
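Both calls can be tried on a small, hypothetical `players` dataframe. The row labels are chosen so that 0, 1, 2 and 132 all exist.

```python
import pandas as pd

# Hypothetical stand-in for `players`.
players = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee"],
        "age": [30, 25, 28, 33],
        "market_value": [1.5, 2.0, 0.8, 3.1],
    },
    index=[0, 1, 2, 132],
)

# Slicing with .loc is label-based and includes the stop label.
sliced = players.loc[0:2]

# Fancy indexing: explicit lists of row labels and column labels.
fancy = players.loc[[0, 132], ["name", "market_value"]]
```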
Two's Complement
VERY IMPORTANT
11111111  -1
11111110  -2
00000000   0
00000001   1
00000010   2
For example, 32 bits represent
inverts the bits: 00000001 (1) -> 11111110 (-2)
11111111  -1
11111110  -2
11111101  -3
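The bit patterns above can be reproduced in Python. `to_bits` is a hypothetical helper written for this sketch; it masks the value to 8 bits so that negative numbers show their two's-complement pattern.

```python
def to_bits(value: int, width: int = 8) -> str:
    """Return the two's-complement bit pattern of `value` in `width` bits."""
    mask = (1 << width) - 1
    return format(value & mask, f"0{width}b")


# Inverting every bit of x gives -x - 1, which is why ~1 is -2.
for number in (2, 1, 0, -1, -2, -3):
    print(f"{to_bits(number)}  {number:>2}  ~{number} = {~number}")
```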
VECTORIZATION
operation complete in 2 cycles, instead of 6
CPU
GPU
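A minimal sketch contrasting an element-by-element loop with a vectorized operation, using NumPy arrays (which back pandas columns). It shows only that the results match; it makes no claim about cycle counts on any particular hardware.

```python
import numpy as np

# Hypothetical input values.
values = np.arange(6, dtype=np.float64)

# Sequential: one multiplication per Python-level iteration.
looped = [float(v) * 2 for v in values]

# Vectorized: one expression over the whole array.
vectorized = values * 2
```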
df.append()
.append() is a DataFrame
instance method
pd.concat()
.append() only operates along
the index axis
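`DataFrame.append()` was deprecated in pandas 1.4 and removed in 2.0, so this sketch uses only `pd.concat()`, which can combine along either axis. The dataframes are hypothetical.

```python
import pandas as pd

# Hypothetical dataframes.
top = pd.DataFrame({"a": [1, 2]})
bottom = pd.DataFrame({"a": [3]})
extra = pd.DataFrame({"b": [7, 8]})

# Along the index axis (what .append() used to do).
stacked = pd.concat([top, bottom], ignore_index=True)

# Along the columns axis, which .append() could not do.
side_by_side = pd.concat([top, extra], axis=1)
```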
concat()
[Figure: concat() examples, two inputs joined with "+" and the result after "="]
how='inner': only the common keys are selected; similar to set intersection
how='outer': all keys are selected; similar to set union
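The set analogy can be checked with `merge()` on two hypothetical dataframes that share the keys 'b' and 'c'.

```python
import pandas as pd

# Hypothetical dataframes sharing the keys 'b' and 'c'.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 3, 9]})
right = pd.DataFrame({"key": ["b", "c", "k"], "y": [1, 2, 3]})

inner = left.merge(right, on="key", how="inner")  # keys in both frames
outer = left.merge(right, on="key", how="outer")  # keys in either frame
```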
Join Cardinalities
how='left' how='right'
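A minimal sketch of left and right joins with `merge()`, on hypothetical dataframes: `how='left'` keeps every key of the left frame, `how='right'` every key of the right frame, and unmatched cells become NaN.

```python
import pandas as pd

# Hypothetical dataframes sharing the keys 'b' and 'c'.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 3, 9]})
right = pd.DataFrame({"key": ["b", "c", "k"], "y": [1, 2, 3]})

left_join = left.merge(right, on="key", how="left")
right_join = left.merge(right, on="key", how="right")
```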
MULTIINDEX INTERNALS
make up MultiIndex objects,
also known as hierarchical
indices in pandas
L0 L1
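As an illustration of the internals, a MultiIndex stores the unique values of each level once (`.levels`) and refers to them by integer position (`.codes`). The index below is hypothetical, and reading L0 and L1 as level 0 and level 1 is an assumption.

```python
import pandas as pd

# Hypothetical two-level index.
index = pd.MultiIndex.from_tuples(
    [("EU", "DE"), ("EU", "FR"), ("NA", "US")],
    names=["region", "country"],
)

level_0 = list(index.levels[0])  # unique values of the outer level
level_1 = list(index.levels[1])  # unique values of the inner level
codes_0 = list(index.codes[0])   # positions into level_0, one per row
codes_1 = list(index.codes[1])   # positions into level_1, one per row
```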
PANEL VS MULTIINDEX DF
for representing hierarchical data
- Panel is deprecated since pandas v0.20 (removed in v0.25)
- prefer a DataFrame with a MultiIndex (pd.MultiIndex) for new projects
- many of the same pandas concepts apply
- older docs still available online for Panel
Split: split data into groups
Apply: apply .sum()
Combine: combine the output
[Figure: split-apply-combine on NA_Sales, with group results 0.75 and 0.80]
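The three steps map onto a single `groupby()` chain. The rows and the `Genre` grouping column below are hypothetical; only the `NA_Sales` column name comes from the slides.

```python
import pandas as pd

# Hypothetical rows; 'Genre' is an assumed grouping column.
games = pd.DataFrame({
    "Genre": ["Sports", "Racing", "Sports", "Racing"],
    "NA_Sales": [0.50, 0.30, 0.25, 0.50],
})

# Split by 'Genre', apply .sum() to each group, combine into one Series.
totals = games.groupby("Genre")["NA_Sales"].sum()
print(totals)
```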