Pandas Notes
History:
pandas was initially developed by Wes McKinney in 2008 while he was working at
AQR Capital Management. He convinced AQR to allow him to open source the
library. Another AQR employee, Chang She, joined as the second major contributor
to the library in 2012. Many versions of pandas have been released since then.
Advantages of pandas:
pandas provides two main data structures for manipulating data: Series and
DataFrame.
Series:
A Series is a one-dimensional labeled array. It can be created from:
• Array
• Dict
• Scalar value or constant
Example:
import pandas as pd
import numpy as np
data = np.array(['chicken','mutton','fish'])
ser = pd.Series(data)
print(ser)
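The example above builds a Series from an array. The dict and scalar inputs from the list can be sketched the same way. The prices and stock counts below are made-up values used only for illustration.

```python
import pandas as pd

# From a dict: the keys become the index labels.
prices = pd.Series({'chicken': 250, 'mutton': 700, 'fish': 400})  # hypothetical values

# From a scalar: the value is repeated for every label in the given index.
stock = pd.Series(5, index=['chicken', 'mutton', 'fish'])

print(prices)
print(stock)
```

With a scalar, an index has to be supplied so that pandas knows how many entries to create.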
DataFrame:
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular
fashion in rows and columns. You can think of it as an SQL table or a
spreadsheet.
Features of DataFrame
• Columns can be of different types
• Size-mutable
• Labeled axes (rows and columns)
• Arithmetic operations can be performed on rows and columns
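The features listed above can be seen in a short sketch. The column name AgeNextYear is a hypothetical addition used only to show a new column being created.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alex', 'Bob', 'Clarke'], 'Age': [10, 12, 13]})

# Columns of different types: 'Name' holds strings, 'Age' holds integers.
print(df.dtypes)

# Size-mutable: a column can be added after creation.
# Arithmetic: the addition is applied to the whole 'Age' column at once.
df['AgeNextYear'] = df['Age'] + 1  # hypothetical column name

# Labeled axes: a cell is addressed by row label and column label.
print(df.loc[1, 'Name'])
```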
Create DataFrame
A pandas DataFrame can be created from various inputs, such as:
• Lists
• Dicts
• Series
• NumPy ndarrays
• Another DataFrame
Example 2:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
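The example above uses a list of lists. The remaining inputs can be sketched in the same way. The variable names and the small numeric array below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# From a dict of lists: the keys become column names.
df_dict = pd.DataFrame({'Name': ['Alex', 'Bob', 'Clarke'], 'Age': [10, 12, 13]})

# From a dict of Series: rows are aligned on the index of each Series.
df_series = pd.DataFrame({
    'Name': pd.Series(['Alex', 'Bob', 'Clarke']),
    'Age': pd.Series([10, 12, 13]),
})

# From a NumPy ndarray: column names are supplied separately.
df_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=['a', 'b'])

# From another DataFrame: copy() makes the new object independent of the old one.
df_copy = pd.DataFrame(df_dict).copy()

print(df_dict, df_series, df_array, df_copy, sep='\n\n')
```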
Grouping:
# Assumes df has a 'Team' column; .groups maps each group key to its row labels
df.groupby('Team').groups
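The groupby call above needs a frame with a 'Team' column, which the earlier examples do not define. A minimal sketch with made-up data follows; the team names and the 'Points' column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: 'Team' and 'Points' are illustrative column names.
df = pd.DataFrame({
    'Team': ['Riders', 'Kings', 'Riders', 'Kings', 'Devils'],
    'Points': [876, 789, 863, 812, 741],
})

grouped = df.groupby('Team')

# .groups maps each team to the row labels that belong to it.
print(grouped.groups)

# An aggregation collapses each group to a single value.
totals = grouped['Points'].sum()
print(totals)
```

Grouping by itself does no computation; the result appears once an aggregation such as sum() is applied to the groups.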
Joins:
import pandas as pd
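left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
# Inner join on 'id' (the default); other overlapping columns get _x/_y suffixes
print(pd.merge(left, right, on='id'))
# Inner join on both keys; only rows matching on 'id' and 'subject_id' remain
print(pd.merge(left, right, on=['id', 'subject_id']))
# Left join; every row of left is kept and unmatched rows are filled with NaN
print(pd.merge(left, right, on='subject_id', how='left'))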
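The how argument of pd.merge selects the join type. As a rough sketch, the loop below counts the rows each join type returns. It uses smaller frames than the example above, and their values are chosen only for illustration.

```python
import pandas as pd

# Smaller illustrative frames; two subject_id values appear on both sides.
left = pd.DataFrame({'id': [1, 2, 3], 'subject_id': ['sub1', 'sub2', 'sub4']})
right = pd.DataFrame({'id': [1, 2, 3], 'subject_id': ['sub2', 'sub4', 'sub3']})

counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    merged = pd.merge(left, right, on='subject_id', how=how)
    counts[how] = len(merged)

# inner keeps only matching keys, left/right keep all rows of one side,
# and outer keeps every key from both sides.
print(counts)
```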