Pandas Notes

Pandas is a Python library used for manipulating and analyzing data. It provides two main data structures - Series for one-dimensional data and DataFrame for two-dimensional tabular data. DataFrame can be created from lists, dictionaries, NumPy arrays, or other DataFrames. It allows fast and efficient operations on data like loading from different sources, handling missing data, merging, reshaping, and grouping.

Uploaded by

Rajesh T
Copyright
© All Rights Reserved

Pandas

History:
Pandas was initially developed by Wes McKinney in 2008 while he was working at
AQR Capital Management. He convinced AQR to allow him to open-source the
library. Another AQR employee, Chang She, joined as the second major contributor
to the library in 2012. Over time, many versions of pandas have been released.

At the time these notes were written, the latest version of pandas was 1.0.1.

Advantages of pandas:

• Fast and efficient for manipulating and analyzing data.
• Data can be loaded from different file formats.
• Easy handling of missing data (represented as NaN) in floating-point as well
as non-floating-point data.
• Size mutability: columns can be inserted into and deleted from DataFrames and
higher-dimensional objects.
• Data set merging and joining.
• Flexible reshaping and pivoting of data sets.
• Powerful group-by functionality for performing split-apply-combine operations
on data sets.
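
Three of the points above (missing data, size mutability, and pivoting) can be sketched in a few lines. This is only an illustration: the city names and sales figures are invented, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data, invented for illustration only.
df = pd.DataFrame({
    'city': ['Pune', 'Pune', 'Delhi', 'Delhi'],
    'year': [2019, 2020, 2019, 2020],
    'sales': [10.0, np.nan, 7.0, 9.0],
})

# Missing data: NaN is skipped by default in reductions such as sum().
total_sales = df['sales'].sum()

# Size mutability: insert a column, then delete it again.
df['sales_x2'] = df['sales'] * 2
df = df.drop(columns='sales_x2')

# Reshaping: one row per city, one column per year.
wide = df.pivot(index='city', columns='year', values='sales')

print(total_sales)
print(wide)
```

The cell for Pune in 2020 stays NaN in the pivoted table, because the source row has no sales value.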

Pandas provides two main data structures for manipulating data:

1. Series ---> one-dimensional data
2. DataFrame ---> two-dimensional data

Series:
A Series can be created from various inputs, such as:

• Array
• Dict
• Scalar value or constant

Example:
import pandas as pd
import numpy as np

data = np.array(['chicken','mutton','fish'])
ser = pd.Series(data)
print(ser)
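
The example above covers the array case. The other two inputs in the list, a dict and a scalar, might look like the sketch below. The prices and stock counts are made-up values used only for illustration.

```python
import pandas as pd

# From a dict: the keys become the index labels (values are made up).
prices = pd.Series({'chicken': 180, 'mutton': 650, 'fish': 300})

# From a scalar: the value is repeated for every label in the index.
stock = pd.Series(5, index=['chicken', 'mutton', 'fish'])

print(prices)
print(stock)
print(prices['mutton'])   # label-based lookup
```

When a scalar is used, an index has to be supplied so that pandas knows how many entries to create.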
DataFrame:
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular
fashion in rows and columns. You can think of it as an SQL table or a spreadsheet
data representation.

Features of DataFrame
• Columns can be of different types
• Size-mutable
• Labeled axes (rows and columns)
• Arithmetic operations can be performed on rows and columns
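
A short sketch of these four features on one small table. The Height values and the row labels r1 to r3 are invented for illustration.

```python
import pandas as pd

# Columns of different types: strings, integers, floats.
df = pd.DataFrame(
    {'Name': ['Alex', 'Bob', 'Clarke'],
     'Age': [10, 12, 13],
     'Height': [1.40, 1.52, 1.58]},     # made-up values
    index=['r1', 'r2', 'r3'],           # labeled rows
)
print(df.dtypes)

# Size-mutable: add a column, then remove one.
df['AgeNextYear'] = df['Age'] + 1       # arithmetic on a whole column
del df['Height']

# Arithmetic across columns: sum of the numeric columns for each row.
row_totals = df[['Age', 'AgeNextYear']].sum(axis=1)

print(df)
print(row_totals)
```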

Create DataFrame
A pandas DataFrame can be created from various inputs, such as:

• Lists
• Dict
• Series
• NumPy ndarrays
• Another DataFrame

Create an Empty DataFrame


The most basic DataFrame that can be created is an empty DataFrame.
import pandas as pd
df = pd.DataFrame()
print(df)

Create a DataFrame from Lists


A DataFrame can be created from a single list or a list of lists.
Ex:1
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

Ex:2
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
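
The remaining inputs named earlier (dict, Series, NumPy ndarray, another DataFrame) are not shown in these notes. A minimal sketch of each follows; the variable names and the sample values are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# From a dict of lists: keys become column names.
df_dict = pd.DataFrame({'Name': ['Alex', 'Bob', 'Clarke'], 'Age': [10, 12, 13]})

# From a dict of Series: rows are aligned on the Series index labels.
df_series = pd.DataFrame({
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([10, 20], index=['a', 'b']),   # no 'c', so that cell is NaN
})

# From a 2-D NumPy ndarray, with column names supplied explicitly.
df_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['x', 'y'])

# From another DataFrame: copy() gives an independent object.
df_copy = df_dict.copy()

print(df_dict, df_series, df_array, sep='\n\n')
```

In the dict-of-Series case the two Series have different lengths, and pandas fills the unmatched position with NaN instead of raising an error.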

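Handle CSV files:

import pandas as pd
# Raw string so the backslashes in the Windows path are not treated as escapes.
df = pd.read_csv(r'E:\python batch\car_data.csv')

Record count:
len(df)     # number of rows
df.shape    # (number of rows, number of columns)

Select specific columns:
df.loc[:, ['owner', 'transmission']]

Sort the data:
df.sort_values('Year')

Filter the data:
df[df['Year'] > 2013]
# Combine conditions with & and wrap each one in parentheses.
# val and val2 are placeholder column names.
df[(df.val > 0.5) & (df.val2 == 1)]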
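
The operations above can be tried without the original file. The sketch below reads a small in-memory CSV instead. The rows and the price column are invented for illustration; only the Year, owner, and transmission column names come from these notes.

```python
import io
import pandas as pd

# Stand-in for a file on disk; the rows below are invented for illustration.
csv_text = """Year,owner,transmission,price
2012,First,Manual,3.5
2015,Second,Automatic,6.0
2014,First,Manual,4.8
"""
df = pd.read_csv(io.StringIO(csv_text))

n_rows = len(df)                                 # number of rows only
shape = df.shape                                 # (rows, columns)

subset = df.loc[:, ['owner', 'transmission']]    # select columns by label
by_year = df.sort_values('Year')                 # returns a new, sorted DataFrame
recent = df[df['Year'] > 2013]                   # boolean mask on one column
recent_manual = df[(df['Year'] > 2013) & (df['transmission'] == 'Manual')]

print(n_rows, shape)
print(by_year)
print(recent_manual)
```

Note that sort_values and the filters return new DataFrames; df itself is left unchanged unless the result is assigned back.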
Replace nulls with default values:

# nba is a DataFrame with a "College" column; assign the result back to the column.
nba["College"] = nba["College"].fillna("No College")
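
A small runnable sketch of filling nulls. The nba table here is a three-row stand-in invented for illustration, not the real data set. The result is assigned back to the column, since newer pandas versions warn when inplace=True is used on a selected column.

```python
import numpy as np
import pandas as pd

# Small stand-in for the nba table; the rows are invented for illustration.
nba = pd.DataFrame({
    'Name': ['Player A', 'Player B', 'Player C'],
    'College': ['Texas', np.nan, None],
})

# Both np.nan and None count as missing.
missing_before = nba['College'].isna().sum()

# Replace every missing value with a default and store the result.
nba['College'] = nba['College'].fillna('No College')

print(missing_before)
print(nba)
```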

Grouping the data:

df.groupby('Team').groups    # maps each Team value to the row labels in that group
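
Grouping is usually followed by an aggregation, which is the "apply" and "combine" part of split-apply-combine. A minimal sketch, assuming a hypothetical table with Team and Points columns (the team names and scores are invented):

```python
import pandas as pd

# Hypothetical team data, invented for illustration.
df = pd.DataFrame({
    'Team': ['Kings', 'Riders', 'Kings', 'Riders', 'Devils'],
    'Points': [876, 789, 741, 812, 673],
})

grouped = df.groupby('Team')

# .groups maps each group key to the row labels in that group.
print(grouped.groups)

# Split-apply-combine: total and mean points per team.
summary = grouped['Points'].agg(['sum', 'mean'])
print(summary)
```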
Joins:

import pandas as pd
left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

# Inner join on id (how='inner' is the default).
pd.merge(left, right, on='id')

# Join on two keys: only rows where both id and subject_id match.
pd.merge(left, right, on=['id', 'subject_id'])

# Left join on subject_id: keeps every row of left.
pd.merge(left, right, on='subject_id', how='left')
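
To see how the join type changes the result, the same two tables can be merged with different how values. This sketch reuses the left and right tables from above; the variable names inner, outer, and both_keys are chosen only for illustration.

```python
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

# Columns present in both tables but not used as keys get _x / _y suffixes.
inner = pd.merge(left, right, on='subject_id', how='inner')

# Outer join keeps unmatched rows from both sides;
# indicator=True adds a _merge column saying where each row came from.
outer = pd.merge(left, right, on='subject_id', how='outer', indicator=True)

# Merging on two keys keeps only rows where both id and subject_id agree.
both_keys = pd.merge(left, right, on=['id', 'subject_id'])

print(inner[['Name_x', 'Name_y', 'subject_id']])
print(outer['_merge'].value_counts())
print(both_keys)
```

Here sub1 appears only in left and sub3 only in right, so the outer join has two more rows than the inner join.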
