Data Frames Python

- Dataframe objects in Pandas can store heterogeneous 2D data like tables with rows and columns.
- The document shows various ways to create dataframes from lists, nested lists, dictionaries, and existing series.
- It also demonstrates operations on dataframes like accessing elements, adding/modifying columns, applying functions, and extracting subsets of data.

DATA FRAMES

DataFrame objects in Pandas can store heterogeneous 2D data. A DataFrame is a two-dimensional data structure, just like a table (with rows and columns).

# 1.CREATE DATA FRAME FROM LIST

import pandas as pd

stulist=[10,20,30,40,50]
stu_df1=pd.DataFrame(stulist)
print(stu_df1)
0
0 10
1 20
2 30
3 40
4 50
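A dataframe built from a flat list gets the default column label 0, as the output above shows. A minimal sketch of naming that column at creation time follows; the column name 'Marks' is an assumption chosen for illustration.

```python
import pandas as pd

stulist = [10, 20, 30, 40, 50]

# 'Marks' is an illustrative column name, not part of the original example
stu_df1 = pd.DataFrame(stulist, columns=['Marks'])

print(stu_df1.shape)          # (rows, columns)
print(list(stu_df1.columns))
```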
# 2. CREATE DATAFRAME FROM NESTED LIST

import pandas as pd
stulist=[['AAA','BBB','CCC','DDD','EEE','FFF'],[10,20,30,40,50,60]]
stu_df2=pd.DataFrame(stulist,index=['Name','Marks'],
                     columns=['col1','col2','col3','col4','col5','col6'])
print(stu_df2)
col1 col2 col3 col4 col5 col6
Name AAA BBB CCC DDD EEE FFF
Marks 10 20 30 40 50 60
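In the nested-list example each inner list becomes one row, so names and marks sit in separate rows and every column holds mixed types. If one row per student is wanted instead, one possible approach is to transpose the result with `.T`. This is only a sketch on shortened data, and the variable name stu_by_row is hypothetical.

```python
import pandas as pd

stulist = [['AAA', 'BBB', 'CCC'], [10, 20, 30]]
stu_df2 = pd.DataFrame(stulist, index=['Name', 'Marks'])

# .T swaps rows and columns: 'Name' and 'Marks' become column headings
stu_by_row = stu_df2.T
print(stu_by_row)
```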

# 3. CREATE DATAFRAME USING SERIES

#Create a Series using a Dictionary

stu_marks=pd.Series({'Anil':30,'Vishal':89,'Komal':67,'Varsha':35,'Pulkit':77})
stu_class=pd.Series({'Anil':11,'Vishal':12,'Komal':10,'Varsha':12,'Pulkit':11})
stu_df3=pd.DataFrame({'Marks':stu_marks,'Class':stu_class})
print(stu_df3)

Marks Class
Anil 30 11
Vishal 89 12
Komal 67 10
Varsha 35 12
Pulkit 77 11

#4. Assignment: create a dataframe with the following structure:

#     Iname  Price  Company
# I1  TV     4000   LG
# I2  AC     5000   SAMSUNG
# I3  OVEN   9000   USHA

#Solution to assignment

s1=pd.Series({'I1':'TV','I2':'AC','I3':'OVEN'})
s2=pd.Series({'I1':4000,'I2':5000,'I3':9000})
#The index labels must match exactly in all three series. Writing 'i1' instead of 'I1' here
#would leave Company as NaN for I1 and add a separate row i1 with NaN in Iname and Price.
s3=pd.Series({'I1':'LG','I2':'SAMSUNG','I3':'USHA'})

idf=pd.DataFrame({'Iname':s1,'Price':s2,'Company':s3})
print(idf)
Iname Price Company
I1 TV 4000 LG
I2 AC 5000 SAMSUNG
I3 OVEN 9000 USHA
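When a dataframe is built from several Series, pandas aligns them on their index labels, and labels are case-sensitive. The sketch below shows what a mismatched label does; the key 'i1' is deliberately misspelled for illustration, and the variable name mismatched is hypothetical.

```python
import pandas as pd

s1 = pd.Series({'I1': 'TV', 'I2': 'AC', 'I3': 'OVEN'})
# 'i1' (lowercase) does not match 'I1', so it is treated as a fourth label
s3 = pd.Series({'i1': 'LG', 'I2': 'SAMSUNG', 'I3': 'USHA'})

mismatched = pd.DataFrame({'Iname': s1, 'Company': s3})
print(mismatched)

# Labels that exist in only one Series are padded with NaN in the other column
print(mismatched.isna().sum())
```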

# 5. CREATE DATAFRAME FROM DICTIONARY with a list as the value for each key
import pandas as pd
import numpy as np
dict_stu={'Rollno':[1,2,3,4,5],'Name':['Anil','Ankit','Vishal','Gaurav','Harsh'],
          'Class':[11,11,12,10,12],'Sec':['A','B','B','A','C']}
print("="*30)
studf=pd.DataFrame(dict_stu)
print(studf)
Rollno Name Class Sec
0 1 Anil 11 A
1 2 Ankit 11 B
2 3 Vishal 12 B
3 4 Gaurav 10 A
4 5 Harsh 12 C
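Each key of the dictionary becomes a column, and each column keeps its own data type, which is what makes the dataframe heterogeneous. A small sketch, on shortened data, that inspects this with the `dtypes` and `shape` attributes:

```python
import pandas as pd

dict_stu = {'Rollno': [1, 2, 3], 'Name': ['Anil', 'Ankit', 'Vishal'],
            'Class': [11, 11, 12], 'Sec': ['A', 'B', 'B']}
studf = pd.DataFrame(dict_stu)

print(studf.dtypes)   # one dtype per column
print(studf.shape)    # (number of rows, number of columns)
```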

# 6. Create a dataframe from an existing dataframe

import pandas as pd
studf1=pd.DataFrame(studf)
print(studf1)
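Passing an existing dataframe to `pd.DataFrame()` produces another dataframe with the same contents. When a fully independent copy is needed, the `copy()` method is a common alternative. The sketch below is an illustration on shortened data, not part of the original listing, and the name studf_copy is hypothetical.

```python
import pandas as pd

studf = pd.DataFrame({'Rollno': [1, 2], 'Name': ['Anil', 'Ankit']})

# copy() returns a new dataframe with its own data
studf_copy = studf.copy()
studf_copy.loc[0, 'Name'] = 'Changed'

print(studf.loc[0, 'Name'])       # the original value is unaffected
print(studf_copy.loc[0, 'Name'])
```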
# 7. Creating an indexed dataframe using lists
stu={'ID':['E1','E2','E3','E4','E5'],'RollNo':[1,2,3,4,5],
     'Name':['Anil','Ankit','Vishal','Gaurav','Harsh'],
     'Class':[11,11,12,10,12],'Sec':['A','B','B','A','C']}
stdf=pd.DataFrame(stu, index=['S1','S2','S3','S4','S5'])
print(stdf)
ID RollNo Name Class Sec
S1 E1 1 Anil 11 A
S2 E2 2 Ankit 11 B
S3 E3 3 Vishal 12 B
S4 E4 4 Gaurav 10 A
S5 E5 5 Harsh 12 C

#8. Accessing elements of a dataframe

print(stdf['RollNo'])
print(stdf.RollNo)
S1 1
S2 2
S3 3
S4 4
S5 5
Name: RollNo, dtype: int64
S1 1
S2 2
S3 3
S4 4
S5 5
Name: RollNo, dtype: int64
#9. To access more than one column, pass them as a list
print(stdf[['RollNo','Class']])
RollNo Class
S1 1 11
S2 2 11
S3 3 12
S4 4 10
S5 5 12
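A single column name returns a Series, while a list of names returns a dataframe, even when the list holds only one name. A quick sketch to check the types on a reduced dataframe; the names one_col and as_frame are hypothetical.

```python
import pandas as pd

stdf = pd.DataFrame({'RollNo': [1, 2, 3], 'Class': [11, 11, 12]},
                    index=['S1', 'S2', 'S3'])

one_col = stdf['RollNo']      # Series
as_frame = stdf[['RollNo']]   # dataframe with one column

print(type(one_col).__name__)
print(type(as_frame).__name__)
```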
#10. Change the index by taking values from one of the columns of the dataframe
#(these lines are commented out in the listing, so they print nothing)

#stdf1=pd.DataFrame(stdf)
#stdf1.set_index('ID',inplace=True)
#print(stdf1)
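The commented-out lines of section 10 can be tried on a separate dataframe. A minimal sketch follows, using `set_index` without `inplace` so that it returns a new dataframe and leaves the source untouched; the small dataframe here is a cut-down stand-in for stdf.

```python
import pandas as pd

stdf = pd.DataFrame({'ID': ['E1', 'E2', 'E3'], 'RollNo': [1, 2, 3]},
                    index=['S1', 'S2', 'S3'])

# The 'ID' column becomes the index and is removed from the columns
stdf1 = stdf.set_index('ID')

print(stdf1)
print(list(stdf.index))   # the original index is still S1, S2, S3
```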
#11. To add a new column to the dataframe with a scalar value
stdf['Marks']=90
print(stdf)
ID RollNo Name Class Sec Marks
S1 E1 1 Anil 11 A 90
S2 E2 2 Ankit 11 B 90
S3 E3 3 Vishal 12 B 90
S4 E4 4 Gaurav 10 A 90
S5 E5 5 Harsh 12 C 90

#12. Add a new column with varying values for each index value
stdf['Marks']=pd.Series([10,20,30,40,50],index=['S1','S2','S3','S4','S5'])
print(stdf)
ID RollNo Name Class Sec Marks
S1 E1 1 Anil 11 A 10
S2 E2 2 Ankit 11 B 20
S3 E3 3 Vishal 12 B 30
S4 E4 4 Gaurav 10 A 40
S5 E5 5 Harsh 12 C 50
stdf['New Marks']=stdf['Marks']+5
print(stdf)
ID RollNo Name Class Sec Marks New Marks
S1 E1 1 Anil 11 A 10 15
S2 E2 2 Ankit 11 B 20 25
S3 E3 3 Vishal 12 B 30 35
S4 E4 4 Gaurav 10 A 40 45
S5 E5 5 Harsh 12 C 50 55
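Assigning a Series as a column matches values by index label rather than by position. The sketch below, on a reduced dataframe, leaves out one label and lists the others out of order to show that the missing row receives NaN.

```python
import pandas as pd

stdf = pd.DataFrame({'RollNo': [1, 2, 3]}, index=['S1', 'S2', 'S3'])

# No value is supplied for 'S2', and the labels are given out of order
stdf['Marks'] = pd.Series([30, 10], index=['S3', 'S1'])

print(stdf)
```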

#13. Applying an expression with character values


stdf['NN']=stdf['Sec']+'Q'
stdf['New_Sec']=stdf['Sec']+"$"
print(stdf)

#Applying * operator to replicate the character values of a column

stdf['New Sec2']=stdf['Sec']*3
print(stdf)

stdf['New Sec2']=stdf['Sec']*3+"@"+"Name"
print(stdf)

stdf.drop(columns=['NN'],inplace=True)
print(stdf)
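In the expression that ends with +"Name", the word Name is a string literal, so every row ends in the text Name. If the intention is to append each student's own name, the Name column itself can be added instead. A sketch under that assumption; the column name 'Sec_Name' is hypothetical.

```python
import pandas as pd

stdf = pd.DataFrame({'Name': ['Anil', 'Ankit'], 'Sec': ['A', 'B']},
                    index=['S1', 'S2'])

# Element-wise concatenation: repeated section, '@', then that row's name
stdf['Sec_Name'] = stdf['Sec'] * 3 + '@' + stdf['Name']

print(stdf)
```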

#Extracting data from dataframe (loc & iloc methods)


#Syntax of loc method: <dataframe>.loc[<start row label>:<stop row label>,<start column heading>:<stop column heading>]
#Both the start and the stop labels are included in the result
print(stdf.loc['S1':'S4','ID':'Class'])
#If a start or stop label is omitted, the slice runs from the first or up to the last row/column
print(stdf.loc['S1':,:'Sec'])
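With `loc`, slices are based on labels and both end labels are part of the result. A small sketch on a reduced dataframe; the variable name subset is hypothetical.

```python
import pandas as pd

stdf = pd.DataFrame({'ID': ['E1', 'E2', 'E3', 'E4'],
                     'RollNo': [1, 2, 3, 4],
                     'Class': [11, 11, 12, 10]},
                    index=['S1', 'S2', 'S3', 'S4'])

# Rows S2 to S3 and columns ID to RollNo, end labels included
subset = stdf.loc['S2':'S3', 'ID':'RollNo']
print(subset)
```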

#Syntax of iloc method:


# <dataframe>.iloc[<start row index>:<stop row index>,<start column index>:<stop column index>]
# Indexes are integer positions starting at 0; the stop index is excluded

print(stdf.iloc[0:2,0:2])
print(stdf.iloc[::-1])

print(stdf.iloc[::-1,::-1])
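`iloc` works on integer positions and follows ordinary Python slicing, so the stop position is excluded and a step of -1 reverses the order. A sketch on a reduced dataframe; the names first_two and reversed_rows are hypothetical.

```python
import pandas as pd

stdf = pd.DataFrame({'ID': ['E1', 'E2', 'E3'], 'RollNo': [1, 2, 3]},
                    index=['S1', 'S2', 'S3'])

first_two = stdf.iloc[0:2, 0:1]   # rows 0 and 1, column 0 only
reversed_rows = stdf.iloc[::-1]   # all rows, last to first

print(first_two)
print(reversed_rows)
```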

#AGGREGATION USING STATISTICAL FUNCTIONS ON DATAFRAMES


#Create dataframe of players with their score
Dfgame=pd.DataFrame({'Pno':[1,2,3,4,5,6,7,8],
                     'Pname':['Virat','Sehwag','Dhoni','Rahane','Hardik','Jadeja','Shami','Buruah'],
                     'Score':[90,98,86,76,70,65,93,80]})
print(Dfgame)

#Max() function

print("The highest score is:",Dfgame['Score'].max())


print("Max function applied on Pname function:",Dfgame['Pname'].max())

#Min function
print("the lowest score is:",Dfgame['Score'].min())
print("Min function applied on Pname function:",Dfgame['Pname'].min())
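On a text column, `max()` and `min()` compare strings alphabetically, which is why they return Virat and Buruah rather than anything related to the score. To find which player holds the highest score, one option is `idxmax()`, sketched here on a shortened version of the data; the names top_row and top_player are hypothetical.

```python
import pandas as pd

Dfgame = pd.DataFrame({'Pname': ['Virat', 'Sehwag', 'Dhoni'],
                       'Score': [90, 98, 86]})

# idxmax() gives the index label of the row with the largest score
top_row = Dfgame['Score'].idxmax()
top_player = Dfgame.loc[top_row, 'Pname']

print(top_player, Dfgame.loc[top_row, 'Score'])
```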

#Function Sum
print("The total score of team is :",Dfgame['Score'].sum())

#Count the non-missing values in each row (axis=1 counts across the columns)


print("the total number of columns for each row is:")

print(Dfgame.count(axis=1))
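`count(axis=1)` counts the non-missing values in each row, so the result equals the number of columns only when no value is missing. A sketch with one missing score, added purely for illustration:

```python
import pandas as pd
import numpy as np

Dfgame = pd.DataFrame({'Pno': [1, 2], 'Pname': ['Virat', 'Sehwag'],
                       'Score': [90, np.nan]})

print(Dfgame.count(axis=1))   # per row: 3 for the first, 2 for the second
print(Dfgame.count())         # per column (default axis=0)
```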

#COMPLETE OUTPUT OF THE PROGRAM (from section 5 onwards)
==============================
Rollno Name Class Sec
0 1 Anil 11 A
1 2 Ankit 11 B
2 3 Vishal 12 B
3 4 Gaurav 10 A
4 5 Harsh 12 C
Rollno Name Class Sec
0 1 Anil 11 A
1 2 Ankit 11 B
2 3 Vishal 12 B
3 4 Gaurav 10 A
4 5 Harsh 12 C
ID RollNo Name Class Sec
S1 E1 1 Anil 11 A
S2 E2 2 Ankit 11 B
S3 E3 3 Vishal 12 B
S4 E4 4 Gaurav 10 A
S5 E5 5 Harsh 12 C
S1 1
S2 2
S3 3
S4 4
S5 5
Name: RollNo, dtype: int64
S1 1
S2 2
S3 3
S4 4
S5 5
Name: RollNo, dtype: int64
RollNo Class
S1 1 11
S2 2 11
S3 3 12
S4 4 10
S5 5 12
ID RollNo Name Class Sec Marks
S1 E1 1 Anil 11 A 90
S2 E2 2 Ankit 11 B 90
S3 E3 3 Vishal 12 B 90
S4 E4 4 Gaurav 10 A 90
S5 E5 5 Harsh 12 C 90
ID RollNo Name Class Sec Marks
S1 E1 1 Anil 11 A 10
S2 E2 2 Ankit 11 B 20
S3 E3 3 Vishal 12 B 30
S4 E4 4 Gaurav 10 A 40
S5 E5 5 Harsh 12 C 50
ID RollNo Name Class Sec Marks New Marks
S1 E1 1 Anil 11 A 10 15
S2 E2 2 Ankit 11 B 20 25
S3 E3 3 Vishal 12 B 30 35
S4 E4 4 Gaurav 10 A 40 45
S5 E5 5 Harsh 12 C 50 55
ID RollNo Name Class Sec Marks New Marks NN New_Sec
S1 E1 1 Anil 11 A 10 15 AQ A$
S2 E2 2 Ankit 11 B 20 25 BQ B$
S3 E3 3 Vishal 12 B 30 35 BQ B$
S4 E4 4 Gaurav 10 A 40 45 AQ A$
S5 E5 5 Harsh 12 C 50 55 CQ C$
ID RollNo Name Class Sec Marks New Marks NN New_Sec New Sec2
S1 E1 1 Anil 11 A 10 15 AQ A$ AAA
S2 E2 2 Ankit 11 B 20 25 BQ B$ BBB
S3 E3 3 Vishal 12 B 30 35 BQ B$ BBB
S4 E4 4 Gaurav 10 A 40 45 AQ A$ AAA
S5 E5 5 Harsh 12 C 50 55 CQ C$ CCC
ID RollNo Name Class Sec Marks New Marks NN New_Sec New Sec2
S1 E1 1 Anil 11 A 10 15 AQ A$ AAA@Name
S2 E2 2 Ankit 11 B 20 25 BQ B$ BBB@Name
S3 E3 3 Vishal 12 B 30 35 BQ B$ BBB@Name
S4 E4 4 Gaurav 10 A 40 45 AQ A$ AAA@Name
S5 E5 5 Harsh 12 C 50 55 CQ C$ CCC@Name

ID RollNo Name Class Sec Marks New Marks New_Sec New Sec2
S1 E1 1 Anil 11 A 10 15 A$ AAA@Name
S2 E2 2 Ankit 11 B 20 25 B$ BBB@Name
S3 E3 3 Vishal 12 B 30 35 B$ BBB@Name
S4 E4 4 Gaurav 10 A 40 45 A$ AAA@Name
S5 E5 5 Harsh 12 C 50 55 C$ CCC@Name
ID RollNo Name Class
S1 E1 1 Anil 11
S2 E2 2 Ankit 11
S3 E3 3 Vishal 12
S4 E4 4 Gaurav 10
ID RollNo Name Class Sec
S1 E1 1 Anil 11 A
S2 E2 2 Ankit 11 B
S3 E3 3 Vishal 12 B
S4 E4 4 Gaurav 10 A
S5 E5 5 Harsh 12 C
ID RollNo
S1 E1 1
S2 E2 2
ID RollNo Name Class Sec Marks New Marks New_Sec New Sec2
S5 E5 5 Harsh 12 C 50 55 C$ CCC@Name
S4 E4 4 Gaurav 10 A 40 45 A$ AAA@Name
S3 E3 3 Vishal 12 B 30 35 B$ BBB@Name
S2 E2 2 Ankit 11 B 20 25 B$ BBB@Name
S1 E1 1 Anil 11 A 10 15 A$ AAA@Name
New Sec2 New_Sec New Marks Marks Sec Class Name RollNo ID
S5 CCC@Name C$ 55 50 C 12 Harsh 5 E5
S4 AAA@Name A$ 45 40 A 10 Gaurav 4 E4
S3 BBB@Name B$ 35 30 B 12 Vishal 3 E3
S2 BBB@Name B$ 25 20 B 11 Ankit 2 E2
S1 AAA@Name A$ 15 10 A 11 Anil 1 E1
Pno Pname Score
0 1 Virat 90
1 2 Sehwag 98
2 3 Dhoni 86
3 4 Rahane 76
4 5 Hardik 70
5 6 Jadeja 65
6 7 Shami 93
7 8 Buruah 80
The highest score is: 98
Max function applied on Pname function: Virat
the lowest score is: 65
Min function applied on Pname function: Buruah
The total score of team is : 658
the total number of columns for each row is:
0 3
1 3
2 3
3 3
4 3
5 3
6 3
7 3
dtype: int64