Gyan Ganga Institute of Technology and Sciences, Jabalpur
Computer Science and Engineering
CSE-Data Science, IV semester DS406
PYTHON FOR DATA SCIENCE
LAB MANUAL
List of Experiments
1. Write a python program to reverse a string.
2. Write a python program to perform following operation using lists:
a. append element in the list
b. compare two lists
c. convert list to dictionary
3. Write a Program to transpose a table/pandas data frame.
4. Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10.
5. Write a python program to perform following operation on Data Frame:
a. Create two different Data Frames and perform the merging operations on it.
b. Create two different Data Frames and perform the grouping operations on it.
c. Create two different Data Frames and perform the concatenating operations on it
6. Program to check regular expression pattern is matching with string or not in Python
7. Create a sample dataset and apply the following aggregation function on it:
1. mean(), median()
2. min(), max()
3. std(), var()
4. sum()
8. Write a python program to get row-wise proportions using the crosstab() function.
9. Write a python program to display a bar chart of the popularity of programming languages.
10. Write a Python program to create bar plot of scores by group and gender. Use multiple X
values on the same chart for men and women.
Sample Data:
Means (men) = (22, 30, 35, 35, 26)
Means (women) = (25, 32, 30, 35, 29)
Experiment 1
Aim: Write a python program to reverse a string.
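def reverse_string(text):
    # Build the reversed string by walking from the last character to the first.
    # The parameter is named text so that it does not shadow the built-in str.
    revstr = ""
    index = len(text)
    while index > 0:
        revstr = revstr + text[index - 1]
        index = index - 1
    return revstr

print(reverse_string("Welcome"))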
Output:
emocleW
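The loop above builds the result one character at a time. As a hedged alternative sketch, the same reversal can be written with slicing or with the built-in reversed(); the function names reverse_with_slice and reverse_with_reversed are illustrative and not part of the original manual.

```python
def reverse_with_slice(text):
    # A slice with step -1 walks the string from the last character to the first.
    return text[::-1]


def reverse_with_reversed(text):
    # reversed() yields the characters back to front; join() glues them together.
    return "".join(reversed(text))


print(reverse_with_slice("Welcome"))     # emocleW
print(reverse_with_reversed("Welcome"))  # emocleW
```

Both versions avoid the repeated string concatenation of the while loop, which tends to matter only for long strings.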
Experiment 2
Aim: Write a python program to perform following operation using lists:
1. append element in the list
2. compare two lists
3. convert list to dictionary
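# appending the element in the list
list1 = [1, 2, 3, 4, 5]
# print the original elements one per line
for i in list1:
    print(i)
list2 = ['a', 'b', 'c', 'd', 'e']
# append() adds list2 as a single nested element at the end of list1
list1.append(list2)
print(list1)
Output:
1
2
3
4
5
[1, 2, 3, 4, 5, ['a', 'b', 'c', 'd', 'e']]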
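The append() call above adds the second list as one nested element. If the goal is instead to add each item individually, extend() is the usual choice. The sketch below contrasts the two; the variable names numbers, letters, nested and flat are illustrative.

```python
numbers = [1, 2, 3, 4, 5]
letters = ['a', 'b', 'c', 'd', 'e']

nested = numbers.copy()
nested.append(letters)   # adds the whole list as ONE element

flat = numbers.copy()
flat.extend(letters)     # adds each item of letters separately

print(len(nested))  # 6
print(len(flat))    # 10
print(flat)
```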
# comparing two lists
l1 = [10, 20, 30, 40, 50]
l2 = [10, 20, 30, 50, 40, 70]
l3 = [50, 10, 30, 20, 40]
# sort in place so that lists holding the same items compare equal regardless of order
l1.sort()
l2.sort()
l3.sort()
if l1 == l3:
    print("The lists l1 and l3 are the same")
else:
    print("The lists l1 and l3 are not the same")
if l1 == l2:
    print("The lists l1 and l2 are the same")
else:
    print("The lists l1 and l2 are not the same")
Output:
The lists l1 and l3 are the same
The lists l1 and l2 are not the same
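Sorting in place changes the original lists. A possible alternative, sketched here under the assumption that only the items and their counts matter (not their order), is to compare collections.Counter objects. The helper name same_items is hypothetical.

```python
from collections import Counter

l1 = [10, 20, 30, 40, 50]
l2 = [10, 20, 30, 50, 40, 70]
l3 = [50, 10, 30, 20, 40]


def same_items(a, b):
    # Counter maps each item to how often it occurs, so order is ignored
    # and the input lists are left unmodified.
    return Counter(a) == Counter(b)


print(same_items(l1, l3))  # True
print(same_items(l1, l2))  # False
print(l3)                  # [50, 10, 30, 20, 40] (order unchanged)
```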
# convert list to dictionary
index = [1,2,3]
languages = ['a', 'b', 'c']
dictionary={k:v for k, v in zip(index, languages)}
print(dictionary)
Output:
{1: 'a', 2: 'b', 3: 'c'}
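The dictionary comprehension above pairs keys and values through zip(). Two shorter equivalents are sketched below: dict() applied directly to zip(), and enumerate() when the keys are simply consecutive integers. The variable names are illustrative.

```python
keys = [1, 2, 3]
values = ['a', 'b', 'c']

# dict() accepts any iterable of (key, value) pairs, which is what zip() yields.
from_zip = dict(zip(keys, values))

# enumerate() generates the running index itself; start=1 matches the keys above.
from_enumerate = dict(enumerate(values, start=1))

print(from_zip)
print(from_enumerate)
```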
Experiment 3
Aim: Write a Program to transpose a table/pandas data frame.
import pandas as pd
d1 = {'c1': [2, 3], 'c2': [4, 5]}
df1 = pd.DataFrame(data=d1)
print(df1)
# .T swaps rows and columns
df1_transpose = df1.T
print(df1_transpose)
Output:
   c1  c2
0   2   4
1   3   5
    0  1
c1  2  3
c2  4  5
Experiment 4
Aim: Write a NumPy program to create a 3x3 matrix with values ranging from 2 to
10.
import numpy as np
x = np.arange(2, 11).reshape(3,3)
print(x)
Output:
[[ 2  3  4]
 [ 5  6  7]
 [ 8  9 10]]
Experiment 5
Aim: Write a python program to perform following operation on Data Frame:
a. Create two different Data Frames and perform the merging operations on it.
b. Create two different Data Frames and perform the grouping operations on it.
c. Create two different Data Frames and perform the concatenating operations on it
a. Create different Data Frames and combine them. The example below stacks three Data Frames row-wise with pd.concat(), which also covers part c; a key-based merge uses pd.merge() instead.
import pandas as pd
df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
    index=[0, 1, 2, 3],
)
df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["C4", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    },
    index=[4, 5, 6, 7],
)
df3 = pd.DataFrame(
    {
        "A": ["A8", "A9", "A10", "A11"],
        "B": ["B8", "B9", "B10", "B11"],
        "C": ["C8", "C9", "C10", "C11"],
        "D": ["D8", "D9", "D10", "D11"],
    },
    index=[8, 9, 10, 11],
)
# stack the three frames on top of each other, keeping their original index labels
frames = [df1, df2, df3]
result = pd.concat(frames)
print(result)
Output:
      A    B    C    D
0    A0   B0   C0   D0
1    A1   B1   C1   D1
2    A2   B2   C2   D2
3    A3   B3   C3   D3
4    A4   B4   C4   D4
5    A5   B5   C5   D5
6    A6   B6   C6   D6
7    A7   B7   C7   D7
8    A8   B8   C8   D8
9    A9   B9   C9   D9
10  A10  B10  C10  D10
11  A11  B11  C11  D11
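The example above stacks Data Frames with pd.concat(). For the merging operation named in part a, a minimal sketch with pd.merge() might look like the following. The Data Frames students and marks, the key column roll_no, and all values in them are invented for illustration and do not come from the manual.

```python
import pandas as pd

# Two hypothetical Data Frames that share the key column "roll_no".
students = pd.DataFrame({"roll_no": [1, 2, 3, 4],
                         "name": ["Asha", "Ravi", "Meena", "Karan"]})
marks = pd.DataFrame({"roll_no": [2, 3, 4, 5],
                      "marks": [78, 91, 64, 85]})

# Inner merge: keep only the keys present in both frames.
inner = pd.merge(students, marks, on="roll_no", how="inner")

# Outer merge: keep every key; missing values are filled with NaN.
outer = pd.merge(students, marks, on="roll_no", how="outer")

print(inner)
print(outer)
```

Unlike concat(), merge() aligns rows by the values in the key column, so the how argument decides what happens to keys found in only one frame.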
b. Create two different Data Frames and perform the grouping operations on it:
Any groupby operation involves one or more of the following steps on the original object:
1. Splitting the object
2. Applying a function
3. Combining the results
In many situations, we split the data into sets and apply some functionality to each
subset. In the apply step, we can perform the following operations:
Aggregation: computing a summary statistic
Transformation: performing some group-specific operation
Filtration: discarding data based on some condition
Example:
Split Data into Groups
A pandas object can be split along any of its axes. There are multiple ways to split an
object, such as:
obj.groupby('key')
obj.groupby(['key1','key2'])
obj.groupby(key,axis=1)
(axis=1 groups by columns; this argument is deprecated in recent pandas releases.)
import pandas as pd
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Year')
# iterating over a groupby object yields (group name, sub-frame) pairs
for name, group in grouped:
    print(name)
    print(group)
Output:
2014
     Team  Rank  Year  Points
0  Riders     1  2014     876
2  Devils     2  2014     863
4   Kings     3  2014     741
9  Royals     4  2014     701
2015
      Team  Rank  Year  Points
1   Riders     2  2015     789
3   Devils     3  2015     673
5    kings     4  2015     812
10  Royals     1  2015     804
2016
     Team  Rank  Year  Points
6   Kings     1  2016     756
8  Riders     2  2016     694
2017
      Team  Rank  Year  Points
7    Kings     1  2017     788
11  Riders     2  2017     690
Using the get_group() method, we can select a single group.
# import the pandas library
import pandas as pd
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Year')
print(grouped.get_group(2014))
Output:
     Team  Rank  Year  Points
0  Riders     1  2014     876
2  Devils     2  2014     863
4   Kings     3  2014     741
9  Royals     4  2014     701
Aggregations
An aggregation function returns a single aggregated value for each group. Once the groupby
object is created, several aggregation operations can be performed on the grouped
data.
The most direct route is the aggregate method or its equivalent alias agg:
# import the pandas library
import pandas as pd
import numpy as np
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
# Aggregations using agg() function; function names are passed as strings
grouped = df.groupby('Year')
print(grouped['Points'].agg('mean'))
# note: 'kings' (lowercase) is a separate key from 'Kings', so it forms its own group
grouped = df.groupby('Team')
# apply np.size to see the size of each group
print(grouped.agg(np.size))
# pass a list or dict of functions to aggregate with; the result is a DataFrame
print(grouped['Points'].agg(['sum', 'mean', 'std']))
# Transformation: rescale each value relative to its own group's mean and standard deviation
grouped = df.groupby('Team')
score = lambda x: (x - x.mean()) / x.std() * 10
print(grouped.transform(score))
Output:
Year
2014    795.25
2015    769.50
2016    725.00
2017    739.00
Name: Points, dtype: float64
        Rank  Year  Points
Team
Devils     2     2       2
Kings      3     3       3
Riders     4     4       4
Royals     2     2       2
kings      1     1       1
         sum        mean         std
Team
Devils  1536  768.000000  134.350288
Kings   2285  761.666667   24.006943
Riders  3049  762.250000   88.567771
Royals  1505  752.500000   72.831998
kings    812  812.000000         NaN
         Rank       Year     Points
0  -15.000000 -11.618950  12.843272
1    5.000000  -3.872983   3.020286
2   -7.071068  -7.071068   7.071068
3    7.071068   7.071068  -7.071068
4   11.547005 -10.910895  -8.608621
5         NaN        NaN        NaN
6   -5.773503   2.182179  -2.360428
7   -5.773503   8.728716  10.969049
8    5.000000   3.872983  -7.705963
9    7.071068  -7.071068  -7.071068
10  -7.071068   7.071068   7.071068
11   5.000000  11.618950  -8.157595
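Filtration is listed among the groupby operations but is not demonstrated above. A minimal sketch, reusing the same ipl_data and assuming the condition "keep only teams that appear at least three times" (a threshold chosen here purely for illustration), could use the filter() method:

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# filter() calls the function once per group and keeps every row of the
# groups for which it returns True; here: teams with three or more rows.
frequent = df.groupby('Team').filter(lambda group: len(group) >= 3)
print(frequent)
```

With this data only the Riders (four rows) and Kings (three rows) groups pass the condition; the lowercase 'kings' row counts as a separate group and is dropped.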
Experiment 6
Aim: Program to check regular expression pattern is matching with string or not
in Python
# Need module 're' for regular expressions
import re
search_string = "Hello-World"
pattern = "World"
# re.match() only tests for a match at the beginning of the string
match = re.match(pattern, search_string)
# The if-statement tests whether match() succeeded (it returns None on failure)
if match:
    print("regex matches: ", match.group())
else:
    print('pattern not found')
Output:
pattern not found
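The program reports "pattern not found" because re.match() anchors at the start of the string, and "Hello-World" does not begin with "World". The sketch below contrasts re.match() with re.search(), which scans the whole string, and re.fullmatch(), which requires the entire string to fit the pattern. The second pattern r"\w+-\w+" is an illustrative assumption, not part of the original exercise.

```python
import re

search_string = "Hello-World"
pattern = "World"

# match(): succeeds only if the pattern matches at position 0.
at_start = re.match(pattern, search_string)

# search(): scans forward and returns the first match anywhere in the string.
anywhere = re.search(pattern, search_string)

# fullmatch(): the whole string must match the pattern from start to end.
whole = re.fullmatch(r"\w+-\w+", search_string)

print(at_start)            # None
print(anywhere.group())    # World
print(whole is not None)   # True
```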
Experiment 7
Aim: Create a sample dataset and apply the following aggregation function on it:
1. mean(), median()
2. min(), max()
3. std(), var()
4. sum()
import pandas as pd
import numpy as np
raw_data = {'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
            'age': [20, 19, 22, 21],
            'favorite_color': ['blue', 'blue', 'yellow', "green"],
            'grade': [88, 92, 95, 70]}
df = pd.DataFrame(raw_data, index=['Willard Morris', 'Al Jennings', 'Omar Mullins',
                                   'Spencer McDaniel'])
print(df)
print(df['grade'].mean())
print(df['grade'].median())
print(df['grade'].sum())
print(df['grade'].var())
print(df['grade'].std())
print(df['grade'].max())
print(df['grade'].min())
Output:
                              name  age favorite_color  grade
Willard Morris      Willard Morris   20           blue     88
Al Jennings            Al Jennings   19           blue     92
Omar Mullins          Omar Mullins   22         yellow     95
Spencer McDaniel  Spencer McDaniel   21          green     70
86.25
90.0
345
125.58333333333333
11.206396982676159
95
70
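The aggregations above are called one at a time on a single column. As a hedged sketch, the same statistics can be collected in one call by passing a list of function names to agg(). The Data Frame here keeps only the numeric columns of the sample dataset, and the variable name summary is illustrative.

```python
import pandas as pd

# Numeric columns of the sample dataset above.
df = pd.DataFrame({'age': [20, 19, 22, 21],
                   'grade': [88, 92, 95, 70]})

# Each name in the list becomes one row of the result; each column of df
# becomes one column, so all statistics appear in a single table.
summary = df.agg(['mean', 'median', 'min', 'max', 'std', 'var', 'sum'])
print(summary)
```

Note that pandas computes std() and var() with the sample formula (dividing by n - 1) by default, which is why the variance of grade is 125.58 rather than the population value.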
Experiment 8
Aim: Write a python program to get row wise proportion using crosstab () function.
import pandas as pd
# Create a DataFrame
d = {
    'Name': ['Alisa', 'Bobby', 'Cathrine', 'Alisa', 'Bobby', 'Cathrine',
             'Alisa', 'Bobby', 'Cathrine', 'Alisa', 'Bobby', 'Cathrine'],
    'Exam': ['Semester 1', 'Semester 1', 'Semester 1', 'Semester 1', 'Semester 1', 'Semester 1',
             'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2'],
    'Subject': ['Mathematics', 'Mathematics', 'Mathematics', 'Science', 'Science', 'Science',
                'Mathematics', 'Mathematics', 'Mathematics', 'Science', 'Science', 'Science'],
    'Result': ['Pass', 'Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Pass', 'Fail', 'Fail', 'Pass', 'Pass', 'Fail']}
df = pd.DataFrame(d, columns=['Name', 'Exam', 'Subject', 'Result'])
# Two-way frequency table (cross table) with row and column totals
my_crosstab = pd.crosstab(df.Subject, df.Result, margins=True)
# Rename the index and columns
my_crosstab.columns = ["Fail", "Pass", "rowtotal"]
my_crosstab.index = ["Mathematics", "Science", "coltotal"]
# Get the row-wise proportion: divide every row by its own row total
print(my_crosstab.div(my_crosstab["rowtotal"], axis=0))
Output:
                 Fail      Pass  rowtotal
Mathematics  0.500000  0.500000       1.0
Science      0.333333  0.666667       1.0
coltotal     0.416667  0.583333       1.0
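As an alternative to dividing by the row totals manually, crosstab() can compute row proportions directly through its normalize parameter. The sketch below keeps only the Subject and Result columns of the dataset above; the variable name row_prop is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Subject': ['Mathematics', 'Mathematics', 'Mathematics', 'Science', 'Science', 'Science',
                'Mathematics', 'Mathematics', 'Mathematics', 'Science', 'Science', 'Science'],
    'Result': ['Pass', 'Pass', 'Fail', 'Pass', 'Fail', 'Pass',
               'Pass', 'Fail', 'Fail', 'Pass', 'Pass', 'Fail'],
})

# normalize='index' divides each cell by the total of its row,
# so every row of the result sums to 1.
row_prop = pd.crosstab(df['Subject'], df['Result'], normalize='index')
print(row_prop)
```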
Experiment 9
Aim: Write a python program to display a bar chart of the popularity of
programming languages.
import matplotlib.pyplot as plt
x = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]
# one integer position per language along the x-axis
x_pos = [i for i, _ in enumerate(x)]
plt.bar(x_pos, popularity, color=(0.4, 0.6, 0.8, 1.0), edgecolor='blue')
plt.xlabel("Languages")
plt.ylabel("Popularity")
plt.title("Popularity of Programming Languages\n" + "Worldwide, Oct 2017 compared to a year ago")
# label each bar position with the language name
plt.xticks(x_pos, x)
# Turn on the grid
plt.minorticks_on()
plt.grid(which='major', linestyle='-', linewidth=0.5, color='red')
# Customize the minor grid
plt.grid(which='minor', linestyle=':', linewidth=0.5, color='black')
plt.show()
Output:
Experiment 10
Aim: Write a Python program to create bar plot of scores by group and gender. Use
multiple X values on the same chart for men and women.
Sample Data:
Means (men) = (22, 30, 35, 35, 26)
Means (women) = (25, 32, 30, 35, 29)
import numpy as np
import matplotlib.pyplot as plt
# data to plot (taken from the sample data above)
n_groups = 5
men_means = (22, 30, 35, 35, 26)
women_means = (25, 32, 30, 35, 29)
# create plot
fig, ax = plt.subplots()
index = np.arange(n_groups)
bar_width = 0.35
opacity = 0.8
# men's bars sit at index; women's bars are shifted right by one bar width
rects1 = plt.bar(index, men_means, bar_width,
                 alpha=opacity,
                 color='g',
                 label='Men')
rects2 = plt.bar(index + bar_width, women_means, bar_width,
                 alpha=opacity,
                 color='r',
                 label='Women')
plt.xlabel('Group')
plt.ylabel('Scores')
plt.title('Scores by group and gender')
# centre each tick label between the two bars of its group
plt.xticks(index + bar_width / 2, ('G1', 'G2', 'G3', 'G4', 'G5'))
plt.legend()
plt.tight_layout()
plt.show()
Output: