MODULE – 7 ASSIGNMENT
Python for data analytics
Please implement Python code for all the problems.
1) Handle the missing data in the “Data.csv” file using the Python module
“sklearn.impute” and its methods. Also collect all rows where “Salary” is less than 70,000.
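import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv(r'C:\Users\DELL\Downloads\datasets\Data.csv')

# Replace missing salaries with the column mean
mean_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
df["Salaries"] = mean_imputer.fit_transform(df[["Salaries"]]).ravel()
print(df["Salaries"].isna().sum())  # expected to be 0 after imputation

# Collect the rows with a salary below 70,000
low_salary = df[df["Salaries"] < 70000]
print(low_salary)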
2) Subtracting dates:
Python date objects let us treat calendar dates as something similar to numbers: we can
compare them, sort them, add to them, and even subtract them. This lets us do math with dates
that would be a pain to do by hand. The 2007 Florida hurricane season was one of the busiest on
record, with 8 hurricanes in one year. The first one hit on May 9th, 2007, and the last one hit
on December 13th, 2007. How many days elapsed between the first and last hurricane in
2007?
Instructions:
Import date from datetime.
Create a date object for May 9th, 2007, and assign it to the start variable.
Create a date object for December 13th, 2007, and assign it to the end variable.
Subtract start from end, and print the number of days in the resulting timedelta object.
from datetime import date
© 360DigiTMG. All Rights Reserved.
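# Dates of the first and last hurricane of 2007
start = date(2007, 5, 9)
end = date(2007, 12, 13)
# Subtracting two dates gives a timedelta; .days holds the elapsed day count
total_days = (end - start).days
print(total_days)  # 218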
3) Representing dates in different ways
Date objects in Python have a great number of ways they can be printed out as strings. In
some cases, you want to know the date in a clear, language-agnostic format. In other cases,
you want something which can fit into a paragraph and flow naturally.
Print out the same date, August 26, 1992 (the day that Hurricane Andrew made landfall in
Florida), in a number of different ways, by using the “.strftime()” method. Store it in a
variable called “Andrew”.
Instructions:
Print it in the formats 'YYYY-MM', 'YYYY-DDD', and 'MONTH (YYYY)'.
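A minimal sketch of one way to produce the three formats follows. It assumes the standard strftime codes %Y (four-digit year), %m (zero-padded month), %j (zero-padded day of the year), and %B (full month name, which depends on the active locale). The names year_month, year_day, and month_year are illustrative only; the variable Andrew comes from the task.

```python
from datetime import date

# Hurricane Andrew made landfall in Florida on August 26, 1992
Andrew = date(1992, 8, 26)

# 'YYYY-MM': four-digit year and zero-padded month
year_month = Andrew.strftime('%Y-%m')
# 'YYYY-DDD': four-digit year and zero-padded day of the year
year_day = Andrew.strftime('%Y-%j')
# 'MONTH (YYYY)': full month name (locale-dependent) and year
month_year = Andrew.strftime('%B (%Y)')

print(year_month)  # 1992-08
print(year_day)    # 1992-239
print(month_year)  # August (1992) in an English locale
```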
4) For the dataset “Indian_cities”,
a) Find the top 10 states by female-to-male sex ratio.
import pandas as pd

ic = pd.read_csv(r"C:\Users\DELL\Downloads\datasets\Indian_cities.csv")
# Average the city-level sex ratios within each state; summing them would
# favour states that simply have more cities in the dataset
state_sex_ratio = ic.groupby('state_name')['sex_ratio'].mean()
state_sex_ratio.sort_values(ascending=False).head(10)
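Aggregating city-level ratios directly treats a small town and a large city alike. If the dataset also carries male and female population counts, a state-level ratio can instead be computed from the state totals. The sketch below uses hypothetical toy data, and the column names population_male and population_female are assumptions that should be checked against the actual file. It also assumes the ratio is expressed as females per 1,000 males.

```python
import pandas as pd

# Hypothetical toy data; the column names population_male and
# population_female are assumptions about the dataset's layout
cities = pd.DataFrame({
    'state_name': ['A', 'A', 'B'],
    'population_male': [1000, 3000, 2000],
    'population_female': [900, 3300, 1900],
})

# Total the populations per state, then compute females per 1,000 males
totals = cities.groupby('state_name')[['population_male', 'population_female']].sum()
totals['sex_ratio'] = 1000 * totals['population_female'] / totals['population_male']

top_states = totals['sex_ratio'].sort_values(ascending=False).head(10)
print(top_states)
```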
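b) Find the top 10 cities by total number of graduates.
# Assuming each row describes one city, sort the rows directly; grouping by
# city name would merge different cities that happen to share a name
top_graduates = ic[['name_of_city', 'state_name', 'total_graduates']].sort_values(
    'total_graduates', ascending=False)
top_graduates.head(10)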
c) Find the top 10 cities, and their locations, by effective_literacy_rate_total.
# state_name is text, so it cannot be taken from a summed groupby; select it
# from the original rows instead
literacy_city = ic[['name_of_city', 'state_name', 'effective_literacy_rate_total']].sort_values(
    'effective_literacy_rate_total', ascending=False)
literacy_city.head(10)
5) For the data set “Indian_cities”
a) Construct a histogram of literates_total and comment on the inferences.
import pandas as pd
import matplotlib.pyplot as plt

ic = pd.read_csv(r"C:\Users\DELL\Downloads\datasets\Indian_cities.csv")
plt.hist(ic['literates_total'])
plt.xlabel('literates_total')
plt.ylabel('Number of cities')
plt.show()
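When commenting on a histogram, it can help to back the visual impression with a few summary numbers. The sketch below uses hypothetical values, not figures from the dataset, to show how the mean, median, and skewness might be compared: a mean well above the median together with positive skewness suggests a long right tail.

```python
import pandas as pd

# Hypothetical literate counts for five cities (not taken from the dataset)
literates = pd.Series([10_000, 12_000, 11_000, 13_000, 200_000])

mean_value = literates.mean()
median_value = literates.median()
skewness = literates.skew()  # positive values indicate a longer right tail

print(mean_value, median_value, skewness)
```

The same three calls could be applied to ic['literates_total'] once the file is loaded.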
b) Construct a scatter plot of male graduates against female graduates.
import pandas as pd
import matplotlib.pyplot as plt

ic = pd.read_csv(r"C:\Users\DELL\Downloads\datasets\Indian_cities.csv")
plt.scatter(x=ic['male_graduates'], y=ic['female_graduates'])
plt.xlabel('male_graduates')
plt.ylabel('female_graduates')
plt.show()
6) For the data set “Indian_cities”
a) Construct a boxplot of the total effective literacy rate and draw inferences.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

ic = pd.read_csv(r"C:\Users\DELL\Downloads\datasets\Indian_cities.csv")
# Pass the column by keyword so seaborn draws a single horizontal box
sns.boxplot(x=ic['effective_literacy_rate_total'])
plt.show()
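The points a boxplot draws beyond its whiskers can also be listed numerically, which may make the inferences easier to state. The sketch below applies the usual 1.5 × IQR rule (the default whisker length in seaborn and matplotlib) to hypothetical literacy rates; the values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Hypothetical literacy rates in percent (not taken from the dataset)
rates = pd.Series([50.0, 78.0, 80.0, 82.0, 84.0, 86.0, 88.0])

# Quartiles and interquartile range
q1 = rates.quantile(0.25)
q3 = rates.quantile(0.75)
iqr = q3 - q1

# Values outside these fences are the ones drawn as individual points
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = rates[(rates < lower_fence) | (rates > upper_fence)]

print(q1, q3, iqr)
print(outliers.tolist())  # [50.0]
```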
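b) Find the number of null values in each column of the dataset and delete them.
import pandas as pd

ic = pd.read_csv(r"C:\Users\DELL\Downloads\datasets\Indian_cities.csv")
# Count the missing values in each column
null_counts = ic.isna().sum()
print(null_counts)
# Drop every row that contains at least one missing value
ic_clean = ic.dropna()
print(ic_clean)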