Milestone Challenge On Used Bikes Data Set
Welcome to the Milestone Challenge. In this scenario, you will explore descriptive statistics on the used
bikes dataset.
Note -
Finally, restart and run all the cells after completing the challenge.
import numpy as np
import pandas as pd
import statistics
import scipy
!wget hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv
--2021-09-22 10:42:42-- http://hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv
Resolving hrcdn.net (hrcdn.net)... 23.77.203.146, 23.77.203.144, 2600:1407:1800::173f:49d8, ...
Connecting to hrcdn.net (hrcdn.net)|23.77.203.146|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv [following]
--2021-09-22 10:42:42-- https://hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv
Connecting to hrcdn.net (hrcdn.net)|23.77.203.146|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2493547 (2.4M) [application/octet-stream]
Saving to: ‘Used_Bikes.csv’
[13]:
df = pd.read_csv("Used_Bikes.csv")
Question 1
(a) From the dataset above, calculate the mean of all numerical columns, convert the result into a
dictionary, and save it in the variable mean.
(b) From the dataset above, calculate the median of all numerical columns, convert the result into a
dictionary, and save it in the variable median.
(c) From the dataset above, calculate the mode of all numerical columns, convert the result into a
dictionary, and save it in the variable mode.
(d) Judging by the mean, median and mode values of the column kms_driven, its distribution is
positively skewed.
- If the answer for the above statement is yes, assign the string yes to the variable kms_driven;
otherwise assign the string no.
Save the outputs for (a), (b) and (c) in the following format -
Example-
{'age': 85.85,
'kms_driven': 1500.01,
'power': 200.01}
Note- Round the mean and median values to two decimal places and the mode to an integer.
[57]:
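mean = df.mean(numeric_only=True).round(2).to_dict()
median = df.median(numeric_only=True).round(2).to_dict()
# mode() returns one row per tied value; the first row holds a mode for every column.
mode = df.mode(numeric_only=True).iloc[0].astype(int).to_dict()
print(mean)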
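A minimal sketch of how parts (a) to (d) fit together, run on a small made-up DataFrame (`toy`) rather than on Used_Bikes.csv. The values are invented for illustration, and the check for part (d) assumes the common rule of thumb that mean > median > mode points to a positive skew.

```python
import pandas as pd

# Hypothetical stand-in for Used_Bikes.csv; all values are invented.
toy = pd.DataFrame({
    "bike_name": ["a", "b", "c", "d", "e"],
    "kms_driven": [1000, 1000, 2000, 4000, 12000],
    "age": [2, 2, 3, 5, 8],
})

# Keep only numeric columns so the text column is ignored.
num = toy.select_dtypes(include="number")

mean = num.mean().round(2).to_dict()
median = num.median().round(2).to_dict()
# DataFrame.mode() returns one row per tied mode; take the first row.
mode = num.mode().iloc[0].astype(int).to_dict()

# Rule of thumb (an assumption): mean > median > mode suggests right skew.
col = "kms_driven"
kms_driven = "yes" if mean[col] > median[col] > mode[col] else "no"

print(mean)
print(median)
print(mode)
print(kms_driven)
```

A few very large readings pull the mean above the median here, which is the pattern the rule of thumb looks for.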
Question 2
(a) From the column owner, get the count of each category, convert it into a dictionary, and save
it in the variable owner.
- If the answer for the above statement is yes, assign the string yes to the variable class_imb;
otherwise assign the string no.
For question (a), save the output in the following format -
Example-
owner =
class_imb =
File "<ipython-input-58-7a9c7a6960d5>", line 1
owner =
^
SyntaxError: invalid syntax
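One way the category counts could be produced is with `value_counts()`, sketched below on invented owner labels. The 70% cut-off used for `class_imb` is purely an assumption made for this illustration; the challenge text does not define one.

```python
import pandas as pd

# Hypothetical owner labels; the real categories come from the CSV.
toy = pd.DataFrame({
    "owner": ["First Owner"] * 8 + ["Second Owner"] * 2,
})

# value_counts() tallies each category; to_dict() gives {label: count}.
owner = toy["owner"].value_counts().to_dict()

# Assumed heuristic: treat the column as imbalanced when the most common
# category holds more than 70% of the rows.
share = max(owner.values()) / sum(owner.values())
class_imb = "yes" if share > 0.7 else "no"

print(owner)
print(class_imb)
```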
Question 3
- (a) Find the population variance of all numerical columns and save it in the variable p_var.
- (b) Find the population standard deviation of all numerical columns and save it in the variable
p_sd.
For questions (a) and (b), save the output in the following format -
{Column Name : Value}
Example-
{'age': 85.85,
'kms_driven': 1500.01,
'power': 200.01}
Note- Round the values to two decimal places and convert each result into a dictionary.
[59]:
p_var =
p_sd =
File "<ipython-input-59-039e0a8010f1>", line 1
p_var =
^
SyntaxError: invalid syntax
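The detail that matters here is that pandas computes the sample variance by default (`ddof=1`), while the question asks for the population version, which divides by N. A short sketch on invented numbers, passing `ddof=0`:

```python
import pandas as pd

# Invented values; "age" is chosen so the results are round numbers.
toy = pd.DataFrame({
    "age": [2, 4, 4, 4, 5, 5, 7, 9],
    "power": [100, 110, 125, 150, 150, 160, 200, 205],
})

# ddof=0 divides by N (population); the pandas default ddof=1 divides by N - 1.
p_var = toy.var(ddof=0).round(2).to_dict()
p_sd = toy.std(ddof=0).round(2).to_dict()

print(p_var)
print(p_sd)
```

On the real data, `df.var(ddof=0, numeric_only=True)` would restrict the calculation to the numerical columns in the same way.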
Question 4
For the given dataset find the following -
- (a) Interquartile range of the column price; save it in the variable iqr_price.
- (b) Interquartile range of the column kms_driven; save it in the variable iqr_kms_driven.
Note- Round the values to two decimal places.
[ ]:
iqr_price =
iqr_kms_driven =
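The interquartile range is the third quartile minus the first. The sketch below uses a hypothetical helper `iqr` and invented values. It relies on the default linear interpolation of `Series.quantile`; other quantile conventions can give slightly different numbers.

```python
import pandas as pd

# Invented values standing in for the real columns.
toy = pd.DataFrame({
    "price": [10, 20, 30, 40, 50],
    "kms_driven": [1000, 2000, 3000, 4000, 9000],
})


def iqr(series):
    """Return Q3 - Q1 using pandas' default linear interpolation."""
    q1, q3 = series.quantile([0.25, 0.75])
    return round(float(q3 - q1), 2)


iqr_price = iqr(toy["price"])
iqr_kms_driven = iqr(toy["kms_driven"])

print(iqr_price)
print(iqr_kms_driven)
```

`scipy.stats.iqr` computes the same quantity and could be used instead.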
Question 5
(a) From the given dataset, find the correlation matrix for all the numerical columns and save it in the
variable df_corr. Round the values to two decimal places.
- If the answer for the above statement is yes, assign the string yes to the variable Power_Price;
otherwise assign the string no.
For question (a), save the output in the following format -
Example-
'kms_driven': 0.3,
'power': 0.18,
'price': 0.58}}
[ ]:
df_corr =
Age_Price =
Power_Price =
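A sketch of how the correlation matrix could be turned into the nested dictionary shown in the example, again on invented data. `DataFrame.corr()` uses Pearson correlation by default. The names `age_price_r` and `power_price_r` are hypothetical and only show how a single coefficient can be read back out.

```python
import pandas as pd

# Invented values; price falls linearly as age rises.
toy = pd.DataFrame({
    "bike_name": ["a", "b", "c", "d"],
    "age": [1, 2, 3, 4],
    "price": [8, 6, 4, 2],
    "power": [100, 150, 200, 400],
})

# Keep numeric columns only, then build {column: {column: r}}.
num = toy.select_dtypes(include="number")
df_corr = num.corr().round(2).to_dict()

# Read individual coefficients from the nested dictionary.
age_price_r = df_corr["age"]["price"]
power_price_r = df_corr["power"]["price"]

print(df_corr)
print(age_price_r)
```

The yes/no variables would then be set by comparing such a coefficient against whatever the corresponding statement claims.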
Question 6
(a) From the given dataset, find the skewness for all the numerical columns and save it in the
variable df_skew. Round the values to two decimal places.
- If the answer for the above statement is yes, assign the string yes to the variable Age_Skew;
otherwise assign the string no.
(c) According to df_skew, power is highly skewed.
- If the answer for the above statement is yes, assign the string yes to the variable Power_Skew;
otherwise assign the string no.
For question (a), save the output in the following format -
Example-
{'age': 55.85,
'kms_driven': 76.01,
'power': 61.01}
[ ]:
df_skew =
Age_Skew =
Power_Skew =
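A sketch of the skewness calculation on invented data. `DataFrame.skew()` returns the bias-corrected sample skewness. The threshold of 1 for "highly skewed" is a common rule of thumb and an assumption here, not something the challenge states.

```python
import pandas as pd

# Invented values: "age" is symmetric, "power" has one large outlier.
toy = pd.DataFrame({
    "age": [1, 2, 3, 4, 5],
    "power": [100, 100, 100, 100, 500],
})

df_skew = toy.skew().round(2).to_dict()

# Rule of thumb (an assumption): |skew| > 1 counts as highly skewed.
Power_Skew = "yes" if abs(df_skew["power"]) > 1 else "no"

print(df_skew)
print(Power_Skew)
```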
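# File names match those read by score.py.
with open("question1.txt", "w") as f:
    print(mean, file=f)
    print(median, file=f)
    print(mode, file=f)
    print(kms_driven, file=f)
with open("question5.txt", "w") as f:
    print(df_corr, file=f)
    print(Age_Price, file=f)
    print(Power_Price, file=f)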
score.py
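import ast
import json
from hashlib import md5


def get_file(file_name):
    # Return the file's lines; each line keeps its trailing newline.
    with open(file_name) as f:
        data = f.readlines()
    return data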
score = 0
try:
    # Question 1: 20 points if the three dictionaries parse.
    q1 = 0
    question1 = get_file("question1.txt")
    question1_1 = ast.literal_eval(question1[0])
    question1_2 = ast.literal_eval(question1[1])
    question1_2 = sorted(question1_2.items(), key=lambda x: x[0])
    question1_3 = ast.literal_eval(question1[2])
    q1 = q1 + 20
except Exception:
    q1 = 0
# Question 2: 10 points if the dictionary parses.
try:
    q2 = 0
    question2 = get_file("question2.txt")
    question2_1 = ast.literal_eval(question2[0])
    q2 = q2 + 10
except Exception:
    q2 = 0
# Question 3: 20 points if both dictionaries parse.
try:
    q3 = 0
    question3 = get_file("question3.txt")
    question3_1 = ast.literal_eval(question3[0])
    question3_2 = ast.literal_eval(question3[1])
    q3 = q3 + 20
except Exception:
    q3 = 0
# Question 4: 10 points if both lines match the expected MD5 digests.
try:
    q4 = 0
    question4 = get_file("question4.txt")
    if (md5(str(question4[0]).encode()).hexdigest() == "f19bb046ca4ba9a016360ca151cc8a0a" and
            md5(str(question4[1]).encode()).hexdigest() == "3804bd983ddd0d379c3167b9126fc866"):
        q4 = q4 + 10
except Exception:
    q4 = 0
# Question 5: 20 points if the matrix and both answers match the expected digests.
try:
    q5 = 0
    question5 = get_file("question5.txt")
    question5_1 = ast.literal_eval(question5[0])
    if (md5(json.dumps(question5_1, sort_keys=True).encode('utf-8')).hexdigest() ==
            "25fa51b43ce1c5bbc55fa494ce634be1" and
            md5(str(question5[1]).encode()).hexdigest() ==
            "a6105c0a611b41b08f1209506350279e" and
            md5(str(question5[2]).encode()).hexdigest() ==
            "7fa3b767c460b54a2be4d49030b349c7"):
        q5 = q5 + 20
except Exception:
    q5 = 0
# Question 6: 20 points if the dictionary parses.
try:
    q6 = 0
    question6 = get_file("question6.txt")
    question6_1 = ast.literal_eval(question6[0])
    q6 = q6 + 20
except Exception:
    q6 = 0
try:
    score = q1 + q2 + q3 + q4 + q5 + q6
    print("FS_SCORE:{0}%".format(score))
except Exception:
    print("FS_SCORE:0%")
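The digest checks for questions 4 and 5 compare MD5 hashes rather than values, so the exact text written to each file matters. The sketch below illustrates two consequences with hypothetical helpers `line_digest` and `dict_digest` and made-up inputs: a line read with `readlines()` still carries its newline, and `json.dumps(..., sort_keys=True)` makes a dictionary's digest independent of key order.

```python
import json
from hashlib import md5


def line_digest(line):
    """MD5 hex digest of a line exactly as readlines() returns it."""
    return md5(str(line).encode()).hexdigest()


def dict_digest(d):
    """MD5 hex digest of a dict, independent of key insertion order."""
    return md5(json.dumps(d, sort_keys=True).encode("utf-8")).hexdigest()


# readlines() keeps the trailing newline, so "20.0\n" and "20.0" hash differently.
with_newline = line_digest("20.0\n")
without_newline = line_digest("20.0")

# Sorting keys before hashing makes the two orderings equivalent.
same = dict_digest({"a": 1, "b": 2}) == dict_digest({"b": 2, "a": 1})

print(with_newline != without_newline)
print(same)
```

Because of this, a value rounded differently (for example 20 instead of 20.0) produces a different digest and scores zero for that question.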