Milestone Challenge On Used Bikes Data Set

Welcome to the Milestone Challenge. In this scenario, you will explore descriptive statistics on the used
bikes dataset.
Note -
 Add extra cells for coding if necessary.
 Finally, restart and run all the cells after completing the challenge.
Run the below cell to import the basic necessary packages

Note- These are the basic packages needed to solve this challenge. Kindly import the appropriate modules
from the packages given below to solve the challenge based on the given scenarios.
[1]:
import numpy as np
import pandas as pd
import statistics
import scipy
Run the below cell to download the dataset

[2]:
!wget hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv
--2021-09-22 10:42:42-- http://hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv
Resolving hrcdn.net (hrcdn.net)... 23.77.203.146, 23.77.203.144, 2600:1407:1800::173f:49d8, ...
Connecting to hrcdn.net (hrcdn.net)|23.77.203.146|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv [following]
--2021-09-22 10:42:42-- https://hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv
Connecting to hrcdn.net (hrcdn.net)|23.77.203.146|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2493547 (2.4M) [application/octet-stream]
Saving to: ‘Used_Bikes.csv’
Used_Bikes.csv 100%[===================>] 2.38M 9.68MB/s in 0.2s
2021-09-22 10:42:43 (9.68 MB/s) - ‘Used_Bikes.csv’ saved [2493547/2493547]
Load the dataset

 Load the used bikes dataset from the file Used_Bikes.csv and save it in the variable df.
[13]:
df = pd.read_csv("Used_Bikes.csv")
Question 1
 (a) From the dataset above, calculate the mean of all numerical columns, convert it into a
dictionary, and save it in the variable mean.
 (b) From the dataset above, calculate the median of all numerical columns, convert it into a
dictionary, and save it in the variable median.
 (c) From the dataset above, calculate the mode of all numerical columns, convert it into a
dictionary, and save it in the variable mode.
 (d) From the mean, median and mode values of the column kms_driven, it is a positively skewed
distribution.
- If the answer for the above statement is yes, assign the value yes as string, otherwise the value no as
string, in the variable kms_driven.
Save the outputs for (a), (b) and (c) in the following format -
{Column Name : Value}
Example-
{'age': 85.85,
'kms_driven': 1500.01,
'power': 200.01}
Note- Round the mean and median values to two decimal places and the mode to an integer.
[57]:
# Numerical columns of the dataset
num_cols = ["price", "kms_driven", "age", "power"]
# Mean and median rounded to two decimals; mode (first modal value) as an integer
mean = df[num_cols].mean().round(2).to_dict()
print(mean)
median = df[num_cols].median().round(2).to_dict()
mode = df[num_cols].mode().iloc[0].astype(int).to_dict()
# Answer to (d), stored as a lowercase string as the question asks
kms_driven = 'yes'
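One way to reason about part (d): in a positively skewed distribution the long right tail tends to pull the mean above the median. The following is a minimal sketch on made-up numbers; the series `toy` and the helper names are hypothetical and not taken from Used_Bikes.csv.

```python
import pandas as pd

# Hypothetical mileage-like values with a long right tail (not real data).
toy = pd.Series([1000, 1000, 2000, 3000, 4000, 5000, 60000])

toy_mean = round(toy.mean(), 2)
toy_median = round(toy.median(), 2)
toy_mode = int(toy.mode().iloc[0])  # first modal value

# Rule of thumb: a mean above the median suggests positive (right) skew.
answer = 'yes' if toy_mean > toy_median else 'no'
print(toy_mean, toy_median, toy_mode, answer)
```

The same comparison can be applied to the kms_driven entries of the mean, median and mode dictionaries.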
Question 2
 (a) From the column owner, get the count of all categories, convert it into a dictionary, and save
it in the variable owner.
 (b) The column owner looks to have a high class imbalance problem.
- If the answer for the above statement is yes, assign the value yes as string, otherwise the value no as
string, in the variable class_imb.
For the question (a) save the output in the following format -
{Category Name : Count}
Example-
{'First Owner': 500,
'Fourth Owner Or More': 300,
'Second Owner': 200}

[58]:
owner =
class_imb =
File "<ipython-input-58-7a9c7a6960d5>", line 1
owner =
^
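A possible approach to Question 2, sketched on a made-up column: `value_counts()` gives the count per category, and the share of the largest category gives a rough sense of imbalance. The frame `toy` and the names `owner_counts` and `majority_share` are hypothetical.

```python
import pandas as pd

# Hypothetical owner column (made-up values, not from Used_Bikes.csv).
toy = pd.DataFrame({"owner": ["First Owner"] * 8 + ["Second Owner"] * 2})

# Count of each category as a plain dictionary.
owner_counts = toy["owner"].value_counts().to_dict()

# Share of the most frequent category; a value close to 1 points to imbalance.
majority_share = toy["owner"].value_counts(normalize=True).iloc[0]
print(owner_counts, round(majority_share, 2))
```

What counts as "high" imbalance is a judgment call; the challenge text does not state a threshold.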
Question 3
 For the given dataset find the following -
 - (a) Population variance of all numerical columns, saved in the variable p_var.
 - (b) Population standard deviation of all numerical columns, saved in the variable p_sd.
For the questions (a) and (b) save the output in the following format -
Example-
{'age': 85.85,
'power': 200.01}
Note- Round the values to two decimal places and convert each result into a dictionary.
[59]:
p_var =
p_sd =
File "<ipython-input-59-039e0a8010f1>", line 1
p_var =
^
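For Question 3, note that pandas uses the sample formula (ddof=1) by default, while the population formula divides by N (ddof=0). A minimal sketch on made-up numbers follows; the frame `toy` and the names `toy_p_var` and `toy_p_sd` are hypothetical.

```python
import pandas as pd

# Hypothetical numeric columns (made-up values, not from Used_Bikes.csv).
toy = pd.DataFrame({
    "age": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],
    "power": [100.0, 100.0, 150.0, 150.0, 200.0, 200.0, 250.0, 250.0],
})

# ddof=0 selects the population variance and standard deviation.
toy_p_var = toy.var(ddof=0).round(2).to_dict()
toy_p_sd = toy.std(ddof=0).round(2).to_dict()
print(toy_p_var, toy_p_sd)
```

On the real dataframe, which also holds text columns such as owner, the numeric columns would probably need to be selected first.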
Question 4
 For the given dataset find the following -
 - (a) Interquartile range of the column price, saved in the variable iqr_price.
 - (b) Interquartile range of the column kms_driven, saved in the variable iqr_kms_driven.
Note- Round the values to two decimal places.
[ ]:
from scipy.stats import iqr
iqr_price =
iqr_kms_driven =
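The interquartile range is the 75th percentile minus the 25th percentile. A minimal sketch with `scipy.stats.iqr` on a made-up column, cross-checked against pandas quantiles; `toy`, `toy_iqr`, `q1` and `q3` are hypothetical names.

```python
import pandas as pd
from scipy.stats import iqr

# Hypothetical price column (made-up values, not from Used_Bikes.csv).
toy = pd.DataFrame({"price": [10, 20, 30, 40, 50]})

# IQR via scipy, converted to a plain float and rounded to two decimals.
toy_iqr = round(float(iqr(toy["price"])), 2)

# Cross-check: 75th percentile minus 25th percentile via pandas.
q1, q3 = toy["price"].quantile([0.25, 0.75])
print(toy_iqr, q3 - q1)
```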
Question 5
 (a) From the given dataset, find the correlation matrix for all the numerical columns and save it in the
variable df_corr. Round the values to two decimal places.
 (b) From df_corr, Age is negatively correlated with Price.
- If the answer for the above statement is yes, assign the value yes as string, otherwise the value no as
string, in the variable Age_Price.
 (c) From df_corr, Power is negatively correlated with Price.
- If the answer for the above statement is yes, assign the value yes as string, otherwise the value no as
string, in the variable Power_Price.
For the question (a) save the output in the following format -
{Column Name : {Column Name : Value}}
Example-
{'age': {'age': 0.2, 'kms_driven': 0.08, 'power': -0.25, 'price': 0.08},
'kms_driven': {'age': -0.15,
'kms_driven': 0.3,
'power': 0.18,
'price': 0.58}}
[ ]:
df_corr =
Age_Price =
Power_Price =
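A sketch of how the nested-dictionary format for Question 5 could be produced, using made-up columns in which price falls as age rises. The frame `toy` and the names `toy_corr` and `age_price_answer` are hypothetical.

```python
import pandas as pd

# Hypothetical numeric columns (made-up values, not from Used_Bikes.csv).
toy = pd.DataFrame({
    "age": [1, 2, 3, 4, 5],
    "price": [50, 40, 30, 20, 10],
    "power": [10, 20, 30, 40, 50],
})

# Pearson correlation matrix, rounded, as {column: {column: value}}.
toy_corr = toy.corr().round(2).to_dict()

# A negative coefficient means the two columns move in opposite directions.
age_price_answer = 'yes' if toy_corr["age"]["price"] < 0 else 'no'
print(toy_corr["age"], age_price_answer)
```

On the real dataframe, recent pandas versions would likely need `numeric_only=True` or a prior selection of the numeric columns, since text columns such as owner cannot be correlated.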
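Question 6
 (a) From the given dataset, find the skewness for all the numerical columns and save it in the
variable df_skew. Round the values to two decimal places.
 (b) From df_skew, age is fairly symmetrical.
- If the answer for the above statement is yes, assign the value yes as string, otherwise the value no as
string, in the variable Age_Skew.
 (c) From df_skew, power is highly skewed.
- If the answer for the above statement is yes, assign the value yes as string, otherwise the value no as
string, in the variable Power_Skew.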
For the question (a) save the output in the following format -
Example-
{'age': 55.85,
'power': 61.01}
[ ]:
df_skew =
Age_Skew =
Power_Skew =
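A sketch for Question 6 on made-up data. The thresholds in `describe_skew` follow a common rule of thumb (absolute skewness below 0.5 is fairly symmetrical, 0.5 to 1 moderately skewed, above 1 highly skewed); that the challenge uses this same rule is an assumption, and all names here are hypothetical.

```python
import pandas as pd

# Hypothetical numeric columns (made-up values, not from Used_Bikes.csv).
toy = pd.DataFrame({
    "age": [1, 2, 3, 4, 5],      # symmetric around 3
    "power": [1, 1, 1, 1, 20],   # one large value creates a right tail
})

# Sample skewness per column, rounded, as a dictionary.
toy_skew = toy.skew().round(2).to_dict()


def describe_skew(value):
    """Label a skewness value using a common rule of thumb (assumed here)."""
    size = abs(value)
    if size < 0.5:
        return "fairly symmetrical"
    if size <= 1:
        return "moderately skewed"
    return "highly skewed"


print({col: describe_skew(val) for col, val in toy_skew.items()})
```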
Run the below cell to save the results

[ ]:
with open('question1.txt', 'a') as f:
    print(mean, file=f)
    print(median, file=f)
    print(mode, file=f)
    print(kms_driven, file=f)

with open('question2.txt', 'a') as f:
    print(owner, file=f)
    print(class_imb, file=f)

with open('question3.txt', 'a') as f:
    print(p_var, file=f)
    print(p_sd, file=f)

with open('question4.txt', 'a') as f:
    print(iqr_price, file=f)
    print(iqr_kms_driven, file=f)

with open('question5.txt', 'a') as f:
    print(df_corr, file=f)
    print(Age_Price, file=f)
    print(Power_Price, file=f)

with open('question6.txt', 'a') as f:
    print(df_skew, file=f)
    print(Age_Skew, file=f)
    print(Power_Skew, file=f)
[ ]:
score.py
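import ast
from hashlib import md5
import json


def get_file(file_name):
    # Read a results file and return its lines without trailing newlines
    with open(file_name) as f:
        data = f.readlines()
    return [s.rstrip('\n') for s in data]


score = 0
# NOTE: the lines that load question2.txt to question6.txt and that parse
# question1_2 onwards are missing from this export. They are restored below
# by analogy with the question1_1 lines and should be read as assumptions.
try:
    #Question1
    q1 = 0
    question1 = get_file("question1.txt")
    question1_1 = ast.literal_eval(question1[0])
    question1_1 = sorted(question1_1.items(), key=lambda x: x[0])
    question1_2 = ast.literal_eval(question1[1])
    question1_2 = sorted(question1_2.items(), key=lambda x: x[0])
    question1_3 = ast.literal_eval(question1[2])
    question1_3 = sorted(question1_3.items(), key=lambda x: x[0])
    if (md5(str(question1_1).encode()).hexdigest() == "5234e07f99ab2b913f26a9dd0d37a3e7" and
            md5(str(question1_2).encode()).hexdigest() == "8cf61882c16304cb93aed577e57073b1" and
            md5(str(question1_3).encode()).hexdigest() == "7dfebef1f84a7c27b5562b8e84f692b0" and
            md5(str(question1[3]).encode()).hexdigest() == "a6105c0a611b41b08f1209506350279e"):
        q1 = q1 + 20
except:
    q1 = 0
#Question2
try:
    q2 = 0
    question2 = get_file("question2.txt")
    question2_1 = ast.literal_eval(question2[0])
    question2_1 = sorted(question2_1.items(), key=lambda x: x[0])
    if (md5(str(question2_1).encode()).hexdigest() == "69f10a3346bf0f8d8b902a06642ae4ca" and
            md5(str(question2[1]).encode()).hexdigest() == "a6105c0a611b41b08f1209506350279e"):
        q2 = q2 + 10
except:
    q2 = 0
#Question3
try:
    q3 = 0
    question3 = get_file("question3.txt")
    question3_1 = ast.literal_eval(question3[0])
    question3_1 = sorted(question3_1.items(), key=lambda x: x[0])
    question3_2 = ast.literal_eval(question3[1])
    question3_2 = sorted(question3_2.items(), key=lambda x: x[0])
    if (md5(str(question3_1).encode()).hexdigest() == "7759614969fe59a095a56b8eb2f6e812" and
            md5(str(question3_2).encode()).hexdigest() == "ba8c0fd84352dd004ac3af7eb5ad13d3"):
        q3 = q3 + 20
except:
    q3 = 0
#Question4
try:
    q4 = 0
    question4 = get_file("question4.txt")
    if (md5(str(question4[0]).encode()).hexdigest() == "f19bb046ca4ba9a016360ca151cc8a0a" and
            md5(str(question4[1]).encode()).hexdigest() == "3804bd983ddd0d379c3167b9126fc866"):
        q4 = q4 + 10
except:
    q4 = 0
#Question5
try:
    q5 = 0
    question5 = get_file("question5.txt")
    question5_1 = ast.literal_eval(question5[0])
    question5_1 = sorted(question5_1.items(), key=lambda x: x[0])
    if (md5(json.dumps(question5_1, sort_keys=True).encode('utf-8')).hexdigest() ==
            "25fa51b43ce1c5bbc55fa494ce634be1" and
            md5(str(question5[1]).encode()).hexdigest() == "a6105c0a611b41b08f1209506350279e" and
            md5(str(question5[2]).encode()).hexdigest() == "7fa3b767c460b54a2be4d49030b349c7"):
        q5 = q5 + 20
except:
    q5 = 0
#Question6
try:
    q6 = 0
    question6 = get_file("question6.txt")
    question6_1 = ast.literal_eval(question6[0])
    question6_1 = sorted(question6_1.items(), key=lambda x: x[0])
    if (md5(str(question6_1).encode()).hexdigest() == "cf85a8394c75181155a2b67a581601b0" and
            md5(str(question6[1]).encode()).hexdigest() == "7fa3b767c460b54a2be4d49030b349c7"):
        q6 = q6 + 20
except:
    q6 = 0
try:
    score = q1 + q2 + q3 + q4 + q5 + q6
    print("FS_SCORE:{0}%".format(score))
except:
    print("FS_SCORE:0%")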
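The scoring script compares MD5 digests of the saved text, so the exact rounding and formatting of each answer matters, while the key order of a saved dictionary does not (its items are sorted before hashing). The sketch below illustrates this with made-up lines; `digest_of_saved_dict` is a hypothetical helper that mirrors the parse, sort and hash steps above.

```python
import ast
from hashlib import md5


def digest_of_saved_dict(line):
    """Parse one saved line and hash its key-sorted items, as the scorer does."""
    parsed = ast.literal_eval(line)
    items = sorted(parsed.items(), key=lambda x: x[0])
    return md5(str(items).encode()).hexdigest()


# Hypothetical saved lines (made-up values).
line_a = "{'age': 8.0, 'power': 21.3}"
line_b = "{'power': 21.3, 'age': 8.0}"      # same content, different key order
line_c = "{'age': 8.0, 'power': 21.30001}"  # not rounded to two decimals

order_insensitive = digest_of_saved_dict(line_a) == digest_of_saved_dict(line_b)
rounding_sensitive = digest_of_saved_dict(line_a) != digest_of_saved_dict(line_c)
print(order_insensitive, rounding_sensitive)
```

A practical consequence is that each saved line must parse with `ast.literal_eval`, so plain Python numbers and strings are safer than objects whose printed form is not a literal.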
