Milestone Challenge On Used Bikes Data Set



Welcome to the Milestone Challenge. In this scenario, you will explore descriptive statistics on the used
bikes dataset.

Note -

 Add extra cells for coding if necessary.

 Finally, restart the kernel and run all the cells once you have completed the challenge.

Run the below cell to import the basic necessary packages


Note - These are the basic packages needed to solve this challenge. Import the appropriate modules from
the packages given below as each scenario requires.
[1]:

import numpy as np
import pandas as pd
import statistics
import scipy

Run the below cell to download the dataset


[2]:

!wget hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv
--2021-09-22 10:42:42-- http://hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv
Resolving hrcdn.net (hrcdn.net)... 23.77.203.146, 23.77.203.144, 2600:1407:1800::173f:49d8, ...
Connecting to hrcdn.net (hrcdn.net)|23.77.203.146|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv [following]
--2021-09-22 10:42:42-- https://hrcdn.net/s3_pub/istreet-assets/-ccjO7ToeMlvfSIOr-Wxfg/Used_Bikes.csv
Connecting to hrcdn.net (hrcdn.net)|23.77.203.146|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2493547 (2.4M) [application/octet-stream]
Saving to: ‘Used_Bikes.csv’

Used_Bikes.csv 100%[===================>] 2.38M 9.68MB/s in 0.2s

2021-09-22 10:42:43 (9.68 MB/s) - ‘Used_Bikes.csv’ saved [2493547/2493547]

Load the dataset


 Load the Used Bikes dataset from the file Used_Bikes.csv and save it in the variable df.

[13]:

df = pd.read_csv("Used_Bikes.csv")

Question 1

 (a) From the dataset loaded above, calculate the mean of all numerical columns, convert the result into a
dictionary and save it in the variable mean.

 (b) Calculate the median of all numerical columns, convert the result into a dictionary and save it in the
variable median.

 (c) Calculate the mode of all numerical columns, convert the result into a dictionary and save it in the
variable mode.

 (d) Statement: judging by the mean, median and mode of the column kms_driven, its distribution is
positively skewed.

- If the statement is true, assign the string "yes" to the variable kms_driven; otherwise assign the
string "no".
Save the outputs for (a), (b) and (c) in the following format -

{Column Name : Value}

Example-

{'age': 85.85,

'kms_driven': 1500.01,

'power': 200.01}

Note - Round the mean and median values to two decimal places and the mode to an integer.
[57]:

# Numerical columns of the dataset (the same four named in the original attempt)
num_cols = ["price", "kms_driven", "age", "power"]

mean = df[num_cols].mean().round(2).to_dict()
print(mean)

median = df[num_cols].median().round(2).to_dict()

# mode() returns a DataFrame because ties are possible; keep the first row and cast to int
mode = df[num_cols].mode().iloc[0].astype(int).to_dict()

# A mean above the median is the usual sign of a positive skew
kms_driven = "yes" if mean["kms_driven"] > median["kms_driven"] else "no"

Question 2

 (a) From the column owner, get the count of each category, convert the result into a dictionary and save
it in the variable owner.

 (b) Statement: the column owner appears to have a high class imbalance problem.

- If the statement is true, assign the string "yes" to the variable class_imb; otherwise assign the
string "no".
For the question (a) save the output in the following format -

{Category Name : Count}

Example-

{'First Owner': 500,

'Fourth Owner Or More': 300,

'Second Owner': 200}


[58]:

owner =

class_imb =
File "<ipython-input-58-7a9c7a6960d5>", line 1
owner =
^
SyntaxError: invalid syntax
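
A minimal sketch of one way to approach (a) and (b). It uses a small made-up Series in place of df["owner"], so the counts below are invented for illustration and are not taken from Used_Bikes.csv. The 0.8 cutoff for calling the column imbalanced is an assumed rule of thumb; the challenge does not define one.

```python
import pandas as pd

# Hypothetical stand-in for df["owner"]; the values are illustrative only
owner_col = pd.Series(
    ["First Owner"] * 17 + ["Second Owner"] * 2 + ["Third Owner"] * 1
)

# (a) Count each category and convert the result to a plain dict of ints
owner = {k: int(v) for k, v in owner_col.value_counts().items()}

# (b) Share of the most frequent category; 0.8 is an assumed threshold
top_share = max(owner.values()) / sum(owner.values())
class_imb = "yes" if top_share >= 0.8 else "no"

print(owner)
print(class_imb)
```

On the real data, the same two steps would start from df["owner"] instead of owner_col.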

Question 3

 For the given dataset find the following -

 - (a) Population Variance of dataset of all numerical columns and save it in variable p_var.

- (b) Population Standard Deviation of dataset of all numerical columns and save it in variable
p_sd.
For the questions (a) and (b) save the output in the following format -
{Column Name : Value}

Example-

{'age': 85.85,

'kms_driven': 1500.01,

'power': 200.01}

Note - Round the values to two decimal places and convert each result into a dictionary.
[59]:

p_var =

p_sd =
File "<ipython-input-59-039e0a8010f1>", line 1
p_var =
^
SyntaxError: invalid syntax
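
Population variance divides by N, while the pandas defaults for var() and std() divide by N - 1 (sample statistics), so ddof=0 has to be passed explicitly. The sketch below shows this on a hypothetical two-column frame; the numbers are made up and only stand in for the numerical columns of df.

```python
import pandas as pd

# Hypothetical numeric frame standing in for the numerical columns of df
toy = pd.DataFrame({
    "age": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],
    "power": [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0],
})

# ddof=0 divides by N (population); the default ddof=1 divides by N - 1 (sample)
p_var = toy.var(ddof=0).round(2).to_dict()
p_sd = toy.std(ddof=0).round(2).to_dict()

# For comparison: the sample variance is larger for the same data
s_var = toy.var().round(2).to_dict()

print(p_var)
print(p_sd)
print(s_var)
```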

Question 4
 For the given dataset find the following -

 - (a) Interquartile Range of the column price and save it in variable iqr_price .

- (b) Interquartile Range of the column kms_driven and save it in variable iqr_kms_driven.
Note - Round the values to two decimal places.
[ ]:

from scipy.stats import iqr

iqr_price =

iqr_kms_driven =
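
The interquartile range is the 75th percentile minus the 25th percentile. A minimal sketch with scipy.stats.iqr, cross-checked against pandas quantiles, on a made-up Series rather than the real price column:

```python
import pandas as pd
from scipy.stats import iqr

# Hypothetical stand-in for df["price"]; the values are illustrative only
price = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# scipy.stats.iqr returns Q3 - Q1 (linear interpolation by default)
iqr_price = round(float(iqr(price.to_numpy())), 2)

# Same quantity computed from pandas quantiles
iqr_check = round(float(price.quantile(0.75) - price.quantile(0.25)), 2)

print(iqr_price)
print(iqr_check)
```

The same pattern would apply to kms_driven by swapping in that column.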

Question 5

 (a) Compute the correlation matrix for all the numerical columns of the dataset, round the values to two
decimal places and save it in the variable df_corr.

 (b) Statement: according to df_corr, Age is negatively correlated with Price.

- If the statement is true, assign the string "yes" to the variable Age_Price; otherwise assign the
string "no".

 (c) Statement: according to df_corr, Power is negatively correlated with Price.

- If the statement is true, assign the string "yes" to the variable Power_Price; otherwise assign the
string "no".

For question (a), save the output in the following format -

{Column Name : {Column Name : Value}}

Example-

{'age': {'age': 0.2, 'kms_driven': 0.08, 'power': -0.25, 'price': 0.08},

'kms_driven': {'age': -0.15,

'kms_driven': 0.3,

'power': 0.18,

'price': 0.58}}
[ ]:

df_corr =

Age_Price =

Power_Price =
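
A minimal sketch of the correlation step on a hypothetical frame. The column values are invented, and the text column brand is an assumption added only to show why the numerical columns are selected first. The sign of each coefficient then decides the yes/no answers.

```python
import pandas as pd

# Hypothetical frame; values are illustrative and not from Used_Bikes.csv
toy = pd.DataFrame({
    "brand": ["a", "b", "c", "d", "e"],   # non-numeric column, excluded below
    "age": [1, 2, 3, 4, 5],
    "power": [25, 19, 15, 12, 10],
    "price": [50, 40, 30, 20, 10],
})

# Keep numerical columns only, then build the nested {column: {column: value}} dict
df_corr = toy.select_dtypes(include="number").corr().round(2).to_dict()

# A negative coefficient means the two columns move in opposite directions
Age_Price = "yes" if df_corr["age"]["price"] < 0 else "no"
Power_Price = "yes" if df_corr["power"]["price"] < 0 else "no"

print(df_corr)
print(Age_Price, Power_Price)
```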

Question 6

 (a) Compute the skewness of all the numerical columns of the dataset, round the values to two decimal
places and save it in the variable df_skew.

 (b) Statement: according to df_skew, age is fairly symmetrical.

- If the statement is true, assign the string "yes" to the variable Age_Skew; otherwise assign the
string "no".

 (c) Statement: according to df_skew, power is highly skewed.

- If the statement is true, assign the string "yes" to the variable Power_Skew; otherwise assign the
string "no".
For the question (a) save the output in the following format -

{Column Name : Value}

Example-
{'age': 55.85,

'kms_driven': 76.01,

'power': 61.01}
[ ]:

df_skew =

Age_Skew =

Power_Skew =
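
A minimal sketch of the skewness step on made-up data. The thresholds used to label a column are an assumed rule of thumb (absolute skew up to 0.5 as fairly symmetrical, above 1 as highly skewed); the challenge itself does not state which cutoffs it expects.

```python
import pandas as pd

# Hypothetical frame; values are illustrative and not from Used_Bikes.csv
toy = pd.DataFrame({
    "age": [1, 2, 3, 4, 5],       # symmetric around 3
    "power": [1, 1, 1, 1, 10],    # one large value creates a long right tail
})

# DataFrame.skew() returns the bias-corrected sample skewness per column
df_skew = toy.skew().round(2).to_dict()

# Assumed rule of thumb: |skew| <= 0.5 fairly symmetrical, |skew| > 1 highly skewed
Age_Skew = "yes" if abs(df_skew["age"]) <= 0.5 else "no"
Power_Skew = "yes" if abs(df_skew["power"]) > 1 else "no"

print(df_skew)
print(Age_Skew, Power_Skew)
```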

Run the below cell to save the results


[ ]:

with open('question1.txt', 'a') as f:
    print(mean, file=f)
    print(median, file=f)
    print(mode, file=f)
    print(kms_driven, file=f)

with open('question2.txt', 'a') as f:
    print(owner, file=f)
    print(class_imb, file=f)

with open('question3.txt', 'a') as f:
    print(p_var, file=f)
    print(p_sd, file=f)

with open('question4.txt', 'a') as f:
    print(iqr_price, file=f)
    print(iqr_kms_driven, file=f)

with open('question5.txt', 'a') as f:
    print(df_corr, file=f)
    print(Age_Price, file=f)
    print(Power_Price, file=f)

with open('question6.txt', 'a') as f:
    print(df_skew, file=f)
    print(Age_Skew, file=f)
    print(Power_Skew, file=f)
[ ]:

score.py

import ast
from hashlib import md5
import json

def get_file(file_name):
    with open(file_name) as f:
        data = f.readlines()
    return [s.rstrip('\n') for s in data]

score = 0

try:
    #Question1
    q1 = 0
    question1 = get_file("question1.txt")
    question1_1 = ast.literal_eval(question1[0])
    question1_1 = sorted(question1_1.items(), key=lambda x: x[0])
    question1_2 = ast.literal_eval(question1[1])
    question1_2 = sorted(question1_2.items(), key=lambda x: x[0])
    question1_3 = ast.literal_eval(question1[2])
    question1_3 = sorted(question1_3.items(), key=lambda x: x[0])
    if (md5(str(question1_1).encode()).hexdigest() == "5234e07f99ab2b913f26a9dd0d37a3e7" and
            md5(str(question1_2).encode()).hexdigest() == "8cf61882c16304cb93aed577e57073b1" and
            md5(str(question1_3).encode()).hexdigest() == "7dfebef1f84a7c27b5562b8e84f692b0" and
            md5(str(question1[3]).encode()).hexdigest() == "a6105c0a611b41b08f1209506350279e"):
        q1 = q1 + 20
except:
    q1 = 0

#Question2
try:
    q2 = 0
    question2 = get_file("question2.txt")
    question2_1 = ast.literal_eval(question2[0])
    question2_1 = sorted(question2_1.items(), key=lambda x: x[0])
    if (md5(str(question2_1).encode()).hexdigest() == "69f10a3346bf0f8d8b902a06642ae4ca" and
            md5(str(question2[1]).encode()).hexdigest() == "a6105c0a611b41b08f1209506350279e"):
        q2 = q2 + 10
except:
    q2 = 0

#Question3
try:
    q3 = 0
    question3 = get_file("question3.txt")
    question3_1 = ast.literal_eval(question3[0])
    question3_1 = sorted(question3_1.items(), key=lambda x: x[0])
    question3_2 = ast.literal_eval(question3[1])
    question3_2 = sorted(question3_2.items(), key=lambda x: x[0])
    if (md5(str(question3_1).encode()).hexdigest() == "7759614969fe59a095a56b8eb2f6e812" and
            md5(str(question3_2).encode()).hexdigest() == "ba8c0fd84352dd004ac3af7eb5ad13d3"):
        q3 = q3 + 20
except:
    q3 = 0

#Question4
try:
    q4 = 0
    question4 = get_file("question4.txt")
    if (md5(str(question4[0]).encode()).hexdigest() == "f19bb046ca4ba9a016360ca151cc8a0a" and
            md5(str(question4[1]).encode()).hexdigest() == "3804bd983ddd0d379c3167b9126fc866"):
        q4 = q4 + 10
except:
    q4 = 0

#Question5
try:
    q5 = 0
    question5 = get_file("question5.txt")
    question5_1 = ast.literal_eval(question5[0])
    question5_1 = sorted(question5_1.items(), key=lambda x: x[0])
    if (md5(json.dumps(question5_1, sort_keys=True).encode('utf-8')).hexdigest() == "25fa51b43ce1c5bbc55fa494ce634be1" and
            md5(str(question5[1]).encode()).hexdigest() == "a6105c0a611b41b08f1209506350279e" and
            md5(str(question5[2]).encode()).hexdigest() == "7fa3b767c460b54a2be4d49030b349c7"):
        q5 = q5 + 20
except:
    q5 = 0

#Question6
try:
    q6 = 0
    question6 = get_file("question6.txt")
    question6_1 = ast.literal_eval(question6[0])
    question6_1 = sorted(question6_1.items(), key=lambda x: x[0])
    if (md5(str(question6_1).encode()).hexdigest() == "cf85a8394c75181155a2b67a581601b0" and
            md5(str(question6[1]).encode()).hexdigest() == "7fa3b767c460b54a2be4d49030b349c7"):
        q6 = q6 + 20
except:
    q6 = 0

try:
    score = q1 + q2 + q3 + q4 + q5 + q6
    print("FS_SCORE:{0}%".format(score))
except:
    print("FS_SCORE:0%")
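
The script above never compares numbers directly. It sorts each dictionary's items by key, takes the str() form and compares MD5 hashes, so rounding and value types affect the result while key order does not. The sketch below illustrates that with a hypothetical helper named digest and made-up values; none of these are the expected answers.

```python
from hashlib import md5

def digest(answer: dict) -> str:
    """Hash a dict the way score.py does: sort items by key, then md5 the str() form."""
    items = sorted(answer.items(), key=lambda x: x[0])
    return md5(str(items).encode()).hexdigest()

# Made-up values for illustration only
a = {"age": 8.0, "power": 21.5}
b = {"power": 21.5, "age": 8.0}   # same content, different insertion order
c = {"age": 8, "power": 21.5}     # int instead of float

same_order_independent = digest(a) == digest(b)
same_despite_type = digest(a) == digest(c)

print(same_order_independent)  # key order does not affect the hash
print(same_despite_type)       # "8.0" and "8" give different strings, so the hashes differ
```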
