
GmPrac1 - Jupyter Notebook

The document describes the loading and initial exploration of a car dataset using pandas in Python. It includes operations such as reading the CSV file, checking for null values, calculating averages for specific columns, and creating new features based on existing data. The dataset consists of 205 entries with 26 columns, detailing various attributes of cars.


In [1]: import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [2]: df = pd.read_csv("autodata.csv")

In [3]: df.head(5)

Out[3]:
    symboling  normalized-losses  make         fuel-type  aspiration  num-of-doors  body-style   drive-wheels  engine-location  wheel-base  ...
0   3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            88.6        ...
1   3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            88.6        ...
2   1          122.0              alfa-romero  gas        std         two           hatchback    rwd           front            94.5        ...
3   2          164.0              audi         gas        std         four          sedan        fwd           front            99.8        ...
4   2          164.0              audi         gas        std         four          sedan        4wd           front            99.4        ...

5 rows × 26 columns

In [4]: df.tail(5)

Out[4]:
    symboling  normalized-losses  make         fuel-type  aspiration  num-of-doors  body-style   drive-wheels  engine-location  wheel-base  ...
200 -1         95.0               volvo        gas        std         four          sedan        rwd           front            109.1       ...
201 -1         95.0               volvo        gas        turbo       four          sedan        rwd           front            109.1       ...
202 -1         95.0               volvo        gas        std         four          sedan        rwd           front            109.1       ...
203 -1         95.0               volvo        diesel     turbo       four          sedan        rwd           front            109.1       ...
204 -1         95.0               volvo        gas        turbo       four          sedan        rwd           front            109.1       ...

5 rows × 26 columns
In [5]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 symboling 205 non-null int64
1 normalized-losses 205 non-null float64
2 make 205 non-null object
3 fuel-type 205 non-null object
4 aspiration 205 non-null object
5 num-of-doors 203 non-null object
6 body-style 205 non-null object
7 drive-wheels 205 non-null object
8 engine-location 205 non-null object
9 wheel-base 205 non-null float64
10 length 205 non-null float64
11 width 205 non-null float64
12 height 205 non-null float64
13 curb-weight 205 non-null int64
14 engine-type 205 non-null object
15 num-of-cylinders 205 non-null object
16 engine-size 205 non-null int64
17 fuel-system 205 non-null object
18 bore 205 non-null float64
19 stroke 205 non-null float64
20 compression-ratio 205 non-null float64
21 horsepower 205 non-null float64
22 peak-rpm 205 non-null float64
23 city-mpg 205 non-null int64
24 highway-mpg 205 non-null int64
25 price 205 non-null float64
dtypes: float64(11), int64(5), object(10)
memory usage: 41.8+ KB
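The info() summary splits the 26 columns into numeric (float64, int64) and object (text) dtypes, and that split decides which later cleaning steps apply to which column. A minimal sketch of selecting columns by dtype, using a hypothetical three-column toy frame rather than autodata.csv:

```python
import pandas as pd

# Hypothetical toy frame reusing three column names from df.info(); values are made up
toy = pd.DataFrame({
    "symboling": [3, 1, 2],                     # int64
    "make": ["alfa-romero", "audi", "volvo"],   # object (text)
    "price": [1000.0, 2000.0, 3000.0],          # float64
})

# Split columns by dtype: numeric ones can be summarised with describe(),
# the rest usually need counting or encoding
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)  # ['symboling', 'price']
print(other_cols)    # ['make']
```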

In [6]: df.describe()

Out[6]:
       symboling   normalized-losses  wheel-base  length      width       height      curb-weight
count  205.000000  205.000000         205.000000  205.000000  205.000000  205.000000  205.000000
mean   0.834146    122.000000         98.756585   174.049268  65.907805   53.724878   2555.565854
std    1.245307    31.681008          6.021776    12.337289   2.145204    2.443522    520.680204
min    -2.000000   65.000000          86.600000   141.100000  60.300000   47.800000   1488.000000
25%    0.000000    101.000000         94.500000   166.300000  64.100000   52.000000   2145.000000
50%    1.000000    122.000000         97.000000   173.200000  65.500000   54.100000   2414.000000
75%    2.000000    137.000000         102.400000  183.100000  66.900000   55.500000   2935.000000
max    3.000000    256.000000         120.900000  208.100000  72.300000   59.800000   4066.000000
In [7]: df.isnull()

Out[7]:
    symboling  normalized-losses  make         fuel-type  aspiration  num-of-doors  body-style   drive-wheels  engine-location  wheel-base  ...
0   False      False              False        False      False       False         False        False         False            False       ...
1   False      False              False        False      False       False         False        False         False            False       ...
2   False      False              False        False      False       False         False        False         False            False       ...
3   False      False              False        False      False       False         False        False         False            False       ...
4   False      False              False        False      False       False         False        False         False            False       ...
...
200 False      False              False        False      False       False         False        False         False            False       ...
201 False      False              False        False      False       False         False        False         False            False       ...
202 False      False              False        False      False       False         False        False         False            False       ...
203 False      False              False        False      False       False         False        False         False            False       ...
204 False      False              False        False      False       False         False        False         False            False       ...

205 rows × 26 columns

In [9]: df.notnull().sum()

Out[9]: symboling 205
normalized-losses 205
make 205
fuel-type 205
aspiration 205
num-of-doors 203
body-style 205
drive-wheels 205
engine-location 205
wheel-base 205
length 205
width 205
height 205
curb-weight 205
engine-type 205
num-of-cylinders 205
engine-size 205
fuel-system 205
bore 205
stroke 205
compression-ratio 205
horsepower 205
peak-rpm 205
city-mpg 205
highway-mpg 205
price 205
dtype: int64
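notnull().sum() counts the present values per column, so num-of-doors (203 of 205) is the only column with gaps here. A minimal sketch, on a hypothetical toy frame, of how isnull().sum() and notnull().sum() relate and how to list only the columns that still need cleaning:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two of five 'num-of-doors' entries are missing
toy = pd.DataFrame({
    "num-of-doors": ["two", np.nan, "four", np.nan, "four"],
    "make": ["audi", "audi", "bmw", "bmw", "volvo"],
})

missing = toy.isnull().sum()   # NaN count per column
present = toy.notnull().sum()  # non-NaN count per column

# Every cell is either missing or present, so the two counts add up to the row count
assert (missing + present == len(toy)).all()

# Show only the columns that still need cleaning
print(missing[missing > 0])
```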
In [10]: # calculate the mean value of the "stroke" column
avg_stroke = df["stroke"].astype("float").mean(axis=0)
print("Average of stroke:", avg_stroke)
# replace NaN with the mean value in the "stroke" column
df["stroke"] = df["stroke"].replace(np.nan, avg_stroke)

Average of stroke: 3.2554228855721337
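Mean imputation fills each gap in a numeric column with the average of the values that are present. A minimal sketch on a hypothetical toy column; it assigns the result back rather than using inplace=True on a single column, which recent pandas versions warn about:

```python
import numpy as np
import pandas as pd

# Hypothetical toy 'stroke' column with one missing value
toy = pd.DataFrame({"stroke": [3.0, np.nan, 3.5, 3.4]})

# mean() skips NaN by default: (3.0 + 3.5 + 3.4) / 3 = 3.3
avg_stroke = toy["stroke"].astype("float").mean()

# Assign the filled column back instead of modifying it in place
toy["stroke"] = toy["stroke"].fillna(avg_stroke)
print(toy["stroke"].tolist())
```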

In [11]: avg_hp = df["horsepower"].astype("float").mean(axis=0)
print("Average of horsepower:", avg_hp)

Average of horsepower: 104.25615763546797

In [12]: # replace NaN with the mean value in the "horsepower" column
df["horsepower"] = df["horsepower"].replace(np.nan, avg_hp)

In [13]: df['num-of-doors'].value_counts()

Out[13]: four    114
two      89
Name: num-of-doors, dtype: int64

In [14]: df['num-of-doors'].value_counts().idxmax()

Out[14]: 'four'

In [15]: # Replace missing 'num-of-doors' values with the most frequent value ('four')
df["num-of-doors"] = df["num-of-doors"].fillna(df["num-of-doors"].mode()[0])

# Drop rows with NaN values in the "horsepower" column
df.dropna(subset=["horsepower"], axis=0, inplace=True)

# Reset the index after dropping rows
df.reset_index(drop=True, inplace=True)
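The two strategies above handle different kinds of gaps: a categorical column is filled with its most frequent value, while rows missing horsepower are dropped and the index is renumbered. A minimal sketch of both on a hypothetical toy frame, which also checks that value_counts().idxmax() and mode()[0] pick the same value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a missing category and a missing numeric value
toy = pd.DataFrame({
    "num-of-doors": ["four", "two", np.nan, "four"],
    "horsepower": [111.0, np.nan, 154.0, 102.0],
})

# Both expressions pick the most frequent non-missing value
most_frequent = toy["num-of-doors"].value_counts().idxmax()
assert most_frequent == toy["num-of-doors"].mode()[0] == "four"

# Categorical gap: fill with the mode
toy["num-of-doors"] = toy["num-of-doors"].fillna(most_frequent)

# Rows missing horsepower: drop them, then renumber the index 0..n-1
toy = toy.dropna(subset=["horsepower"]).reset_index(drop=True)
print(toy)
```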
In [17]: df.isnull().sum()

Out[17]: symboling 0
normalized-losses 0
make 0
fuel-type 0
aspiration 0
num-of-doors 0
body-style 0
drive-wheels 0
engine-location 0
wheel-base 0
length 0
width 0
height 0
curb-weight 0
engine-type 0
num-of-cylinders 0
engine-size 0
fuel-system 0
bore 0
stroke 0
compression-ratio 0
horsepower 0
peak-rpm 0
city-mpg 0
highway-mpg 0
price 0
dtype: int64

In [18]: df['city-L/100km'] = 235/df["city-mpg"]

df.head()

Out[18]:
    symboling  normalized-losses  make         fuel-type  aspiration  num-of-doors  body-style   drive-wheels  engine-location  wheel-base  ...
0   3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            88.6        ...
1   3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            88.6        ...
2   1          122.0              alfa-romero  gas        std         two           hatchback    rwd           front            94.5        ...
3   2          164.0              audi         gas        std         four          sedan        fwd           front            99.8        ...
4   2          164.0              audi         gas        std         four          sedan        4wd           front            99.4        ...

5 rows × 27 columns
In [19]: df['highway-L/100km'] = 235/df["highway-mpg"]
df.head()

Out[19]:
    symboling  normalized-losses  make         fuel-type  aspiration  num-of-doors  body-style   drive-wheels  engine-location  wheel-base  ...
0   3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            88.6        ...
1   3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            88.6        ...
2   1          122.0              alfa-romero  gas        std         two           hatchback    rwd           front            94.5        ...
3   2          164.0              audi         gas        std         four          sedan        fwd           front            99.8        ...
4   2          164.0              audi         gas        std         four          sedan        4wd           front            99.4        ...

5 rows × 28 columns
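The constant 235 in In [18] and In [19] is a rounded conversion factor between miles per gallon and litres per 100 km. A minimal sketch of where it comes from, assuming the mpg figures are US miles per gallon (the notebook does not say which gallon is meant); the function name is hypothetical:

```python
# Unit constants (exact by definition): US liquid gallon and international mile
LITRES_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344


def mpg_to_l_per_100km(mpg: float) -> float:
    """Convert US miles per gallon to litres per 100 km."""
    # 100 km expressed in miles, divided by mpg, gives the gallons used;
    # multiplying by litres per gallon gives litres used per 100 km.
    gallons_per_100km = (100 / KM_PER_MILE) / mpg
    return gallons_per_100km * LITRES_PER_US_GALLON


# The factor that the notebook rounds to 235
factor = 100 * LITRES_PER_US_GALLON / KM_PER_MILE
print(round(factor, 2))  # 235.21
print(round(mpg_to_l_per_100km(21), 2))
```

Using 235 instead of 235.21 changes each converted value by less than 0.1%.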
 

In [20]: df['length'] = df['length']/df['length'].max()

df['width'] = df['width']/df['width'].max()

In [21]: df['height'] = df['height']/df['height'].max()

df[["length","width","height"]].head()

Out[21]:
length width height

0 0.811148 0.886584 0.816054

1 0.811148 0.886584 0.816054

2 0.822681 0.905947 0.876254

3 0.848630 0.915629 0.908027

4 0.848630 0.918396 0.908027
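In [20] and In [21] rescale length, width and height by dividing each value by the column maximum, so the largest value becomes 1. A minimal sketch contrasting that with min-max scaling, which also maps the smallest value to 0; the lengths are hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy lengths (not from autodata.csv)
toy = pd.DataFrame({"length": [150.0, 175.0, 200.0]})
col = toy["length"]

# Simple feature scaling, as in the notebook: divide by the maximum
toy["length_simple"] = col / col.max()                               # 0.75, 0.875, 1.0

# Min-max scaling, for comparison: minimum -> 0, maximum -> 1
toy["length_minmax"] = (col - col.min()) / (col.max() - col.min())   # 0.0, 0.5, 1.0
print(toy)
```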

In [22]: df.columns

Out[22]: Index(['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration',
       'num-of-doors', 'body-style', 'drive-wheels', 'engine-location',
       'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type',
       'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke',
       'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg',
       'highway-mpg', 'price', 'city-L/100km', 'highway-L/100km'],
      dtype='object')

In [23]: df['aspiration'].value_counts()

Out[23]: std      168
turbo     37
Name: aspiration, dtype: int64
In [24]: # dtype=int keeps 0/1 indicators (recent pandas versions return True/False by default)
dummy_variable_1 = pd.get_dummies(df["aspiration"], dtype=int)
dummy_variable_1.head()

Out[24]:
   std  turbo
0  1    0
1  1    0
2  1    0
3  1    0
4  1    0

In [25]: df = pd.concat([df, dummy_variable_1], axis=1)

df.drop("aspiration", axis = 1, inplace=True)
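In [24] and In [25] one-hot encode aspiration: each category becomes its own 0/1 column, the indicators are appended, and the text column is dropped. A minimal sketch of the same three steps on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame with the same 'aspiration' categories as the notebook
toy = pd.DataFrame({
    "make": ["audi", "bmw", "audi"],
    "aspiration": ["std", "turbo", "std"],
})

# One indicator column per category, stored as 0/1 integers
dummies = pd.get_dummies(toy["aspiration"], dtype=int)

# Append the indicator columns and drop the original text column
toy = pd.concat([toy, dummies], axis=1).drop(columns="aspiration")
print(toy.columns.tolist())  # ['make', 'std', 'turbo']
```

Passing prefix="aspiration" to get_dummies would name the columns aspiration_std and aspiration_turbo, which avoids clashes if another column later produces a category with the same name.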

In [26]: df.head()

Out[26]:
    symboling  normalized-losses  make         fuel-type  num-of-doors  body-style   drive-wheels  engine-location  wheel-base  length    ...
0   3          122.0              alfa-romero  gas        two           convertible  rwd           front            88.6        0.811148  ...
1   3          122.0              alfa-romero  gas        two           convertible  rwd           front            88.6        0.811148  ...
2   1          122.0              alfa-romero  gas        two           hatchback    rwd           front            94.5        0.822681  ...
3   2          164.0              audi         gas        four          sedan        fwd           front            99.8        0.848630  ...
4   2          164.0              audi         gas        four          sedan        4wd           front            99.4        0.848630  ...

5 rows × 29 columns

In [27]: df["horsepower"]=df["horsepower"].astype(float, copy=True)

In [28]: %matplotlib inline
import matplotlib.pyplot as plt
plt.hist(df["horsepower"])
plt.xlabel("horsepower")
plt.ylabel("count")
plt.title("horsepower bins")

Out[28]: Text(0.5, 1.0, 'horsepower bins')
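Called without a bins argument, plt.hist splits the data into 10 equal-width bins (matplotlib's default rcParams["hist.bins"]). A minimal sketch that computes the same kind of counts with np.histogram, on hypothetical horsepower values spanning the 48 to 288 range reported in Out[29]:

```python
import numpy as np

# Hypothetical horsepower values spanning the range reported in Out[29]
hp = np.array([48.0, 70.0, 95.0, 111.0, 154.0, 288.0])

# 10 equal-width bins between the minimum and maximum; the last bin is closed
# on the right, so the maximum value is counted
counts, edges = np.histogram(hp, bins=10)
print(edges)   # 11 edges from 48 to 288 in steps of 24
print(counts)
```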

In [29]: bins = np.linspace(min(df["horsepower"]), max(df["horsepower"]), 4)

bins

Out[29]: array([ 48., 128., 208., 288.])
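np.linspace(min, max, 4) returns 4 evenly spaced edges, which define 3 equal-width intervals, one per label in group_names. A minimal sketch reproducing Out[29] from the minimum and maximum it prints:

```python
import numpy as np

# Minimum and maximum horsepower as printed in Out[29]
hp_min, hp_max = 48.0, 288.0

# 4 edges -> 3 equal-width intervals (Low, Medium, High)
edges = np.linspace(hp_min, hp_max, 4)
width = (hp_max - hp_min) / 3   # 80.0

print(edges)           # [ 48. 128. 208. 288.]
print(np.diff(edges))  # [80. 80. 80.]
```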

In [30]: group_names = ['Low', 'Medium', 'High']

In [31]: # Define bin edges for horsepower
bins = [df["horsepower"].min(), 100, 150, df["horsepower"].max()]  # example bin edges
group_names = ["Low", "Medium", "High"]  # labels for the bins

# Bin the 'horsepower' column into categorical values
# Note: without include_lowest=True the first interval is (min, 100], so the
# minimum horsepower itself is left as NaN
df['horsepower-binned'] = pd.cut(df['horsepower'], bins, labels=group_names)

# Display the first 4 rows of 'horsepower' and 'horsepower-binned'
df[['horsepower', 'horsepower-binned']].head(4)

Out[31]:
horsepower horsepower-binned

0 111.0 Medium

1 111.0 Medium

2 154.0 High

3 102.0 Medium
In [32]: df["horsepower-binned"].value_counts()

Out[32]: Low       110
Medium     62
High       32
Name: horsepower-binned, dtype: int64
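The counts in Out[32] sum to 204, one fewer than the 205 rows. This is consistent with how pd.cut treats the first edge by default: intervals are closed on the right, so a value equal to the lowest edge is left unbinned. A minimal sketch on hypothetical values showing the effect of include_lowest:

```python
import pandas as pd

# Hypothetical horsepower values; 48.0 is the minimum
hp = pd.Series([48.0, 90.0, 120.0, 288.0])
edges = [hp.min(), 100, 150, hp.max()]
labels = ["Low", "Medium", "High"]

# Default: intervals are (48, 100], (100, 150], (150, 288], so 48.0 itself gets NaN
default = pd.cut(hp, edges, labels=labels)

# include_lowest=True closes the first interval on the left, so 48.0 becomes "Low"
inclusive = pd.cut(hp, edges, labels=labels, include_lowest=True)

print(default.value_counts().sum())    # 3 (the NaN is not counted)
print(inclusive.value_counts().sum())  # 4
```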

In [33]: %matplotlib inline
import matplotlib.pyplot as plt
# reindex so the bar heights follow the label order rather than frequency order
plt.bar(group_names, df["horsepower-binned"].value_counts().reindex(group_names))
# set x/y labels and plot title
plt.xlabel("horsepower")
plt.ylabel("count")
plt.title("horsepower bins")

Out[33]: Text(0.5, 1.0, 'horsepower bins')
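value_counts() orders its result by frequency, not by label. In Out[32] the two orders happen to coincide (Low, Medium, High), but pairing a fixed label list with raw value_counts() breaks as soon as they differ. A minimal sketch on hypothetical binned values where they do differ:

```python
import pandas as pd

labels = ["Low", "Medium", "High"]
# Hypothetical binned values where the most frequent label is not the first label
binned = pd.Series(pd.Categorical(
    ["High", "High", "Low", "High", "Medium"], categories=labels))

by_frequency = binned.value_counts()              # ordered by count, "High" first
by_label = binned.value_counts().reindex(labels)  # ordered as Low, Medium, High

# Pairing labels with by_frequency would put High's count under "Low";
# by_label keeps each bar height matched to its label
print(list(zip(labels, by_label.tolist())))  # [('Low', 1), ('Medium', 1), ('High', 3)]
```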

In [34]: df["peak-rpm"]=df["peak-rpm"].astype(float, copy=True)

In [35]: %matplotlib inline
import matplotlib.pyplot as plt
plt.hist(df["peak-rpm"])
plt.xlabel("peak-rpm")
plt.ylabel("count")
plt.title("Peak-rpm bins")

Out[35]: Text(0.5, 1.0, 'Peak-rpm bins')

In [36]: bins = np.linspace(min(df["peak-rpm"]), max(df["peak-rpm"]), 4)

bins

Out[36]: array([4150. , 4966.66666667, 5783.33333333, 6600. ])

In [37]: group_names1 = ['Low', 'Medium', 'High']

In [39]: import numpy as np

# Ensure 'peak-rpm' is numeric
df['peak-rpm'] = pd.to_numeric(df['peak-rpm'], errors='coerce')

# Fill missing values with the mean
df['peak-rpm'] = df['peak-rpm'].fillna(df['peak-rpm'].mean())

# Define bin edges (ensuring they are sorted)
bins = sorted([df["peak-rpm"].min(), 4000, 5000, 6000, df["peak-rpm"].max()])

# Define bin labels
group_names = ["Low", "Medium", "High", "Very High"]

# Apply binning
df['peakrpm-binned'] = pd.cut(df['peak-rpm'], bins, labels=group_names, include_lowest=True)

# Display the first 5 rows of 'peak-rpm' and 'peakrpm-binned'
df[['peak-rpm', 'peakrpm-binned']].head(5)

Out[39]:
peak-rpm peakrpm-binned

0 5000.0 Medium

1 5000.0 Medium

2 5000.0 Medium

3 5500.0 High

4 5500.0 High

In [40]: df["peakrpm-binned"].value_counts()

Out[40]: High         107
Medium        91
Low            5
Very High      2
Name: peakrpm-binned, dtype: int64
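pd.cut requires bin edges in increasing order, which is why In [39] sorts them. Because the minimum peak-rpm (4150, per Out[36]) is above the hand-picked 4000 edge, sorting places it second, so "Low" only covers (4000, 4150]. That explains the small Low count in Out[40]. A minimal sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical peak-rpm values; the minimum (4150) is above the 4000 edge, as in Out[36]
rpm = pd.Series([4150.0, 5000.0, 5500.0, 6600.0])
labels = ["Low", "Medium", "High", "Very High"]

# Unsorted edges [4150, 4000, 5000, 6000, 6600] are rejected by pd.cut
unsorted_edges = [rpm.min(), 4000, 5000, 6000, rpm.max()]
try:
    pd.cut(rpm, unsorted_edges, labels=labels)
    rejected = False
except ValueError:
    rejected = True

# Sorting gives [4000, 4150, 5000, 6000, 6600], so "Low" only covers (4000, 4150]
edges = sorted(unsorted_edges)
binned = pd.cut(rpm, edges, labels=labels, include_lowest=True)
print(rejected)         # True
print(binned.tolist())  # ['Low', 'Medium', 'High', 'Very High']
```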

In [ ]:
