GmPrac1 - Jupyter Notebook
In [1]: import pandas as pd
In [2]: df = pd.read_csv("autodata.csv")
In [3]: df.head(5)
Out[3]:
   symboling  normalized-losses  make         fuel-type  aspiration  num-of-doors  body-style   drive-wheels  engine-location  ...
0  3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            ...
1  3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            ...
2  1          122.0              alfa-romero  gas        std         two           hatchback    rwd           front            ...
5 rows × 26 columns
In [4]: df.tail(5)
Out[4]:
     symboling  normalized-losses  make   fuel-type  aspiration  num-of-doors  body-style  drive-wheels  engine-location  wheel-base
200  -1         95.0               volvo  gas        std         four          sedan       rwd           front            109.1
201  -1         95.0               volvo  gas        turbo       four          sedan       rwd           front            109.1
202  -1         95.0               volvo  gas        std         four          sedan       rwd           front            109.1
203  -1         95.0               volvo  diesel     turbo       four          sedan       rwd           front            109.1
204  -1         95.0               volvo  gas        turbo       four          sedan       rwd           front            109.1
5 rows × 26 columns
In [5]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 symboling 205 non-null int64
1 normalized-losses 205 non-null float64
2 make 205 non-null object
3 fuel-type 205 non-null object
4 aspiration 205 non-null object
5 num-of-doors 203 non-null object
6 body-style 205 non-null object
7 drive-wheels 205 non-null object
8 engine-location 205 non-null object
9 wheel-base 205 non-null float64
10 length 205 non-null float64
11 width 205 non-null float64
12 height 205 non-null float64
13 curb-weight 205 non-null int64
14 engine-type 205 non-null object
15 num-of-cylinders 205 non-null object
16 engine-size 205 non-null int64
17 fuel-system 205 non-null object
18 bore 205 non-null float64
19 stroke 205 non-null float64
20 compression-ratio 205 non-null float64
21 horsepower 205 non-null float64
22 peak-rpm 205 non-null float64
23 city-mpg 205 non-null int64
24 highway-mpg 205 non-null int64
25 price 205 non-null float64
dtypes: float64(11), int64(5), object(10)
memory usage: 41.8+ KB
In [6]: df.describe()
Out[6]:
       symboling  normalized-losses  wheel-base  length  width  height  curb-weight
In [7]: df.isnull()
Out[7]:
     symboling  normalized-losses  make   fuel-type  aspiration  num-of-doors  body-style  drive-wheels  engine-location  wheel-base
0    False      False              False  False      False       False         False       False         False            False
1    False      False              False  False      False       False         False       False         False            False
2    False      False              False  False      False       False         False       False         False            False
3    False      False              False  False      False       False         False       False         False            False
4    False      False              False  False      False       False         False       False         False            False
...  ...        ...                ...    ...        ...         ...           ...         ...           ...              ...
200  False      False              False  False      False       False         False       False         False            False
201  False      False              False  False      False       False         False       False         False            False
202  False      False              False  False      False       False         False       False         False            False
203  False      False              False  False      False       False         False       False         False            False
204  False      False              False  False      False       False         False       False         False            False
In [9]: df.notnull().sum()
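The two calls `df.isnull()` and `df.notnull().sum()` are complementary: the first returns a boolean mask, the second counts the non-missing entries per column. A minimal sketch on a small made-up frame (the values and the name `toy` are illustrative assumptions, not data from autodata.csv) shows that the per-column counts always add up to the number of rows:

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values (not taken from autodata.csv)
toy = pd.DataFrame({
    "num-of-doors": ["four", np.nan, "two", np.nan],
    "price": [13495.0, 16500.0, 16500.0, 13950.0],
})

missing = toy.isnull().sum()    # NaN count per column
present = toy.notnull().sum()   # non-NaN count per column

print(missing)
print(present)
```

Summing the mask is usually easier to read than the full `isnull()` table, which pandas truncates for wide frames.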
In [13]: df['num-of-doors'].value_counts()
In [14]: df['num-of-doors'].value_counts().idxmax()
Out[14]: 'four'
In [15]: # Replace missing 'num-of-doors' values with the most frequent value ('four')
# Assign the result back: calling fillna(inplace=True) on a selected column is
# deprecated chained assignment in recent pandas and may not modify df
df["num-of-doors"] = df["num-of-doors"].fillna(df["num-of-doors"].mode()[0])
# Drop rows with NaN values in the "horsepower" column
df.dropna(subset=["horsepower"], axis=0, inplace=True)
# Reset the index after dropping rows
df.reset_index(drop=True, inplace=True)
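The cleaning cell above combines two strategies: imputing a categorical column with its mode, and deleting rows where a numeric column is missing. The following is a minimal sketch of the same steps on a made-up frame; the values and the names `toy` and `most_common` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values (not taken from autodata.csv)
toy = pd.DataFrame({
    "num-of-doors": ["four", "two", np.nan, "four", "four"],
    "horsepower": [111.0, np.nan, 154.0, 102.0, 115.0],
})

# Impute the categorical column with its most frequent value
most_common = toy["num-of-doors"].mode()[0]
toy["num-of-doors"] = toy["num-of-doors"].fillna(most_common)

# Drop rows where horsepower is missing, then renumber the index from 0
toy = toy.dropna(subset=["horsepower"]).reset_index(drop=True)

print(most_common)
print(toy)
```

Mode imputation keeps every row but biases the column toward the majority class, whereas dropping rows keeps the values honest at the cost of sample size.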
In [17]: df.isnull().sum()
Out[17]: symboling 0
normalized-losses 0
make 0
fuel-type 0
aspiration 0
num-of-doors 0
body-style 0
drive-wheels 0
engine-location 0
wheel-base 0
length 0
width 0
height 0
curb-weight 0
engine-type 0
num-of-cylinders 0
engine-size 0
fuel-system 0
bore 0
stroke 0
compression-ratio 0
horsepower 0
peak-rpm 0
city-mpg 0
highway-mpg 0
price 0
dtype: int64
Out[18]:
   symboling  normalized-losses  make         fuel-type  aspiration  num-of-doors  body-style   drive-wheels  engine-location  ...
0  3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            ...
1  3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            ...
2  1          122.0              alfa-romero  gas        std         two           hatchback    rwd           front            ...
5 rows × 27 columns
In [19]: df['highway-L/100km'] = 235/df["highway-mpg"]
df.head()
Out[19]:
   symboling  normalized-losses  make         fuel-type  aspiration  num-of-doors  body-style   drive-wheels  engine-location  ...
0  3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            ...
1  3          122.0              alfa-romero  gas        std         two           convertible  rwd           front            ...
2  1          122.0              alfa-romero  gas        std         two           hatchback    rwd           front            ...
5 rows × 28 columns
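The constant 235 used in the conversion cell is a rounded factor for turning miles per gallon into litres per 100 km. The sketch below derives it, assuming US gallons (the notebook does not say which gallon the data uses); the `mpg` values are hypothetical.

```python
import pandas as pd

# Exact unit definitions (assumption: mpg is measured in US gallons)
LITRES_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344

# L/100km = 100 * (litres per gallon) / (km per mile) / mpg
factor = 100 * LITRES_PER_US_GALLON / KM_PER_MILE

mpg = pd.Series([27, 26, 30], name="highway-mpg")  # hypothetical values
l_per_100km = (factor / mpg).rename("highway-L/100km")

print(round(factor, 3))
print(l_per_100km.round(2))
```

Rounding the factor to 235 changes the result by less than 0.1 percent, so either constant is reasonable for exploratory work.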
Out[21]:
length width height
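The cell that produced Out[21] is not shown. One common way to bring `length`, `width`, and `height` onto a comparable scale is to divide each column by its maximum, which would be consistent with the `length` values below 1 that appear in Out[26]. This is a sketch under that assumption, with hypothetical values, not necessarily the notebook's own method.

```python
import pandas as pd

# Toy dimensions with hypothetical values (not taken from autodata.csv)
toy = pd.DataFrame({
    "length": [168.8, 171.2, 208.1],
    "width": [64.1, 65.5, 72.0],
    "height": [48.8, 52.4, 59.8],
})

# Simple feature scaling: divide each column by its own maximum,
# so every value lands in the range (0, 1]
for col in ["length", "width", "height"]:
    toy[col] = toy[col] / toy[col].max()

print(toy)
```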
In [22]: df.columns
In [23]: df['aspiration'].value_counts()
Out[24]:
std turbo
0 1 0
1 1 0
2 1 0
3 1 0
4 1 0
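Out[24] shows indicator (dummy) columns for the two `aspiration` categories, but the cell that created them is missing. A minimal sketch of how such columns can be produced with `pd.get_dummies`, assuming that is the approach used; the values and the names `toy` and `dummies` are illustrative.

```python
import pandas as pd

# Toy column with hypothetical values (not taken from autodata.csv)
toy = pd.DataFrame({"aspiration": ["std", "std", "turbo", "std"]})

# One indicator column per category; dtype=int gives 0/1 instead of booleans
dummies = pd.get_dummies(toy["aspiration"], dtype=int)

# Attach the indicators and drop the original text column
toy = pd.concat([toy, dummies], axis=1).drop(columns="aspiration")

print(toy)
```

Replacing one text column with two indicator columns adds one column net, which matches the column count rising from 28 to 29 in the next output.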
In [26]: df.head()
Out[26]:
   symboling  normalized-losses  make         fuel-type  num-of-doors  body-style   drive-wheels  engine-location  wheel-base  length
0  3          122.0              alfa-romero  gas        two           convertible  rwd           front            88.6        0.81114
1  3          122.0              alfa-romero  gas        two           convertible  rwd           front            88.6        0.81114
2  1          122.0              alfa-romero  gas        two           hatchback    rwd           front            94.5        0.82268
5 rows × 29 columns
Out[31]:
horsepower horsepower-binned
0 111.0 Medium
1 111.0 Medium
2 154.0 High
3 102.0 Medium
In [32]: df["horsepower-binned"].value_counts()
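The binning cell itself is not shown, so the bin edges behind the `horsepower-binned` column are unknown. The sketch below illustrates one common approach, three equal-width bins built with `np.linspace` and `pd.cut`; the horsepower values, the edges, and the names `hp`, `bins`, and `binned` are assumptions, so its labels need not match the output above.

```python
import numpy as np
import pandas as pd

# Toy series with hypothetical values (not taken from autodata.csv)
hp = pd.Series([48.0, 102.0, 111.0, 154.0, 262.0], name="horsepower")

# Four equally spaced edges define three equal-width bins
bins = np.linspace(hp.min(), hp.max(), 4)
labels = ["Low", "Medium", "High"]

# include_lowest=True keeps the minimum value inside the first bin
binned = pd.cut(hp, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"horsepower": hp, "horsepower-binned": binned}))
print(binned.value_counts())
```

Equal-width bins are easy to explain but can leave some bins nearly empty on skewed data; `pd.qcut` is an alternative when roughly equal group sizes matter more.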
Out[39]:
peak-rpm peakrpm-binned
0 5000.0 Medium
1 5000.0 Medium
2 5000.0 Medium
3 5500.0 High
4 5500.0 High
In [40]: df["peakrpm-binned"].value_counts()