Lab1 For Module3 - Python Code

The document outlines an exploratory data analysis of the US Cars Dataset, which includes information on 28 car brands sold in the US. It details the process of data cleaning, visualization, and analysis using libraries like Pandas, Matplotlib, and Seaborn. Key findings include popular car models, price distributions, and relationships between car age and price.

Uploaded by

Sean Jing


Lab3 by Dr. Hoora Fakhrmoosavy

Exploratory Data Analysis of the US Cars Dataset

The US Cars Dataset contains scraped data from the online North
American Car auction. It contains information about 28 car brands for sale
in the US. In this post, we will perform exploratory data analysis on the US
Cars Dataset.

First, let’s import the Pandas and NumPy libraries (NumPy is needed later for sorting and logarithms):

import pandas as pd
import numpy as np

Next, let’s remove the default display limits for Pandas data frames:
pd.set_option('display.max_columns', None)

Now, let’s read the data into a data frame:

df = pd.read_csv("USA_cars_datasets.csv")

Let’s print the list of columns in the data:

print(list(df.columns))

Let’s find the unique values in brand and year columns:

df.brand.unique()

years = df.year.unique()

Let's sort it:

np.sort(years)

We can also take a look at the number of rows in the data:

print("Number of rows: ", len(df))

Next, let’s print the first five rows of data, followed by summary statistics for the numeric columns:

print(df.head())
print(df.describe())

Now, let’s look at the brands of white cars:

df_d1 = df[df['color'] =='white']
print(set(df_d1['brand']))

We can also look at the most common brands for white cars:
from collections import Counter
print(dict(Counter(df_d1['brand']).most_common(5)))
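
The same count can also be obtained with pandas alone. The following is a minimal sketch on a small made-up data frame (the `toy` data and its values are hypothetical, not taken from the US Cars Dataset), using `value_counts()` in place of `Counter`:

```python
import pandas as pd

# Hypothetical toy data standing in for the real CSV
toy = pd.DataFrame({
    "brand": ["ford", "ford", "ford", "dodge", "dodge", "nissan", "ford"],
    "color": ["white", "white", "white", "white", "white", "white", "black"],
})

# Keep only white cars, then count how often each brand appears
white = toy[toy["color"] == "white"]

# value_counts() sorts by frequency (descending), so head(2) is the top 2
top_white = white["brand"].value_counts().head(2)
print(top_white.to_dict())
```

On the real data, replacing `toy` with `df` and `head(2)` with `head(5)` should give the same brands as the `Counter` version.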

Dealing with missing values:

# Fill missing values with the column mean and assign the result back to the column
df['mileage'] = df['mileage'].fillna(df['mileage'].mean())

df['year'] = df['year'].fillna(df['year'].mean())

df.info()
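
Before and after filling, it can help to count the missing values per column. The sketch below uses a small hypothetical data frame (`toy`, with made-up numbers) to show the check and a mean fill applied to all numeric columns at once:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column
toy = pd.DataFrame({
    "mileage": [10000.0, np.nan, 30000.0],
    "year": [2015.0, 2017.0, np.nan],
})

# Count missing values in each column before filling
missing_before = toy.isna().sum()
print(missing_before)

# Fill each column's gaps with that column's mean
filled = toy.fillna(toy.mean(numeric_only=True))

# Count again to confirm nothing is missing
missing_after = filled.isna().sum()
print(filled)
```

For a column such as year, the mean can be fractional, so the median may be a reasonable alternative depending on the analysis.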

Let's begin by importing matplotlib.pyplot and seaborn.

import seaborn as sns

import matplotlib

import matplotlib.pyplot as plt

%matplotlib inline

sns.set_style('darkgrid')

matplotlib.rcParams['font.size'] = 14

matplotlib.rcParams['figure.figsize'] = (8, 6)

matplotlib.rcParams['figure.facecolor'] = '#00000000'

Let’s find popular models:
import plotly.express as px

models_df = df.dropna(subset = [ 'model'])

fig = px.treemap(models_df, path=['model'], title='Most Popular Models')

fig.show()

Relationship between Car's Release Year and Price

A better way to study this relationship is to consider the age of the car rather than the year it was released.
Let's add another column to the dataframe for the age of the car. The age is calculated with the help of the datetime
library in Python.

import datetime

df['age'] = datetime.datetime.now().year - df['year']

sns.scatterplot(x=df.age, y=df.price, s=40);

Adding Log Price Column:

# Natural log of price (assumes prices are positive)
df['Log Price'] = np.log(df['price'])

sns.scatterplot(x=df.age, y=df['Log Price'], s=40);

On the logarithmic scale, the visualization becomes much clearer than before, and the inverse relationship between
age and price is more obvious.
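
One way to put a number on this is the correlation between age and price, with and without the log transform. The following is only a sketch on hypothetical data (prices that drop by a fixed 20% per year, which is an assumption and not a property of the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical data: price drops 20% for every year of age
toy = pd.DataFrame({"age": [1, 2, 4, 6, 8, 10]})
toy["price"] = 30000 * 0.8 ** toy["age"]
toy["log_price"] = np.log(toy["price"])

# Pearson correlation of age with raw price and with log price
corr_raw = toy["age"].corr(toy["price"])
corr_log = toy["age"].corr(toy["log_price"])

print("age vs price:    ", round(corr_raw, 3))
print("age vs log price:", round(corr_log, 3))
```

Because exponential decay is a straight line on the log scale, the log-price correlation in this toy example is very close to -1. Note that `np.log` returns `-inf` for a price of 0, so rows with non-positive prices may need to be filtered out first.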

Popularity based on Model:

models= df.groupby('model')['model'].count()

models = pd.DataFrame(models)

models.columns = ['models Counts']

models.sort_values(by=['models Counts'], inplace=True, ascending=False)

models = models.head(5)

models.plot.bar();

plt.title('Preferred models')

plt.xlabel('models')

plt.ylabel('No. of Cars');

Finding the top brands in our database:

topbrands= df.groupby('brand')['brand'].count()

topbrands = pd.DataFrame(topbrands)

topbrands.columns = ['Top Brands']

topbrands.sort_values(by=['Top Brands'], inplace=True, ascending=False)

topbrands = topbrands.head(10)

topbrands.plot.bar();

plt.title('Famous Brands')

plt.xlabel('Brands')

plt.ylabel('No. of Cars');

Most Expensive Car Brands:

expensive= df.groupby('brand')['price'].mean()

expensive = pd.DataFrame(expensive)

expensive.columns = ['Average Prices']

expensive.sort_values(by=['Average Prices'], inplace=True, ascending=False)

expensive = expensive.head(10)

expensive.plot.bar();

plt.title('Expensive Brands')

plt.xlabel('Car Brands')

plt.ylabel('Average Price');

Let’s look at the distribution of price:

cars_price_df = df[(df.price > 1000) & (df.price < 5000)]

plt.title('Distribution of Price')

# Stop at 5001 so that 5000 is included as the last bin edge
plt.hist(cars_price_df.price, bins=np.arange(1000, 5001, 500));

plt.xlabel('Price')

plt.ylabel('No. of Samples')

plt.xlim(1000, 5000);

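Let’s plot the histogram of price over the full range, with a density curve:

plt.figure(figsize=(10,6))
# sns.distplot is deprecated; histplot with kde=True draws a similar plot
sns.histplot(df['price'], kde=True).set_title('Distribution of Car Prices')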

Finally, let’s create a boxplot of ‘price’ in the 5 most commonly occurring ‘brand’ categories:

import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter

def get_boxplot_of_categories(data_frame, categorical_column, numerical_column, limit):
    # Find the `limit` most frequent values of the categorical column
    keys = [key for key, _ in Counter(data_frame[categorical_column].values).most_common(limit)]
    print(keys)

    # Keep only the rows that belong to those categories
    df_new = data_frame[data_frame[categorical_column].isin(keys)]

    sns.set()
    sns.boxplot(x=df_new[categorical_column], y=df_new[numerical_column])
    plt.show()

get_boxplot_of_categories(df, 'brand', 'price', 5)

We can also draw the boxplot for all brands:

plt.figure(figsize=(12,8))

sns.set(style='darkgrid')
sns.boxplot(x='brand', y='price', data=df).set_title("Price Distribution of Different Brands")

Please write code to answer these questions.

Question 1: What is the price distribution of the top 3 brands in the database?

Question 2: What is the average price of Nissan, BMW, and Ford cars, or of the 3 most famous car brands in the database?

Question 3: Which release years after 2000 have the cheapest cars (on average) in the database?

Question 4: Which brand's cars have covered the most mileage on the roads?

Question 5: Which state has the highest number of registered Mercedes cars?
