Prerequisite


Prerequisite : 1

Aim:- Implement EDA using the matplotlib library


Objective

The objective of this practical is to perform Exploratory Data Analysis (EDA) using the matplotlib
library to visualize and understand patterns, trends, and distributions within the dataset.

Description

EDA is a critical step in understanding data before applying machine learning algorithms. It involves
summarizing the dataset and visualizing its structure. Matplotlib is a widely used Python library for
creating static, interactive, and animated visualizations. In this practical, we will generate
histograms (distribution of a single variable), scatter plots (relationship between two variables),
bar plots (comparison across categories), and box plots (spread and outliers) to gain insights into
the dataset's characteristics.
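
The four plot types named above can be sketched as follows. This is a minimal illustration, not the practical's official solution: the small DataFrame and its column names (area, price, city) are made up for the example, and in the practical they would be replaced by a dataset loaded with pd.read_csv. The Agg backend and the output file name eda_overview.png are also assumptions, chosen so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for a real dataset,
# e.g. df = pd.read_csv("your_dataset.csv")
df = pd.DataFrame({
    "area":  [850, 900, 1200, 1500, 1100, 2000, 950, 1750],
    "price": [100, 110, 150, 200, 140, 320, 115, 240],
    "city":  ["A", "B", "A", "C", "B", "C", "A", "B"],
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Histogram: distribution of a single numeric column
axes[0, 0].hist(df["price"], bins=5)
axes[0, 0].set_title("Histogram of price")

# Scatter plot: relationship between two numeric columns
axes[0, 1].scatter(df["area"], df["price"])
axes[0, 1].set_xlabel("area")
axes[0, 1].set_ylabel("price")
axes[0, 1].set_title("Price vs. area")

# Bar plot: number of rows per category
city_counts = df["city"].value_counts().sort_index()
axes[1, 0].bar(list(city_counts.index), city_counts.values)
axes[1, 0].set_title("Rows per city")

# Box plot: median, quartiles, and potential outliers
axes[1, 1].boxplot(df["price"])
axes[1, 1].set_title("Box plot of price")

fig.tight_layout()
fig.savefig("eda_overview.png")
```

In an interactive session or notebook, the matplotlib.use("Agg") line can be dropped and plt.show() called instead of saving the figure.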

Requirements

• System Requirements:

o Python installed with necessary libraries (matplotlib, pandas).

o Dataset for analysis (e.g., CSV format).

• Software Requirements:

o Python version 3.x.

o Libraries: matplotlib, pandas.

Conclusion

Performing EDA with matplotlib provides a clear understanding of the dataset's structure and
relationships. Through visualization, we can detect patterns, correlations, and outliers that inform
data preprocessing steps, ultimately improving the effectiveness of machine learning models.

Prerequisite : 2

Aim:- Split the House Price Prediction dataset into training and testing sets

Objective
The objective of this practical is to split the House Price Prediction dataset into training and
testing sets, which is essential for building and evaluating a machine learning model.
Description
In machine learning, splitting the dataset into training and testing sets allows us to train a
model on one portion (training set) and evaluate its performance on unseen data (testing
set). The test score gives an estimate of how well the model generalizes to new data; it does
not by itself guarantee good generalization. The train_test_split function from
sklearn.model_selection is commonly used for this purpose. A common choice is to use 80% of
the data for training and 20% for testing, although the ratio can be adjusted to the size of
the dataset.
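
A minimal sketch of an 80/20 split with train_test_split is shown below. The ten-row DataFrame and its column names (area, bedrooms, price) are hypothetical stand-ins for the House Price Prediction CSV, whose actual columns are not given here; random_state=42 is an arbitrary seed used only to make the split reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data; in the practical this would come from
# pd.read_csv(...) on the House Price Prediction dataset.
df = pd.DataFrame({
    "area":     [850, 900, 1200, 1500, 1100, 2000, 950, 1750, 1300, 1600],
    "bedrooms": [2, 2, 3, 3, 2, 4, 2, 4, 3, 3],
    "price":    [100, 110, 150, 200, 140, 320, 115, 240, 165, 210],
})

# Separate the features (X) from the target column (y)
X = df.drop(columns=["price"])
y = df["price"]

# Hold out 20% of the rows for testing; the rest is used for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))
```

By default train_test_split shuffles the rows before splitting, so the training and testing sets contain disjoint, randomly chosen rows of the original data.
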
Requirements
• System Requirements:
o Python environment with machine learning libraries installed.
o House Price Prediction dataset (CSV format).
• Software Requirements:
o Python version 3.x.
o Libraries: pandas, scikit-learn.
Conclusion
Splitting the dataset into training and testing sets is a crucial step in machine learning model
development. It allows for proper model evaluation and helps detect overfitting: a model that
scores well on the training set but poorly on the testing set has not generalized. The training
set is used to fit the model, while the testing set measures the model's effectiveness on new,
unseen data. This step makes the evaluation of the house price prediction model more
reliable.
