
6 - Train - Test - Split - Ipynb - Colaboratory

The document describes using a dataset containing prices of used BMW cars to build a linear regression model to predict price based on mileage and age. The data is split into training and test sets using train_test_split. Scatter plots show linear relationships between price and the input variables. A linear regression model is fit on the training set and used to make predictions on the test set, achieving a score of 0.927. The random_state argument is also demonstrated.

Uploaded by

duryodhan sahoo


Training And Testing Available Data

We have a dataset containing prices of used BMW cars. We are going to analyze this dataset and build a prediction function that predicts a car's sell price from its mileage and age. We will use sklearn's train_test_split method to split the data into training and testing sets.

import pandas as pd
df = pd.read_csv("carprices.csv")
df.head()

   Mileage  Age(yrs)  Sell Price($)
0    69000         6          18000
1    35000         3          34000
2    57000         5          26100
3    22500         2          40000
4    46000         4          31500
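If carprices.csv is not at hand, the same data can be rebuilt from this notebook itself. The X_train/X_test and y_train/y_test outputs further down list every row index from 0 to 19 together with its mileage, age and price. The sketch below assumes the CSV contains exactly those 20 rows, which is consistent with test_size=0.3 producing 6 test rows and 14 training rows. The variable name `rows` is only illustrative.

```python
import pandas as pd

# (Mileage, Age(yrs), Sell Price($)) for row indices 0..19,
# transcribed from the split outputs shown in this notebook
rows = [
    (69000, 6, 18000), (35000, 3, 34000), (57000, 5, 26100), (22500, 2, 40000),
    (46000, 4, 31500), (59000, 5, 26750), (52000, 5, 32000), (72000, 6, 19300),
    (91000, 8, 12000), (67000, 6, 22000), (83000, 7, 18700), (79000, 7, 19500),
    (59000, 5, 26000), (58780, 4, 27500), (82450, 7, 19400), (25400, 3, 35000),
    (28000, 2, 35500), (69000, 5, 19700), (87600, 8, 12800), (52000, 5, 28200),
]

# Same column names as the CSV, so the rest of the notebook works unchanged
df = pd.DataFrame(rows, columns=["Mileage", "Age(yrs)", "Sell Price($)"])
print(df.head())
```

The first five rows match the df.head() output above.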

import matplotlib.pyplot as plt
%matplotlib inline

Car Mileage Vs Sell Price ($)

plt.scatter(df['Mileage'],df['Sell Price($)'])

(Scatter plot: Sell Price ($) against Mileage)

Car Age Vs Sell Price ($)

plt.scatter(df['Age(yrs)'],df['Sell Price($)'])

(Scatter plot: Sell Price ($) against Age (yrs))

Looking at the two scatter plots above, a linear regression model makes sense: there is a clear linear relationship between the dependent variable (sell price) and the independent variables (car age and car mileage).
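The visual impression can be backed by a number. Pearson's correlation coefficient measures how strongly two variables are linearly related, on a scale from -1 (perfectly decreasing) to +1 (perfectly increasing). Below is a minimal sketch using pandas' DataFrame.corr. It runs on the 20 rows recovered from this notebook, which is an assumption if your carprices.csv differs.

```python
import pandas as pd

# (Mileage, Age(yrs), Sell Price($)) rows recovered from the notebook outputs
rows = [
    (69000, 6, 18000), (35000, 3, 34000), (57000, 5, 26100), (22500, 2, 40000),
    (46000, 4, 31500), (59000, 5, 26750), (52000, 5, 32000), (72000, 6, 19300),
    (91000, 8, 12000), (67000, 6, 22000), (83000, 7, 18700), (79000, 7, 19500),
    (59000, 5, 26000), (58780, 4, 27500), (82450, 7, 19400), (25400, 3, 35000),
    (28000, 2, 35500), (69000, 5, 19700), (87600, 8, 12800), (52000, 5, 28200),
]
df = pd.DataFrame(rows, columns=["Mileage", "Age(yrs)", "Sell Price($)"])

# Pearson correlation of each input column with the price;
# values close to -1 mean "price falls roughly linearly as the input grows"
corr = df.corr()["Sell Price($)"].drop("Sell Price($)")
print(corr)
```

On these rows both coefficients come out strongly negative, so price falls as mileage and age rise. Correlation looks at each input on its own, though. It does not show how the two inputs behave together inside one model.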

The approach we are going to use here is to split the available data into two sets:

1. Training: We will train our model on this dataset

2. Testing: We will use this subset to make predictions with the trained model and check them against the actual prices

We don't use the training set for testing because the model has already seen those samples. Making predictions on them could give a misleading impression of the model's accuracy. It is like setting an exam with exactly the same questions you taught the students in class.
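The idea behind a hold-out split is straightforward: shuffle the row positions, then reserve a fraction of them for testing. The function below is a simplified NumPy sketch of that idea, not sklearn's actual implementation. The name holdout_split is hypothetical. Like train_test_split, it rounds the test-set size up, so 30% of 20 rows gives 6 test rows.

```python
import numpy as np


def holdout_split(n_samples, test_size=0.3, seed=None):
    """Return (train_idx, test_idx) for a shuffled hold-out split of range(n_samples).

    Hypothetical helper illustrating the idea behind train_test_split;
    the test set gets ceil(test_size * n_samples) rows, the rest go to training.
    """
    rng = np.random.default_rng(seed)
    n_test = int(np.ceil(test_size * n_samples))
    perm = rng.permutation(n_samples)        # shuffled row positions
    return perm[n_test:], perm[:n_test]      # first n_test positions become the test set


train_idx, test_idx = holdout_split(20, test_size=0.3, seed=0)
print(len(train_idx), len(test_idx))  # 14 6
```

No row appears in both sets, so the test score reflects how the model handles samples it never saw during fitting.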

X = df[['Mileage','Age(yrs)']]

y = df['Sell Price($)']

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3)
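train_test_split applies one shuffle to every array passed in, so each row of X_train stays paired with its price in y_train. With pandas inputs the original index labels are kept, which is why the outputs below show shuffled labels such as 11, 17, 10. Here is a short check. As before, the 20 recovered rows stand in for carprices.csv (an assumption).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for carprices.csv: rows recovered from the notebook outputs
rows = [
    (69000, 6, 18000), (35000, 3, 34000), (57000, 5, 26100), (22500, 2, 40000),
    (46000, 4, 31500), (59000, 5, 26750), (52000, 5, 32000), (72000, 6, 19300),
    (91000, 8, 12000), (67000, 6, 22000), (83000, 7, 18700), (79000, 7, 19500),
    (59000, 5, 26000), (58780, 4, 27500), (82450, 7, 19400), (25400, 3, 35000),
    (28000, 2, 35500), (69000, 5, 19700), (87600, 8, 12800), (52000, 5, 28200),
]
df = pd.DataFrame(rows, columns=["Mileage", "Age(yrs)", "Sell Price($)"])
X = df[["Mileage", "Age(yrs)"]]
y = df["Sell Price($)"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# 20 rows with test_size=0.3 -> 6 test rows and 14 training rows
print(X_train.shape, X_test.shape)

# X and y were shuffled together, so their index labels line up row by row
assert X_train.index.equals(y_train.index)
assert X_test.index.equals(y_test.index)
```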

X_train

    Mileage  Age(yrs)
11    79000         7
17    69000         5
10    83000         7
1     35000         3
0     69000         6
8     91000         8
7     72000         6
16    28000         2
6     52000         5
4     46000         4
19    52000         5
2     57000         5
5     59000         5
15    25400         3

X_test

    Mileage  Age(yrs)
3     22500         2
12    59000         5
14    82450         7
13    58780         4
9     67000         6
18    87600         8

y_train

11    19500
17    19700
10    18700
1     34000
0     18000
8     12000
7     19300
16    35500
6     32000
4     31500
19    28200
2     26100
5     26750
15    35000
Name: Sell Price($), dtype: int64

y_test

3     40000
12    26000
14    19400
13    27500
9     22000
18    12800
Name: Sell Price($), dtype: int64

Let's run the linear regression model now.

from sklearn.linear_model import LinearRegression

clf = LinearRegression()

clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
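Once fitted, a LinearRegression model is just a linear formula: price ≈ intercept_ + coef_[0] * Mileage + coef_[1] * Age(yrs). The sketch below prints the learned parameters and checks that predict() computes exactly that formula. It uses random_state=10 so the split is repeatable, whereas the split above used no seed. The 20 recovered rows again stand in for the CSV.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in for carprices.csv: rows recovered from the notebook outputs
rows = [
    (69000, 6, 18000), (35000, 3, 34000), (57000, 5, 26100), (22500, 2, 40000),
    (46000, 4, 31500), (59000, 5, 26750), (52000, 5, 32000), (72000, 6, 19300),
    (91000, 8, 12000), (67000, 6, 22000), (83000, 7, 18700), (79000, 7, 19500),
    (59000, 5, 26000), (58780, 4, 27500), (82450, 7, 19400), (25400, 3, 35000),
    (28000, 2, 35500), (69000, 5, 19700), (87600, 8, 12800), (52000, 5, 28200),
]
df = pd.DataFrame(rows, columns=["Mileage", "Age(yrs)", "Sell Price($)"])
X = df[["Mileage", "Age(yrs)"]]
y = df["Sell Price($)"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)

clf = LinearRegression()
clf.fit(X_train, y_train)

# One coefficient per input column, plus a single intercept
print("intercept:", clf.intercept_)
print("coefficients:", dict(zip(X.columns, clf.coef_)))

# predict() is the linear formula applied to every test row
manual = clf.intercept_ + X_test.to_numpy() @ clf.coef_
assert np.allclose(manual, clf.predict(X_test))
```

Because mileage and age rise together in this data, read the individual coefficients with care. The fitted formula as a whole is what the predictions rely on.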

X_test

    Mileage  Age(yrs)
3     22500         2
12    59000         5
14    82450         7
13    58780         4
9     67000         6
18    87600         8

clf.predict(X_test)

array([ 38166.23426912,  25092.95646646,  16773.29470749,  24096.93956163,
        22602.44614295,  15559.98266172])

y_test

3     40000
12    26000
14    19400
13    27500
9     22000
18    12800
Name: Sell Price($), dtype: int64

clf.score(X_test, y_test)

0.92713129118963111
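For regressors, score() returns the coefficient of determination R² = 1 - SS_res / SS_tot. Here SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of y_test from its own mean. A score of 0.927 therefore means the model explains roughly 93% of the price variation in this test set. The minimal sketch below recomputes R² by hand and with sklearn.metrics.r2_score. It uses the same stand-in data and a seeded split, both of which are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Stand-in for carprices.csv: rows recovered from the notebook outputs
rows = [
    (69000, 6, 18000), (35000, 3, 34000), (57000, 5, 26100), (22500, 2, 40000),
    (46000, 4, 31500), (59000, 5, 26750), (52000, 5, 32000), (72000, 6, 19300),
    (91000, 8, 12000), (67000, 6, 22000), (83000, 7, 18700), (79000, 7, 19500),
    (59000, 5, 26000), (58780, 4, 27500), (82450, 7, 19400), (25400, 3, 35000),
    (28000, 2, 35500), (69000, 5, 19700), (87600, 8, 12800), (52000, 5, 28200),
]
df = pd.DataFrame(rows, columns=["Mileage", "Age(yrs)", "Sell Price($)"])
X = df[["Mileage", "Age(yrs)"]]
y = df["Sell Price($)"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)
clf = LinearRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# R^2 = 1 - SS_res / SS_tot, evaluated on the held-out test rows
ss_res = np.sum((y_test - y_pred) ** 2)          # squared prediction errors
ss_tot = np.sum((y_test - y_test.mean()) ** 2)   # squared deviations from the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, clf.score(X_test, y_test), r2_score(y_test, y_pred))
```

All three numbers agree. An R² of 1 would mean perfect predictions, while 0 means no better than always predicting the test-set mean price.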

random_state argument

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=10)

X_test

    Mileage  Age(yrs)
7     72000         6
10    83000         7
5     59000         5
6     52000         5
3     22500         2
18    87600         8
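Without random_state, train_test_split draws a new shuffle on every run, so both the split and the score change each time the notebook is re-run. Passing an integer fixes the shuffle. The sketch below first confirms that two calls with random_state=10 return the same test rows. It then shows how much the test score moves across a few different seeds. The seed values 0 to 4 and the stand-in data are arbitrary, illustrative choices.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in for carprices.csv: rows recovered from the notebook outputs
rows = [
    (69000, 6, 18000), (35000, 3, 34000), (57000, 5, 26100), (22500, 2, 40000),
    (46000, 4, 31500), (59000, 5, 26750), (52000, 5, 32000), (72000, 6, 19300),
    (91000, 8, 12000), (67000, 6, 22000), (83000, 7, 18700), (79000, 7, 19500),
    (59000, 5, 26000), (58780, 4, 27500), (82450, 7, 19400), (25400, 3, 35000),
    (28000, 2, 35500), (69000, 5, 19700), (87600, 8, 12800), (52000, 5, 28200),
]
df = pd.DataFrame(rows, columns=["Mileage", "Age(yrs)", "Sell Price($)"])
X = df[["Mileage", "Age(yrs)"]]
y = df["Sell Price($)"]

# Same integer seed -> same shuffle -> identical split on every call
first = train_test_split(X, y, test_size=0.3, random_state=10)
second = train_test_split(X, y, test_size=0.3, random_state=10)
X_test_a, X_test_b = first[1], second[1]   # element 1 of the result is X_test
print(list(X_test_a.index))
assert X_test_a.index.equals(X_test_b.index)

# Different seeds -> different test rows -> different test scores
scores = {}
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    scores[seed] = LinearRegression().fit(X_tr, y_tr).score(X_te, y_te)
print(scores)
```

A fixed seed makes results reproducible. The spread across seeds is a reminder that a score from one small test set depends on which rows happened to land in it.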
