Student - Linear Regression Example - Colaboratory

This document discusses preprocessing data for machine learning modeling. It loads and explores a dataset containing years of experience and salary for employees. It then splits the data into training and test sets, fits a simple linear regression model to the training set, makes predictions on the test set, and evaluates the model performance.

Uploaded by

Shreya Dutta
Copyright
© All Rights Reserved

# # Data Preprocessing

# Importing the libraries


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from google.colab import files


uploaded = files.upload()

Saving Salary_Data.csv to Salary_Data.csv

# Importing the dataset


dataset = pd.read_csv('Salary_Data.csv')
dataset

    YearsExperience    Salary
0               1.1   39343.0
1               1.3   46205.0
2               1.5   37731.0
3               2.0   43525.0
4               2.2   39891.0
5               2.9   56642.0
6               3.0   60150.0
7               3.2   54445.0
8               3.2   64445.0
9               3.7   57189.0
10              3.9   63218.0
11              4.0   55794.0
12              4.0   56957.0
13              4.1   57081.0
14              4.5   61111.0
15              4.9   67938.0
16              5.1   66029.0
17              5.3   83088.0
18              5.9   81363.0
19              6.0   93940.0
20              6.8   91738.0
21              7.1   98273.0
22              7.9  101302.0
23              8.2  113812.0
24              8.7  109431.0
25              9.0  105582.0
26              9.5  116969.0
27              9.6  112635.0
28             10.3  122391.0
29             10.5  121872.0

dataset.describe()

       YearsExperience         Salary
count        30.000000      30.000000
mean          5.313333   76003.000000
std           2.837888   27414.429785
min           1.100000   37731.000000
25%           3.200000   56720.750000
50%           4.700000   65237.000000
75%           7.700000  100544.750000
max          10.500000  122391.000000

# Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount

# Importing the dataset
# dataset = pd.read_csv('/content/drive/My Drive/ATAL/Salary_Data.csv')

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-6-242e04d314aa> in <module>()
      1 # Importing the dataset
----> 2 dataset = pd.read_csv('/content/drive/My Drive/ATAL/Salary_Data.csv')

4 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   2008         kwds["usecols"] = self.usecols
   2009
-> 2010         self._reader = parsers.TextReader(src, **kwds)
   2011         self.unnamed_cols = self._reader.unnamed_cols
   2012

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/ATAL/Salary_Data.csv'
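The FileNotFoundError above means the CSV was not at the Drive path given to pd.read_csv. One way to avoid the crash is to check a few candidate locations before reading. The following is only a sketch: the helper name load_first_existing and the fallback order are assumptions for illustration, not part of the original notebook.

```python
import os
import pandas as pd


def load_first_existing(paths):
    """Read the first CSV in `paths` that exists on disk (hypothetical helper)."""
    for path in paths:
        if os.path.exists(path):
            return pd.read_csv(path)
    raise FileNotFoundError("None of these paths exist: %s" % ", ".join(paths))


# Candidate locations: the Drive path from the traceback, then the uploaded copy.
candidate_paths = [
    "/content/drive/My Drive/ATAL/Salary_Data.csv",
    "Salary_Data.csv",
]

try:
    dataset_checked = load_first_existing(candidate_paths)
    print(dataset_checked.shape)
except FileNotFoundError as err:
    # Neither location exists in this session; report instead of crashing.
    print(err)
```

With this pattern the notebook falls back to the file saved by files.upload() when the Drive folder is missing or named differently.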

print(dataset)

YearsExperience Salary
0 1.1 39343.0
1 1.3 46205.0
2 1.5 37731.0
3 2.0 43525.0
4 2.2 39891.0
5 2.9 56642.0
6 3.0 60150.0
7 3.2 54445.0
8 3.2 64445.0
9 3.7 57189.0
10 3.9 63218.0
11 4.0 55794.0
12 4.0 56957.0
13 4.1 57081.0
14 4.5 61111.0
15 4.9 67938.0
16 5.1 66029.0
17 5.3 83088.0
18 5.9 81363.0
19 6.0 93940.0
20 6.8 91738.0
21 7.1 98273.0
22 7.9 101302.0
23 8.2 113812.0
24 8.7 109431.0
25 9.0 105582.0
26 9.5 116969.0
27 9.6 112635.0
28 10.3 122391.0
29 10.5 121872.0

dataset.shape

(30, 2)

dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 YearsExperience 30 non-null float64
1 Salary 30 non-null float64
dtypes: float64(2)
memory usage: 608.0 bytes

# Extracting dependent and independent variables:


# Extracting independent variable:
X = dataset.iloc[:, :-1].values
# Extracting dependent variable:
y = dataset.iloc[:, 1].values
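The two iloc calls return arrays of different shapes: the slice `:-1` keeps a column dimension, while the single index `1` drops it. scikit-learn estimators expect a 2-D feature matrix and a 1-D target, which is why the slicing is written this way. A minimal sketch on a three-row stand-in frame (the names demo, X_demo and y_demo are illustrative; the rows are copied from the dataset above):

```python
import pandas as pd

# Three rows copied from the printed dataset, same column names.
demo = pd.DataFrame({
    "YearsExperience": [1.1, 1.3, 1.5],
    "Salary": [39343.0, 46205.0, 37731.0],
})

X_demo = demo.iloc[:, :-1].values  # every column except the last: 2-D array
y_demo = demo.iloc[:, 1].values    # one column selected by position: 1-D array

print(X_demo.shape)  # (3, 1)
print(y_demo.shape)  # (3,)
```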

print(X)

[[ 1.1]
[ 1.3]
[ 1.5]
[ 2. ]
[ 2.2]
[ 2.9]
[ 3. ]
[ 3.2]
[ 3.2]
[ 3.7]
[ 3.9]
[ 4. ]
[ 4. ]
[ 4.1]
[ 4.5]
[ 4.9]
[ 5.1]
[ 5.3]
[ 5.9]
[ 6. ]
[ 6.8]
[ 7.1]
[ 7.9]
[ 8.2]
[ 8.7]
[ 9. ]
[ 9.5]
[ 9.6]
[10.3]
[10.5]]

print(y)

[ 39343 46205 37731 43525 39891 56642 60150 54445 64445 57189
63218 55794 56957 57081 61111 67938 66029 83088 81363 93940
91738 98273 101302 113812 109431 105582 116969 112635 122391 121872]

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
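With 30 rows and test_size = 1/3, the split holds out 10 rows for testing and keeps 20 for training, and a fixed random_state makes the shuffle repeatable. A small sketch on placeholder arrays of the same length (X_demo and y_demo are stand-ins, not the salary data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as Salary_Data.csv.
X_demo = np.arange(30, dtype=float).reshape(-1, 1)
y_demo = np.arange(30, dtype=float)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=1/3, random_state=0)

# Same arguments, same random_state: the split should be identical.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=1/3, random_state=0)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```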

print(X_train)

[[ 2.9]
[ 5.1]
[ 3.2]
[ 4.5]
[ 8.2]
[ 6.8]
[ 1.3]
[10.5]
[ 3. ]
[ 2.2]
[ 5.9]
[ 6. ]
[ 3.7]
[ 3.2]
[ 9. ]
[ 2. ]
[ 1.1]
[ 7.1]
[ 4.9]
[ 4. ]]

print(X_test)

[[ 1.5]
[10.3]
[ 4.1]
[ 3.9]
[ 9.5]
[ 8.7]
[ 9.6]
[ 4. ]
[ 5.3]
[ 7.9]]

print(y_test)

[ 37731 122391 57081 63218 116969 109431 112635 55794 83088 101302]

print(y_train)

[ 56642.  66029.  64445.  61111. 113812.  91738.  46205. 121872.  60150.
  39891.  81363.  93940.  57189.  54445. 105582.  43525.  39343.  98273.
  67938.  56957.]

# Fitting Simple Linear Regression to the Training set


from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
LinearRegression()

y_pred = regressor.predict(X_test)
#print("%2.f"%(y_pred))
print(y_pred)

[ 40835.10590871 123079.39940819  65134.55626083  63265.36777221
 115602.64545369 108125.8914992  116537.23969801  64199.96201652
  76349.68719258 100649.1375447 ]

# Visualising the Training set results


plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

# Visualising the Test set results


plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

# Visualising the Test set results


plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_test, y_pred, color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
print("Regressor slope: %2.f "%( regressor.coef_[0]))
print("Regressor intercept:%2.f "% regressor.intercept_)

Regressor slope: 9346
Regressor intercept:26816

YearsExperience = 10
# predict() returns a one-element array; take element 0 before formatting it
print("Salary for given Years of Experience is : %.f" % regressor.predict([[YearsExperience]])[0])

Salary for given Years of Experience is : 120276
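The fitted slope, intercept and the 10-year prediction can be cross-checked without scikit-learn by applying the textbook least-squares formulas for one feature: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). This is a sketch using the training values printed earlier; the variable names are illustrative.

```python
import numpy as np

# Training split copied from the printed X_train / y_train above.
x_tr = np.array([2.9, 5.1, 3.2, 4.5, 8.2, 6.8, 1.3, 10.5, 3.0, 2.2,
                 5.9, 6.0, 3.7, 3.2, 9.0, 2.0, 1.1, 7.1, 4.9, 4.0])
y_tr = np.array([56642., 66029., 64445., 61111., 113812., 91738., 46205.,
                 121872., 60150., 39891., 81363., 93940., 57189., 54445.,
                 105582., 43525., 39343., 98273., 67938., 56957.])

x_mean, y_mean = x_tr.mean(), y_tr.mean()

# Closed-form ordinary least squares for a single feature.
slope = np.sum((x_tr - x_mean) * (y_tr - y_mean)) / np.sum((x_tr - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print("slope: %.0f" % slope)
print("intercept: %.0f" % intercept)
# The prediction is just a point on the fitted line.
print("salary at 10 years: %.0f" % (intercept + slope * 10))
```

The rounded results should agree with the regressor output above (9346, 26816 and 120276).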

from sklearn import metrics


print("MAE %2.f" %(metrics.mean_absolute_error(y_test,y_pred)))

MAE 3426

from sklearn import metrics

# RMSE is the square root of the mean squared error (not of the MAE)
print("RMSE %2.f" %(np.sqrt(metrics.mean_squared_error(y_test,y_pred))))

RMSE 4585
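MAE averages the absolute errors, while RMSE is the square root of the mean of the squared errors, so RMSE is never smaller than MAE on the same predictions. Both can be recomputed by hand from the y_test and y_pred values printed earlier. A sketch with illustrative variable names:

```python
import numpy as np

# Test targets and predictions copied from the printed y_test / y_pred above.
y_true = np.array([37731., 122391., 57081., 63218., 116969.,
                   109431., 112635., 55794., 83088., 101302.])
y_hat = np.array([40835.10590871, 123079.39940819, 65134.55626083,
                  63265.36777221, 115602.64545369, 108125.8914992,
                  116537.23969801, 64199.96201652, 76349.68719258,
                  100649.1375447])

errors = y_hat - y_true
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error

print("MAE  %.0f" % mae)
print("RMSE %.0f" % rmse)
```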

print('Train Score: %f' %(regressor.score(X_train, y_train)))
print('Test Score: %f' % (regressor.score(X_test, y_test)) )

Train Score: 0.938190
Test Score: 0.974915
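For a regressor, score() returns the coefficient of determination, R² = 1 - SS_res / SS_tot. The test score can be reproduced from the printed y_test and y_pred values; the sketch below uses illustrative variable names.

```python
import numpy as np

# Test targets and predictions copied from the printed y_test / y_pred above.
y_true = np.array([37731., 122391., 57081., 63218., 116969.,
                   109431., 112635., 55794., 83088., 101302.])
y_hat = np.array([40835.10590871, 123079.39940819, 65134.55626083,
                  63265.36777221, 115602.64545369, 108125.8914992,
                  116537.23969801, 64199.96201652, 76349.68719258,
                  100649.1375447])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot

print("Test R^2: %f" % r2)
```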
