DA-2
Date: 17.08.2024
Name: Arnav Bahuguna
Reg: 21BCE3795
Q1. The dataset Breast_Cancer (available in the sklearn library) has 30 baseline
variables: 'mean radius', 'mean texture', 'mean perimeter', 'mean area', 'mean smoothness',
'mean compactness', 'mean concavity', 'mean concave points', 'mean symmetry', 'mean
fractal dimension', 'radius error', 'texture error', 'perimeter error', 'area error', 'smoothness
error', 'compactness error', 'concavity error', 'concave points error', 'symmetry error', 'fractal
dimension error', 'worst radius', 'worst texture', 'worst perimeter', 'worst area', 'worst
smoothness', 'worst compactness', 'worst concavity', 'worst concave points', 'worst
symmetry' and 'worst fractal dimension', measured for each of n = 569 patients, along with
the response of interest, a quantitative measure of disease progression one year after
baseline.
1. Apply a standard scaler to the independent features mentioned below and do a regression
analysis of the impact on 'mean fractal dimension' of the features 'mean texture',
'mean area' and 'mean compactness'.
2. Evaluate the performance of the regression model using R2, MSE, MAE and SSE.
Q2. The second task uses an HR dataset of 14999 employees (df1 below).
1. Do some exploratory data analysis to figure out which variables have a direct and clear
impact on employee retention (i.e., whether they leave the company or continue to work).
2. Now build a logistic regression model using the variables that were narrowed down in step 1.
3. Measure the accuracy (precision, recall, F1 and ROC) of the model.
Python Notebook:
Q1: Linear Regression Analysis
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
bs = load_breast_cancer(as_frame=True)
df = bs.data
df.head()
[5 rows x 30 columns]
# Features and target as specified in the question
X = df[['mean texture', 'mean area', 'mean compactness']]
y = df['mean fractal dimension']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = LinearRegression()
model.fit(X_train, y_train)
preds = model.predict(X_test)
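Step 2 of Q1 asks for R2, MSE, MAE and SSE but the notebook shows no evaluation cell. A minimal sketch that rebuilds the same pipeline end to end and computes all four metrics (the test_size and random_state values are assumptions, not from the original):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Rebuild the Q1 regression: three scaled features predicting 'mean fractal dimension'
df = load_breast_cancer(as_frame=True).data
X = df[['mean texture', 'mean area', 'mean compactness']]
y = df['mean fractal dimension']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LinearRegression().fit(X_train, y_train)
preds = model.predict(X_test)

# Step 2 metrics; SSE is the sum of squared errors, i.e. n * MSE
r2 = r2_score(y_test, preds)
mse = mean_squared_error(y_test, preds)
mae = mean_absolute_error(y_test, preds)
sse = np.sum((y_test.to_numpy() - preds) ** 2)
print(f"R2={r2:.4f}  MSE={mse:.6f}  MAE={mae:.6f}  SSE={sse:.4f}")
```

Since sklearn has no SSE helper, it is derived directly from the residuals; it should always equal MSE times the number of test samples.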
Q2: Logistic Regression Analysis
df1.head()
salary
0 low
1 medium
2 medium
3 low
4 low
df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14999 entries, 0 to 14998
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 satisfaction_level 14999 non-null float64
1 last_evaluation 14999 non-null float64
2 number_project 14999 non-null int64
3 average_montly_hours 14999 non-null int64
4 time_spend_company 14999 non-null int64
5 Work_accident 14999 non-null int64
6 left 14999 non-null int64
7 promotion_last_5years 14999 non-null int64
8 Department 14999 non-null object
9 salary 14999 non-null object
dtypes: float64(2), int64(6), object(2)
memory usage: 1.1+ MB
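The head/info/heatmap cells here are the EDA of step 1; a common complementary check is to compare feature means between leavers (left=1) and stayers (left=0). A minimal sketch on a hypothetical mini-sample (the column names match df1, the values are made up; the same groupby works on the full 14999-row frame):

```python
import pandas as pd

# Hypothetical mini-sample with df1's column names (values are illustrative only)
df1 = pd.DataFrame({
    'satisfaction_level': [0.38, 0.80, 0.11, 0.72, 0.37, 0.41],
    'average_montly_hours': [157, 262, 272, 159, 153, 247],
    'promotion_last_5years': [0, 0, 0, 0, 0, 0],
    'left': [1, 0, 1, 0, 1, 1],
})

# Mean of each numeric feature per retention group; large gaps between the
# left=0 and left=1 rows suggest variables with a direct impact on retention.
impact = df1.groupby('left').mean()
print(impact)
```

On the real dataset this kind of summary typically singles out satisfaction_level as the strongest separator, which the correlation heatmap below also shows.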
df1['Department'].unique()
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)
encoded_features = encoder.fit_transform(df1[['salary']])
encoded_df = pd.DataFrame(encoded_features,
                          columns=encoder.get_feature_names_out(['salary']))
df1 = df1.join(encoded_df)
df1.drop(['salary'], axis=1, inplace=True)
df1.head()
import seaborn as sns
plt.figure(figsize=(10,8), dpi=200)
sns.heatmap(df1.drop('Department', axis=1).corr(), cmap='coolwarm',
            annot=True)
<Axes: >
from sklearn.linear_model import LogisticRegressionCV
# Features/target: drop the label 'left' and the still-unencoded 'Department' column
X = df1.drop(['left', 'Department'], axis=1)
y = df1['left']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = LogisticRegressionCV(class_weight='balanced')
model.fit(X_train, y_train)
preds = model.predict(X_test)
precision: 0.6884939195509823
recall: 0.3768561187916027
f1 score: 0.48709463931171415
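Step 3 also asks for ROC, which the printout above omits. A hedged sketch of computing all four metrics on toy labels (on the real split, pass y_test and preds, and use model.predict_proba(X_test)[:, 1] as the score for ROC-AUC):

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Toy labels, hard predictions and probability scores standing in for the
# real y_test / preds / predict_proba output
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1]
y_score = [0.1, 0.7, 0.8, 0.4, 0.2, 0.9, 0.3, 0.6]

print("precision:", precision_score(y_true, y_pred))  # → 0.75
print("recall:", recall_score(y_true, y_pred))        # → 0.75
print("f1 score:", f1_score(y_true, y_pred))          # → 0.75
print("roc auc:", roc_auc_score(y_true, y_score))     # → 0.875
```

Note that ROC-AUC is computed from scores rather than hard 0/1 predictions, which is why it needs predict_proba instead of preds.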