ML in Python Part-2

This document discusses techniques for pre-processing data and evaluating machine learning models in Python. It covers:
• Standardizing and normalizing numerical data for modeling using scikit-learn.
• Resampling methods such as k-fold cross-validation to estimate model accuracy on unseen data.
• Metrics for evaluating regression and classification algorithms, including accuracy, log loss, and RMSE.
• Spot-checking algorithms such as kNN, linear regression, and random forests on sample datasets.
• Comparing the performance of algorithms such as logistic regression and LDA to select the best model.

Uploaded by

Usman Ali


Machine Learning in Python 2

Dr. Hafeez
Prepare for Modeling
Pre-processing data
• Raw data may not be in the best shape for modeling
• Pre-processing the data is therefore required
• The goal is to best expose the inherent structure of the data to the modeling algorithm
• What does Python offer for pre-processing data?
scikit-learn offers
• Two standard idioms for transforming data
– Fit once, then transform multiple times
– Combined fit-and-transform
• Techniques to prepare data for modeling
– Standardize numerical data (mean=0 and stdev=1)
• With StandardScaler, which centers and scales each column
– Normalize numerical data (0-1)
• With MinMaxScaler and its feature_range option
– Explore advanced feature engineering
• Binarizing with Binarizer
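The two idioms and the three techniques listed above can be sketched on a tiny made-up array. The data values, the threshold of 2.0, and the variable names are assumptions chosen for illustration only; they do not come from the slides.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Binarizer

# Toy data (made up for illustration): 4 rows, 2 numeric columns
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_new = np.array([[2.5, 25.0]])

# Idiom 1: fit once, then transform as many arrays as needed
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_new_std = scaler.transform(X_new)  # reuses the training mean and stdev

# Idiom 2: combined fit-and-transform in a single call
X_train_std2 = StandardScaler().fit_transform(X_train)

# Normalize each column to the range 0-1
X_train_norm = MinMaxScaler(feature_range=(0, 1)).fit_transform(X_train)

# Binarize: values above the threshold become 1, all others become 0
X_train_bin = Binarizer(threshold=2.0).fit_transform(X_train)

print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
print(X_new_std)
print(X_train_norm)
print(X_train_bin)
```

The first idiom is the one to prefer when new data arrives later, because the new rows are rescaled with the parameters learned from the training data rather than with their own.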
Example: Pima Indians diabetes dataset
• Calculate the parameters needed to standardize the data
• Create a standardized copy of the input data
• Standardize data (mean=0, stdev=1)
• from sklearn.preprocessing import StandardScaler
• import pandas
• import numpy
• url = "https://goo.gl/bDdBiA"
• names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
Code
• dataframe = pandas.read_csv(url, names=names)
• array = dataframe.values
• # separate array into input and output components
• X = array[:,0:8]
• Y = array[:,8]
• scaler = StandardScaler().fit(X)
• rescaledX = scaler.transform(X)
• # summarize transformed data
• numpy.set_printoptions(precision=3)
• print(rescaledX[0:5,:])
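As a check on what the scaler computes, the sketch below reproduces standardization by hand as z = (x - mean) / stdev per column and compares it with StandardScaler. It runs on randomly generated data rather than the Pima file so that it works offline; the array shape and distribution parameters are assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a numeric input matrix (assumed shape: 100 rows, 3 columns)
rng = np.random.default_rng(7)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

scaler = StandardScaler().fit(X)
rescaledX = scaler.transform(X)

# Manual standardization: subtract the column mean, divide by the column stdev
# (population standard deviation, which is numpy's default ddof=0)
manualX = (X - X.mean(axis=0)) / X.std(axis=0)

numpy_match = np.allclose(rescaledX, manualX)
print("Learned means: ", scaler.mean_)
print("Learned stdevs:", scaler.scale_)
print("Matches manual formula:", numpy_match)
```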

Resampling Methods: Algorithm evaluation

• The portion of the data used to train a machine learning algorithm is called the training dataset
Problem:
• Accuracy measured on the training dataset is not a reliable estimate of the model's accuracy on new/unseen data
• Yet the whole idea of creating a model is to enable predictions on new data
Solution:
• Use resampling methods
Resampling Methods: Algorithm Evaluation

• Use statistical methods called resampling methods
• Split your training data into further subsets
• Use some of the subsets for training and
remaining subsets to estimate the accuracy of
the model on unseen data
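The simplest form of this idea is a single train/test split. The sketch below uses a synthetic classification dataset as an offline stand-in for real data; the 67/33 split, the sample count, and max_iter=1000 are assumptions, not values from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 200 rows, 8 input features, binary class
X, Y = make_classification(n_samples=200, n_features=8, random_state=7)

# Hold out 33% of the rows; the model never sees them during training
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=7)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, Y_train)

# Accuracy on the held-out rows estimates accuracy on unseen data
accuracy = model.score(X_test, Y_test)
print("Accuracy: %.3f%%" % (accuracy * 100.0))
```

A single split is fast but its estimate depends on which rows happen to land in the test set, which is the motivation for k-fold cross validation.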
Resampling Methods: Algorithm Evaluation

• In a nutshell:
• Split the dataset into training and test sets
• Estimate the accuracy of an ML algorithm using k-fold cross validation
– Splits the training data into k subsets (folds); each fold is held out once for testing while the other k-1 folds train the model
• Estimate the accuracy of an ML algorithm using leave-one-out cross validation
Next, use scikit-learn to estimate the accuracy of logistic regression on the Pima Indians onset of diabetes dataset using 10-fold cross validation
Evaluate using cross validation
• from pandas import read_csv
• from sklearn.model_selection import KFold
• from sklearn.model_selection import cross_val_score
• from sklearn.linear_model import LogisticRegression
• url = "https://goo.gl/bDdBiA"
• names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
Evaluate using cross validation
• dataframe = read_csv(url, names=names)
• array = dataframe.values
• X = array[:,0:8]
• Y = array[:,8]
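• # random_state only takes effect with shuffle=True; recent scikit-learn versions raise an error otherwise
• kfold = KFold(n_splits=10, shuffle=True, random_state=7)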
• model = LogisticRegression()
• results = cross_val_score(model, X, Y, cv=kfold)
• print("Accuracy: %.3f%% (%.3f%%)" %
(results.mean()*100.0, results.std()*100.0))
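Leave-one-out cross validation, mentioned earlier, follows the same pattern with a different splitter. This is a minimal sketch on synthetic data so that it runs offline; the dataset size of 60 rows and max_iter=1000 are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Small synthetic dataset (assumption): leave-one-out fits one model per row,
# so it becomes expensive on large datasets
X, Y = make_classification(n_samples=60, n_features=8, random_state=7)

loocv = LeaveOneOut()
model = LogisticRegression(max_iter=1000)

# Each score is 1.0 or 0.0: the single held-out row is either right or wrong
results = cross_val_score(model, X, Y, cv=loocv)
print("Accuracy: %.3f%% (%.3f%%)" % (results.mean() * 100.0, results.std() * 100.0))
```

Because every fold contains exactly one test row, the standard deviation of the scores is large; the mean is the figure of interest.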
Algorithm evaluation metrics
• Metrics for evaluating ML algorithms in the scikit-learn library
– cross_val_score() takes a scoring argument that selects the metric
– If scoring is omitted, the estimator's default is used: accuracy for classifiers, R-squared for regressors
• Practice accuracy and kappa metrics on a classification problem
• Practice how to generate a confusion matrix and a classification report
• Practice how to use RMSE and R-squared metrics on a regression problem
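The classification metrics named above can be tried on a handful of hand-written labels. The label vectors below are made up for illustration; in practice y_pred would come from a fitted model's predict() on held-out data.

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             cohen_kappa_score, confusion_matrix)

# Made-up true and predicted class labels for 10 samples
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)   # fraction of correct predictions
kappa = cohen_kappa_score(y_true, y_pred)   # agreement corrected for chance
matrix = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
report = classification_report(y_true, y_pred)

print("Accuracy: %.3f" % accuracy)  # 0.700
print("Kappa: %.3f" % kappa)        # 0.400
print(matrix)                       # [[3 1]
                                    #  [2 4]]
print(report)
```

Here 7 of 10 predictions are correct, and the expected chance agreement is 0.5, so kappa is (0.7 - 0.5) / (1 - 0.5) = 0.4.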
Algorithm evaluation metrics
• Calculate the LogLoss metric on the Pima Indians onset of diabetes dataset
• from pandas import read_csv
• from sklearn.model_selection import KFold
• from sklearn.model_selection import cross_val_score
• from sklearn.linear_model import LogisticRegression
• url = "https://goo.gl/bDdBiA"
• names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
• dataframe = read_csv(url, names=names)
• array = dataframe.values
• X = array[:,0:8]
• Y = array[:,8]
Algorithm evaluation metrics
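• # random_state only takes effect with shuffle=True
• kfold = KFold(n_splits=10, shuffle=True, random_state=7)
• model = LogisticRegression()
• # scikit-learn negates the log loss so that larger scores are always better
• scoring = 'neg_log_loss'
• results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
• print("Logloss: %.3f (%.3f)" % (results.mean(), results.std()))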
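To make the metric concrete, the sketch below computes binary log loss by hand as the negative mean of y*log(p) + (1-y)*log(1-p) and compares it with sklearn.metrics.log_loss. The labels and probabilities are made up for illustration.

```python
import numpy as np
from sklearn.metrics import log_loss

# Made-up true labels and predicted probabilities of class 1
y_true = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.6, 0.4])

# Log loss by hand: confident wrong predictions are penalized most heavily
manual = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Same quantity from scikit-learn
library = log_loss(y_true, p)

print("Manual logloss:  %.4f" % manual)
print("Library logloss: %.4f" % library)
```

A perfect model has a log loss of 0, so with the 'neg_log_loss' scorer, values closer to 0 from below indicate better models.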
Spot-Check ML Algorithm
• It is difficult to know beforehand which ML algorithm will perform best on the data
• Use trial and error, also called spot-checking
• The scikit-learn library provides tools to compare the estimated accuracy of these algorithms
• Spot-check linear algorithm on a dataset
– Linear regression, logistic regression and linear discriminant analysis
• Spot-check non-linear algorithm on a dataset
– kNN, SVM and CART
• Spot-check sophisticated ensemble algo on a dataset
– Random forest and stochastic gradient boosting
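A spot-check across the three families listed above might look like the following sketch. It uses a synthetic classification dataset so that it runs offline, which means linear regression (a regression algorithm) is left out; the short labels, the dataset size, and max_iter=1000 are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption): 200 rows, 8 input features, binary class
X, Y = make_classification(n_samples=200, n_features=8, random_state=7)

models = [
    ('LR', LogisticRegression(max_iter=1000)),        # linear
    ('LDA', LinearDiscriminantAnalysis()),            # linear
    ('KNN', KNeighborsClassifier()),                  # non-linear
    ('SVM', SVC()),                                   # non-linear
    ('CART', DecisionTreeClassifier(random_state=7)), # non-linear
    ('RF', RandomForestClassifier(random_state=7)),   # ensemble
    ('GBM', GradientBoostingClassifier(random_state=7)),  # ensemble
]

scores = {}
for name, model in models:
    # Same folds for every algorithm so the comparison is like for like
    kfold = KFold(n_splits=10, shuffle=True, random_state=7)
    cv_results = cross_val_score(model, X, Y, cv=kfold, scoring='accuracy')
    scores[name] = cv_results.mean()
    print("%s: %.3f (%.3f)" % (name, cv_results.mean(), cv_results.std()))
```

The point of a spot-check is to shortlist promising algorithms with default settings, not to pick a final winner.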
Spot-checking example
• k-Nearest Neighbor algo on Boston House Price dataset
• # kNN Regression
• from pandas import read_csv
• from sklearn.model_selection import KFold
• from sklearn.model_selection import cross_val_score
• from sklearn.neighbors import KNeighborsRegressor
• url = "https://goo.gl/FmJUSM"
• names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
Spot-checking example
• k-Nearest Neighbor algo on Boston House Price dataset
• # sep=r"\s+" splits on whitespace (delim_whitespace=True is deprecated in recent pandas)
• dataframe = read_csv(url, sep=r"\s+", names=names)
• array = dataframe.values
• X = array[:,0:13]
• Y = array[:,13]
• # random_state only takes effect with shuffle=True
• kfold = KFold(n_splits=10, shuffle=True, random_state=7)
• model = KNeighborsRegressor()
• scoring = 'neg_mean_squared_error'
• results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
• print(results.mean())
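The negated mean squared error printed above can be turned into RMSE, and R-squared can be requested with a different scoring string. The sketch below uses synthetic regression data as an offline stand-in for the house price file; the sample count, the noise level, and the variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (assumption): 200 rows, 13 input features, numeric target
X, Y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=7)

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
model = KNeighborsRegressor()

# Scores are negated MSE values, so flip the sign before taking the root
neg_mse = cross_val_score(model, X, Y, cv=kfold, scoring='neg_mean_squared_error')
rmse = np.sqrt(-neg_mse)

# R-squared: 1.0 is a perfect fit, 0.0 is no better than predicting the mean
r2 = cross_val_score(model, X, Y, cv=kfold, scoring='r2')

print("RMSE: %.3f (%.3f)" % (rmse.mean(), rmse.std()))
print("R^2: %.3f (%.3f)" % (r2.mean(), r2.std()))
```

RMSE is in the same units as the target variable, which usually makes it easier to interpret than the raw MSE.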
Model comparison and selection
• Next, you need to compare estimated
performance of different algos and then,
select the best one
• Compare linear algos with each other for a
given dataset
• Compare non-linear algos with each other for
a given dataset
• Create plots of the results comparing algos
Model comparison and selection
• Example shows logistic regression and linear discriminant analysis
on Pima Indians diabetes dataset
• # Compare Algorithms
• from pandas import read_csv
• from sklearn.model_selection import KFold
• from sklearn.model_selection import cross_val_score
• from sklearn.linear_model import LogisticRegression
• from sklearn.discriminant_analysis import
LinearDiscriminantAnalysis
• # load dataset
• url = "https://goo.gl/bDdBiA"
• names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
Model comparison and selection
• dataframe = read_csv(url, names=names)
• array = dataframe.values
• X = array[:,0:8]
• Y = array[:,8]
• # prepare models
• models = []
• models.append(('LR', LogisticRegression()))
• models.append(('LDA', LinearDiscriminantAnalysis()))
• # evaluate each model in turn
• results = []
• names = []
• scoring = 'accuracy'
Model comparison and selection
• for name, model in models:
– kfold = KFold(n_splits=10, shuffle=True, random_state=7)
– cv_results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
– results.append(cv_results)
– names.append(name)
– msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
– print(msg)
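The per-fold results collected in a loop like this one can be plotted side by side, as suggested earlier for comparing algorithms. This is a minimal sketch that assumes matplotlib is installed and uses synthetic data so that it runs offline; the plot title and the non-interactive Agg backend are choices made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, works without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (assumption): 200 rows, 8 input features, binary class
X, Y = make_classification(n_samples=200, n_features=8, random_state=7)

models = [('LR', LogisticRegression(max_iter=1000)),
          ('LDA', LinearDiscriminantAnalysis())]

results = []
names = []
for name, model in models:
    kfold = KFold(n_splits=10, shuffle=True, random_state=7)
    cv_results = cross_val_score(model, X, Y, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)

# One box per algorithm, showing the spread of the 10 fold accuracies
fig, ax = plt.subplots()
ax.boxplot(results)
ax.set_xticks(range(1, len(names) + 1))
ax.set_xticklabels(names)
ax.set_ylabel('Accuracy')
ax.set_title('Algorithm Comparison')
# Call fig.savefig(...) or plt.show() to view the figure
```

A box plot shows the spread across folds as well as the median, which helps when two algorithms have similar mean accuracy.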
Algorithm Tuning
