MLP - Week 6 - MNIST - LogitReg - ipynb - Colaboratory
Introduction
Recap from MLT
Imports
In this notebook, we solve the same problem of recognizing handwritten digits, this time using a logistic regression model.
# Common imports
import numpy as np
from pprint import pprint
from tempfile import mkdtemp
from shutil import rmtree
# to make this notebook's output stable across runs
np.random.seed(42)
#sklearn specific imports
# Dataset fetching
# Feature scaling
from sklearn.preprocessing import MinMaxScaler
# Pipeline utility
from sklearn.pipeline import make_pipeline
# Classifiers: dummy, logistic regression (SGD and LogisticRegression)
# and least square classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import SGDClassifier, RidgeClassifier, LogisticRegression, LogisticRegressionCV
# Model selection
from sklearn.model_selection import cross_validate, RandomizedSearchCV, GridSearchCV, cross_val_score
from sklearn.model_selection import learning_curve
# Evaluation metrics
from sklearn.metrics import log_loss
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import precision_score, recall_score, classification_report
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import auc,roc_curve,roc_auc_score
# scipy
from scipy.stats import loguniform
# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
from matplotlib import pyplot as plt
import seaborn as sns
# global settings
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)
mpl.rc('figure',figsize=(8,6))
# Ignore all warnings (e.g., convergence warnings) raised by sklearn
def warn(*args, **kwargs):
pass
import warnings
warnings.warn = warn
Outline:
- Preprocessing
- Classification
- Train with cross validation
- [Optional] Hyper-parameter tuning
- Performance evaluation

We will work with three classifiers:
1. SGD classifier
2. Logistic Regression
3. Ridge Classifier
from sklearn.datasets import fetch_openml
X_pd, y_pd = fetch_openml('mnist_784', version=1, return_X_y=True)
# convert to numpy arrays
X = X_pd.to_numpy()
y = y_pd.to_numpy()
x_train, x_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
Pre-Processing
Unlike the perceptron, where scaling the feature range is optional (but recommended), the sigmoid requires scaling the features to the range 0 to 1.
Contemplate the consequences of not applying the scaling operation to the input datapoints (a small illustrative demo follows below).
Note: do not apply mean centering, as it destroys the zeros in the data; pixels that are zero in the dataset should remain zero.
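To see what goes wrong without scaling, here is a small experiment (illustrative, not part of the original notebook): with raw pixel values in [0, 255], the pre-activation z tends to have a large magnitude, so the sigmoid saturates and its gradient vanishes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(0, 0.01, size=784)                     # small random weights
x_raw = rng.integers(0, 256, size=784).astype(float)  # unscaled pixel intensities
x_scaled = x_raw / 255.0                              # scaled to [0, 1]

for name, x in [('raw', x_raw), ('scaled', x_scaled)]:
    z = w @ x
    s = sigmoid(z)
    # the sigmoid's gradient factor is s*(1-s): it is near 0 when |z| is large
    print(f'{name:>6}: z={z:9.2f}  sigmoid(z)={s:.4f}  s*(1-s)={s*(1-s):.6f}')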
Since we have already visualized the samples in the dataset and know sufficient details, we are going to use a pipeline to make the code compact.
Let us start with a simple classification problem, that is, binary classification.
Since the original label vector contains 10 classes, we need to modify the number of classes to 2. Therefore, the label '0' will be changed to '1' and all other labels (1-9) will be changed to '0'.
# initialize new label vectors with all zeros
y_train_0 = np.zeros((len(y_train)))
y_test_0 = np.zeros((len(y_test)))
# find indices of digit 0 image
indx_0 = np.where(y_train =='0') # remember original labels are of type str not int
# use those indices to modify y_train_0&y_test_0
y_train_0[indx_0] = 1
indx_0 = np.where(y_test == '0')
y_test_0[indx_0] = 1
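As an aside, the same relabeling can be done in one vectorized step; a minimal equivalent sketch (the _alt variable names are ours, not from the original):

# a boolean mask followed by a cast performs the same relabeling
y_train_0_alt = (y_train == '0').astype(float)
y_test_0_alt = (y_test == '0').astype(float)
assert np.array_equal(y_train_0, y_train_0_alt)
assert np.array_equal(y_test_0, y_test_0_alt)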
Sanity check
Let's display the elements of y_train and y_train_0 to verify whether the labels are
properly modified.
num_images = 9  # choose a square number
factor = int(np.sqrt(num_images))
fig, ax = plt.subplots(nrows=factor, ncols=factor, figsize=(8, 6))
idx_offset = 0  # take "num_images" starting from the index "idx_offset"
for i in range(factor):
    index = idx_offset + i * factor
    for j in range(factor):
        ax[i, j].imshow(X[index + j].reshape(28, 28), cmap='gray')
        ax[i, j].set_title('Label:{0}'.format(str(y_train_0[index + j])))
        ax[i, j].set_axis_off()
Baseline Models
We already know that the baseline model would produce an accuracy of 90.12%.
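The baseline cell is hidden in the original notebook; a minimal sketch of such a baseline, assuming a DummyClassifier with the most_frequent strategy (which always predicts the majority class, i.e. "not 0"):

baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(x_train, y_train_0)
# roughly 90% of the training images are not the digit 0
print('Baseline accuracy: %.4f' % baseline.score(x_train, y_train_0))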
SGD Classifier
Before using LogisticRegression for the binary classification problem, it will be helpful to recall important concepts and equations covered in the technique course.
Recap
Let us quickly recap various components in the general settings:
1. Training data: (features, label) or (X, y), where label y is a discrete quantity from a finite
set. Features in this case are pixel values of an image.
2. Model: compute a linear combination of the features,

$z = w_0 x_0 + w_1 x_1 + w_2 x_2 + \ldots + w_m x_m = \mathbf{w}^T \mathbf{x}$

and pass it through the sigmoid non-linear function (or logistic function)

$\sigma(z) = \dfrac{1}{1 + e^{-z}} = \dfrac{1}{1 + \exp(-z)}$
Gradient Descent
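For completeness, recall the standard form of the loss that gradient descent minimizes here. For $n$ training examples with labels $y^{(i)} \in \{0, 1\}$, the binary cross-entropy (log) loss is

$J(\mathbf{w}) = -\dfrac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \log \sigma(\mathbf{w}^T \mathbf{x}^{(i)}) + (1 - y^{(i)}) \log \left( 1 - \sigma(\mathbf{w}^T \mathbf{x}^{(i)}) \right) \right]$

with gradient

$\nabla_{\mathbf{w}} J = \dfrac{1}{n} \sum_{i=1}^{n} \left( \sigma(\mathbf{w}^T \mathbf{x}^{(i)}) - y^{(i)} \right) \mathbf{x}^{(i)}$

and the gradient descent update is $\mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla_{\mathbf{w}} J$, where $\eta$ is the learning rate (eta0 below).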
Let's look into the parameters of the SGDClassifier() estimator implemented in sklearn:
estimator = SGDClassifier(loss='log',
penalty='l2',
max_iter=1,
warm_start=True,
eta0=0.01,
alpha=0,
learning_rate='constant',
random_state=1729)
pipe_sgd= make_pipeline(MinMaxScaler(), estimator)
Let us call the fit method of SGDClassifier() in an iterative manner and plot the iteration vs. loss curve.
Loss = []
iterations = 100
for i in range(iterations):
    pipe_sgd.fit(x_train, y_train_0)
    y_pred = pipe_sgd.predict_proba(x_train)
    Loss.append(log_loss(y_train_0, y_pred))
plt.figure()
plt.plot(np.arange(iterations), Loss)
plt.grid(True)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.show()
Now that the model is trained, let us calculate the training and test accuracy of the model.
print('Training accuracy: %.2f'%pipe_sgd.score(x_train,y_train_0))
print('Testing accuracy: %.2f'%pipe_sgd.score(x_test,y_test_0))
We know that accuracy alone is not a good metric for binary classification.
Let's compute precision, recall, and F1-score for the model.
y_hat_train_0 = pipe_sgd.predict(x_train)
cm_display = ConfusionMatrixDisplay.from_predictions(y_train_0, y_hat_train_0, values_format='.5g')
plt.show()
Observe that the off-diagonal elements are not zero, which indicates that a few examples are
misclassified.
print(classification_report(y_train_0, y_hat_train_0, digits=3))
y_hat_test_0 = pipe_sgd.predict(x_test)
cm_display = ConfusionMatrixDisplay.from_predictions(y_test_0, y_hat_test_0, values_format='.5g')
plt.show()
print(classification_report(y_test_0, y_hat_test_0, digits=3))
Cross validation
estimator = SGDClassifier(loss='log',
penalty='l2',
max_iter=100,
warm_start=False,
eta0=0.01,
alpha=0,
learning_rate='constant',
random_state=1729)
# create a pipeline
pipe_sgd_cv = make_pipeline(MinMaxScaler(),estimator)
cv_bin_clf = cross_validate(pipe_sgd_cv,
x_train,
y_train_0,
cv=5,
scoring=['precision','recall','f1'],
return_train_score=True,
)
pprint(cv_bin_clf)
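To read the cross-validation output at a glance, we can summarize each metric as mean ± standard deviation across the 5 folds (a small convenience snippet, not in the original):

for key in ['test_precision', 'test_recall', 'test_f1']:
    vals = cv_bin_clf[key]
    print('%s: %.3f +/- %.3f' % (key, vals.mean(), vals.std()))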
It is good to check the weight values of all the features, which will help us decide whether regularization could be of any help.
# access the estimator step of the pipeline
weights = pipe_sgd[1].coef_
bias = pipe_sgd[1].intercept_
print('Dimension of weights w: {0}'.format(weights.shape))
print('Bias: {0}'.format(bias))
plt.figure()
plt.plot(np.arange(0, 784), weights[0, :])
plt.xlabel('Feature index')
plt.ylabel('Weight value')
plt.ylim((np.min(weights) - 5, np.max(weights) + 5))
plt.grid()
num_zero_w = np.sum(weights == 0)  # count exactly-zero weights
print('Number of weights with value zero: %d' % num_zero_w)
Looking at the above plot and the performance of the model on the training and test sets, it is clear that the model does not require any regularization.
Hyper-parameter tuning
The learning rate and the regularization rate are two important hyper-parameters of SGDClassifier.
Let's use RandomizedSearchCV() and draw values from a log-uniform distribution to find a better combination of eta0 and alpha.
eta_grid = loguniform(1e-3,1e-1)
alpha_grid = loguniform(1e-7,1e-1)
Note that eta_grid & alpha_grid are objects that provide a method called rvs() which can be called to draw random values of a given size.
print(eta_grid.rvs(10, random_state=42))
estimator= SGDClassifier(loss='log',
penalty='l2',
max_iter=100,
warm_start=False,
learning_rate='constant',
eta0=0.01,
alpha=0,
random_state=1729)
pipe_sgd_hpt = make_pipeline(MinMaxScaler(),estimator)
print(pipe_sgd_hpt)
scores = RandomizedSearchCV(pipe_sgd_hpt,
param_distributions={
'sgdclassifier__eta0':eta_grid,
'sgdclassifier__alpha':alpha_grid
},
cv=5,
scoring='precision',
n_iter=10,
refit=True,
random_state=1729)
# It takes quite a long time to finish
scores.fit(x_train,y_train_0)
pprint(scores.cv_results_)
print('Best combination: (alpha:{0:.8f}, eta:{1:.5f})'.format(scores.best_params_['sgdclassifier__alpha'], scores.best_params_['sgdclassifier__eta0']))
best_sgd_clf = scores.best_estimator_
y_hat_train_best_0 = best_sgd_clf.predict(x_train)
cm_display = ConfusionMatrixDisplay.from_predictions(y_train_0, y_hat_train_best_0, values_format='.5g')
plt.show()
y_hat_test_best_0 = best_sgd_clf.predict(x_test)
cm_display = ConfusionMatrixDisplay.from_predictions(y_test_0, y_hat_test_best_0, values_format='.5g')
plt.show()
print(classification_report(y_train_0,y_hat_train_best_0))
print(classification_report(y_test_0,y_hat_test_best_0))
Question: Why are the evaluation metrics not better than those of the SGDClassifier with manual parameter setting?
We need to allow the hyper-parameter tuning process to run long enough. Try the procedure with more iterations: that will let it explore more of the parameter space and find better hyperparameters (a sketch follows below).
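For example, a wider search could look like the following sketch (the n_iter and n_jobs values are illustrative choices, not from the original notebook):

wider_search = RandomizedSearchCV(pipe_sgd_hpt,
                                  param_distributions={
                                      'sgdclassifier__eta0': eta_grid,
                                      'sgdclassifier__alpha': alpha_grid},
                                  cv=5,
                                  scoring='precision',
                                  n_iter=50,   # 5x more candidates than before
                                  n_jobs=-1,   # parallelize across CPU cores
                                  random_state=1729)
# wider_search.fit(x_train, y_train_0)  # warning: slow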
Learning Curve
Classification Report
Precision/Recall Tradeoff
Video #2
Logistic Regression
In the previous setup, we used SGDClassifier to train a 0-detector model in an iterative manner.
We can also train such a classifier by solving a set of equations obtained by setting
the derivative of loss w.r.t. weights to 0.
These are not linear equations and therefore we need a different set of solvers.
Sklearn uses solvers like liblinear , newton-cg , sag , saga and lbfgs to find the optimal
weights.
Parameters:
Regularization: penalty='l2'
Regularization rate: C=1
Solver: solver='lbfgs'
STEP 2: Train the pipeline with feature matrix x_train and label vector y_train_0 .
pipe_logit = make_pipeline(MinMaxScaler(),LogisticRegression(random_state=1729,
solver='lbfgs',
C=np.infty))
pipe_logit.fit(x_train, y_train_0)
By executing this cell, we trained our LogisticRegression classifier, which can now be used for making predictions on new inputs.
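For instance, we can sanity-check the trained pipeline on a few test images (an illustrative check, not in the original):

print(pipe_logit.predict(x_test[:5]))        # hard class labels (0.0 or 1.0)
print(pipe_logit.predict_proba(x_test[:5]))  # class probabilities, shape (5, 2)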
Hyperparameter search
In this section, we will search for the best value of the parameter C under a given scoring function.
with GridSearchCV
In the previous cell, we trained LogisticRegression classifier with C=infinity . You may
wonder if that's the best value for C and if it is not the best value, how do we search for it?
Now we will demonstrate how to search for the best parameter value for regularization rate C ,
as an illustration, using GridSearchCV .
Note that you can also use RandomizedSearchCV for this purpose.
In order to use GridSearchCV , we first define a set of values that we want to try out for C . The
best value of C will be found from this set.
We define the pipeline object exactly like before with one exception: we have set the parameter C to 1 in the LogisticRegression estimator. You can set it to any value, as the best value will be searched for by grid search.
The additional step here is to instantiate a GridSearchCV object with a pipeline estimator, a parameter grid specification, and f1 as a scoring function.
Note that you can also use other scoring functions like precision or recall; the value of C is found such that the given scoring function is optimized.
from sklearn.pipeline import Pipeline
grid_Cs = [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]  # C must be strictly positive
scaler = MinMaxScaler()
logreg = LogisticRegression(C=1.0, random_state=1729)
pipe = Pipeline(steps=[("scaler", scaler),
("logistic", logreg)])
pipe_logit_cv = GridSearchCV(
pipe,
param_grid={"logistic__C": grid_Cs},
scoring='f1')
pipe_logit_cv.fit(x_train, y_train_0)
The GridSearchCV finds the best value of C and refits the estimator by default on the entire
training set. This gives us the logistic regression classifier with best value of C .
We can check the value of the best parameter by accessing the best_params_ member variable
of the GridSearchCV object.
pipe_logit_cv.best_params_
and the best score is stored in the best_score_ member variable and can be obtained as follows:
pipe_logit_cv.best_score_
pipe_logit_cv.best_estimator_
With LogisticRegressionCV
Instead of using GridSearchCV for finding the best value for parameter C , we can use
LogisticRegressionCV for performing the same job.
estimator = LogisticRegressionCV(cv=5, scoring='f1', random_state=1729)
logit_cv = make_pipeline(MinMaxScaler(), estimator)
logit_cv.fit(x_train, y_train_0)
By default, LogisticRegressionCV refits the model on the entire training set with the best
parameter values obtained via cross validation.
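We can also inspect which regularization strength it settled on; LogisticRegressionCV exposes the chosen value(s) through its C_ attribute (one entry per class):

print(logit_cv[1].C_)  # best C found by 5-fold cross validation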
Performance evaluation
Let's evaluate the performance of these three different logistic regression classifiers for detecting the digit 0 in an image.
Let's get predictions for the test set with these three classifiers:
lr_y_hat_0 = pipe_logit.predict(x_test)
lr_gs_y_hat_0 = pipe_logit_cv.best_estimator_.predict(x_test)
lr_cv_y_hat_0 = logit_cv.predict(x_test)
We will compare precision, recall and F1 score for the three classifiers.
precision_lr = precision_score(y_test_0, lr_y_hat_0)
recall_lr = recall_score(y_test_0, lr_y_hat_0)
precision_lr_gs = precision_score(y_test_0, lr_gs_y_hat_0)
recall_lr_gs = recall_score(y_test_0, lr_gs_y_hat_0)
precision_lr_cv = precision_score(y_test_0, lr_cv_y_hat_0)
recall_lr_cv = recall_score(y_test_0, lr_cv_y_hat_0)
print (f"LogReg: precision={precision_lr}, recall={recall_lr}")
print (f"GridSearch: precision={precision_lr_gs}, recall={recall_lr_gs}")
print (f"LogRegCV: precision={precision_lr_cv}, recall={recall_lr_cv}")
Note that all three classifiers have roughly the same performance as measured with precision
and recall.
The LogisticRegression classifier obtained through GridSearchCV has the highest
precision - marginally higher than the other two classifiers.
The LogisticRegression classifier obtained through LogisticRegressionCV has the
highest recall - marginally higher than the other two classifiers.
Using PR-curve
y_scores_lr = pipe_logit.decision_function(x_test)
precisions_lr, recalls_lr, thresholds_lr = precision_recall_curve(
y_test_0, y_scores_lr)
y_scores_lr_gs = pipe_logit_cv.decision_function(x_test)
precisions_lr_gs, recalls_lr_gs, thresholds_lr_gs = precision_recall_curve(
y_test_0, y_scores_lr_gs)
y_scores_lr_cv = logit_cv.decision_function(x_test)
precisions_lr_cv, recalls_lr_cv, thresholds_lr_cv = precision_recall_curve(
y_test_0, y_scores_lr_cv)
We have all the quantities for plotting the PR curve. Let's plot PR curves for all three classifiers:
plt.figure(figsize=(10,4))
plt.plot(recalls_lr[:-1], precisions_lr[:-1], 'b--', label='LogReg')
plt.plot(recalls_lr_gs[:-1], precisions_lr_gs[:-1], 'r--', label='GridSearchCV')
plt.plot(recalls_lr_cv[:-1], precisions_lr_cv[:-1], 'g--', label='LogRegCV')
plt.ylabel('Precision')
plt.xlabel('Recall')
plt.grid(True)
plt.legend(loc='lower left')
plt.show()
Note that the PR curves for all three classifiers overlap significantly.
from sklearn.metrics import auc
auc_lr = auc(recalls_lr[:-1], precisions_lr[:-1])
auc_lr_gs = auc(recalls_lr_gs[:-1], precisions_lr_gs[:-1])
auc_lr_cv = auc(recalls_lr_cv[:-1], precisions_lr_cv[:-1])
print ("AUC-PR for logistic regression:", auc_lr)
print ("AUC-PR for grid search:", auc_lr_gs)
print ("AUC-PR for logistic regression CV:", auc_lr_cv)
Observe that the AUC for all three classifiers is roughly the same, with the classifiers obtained through grid search and cross validation having a slightly higher area under the PR curve.
Confusion matrix
We show a confusion matrix for test set with logistic regression classifier:
cm_display = ConfusionMatrixDisplay.from_predictions(y_test_0,lr_y_hat_0,
values_format='.5g')
# it returns a matplotlib display object
plt.show()
Confusion matrix for test set with logistic regression classifier obtained through grid search:
cm_display = ConfusionMatrixDisplay.from_predictions(y_test_0,lr_gs_y_hat_0,
values_format='.5g')
# it returns a matplotlib display object
plt.show()
Confusion matrix for test set with logistic regression classifier obtained through cross
validation:
cm_display = ConfusionMatrixDisplay.from_predictions(y_test_0,lr_cv_y_hat_0,
values_format='.5g')
# it returns a matplotlib display object
plt.show()
Exercise: Plot ROC curve for all three classifiers and calculate area under ROC curve.
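A starting-point sketch for the exercise, shown for one classifier (the other two follow the same pattern with their decision scores):

fpr_lr, tpr_lr, _ = roc_curve(y_test_0, y_scores_lr)
plt.figure(figsize=(8, 6))
plt.plot(fpr_lr, tpr_lr, 'b--', label='LogReg')
plt.plot([0, 1], [0, 1], 'k:', label='chance')  # diagonal = random classifier
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.grid(True)
plt.legend(loc='lower right')
plt.show()
print('AUC-ROC for logistic regression:', roc_auc_score(y_test_0, y_scores_lr))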
Video #3
Ridge Classifier
Ridge classifier casts the problem as least-squares classification and finds the optimal weights using a matrix decomposition technique such as SVD.
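Concretely, with labels coded as ±1 (as below), ridge minimizes a penalized least-squares objective whose closed-form solution is

$\mathbf{w}^{*} = (X^T X + \alpha I)^{-1} X^T \mathbf{y}$

sklearn evaluates this via a decomposition (e.g. SVD or Cholesky, depending on the solver) rather than forming the inverse explicitly, and classifies by the sign of $\mathbf{w}^{*T}\mathbf{x}$.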
y_train_0 = -1*np.ones((len(y_train)))
y_test_0 = -1*np.ones((len(y_test)))
# find indices of digit 0 image
indx_0 = np.where(y_train =='0')
# use those indices to modify y_train_0&y_test_0
y_train_0[indx_0] = 1
indx_0 = np.where(y_test == '0')
y_test_0[indx_0] = 1
RidgeClassifier(alpha=1.0, *, fit_intercept=True,
normalize='deprecated', copy_X=True, max_iter=None, tol=0.001,
class_weight=None, solver='auto', positive=False,
random_state=None)
estimator = RidgeClassifier(normalize=False,alpha=0)
pipe_ridge = make_pipeline(MinMaxScaler(),estimator)
pipe_ridge.fit(x_train,y_train_0)
Performance
y_hat_test_0 = pipe_ridge.predict(x_test)
print(classification_report(y_test_0,y_hat_test_0))
Cross Validation
cv_bin_ridge_clf = cross_validate(
pipe_ridge, x_train, y_train_0, cv=5,
scoring=['precision', 'recall', 'f1'],
return_train_score=True,
return_estimator=True)
pprint(cv_bin_ridge_clf)
best_estimator_id = np.argmax(cv_bin_ridge_clf['train_f1']); best_estimator_id
best_estimator = cv_bin_ridge_clf['estimator'][best_estimator_id]
Let's evaluate the performance of the best classifier on the test set:
y_hat_test_0 = best_estimator.predict(x_test)
print(classification_report(y_test_0,y_hat_test_0))
Further exploration
Let's see what these classifiers learnt about the digit 0.
models = (pipe_sgd, pipe_sgd_l2, pipe_logit, pipe_ridge)
titles = ('sgd', 'regularized sgd', 'logit', 'ridge')
plt.figure(figsize=(4, 4))
for i in range(0, 4):
    w = models[i][1].coef_
    w_matrix = w.reshape(28, 28)
    w_matrix[w_matrix < 0] = 0  # just set the values less than zero to zero
    plt.subplot(2, 2, i + 1)
    plt.imshow(w_matrix, cmap='gray')
    plt.title(titles[i])
    plt.axis('off')
    plt.grid(False)
plt.show()
Video #4
estimator = SGDClassifier(loss='log',
penalty='l2',
max_iter=1,
warm_start=True,
eta0=0.01,
alpha=0,
learning_rate='constant',
random_state=1729)
pipe_sgd_ovr= make_pipeline(MinMaxScaler(),estimator)
Loss = []
iterations = 100
for i in range(iterations):
    pipe_sgd_ovr.fit(x_train, y_train)
    y_pred = pipe_sgd_ovr.predict_proba(x_train)
    Loss.append(log_loss(y_train, y_pred))
plt.figure()
plt.plot(np.arange(iterations), Loss)
plt.grid(True)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.show()
What actually happened behind the scenes is that the library automatically created 10 binary classifiers and trained them! At inference time, the input is passed through all 10 classifiers, and the highest score among the outputs determines the predicted class. To see it in action, let us execute the following lines of code:
pipe_sgd_ovr[1]
pipe_sgd_ovr[1].coef_.shape
So it is a matrix of size 10 × 784. A row represents the weights of a single binary classifier.
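To confirm the one-vs-rest scoring described above, we can look at the 10 per-class scores for a single test image (an illustrative check, not in the original):

scores_ovr = pipe_sgd_ovr.decision_function(x_test[:1])
print(scores_ovr.shape)  # (1, 10): one score per binary classifier
# the class with the highest score is the prediction
print(pipe_sgd_ovr[1].classes_[np.argmax(scores_ovr)])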
y_hat = pipe_sgd_ovr.predict(x_test); y_hat[:5]
cm_display = ConfusionMatrixDisplay.from_predictions(y_test, y_hat,
values_format='.5g')
plt.show()
print(classification_report(y_test, y_hat))
pipe_logit_ovr = make_pipeline(MinMaxScaler(),
LogisticRegression(random_state=1729,
solver='lbfgs',
C=np.infty))
pipe_logit_ovr.fit(x_train,y_train)
y_hat = pipe_logit_ovr.predict(x_test)
cm_display = ConfusionMatrixDisplay.from_predictions(y_test, y_hat,
values_format='.5g')
plt.show()
print(classification_report(y_test, y_hat))
W = pipe_logit_ovr[1].coef_
# normalize
W = MinMaxScaler().fit_transform(W)
fig, ax = plt.subplots(3, 3)
index = 1
for i in range(3):
    for j in range(3):
        ax[i][j].imshow(W[index, :].reshape(28, 28), cmap='gray')
        ax[i][j].set_title('W{0}'.format(index))
        ax[i][j].set_axis_off()
        index += 1