
MAINT replace obj and grad in sigmoid calibration with private loss module #27185


Merged · 8 commits into scikit-learn:main · Sep 6, 2023

Conversation

OmarManzoor (Contributor)

Reference Issues/PRs

Follow up of #26913

What does this implement/fix? Explain your changes.

  • Replaces the objective and gradient in sigmoid calibration with the methods of the private loss module (see the sketch below this list).
  • Uses HalfBinomialLoss as the base loss.
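
For illustration, a minimal, self-contained sketch of the approach (not the exact PR code; the toy values F and T and the body of loss_grad are made up here, with names following the diff discussed below):

    import numpy as np
    from sklearn._loss import HalfBinomialLoss  # private module, as in the PR

    bin_loss = HalfBinomialLoss()

    F = np.array([-2.0, -0.5, 0.3, 1.7])  # decision values of the classifier
    T = np.array([0.0, 0.0, 1.0, 1.0])    # target labels (assumed name)
    sample_weight = None

    def loss_grad(AB):
        # Platt scaling: P(y=1) = expit(-(A * f + B)); the minus sign is the
        # convention in Platt's paper.
        raw_prediction = -(AB[0] * F + AB[1])
        l, g = bin_loss.loss_gradient(
            y_true=T, raw_prediction=raw_prediction, sample_weight=sample_weight
        )
        # Chain rule for the two parameters: d(raw)/dA = -F, d(raw)/dB = -1.
        return l.sum(), np.array([-g @ F, -g.sum()])

    # The (loss, gradient) pair can be fed to scipy.optimize.minimize with jac=True.
    loss, grad = loss_grad(np.array([1.0, 0.0]))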

Any other comments?

github-actions bot commented Aug 28, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit ab7e6a0.

OmarManzoor (Contributor, Author) commented Aug 28, 2023

Hi @lorentzenchr

I tried replacing the objective and gradient with the loss_gradient method of HalfBinomialLoss, but some of the tests fail, including the very first one, test_calibration. The problem seems to lie in how I compute the loss and gradient with the loss module: if I instead use the loss_grad function that I have commented out, which uses the original computation, everything works and the tests pass. Could you kindly have a look and provide some guidance? Maybe I am not using the loss module correctly. For reference, here is the log for the test_calibration function with method=sigmoid and ensemble=False:

FAILED              [ 75%]
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            2     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  3.26474D+01    |proj g|=  1.74728D+00

At iterate    1    f=  3.08460D+01    |proj g|=  2.65248D+00

At iterate    2    f=  2.57930D+01    |proj g|=  1.17055D+00

At iterate    3    f=  2.54497D+01    |proj g|=  1.27662D+00

At iterate    4    f=  2.50560D+01    |proj g|=  3.80842D-01

At iterate    5    f=  2.49982D+01    |proj g|=  5.93913D-02

At iterate    6    f=  2.49966D+01    |proj g|=  4.17555D-03

At iterate    7    f=  2.49966D+01    |proj g|=  6.16637D-05

At iterate    8    f=  2.49966D+01    |proj g|=  1.31444D-06

At iterate    9    f=  2.49966D+01    |proj g|=  1.47570D-07

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    2      9     11      1     0     0   1.476D-07   2.500D+01
  F =   24.996567512257208     

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL            
 This problem is unconstrained.

sklearn/tests/test_calibration.py:60 (test_calibration[False-sigmoid])
0.1691915026787319 != 0.4701915941927794

Expected :0.4701915941927794
Actual   :0.1691915026787319

data = (array([[1.55552559, 2.69252392, 3.45217209, 3.20990507, 3.54706644,
        4.39080293],
       [1.46335164, 2.521954..., 1, 1, 0, 0, 1, 1, 0, 1, 1, 0,
       0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,
       1, 0]))
method = 'sigmoid', ensemble = False

    @pytest.mark.parametrize("method", ["sigmoid", "isotonic"])
    @pytest.mark.parametrize("ensemble", [True, False])
    def test_calibration(data, method, ensemble):
        # Test calibration objects with isotonic and sigmoid
        n_samples = N_SAMPLES // 2
        X, y = data
        sample_weight = np.random.RandomState(seed=42).uniform(size=y.size)
    
        X -= X.min()  # MultinomialNB only allows positive X
    
        # split train and test
        X_train, y_train, sw_train = X[:n_samples], y[:n_samples], sample_weight[:n_samples]
        X_test, y_test = X[n_samples:], y[n_samples:]
    
        # Naive-Bayes
        clf = MultinomialNB(force_alpha=True).fit(X_train, y_train, sample_weight=sw_train)
        prob_pos_clf = clf.predict_proba(X_test)[:, 1]
    
        cal_clf = CalibratedClassifierCV(clf, cv=y.size + 1, ensemble=ensemble)
        with pytest.raises(ValueError):
            cal_clf.fit(X, y)
    
        # Naive Bayes with calibration
        for this_X_train, this_X_test in [
            (X_train, X_test),
            (sparse.csr_matrix(X_train), sparse.csr_matrix(X_test)),
        ]:
            cal_clf = CalibratedClassifierCV(clf, method=method, cv=5, ensemble=ensemble)
            # Note that this fit overwrites the fit on the entire training
            # set
            cal_clf.fit(this_X_train, y_train, sample_weight=sw_train)
            prob_pos_cal_clf = cal_clf.predict_proba(this_X_test)[:, 1]
    
            # Check that brier score has improved after calibration
>           assert brier_score_loss(y_test, prob_pos_clf) > brier_score_loss(
                y_test, prob_pos_cal_clf
            )
E           assert 0.1691915026787319 > 0.4701915941927794
E            +  where 0.1691915026787319 = brier_score_loss(array([1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1,\n       0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1,\n       1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,\n       1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0,\n       1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]), array([0.59754135, 0.39173286, 0.57057784, 0.63618551, 0.5053372 ,\n       0.43009297, 0.3746049 , 0.42128169, 0.69084662, 0.47480948,\n       0.4898935 , 0.64080595, 0.37208411, 0.73741164, 0.1665211 ,\n       0.70891458, 0.38883098, 0.55419618, 0.84667265, 0.42419911,\n       0.49903646, 0.57665258, 0.47541526, 0.5009973 , 0.40126043,\n       0.54118915, 0.42945365, 0.47340528, 0.25394641, 0.69248401,\n       0.37824462, 0.62401919, 0.26457841, 0.79458909, 0.27360969,\n       0.48189339, 0.17265969, 0.24574674, 0.41905575, 0.52456954,\n       0.59142695, 0.31075045, 0.24010821, 0.64236825, 0.52948367,\n       0.67649459, 0.78925604, 0.60921218, 0.68232101, 0.8009755 ,\n       0.5812186 , 0.38947681, 0.2810683 , 0.71968707, 0.41020851,\n       0.5230645 , 0.65561618, 0.69341376, 0.55990113, 0.25885757,\n       0.64038658, 0.53987922, 0.35631448, 0.3109083 , 0.71152262,\n       0.26966987, 0.80812273, 0.46855832, 0.24488214, 0.45165938,\n       0.66380173, 0.68573539, 0.3636395 , 0.57207788, 0.70043097,\n       0.25805855, 0.35951421, 0.64611014, 0.49312859, 0.6162724 ,\n       0.5742364 , 0.67348769, 0.1540878 , 0.30242571, 0.55782435,\n       0.41457239, 0.60817683, 0.36436228, 0.61544405, 0.2510021 ,\n       0.76149752, 0.65839788, 0.48159576, 0.56249346, 0.74645415,\n       0.64149181, 0.2531176 , 0.42340659, 0.73080826, 0.57605868]))
E            +  and   0.4701915941927794 = brier_score_loss(array([1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1,\n       0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1,\n       1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,\n       1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0,\n       1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]), array([0.3653254 , 0.71508461, 0.41110395, 0.30389509, 0.52682038,\n       0.65605144, 0.73938161, 0.67013389, 0.22795215, 0.58073866,\n       0.55425609, 0.29694746, 0.74284203, 0.17464224, 0.92632551,\n       0.2060037 , 0.7192958 , 0.43974551, 0.0882795 , 0.66550337,\n       0.53804273, 0.40062357, 0.57968298, 0.53455383, 0.70099493,\n       0.46278611, 0.65708288, 0.58318289, 0.87057502, 0.22589694,\n       0.73433227, 0.32261735, 0.86175958, 0.12323191, 0.85388024,\n       0.5683499 , 0.92327137, 0.87704286, 0.67364487, 0.4924455 ,\n       0.37552735, 0.81751885, 0.88132764, 0.29461912, 0.48366076,\n       0.24652658, 0.12741426, 0.34619042, 0.23886518, 0.11837926,\n       0.39280497, 0.71836187, 0.84709538, 0.19368202, 0.68740555,\n       0.49513718, 0.2753109 , 0.2247358 , 0.42971523, 0.86656397,\n       0.29757425, 0.465117  , 0.76379918, 0.81735031, 0.20296827,\n       0.85736236, 0.11314554, 0.59158768, 0.87770841, 0.62044615,\n       0.26378103, 0.23445207, 0.75421363, 0.40850811, 0.21610924,\n       0.86722364, 0.75964402, 0.28908601, 0.54853062, 0.33484727,\n       0.40478175, 0.25054458, 0.93217108, 0.82623702, 0.43336033,\n       0.68065745, 0.34786902, 0.75325372, 0.33616859, 0.87293016,\n       0.1511758 , 0.27135783, 0.56887225, 0.42517594, 0.1655119 ,\n       0.295924  , 0.87124171, 0.66676447, 0.18155745, 0.40164436]))

@OmarManzoor OmarManzoor changed the title MAINT obj and grad in sigmoid calibration with private loss module MAINT replace obj and grad in sigmoid calibration with private loss module Aug 28, 2023
@OmarManzoor OmarManzoor marked this pull request as ready for review August 29, 2023 06:20
OmarManzoor (Contributor, Author)

The CI failure is related to missing docstrings for the parameters in HalfBinomialLoss.

lorentzenchr (Member) left a comment

I don’t know why the functional tests pass, nor why the docstring test fails.

The review thread below refers to this diff in sklearn/calibration.py:

    return np.array([dA, dB])

    def loss_grad(AB):
        P = bin_loss.link.inverse(-(AB[0] * F + AB[1]))
        raw_prediction = bin_loss.link.link(P)
lorentzenchr (Member)
I think you had it right before: just pass raw_prediction = -(AB[0] * F + AB[1]).
The minus is due to the convention in Platt's paper.
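
In code, the suggestion amounts to dropping the redundant inverse/link round trip (a sketch against the diff above, not the final PR code):

    # before: expit followed by logit, a no-op round trip
    P = bin_loss.link.inverse(-(AB[0] * F + AB[1]))
    raw_prediction = bin_loss.link.link(P)

    # after: pass the raw prediction directly
    raw_prediction = -(AB[0] * F + AB[1])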

OmarManzoor (Contributor, Author)

Thanks for the review @lorentzenchr.

lorentzenchr (Member)

I'm a bit worried about why the tests still passed with the wrong formulas.

OmarManzoor (Contributor, Author)

Since logit is the inverse of expit, applying expit and then logit gives back the original value. The round trip was therefore a no-op: the wrong formulas still produced the correct raw prediction, so the tests passed. For example:

    from scipy.special import expit, logit

    A = -1.5
    P = expit(A)
    Q = logit(P)
    print(Q)  # -1.5

    A = 0.51
    P = expit(A)
    Q = logit(P)
    print(Q)  # 0.51

OmarManzoor (Contributor, Author)

I debugged the docstring error a bit. It seems that the way we pick up imported modules causes HalfBinomialLoss to be collected now that it is imported in calibration.py, since the module sklearn.calibration is public. Previously, HalfBinomialLoss was apparently not picked up from anywhere else.
In my opinion there are a few options to fix this error:

  • We can add the required param docs to the respective methods in HalfBinomialLoss
  • We can add HalfBinomialLoss in the list of _DOCSTRING_IGNORES
  • We can also import it as from ._loss import HalfBinomialLoss as _HalfBinomialLoss

What do you suggest @lorentzenchr?

lorentzenchr (Member)

For the time being, I'm fine with adding it to _DOCSTRING_IGNORES.

lorentzenchr (Member) left a comment

Almost there…

lorentzenchr (Member) left a comment

LGTM

OmarManzoor (Contributor, Author)

@ogrisel Could you also kindly review this PR, since you have the relevant context?

ogrisel (Member) commented Aug 31, 2023

I think we should add a changelog entry. The user-facing behavior should mostly be the same, but it can be helpful to document that the internals have been refactored, in case it causes unexpected changes for some users.

Could you also run a quick benchmark on some data of your choice to check that there is no perf regression, in case the changes in the optimizer settings have a more significant impact than anticipated?

OmarManzoor (Contributor, Author)

> I think we should add a changelog entry. The user-facing behavior should mostly be the same, but it can be helpful to document that the internals have been refactored, in case it causes unexpected changes for some users.

What should be the tag for this entry?

@OmarManzoor
Copy link
Contributor Author

On main:

Iteration 0 fit time: 0.04736s
Iteration 0 predict time: 0.00310s
-----------------------
Iteration 1 fit time: 0.04645s
Iteration 1 predict time: 0.00295s
-----------------------
Iteration 2 fit time: 0.04595s
Iteration 2 predict time: 0.00295s
-----------------------
Iteration 3 fit time: 0.04613s
Iteration 3 predict time: 0.00294s
-----------------------
Iteration 4 fit time: 0.04587s
Iteration 4 predict time: 0.00294s
-----------------------
Iteration 5 fit time: 0.04691s
Iteration 5 predict time: 0.00316s
-----------------------
Iteration 6 fit time: 0.04640s
Iteration 6 predict time: 0.00289s
-----------------------
Iteration 7 fit time: 0.04608s
Iteration 7 predict time: 0.00290s
-----------------------
Iteration 8 fit time: 0.04639s
Iteration 8 predict time: 0.00296s
-----------------------
Iteration 9 fit time: 0.04655s
Iteration 9 predict time: 0.00298s
-----------------------

On this branch:

Iteration 0 fit time: 0.03443s
Iteration 0 predict time: 0.00310s
-----------------------
Iteration 1 fit time: 0.03200s
Iteration 1 predict time: 0.00298s
-----------------------
Iteration 2 fit time: 0.03119s
Iteration 2 predict time: 0.00294s
-----------------------
Iteration 3 fit time: 0.03169s
Iteration 3 predict time: 0.00293s
-----------------------
Iteration 4 fit time: 0.03160s
Iteration 4 predict time: 0.00290s
-----------------------
Iteration 5 fit time: 0.03135s
Iteration 5 predict time: 0.00288s
-----------------------
Iteration 6 fit time: 0.03186s
Iteration 6 predict time: 0.00309s
-----------------------
Iteration 7 fit time: 0.03255s
Iteration 7 predict time: 0.00320s
-----------------------
Iteration 8 fit time: 0.03190s
Iteration 8 predict time: 0.00304s
-----------------------
Iteration 9 fit time: 0.03170s
Iteration 9 predict time: 0.00298s
-----------------------

Script details: I basically used the code from the plot_calibration.py example.

import numpy as np
from time import time

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

from sklearn.calibration import CalibratedClassifierCV
from sklearn.naive_bayes import GaussianNB


n_samples = 80000
n_bins = 3  # use 3 bins for calibration_curve as we have 3 clusters here

# Generate 3 blobs with 2 classes where the second blob contains
# half positive samples and half negative samples. Probability in this
# blob is therefore 0.5.
centers = [(-5, -5), (0, 0), (5, 5)]
X, y = make_blobs(n_samples=n_samples, centers=centers, shuffle=False, random_state=42)

y[: n_samples // 2] = 0
y[n_samples // 2 :] = 1
sample_weight = np.random.RandomState(42).rand(y.shape[0])

# split train, test for calibration
X_train, X_test, y_train, y_test, sw_train, sw_test = train_test_split(
    X, y, sample_weight, test_size=0.25, random_state=42
)

for i in range(10):
    clf = GaussianNB()
    clf.fit(X_train, y_train)  # GaussianNB itself does not support sample-weights
    prob_pos_clf = clf.predict_proba(X_test)[:, 1]

    # With sigmoid calibration
    clf_sigmoid = CalibratedClassifierCV(clf, cv=2, method="sigmoid")

    tstart = time()
    clf_sigmoid.fit(X_train, y_train, sample_weight=sw_train)
    fit_time = time() - tstart

    tstart = time()
    prob_pos_sigmoid = clf_sigmoid.predict_proba(X_test)[:, 1]
    predict_time = time() - tstart

    print(f"Iteration {i} fit time: {fit_time:0.5f}s")
    print(f"Iteration {i} predict time: {predict_time:0.5f}s")
    print("-----------------------")

OmarManzoor (Contributor, Author)

@ogrisel Could you kindly have a look again?

@ogrisel ogrisel merged commit 1abc9ad into scikit-learn:main Sep 6, 2023
ogrisel (Member) commented Sep 6, 2023

Thanks Omar!

@OmarManzoor OmarManzoor deleted the sigmoid_calibration_loss branch September 6, 2023 07:18