
[MRG] ROCCH calibration method #4822


Closed · wants to merge 5 commits

Conversation

@albahnsen (Contributor) commented Jun 5, 2015

Implementation of the ROCCH calibration method, as described in http://www.jmlr.org/papers/volume13/hernandez-orallo12a/hernandez-orallo12a.pdf and http://link.springer.com/article/10.1007/s10994-007-5011-0

It is implemented as another method within the CalibratedClassifierCV class, and the documentation is expanded to include the results.

The method performs similarly to isotonic calibration as measured by Brier loss, F1 score, and AUC, but with a speedup of up to 1.3x.

Here is an example:

In[2]: %paste

import pandas as pd
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.cross_validation import train_test_split
from sklearn.metrics import brier_score_loss, f1_score, roc_auc_score

X, y = datasets.make_classification(n_samples=100000, n_features=20,
                                    n_informative=2, n_redundant=10,
                                    random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.99,
                                                    random_state=42)

def test_times(method, est):
    clf = CalibratedClassifierCV(est, cv=5, method=method)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    y_prob = clf.predict_proba(X_test)[:, 1]
    brier = brier_score_loss(y_test, y_prob, pos_label=y.max())
    f1 = f1_score(y_test, y_pred)
    auc = roc_auc_score(y_test, y_prob)
    return brier, f1, auc

res = pd.DataFrame(columns=['brier_loss','f1_score','auc_score'])
## -- End pasted text --
In[3]: %timeit res.loc['GNB+isotonic'] = test_times(method='isotonic', est=GaussianNB())
1 loops, best of 3: 823 ms per loop
In[4]: %timeit res.loc['GNB+sigmoid'] = test_times(method='sigmoid', est=GaussianNB())
1 loops, best of 3: 776 ms per loop
In[5]: %timeit res.loc['GNB+rocch'] = test_times(method='rocch', est=GaussianNB())
1 loops, best of 3: 752 ms per loop
In[6]: %timeit res.loc['LSVC+isotonic'] = test_times(method='isotonic', est=LinearSVC())
1 loops, best of 3: 416 ms per loop
In[7]: %timeit res.loc['LSVC+sigmoid'] = test_times(method='sigmoid', est=LinearSVC())
1 loops, best of 3: 367 ms per loop
In[8]: %timeit res.loc['LSVC+rocch'] = test_times(method='rocch', est=LinearSVC())
1 loops, best of 3: 324 ms per loop
In[9]: res['time'] = [823, 776, 752, 416, 367, 324]
In[10]: print res
               brier_loss  f1_score  auc_score  time
GNB+isotonic     0.098487  0.854058   0.939305   823
GNB+sigmoid      0.108937  0.866259   0.937025   776
GNB+rocch        0.098752  0.851963   0.939027   752
LSVC+isotonic    0.099401  0.864936   0.936852   416
LSVC+sigmoid     0.098603  0.862041   0.937657   367
LSVC+rocch       0.099584  0.864046   0.936733   324

@agramfort (Member)

What is the practical benefit of this method? The speed improvement is marginal. Can you expect much better performance in some setups? If so, when?

thanks

@amueller (Member) commented Jun 5, 2015

This is a relatively common method, right?
We discussed finding a different threshold in #4813.
That would target predict rather than predict_proba, though.

@albahnsen (Contributor, Author)

@agramfort the ROCCH calibration is designed to maximize the AUC (at least during training), so it is the better method when the estimated probabilities will be used afterwards rather than just the predictions, e.g. in cost-sensitive classification.
Both the ROCCH and isotonic methods fix the monotonicity of the estimated probabilities; moreover, the practical relation between them is discussed in this paper (http://link.springer.com/article/10.1007/s10994-007-5011-0).
The speedup is gained because ROCCH does not fit an additional model as the isotonic method does. However, the ROCCH method does need to find the convex hull of the ROC curve, which, in special cases, may be very time-consuming.
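[Editor's note] The mechanics of ROCCH calibration can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `rocch_calibrate` and its internals are hypothetical names, and binary labels in {0, 1} are assumed. Each segment of the ROC convex hull is assigned the fraction of positives among the hold-out examples falling in that score range, which is the same fit the pool-adjacent-violators (isotonic) algorithm would produce.

```python
import numpy as np
from sklearn.metrics import roc_curve


def rocch_calibrate(y_val, score_val, score_new):
    """Map raw scores to probabilities via the ROC convex hull (sketch)."""
    fpr, tpr, thr = roc_curve(y_val, score_val)
    n_pos = int(np.sum(y_val == 1))
    n_neg = len(y_val) - n_pos

    # Upper convex hull of the ROC points (roc_curve returns them
    # already sorted by increasing FPR), via a monotone-chain sweep.
    hull = [0]
    for i in range(1, len(fpr)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # Pop b if it lies on or below the chord from a to the new point.
            if ((fpr[b] - fpr[a]) * (tpr[i] - tpr[a])
                    >= (tpr[b] - tpr[a]) * (fpr[i] - fpr[a])):
                hull.pop()
            else:
                break
        hull.append(i)

    # Calibrated probability per hull segment = positives / total examples
    # whose scores fall in that segment's threshold interval.
    seg_prob, seg_thr = [], []
    for a, b in zip(hull[:-1], hull[1:]):
        d_tp = (tpr[b] - tpr[a]) * n_pos
        d_fp = (fpr[b] - fpr[a]) * n_neg
        seg_prob.append(d_tp / (d_tp + d_fp))
        seg_thr.append(thr[b])  # lower score bound of the segment

    # Assign each new score to its segment (seg_thr is decreasing).
    idx = np.searchsorted(-np.asarray(seg_thr), -np.asarray(score_new),
                          side="left")
    idx = np.minimum(idx, len(seg_prob) - 1)
    return np.asarray(seg_prob)[idx]
```

Because the hull slopes decrease monotonically along the ROC curve, the resulting probabilities are monotone in the score, which is the property both ROCCH and isotonic calibration enforce.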

@albahnsen (Contributor, Author)

@amueller I do see some relation between this and #4813, as calibrating the probabilities indirectly changes the threshold of the classifier. Nevertheless, I don't think this is what @rahuldave had in mind when he opened that issue.

@albahnsen (Contributor, Author)

Regarding the Travis CI build: this code needs SciPy >= 0.12, while the Travis test is being executed with SciPy 0.9. Initially I had my own convex hull function, but it is slower than the one SciPy has.

@amueller (Member) commented Jun 8, 2015

We need to support scipy 0.9. Usually the way around this is to backport the function to scikit-learn. I'm not sure you should do the work before we agree on inclusion, though.

@amueller (Member) commented Jun 8, 2015

What do you mean by not fitting an additional model? You mean the isotonic model?
It maximizes the ROC on the hold-out validation set, right, not on the training set?

@amueller (Member) commented Jun 8, 2015

I'm not sure I have time to look into the papers right now. Do they use a hold-out method for the threshold, or do they use the training set?
The R package caret does something similar (by default, I think), and I was wondering whether they use training or validation data, but I haven't gotten a reply from them yet.

@albahnsen (Contributor, Author)

Supporting SciPy 0.9 is OK then; I will wait for the moment.

The ROC is maximized on the hold-out set, sorry for my confusion. In the papers, the hold-out method is also the one used.

I'm not familiar enough with the caret package.

@amueller (Member) commented Jun 8, 2015

I'm +1 on adding this as I feel it is a very general method.

@agramfort (Member) commented Jun 10, 2015 via email

@albahnsen (Contributor, Author)

@amueller I did the backport of the SciPy ConvexHull function.
It is not the same code as the one in SciPy, which relies on the Qhull library and pulls in several C files that I don't need for this particular function. Instead, when scipy.__version__ < 0.12, I use the convex hull code from the Python Cookbook (2005).
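[Editor's note] A SciPy-free fallback of this kind can be sketched as follows, using Andrew's monotone-chain algorithm in the spirit of the Python Cookbook recipe. This is a hypothetical illustration, not the PR's actual backport; `convex_hull` is an illustrative name.

```python
def convex_hull(points):
    """Vertices of the 2-D convex hull, in counter-clockwise order (sketch)."""
    pts = sorted(set(map(tuple, points)))  # dedupe and sort by (x, y)
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    # Lower hull: sweep left to right, popping non-clockwise turns.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Upper hull: same sweep over the points in reverse order.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # Each list's last point repeats the other list's first point; drop it.
    return lower[:-1] + upper[:-1]
```

For the ROCCH use case only the upper hull of the ROC points is actually needed, so the lower-hull half could be dropped; the full hull is shown here because that is what `scipy.spatial.ConvexHull` computes.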

@amueller (Member)

Thanks, that seems like the way to go.

@albahnsen (Contributor, Author)

Guys, this PR has been open for a while. I'm not sure what the next steps are.

@amueller changed the title from [WIP] ROCCH calibration method to [MRG] ROCCH calibration method on Oct 10, 2016
@amueller (Member)

Hm, because the PR was named [WIP] I guess it didn't get any reviews. Also maybe because it's hard to get reviewers' attention. Can you maybe rebase? Thanks!

@amueller (Member)

wow still no reviews after two years :-/ sorry....

Base automatically changed from master to main January 22, 2021 10:48
@lucyleeow (Member)

Is there still interest in including this? cc @amueller ?

@adrinjalali (Member)

Unfortunately there hasn't been much support from the maintainer community here. So I think we can close until interest comes up again.

6 participants