Skip to content

ROC functions break down with low scores variance. #6688

Closed
@msoelch

Description

@msoelch

Description

The ROC functionality reduces its workload by first extracting forward differences between scores and then discarding potential thresholds that are too close together by calling utils.isclose.fixes. The problem is that this happens without notification of the user (or any chance to intervene, setting drop_intermediate argument to False does not change anything) and, moreover, without scaling the tolerance w.r.t. to the variance of the score vector.

Steps/Code to Reproduce

This is the (slightly simplified) example given in the documentation of metrics.roc_curve

import numpy as np
from sklearn import metrics
y = np.array([False, False, True, True])
original_scores = np.array([0.1, 0.4, 0.35, 0.8])

scores = original_scores
fpr, tpr, thresholds = metrics.roc_curve(y, scores)
print fpr, tpr, thresholds, metrics.roc_auc_score(y, scores)

scores = 1e-6*original_scores
fpr, tpr, thresholds = metrics.roc_curve(y, scores)
print fpr, tpr, thresholds, metrics.roc_auc_score(y, scores)

scores = 1e-7*original_scores
fpr, tpr, thresholds = metrics.roc_curve(y, scores)
print fpr, tpr, thresholds, metrics.roc_auc_score(y, scores)

scores = 1e-8*original_scores
fpr, tpr, thresholds = metrics.roc_curve(y, scores)
print fpr, tpr, thresholds, metrics.roc_auc_score(y, scores)

Expected Results

All printed lines should yield the same output.

Versions

Windows-10-10.0.10586
('Python', '2.7.11 |Anaconda 2.5.0 (64-bit)| (default, Jan 29 2016, 14:26:21) [MSC v.1500 64 bit (AMD64)]')
('NumPy', '1.10.4')
('SciPy', '0.17.0')
('Scikit-Learn', '0.17')

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions