
Multilabel ranking coverage error formula #7698

@emir-munoz

Description

The formula for sklearn.metrics.coverage_error given in the multilabel ranking section of the documentation does not match the one in the referenced paper: "Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In Data mining and knowledge discovery handbook (pp. 667-685). Springer US."

The paper's formula subtracts 1 at the end, and the current implementation omits this -1 term.
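For reference, here are the two definitions side by side as I read them. The notation is my own, not taken verbatim from either source: $n$ samples, $Y_i$ the set of true labels of sample $i$, $\hat f_{ij}$ the score of label $j$ for sample $i$. I am assuming both sources use the same rank definition, with ties counted against the label.

```latex
\mathrm{rank}_{ij} = \left|\{\, k : \hat f_{ik} \ge \hat f_{ij} \,\}\right|

% scikit-learn documentation
\mathrm{coverage}_{\text{sklearn}} = \frac{1}{n} \sum_{i=1}^{n} \max_{j \in Y_i} \mathrm{rank}_{ij}

% Tsoumakas et al. (2010)
\mathrm{coverage}_{\text{paper}} = \frac{1}{n} \sum_{i=1}^{n} \max_{j \in Y_i} \mathrm{rank}_{ij} - 1
```

Under these definitions the two differ by exactly 1 for any input.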

Steps/Code to Reproduce

The example given in the sklearn documentation is the following:

```python
import numpy as np
from sklearn.metrics import coverage_error

y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
print(coverage_error(y_true, y_score))
```

which returns 2.5.
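To show where 2.5 comes from, here is a minimal pure-Python sketch of both definitions. It assumes a label's rank is the number of labels scored at least as high as it, and that every sample has at least one true label. The helper names (max_true_ranks, coverage_without_offset, coverage_with_offset) are hypothetical and not part of scikit-learn.

```python
# Hypothetical helpers for illustration; not scikit-learn API.

def max_true_ranks(y_true, y_score):
    """Per-sample rank of the lowest-scored true label.

    A label's rank is the number of labels whose score is >= its score,
    so the top-scored label has rank 1 and ties count against the label.
    Assumes every sample has at least one true label (min() fails otherwise).
    """
    ranks = []
    for truth, scores in zip(y_true, y_score):
        # The worst-ranked true label is the one with the lowest score
        worst_true_score = min(s for t, s in zip(truth, scores) if t)
        # Count how many labels must be included to reach it
        ranks.append(sum(1 for s in scores if s >= worst_true_score))
    return ranks


def coverage_without_offset(y_true, y_score):
    # Mean max rank: the definition in the scikit-learn documentation
    ranks = max_true_ranks(y_true, y_score)
    return sum(ranks) / len(ranks)


def coverage_with_offset(y_true, y_score):
    # Mean of (max rank - 1): the formula in Tsoumakas et al. (2010)
    ranks = max_true_ranks(y_true, y_score)
    return sum(r - 1 for r in ranks) / len(ranks)


y_true = [[1, 0, 0], [0, 0, 1]]
y_score = [[0.75, 0.5, 1], [1, 0.2, 0.1]]
print(max_true_ranks(y_true, y_score))           # [2, 3]
print(coverage_without_offset(y_true, y_score))  # 2.5
print(coverage_with_offset(y_true, y_score))     # 1.5
```

In the first sample, the true label (score 0.75) is outranked only by the label scored 1, giving rank 2. In the second sample, the true label has the lowest score, giving rank 3.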

Expected Results

Following the formula in the paper, I would expect the value to be 1.5.

Actual Results

The current sklearn implementation returns 2.5.

Versions

Windows-7-6.1.7601-SP1
Python 3.5.2 |Anaconda 4.0.0 (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.2
SciPy 0.18.0
Scikit-Learn 0.18
