Matthews correlation coefficient metric throws misleading division by zero RuntimeWarning #16924


Closed
simberaj opened this issue Apr 14, 2020 · 7 comments · Fixed by #19977
Labels: Bug, Easy, module:metrics


@simberaj

Description

When all predicted values are equal, sklearn.metrics.matthews_corrcoef emits a RuntimeWarning caused by a division by zero. This behavior was already reported in #1937 and marked as fixed, but it reappears in recent versions.

Steps/Code to Reproduce

The snippet below reproduces the warning.

import sklearn.metrics
trues = [1, 0, 1, 1, 0]
preds = [0, 0, 0, 0, 0]
sklearn.metrics.matthews_corrcoef(trues, preds)

Expected Results

No warning is thrown.

Actual Results

The following warning is thrown:

C:\anaconda\envs\sklearn-test\lib\site-packages\sklearn\metrics\_classification.py:900: RuntimeWarning: invalid value encountered in double_scalars
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
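
The line quoted in the warning suggests why the message reads "invalid value" rather than "divide by zero": both the numerator and the denominator appear to be zero for this input. The following is a minimal sketch, assuming the three quantities are derived from the confusion matrix as in the multiclass MCC definition. The helper name `mcc_terms` is hypothetical and not part of scikit-learn, and the confusion matrix is built by hand so the example only needs NumPy.

```python
import numpy as np


def mcc_terms(y_true, y_pred):
    """Return (cov_ytyp, cov_ytyt, cov_ypyp) computed from the confusion matrix.

    Hypothetical helper for illustration; not scikit-learn's own code.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))

    # Confusion matrix C: rows are true labels, columns are predicted labels.
    C = np.zeros((labels.size, labels.size), dtype=np.int64)
    rows = np.searchsorted(labels, y_true)
    cols = np.searchsorted(labels, y_pred)
    np.add.at(C, (rows, cols), 1)

    t_sum = C.sum(axis=1)    # number of samples per true class
    p_sum = C.sum(axis=0)    # number of samples per predicted class
    n_correct = np.trace(C)  # correctly classified samples
    n_samples = C.sum()

    cov_ytyp = n_correct * n_samples - t_sum @ p_sum
    cov_ytyt = n_samples ** 2 - t_sum @ t_sum
    cov_ypyp = n_samples ** 2 - p_sum @ p_sum
    return int(cov_ytyp), int(cov_ytyt), int(cov_ypyp)


trues = [1, 0, 1, 1, 0]
preds = [0, 0, 0, 0, 0]
print(mcc_terms(trues, preds))  # (0, 12, 0)
```

With `cov_ytyp` and `cov_ypyp` both equal to zero, the expression evaluates 0 / 0, which NumPy reports as an invalid value and which yields `nan`.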

Versions

System:
    python: 3.8.2 (default, Mar 25 2020, 08:56:29) [MSC v.1916 64 bit (AMD64)]
executable: C:\anaconda\envs\sklearn-test\python.exe
   machine: Windows-10-10.0.18362-SP0

Python dependencies:
       pip: 20.0.2
setuptools: 46.1.3.post20200330
   sklearn: 0.22.1
     numpy: 1.18.1
     scipy: 1.4.1
    Cython: None
    pandas: None
matplotlib: None
    joblib: 0.14.1
@Connerrrrr
Contributor

Connerrrrr commented Feb 26, 2021

Can I take this one?

I'm not sure whether checking if only one column or row of the confusion matrix has nonzero values will work, though.

According to Jurman, Riccadonna, and Furlanello (2012), A Comparison of MCC and CEN Error Measures in MultiClass Prediction:

MCC is equal to 0 when C is all zeros but for one column (all samples have been classified to be of a class k)

and the Wikipedia article on the Matthews correlation coefficient:

If any of the four sums in the denominator is zero, the denominator can be arbitrarily set to one

the fix planned above looks reasonable to me.
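
The convention described in the two quotes above can be sketched as a guard on the denominator. This is only an illustration under stated assumptions, not the patch that closed this issue: `safe_mcc` is a hypothetical name, the confusion matrix is built by hand with NumPy, and returning 0.0 for a zero denominator follows the quoted convention.

```python
import numpy as np


def safe_mcc(y_true, y_pred):
    """MCC that returns 0.0 instead of dividing by zero (hypothetical sketch)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))

    # Confusion matrix C: rows are true labels, columns are predicted labels.
    C = np.zeros((labels.size, labels.size), dtype=np.int64)
    rows = np.searchsorted(labels, y_true)
    cols = np.searchsorted(labels, y_pred)
    np.add.at(C, (rows, cols), 1)

    t_sum = C.sum(axis=1)
    p_sum = C.sum(axis=0)
    n_samples = C.sum()

    cov_ytyp = np.trace(C) * n_samples - t_sum @ p_sum
    cov_ytyt = n_samples ** 2 - t_sum @ t_sum
    cov_ypyp = n_samples ** 2 - p_sum @ p_sum

    # Guard: a zero denominator means all true labels or all predictions
    # fall into a single class, so report 0.0 without dividing.
    denominator = cov_ytyt * cov_ypyp
    if denominator == 0:
        return 0.0
    return float(cov_ytyp / np.sqrt(denominator))


print(safe_mcc([1, 0, 1, 1, 0], [0, 0, 0, 0, 0]))  # 0.0
```

Checking the denominator before dividing avoids the warning entirely, rather than computing `nan` and replacing it afterwards.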

@brunofacca

This is still happening.

jnothman added the Easy and help wanted labels on Apr 24, 2021
@jnothman
Member

A pull request is welcome. @Connerrrrr are you still interested?

@Connerrrrr
Contributor

Sure, working on it.

@Connerrrrr
Contributor

@jnothman PR was created.

@scienception

Was this solved in 2022? I'm still getting this error:
RuntimeWarning: invalid value encountered in double_scalars
mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)

@Connerrrrr
Contributor

Was this solved in 2022? I'm still getting this error RuntimeWarning: invalid value encountered in double_scalars mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)

@scienception Any guidance on how to reproduce the error?
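
One way to put together a reproducible report is to record the warnings a call emits alongside the inputs that triggered them. The sketch below is a generic helper; `collect_warnings` and `zero_over_zero` are hypothetical names, and `zero_over_zero` is only a stand-in so the example runs without scikit-learn. In practice one would pass `sklearn.metrics.matthews_corrcoef` with the actual labels and include the installed scikit-learn version in the report.

```python
import warnings

import numpy as np


def collect_warnings(func, *args, **kwargs):
    """Call func and return (result, list of formatted warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = func(*args, **kwargs)
    messages = [f"{w.category.__name__}: {w.message}" for w in caught]
    return result, messages


def zero_over_zero(y_true, y_pred):
    # Hypothetical stand-in for the metric call: triggers NumPy's
    # invalid-value warning by computing 0 / 0 on float64 scalars.
    return np.float64(0.0) / np.float64(0.0)


result, messages = collect_warnings(zero_over_zero, [1, 0, 1, 1, 0], [0, 0, 0, 0, 0])
print(result)
for message in messages:
    print(message)
```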

6 participants