Overflow in matthews_corrcoef on a 64-bit mac #9622

Closed
sam-s opened this issue Aug 24, 2017 · 6 comments · Fixed by #9693

@sam-s
Contributor

sam-s commented Aug 24, 2017

Description

I see

sklearn/metrics/classification.py:538: RuntimeWarning: overflow encountered in long_scalars

and

sklearn/metrics/classification.py:538: RuntimeWarning: invalid value encountered in sqrt

in sklearn.metrics.matthews_corrcoef for large input vectors.

Steps/Code to Reproduce

Use functions from #2806 and then:

def mcc_test(n_points):
    y_true, y_pred = random_ys(n_points)
    mcc_safe = matthews_corrcoef(y_true, y_pred)
    mcc_unsafe = sklearn.metrics.matthews_corrcoef(y_true, y_pred)
    try:
        assert(abs(mcc_safe - mcc_unsafe) < 1e-8)
    except AssertionError:
        print('Error: mcc_safe=%s, mcc_unsafe=%s, n_points=%s' % (
            mcc_safe, mcc_unsafe, n_points))

>>> mcc_test(100)
>>> mcc_test(1000)
>>> mcc_test(10000)
>>> mcc_test(100000)
/Users/sds/.virtualenvs/algorisk/lib/python2.7/site-packages/sklearn/metrics/classification.py:538: RuntimeWarning: overflow encountered in long_scalars
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
Error: mcc_safe=0.898760987225, mcc_unsafe=1.75544683417, n_points=100000
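
A minimal sketch of why the line flagged in the warning can overflow. It assumes (the thread does not show the surrounding code) that `cov_ytyt` and `cov_ypyp` are built from integer class counts and therefore grow like `n_points ** 2`. The class counts below are hypothetical, chosen as a balanced binary problem with `n_points = 100000`.

```python
import warnings
import numpy as np

# Hypothetical balanced binary problem with 100000 samples.
t_sum = np.array([50000, 50000], dtype=np.int64)  # true-label counts
p_sum = np.array([50000, 50000], dtype=np.int64)  # predicted-label counts
n_samples = t_sum.sum()

# Assumed form of the covariance terms: integers of order n_samples ** 2.
cov_ytyt = n_samples ** 2 - np.dot(t_sum, t_sum)  # 5e9, fits in int64
cov_ypyp = n_samples ** 2 - np.dot(p_sum, p_sum)  # 5e9, fits in int64

# The exact product is 2.5e19, above the int64 maximum of about 9.22e18,
# so the int64 multiplication wraps around.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    wrapped = cov_ytyt * cov_ypyp

exact = int(cov_ytyt) * int(cov_ypyp)         # Python ints never overflow
as_float = float(cov_ytyt) * float(cov_ypyp)  # float64 keeps the magnitude

print("int64 product:", wrapped)
print("exact product:", exact)
print("float product:", as_float)
```

Depending on the counts, the wrapped product may land on a positive value (a finite but wrong result) or a negative one (which makes `sqrt` return NaN), which would be consistent with the two warnings reported above. Converting the terms to float before multiplying avoids the wraparound.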

Expected Results

Nothing printed, as for mcc_test(1000) above.

Actual Results

See the messages above.

Versions

Darwin-16.7.0-x86_64-i386-64bit
('Python', '2.7.13 (default, Jul 18 2017, 09:17:00) \n[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)]')
('NumPy', '1.13.1')
('SciPy', '0.19.1')
('Scikit-Learn', '0.19.0')
@jnothman
Member

Ugh. I'd be fine to adopt the alternative implementation there...

jnothman added the Bug label Aug 24, 2017
@sam-s
Contributor Author

sam-s commented Aug 24, 2017

What's stopping you?

@jnothman
Member

Now that I look: support for multiclass is a pretty big reason not to adopt it.
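
For illustration, a minimal sketch of a multiclass-capable MCC computed from a confusion matrix entirely in floating point. This is not the fix that was merged for this issue; the function name `mcc_from_confusion` is hypothetical, and the formula is the standard multiclass generalization of MCC written in terms of row sums, column sums, and the trace.

```python
import numpy as np

def mcc_from_confusion(C):
    """MCC from a square confusion matrix (rows: true, columns: predicted).

    Hypothetical helper: all arithmetic is done in float64, so the
    products of order n_samples ** 2 cannot wrap around as int64 would.
    """
    C = np.asarray(C, dtype=np.float64)
    t_sum = C.sum(axis=1)   # number of samples per true class
    p_sum = C.sum(axis=0)   # number of samples per predicted class
    n_correct = np.trace(C)
    n_samples = C.sum()

    cov_ytyp = n_correct * n_samples - t_sum @ p_sum
    cov_ytyt = n_samples ** 2 - t_sum @ t_sum
    cov_ypyp = n_samples ** 2 - p_sum @ p_sum

    # Take the square roots separately to keep intermediate values small.
    denom = np.sqrt(cov_ytyt) * np.sqrt(cov_ypyp)
    return 0.0 if denom == 0 else float(cov_ytyp / denom)

# Binary case with 100000 samples, and a perfect 3-class prediction.
binary = mcc_from_confusion([[45000, 5000], [5000, 45000]])
perfect = mcc_from_confusion(np.diag([3, 4, 5]))
print(binary, perfect)
```

One caveat of this sketch: float64 represents integers exactly only up to 2**53, so for extremely large sample counts the intermediate terms are rounded rather than exact, though they no longer overflow.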

@lesteve
Member

lesteve commented Aug 28, 2017

This seems to be a Python 2.7-only issue (at least on my Ubuntu box). @sam-s, can you confirm that you do not see the problem with Python 3 on OSX?

Posting a reproducible snippet for convenience:

import sklearn.metrics
import numpy

def matthews_corrcoef(y_true, y_predicted):
    conf_matrix = sklearn.metrics.confusion_matrix(y_true, y_predicted)
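    # conf_matrix[i, j] counts samples with true label i and predicted label j
    true_pos = conf_matrix[1, 1]
    false_pos = conf_matrix[0, 1]
    false_neg = conf_matrix[1, 0]
    n_points = conf_matrix.sum() * 1.0  # float total keeps the divisions below in floating point
    pos_rate = (true_pos + false_neg) / n_points  # fraction of actual positives
    activity = (true_pos + false_pos) / n_points  # fraction of predicted positives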
    mcc_numerator = true_pos / n_points - pos_rate * activity
    mcc_denominator = activity * pos_rate * (1 - activity) * (1 - pos_rate)
    return mcc_numerator / numpy.sqrt(mcc_denominator)

def random_ys(n_points):
    x_true = numpy.random.sample(n_points)
    x_pred = x_true + 0.2 * (numpy.random.sample(n_points) - 0.5)
    y_true = (x_true > 0.5) * 1.0
    y_pred = (x_pred > 0.5) * 1.0
    return y_true, y_pred

for n_points in [10, 100, 1000, 1000000]:
    y_true, y_pred = random_ys(n_points)
    mcc_safe = matthews_corrcoef(y_true, y_pred)
    mcc_unsafe = sklearn.metrics.matthews_corrcoef(y_true, y_pred)
    try:
        assert(abs(mcc_safe - mcc_unsafe) < 1e-8)
    except AssertionError:
        print('Error: mcc_safe=%s, mcc_unsafe=%s, n_points=%s' % (
            mcc_safe, mcc_unsafe, n_points))

@sam-s
Contributor Author

sam-s commented Sep 5, 2017

Same error with Python 3:

/usr/local/lib/python3.6/site-packages/sklearn/metrics/classification.py:538: RuntimeWarning: overflow encountered in long_scalars
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
/usr/local/lib/python3.6/site-packages/sklearn/metrics/classification.py:538: RuntimeWarning: invalid value encountered in sqrt
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
Error: mcc_safe=0.900613991045, mcc_unsafe=0.0, n_points=1000000

Sorry about the delay.

@lesteve
Member

lesteve commented Sep 6, 2017

Actually this is a regression in 0.19. I can reproduce the problem.

As it happens, my Python 2 conda environment had 0.18.2, but my Python 3 environment had 0.19.

lesteve added this to the 0.19.1 milestone Sep 6, 2017
3 participants