
Ensure estimators converged in test_bayesian_mixture_fit_predict #12266


Conversation

oleksandr-pavlyk
Contributor

In the test test_bayesian_mixture_fit_predict, the BayesianGaussianMixture estimators used to build the predictions did not converge (as indicated by the convergence warnings).

In such circumstances, fit_predict(X) bases its prediction on the responsibilities computed by the e_step of the last iteration (see base.py#L244 and their use in base.py#L274).

fit(X).predict(X), on the other hand, uses the parameters produced by the m_step of the last iteration.

Hence, for the test to reasonably expect the same predictions, the estimators must converge.
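The difference can be illustrated with a toy, hand-rolled EM-style sketch. This is not sklearn's implementation; e_step and m_step below are deliberately simplified 1-D stand-ins, purely to show that labels taken before and after one extra m_step can disagree while the fit is still moving:

```python
# Toy illustration (NOT sklearn's code): fit_predict labels with the
# responsibilities from the last e_step, while fit().predict() runs a
# fresh e_step on parameters updated by one more m_step.

def e_step(means, X):
    # hard assignment of each 1-D point to the nearest mean
    return [min(range(len(means)), key=lambda k: abs(x - means[k])) for x in X]

def m_step(means, X, labels):
    # move each mean to the average of its assigned points
    new = list(means)
    for k in range(len(means)):
        members = [x for x, lbl in zip(X, labels) if lbl == k]
        if members:
            new[k] = sum(members) / len(members)
    return new

X = [0.0, 0.4, 1.0, 5.0, 5.5, 6.1]
means = [0.0, 1.0]                # deliberately poor initialization
labels = e_step(means, X)         # what "fit_predict" would report
means = m_step(means, X, labels)  # one more m_step, as in fit()
labels_after = e_step(means, X)   # what "fit(X).predict(X)" would report
print(labels == labels_after)     # → False: one point switches cluster
```

Once the means stop moving between iterations (convergence), the two label sets coincide, which is exactly what the test needs to rely on.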

The test failure was caught in the Intel Distribution for Python because, due to changes to KMeans, the initialization of BayesianGaussianMixture was different.

It should be possible to reproduce this failure in current master by varying random_state.

What does this implement/fix? Explain your changes.

The change increases the max_iter keyword value and adds assertions that both estimators have converged.
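A hedged sketch of what that change amounts to; the well-separated synthetic blobs here stand in for the suite's RandomData, and the parameter values are illustrative:

```python
import copy
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# well-separated synthetic blobs, standing in for RandomData from the suite
X = np.vstack([rng.randn(60, 2) - 4.0, rng.randn(60, 2) + 4.0])

bgmm1 = BayesianGaussianMixture(n_components=2, max_iter=500,  # raised max_iter
                                random_state=0, tol=1e-3, reg_covar=0)
bgmm2 = copy.deepcopy(bgmm1)

Y_pred1 = bgmm1.fit(X).predict(X)
Y_pred2 = bgmm2.fit_predict(X)

# the added assertions: only compare predictions once both fits converged
assert bgmm1.converged_
assert bgmm2.converged_
assert (Y_pred1 == Y_pred2).all()
```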

Any other comments?

Here is the reproducer in current 0.20.0 installed from pip:

import numpy as np
import copy

from sklearn.mixture import BayesianGaussianMixture
from sklearn.utils.testing import assert_array_equal
from sklearn.mixture.tests.test_gaussian_mixture import RandomData

def test_bayesian_mixture_fit_predict(seed):
    rng = np.random.RandomState(seed)
    rand_data = RandomData(rng, scale=7)
    n_components = 2 * rand_data.n_components
    for covar_type in ['full', 'tied', 'diag', 'spherical']:
        bgmm1 = BayesianGaussianMixture(n_components=n_components,
                                        max_iter=100, random_state=rng,
                                        tol=1e-3, reg_covar=0)
        bgmm1.covariance_type = covar_type
        bgmm2 = copy.deepcopy(bgmm1)
        X = rand_data.X[covar_type]
        Y_pred1 = bgmm1.fit(X).predict(X)
        Y_pred2 = bgmm2.fit_predict(X)
        assert_array_equal(Y_pred1, Y_pred2)

# passes in the test suite, although it produces many convergence warnings
test_bayesian_mixture_fit_predict(0)

# fails with (mismatch 0.2%)
test_bayesian_mixture_fit_predict(1)

@GaelVaroquaux

… prediction that fit_predict(X) gives if the model fitting has not reached convergence
@jeremiedbb
Member

I'm indeed experiencing this failure due to changes in KMeans in #11950.

@amueller
Member

amueller commented Oct 3, 2018

test failure

@oleksandr-pavlyk
Contributor Author

oleksandr-pavlyk commented Oct 3, 2018

@amueller Test failed because EM did not converge even in 250 iterations on that machine.

There are two options: either increase the max_iter value, or allow for a small discrepancy in predictions if the fit did not reach convergence.

Increasing max_iter makes the test run a little longer.

Is there a metric to check the percentage of agreement between predictions?

@oleksandr-pavlyk
Contributor Author

@amueller I have added logic that the predictions must be equal if both estimators converged; otherwise an accuracy score of over 0.95 is required (somewhat arbitrary). Any thoughts on such an approach? Thanks!
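That logic, sketched with a hand-rolled agreement helper (a stand-in for sklearn.metrics.accuracy_score; the 0.95 threshold is the somewhat arbitrary value mentioned above):

```python
def agreement(y1, y2):
    # fraction of positions where the two label sequences match
    return sum(a == b for a, b in zip(y1, y2)) / len(y1)

def check_predictions(y1, y2, converged1, converged2):
    if converged1 and converged2:
        # converged estimators must agree exactly
        assert list(y1) == list(y2)
    else:
        # otherwise tolerate a small discrepancy (threshold is arbitrary)
        assert agreement(y1, y2) > 0.95

check_predictions([0, 1, 1, 0], [0, 1, 1, 0], True, True)  # passes
```

Note that comparing labels position-by-position only makes sense because both estimators share the same initialization, so their component numbering lines up.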

@ogrisel
Member

ogrisel commented Oct 24, 2018

@jeremiedbb told me IRL that this test is unstable even on master: changing the seed makes it fail very often.

I think the cause of the problem is that in fit_predict we do the m_step after the e_step only if we have not converged.

I think this loop could be rewritten to ensure that we always end with an e_step, even when we do not converge.
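A simplified sketch of that restructure (generic Python, not sklearn's actual base.py; e_step and m_step are caller-supplied toy callables): the loop alternates e/m steps as before, but unconditionally finishes with an e_step, so the responsibilities used for the labels always reflect the final parameters:

```python
def fit_predict_sketch(e_step, m_step, params, X, max_iter, tol):
    """Sketch of an EM loop that always ends on an e_step.

    e_step(params, X) -> (responsibilities, lower_bound);
    m_step(params, X, responsibilities) -> new params.
    """
    prev_lower_bound = None
    for _ in range(max_iter):
        resp, lower_bound = e_step(params, X)
        if (prev_lower_bound is not None
                and abs(lower_bound - prev_lower_bound) < tol):
            break
        prev_lower_bound = lower_bound
        params = m_step(params, X, resp)
    # always finish on an e_step: the labels returned here now match what
    # predict() would compute from the final parameters
    resp, _ = e_step(params, X)
    labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
    return labels, params
```

With this shape, fit_predict and fit().predict() agree whether or not the tolerance was reached, because both evaluate responsibilities on the same final parameters.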

@ogrisel
Member

ogrisel commented Oct 24, 2018

I am working on a fix.

@ogrisel
Member

ogrisel commented Oct 24, 2018

I think we can close this in favor of #12451.

@oleksandr-pavlyk
Contributor Author

Agreed.
