
Ensure estimators converged in test_bayesian_mixture_fit_predict #12266


Conversation

oleksandr-pavlyk
Contributor

In the test test_bayesian_mixture_fit_predict, the BayesianGaussianMixture estimators used to build the predictions did not converge (as indicated by the convergence warnings).

In such circumstances, fit_predict(X) bases its prediction on the responsibilities computed by the e_step of the last iteration (see base.py#L244 and their use in base.py#L274).

fit(X).predict(X), on the other hand, uses the parameters produced by the m_step of the last iteration.

Hence, for the test to reasonably expect the same predictions, the estimators must converge.
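The difference can be illustrated with a toy, hand-rolled EM-style sketch. This is not sklearn's implementation; e_step and m_step below are deliberately simplified 1-D stand-ins, purely to show that labels taken before and after one extra m_step can disagree while the fit is still moving:

```python
# Toy illustration (NOT sklearn's code): fit_predict labels with the
# responsibilities from the last e_step, while fit().predict() runs a
# fresh e_step on parameters updated by one more m_step.

def e_step(means, X):
    # hard assignment of each 1-D point to the nearest mean
    return [min(range(len(means)), key=lambda k: abs(x - means[k])) for x in X]

def m_step(means, X, labels):
    # move each mean to the average of its assigned points
    new = list(means)
    for k in range(len(means)):
        members = [x for x, lbl in zip(X, labels) if lbl == k]
        if members:
            new[k] = sum(members) / len(members)
    return new

X = [0.0, 0.4, 1.0, 5.0, 5.5, 6.1]
means = [0.0, 1.0]                # deliberately poor initialization
labels = e_step(means, X)         # what "fit_predict" would report
means = m_step(means, X, labels)  # one more m_step, as in fit()
labels_after = e_step(means, X)   # what "fit(X).predict(X)" would report
print(labels == labels_after)     # → False: one point switches cluster
```

Once the means stop moving between iterations (convergence), the two label sets coincide, which is exactly what the test needs to rely on.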

The test failure was caught in the Intel Distribution for Python because, due to changes to KMeans, the initialization of BayesianGaussianMixture was different.

It should be possible to reproduce this failure in current master by varying random_state.

What does this implement/fix? Explain your changes.

The change increases the max_iter keyword value and adds assertions that both estimators have converged.
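A hedged sketch of what that change amounts to; the well-separated synthetic blobs here stand in for the suite's RandomData, and the parameter values are illustrative:

```python
import copy
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# well-separated synthetic blobs, standing in for RandomData from the suite
X = np.vstack([rng.randn(60, 2) - 4.0, rng.randn(60, 2) + 4.0])

bgmm1 = BayesianGaussianMixture(n_components=2, max_iter=500,  # raised max_iter
                                random_state=0, tol=1e-3, reg_covar=0)
bgmm2 = copy.deepcopy(bgmm1)

Y_pred1 = bgmm1.fit(X).predict(X)
Y_pred2 = bgmm2.fit_predict(X)

# the added assertions: only compare predictions once both fits converged
assert bgmm1.converged_
assert bgmm2.converged_
assert (Y_pred1 == Y_pred2).all()
```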

Any other comments?

Here is the reproducer in current 0.20.0 installed from pip:

import numpy as np
import copy

from sklearn.mixture import BayesianGaussianMixture
from sklearn.utils.testing import assert_array_equal
from sklearn.mixture.tests.test_gaussian_mixture import RandomData

def test_bayesian_mixture_fit_predict(seed):
    rng = np.random.RandomState(seed)
    rand_data = RandomData(rng, scale=7)
    n_components = 2 * rand_data.n_components
    for covar_type in ['full', 'tied', 'diag', 'spherical']:
        bgmm1 = BayesianGaussianMixture(n_components=n_components,
                                        max_iter=100, random_state=rng,
                                        tol=1e-3, reg_covar=0)
        bgmm1.covariance_type = covar_type
        bgmm2 = copy.deepcopy(bgmm1)
        X = rand_data.X[covar_type]
        Y_pred1 = bgmm1.fit(X).predict(X)
        Y_pred2 = bgmm2.fit_predict(X)
        assert_array_equal(Y_pred1, Y_pred2)

# passes in the test suite, although it produces many convergence warnings
test_bayesian_mixture_fit_predict(0)

# fails with (mismatch 0.2%)
test_bayesian_mixture_fit_predict(1)

@GaelVaroquaux

… prediction that fit_predict(X) gives if the model fitting has not reached convergence
@jeremiedbb
Member

I'm indeed experiencing this failure due to changes in KMeans in #11950.

@amueller
Member

amueller commented Oct 3, 2018

test failure

@oleksandr-pavlyk
Contributor Author

oleksandr-pavlyk commented Oct 3, 2018

@amueller Test failed because EM did not converge even in 250 iterations on that machine.

There are two options: either increase the max_iter value, or allow for a small discrepancy in predictions if the fit did not reach convergence.

Increasing max_iter makes the test run a little longer.

Is there a metric to check the percentage of agreement between predictions?

@oleksandr-pavlyk
Contributor Author

@amueller I have added logic that the predictions must be equal if both estimators converged; otherwise an accuracy score of over 0.95 is required (somewhat arbitrary). Any thoughts on such an approach? Thanks!
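That logic, sketched with a hand-rolled agreement helper (a stand-in for sklearn.metrics.accuracy_score; the 0.95 threshold is the somewhat arbitrary value mentioned above):

```python
def agreement(y1, y2):
    # fraction of positions where the two label sequences match
    return sum(a == b for a, b in zip(y1, y2)) / len(y1)

def check_predictions(y1, y2, converged1, converged2):
    if converged1 and converged2:
        # converged estimators must agree exactly
        assert list(y1) == list(y2)
    else:
        # otherwise tolerate a small discrepancy (threshold is arbitrary)
        assert agreement(y1, y2) > 0.95

check_predictions([0, 1, 1, 0], [0, 1, 1, 0], True, True)  # passes
```

Note that comparing labels position-by-position only makes sense because both estimators share the same initialization, so their component numbering lines up.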

@ogrisel
Member

ogrisel commented Oct 24, 2018

@jeremiedbb told me IRL that this test is unstable even on master: changing the seed makes it fail very often.

I think the cause of the problem is that in fit_predict we do the m_step after the e_step only if we have not converged.

I think this loop could be rewritten to ensure that we always end with an e_step, even when we do not converge.
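A simplified sketch of that restructure (generic Python, not sklearn's actual base.py; e_step and m_step are caller-supplied toy callables): the loop alternates e/m steps as before, but unconditionally finishes with an e_step, so the responsibilities used for the labels always reflect the final parameters:

```python
def fit_predict_sketch(e_step, m_step, params, X, max_iter, tol):
    """Sketch of an EM loop that always ends on an e_step.

    e_step(params, X) -> (responsibilities, lower_bound);
    m_step(params, X, responsibilities) -> new params.
    """
    prev_lower_bound = None
    for _ in range(max_iter):
        resp, lower_bound = e_step(params, X)
        if (prev_lower_bound is not None
                and abs(lower_bound - prev_lower_bound) < tol):
            break
        prev_lower_bound = lower_bound
        params = m_step(params, X, resp)
    # always finish on an e_step: the labels returned here now match what
    # predict() would compute from the final parameters
    resp, _ = e_step(params, X)
    labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
    return labels, params
```

With this shape, fit_predict and fit().predict() agree whether or not the tolerance was reached, because both evaluate responsibilities on the same final parameters.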

@ogrisel
Member

ogrisel commented Oct 24, 2018

I am working on a fix.

@ogrisel
Member

ogrisel commented Oct 24, 2018

I think we can close this in favor of #12451.

@oleksandr-pavlyk
Contributor Author

Agreed.
