[MRG+1] Integration of GSoC2015 Gaussian Mixture (first step) #6666

Merged
merged 2 commits into from
Apr 21, 2016

Conversation

tguillemot
Contributor

@ogrisel @agramfort @TomDLT I've created a new PR to solve the Travis problem with #6407.
Sorry for the noise.

http://www.mixmod.org
https://cran.r-project.org/web/views/Cluster.html (bgmm for instance)

@ogrisel Mixmod and bgmm don't divide the tolerance by n_samples.

for _ in range(100):
    prev_log_likelihood = current_log_likelihood
    current_log_likelihood = gmm.fit(X).score(X)
    assert_greater(current_log_likelihood, prev_log_likelihood)
Member

I think assert_greater_equal is safer here. There is no guarantee for strict monotonicity.

Member

Also this test should catch ConvergenceWarning explicitly.
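Catching an expected warning explicitly in a test could look like the following sketch. The `ConvergenceWarning` class and `fit_model` function here are stand-ins for illustration; the real tests would import `ConvergenceWarning` from scikit-learn and fit an actual `GaussianMixture`:

```python
import warnings

# Stand-in for sklearn's ConvergenceWarning (an assumption for this sketch).
class ConvergenceWarning(UserWarning):
    pass

def fit_model():
    # A hypothetical fit that is expected to emit a convergence warning.
    warnings.warn("Initialization 1 did not converge.", ConvergenceWarning)
    return "fitted-model"

# Record warnings instead of letting them leak into the test output.
with warnings.catch_warnings(record=True) as records:
    warnings.simplefilter("always")
    model = fit_model()

# The test can then assert that the expected warning was actually raised.
assert any(issubclass(r.category, ConvergenceWarning) for r in records)
```

This way the warning is both silenced in the test output and verified to occur.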

@ogrisel
Member

ogrisel commented Apr 15, 2016

When launching the tests I get a numerical underflow here:

sklearn.mixture.tests.test_gaussian_mixture.test_gaussian_mixture_estimate_log_prob_resp ... /home/ogrisel/code/scikit-learn/sklearn/utils/extmath.py:411: RuntimeWarning: underflow encountered in exp
  out = np.log(np.sum(np.exp(arr - vmax), axis=0))

this could easily be resolved with scipy.misc.logsumexp (I think).
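For context, the underflow in that line is benign: after the maximum is subtracted, `exp` of very negative terms flushes to zero, but the summand is below machine precision anyway, so the result stays correct. A small NumPy sketch with made-up values:

```python
import numpy as np

# Log-probabilities with a huge dynamic range (hypothetical values).
a = np.array([0.0, -800.0])

# The max-subtraction trick keeps the sum finite, but np.exp(-800.0)
# still underflows to 0.0.  The underflow is harmless here: the result
# is exact to machine precision.
with np.errstate(under='ignore'):
    vmax = a.max()
    val = np.log(np.sum(np.exp(a - vmax))) + vmax

print(val)  # -> 0.0
```

This is why silencing (or adapting the test around) the underflow warning can be an acceptable resolution.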

@ogrisel
Member

ogrisel commented Apr 15, 2016

There is another logsumexp issue here:

sklearn.mixture.tests.test_gaussian_mixture.test_gaussian_mixture_estimate_log_prob_resp ... /home/ogrisel/code/scikit-learn/sklearn/utils/extmath.py:411: RuntimeWarning: underflow encountered in exp
  out = np.log(np.sum(np.exp(arr - vmax), axis=0))

@ogrisel
Member

ogrisel commented Apr 15, 2016

This underflow might be more tricky but it would be great if you could have a look:

/home/ogrisel/code/scikit-learn/sklearn/mixture/tests/test_gaussian_mixture.py:448: RuntimeWarning: underflow encountered in exp
  resp = np.exp(log_resp)

@ogrisel
Member

ogrisel commented Apr 15, 2016

Please also make sure to filter the DeprecationWarnings for the old GMM class in the tests.

@ogrisel
Member

ogrisel commented Apr 15, 2016

There are also a couple of ConvergenceWarning that we should catch when they are expected (or fix if they are not expected):

sklearn.mixture.tests.test_gaussian_mixture.test_gaussian_mixture_aic_bic ... /home/ogrisel/code/scikit-learn/sklearn/mixture/base.py:218: ConvergenceWarning: Initialization 1 did not converged. Try different init parameters, or increase n_init, tol or check for degenerate data.
  % (init + 1), ConvergenceWarning)
ok
sklearn.mixture.tests.test_gaussian_mixture.test_warm_start ... /home/ogrisel/code/scikit-learn/sklearn/mixture/base.py:218: ConvergenceWarning: Initialization 1 did not converged. Try different init parameters, or increase n_init, tol or check for degenerate data.
  % (init + 1), ConvergenceWarning)

prev_log_likelihood = current_log_likelihood
current_log_likelihood = gmm.fit(X).score(X)
assert_greater_equal(current_log_likelihood, prev_log_likelihood)

Member

Maybe you could also add a check that gmm.n_iter_ is incremented by 1 at each iteration.

Contributor Author

I don't agree with this one.
For me, warm_start is a way to fit a new model that is close to the previously fitted one.
Consequently, n_iter_ corresponds to the number of iterations used to fit the new model.
I can change the implementation if you prefer.

@tguillemot
Contributor Author

this could easily be resolved with scipy.misc.logsumexp (I think).

In fact, the problem is the same with scipy.misc.logsumexp.
I have changed the test to solve this problem.

There is still another underflow in the regularisation test, but that one is expected: I've filtered it.

@tguillemot tguillemot force-pushed the GSoC-GMM-new branch 4 times, most recently from 1c72072 to e55a94f Compare April 18, 2016 13:16
@ogrisel
Member

ogrisel commented Apr 18, 2016

Instead of using a catch-all @ignore_warnings it would be great to selectively filter DeprecationWarnings and add an inline comment to state that those are tests for the deprecated GMM class.
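A selective filter along those lines might look like this generic sketch (plain `warnings` machinery, not the scikit-learn test utility itself):

```python
import warnings

# Tests for the deprecated GMM class: silence only DeprecationWarning
# instead of a catch-all ignore, so unrelated warnings still surface.
with warnings.catch_warnings(record=True) as records:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", category=DeprecationWarning)
    warnings.warn("GMM is deprecated", DeprecationWarning)  # filtered out
    warnings.warn("something else", UserWarning)            # still visible

categories = [r.category for r in records]
print(categories)
```

Because `simplefilter` inserts its entry at the front of the filter list, the `ignore` rule for DeprecationWarning takes precedence while every other category falls through to `always`.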

@ogrisel
Member

ogrisel commented Apr 18, 2016

This is also a good opportunity to review all those tests and check that there is an equivalent test for the GaussianMixture class when it makes sense.

@ogrisel
Member

ogrisel commented Apr 18, 2016

We still get the convergence warning by running most of the examples. I think the default value for tol is too strict. I think we should probably increase it to make it such that the GaussianMixture examples do not raise any convergence warning with the default convergence criterion.

@ogrisel
Member

ogrisel commented Apr 18, 2016

I would set the default tol to tol=1e-3 (don't forget to change the docstring of the GaussianMixture class accordingly) while also setting tol=1e-7 as non-default constructor argument only in the test_warm_start to check for stricter convergence.

@tguillemot
Contributor Author

Ok, I've added some tests to check the attribute values and to check that multiple inits give better or equal results.

raise ValueError("The algorithm has diverged because of too "
                 "few samples per components. "
                 "Try to decrease the number of components, or "
                 "increase reg_covar.")
Member

It would be better to make the error message more informative by including the current values of n_components and reg_covar.

Contributor Author

I'm writing down here what we discussed earlier, to keep a trace of the conversation on GitHub.
I prefer not to change that function, because I would have to add reg_covar as a parameter only for the warning message.

@ogrisel
Member

ogrisel commented Apr 19, 2016

@tguillemot this was my last batch of comments for this PR. +1 for merge once they are addressed ;)

if callable(obj):
    return _ignore_warnings(obj)
elif category is None:
    return _IgnoreWarnings()
Member

ogrisel commented Apr 20, 2016

What I had in mind was to instead extend _IgnoreWarnings to accept a category keyword that it would itself pass to its own underlying call to warnings.simplefilter in its __enter__ method.
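The suggested shape could look like the following hypothetical sketch (not the final scikit-learn implementation):

```python
import warnings

class _IgnoreWarnings(object):
    """Context manager that ignores warnings of a given category.

    Sketch of the suggestion above: `category` is stored at construction
    time and passed to warnings.simplefilter inside __enter__.
    """

    def __init__(self, category=Warning):
        self.category = category

    def __enter__(self):
        self._catcher = warnings.catch_warnings()
        self._catcher.__enter__()            # save the current filter state
        warnings.simplefilter("ignore", self.category)
        return self

    def __exit__(self, *exc_info):
        self._catcher.__exit__(*exc_info)    # restore the saved filters


# Only DeprecationWarning is silenced; other categories still propagate.
with warnings.catch_warnings(record=True) as records:
    warnings.simplefilter("always")
    with _IgnoreWarnings(category=DeprecationWarning):
        warnings.warn("old API", DeprecationWarning)   # ignored
        warnings.warn("heads up", UserWarning)         # recorded
```

Delegating the save/restore to an inner `catch_warnings` keeps the filter bookkeeping out of the class itself.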

@tguillemot tguillemot force-pushed the GSoC-GMM-new branch 2 times, most recently from aeabdee to 2a77da0 Compare April 21, 2016 15:41
@tguillemot
Contributor Author

Ok @ogrisel, it took a long time but I've modified ignore_warnings as you wanted.
I've added some tests to make sure it works as expected.


"""Improved and simplified Python warnings context manager

This class allows to ignore the warnings raise by a function.
Copied from Python 2.7.5 and modified as required.
Member

This is no longer a straight copy from Python 2.7.5. This part of the docstring needs to be updated.

@ogrisel
Member

ogrisel commented Apr 21, 2016

LGTM, +1 for merge once the docstring of _IgnoreWarnings has been updated and if the tests pass on travis + appveyor. Please also squash the commits and add an entry in whats_new.rst. Please document the deprecation in the relevant section for 0.18.

@agramfort @TomDLT do you like the warnings change?

Deprecation of the GMM class.

Modification of the GaussianMixture class.
  Some functions from the original GSoC code have been removed, renamed, or simplified.
  Some new functions have been introduced (such as the 'check_parameters' function).
  Some parameter names have been changed:
   - covars_ -> covariances_ : to be consistent with sklearn/covariances
  Addition of the 'warm_start' parameter, allowing a fit to reuse the previous computation.

The old examples have been modified to replace the deprecated GMM class with the new GaussianMixture class.
Every example uses the eigenvector norm to solve the ellipse scale problem (issue #6548).

Addressed all review comments from the PR:
- Rename MixtureBase -> BaseMixture
- Remove n_features_
- Fix some problems
- Add some tests

Fix the bic/aic test.

Fix test_check_means and test_check_covariances.

Remove all references to the deprecated GMM class.

Remove initialized_.
Add and correct docstrings.

Fix the order of random_state.

Fix a small typo.

Some fixes in anticipation of the integration of the new BayesianGaussianMixture class.

Modifications in preparation for the integration of the BayesianGaussianMixture class.
Add the 'best_n_iter' attribute.
Fix some bugs and tests.

Change the parameter order in the documentation.

Rename best_n_iter_ to n_iter_.

Fix the warm_start problem.
Fix the divergence error message.
Fix the random state init in the test file.
Fix the testing problems.

Update and add comments in the monotonicity test.
@TomDLT
Member

TomDLT commented Apr 21, 2016

@TomDLT do you like the warning change?

Yep, looks fine to me.

@tguillemot tguillemot changed the title [MRG] Integration of GSoC2015 Gaussian Mixture (first step) [MRG+1] Integration of GSoC2015 Gaussian Mixture (first step) Apr 21, 2016
@ogrisel ogrisel merged commit 6bc346b into scikit-learn:master Apr 21, 2016
@ogrisel
Member

ogrisel commented Apr 21, 2016

Thank you very much @tguillemot and @xuewei4d!

@GaelVaroquaux
Member

GaelVaroquaux commented Apr 21, 2016 via email

@ogrisel
Member

ogrisel commented Apr 21, 2016

🍻

@raghavrv
Member

🍻

@xuewei4d
Contributor

🍻!

And sorry for not helping with merging the code; I've been busy writing papers.

@agramfort
Member

🍻 ** 2

@tguillemot
Contributor Author

Yippee!!!
🎉
