[MRG+1] Integration of GSoC2015 Gaussian Mixture (first step) #6666

Merged
merged 2 commits into from
Apr 21, 2016

Conversation

tguillemot
Contributor

@ogrisel @agramfort @TomDLT I've created a new PR to solve the Travis problem with #6407.
Sorry for the noise.

http://www.mixmod.org
https://cran.r-project.org/web/views/Cluster.html (bgmm for instance)

@ogrisel Mixmod and bgmm don't divide the tolerance by n_samples.

for _ in range(100):
    prev_log_likelihood = current_log_likelihood
    current_log_likelihood = gmm.fit(X).score(X)
    assert_greater(current_log_likelihood, prev_log_likelihood)
Member

I think assert_greater_equal is safer here. There is no guarantee for strict monotonicity.

Member

Also this test should catch ConvergenceWarning explicitly.
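Catching an expected warning explicitly in a test could look like the following sketch. The `ConvergenceWarning` class and `fit_model` function here are stand-ins for illustration; the real tests would import `ConvergenceWarning` from scikit-learn and fit an actual `GaussianMixture`:

```python
import warnings

# Stand-in for sklearn's ConvergenceWarning (an assumption for this sketch).
class ConvergenceWarning(UserWarning):
    pass

def fit_model():
    # A hypothetical fit that is expected to emit a convergence warning.
    warnings.warn("Initialization 1 did not converge.", ConvergenceWarning)
    return "fitted-model"

# Record warnings instead of letting them leak into the test output.
with warnings.catch_warnings(record=True) as records:
    warnings.simplefilter("always")
    model = fit_model()

# The test can then assert that the expected warning was actually raised.
assert any(issubclass(r.category, ConvergenceWarning) for r in records)
```

This way the warning is both silenced in the test output and verified to occur.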

@ogrisel
Member

ogrisel commented Apr 15, 2016

When launching the tests I get a numerical underflow here:

sklearn.mixture.tests.test_gaussian_mixture.test_gaussian_mixture_estimate_log_prob_resp ... /home/ogrisel/code/scikit-learn/sklearn/utils/extmath.py:411: RuntimeWarning: underflow encountered in exp
  out = np.log(np.sum(np.exp(arr - vmax), axis=0))

this could easily be resolved with scipy.misc.logsumexp (I think).
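For context, the underflow in that line is benign: after the maximum is subtracted, `exp` of very negative terms flushes to zero, but the summand is below machine precision anyway, so the result stays correct. A small NumPy sketch with made-up values:

```python
import numpy as np

# Log-probabilities with a huge dynamic range (hypothetical values).
a = np.array([0.0, -800.0])

# The max-subtraction trick keeps the sum finite, but np.exp(-800.0)
# still underflows to 0.0.  The underflow is harmless here: the result
# is exact to machine precision.
with np.errstate(under='ignore'):
    vmax = a.max()
    val = np.log(np.sum(np.exp(a - vmax))) + vmax

print(val)  # -> 0.0
```

This is why silencing (or adapting the test around) the underflow warning can be an acceptable resolution.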

@ogrisel
Member

ogrisel commented Apr 15, 2016

There is another logsumexp issue here:

sklearn.mixture.tests.test_gaussian_mixture.test_gaussian_mixture_estimate_log_prob_resp ... /home/ogrisel/code/scikit-learn/sklearn/utils/extmath.py:411: RuntimeWarning: underflow encountered in exp
  out = np.log(np.sum(np.exp(arr - vmax), axis=0))

@ogrisel
Member

ogrisel commented Apr 15, 2016

This underflow might be more tricky but it would be great if you could have a look:

/home/ogrisel/code/scikit-learn/sklearn/mixture/tests/test_gaussian_mixture.py:448: RuntimeWarning: underflow encountered in exp
  resp = np.exp(log_resp)

@ogrisel
Member

ogrisel commented Apr 15, 2016

Please also make sure to filter the DeprecationWarnings for the old GMM class in the tests.

@ogrisel
Member

ogrisel commented Apr 15, 2016

There are also a couple of ConvergenceWarning that we should catch when they are expected (or fix if they are not expected):

sklearn.mixture.tests.test_gaussian_mixture.test_gaussian_mixture_aic_bic ... /home/ogrisel/code/scikit-learn/sklearn/mixture/base.py:218: ConvergenceWarning: Initialization 1 did not converged. Try different init parameters, or increase n_init, tol or check for degenerate data.
  % (init + 1), ConvergenceWarning)
ok
sklearn.mixture.tests.test_gaussian_mixture.test_warm_start ... /home/ogrisel/code/scikit-learn/sklearn/mixture/base.py:218: ConvergenceWarning: Initialization 1 did not converged. Try different init parameters, or increase n_init, tol or check for degenerate data.
  % (init + 1), ConvergenceWarning)

prev_log_likelihood = current_log_likelihood
current_log_likelihood = gmm.fit(X).score(X)
assert_greater_equal(current_log_likelihood, prev_log_likelihood)

Member

Maybe you could also add a check that gmm.n_iter_ is incremented by 1 at each iteration.

Contributor Author

I don't agree with this one.
For me, warm_start is a way to fit a new model that is close to the previously fitted one.
Consequently, n_iter_ corresponds to the number of iterations used to fit the new model.
I can change the implementation if you prefer.

@tguillemot
Contributor Author

this could easily be resolved with scipy.misc.logsumexp (I think).

In fact, the problem is the same with scipy.misc.logsumexp.
I have changed the test to solve this problem.

There is still another underflow in the regularisation test, but that one is expected: I've filtered it.

@tguillemot tguillemot force-pushed the GSoC-GMM-new branch 4 times, most recently from 1c72072 to e55a94f Compare April 18, 2016 13:16
@ogrisel
Member

ogrisel commented Apr 18, 2016

Instead of using a catch-all @ignore_warnings it would be great to selectively filter DeprecationWarnings and add an inline comment to state that those are tests for the deprecated GMM class.
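A selective filter along those lines might look like this generic sketch (plain `warnings` machinery, not the scikit-learn test utility itself):

```python
import warnings

# Tests for the deprecated GMM class: silence only DeprecationWarning
# instead of a catch-all ignore, so unrelated warnings still surface.
with warnings.catch_warnings(record=True) as records:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", category=DeprecationWarning)
    warnings.warn("GMM is deprecated", DeprecationWarning)  # filtered out
    warnings.warn("something else", UserWarning)            # still visible

categories = [r.category for r in records]
print(categories)
```

Because `simplefilter` inserts its entry at the front of the filter list, the `ignore` rule for DeprecationWarning takes precedence while every other category falls through to `always`.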

@ogrisel
Member

ogrisel commented Apr 18, 2016

This is also a good opportunity to review all those tests and check that there is an equivalent test for the GaussianMixture class when it makes sense.

@ogrisel
Member

ogrisel commented Apr 18, 2016

We still get the convergence warning by running most of the examples. I think the default value for tol is too strict. I think we should probably increase it to make it such that the GaussianMixture examples do not raise any convergence warning with the default convergence criterion.

@ogrisel
Member

ogrisel commented Apr 18, 2016

I would set the default tol to tol=1e-3 (don't forget to change the docstring of the GaussianMixture class accordingly) while also setting tol=1e-7 as non-default constructor argument only in the test_warm_start to check for stricter convergence.

@tguillemot
Contributor Author

Ok, I've added some tests to check the attribute values and to check that multiple inits give better or equal results.

raise ValueError("The algorithm has diverged because of too "
                 "few samples per components. "
                 "Try to decrease the number of components, or "
                 "increase reg_covar.")
Member

It would be better to make the error message more informative by including the current values of n_components and reg_covar.

Contributor Author

I'm writing down here what we discussed earlier, to keep a trace of the conversation on GitHub.
I prefer not to change that function, because I would have to add reg_covar as a parameter only for the warning message.

@ogrisel
Member

ogrisel commented Apr 19, 2016

@tguillemot this was my last batch of comments for this PR. +1 for merge once they are addressed ;)

if callable(obj):
    return _ignore_warnings(obj)
elif category is None:
    return _IgnoreWarnings()
Member

ogrisel commented Apr 20, 2016

What I had in mind was to instead extend _IgnoreWarnings to accept a category keyword that it would itself pass to its own underlying call to warnings.simplefilter in its __enter__ method.
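The suggested shape could look like the following hypothetical sketch (not the final scikit-learn implementation):

```python
import warnings

class _IgnoreWarnings(object):
    """Context manager that ignores warnings of a given category.

    Sketch of the suggestion above: `category` is stored at construction
    time and passed to warnings.simplefilter inside __enter__.
    """

    def __init__(self, category=Warning):
        self.category = category

    def __enter__(self):
        self._catcher = warnings.catch_warnings()
        self._catcher.__enter__()            # save the current filter state
        warnings.simplefilter("ignore", self.category)
        return self

    def __exit__(self, *exc_info):
        self._catcher.__exit__(*exc_info)    # restore the saved filters


# Only DeprecationWarning is silenced; other categories still propagate.
with warnings.catch_warnings(record=True) as records:
    warnings.simplefilter("always")
    with _IgnoreWarnings(category=DeprecationWarning):
        warnings.warn("old API", DeprecationWarning)   # ignored
        warnings.warn("heads up", UserWarning)         # recorded
```

Delegating the save/restore to an inner `catch_warnings` keeps the filter bookkeeping out of the class itself.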

@tguillemot tguillemot force-pushed the GSoC-GMM-new branch 2 times, most recently from aeabdee to 2a77da0 Compare April 21, 2016 15:41
@tguillemot
Contributor Author

Ok @ogrisel, it took a long time but I've modified ignore_warnings as you wanted.
I've added some tests to make sure it works as expected.


"""Improved and simplified Python warnings context manager

This class allows to ignore the warnings raise by a function.
Copied from Python 2.7.5 and modified as required.
Member

This is no longer a straight copy from Python 2.7.5. This part of the docstring needs to be updated.

@ogrisel
Member

ogrisel commented Apr 21, 2016

LGTM, +1 for merge once the docstring of _IgnoreWarnings has been updated and if the tests pass on travis + appveyor. Please also squash the commits and add an entry in whats_new.rst. Please document the deprecation in the relevant section for 0.18.

@agramfort @TomDLT do you like the warnings change?

Deprecation of the GMM class.

Modification of the GaussianMixture class.
  Some functions from the original GSoC code have been removed, renamed, or simplified.
  Some new functions have been introduced (such as the 'check_parameters' function).
  Some parameter names have been changed:
   - covars_ -> covariances_ : to be consistent with sklearn/covariances
  Addition of the 'warm_start' parameter, allowing a fit to reuse the previous computation.

The old examples have been modified to replace the deprecated GMM class with the new GaussianMixture class.
Every example uses the eigenvector norm to solve the ellipse scale problem (issue #6548).

Addressed all review comments from the PR:
- Rename MixtureBase -> BaseMixture
- Remove n_features_
- Fix some problems
- Add some tests

Fix the bic/aic test.

Fix test_check_means and test_check_covariances.

Remove all references to the deprecated GMM class.

Remove initialized_.
Add and correct docstrings.

Fix the order of random_state.

Fix a small typo.

Some fixes in anticipation of the integration of the new BayesianGaussianMixture class.

Modifications in preparation for the integration of the BayesianGaussianMixture class.
Add the 'best_n_iter' attribute.
Fix some bugs and tests.

Change the parameter order in the documentation.

Rename best_n_iter_ to n_iter_.

Fix the warm_start problem.
Fix the divergence error message.
Fix the random state init in the test file.
Fix the testing problems.

Update and add comments in the monotonicity test.
@TomDLT
Member

TomDLT commented Apr 21, 2016

@TomDLT do you like the warning change?

Yep, looks fine to me.

@tguillemot tguillemot changed the title [MRG] Integration of GSoC2015 Gaussian Mixture (first step) [MRG+1] Integration of GSoC2015 Gaussian Mixture (first step) Apr 21, 2016
@ogrisel ogrisel merged commit 6bc346b into scikit-learn:master Apr 21, 2016
@ogrisel
Member

ogrisel commented Apr 21, 2016

Thank you very much @tguillemot and @xuewei4d!

@GaelVaroquaux
Member

GaelVaroquaux commented Apr 21, 2016 via email

@ogrisel
Member

ogrisel commented Apr 21, 2016

🍻

@raghavrv
Member

🍻

@xuewei4d
Contributor

🍻!

And sorry for not helping with merging the code; I've been busy writing papers.

@agramfort
Member

🍻 ** 2

@tguillemot
Contributor Author

Yippee!!!
🎉
