[MRG+1] Integration of GSoC2015 Gaussian Mixture (first step) #6666
Conversation
    for _ in range(100):
        prev_log_likelihood = current_log_likelihood
        current_log_likelihood = gmm.fit(X).score(X)
        assert_greater(current_log_likelihood, prev_log_likelihood)
I think assert_greater_equal is safer here. There is no guarantee of strict monotonicity.
Also this test should catch ConvergenceWarning explicitly.
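For concreteness, here is a minimal sketch of the loop with both suggestions applied (the non-strict comparison plus an explicit ConvergenceWarning filter). The fixture setup with warm_start=True and max_iter=1, and the toy data X, are assumptions for illustration, not the PR's actual test code:

    import warnings
    import numpy as np
    from sklearn.exceptions import ConvergenceWarning
    from sklearn.mixture import GaussianMixture

    X = np.random.RandomState(0).randn(500, 2)  # assumed toy data
    # warm_start=True with max_iter=1 makes each fit() run a single EM step.
    gmm = GaussianMixture(n_components=2, warm_start=True, max_iter=1,
                          random_state=0)
    current_log_likelihood = -np.inf
    with warnings.catch_warnings():
        # Single-step fits never report convergence, so the warning is expected.
        warnings.simplefilter("ignore", ConvergenceWarning)
        for _ in range(100):
            prev_log_likelihood = current_log_likelihood
            current_log_likelihood = gmm.fit(X).score(X)
            # EM is monotone but can stall, hence >= rather than strict >.
            assert current_log_likelihood >= prev_log_likelihood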
When launching the tests I get a numerical underflow here; this could easily be resolved with […]
There is another […]
This underflow might be more tricky but it would be great if you could have a look: […]
Please also make sure to filter the […]
There are also a couple of […]
    prev_log_likelihood = current_log_likelihood
    current_log_likelihood = gmm.fit(X).score(X)
    assert_greater_equal(current_log_likelihood, prev_log_likelihood)
Maybe you could also add a check that gmm.n_iter_ is incremented by 1 at each iteration.
I don't agree with this one. To me, warm_start is a way to fit another model that is close to the previously fitted model. Consequently, n_iter_ corresponds to the number of iterations used to fit the new model. I can change the way it's implemented if you prefer.
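To make the two readings concrete, a small hypothetical illustration (assuming warm_start=True with max_iter=1, so each fit runs one EM step):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    X = np.random.RandomState(0).randn(100, 2)  # assumed toy data
    gmm = GaussianMixture(n_components=2, warm_start=True, max_iter=1)
    gmm.fit(X)  # first EM step
    gmm.fit(X)  # second EM step, continuing from the previous parameters
    # Reviewer's reading: n_iter_ accumulates across warm starts -> would equal 2.
    # Author's reading:   n_iter_ counts only the latest fit     -> equals 1.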
In fact, the problem is the same with scipy.misc.logsumexp. There is still another underflow in the regularisation test, but that one is expected: I've filtered it.
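For context, this is the underflow pattern that logsumexp sidesteps (a standalone illustration, not code from the PR; in recent SciPy the function lives in scipy.special rather than scipy.misc):

    import numpy as np
    from scipy.special import logsumexp  # formerly scipy.misc.logsumexp

    log_prob = np.array([-1000.0, -1001.0])
    # Naive evaluation: exp(-1000) underflows to 0.0, so the log becomes -inf.
    naive = np.log(np.sum(np.exp(log_prob)))
    # Shifted evaluation stays finite: max + log(sum(exp(x - max))) ~ -999.69.
    stable = logsumexp(log_prob)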
Force-pushed from 1c72072 to e55a94f
Instead of using a catch-all […]
This is also a good opportunity to review all those tests and check that there is an equivalent test for the […]
We still get the convergence warning when running most of the examples. I think the default value for […]
I would set the default tol to […]
Force-pushed from 3e3e1ab to 4965259
OK, I've added some tests to check the attribute values and that multiple inits give better or equal results.
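A sketch of what such a multiple-init check could look like (assumed setup; the actual tests in the PR may differ). With a shared random_state, the first of the n_init runs reproduces the single-init run, so the best of several inits can only match or beat it:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    X = np.random.RandomState(0).randn(200, 2)  # assumed toy data
    ll_1 = GaussianMixture(n_components=3, n_init=1,
                           random_state=0).fit(X).score(X)
    ll_5 = GaussianMixture(n_components=3, n_init=5,
                           random_state=0).fit(X).score(X)
    assert ll_5 >= ll_1  # best of five inits is at least as good as one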
    raise ValueError("The algorithm has diverged because of too "
                     "few samples per component. "
                     "Try to decrease the number of components, or "
                     "increase reg_covar.")
It would be better to make the error message more informative by including the current values of n_components and reg_covar.
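One possible shape for the more informative message (hypothetical wording; n_components and reg_covar are assumed to be in scope at the raise site):

    raise ValueError(
        "The algorithm has diverged because of too few samples per "
        "component. Try to decrease the number of components "
        "(n_components=%d), or increase reg_covar (reg_covar=%.3e)."
        % (n_components, reg_covar))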
I'm writing here what we discussed earlier, to keep a trace of the conversation on GitHub. I'd prefer not to change that function, because I would have to add reg_covar as a parameter only for the error message.
@tguillemot this was my last batch of comments for this PR. +1 for merge once they are addressed ;)
    if callable(obj):
        return _ignore_warnings(obj)
   -else:
   +elif category is None:
        return _IgnoreWarnings()
What I had in mind was instead to extend _IgnoreWarnings to accept a category keyword that it would itself pass to its own underlying call to warnings.simplefilter in its __enter__ method.
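A minimal sketch of that design (class and helper names follow the diff above; the real scikit-learn testing helper may differ in detail):

    import warnings

    class _IgnoreWarnings(object):
        """Context manager that ignores warnings, optionally of one category."""

        def __init__(self, category=Warning):
            self.category = category
            self._catcher = None

        def __enter__(self):
            self._catcher = warnings.catch_warnings()
            self._catcher.__enter__()
            # Pass the category down to simplefilter, as suggested above.
            warnings.simplefilter("ignore", self.category)
            return self

        def __exit__(self, *exc_info):
            return self._catcher.__exit__(*exc_info)

Usage would then look like `with _IgnoreWarnings(category=ConvergenceWarning): gmm.fit(X)`.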
Force-pushed from aeabdee to 2a77da0
Force-pushed from 2a77da0 to 06bf29e
OK @ogrisel, it took a long time but I've modified the […]
    """Improved and simplified Python warnings context manager.

    This class allows to ignore the warnings raised by a function.
    Copied from Python 2.7.5 and modified as required.
This is no longer a straight copy from Python 2.7.5. This part of the docstring needs to be updated.
LGTM, +1 for merge once the docstring is updated. @agramfort @TomDLT do you like the warnings change?
Deprecation of the GMM class; modification of the GaussianMixture class:
- Some functions from the original GSoC code have been removed, renamed, or simplified. Some new functions have been introduced (such as the 'check_parameters' function).
- Some parameter names have been changed: covars_ -> covariances_, to be consistent with sklearn/covariance.
- Addition of the 'warm_start' parameter, allowing data to be fit by reusing the previous computation.
- The old examples have been modified to replace the deprecated GMM class with the new GaussianMixture class. Every example uses the eigenvector norm to solve the ellipse scaling problem (issue #6548).
- Correction of all review comments from the PR: rename MixtureBase -> BaseMixture, remove n_features_, fix some problems, add some tests.
- Correction of the bic/aic test. Fix test_check_means and test_check_covariances.
- Remove all references to the deprecated GMM class. Remove initialized_. Add and correct docstrings. Correct the order of random_state. Fix a small typo.
- Some fixes in anticipation of the integration of the new BayesianGaussianMixture class. Modifications in preparation for the integration of the BayesianGaussianMixture class.
- Add the 'best_n_iter' attribute. Fix some bugs and tests. Change the parameter order in the documentation. Rename best_n_iter_ to n_iter_.
- Fix the warm_start problem. Fix the divergence error message.
- Correction of the random_state init in the test file. Fix the testing problems.
- Update and add comments in the monotonicity test.
Yep, looks fine to me.
Force-pushed from 06bf29e to 194ded1
Force-pushed from 194ded1 to 6fb6c63
Thank you very much @tguillemot and @xuewei4d!
🍻!
🍻
🍻
🍻! And sorry for not merging code. I am busy writing papers.
🍻 ** 2
Woohoo!!!
@ogrisel @agramfort @TomDLT I've created a new PR to solve the Travis problem of #6407. Sorry for the noise.
@ogrisel Mixmod and bgmm don't divide the tolerance by n_samples.
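Schematically, the two conventions differ in what tol is compared against (assumed variable names, not the PR's actual code):

    # Mixmod/bgmm: the change in the (total) criterion is compared to tol as-is.
    converged_total = abs(curr_ll - prev_ll) < tol
    # The alternative scales by n_samples, so tol applies to the per-sample change.
    converged_per_sample = abs(curr_ll - prev_ll) / n_samples < tol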