[MRG] Integration of GSoC2015 Gaussian Mixture (first step) #6407
Conversation
Force-pushed from 4476dc3 to 2f1200e.
```python
    -------
    X : array, (n_samples, n_features)
    """
    X = check_array(X, dtype=[np.float64, np.float32], ensure_2d=False)
```
Why not have `ensure_2d=True, allow_nd=False`?
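A minimal sketch of the suggested validation, assuming a hypothetical `_check_X` helper (`check_array`, `ensure_2d`, and `allow_nd` are real scikit-learn API):

```python
import numpy as np
from sklearn.utils import check_array

def _check_X(X):
    # Reject 1-D and >2-D inputs up front instead of reshaping later.
    return check_array(X, dtype=[np.float64, np.float32],
                       ensure_2d=True, allow_nd=False)
```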
Also, if you check X here, you can remove the check in `GMM._fit`.
I don't understand why you work on
I think the goal is to make the diff readable, and then to open a separate PR with a simple file renaming.
Well, the new classes in my PR have huge differences. That's not a small amount of diff.
@TomDLT Thanks for your comments, I will correct all those things quickly.
Indeed. The only thing I've done for now is to add the class `_MixtureBase` and modify the old `GMM` class to derive from it. Next, I plan to do another PR to move `_MixtureBase` into `base.py`. @xuewei4d Maybe I didn't explain my goal correctly, and I'm sorry for that. Of course your PR has huge differences, and obviously I have noticed that. My purpose is to merge all your work. For me, the best solution is to merge it step by step, which is what I've proposed with this PR. Of course it's just a small part of your work, but it's easier to analyse it that way.
The problem is how to deprecate the old `GMM` class. So I would suggest to:
Force-pushed from e7841e7 to a1cea27.
```diff
@@ -112,7 +114,7 @@ def sample_gaussian(mean, covar, covariance_type='diag', n_samples=1,
     return (rand.T + mean).T


-class GMM(BaseEstimator):
+class _GMMBase(_MixtureBase):
```
This should not inherit from the new `_MixtureBase` class, just from `BaseEstimator` as previously. We do not want to change the implementation and behavior of the old `GMM` class in any way besides issuing a deprecation warning in `__init__`.
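A minimal sketch of what is being asked for, under the usual `warnings` pattern (hypothetical code, not the actual scikit-learn implementation):

```python
import warnings
from sklearn.base import BaseEstimator

class GMM(BaseEstimator):
    """Old estimator, unchanged except for a deprecation warning."""

    def __init__(self, n_components=1, covariance_type='diag'):
        # The only behavioral change: warn at construction time.
        warnings.warn("GMM is deprecated; use GaussianMixture instead.",
                      DeprecationWarning)
        self.n_components = n_components
        self.covariance_type = covariance_type
```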
In that case, I have to send all the code of `_MixtureBase` and `GaussianMixture` in the same PR. To simplify the review, I can propose to split the code into different commits. I don't see a better way to avoid pushing 1000 lines at once.
Yes, but just the code for `GaussianMixture` (not the variational Bayes subclasses).
@tguillemot any news on this PR?
Sorry for being late. I'll send everything before the end of the day.
Force-pushed from a7922ff to a878b60.
```python
        return logsumexp(self._estimate_weighted_log_prob(X), axis=1)

    def score(self, X, y=None):
        """Compute the log-likelihood of the model on the given data X.
```
The docstring is not correct: this is the per-sample average log-likelihood, as we use `p.mean()` instead of `p.sum()`.
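For illustration, a hedged check of that distinction against the public `GaussianMixture` API, where `score` returns the per-sample average and `score_samples` the per-sample values:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.RandomState(0).randn(100, 2)
gm = GaussianMixture(n_components=2, random_state=0).fit(X)

per_sample = gm.score_samples(X)                   # log-likelihood per sample
assert np.isclose(gm.score(X), per_sample.mean())  # the average, not the sum
```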
Alright, but I don't understand why we divide `tol` by `n_samples`. Also, shouldn't we divide the log-likelihood change by `n_samples`?
My mistake: in the old class it's not divided by `n_samples`.
OK, great, that makes sense to me then. It would be good to check how other packages check convergence for GMMs trained with EM: http://www.mixmod.org
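A hedged sketch of the convergence rule under discussion; `em_step` and `mean_log_likelihood` are hypothetical callables. The point is only that testing the change in the per-sample average keeps `tol` independent of `n_samples`:

```python
def fit_em(X, params, em_step, mean_log_likelihood, tol=1e-3, max_iter=100):
    prev = float("-inf")
    for _ in range(max_iter):
        params = em_step(X, params)            # one E-step + M-step
        curr = mean_log_likelihood(X, params)  # averaged over samples
        if abs(curr - prev) < tol:             # change in the *average*
            break
        prev = curr
    return params
```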
```python
for cov_type in COVARIANCE_TYPE:
    gmm = GaussianMixture(n_components=n_samples, covariance_type=cov_type,
                          reg_covar=0)
    assert_raise_message(ValueError,
```
I wonder if we should not introduce a dedicated exception such as `DivergenceError` in `sklearn.exceptions`, but maybe we can stick to `ValueError` for now.
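A minimal sketch of that idea; `DivergenceError` is only the proposed name and does not exist in `sklearn.exceptions`. Deriving from `ValueError` would keep existing `except ValueError` handlers working:

```python
class DivergenceError(ValueError):
    """Raised when EM fails to find a valid mixture fit."""

def _check_converged(converged):
    # Hypothetical helper showing where such an error would be raised.
    if not converged:
        raise DivergenceError(
            "Fitting the mixture model failed to converge; try different "
            "initialization parameters or increase reg_covar.")
```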
I will do another PR for that once we have finished with the GMM, OK?
Force-pushed from 2930805 to 13a8ccf.
OK, I need help solving the Travis problem. Do you have any explanation?
This is weird. Maybe try to squash and rebase on top of master.
Force-pushed from b0eed8a to 628e6a5.
Another solution could be to propose a new PR with a new branch.
Or:

```bash
# delete your branch
git checkout GSoC-integration-step1
git checkout -b GSoC-integration-step1_backup
git branch -D GSoC-integration-step1

# create new branch with same name
git checkout master
git checkout -b GSoC-integration-step1

# cherry-pick and push
git cherry-pick 628e6a56c861aa582cf7f6d4032c9bff24bd07de
git push -f origin GSoC-integration-step1
git branch -D GSoC-integration-step1_backup
```

Maybe do some
Force-pushed from 9f7ece7 to 2201ef7.
Deprecation of the GMM class; modification of the GaussianMixture class:

- Some functions from the original GSoC code have been removed, renamed, or simplified. Some new functions have been introduced (such as the `check_parameters` function).
- Some parameter names have been changed: `covars_` -> `covariances_`, to be coherent with `sklearn/covariance`.
- Addition of the `warm_start` parameter, allowing a fit to reuse the previous computation.
- The old examples have been modified to replace the deprecated GMM class with the new GaussianMixture class. Every example uses the eigenvector norms to solve the ellipse scale problem (issue #6548).
- Correction of all review comments on the PR: rename `MixtureBase` -> `BaseMixture`, remove `n_features_`, fix some problems, add some tests.
- Correction of the bic/aic test; fix `test_check_means` and `test_check_covariances`.
- Remove all references to the deprecated GMM class; remove `initialized_`.
- Add and correct docstrings; correct the order of `random_state`; fix a small typo.
- Some fixes in preparation for the integration of the new BayesianGaussianMixture class.
- Add a `best_n_iter` attribute; later rename `best_n_iter_` to `n_iter_`.
- Fix some bugs and tests; change the parameter order in the documentation.
- Fix the `warm_start` problem; fix the divergence error message.
- Correct the random-state initialization in the test file; fix the testing problems.
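A hedged usage example of the `warm_start` behavior described above, using the public `GaussianMixture` API:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X1 = rng.randn(200, 2)
X2 = rng.randn(200, 2) + 1.0

gm = GaussianMixture(n_components=2, warm_start=True, random_state=0)
gm.fit(X1)         # first fit: regular initialization
gm.fit(X2)         # warm_start=True: starts from the previous solution
print(gm.n_iter_)  # EM iterations used by the last fit
```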
Force-pushed from 2201ef7 to 12d66fc.
OK, I've tried your solution and also tried to do it manually, and it still doesn't work.
Interestingly, you solved the problem by adding the file, but un-solved it by removing the file (only on py35, though).
To work around the strange Travis bug, I've created another PR: #6666.
@xuewei4d has done huge work on the GM classes.
Nevertheless, it seems that #4802 is not integrated yet. I thought this was because of the length of the contribution and the difficulty of measuring the efficiency of the new classes with respect to the previous ones.
I propose to integrate this work step by step, in several pull requests (400 lines max each time).
This pull request introduces the new abstract class `_MixtureBase` with some common GMM attributes and methods. Now, `GMM` derives from `_MixtureBase`.
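A hedged sketch of the shape of such a base class (method names are illustrative, not the PR's actual API):

```python
from abc import ABCMeta, abstractmethod
from sklearn.base import BaseEstimator

class _MixtureBase(BaseEstimator, metaclass=ABCMeta):
    """Shared EM machinery; concrete mixtures fill in the model details."""

    @abstractmethod
    def _e_step(self, X):
        """Return the responsibility of each component for each sample."""

    @abstractmethod
    def _m_step(self, X, responsibilities):
        """Update the model parameters from the responsibilities."""
```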
@ogrisel @amueller @agramfort