[MRG+1] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second step) #6651
# XXX @xuewei4d I think you forgot n_components in your code?
temp1 = (.5 * np.sum(temp1) +
         self.n_components * self._log_gaussian_norm_prior)
@xuewei4d I think you forgot to multiply `log_gaussian_norm` by `n_components`. Could you confirm it for the 4 functions, please?
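To make the point concrete: if every component mean shares the same Gaussian prior, the log-normalization constant of that prior enters the sum once per component, so dropping the `n_components` factor undercounts it. The NumPy sketch below assumes a single shared prior covariance `S0` for illustration (in Bishop's Gaussian-Wishart setup the prior precision of each mean also depends on that component's precision); `log_gaussian_norm`, `m0` and `S0` are hypothetical names, not the PR's code.

```python
import numpy as np


def log_gaussian_norm(cov):
    """Log normalization constant of a multivariate Gaussian with covariance `cov`."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet)


def log_gaussian_pdf(x, mean, cov):
    """Full log-density log N(x | mean, cov) for a single vector x."""
    diff = x - mean
    maha = diff @ np.linalg.solve(cov, diff)
    return log_gaussian_norm(cov) - 0.5 * maha


rng = np.random.RandomState(0)
n_components, n_features = 4, 3
m0 = np.zeros(n_features)          # shared prior mean (hypothetical)
S0 = 2.0 * np.eye(n_features)      # shared prior covariance (hypothetical)
means = rng.randn(n_components, n_features)

# Summing the per-component log-densities directly...
total = sum(log_gaussian_pdf(mu, m0, S0) for mu in means)

# ...equals n_components copies of the normalization constant plus the
# sum of the quadratic (Mahalanobis) terms.
quad = sum((mu - m0) @ np.linalg.solve(S0, mu - m0) for mu in means)
decomposed = n_components * log_gaussian_norm(S0) - 0.5 * quad

print(np.isclose(total, decomposed))
```

Using the constant only once instead of `n_components` times would shift the bound by `(n_components - 1) * log_gaussian_norm(S0)`, which is exactly the kind of discrepancy the comment asks to check.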
I checked it. I didn't forget it on line 791 of my PR.
@tguillemot could you please rebase/squash on top of the current master to take the recent changes from #6666 into account in this PR?
@tguillemot This PR needs to be updated to take the precision-based parametrization into account:
@ogrisel I've pushed my latest commit, but I'm working on another PR for the moment.
I've solved the problem with VBGMM. I still have to do some cleaning, but I think it will be ready to merge next week.
@tguillemot Sure. Can I have your email address?
Is it expected that when increasing
@xuewei4d Thanks for the formula. @ngoix The current version of this PR is not working well and has a lot of problems, so I suspect that is the cause.
@ngoix The BayesianGaussianMixture code is now fixed.
@tguillemot Can I have the PDF with the updated formulas?
It may be due to my data, but now the number of components found is always maximal (even with
Whoops, sorry: it does not always find the maximal number of components.
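Whether a fit actually "uses" all of its components can be read off the fitted mixture weights. A minimal sketch, assuming the current scikit-learn API (`BayesianGaussianMixture` and its `weights_` attribute); the synthetic blobs and the 0.01 threshold are arbitrary illustration choices, and the count obtained depends on the data, the initialization and the weight concentration prior.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# Two well-separated 2-D blobs (synthetic data, for illustration only).
X = np.vstack([rng.randn(200, 2) - 5, rng.randn(200, 2) + 5])

# Deliberately ask for more components than the data needs.
bgmm = BayesianGaussianMixture(n_components=6, max_iter=500, random_state=0)
bgmm.fit(X)

# Components whose weight stays above a small (arbitrary) threshold are
# the ones the fitted model effectively uses.
n_effective = int(np.sum(bgmm.weights_ > 0.01))
print(bgmm.weights_.round(3))
print("effective components:", n_effective)
```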
@xuewei4d I haven't corrected the LaTeX formulas yet. I will put everything on scikit-learn once it is done. @ngoix This method is an EM-like algorithm and converges to a local optimum. If the initialization is not good, you may never reach the global optimum.
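Since the algorithm only reaches a local optimum of the variational lower bound, a common mitigation is to run several initializations and keep the best one. A sketch assuming the current scikit-learn API, where `n_init` sets the number of restarts and `lower_bound_` stores the best bound reached; the data are synthetic.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(42)
# Three overlapping 2-D blobs (synthetic data, for illustration only).
X = np.vstack([rng.randn(150, 2) + c for c in ([0, 0], [3, 3], [0, 4])])

# One initialization versus the best of five initializations.
single = BayesianGaussianMixture(n_components=5, n_init=1, max_iter=500,
                                 random_state=0).fit(X)
multi = BayesianGaussianMixture(n_components=5, n_init=5, max_iter=500,
                                random_state=0).fit(X)

print("single init lower bound:", single.lower_bound_)
print("best of 5 lower bound:  ", multi.lower_bound_)
```

With the same `random_state`, the first restart of the multi-init run should start from the same point as the single run, so the best-of-five bound is expected to be at least as high.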
@agramfort @amueller @ogrisel BayesianGaussianMixture is mergeable.
random_state = check_random_state(self.random_state)

if self.init_params == 'kmeans':
    resp = np.zeros((n_samples, self.n_components))
    label = cluster.KMeans(n_clusters=self.n_components, n_init=1,
                           random_state=random_state).fit(X).labels_
                           random_state=0).fit(X).labels_
0 -> random_state: please pass `random_state` here instead of the hard-coded `0`.
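The review point is that the KMeans-based initialization should consume the estimator's own `random_state` rather than a fixed seed, so that results are reproducible for a given seed while different seeds (or repeated initializations) explore different starting points. A sketch of that step as a standalone function; `kmeans_init_resp` is a hypothetical helper, while `KMeans` and `check_random_state` are real scikit-learn utilities.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.utils import check_random_state


def kmeans_init_resp(X, n_components, random_state=None):
    """Hypothetical helper: one-hot responsibilities from a single KMeans run."""
    random_state = check_random_state(random_state)
    n_samples = X.shape[0]
    resp = np.zeros((n_samples, n_components))
    # Thread the caller's random_state through instead of a hard-coded 0.
    labels = KMeans(n_clusters=n_components, n_init=1,
                    random_state=random_state).fit(X).labels_
    # Each sample gets responsibility 1 for its KMeans cluster.
    resp[np.arange(n_samples), labels] = 1
    return resp


X = np.random.RandomState(0).randn(100, 2)
resp = kmeans_init_resp(X, n_components=3, random_state=7)
print(resp.shape, resp.sum(axis=1).min(), resp.sum(axis=1).max())
```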
@@ -248,12 +247,14 @@ def _e_step(self, X):
Returns
-------
log_prob_norm : array, shape (n_samples,)
    log p(X)
    Logarithm of the probability of X.
Logarithm of the probability of each sample in X.
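The suggested wording matches the shape `(n_samples,)`: the quantity is one value per sample, log p(x_i), obtained by a log-sum-exp over the weighted component log-densities. A NumPy-only sketch for a full-covariance Gaussian mixture with a hand-written, numerically stable log-sum-exp; the function names and toy parameters are hypothetical and this is not the PR's `_e_step`.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """log N(x_i | mean, cov) for every row x_i of X, shape (n_samples,)."""
    d = X.shape[1]
    _, logdet = np.linalg.slogdet(cov)
    diff = X - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def log_prob_norm(X, weights, means, covs):
    """log p(x_i) = logsumexp_k [log w_k + log N(x_i | mu_k, Sigma_k)]."""
    weighted = np.column_stack([
        np.log(w) + log_gaussian(X, m, c)
        for w, m, c in zip(weights, means, covs)
    ])  # shape (n_samples, n_components)
    # Subtract the row maximum before exponentiating to avoid underflow.
    vmax = weighted.max(axis=1, keepdims=True)
    return (vmax + np.log(np.exp(weighted - vmax).sum(axis=1, keepdims=True))).ravel()


rng = np.random.RandomState(0)
X = rng.randn(5, 2)
weights = np.array([0.3, 0.7])
means = np.array([[0.0, 0.0], [2.0, -1.0]])
covs = np.array([[[1.0, 0.0], [0.0, 1.0]],
                 [[2.0, 0.5], [0.5, 1.0]]])
lp = log_prob_norm(X, weights, means, covs)
print(lp.shape)  # one value per sample
```

Subtracting `lp[:, None]` from the weighted per-component log-probabilities gives the log-responsibilities of a standard EM E-step.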
Let's merge now. Thanks for all your efforts @tguillemot!
And also thank you again @xuewei4d for the initial code refactoring and maths derivations.
Thanks @tguillemot!
Hurrah!! On 31 August 2016 at 08:28, Wei Xue notifications@github.com wrote:
Hurrah !!!!!!!! Thanks everyone !!!
yay! 🍻
awesome :) Thanks everyone!
:class:`BayesianGaussianMixture`. The new class solves the computational
problems of the old class and computes the Variational Bayesian Gaussian
mixture faster than before.
Ref :ref:`b` for more information.
@tguillemot what is `b` supposed to reference? It's a dead link.
This will be fixed with #7295.
…step) (scikit-learn#6651)

* Add the new BayesianGaussianMixture class. Add the test file for the BayesianGaussianMixture.
* Add the use of the cholesky decomposition of the precision matrix.
* Fix some bugs.
* Modification of GaussianMixture class. The purpose here is to prepare the integration of BayesianGaussianMixture.
* Fix comments.
* Modification of the Docstring.
* Add license and author.
* Fix pb typo of eq 10.64 and 10.62.
* Correct VBGMM bugs.
* Fix full version.
* Fix the precision normalisation pb.
* Fix all cov_type algo for BayesianGaussianMixture.
* Optimisation of spherical and diag computation.
* Code simplification.
* Check the Gaussian Mixture tests are ok.
* Add test.
* Add new tests for BayesianGaussianMixture and GaussianMixture.
* Add the bayesian_gaussian_example and the doc.
* Fix comments.
* Fix review comments and add license and author.
* Fix test compare covar type.
* Fix reviews.
* Fix tests.
* Fix review comments.
* Correct reviews.
* Fix travis pb.
* Fix circleci pb.
* Fix review comments.
* Fix typo.
* Fix comments. Add reg_covar and what's new.
* Fix comments.
* Fix comments.
* [ci skip] Correct legend.
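The commit "Add the use of the cholesky decomposition of the precision matrix" refers to evaluating Gaussian log-densities through a Cholesky factor of the precision instead of inverting the covariance. A NumPy sketch of that idea, assuming the precision factor is the upper-triangular `P = L^{-T}` built from the covariance factorization `cov = L L^T`; the helper names are hypothetical. With this factor the log-determinant comes from the diagonal of `P`, and the Mahalanobis term is the squared norm of `(x - mean) @ P`.

```python
import numpy as np


def precision_cholesky(cov):
    """Upper-triangular P with P @ P.T == inv(cov), from cov = L @ L.T."""
    L = np.linalg.cholesky(cov)  # lower triangular
    # General solve for brevity; scipy.linalg.solve_triangular would exploit
    # the triangular structure.
    return np.linalg.solve(L, np.eye(cov.shape[0])).T  # P = L^{-T}


def log_gaussian_chol(X, mean, prec_chol):
    """log N(x_i | mean, cov) for each row of X, using the precision factor."""
    d = X.shape[1]
    # sum(log(diag(P))) equals -0.5 * log|cov|.
    log_det_prec_chol = np.sum(np.log(np.diag(prec_chol)))
    y = (X - mean) @ prec_chol  # whitened samples
    return -0.5 * (d * np.log(2 * np.pi) + np.sum(y ** 2, axis=1)) + log_det_prec_chol


rng = np.random.RandomState(0)
A = rng.randn(3, 3)
cov = A @ A.T + 3 * np.eye(3)  # a positive-definite covariance
X = rng.randn(4, 3)
mean = np.zeros(3)
P = precision_cholesky(cov)
print(log_gaussian_chol(X, mean, P))
```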
* Fix Rouseeuw1984 broken link
* Change label vbgmm to bgmm
  Previously modified with PR #6651
* Change tag name
  Old refers to new tag added with PR #7388
* Remove prefix underscore to match tag
* Realign to fit 80 chars
* Link to metrics.rst. pairwise metrics yet to be documented
* Remove tag as LSHForest is deprecated
* Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated.
* Fix few Sphinx warnings
* Realign to 80 chars
* Changes based on PR review
* Remove unused ref in calibration
* Fix link ref in covariance.rst
* Fix linking issues
* Differentiate Rouseeuw1999 tag within file.
* Change all duplicate Rouseeuw1999 tags
* Remove numbers from tag Rousseeuw
This PR is the second part of the GSoC integration. It is directly based on the work in #6407.
Here I propose to integrate the Bayesian Gaussian Mixture:
Since this PR is based on #6407, it is better to review only the files that refer to the `BayesianGaussianMixture` class.