[MRG+1] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second step) #6651


Merged: 33 commits into scikit-learn:master on Aug 30, 2016

Conversation

@tguillemot (Contributor) commented Apr 12, 2016

This PR is the second part of the GSoC integration. It is directly based on the work in #6407.
Here I propose to integrate the Bayesian Gaussian Mixture:

  • Check the code and formulas
  • Add the Bayesian Gaussian class
  • Add the docstring
  • Add the tests
  • Deprecation of the old class
  • Remove the mixtures with small weights during the process
  • Change the doc
  • Create some examples

Since this PR is based on #6407, it is better to review only the files that relate to the BayesianGaussianMixture class.
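
For reviewers who want to try the new class quickly, here is a minimal usage sketch. It assumes the public API as it ended up in released scikit-learn (sklearn.mixture.BayesianGaussianMixture); parameter names may have shifted slightly during review.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Toy data: two well-separated 2D blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2) + [-3, 0],
               rng.randn(200, 2) + [3, 0]])

# Deliberately over-specify n_components; variational inference should
# push the weights of the unneeded components towards zero.
bgmm = BayesianGaussianMixture(n_components=5, max_iter=200,
                               random_state=0).fit(X)
print(bgmm.weights_)        # mixture weights, summing to 1
print(bgmm.predict(X[:5]))  # hard cluster assignments
```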

# XXX @xuewei4d I think you forgot n_component in your code ?
temp1 = (.5 * np.sum(temp1) +
         self.n_components * self._log_gaussian_norm_prior)

@tguillemot (Contributor, Author):

@xuewei4d I think you forgot to multiply log_gaussian_norm by n_components. Could you confirm it for the 4 functions, please?
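
For context, the quantity under discussion is the Gaussian-Wishart prior term of the variational lower bound (Bishop, 2006, ch. 10). Writing it from memory, so treat the exact form as an assumption: the log normalisation constant of the prior enters once per component, hence the factor K = n_components.

```latex
\mathbb{E}[\ln p(\boldsymbol{\mu}, \boldsymbol{\Lambda})]
  = K \ln B(\mathbf{W}_0, \nu_0)
    + (\text{terms in } \boldsymbol{\Lambda}_k,\ \mathbf{m}_k,\ \beta_k),
\qquad K = \texttt{n\_components}
```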

@xuewei4d (Contributor):

I checked it. I didn't forget it; see line 791 in my PR.

@ogrisel (Member) commented Apr 22, 2016

@tguillemot could you please rebase/squash on top of the current master to take the recent changes from #6666 into account in this PR?

@tguillemot force-pushed the GSoC-BayesianMixture branch 2 times, most recently from 352b51a to 427650b on April 22, 2016 10:49
@tguillemot force-pushed the GSoC-BayesianMixture branch from 427650b to 8710b60 on May 19, 2016 09:11
@ogrisel (Member) commented May 25, 2016

@tguillemot This PR needs to be updated to take the precision-based parametrization into account:

ImportError: cannot import name _check_covariance_matrix

@tguillemot (Contributor, Author):

@ogrisel I've pushed the last commit I had, but I'm working on another PR for the moment.
There are some bugs I have to investigate.

@tguillemot (Contributor, Author):

I've solved the problem with VBGMM. I have to do some cleaning, but I think it will be good to merge next week.
@xuewei4d Can you send me the LaTeX source of your GSoC PDF with the Bishop formulas? I will have to add these formulas to scikit-learn. Thanks in advance.

@xuewei4d (Contributor):

@tguillemot Sure. Can I have your email address?

@ngoix (Contributor) commented Jul 26, 2016

Is it expected that, when increasing n_components, the number of components found (with non-negligible weights) can decrease, even with a large n_init? (alpha_init was fixed to 0.1.)
It is due to the initialization step, right?

@tguillemot (Contributor, Author):

@xuewei4d Thanks for the formula.

@ngoix The current version of this PR is not working well and has a lot of problems, so I suspect that is the cause.
Fortunately, BayesianGaussianMixture works correctly now (after a lot of corrections).
I will push everything shortly.
I will try on my side, but once I have pushed, can you confirm it?

@tguillemot force-pushed the GSoC-BayesianMixture branch 4 times, most recently from d486d12 to 65e3400 on July 28, 2016 15:22
@tguillemot (Contributor, Author):

@ngoix The code of BayesianGaussianMixture is corrected now.
I still need to add some tests, examples and docs before MRG.

@xuewei4d (Contributor):

@tguillemot Can I have the updated formula pdf?

@ngoix (Contributor) commented Jul 28, 2016

It can be due to my data, but now the number of components found is always maximal (even with n_components = 100). The algorithm does not compute bic/aic scores, right?

@ngoix (Contributor) commented Jul 29, 2016

Whoops, it does not always find the maximal number of components, sorry.
However, even when the number of components found is lower than n_components, it can still vary when increasing n_components (even with alpha_init fixed).
How much does the number of components found depend on n_init?

@tguillemot (Contributor, Author):

@xuewei4d I haven't corrected the LaTeX formulas yet. I will put everything in scikit-learn once it is done.

@ngoix This method is an EM-like algorithm and converges to a local optimum. If the initialization is not good, you will never reach the global optimum.
As the initialization is currently done with k-means, if the k-means result is more or less the same for each of the n_init runs, the solution will be the same every time.
I have added another init option called 'test' (I will remove it shortly); if you use it you may get different results.
Can you put together a notebook to show me your data and results?
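
To make the behaviour discussed above concrete, here is a small sketch that counts components with non-negligible weight for several values of n_components. It uses the released scikit-learn API, where weight_concentration_prior plays the role of the alpha_init mentioned in this thread (an assumption about the naming used at the time of this PR):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# Three true clusters in 2D.
X = np.vstack([rng.randn(300, 2) + c for c in ([-4, 0], [0, 4], [4, 0])])

for n_components in (3, 5, 10, 20):
    bgmm = BayesianGaussianMixture(
        n_components=n_components,
        weight_concentration_prior=0.1,  # analogue of alpha_init above
        n_init=5,                        # repeat the k-means initialisation
        max_iter=500,
        random_state=0,
    ).fit(X)
    effective = np.sum(bgmm.weights_ > 1e-2)
    print(n_components, effective)
```

With a good initialisation, the effective count should settle around the three true clusters even when n_components is much larger.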

@tguillemot (Contributor, Author):

@ogrisel @amueller @agramfort
[figure: plot_bayesian_gaussian_mixture_001]

@tguillemot changed the title from "[WIP] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second step)" to "[MRG] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second step)" on Aug 3, 2016
@tguillemot (Contributor, Author):

@agramfort @amueller @ogrisel BayesianGaussianMixture is mergeable.
Nevertheless, the review will be easier once #7123 and #7124 are merged.

random_state = check_random_state(self.random_state)

if self.init_params == 'kmeans':
    resp = np.zeros((n_samples, self.n_components))
    label = cluster.KMeans(n_clusters=self.n_components, n_init=1,
-                          random_state=random_state).fit(X).labels_
+                          random_state=0).fit(X).labels_
Review comment (Member):

0 -> random_state
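
The point of the suggestion is that the RandomState validated at the top of the method, not a hard-coded seed, should drive the k-means initialisation, so that the user-supplied random_state actually controls it. A rough sketch (the helper name is hypothetical, not the PR's exact code):

```python
import numpy as np
from sklearn import cluster
from sklearn.utils import check_random_state

def _initialize_resp_kmeans(X, n_components, random_state):
    # Validate the user-supplied seed / None / RandomState once...
    random_state = check_random_state(random_state)
    n_samples = X.shape[0]
    resp = np.zeros((n_samples, n_components))
    # ...and thread that same RandomState into KMeans instead of
    # random_state=0, so results follow the caller's seed.
    labels = cluster.KMeans(n_clusters=n_components, n_init=1,
                            random_state=random_state).fit(X).labels_
    resp[np.arange(n_samples), labels] = 1
    return resp
```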

@@ -248,12 +247,14 @@ def _e_step(self, X):
        Returns
        -------
        log_prob_norm : array, shape (n_samples,)
-           log p(X)
+           Logarithm of the probability of X.
Review comment (Member):

Logarithm of the probability of each sample in X.
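
In terms of the public API (as released), this per-sample quantity is what score_samples exposes:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.RandomState(0).randn(100, 2)
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
log_prob = gm.score_samples(X)  # shape (n_samples,): log p(x_i) per sample
print(log_prob.shape)           # (100,)
```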

@TomDLT added this to the 0.18 milestone on Aug 29, 2016
@tguillemot (Contributor, Author):

@ogrisel Do you prefer to wait until #7284 is ready before merging this PR?

@ogrisel (Member) commented Aug 30, 2016

Let's merge now. Thanks for all your efforts @tguillemot!

@ogrisel merged commit f0862f7 into scikit-learn:master on Aug 30, 2016
@ogrisel (Member) commented Aug 30, 2016

And also thank you again @xuewei4d for the initial code refactoring and maths derivations.

@xuewei4d (Contributor):

Thanks @tguillemot!
I will take a look at the math part once I have time, @ogrisel.

@jnothman (Member):

Hurrah!!


@tguillemot (Contributor, Author):

Hurrah !!!!!!!! Thanks everyone !!!

@raghavrv (Member):

yay! 🍻

@amueller (Member):

awesome :) Thanks everyone!

:class:`BayesianGaussianMixture`. The new class solves the computational
problems of the old class and computes the Variational Bayesian Gaussian
mixture faster than before.
Ref :ref:`b` for more information.
Review comment (Member):

@tguillemot what's b supposed to reference? It's a dead link.

@tguillemot (Contributor, Author):

This will be fixed with #7295.
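
For users of the deprecated classes, the migration looks roughly like the sketch below. It assumes the parameter names of the released 0.18 API (weight_concentration_prior_type, weight_concentration_prior); at the time of this PR they may still have been in flux, so treat the mapping as an assumption.

```python
from sklearn.mixture import BayesianGaussianMixture

# Old, deprecated in 0.18:
#     from sklearn.mixture import VBGMM
#     model = VBGMM(n_components=5, alpha=1.0)
# New, roughly equivalent:
model = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type='dirichlet_distribution',
    weight_concentration_prior=1.0,
)
```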

TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016
…step) (scikit-learn#6651)

* Add the new BayesianGaussianMixture class.
Add the test file for the BayesianGaussianMixture.

* Add the use of the cholesky decomposition of the precision matrix.

* Fix some bugs.

* Modification of GaussianMixture class.

The purpose here is to prepare the integration of BayesianGaussianMixture.

* Fix comments.

* Modification of the Docstring.

* Add license and author.

* Fix pb typo of eq 10.64 and 10.62.

* Correct VBGMM bugs.

* Fix full version.

* Fix the precision normalisation pb.

* Fix all cov_type algo for BayesianGaussianMixture.

* Optimisation of spherical and diag computation.

* Code simplification.

* Check the Gaussian Mixture tests are ok.

* Add test.

* Add new tests for BayesianGaussianMixture and GaussianMixture.

* Add the bayesian_gaussian_example and the doc.

* Fix comments.

* Fix review comments and add license and author.

* Fix test compare covar type.

* Fix reviews.

* Fix tests.

* Fix review comments.

* Correct reviews.

* Fix travis pb.

* Fix circleci pb.

* Fix review comments.

* Fix typo.

* Fix comments.

Add reg_covar and what's new.

* Fix comments.

* Fix comments.

* [ci skip] Correct legend.
balakmran pushed a commit to balakmran/scikit-learn that referenced this pull request Jul 20, 2017
Previously modified with PR scikit-learn#6651
jnothman pushed a commit that referenced this pull request Jul 30, 2017
* Fix Rouseeuw1984 broken link

* Change label vbgmm to bgmm
Previously modified with PR #6651

* Change tag name
Old refers to new tag added with PR #7388

* Remove prefix underscore to match tag

* Realign to fit 80 chars

* Link to metrics.rst.
pairwise metrics yet to be documented

* Remove tag as LSHForest is deprecated

* Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py.
It is deprecated.

* Fix few Sphinx warnings

* Realign to 80 chars

* Changes based on PR review

* Remove unused ref in calibration

* Fix link ref in covariance.rst

* Fix linking issues

* Differentiate Rouseeuw1999 tag within file.

* Change all duplicate Rouseeuw1999 tags

* Remove numbers from tag Rousseeuw
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Aug 6, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017