[MRG+1] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second step) #6651


Merged: 33 commits into scikit-learn:master on Aug 30, 2016

Conversation

@tguillemot (Contributor) commented Apr 12, 2016

This PR is the second part of the GSoC integration. It is directly based on the work in #6407.
Here I propose to integrate the Bayesian Gaussian Mixture:

  • Check the code and formulas
  • Add the Bayesian Gaussian class
  • Add the docstring
  • Add the tests
  • Deprecation of the old class
  • Remove the mixtures with small weights during the process
  • Change the doc
  • Create some examples

Since this PR is based on #6407, it is better to review only the files that relate to the BayesianGaussianMixture class.
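
For reviewers who want to try the new class quickly, here is a minimal usage sketch. It assumes the public API as it ended up in released scikit-learn (sklearn.mixture.BayesianGaussianMixture); parameter names may have shifted slightly during review.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Toy data: two well-separated 2D blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2) + [-3, 0],
               rng.randn(200, 2) + [3, 0]])

# Deliberately over-specify n_components; variational inference should
# push the weights of the unneeded components towards zero.
bgmm = BayesianGaussianMixture(n_components=5, max_iter=200,
                               random_state=0).fit(X)
print(bgmm.weights_)        # mixture weights, summing to 1
print(bgmm.predict(X[:5]))  # hard cluster assignments
```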

# XXX @xuewei4d I think you forgot n_component in your code ?
temp1 = (.5 * np.sum(temp1) +
         self.n_components * self._log_gaussian_norm_prior)

@tguillemot (Contributor, Author):

@xuewei4d I think you forgot to multiply log_gaussian_norm by n_components. Could you confirm it for the 4 functions, please?
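
For context, the quantity under discussion is the Gaussian-Wishart prior term of the variational lower bound (Bishop, 2006, ch. 10). Writing it from memory, so treat the exact form as an assumption: the log normalisation constant of the prior enters once per component, hence the factor K = n_components.

```latex
\mathbb{E}[\ln p(\boldsymbol{\mu}, \boldsymbol{\Lambda})]
  = K \ln B(\mathbf{W}_0, \nu_0)
    + (\text{terms in } \boldsymbol{\Lambda}_k,\ \mathbf{m}_k,\ \beta_k),
\qquad K = \texttt{n\_components}
```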

@xuewei4d (Contributor):

I checked it. I didn't forget it; see line 791 in my PR.

@ogrisel (Member) commented Apr 22, 2016

@tguillemot could you please rebase/squash on top of the current master to take the recent changes from #6666 into account in this PR?

@tguillemot force-pushed the GSoC-BayesianMixture branch 2 times, most recently from 352b51a to 427650b on April 22, 2016 10:49
@tguillemot force-pushed the GSoC-BayesianMixture branch from 427650b to 8710b60 on May 19, 2016 09:11
@ogrisel (Member) commented May 25, 2016

@tguillemot This PR needs to be updated to take the precision-based parametrization into account:

ImportError: cannot import name _check_covariance_matrix

@tguillemot (Contributor, Author):

@ogrisel I've pushed the last commit I had, but I'm working on another PR for the moment.
There are some bugs I have to investigate.

@tguillemot (Contributor, Author):

I've solved the problem with VBGMM. I have to do some cleaning, but I think it will be good to merge next week.
@xuewei4d Can you send me the LaTeX source of your GSoC PDF with the Bishop formulas? I will have to add these formulas to scikit-learn. Thanks in advance.

@xuewei4d (Contributor):

@tguillemot Sure. Can I have your email address?

@ngoix (Contributor) commented Jul 26, 2016

Is it expected that, when increasing n_components, the number of components found (with non-negligible weights) can decrease, even with a large n_init? (alpha_init was fixed to 0.1.)
It is due to the initialization step, right?

@tguillemot (Contributor, Author):

@xuewei4d Thanks for the formula.

@ngoix The current version of this PR is not working well and has a lot of problems, so I suspect that is the cause.
Fortunately, BayesianGaussianMixture works correctly now (after a lot of corrections).
I will push everything shortly.
I will try on my side, but once I have pushed, can you confirm it?

@tguillemot force-pushed the GSoC-BayesianMixture branch 4 times, most recently from d486d12 to 65e3400 on July 28, 2016 15:22
@tguillemot (Contributor, Author):

@ngoix The code of BayesianGaussianMixture is corrected now.
I still need to add some tests, examples and docs before MRG.

@xuewei4d (Contributor):

@tguillemot Can I have the updated formula pdf?

@ngoix (Contributor) commented Jul 28, 2016

It can be due to my data, but now the number of components found is always maximal (even with n_components = 100). The algorithm does not compute bic/aic scores, right?

@ngoix (Contributor) commented Jul 29, 2016

Whoops, it does not always find the maximal number of components, sorry.
However, even when the number of components found is lower than n_components, it can still vary when increasing n_components (even with alpha_init fixed).
How much does the number of components found depend on n_init?

@tguillemot (Contributor, Author):

@xuewei4d I haven't corrected the LaTeX formulas yet. I will put everything in scikit-learn once it is done.

@ngoix This method is an EM-like algorithm and converges to a local optimum. If the initialization is not good, you will never reach the global optimum.
As the initialization is currently done with k-means, if the k-means result is more or less the same for each of the n_init runs, the solution will be the same every time.
I have added another init option called 'test' (I will remove it shortly); if you use it you may get different results.
Can you put together a notebook to show me your data and results?
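
To make the behaviour discussed above concrete, here is a small sketch that counts components with non-negligible weight for several values of n_components. It uses the released scikit-learn API, where weight_concentration_prior plays the role of the alpha_init mentioned in this thread (an assumption about the naming used at the time of this PR):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# Three true clusters in 2D.
X = np.vstack([rng.randn(300, 2) + c for c in ([-4, 0], [0, 4], [4, 0])])

for n_components in (3, 5, 10, 20):
    bgmm = BayesianGaussianMixture(
        n_components=n_components,
        weight_concentration_prior=0.1,  # analogue of alpha_init above
        n_init=5,                        # repeat the k-means initialisation
        max_iter=500,
        random_state=0,
    ).fit(X)
    effective = np.sum(bgmm.weights_ > 1e-2)
    print(n_components, effective)
```

With a good initialisation, the effective count should settle around the three true clusters even when n_components is much larger.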

@tguillemot (Contributor, Author):

@ogrisel @amueller @agramfort
[figure: plot_bayesian_gaussian_mixture_001]

@tguillemot changed the title from "[WIP] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second step)" to "[MRG] Bayesian Gaussian Mixture (Integration of GSoC2015 -- second step)" on Aug 3, 2016
@tguillemot (Contributor, Author):

@agramfort @amueller @ogrisel BayesianGaussianMixture is mergeable.
Nevertheless, the review will be easier once #7123 and #7124 are merged.

random_state = check_random_state(self.random_state)

if self.init_params == 'kmeans':
    resp = np.zeros((n_samples, self.n_components))
    label = cluster.KMeans(n_clusters=self.n_components, n_init=1,
-                          random_state=random_state).fit(X).labels_
+                          random_state=0).fit(X).labels_
Review comment (Member):

0 -> random_state
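
The point of the suggestion is that the RandomState validated at the top of the method, not a hard-coded seed, should drive the k-means initialisation, so that the user-supplied random_state actually controls it. A rough sketch (the helper name is hypothetical, not the PR's exact code):

```python
import numpy as np
from sklearn import cluster
from sklearn.utils import check_random_state

def _initialize_resp_kmeans(X, n_components, random_state):
    # Validate the user-supplied seed / None / RandomState once...
    random_state = check_random_state(random_state)
    n_samples = X.shape[0]
    resp = np.zeros((n_samples, n_components))
    # ...and thread that same RandomState into KMeans instead of
    # random_state=0, so results follow the caller's seed.
    labels = cluster.KMeans(n_clusters=n_components, n_init=1,
                            random_state=random_state).fit(X).labels_
    resp[np.arange(n_samples), labels] = 1
    return resp
```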

@@ -248,12 +247,14 @@ def _e_step(self, X):
        Returns
        -------
        log_prob_norm : array, shape (n_samples,)
-           log p(X)
+           Logarithm of the probability of X.
Review comment (Member):

Logarithm of the probability of each sample in X.
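
In terms of the public API (as released), this per-sample quantity is what score_samples exposes:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.RandomState(0).randn(100, 2)
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
log_prob = gm.score_samples(X)  # shape (n_samples,): log p(x_i) per sample
print(log_prob.shape)           # (100,)
```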

@TomDLT added this to the 0.18 milestone on Aug 29, 2016
@tguillemot (Contributor, Author):

@ogrisel Do you prefer to wait until #7284 is ready before merging this PR?

@ogrisel (Member) commented Aug 30, 2016

Let's merge now. Thanks for all your efforts @tguillemot!

@ogrisel merged commit f0862f7 into scikit-learn:master on Aug 30, 2016
@ogrisel (Member) commented Aug 30, 2016

And also thank you again @xuewei4d for the initial code refactoring and maths derivations.

@xuewei4d (Contributor):

Thanks @tguillemot!
I will take a look at the math part once I have time, @ogrisel.

@jnothman (Member):

Hurrah!!


@tguillemot (Contributor, Author):

Hurrah !!!!!!!! Thanks everyone !!!

@raghavrv (Member):

yay! 🍻

@amueller (Member):

awesome :) Thanks everyone!

:class:`BayesianGaussianMixture`. The new class solves the computational
problems of the old class and computes the Variational Bayesian Gaussian
mixture faster than before.
Ref :ref:`b` for more information.
Review comment (Member):

@tguillemot what's b supposed to reference? It's a dead link.

@tguillemot (Contributor, Author):

This will be fixed with #7295.
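
For users of the deprecated classes, the migration looks roughly like the sketch below. It assumes the parameter names of the released 0.18 API (weight_concentration_prior_type, weight_concentration_prior); at the time of this PR they may still have been in flux, so treat the mapping as an assumption.

```python
from sklearn.mixture import BayesianGaussianMixture

# Old, deprecated in 0.18:
#     from sklearn.mixture import VBGMM
#     model = VBGMM(n_components=5, alpha=1.0)
# New, roughly equivalent:
model = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type='dirichlet_distribution',
    weight_concentration_prior=1.0,
)
```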

TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016
…step) (scikit-learn#6651)

* Add the new BayesianGaussianMixture class.
Add the test file for the BayesianGaussianMixture.

* Add the use of the cholesky decomposition of the precision matrix.

* Fix some bugs.

* Modification of GaussianMixture class.

The purpose here is to prepare the integration of BayesianGaussianMixture.

* Fix comments.

* Modification of the Docstring.

* Add license and author.

* Fix pb typo of eq 10.64 and 10.62.

* Correct VBGMM bugs.

* Fix full version.

* Fix the precision normalisation pb.

* Fix all cov_type algo for BayesianGaussianMixture.

* Optimisation of spherical and diag computation.

* Code simplification.

* Check the Gaussian Mixture tests are ok.

* Add test.

* Add new tests for BayesianGaussianMixture and GaussianMixture.

* Add the bayesian_gaussian_example and the doc.

* Fix comments.

* Fix review comments and add license and author.

* Fix test compare covar type.

* Fix reviews.

* Fix tests.

* Fix review comments.

* Correct reviews.

* Fix travis pb.

* Fix circleci pb.

* Fix review comments.

* Fix typo.

* Fix comments.

Add reg_covar and what's new.

* Fix comments.

* Fix comments.

* [ci skip] Correct legend.
balakmran pushed a commit to balakmran/scikit-learn that referenced this pull request Jul 20, 2017
Previously modified with PR scikit-learn#6651
jnothman pushed a commit that referenced this pull request Jul 30, 2017
* Fix Rouseeuw1984 broken link

* Change label vbgmm to bgmm
Previously modified with PR #6651

* Change tag name
Old refers to new tag added with PR #7388

* Remove prefix underscore to match tag

* Realign to fit 80 chars

* Link to metrics.rst.
pairwise metrics yet to be documented

* Remove tag as LSHForest is deprecated

* Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py.
It is deprecated.

* Fix few Sphinx warnings

* Realign to 80 chars

* Changes based on PR review

* Remove unused ref in calibration

* Fix link ref in covariance.rst

* Fix linking issues

* Differentiate Rouseeuw1999 tag within file.

* Change all duplicate Rouseeuw1999 tags

* Remove numbers from tag Rousseeuw
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Aug 6, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017