[MRG+2] GSoC Final : Dirichlet Gaussian Mixture #7295
Conversation
The examples above compare Gaussian mixtures models with fixed number of components, to the Dirichlet Gaussian Mixtures models. **On the left** the GMM
GMM -> Gaussian mixtures
@TomDLT Thanks for this first round of review.

I am in favor of keeping the "sin" example (adapted to use the new class). While I agree that it is a weird and artificial dataset, I also appreciate the facts that:
This class doesn't require the user to choose the number of components, and at the expense of extra computational time the user only needs to specify a loose upper bound on this number and a concentration parameter.
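For readers skimming the diff, here is a minimal sketch of what this paragraph promises, using the `BayesianGaussianMixture` API this PR series introduces (the data and hyperparameter values are made up for illustration):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# Two well-separated blobs, but we only give a loose upper bound of 10.
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 5])

bgmm = BayesianGaussianMixture(
    n_components=10,                  # loose upper bound, not the true number
    weight_concentration_prior=0.01,  # small value favors few active components
    max_iter=500,
    random_state=0,
).fit(X)

# Components with non-negligible weight are the ones the model actually uses.
print("active components:", int(np.sum(bgmm.weights_ > 0.01)))
```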
.. |plot_gmm| image:: ../auto_examples/mixture/images/sphx_glr_plot_gmm_001.png
   :target: ../auto_examples/mixture/plot_gmm.html
-  :scale: 48%
+  :scale: 31%
I guess the rendered page is not as intended.
I was wondering what the result would be, and it's ugly :). I'll change that.
This seems pretty clean to me
Force-pushed from bfedbe4 to 6ab1324.
@agramfort @ogrisel Do you think we can merge that for 0.18???
The BIC criterion can be used to select the number of components in a Gaussian Mixture in an efficient way. In theory, it recovers the true number of components only in the asymptotic regime (i.e. if much data is available). Note that using a :ref:`DirichletGaussianMixture <dpgmm>` avoids the specification of
Since this is not a class ref, could we have this as "Dirichlet Gaussian Mixture"?
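For context, a minimal sketch of the BIC-based selection that the quoted passage describes: fit `GaussianMixture` for several values of `n_components` and keep the one with the lowest BIC (synthetic data, illustrative values):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(42)
X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + 4])  # 2 true components

# Lower BIC is better; it trades off likelihood against model complexity.
bics = [GaussianMixture(n_components=n, random_state=0).fit(X).bic(X)
        for n in range(1, 7)]
best_n = int(np.argmin(bics)) + 1
print("BIC selects", best_n, "components")  # expected: 2, given enough data
```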
model with the Dirichlet Process. In practice the approximate Dirichlet Process inference algorithm uses a truncated distribution with a fixed maximum number of components (called the Stick-breaking representation), but almost always the number of components actually used depends on the
Cut the sentence: "representation). The number of components actually used almost always depends on the data."
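As background for the stick-breaking discussion above, a back-of-the-envelope sketch of the truncated construction (not scikit-learn code, just the generative recipe for the mixing weights):

```python
import numpy as np

rng = np.random.RandomState(0)
alpha = 1.0  # concentration parameter
T = 10       # truncation level: the fixed maximum number of components

v = rng.beta(1.0, alpha, size=T)                        # stick fractions
remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
weights = v * remaining                                 # mixing weights

# With a small alpha most of the mass lands on the first few components,
# so the number of *active* components adapts even though T is fixed.
print(weights.round(3), "sum:", weights.sum().round(3))  # sum < 1 (truncation)
```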
CI is broken because of a broken import and a PEP8 issue. Also, could you please try to make the following test run in less than 1s by tweaking the training data size or the hyperparameters of the model?
Force-pushed from 8f0544d to 1b45753.
Thanks @ogrisel.
:class:`GaussianMixture` and :class:`BayesianGaussianMixture` to fit a sine wave.
* See :ref:`sphx_glr_auto_example_mixture_plot_concentration_prior.py`
This reference is broken.
I have a problem with a ghost file in Travis. @ogrisel can you remove the cache?
Force-pushed from 1b45753 to e9be320.
@@ -111,7 +118,14 @@ class BayesianGaussianMixture(BaseMixture):
     'kmeans' : responsibilities are initialized using kmeans.
     'random' : responsibilities are initialized randomly.

-    dirichlet_concentration_prior : float | None, optional.
+    weight_concentration_prior_type : {'dirichlet_process',
+        'dirichlet_distribution'}, defaults to 'full'.
Two problems:

- This new line breaks the sphinx rendering in classes.rst. I am not sure how this should be fixed.
- The default value is `'dirichlet_process'`, not `'full'`.
I don't know how to fix the first problem, because if I put it on a single line PEP8 will not be happy.
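For context, the two values of the parameter under discussion can be exercised like this (a minimal sketch against the merged API; the data and `n_components` are illustrative):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

X = np.random.RandomState(0).randn(200, 2)

# Truncated Dirichlet process prior (stick-breaking); the actual default.
dp = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type="dirichlet_process").fit(X)

# Finite mixture with a symmetric Dirichlet prior on the weights.
dd = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type="dirichlet_distribution").fit(X)

print(dp.weights_.round(3))
print(dd.weights_.round(3))
```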
I think the red travis builds were caused by old cached versions of deleted python modules and test files. I manually deleted the cache for this PR in travis and relaunched the build to check if that fixes it.
pyflakes has caught an unused variable:
Force-pushed from e9be320 to 15afbec.
process prior, however, show that the model can either learn a global structure for the data (small ``weight_concentration_prior``) or easily interpolate to finding relevant local structure (large ``weight_concentration_prior``), never falling into the problems shown by the ``GaussianMixture`` class.
I don't agree with this analysis. Let me suggest the following instead:

This example demonstrates the behavior of Gaussian mixture models on data that was not generated by a mixture of Gaussian random variables. The dataset is formed by 100 points loosely spaced along a noisy sine curve. There is therefore no ground-truth value for the number of Gaussian components.

The first model is a classical Gaussian mixture model with 10 components fit with the Expectation-Maximization algorithm.

The second model is a Bayesian Gaussian mixture model with a Dirichlet process prior fit with variational inference. The low value of the concentration prior makes the model favor a lower number of active components. This model "decides" to focus its modeling power on the big picture of the structure of the dataset: groups of points with alternating directions modeled by non-spherical covariance matrices. Those alternating directions roughly capture the alternating nature of the original sine signal.

The third model is also a Bayesian Gaussian mixture model with a Dirichlet process prior, but this time the value of the concentration prior is higher, giving the model more liberty to model the finer-grained structure of the data. The result is a mixture with a larger number of active components that is similar to the first model, where we arbitrarily decided to fix the number of components to 10.

Which model is best is a matter of subjective judgement: do we want to favor models that only capture the big picture, summarizing and explaining most of the structure of the data while ignoring the details, or do we prefer models that closely follow the high-density regions of the signal?

The last two panels show how we can sample from the last two models. The resulting sample distributions do not look exactly like the original data distribution. The difference primarily stems from the approximation error we made by using a model that assumes the data was generated by a finite number of Gaussian components instead of a continuous noisy sine curve.
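A condensed sketch of the three models described above, fit on a noisy sine curve (the hyperparameter values are illustrative, chosen only to show the small-vs-large concentration effect):

```python
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

rng = np.random.RandomState(0)
t = 4 * np.pi * rng.rand(100)
X = np.column_stack([t, np.sin(t) + 0.15 * rng.randn(100)])  # noisy sine

# 1) classical EM fit with a fixed number of components
em = GaussianMixture(n_components=10, random_state=0).fit(X)

# 2) Dirichlet process prior, low concentration -> few active components
dp_low = BayesianGaussianMixture(
    n_components=10, weight_concentration_prior=1e-2,
    max_iter=1000, random_state=0).fit(X)

# 3) same prior, high concentration -> many active components
dp_high = BayesianGaussianMixture(
    n_components=10, weight_concentration_prior=1e2,
    max_iter=1000, random_state=0).fit(X)

for name, m in [("EM", em), ("DP low", dp_low), ("DP high", dp_high)]:
    print(name, "active components:", int(np.sum(m.weights_ > 0.01)))

# Sampling from a fitted Bayesian model, as in the last two panels:
X_sampled, _ = dp_high.sample(100)
```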
This time the travis error is for real I think: https://travis-ci.org/scikit-learn/scikit-learn/jobs/158774349#L2336
I think I am done with the review. +1 for merge once CI is green and my last comment on
I fixed the CI failure and addressed the doc of the example in #7386. If it's green, I will merge.
Merged as #7386 🍻
@ogrisel Sorry I had to go yesterday. Thanks for taking care of that.
Thanks everyone for your reviews and help!!!
This closes #7377, closes #7115, closes #2473, closes #2454, closes #1764 and closes #1637.

This is the last PR needed to completely remove the old GMM classes. Here you'll find the `DirichletGaussianMixture` class with the doc, examples and tests. It will be easier to review once #6651 is merged (there will also be no conflicts).

I have removed the example `plot_gmm_sin.py` because it wasn't showing the properties of DPGMM correctly for me (it modifies the `covariance_type` through the experiment to obtain better results). Instead, I prefer to introduce an example similar to the one I introduced for `BayesianGaussianMixture`.

