[MRG+1] Adding a fit_predict method for the GMM #4593

clorenz7 · 2015-04-14T20:26:27Z

With low iterations, the prediction might not be 100% accurate due to
the final maximization step in the EM algorithm.

See issue:
#4579
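To illustrate the caveat above, here is a minimal sketch of a 1-D, two-component EM loop written with plain NumPy. It is not the scikit-learn implementation; `e_step`, `m_step` and `toy_em` are hypothetical names introduced only for this example. Because the loop ends on a maximization step, the responsibilities it returns were computed from the parameters of the previous iteration, so their argmax can differ slightly from labels recomputed with the final parameters.

```python
import numpy as np


def e_step(X, weights, means, variances):
    """Return responsibilities of shape (n_samples, n_components)."""
    diff = X[:, None] - means[None, :]
    # Log density of each sample under each 1-D Gaussian component.
    log_prob = -0.5 * (np.log(2 * np.pi * variances) + diff ** 2 / variances)
    log_weighted = log_prob + np.log(weights)
    # Normalise in log space for numerical stability.
    log_weighted -= log_weighted.max(axis=1, keepdims=True)
    resp = np.exp(log_weighted)
    return resp / resp.sum(axis=1, keepdims=True)


def m_step(X, resp):
    """Re-estimate weights, means and variances from the responsibilities."""
    nk = resp.sum(axis=0) + 1e-10
    weights = nk / nk.sum()
    means = (resp * X[:, None]).sum(axis=0) / nk
    variances = (resp * (X[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances


def toy_em(X, n_iter, means_init):
    """Run n_iter rounds of E-step then M-step; the last step is an M-step."""
    n_components = len(means_init)
    weights = np.full(n_components, 1.0 / n_components)
    means = np.asarray(means_init, dtype=float)
    variances = np.full(n_components, X.var())
    resp = None  # stays None when n_iter == 0
    for _ in range(n_iter):
        resp = e_step(X, weights, means, variances)
        weights, means, variances = m_step(X, resp)
    return (weights, means, variances), resp


rng = np.random.RandomState(0)
X = np.concatenate([rng.randn(100) - 2.0, rng.randn(100) + 2.0])

params, last_resp = toy_em(X, n_iter=1, means_init=[-0.5, 0.5])
labels_from_fit = last_resp.argmax(axis=1)               # from the last E-step
labels_from_predict = e_step(X, *params).argmax(axis=1)  # from final parameters
agreement = np.mean(labels_from_fit == labels_from_predict)
print("agreement after 1 iteration: %.3f" % agreement)
```

With more iterations the parameters stop moving between steps, so the two label vectors should coincide for nearly all samples.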

ogrisel · 2015-04-14T20:36:35Z

sklearn/mixture/gmm.py

+        self.fit(X, y)
+
+        if self.responsibilities_ is None:
+            raise RuntimeError("Fitting failed, cannot predict")


It's better to use the _check_fitted_model: do a git grep _check_fitted_model to see examples in the code base.

Actually, this comment is no longer relevant in light of the other comments.

ogrisel · 2015-04-14T22:52:09Z

sklearn/mixture/gmm.py

+
+        responsibilities = self._fit(X, y)
+        if responsibilities is None:
+            prediction = None


How could that ever happen?

I would return directly:

return self._fit(X, y).argmax(axis=1)

and make sure that _fit can never return None (make it raise a ValueError or similar with a meaningful error message otherwise).

It could happen in the case when n_iter == 0. You're right that never returning None is better, so I added a check to run score_samples to get the correct output value in that case (that apparently happens when running an HMM). My current idea is just to output zeros because the idea of n_iter=0 seems to be to quickly initialize a model.
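The pattern discussed in this thread can be sketched roughly as follows. `ToyMixture` is a hypothetical stand-in and not scikit-learn's `GMM`; it assumes unit variances and equal weights to stay short. The point is only that `_fit` always returns a responsibilities array (including when `n_iter == 0`) and raises a `ValueError` on bad input, so `fit_predict` can take the argmax directly.

```python
import numpy as np


class ToyMixture(object):
    """Hypothetical mixture model used only to illustrate the pattern."""

    def __init__(self, means, n_iter=10):
        self.means = np.asarray(means, dtype=float)
        self.n_iter = n_iter

    def _responsibilities(self, X):
        # Softmax of negative squared distances to each mean
        # (assumes unit variance and equal weights).
        sq_dist = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        logits = -0.5 * sq_dist
        logits -= logits.max(axis=1, keepdims=True)
        resp = np.exp(logits)
        return resp / resp.sum(axis=1, keepdims=True)

    def _fit(self, X):
        """Fit the means and return responsibilities; never returns None."""
        X = np.asarray(X, dtype=float)
        if X.ndim != 2 or X.shape[0] == 0:
            raise ValueError(
                "X must be a non-empty 2-D array, got shape %r" % (X.shape,))
        self.means_ = self.means.copy()
        # Computed up front so that n_iter == 0 still yields an array.
        resp = self._responsibilities(X)
        for _ in range(self.n_iter):
            nk = resp.sum(axis=0) + 1e-10
            self.means_ = resp.T.dot(X) / nk[:, None]
            resp = self._responsibilities(X)
        return resp

    def fit(self, X):
        self._fit(X)
        return self

    def fit_predict(self, X):
        return self._fit(X).argmax(axis=1)


X = np.array([[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.1, 4.9]])
print(ToyMixture(means=[[1, 1], [4, 4]], n_iter=5).fit_predict(X))
```

Unlike the zeros fallback mentioned above, this sketch evaluates the initial parameters once when `n_iter == 0`; that is a design choice of the example, not a description of what the PR does.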

ogrisel · 2015-04-14T23:15:06Z

I am not sure I understand the Travis failures; it probably requires launching a debugger.

clorenz7 · 2015-04-15T17:59:38Z

@ogrisel Besides addressing your comments, I changed the GMM subclass fit method to _fit, and added some additional test cases.

ogrisel · 2015-04-15T18:14:11Z

For the Travis failure, a solution would be to not implement `fit_predict` for `DPGMM` and `VBGMM`: it is possible to introduce a new `_BaseGMM` abstract base class with most of the current methods of `GMM` in it and then make `GMM`, `DPGMM` and `VBGMM` inherit from it. Finally, only implement `fit_predict` in the `GMM` class.

git grep ABCMeta to see how we create abstract base classes in sklearn that support both Python 2 and Python 3 in the same code base.

ogrisel · 2015-04-15T18:16:09Z

Ah alright, ignore my last comment: I had an internet connection problem and could not post it when I first wrote it. Now I see that you fixed the problems with the subclasses.

ogrisel · 2015-04-15T18:17:29Z

sklearn/mixture/tests/test_gmm.py

+    component_1 = lrng.randn(n_samples, n_dim) + mu
+    X = np.vstack((component_0, component_1))
+
+    for m_constructor in (mixture.GMM, mixture.VBGMM, mixture.DPGMM):


Great! Thanks for having updated that test.

ogrisel · 2015-04-15T18:22:44Z

@eyaler does that PR meet your requirements from #4579?

LGTM, +1 for merge on my side.

@clorenz7 could you please just add a new entry in the section on the new features for 0.17.dev0 in the doc/whats_new.rst file?

clorenz7 · 2015-04-15T18:37:17Z

@ogrisel Added what's new. Thanks for all your help!

amueller · 2015-04-16T15:31:52Z

sklearn/mixture/dpgmm.py

@@ -480,7 +480,7 @@ def _set_weights(self):
                                                    + self.gamma_[i, 2])
        self.weights_ /= np.sum(self.weights_)

-    def fit(self, X, y=None):
+    def _fit(self, X, y=None):


It needs to document its return value.

Done, thanks.

clorenz7 · 2015-04-26T23:23:56Z

@amueller @ogrisel Sorry for the delay in my response. I hope this looks better now.

amueller · 2015-04-27T20:04:44Z

sklearn/mixture/gmm.py

+
+        Returns
+        -------
+        C : array, shape = (n_samples,)


component_membership? Or a docstring?

Good point, I added that commentary, thanks.

amueller · 2015-04-27T21:33:33Z

LGTM apart from my minor comments.


amueller · 2015-04-30T18:52:22Z

Let's merge when travis is happy.


amueller · 2015-04-30T19:29:38Z

thanks.

xuewei4d · 2015-06-10T19:48:12Z

sklearn/mixture/gmm.py

-        argument init_params to the empty string '' when creating the
-        GMM object. Likewise, if you would like just to do an
-        initialization, set n_iter=0.
+    def _fit(self, X, y=None, do_prediction=False):


I don't understand why there is an additional parameter do_prediction=False. @clorenz7

clorenz7 mentioned this pull request Apr 14, 2015

add fit_predict to mixture.GMM #4579

Closed

ogrisel reviewed Apr 14, 2015

clorenz7 force-pushed the gmm_fit_predict branch from 9349c48 to 18d4df4 Compare April 14, 2015 21:14

ogrisel reviewed Apr 14, 2015

clorenz7 force-pushed the gmm_fit_predict branch from 18d4df4 to 986defe Compare April 15, 2015 17:27

ogrisel reviewed Apr 15, 2015

ogrisel changed the title ~~Adding a fit_predict method for the GMM~~ [MRG+1] Adding a fit_predict method for the GMM Apr 15, 2015

clorenz7 force-pushed the gmm_fit_predict branch from 986defe to 234bd07 Compare April 15, 2015 18:36

clorenz7 force-pushed the gmm_fit_predict branch from 234bd07 to ff4d8d2 Compare April 15, 2015 18:44

amueller reviewed Apr 16, 2015

clorenz7 force-pushed the gmm_fit_predict branch 2 times, most recently from de68a27 to 96bede3 Compare April 26, 2015 23:22

amueller reviewed Apr 27, 2015

Add a fit_predict method for the GMM classes

fc87eb3

With low iterations, the prediction might not be 100% accurate due to the final maximization step in the EM algorithm.

clorenz7 force-pushed the gmm_fit_predict branch from 96bede3 to fc87eb3 Compare April 30, 2015 18:49

amueller added a commit that referenced this pull request Apr 30, 2015

Merge pull request #4593 from clorenz7/gmm_fit_predict

e1fd955

[MRG+1] Adding a fit_predict method for the GMM

amueller merged commit e1fd955 into scikit-learn:master Apr 30, 2015

xuewei4d reviewed Jun 10, 2015

xuewei4d mentioned this pull request Jun 17, 2015

[RFC] GSoC2015 Improve GMM API #4802

Closed

16 tasks
