Skip to content

Fix GaussianMixture UnboundLocalError #20030

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 25, 2021

Conversation

tliu68
Copy link
Contributor

@tliu68 tliu68 commented May 3, 2021

Reference Issues/PRs

Fixes issue #18216

What does this implement/fix? Explain your changes.

Updates the condition on lower_bound to ensure that best_params is set even if convergence was not reached.

Any other comments?

@tliu68
Copy link
Contributor Author

tliu68 commented May 3, 2021

@jjerphan could you please take a look?

@jjerphan
Copy link
Member

jjerphan commented May 3, 2021

Hi again @tliu68

Thanks for the PR!
Could you add a regression test for #18216 failing on main but passing on this branch?

@tliu68
Copy link
Contributor Author

tliu68 commented May 5, 2021

Hi again @tliu68

Thanks for the PR!
Could you add a regression test for #18216 failing on main but passing on this branch?

I am wondering whether you think it's fine to specify the initialization parameters directly? They are actually calculated using the part of the code in GaussianMixtureIC that fits AgglomerativeClustering, runs OneHotEncoder, and then computes Gaussian parameters from onehot. But that chunk of code seems a bit too much to me to add to test_gaussian_mixture.py. What do you think would be a better way to compute/define them here?

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a few suggestions @tliu68.

Actually your contribution fixes more than just the best parameters in the case of divergence for mixture.GaussianMixture: a similar problem could also occur for mixture.BayesianGaussianMixture, where their best parameters (weight_concentration_, mean_precision_, means_, degrees_of_freedom_, covariances_, precisions_cholesky_) not being set.

Regarding your previous comment, in-line definition of parameters is fine as long as you comment like you did. 👍

Comment on lines 1045 to 1048
def test_gaussian_mixture_setting_best_params():
# test that best_params is set appropriately
# even if convergence was not reached
# also a regression test for issue #18216
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we would like to create another test for BayesianMixture, but this might be hard (we need to find a set of initial parameters which would cause it to diverge). You can try to see if the parameters you've in-lined cause this problem for BayesianMixture. If it doesn't, it's fine not going down the rabbit hole to find ones that do.

Suggested change
def test_gaussian_mixture_setting_best_params():
# test that best_params is set appropriately
# even if convergence was not reached
# also a regression test for issue #18216
def test_gaussian_mixture_params_setting():
"""`GaussianMixture`'s best_parameters, `n_iter_` and `lower_bound_`
must be set appropriately in the case of divergence.
Non-regression test for: https://github.com/scikit-learn/scikit-learn/issues/18216
"""

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@lorentzenchr lorentzenchr linked an issue May 22, 2021 that may be closed by this pull request
Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tliu68 Thanks for the fix.
Could you also replace the 2 orccurrances of np.infty by np.inf in mixture/_base.py?

@lorentzenchr
Copy link
Member

@tliu68 Could you also put an entry in whats_new?

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lorentzenchr lorentzenchr merged commit 88be3c1 into scikit-learn:main May 25, 2021
alceballosa added a commit to alceballosa/scikit-learn that referenced this pull request Jun 26, 2021
…xture Model PR#11101, focused on allowing sample-based and k-means++ based GMM initialization.. Also made the enhancements compatible with PRs scikit-learn#17937 and scikit-learn#20030.
alceballosa added a commit to alceballosa/scikit-learn that referenced this pull request Aug 3, 2021
…sian Mixture Model PR#11101, focused on allowing sample-based and k-means++ based GMM initialization.. Also made the enhancements compatible with PRs scikit-learn#17937 and scikit-learn#20030.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GaussianMixture UnboundLocalError
4 participants