Closed
Describe the bug
The BIC formula in GaussianMixture.bic is wrong. The current implementation is:
return -2 * self.score(X) * X.shape[0] + self._n_parameters() * np.log(X.shape[0])
Proposed fix:
Replace the formula with:
return self._n_parameters() * np.log(X.shape[0]) - 2 * self.score(X)
Steps/Code to Reproduce
# imports
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture as GM

# load iris as a DataFrame (assumed: the original snippet never defines X)
X = load_iris(as_frame=True).data

clusters = list(range(2, 8))
random_state = 42
n_init = 10
# train one model per cluster size
cols = ["petal width (cm)", "petal length (cm)"]
models = [
    GM(c, n_init=n_init, init_params="random", random_state=random_state).fit(X[cols])
    for c in clusters
]
# get best model from BIC
bic = [model.bic(X[cols]) for model in models]
metrics = pd.DataFrame({"clusters": clusters, "bic": bic})
metrics["best"] = metrics["bic"] == metrics["bic"].min()
metrics
The code above outputs the BIC values from the built-in method model.bic():
clusters | bic | best |
---|---|---|
2 | 364.579644 | False |
3 | 353.501255 | False |
4 | 366.865550 | False |
5 | 189.449838 | True |
6 | 212.580815 | False |
7 | 400.169872 | False |
Using the correct BIC formula (see e.g. https://en.wikipedia.org/wiki/Bayesian_information_criterion):
bics = [model._n_parameters() * np.log(X.shape[0]) - 2 * model.score(X[cols]) for model in models]
bics
leads to the following BIC values:
[57.180072607292026,
86.96960303077948,
116.92208468419805,
145.6026996264713,
175.62029248920572,
206.73427255638083]
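The two expressions can be cross-checked against the Wikipedia definition, BIC = k ln(n) - 2 ln(L̂), where ln(L̂) is the total log-likelihood over all n samples. The sketch below is only an illustration, not the library's implementation. It assumes that `score_samples` returns per-sample log-likelihoods and that `score` returns their mean, as the scikit-learn documentation describes. The variable names, the single 3-component model, and `random_state=0` are illustrative choices that are not part of the report.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

# Petal length and petal width of iris (columns 2 and 3 of the raw array).
X = load_iris().data[:, 2:4]
n_samples = X.shape[0]

# One illustrative model; the report fits 2 to 7 components.
model = GaussianMixture(n_components=3, random_state=0).fit(X)
k = model._n_parameters()  # private helper, also used in the report

# Total log-likelihood: sum of the per-sample log-likelihoods.
total_log_likelihood = model.score_samples(X).sum()
# Value returned by score(), assumed here to be the per-sample mean.
mean_log_likelihood = model.score(X)

# Wikipedia form, using the total log-likelihood.
bic_from_total = k * np.log(n_samples) - 2 * total_log_likelihood
# Expression proposed in this report, using score() without the factor n.
bic_from_mean = k * np.log(n_samples) - 2 * mean_log_likelihood
# Built-in method.
bic_builtin = model.bic(X)

print("score * n vs. summed score_samples:",
      mean_log_likelihood * n_samples, total_log_likelihood)
print("BIC from total log-likelihood:", bic_from_total)
print("BIC from score() without n:   ", bic_from_mean)
print("model.bic(X):                  ", bic_builtin)
```

If `score` is indeed a per-sample mean, the expression without the factor `X.shape[0]` uses the average rather than the total log-likelihood, which would account for the gap between the two lists of values.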
Expected Results
[57.180072607292026,
86.96960303077948,
116.92208468419805,
145.6026996264713,
175.62029248920572,
206.73427255638083]
Actual Results
[364.579644,
353.501255,
366.865550,
189.449838,
212.580815,
400.169872]
Versions
System:
python: 3.7.13 (default, Apr 24 2022, 01:04:09) [GCC 7.5.0]
executable: /usr/bin/python3
machine: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
Python dependencies:
pip: 21.1.3
setuptools: 57.4.0
sklearn: 1.0.2
numpy: 1.21.6
scipy: 1.4.1
Cython: 0.29.30
pandas: 1.3.5
matplotlib: 3.2.2
joblib: 1.1.0
threadpoolctl: 3.1.0
Built with OpenMP: True