Closed as not planned
Description
* Workflow:
1- Consider the make_scorer() call below for a clustering metric:
from sklearn.metrics import homogeneity_score, make_scorer
def score_func(y_true, y_pred, **kwargs):
    return homogeneity_score(y_true, y_pred)
scorer = make_scorer(score_func)
2- Consider the simple method optics():
# "optics" algorithm for clustering
# ---
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.model_selection import GridSearchCV

def optics(data, labels):
    # data: A dataframe with two columns (x, y)
    preds = None
    base_opt = OPTICS()
    grid_search_params = {"min_samples": np.arange(10),
                          "metric": ["cityblock", "cosine", "euclidean", "l1", "l2", "manhattan"],
                          "cluster_method": ["xi", "dbscan"],
                          "algorithm": ["auto", "ball_tree", "kd_tree", "brute"]}
    grid_search_cv = GridSearchCV(estimator=base_opt,
                                  param_grid=grid_search_params,
                                  scoring=scorer)
    grid_search_cv.fit(data)  # this call raises the TypeError shown below
    opt = grid_search_cv.best_estimator_
    opt.fit(data)
    preds = opt.labels_
    # return clusters corresponding to (x, y) pairs according to "optics" algorithm
    return preds
Running optics() led to this error:
TypeError: _score() missing 1 required positional argument: 'y_true'
Even when calling grid_search_cv.fit(data, labels) instead of grid_search_cv.fit(data), another exception was raised:
AttributeError: 'OPTICS' object has no attribute 'predict'
I think we cannot use make_scorer() with GridSearchCV for a clustering task: the scorer it builds calls predict() on the estimator, and OPTICS only exposes labels_ after fitting.
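A quick check of which methods the estimators expose may make the second error easier to read. This is only an illustrative sketch; KMeans is brought in purely as a contrasting example and is not part of the report above.

```python
from sklearn.cluster import KMeans, OPTICS

# OPTICS is transductive: it labels the data it was fitted on
# and offers fit_predict(), but no standalone predict().
optics_has_predict = hasattr(OPTICS(), "predict")
optics_has_fit_predict = hasattr(OPTICS(), "fit_predict")

# KMeans (used here only for contrast) can assign new points
# to learned centroids, so it does provide predict().
kmeans_has_predict = hasattr(KMeans(), "predict")

print(optics_has_predict, optics_has_fit_predict, kmeans_has_predict)
```

A scorer that relies on predict() will therefore fail for OPTICS no matter which labels are passed to fit().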
* Proposed solution:
The fit() method of GridSearchCV should automatically handle the type of the estimator passed to its constructor. For example, for a clustering estimator it should use labels_ instead of predict() for scoring.
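Until something like this exists, one possible workaround is to skip make_scorer() and pass a plain callable with the signature (estimator, X, y) as scoring, reading labels_ directly. The sketch below is not the behaviour proposed above and not an official recipe. It assumes a single "fold" whose train and test indices both cover the full dataset, so that labels_ lines up with the ground truth. The name labels_scorer, the toy data from make_blobs and the reduced parameter grid are all made up for illustration.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs
from sklearn.metrics import homogeneity_score
from sklearn.model_selection import GridSearchCV


def labels_scorer(estimator, X, y_true):
    # Hypothetical helper: score the labels assigned during fit(),
    # so predict() is never called on the clustering estimator.
    return homogeneity_score(y_true, estimator.labels_)


# Toy data, assumed only for this illustration.
X, y = make_blobs(n_samples=60, centers=3, cluster_std=0.5, random_state=0)

# A single split where train and test are both the whole dataset,
# so estimator.labels_ has one entry per element of y in the scorer.
all_idx = np.arange(len(X))

search = GridSearchCV(
    estimator=OPTICS(),
    param_grid={"min_samples": [3, 5, 10],
                "metric": ["euclidean", "manhattan"]},
    scoring=labels_scorer,
    cv=[(all_idx, all_idx)],
)
search.fit(X, y)

preds = search.best_estimator_.labels_
print(search.best_params_, search.best_score_)
```

The trade-off is that nothing is held out: every candidate is scored on the same data it was fitted on, which matches how labels_ works but gives up cross-validation in the usual sense.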