Using make_scorer() for a GridSearchCV scoring parameter in a clustering task #17631

Closed as not planned
@imanirajian

Description

* Workflow:

1- Consider make_scorer() below for a clustering metric:

from sklearn.metrics import homogeneity_score, make_scorer

def score_func(y_true, y_pred, **kwargs):
    return homogeneity_score(y_true, y_pred)
scorer = make_scorer(score_func)

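2- Consider the simple function optics():

import numpy as np
from sklearn.cluster import OPTICS
from sklearn.model_selection import GridSearchCV

# "optics" algorithm for clustering
# ---
def optics(data, labels):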
    # data: A dataframe with two columns (x, y)
    preds = None    
    base_opt = OPTICS()
    grid_search_params = {"min_samples":np.arange(10),
                          "metric":["cityblock", "cosine", "euclidean", "l1", "l2", "manhattan"],
                          "cluster_method":["xi", "dbscan"],
                          "algorithm":["auto", "ball_tree", "kd_tree", "brute"]}
    
    grid_search_cv = GridSearchCV(estimator=base_opt,
                                  param_grid=grid_search_params,
                                  scoring=scorer)
    
    grid_search_cv.fit(data)    
    opt = grid_search_cv.best_estimator_
    opt.fit(data)
    preds = opt.labels_
    
    # return clusters corresponding to (x, y) pairs according to "optics" algorithm
    return preds

Running optics() raises this error:
TypeError: _score() missing 1 required positional argument: 'y_true'

Even with grid_search_cv.fit(data, labels) instead of grid_search_cv.fit(data), another exception is raised:
AttributeError: 'OPTICS' object has no attribute 'predict'
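
A minimal sketch of the likely cause of both errors: a scorer built by make_scorer() is called as scorer(estimator, X, y_true) and by default uses estimator.predict(X), so it needs ground-truth labels passed to fit() and an estimator that implements predict(). OPTICS only offers fit() and fit_predict(). The make_blobs data here is an assumption for illustration, not the data from this report.

```python
from sklearn.cluster import OPTICS, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import homogeneity_score, make_scorer

# Synthetic stand-in data (assumption, not the reporter's dataframe)
X, y = make_blobs(n_samples=60, centers=3, random_state=0)

# OPTICS is transductive: it exposes fit_predict() but not predict()
optics_has_predict = hasattr(OPTICS(), "predict")
optics_has_fit_predict = hasattr(OPTICS(), "fit_predict")
kmeans_has_predict = hasattr(KMeans(), "predict")

scorer = make_scorer(homogeneity_score)
fitted = OPTICS(min_samples=5).fit(X)

# Calling the scorer directly reproduces the second error from the report
raised_attribute_error = False
try:
    scorer(fitted, X, y)
except AttributeError as exc:
    raised_attribute_error = True
    print("scorer failed:", exc)
```
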


I think make_scorer() cannot be used with GridSearchCV for a clustering task.
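
One possible workaround, sketched under assumptions: instead of make_scorer(), GridSearchCV also accepts a plain callable with the signature (estimator, X, y). The hypothetical homogeneity_scorer below re-clusters each held-out fold with fit_predict() and compares the result with the true labels. This is not how GridSearchCV treats clustering estimators by default, and re-fitting on the test fold means the score reflects the hyperparameters rather than a model trained on the training fold. The data and the reduced parameter grid are illustrative only.

```python
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs
from sklearn.metrics import homogeneity_score
from sklearn.model_selection import GridSearchCV


def homogeneity_scorer(estimator, X, y):
    # Hypothetical helper: cluster the held-out fold from scratch,
    # then compare the cluster labels with the ground truth.
    labels = estimator.fit_predict(X)
    return homogeneity_score(y, labels)


# Synthetic stand-in data (assumption, not the reporter's dataframe)
X, y = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# Smaller grid than in the report to keep the sketch fast
param_grid = {"min_samples": [5, 10], "metric": ["euclidean", "manhattan"]}

search = GridSearchCV(OPTICS(), param_grid, scoring=homogeneity_scorer, cv=3)
search.fit(X, y)  # y is required so the scorer receives ground-truth labels

print(search.best_params_, search.best_score_)
preds = search.best_estimator_.labels_  # labels from the refit on all of X
```
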


* Proposed solution:

The fit() method of GridSearchCV should automatically handle the type of estimator passed to its constructor. For example, for a clustering estimator it should use labels_ instead of predict() for scoring.
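
Until something like this exists, the proposed behaviour can be approximated by hand. The sketch below, which is an assumption about what such scoring could look like and not an existing GridSearchCV feature, loops over a ParameterGrid, fits OPTICS on the full data, and scores labels_ against the true labels with homogeneity_score. There is no cross-validation, and the data and grid are illustrative.

```python
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs
from sklearn.metrics import homogeneity_score
from sklearn.model_selection import ParameterGrid

# Synthetic stand-in data (assumption, not the reporter's dataframe)
X, y = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

param_grid = {"min_samples": [5, 10], "metric": ["euclidean", "manhattan"]}

best_score, best_params, best_labels = -1.0, None, None
for params in ParameterGrid(param_grid):
    model = OPTICS(**params).fit(X)
    # Score the labels assigned during fit() instead of calling predict()
    score = homogeneity_score(y, model.labels_)
    if score > best_score:
        best_score, best_params, best_labels = score, params, model.labels_

print(best_params, best_score)
```
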
