
GridSearchCV cannot be parallelized when a custom scoring function is used #10054

Closed
@shuuchen

Description


Hi,

I ran into a problem with the following code:

    import numpy as np
    from sklearn import ensemble
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import make_scorer

    model = ensemble.RandomForestRegressor()
    param = {'n_estimators': [500, 700, 1200],
             'max_depth': [3, 5, 7],
             'max_features': ['auto'],
             'n_jobs': [-1],
             'criterion': ['mae', 'mse'],
             'random_state': [300],
             }

    # Mean absolute percentage error as a custom loss
    def my_custom_loss_func(ground_truth, predictions):
        diff = np.abs(ground_truth - predictions) / ground_truth
        return np.mean(diff)

    loss = make_scorer(my_custom_loss_func, greater_is_better=False)
    model_cv = GridSearchCV(model, param, cv=5, n_jobs=2, scoring=loss, verbose=1)
    model_cv.fit(X, y.ravel())

in which I passed a custom scoring object to GridSearchCV(...) and set n_jobs=2.

I got the following error message:

C:\Anaconda3\python.exe C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py
Fitting 5 folds for each of 18 candidates, totalling 90 fits
Traceback (most recent call last):
  File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 172, in <module>
    models, scas = learn_all(X_train, y_train)
  File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 108, in learn_all
    models[machine], scas[machine] = learn_cv(X, y)
  File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 87, in learn_cv
    model_cv.fit(X, y.ravel())
  File "C:\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 638, in fit
    cv.split(X, y, groups)))
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 789, in __call__
    self.retrieve()
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 699, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Anaconda3\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
  File "C:\Anaconda3\lib\multiprocessing\pool.py", line 385, in _handle_tasks
    put(task)
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\pool.py", line 371, in send
    CustomizablePickler(buffer, self._reducers).dump(obj)
AttributeError: Can't pickle local object 'learn_cv.<locals>.my_custom_loss_func'

Process finished with exit code 1

The program only runs when n_jobs is set to 1; any other value triggers the error above.

Any ideas?
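(Not part of the original report, but possibly relevant: the traceback names `learn_cv.<locals>.my_custom_loss_func`, which suggests the scorer was defined inside another function in the actual script. With n_jobs > 1, joblib sends the scorer to worker processes via pickle, and Python's default pickler can serialize module-level functions but not local (nested) ones. A minimal sketch of that difference, assuming the standard `pickle` module:)

```python
import pickle
import numpy as np

# Module-level (top-level) functions pickle by reference, so worker
# processes can receive them.
def my_custom_loss_func(ground_truth, predictions):
    # Mean absolute percentage error, as in the report above.
    diff = np.abs(ground_truth - predictions) / ground_truth
    return np.mean(diff)

# Round-tripping a module-level function through pickle succeeds.
restored = pickle.loads(pickle.dumps(my_custom_loss_func))
print(restored(np.array([2.0, 4.0]), np.array([1.0, 5.0])))  # prints 0.375

# A function defined inside another function is a "local object"
# and cannot be pickled -- the same failure mode as the traceback.
def make_local_scorer():
    def local_loss(ground_truth, predictions):
        return np.mean(np.abs(ground_truth - predictions))
    return local_loss

try:
    pickle.dumps(make_local_scorer())
except (AttributeError, pickle.PicklingError) as exc:
    print("cannot pickle:", exc)
```

If this diagnosis applies, moving `my_custom_loss_func` to module level (outside `learn_cv`) should make the `make_scorer(...)` object picklable and let n_jobs=2 work.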
