WIP: First draft of a random search in GridSearchCV #455

Closed
wants to merge 1 commit into from
28 changes: 22 additions & 6 deletions sklearn/grid_search.py
@@ -11,6 +11,7 @@
 import time
 
 import numpy as np
+import random
 import scipy.sparse as sp
 
 from .externals.joblib import Parallel, delayed, logger
@@ -146,6 +147,11 @@ class GridSearchCV(BaseEstimator):
         Dictionary with parameters names (string) as keys and lists of
         parameter settings to try as values.
 
+    budget: int, optional
+        If set, a maximum limit on the number of points in the grid
+        to be evaluated. If set, the grid is explored randomly rather
+        than in any deterministic order.
+
     loss_func: callable, optional
         function that takes 2 arguments and compares them in
         order to evaluate the performance of prediciton (small is good)
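The budget behaviour documented in this hunk (expand the grid, shuffle it, keep at most budget points) can be illustrated with the standard library alone. This is a minimal sketch, not the PR's code: expand_grid and budgeted_candidates are hypothetical helpers, and itertools.product stands in for IterGrid.

```python
import itertools
import random


def expand_grid(param_grid):
    """Hypothetical stand-in for IterGrid: roll a dict of lists out into
    a list of parameter dicts, one per grid point."""
    keys = sorted(param_grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*(param_grid[k] for k in keys))]


def budgeted_candidates(param_grid, budget=None, seed=None):
    """Return every grid point when budget is None; otherwise shuffle the
    rolled-out grid and keep at most `budget` points."""
    grid = expand_grid(param_grid)
    if budget is None:
        return grid  # full exploration in a deterministic order
    rng = random.Random(seed)  # local, seeded RNG so runs are reproducible
    rng.shuffle(grid)
    return grid[:budget]


param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
print(len(budgeted_candidates(param_grid)))                    # 6
print(len(budgeted_candidates(param_grid, budget=4, seed=0)))  # 4
```

One deliberate difference: the sketch tests budget is None, whereas the PR's if self.budget: treats budget=0 the same as having no budget.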
@@ -204,7 +210,7 @@ class GridSearchCV(BaseEstimator):
     >>> clf = grid_search.GridSearchCV(svr, parameters)
     >>> clf.fit(iris.data, iris.target)
     ... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
-    GridSearchCV(cv=None,
+    GridSearchCV(budget=None, cv=None,
         estimator=SVR(C=1.0, cache_size=..., coef0=..., degree=...,
             epsilon=..., gamma=..., kernel='rbf', probability=False,
             shrinking=True, tol=...),
@@ -238,9 +244,9 @@ class GridSearchCV(BaseEstimator):
 
     """
 
-    def __init__(self, estimator, param_grid, loss_func=None, score_func=None,
-            fit_params=None, n_jobs=1, iid=True, refit=True, cv=None,
-            verbose=0, pre_dispatch='2*n_jobs',
+    def __init__(self, estimator, param_grid, budget=None, loss_func=None,
+            score_func=None, fit_params=None, n_jobs=1, iid=True,
+            refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs',
             ):
         assert hasattr(estimator, 'fit') and (hasattr(estimator, 'predict')
                         or hasattr(estimator, 'score')), (
@@ -255,6 +261,13 @@ def __init__(self, estimator, param_grid, loss_func=None, score_func=None,
 
         self.estimator = estimator
         self.param_grid = param_grid
+        self.budget = budget
+        if self.budget:
+            self.rolled_out_grid = list(IterGrid(param_grid))
+            random.shuffle(self.rolled_out_grid)

Contributor:

To get reproducible results, we should instead use a random_state (added to and initialized in the constructor) and call shuffle on it.

Member:

Yes, but in the following way:

from random import Random
from sklearn.utils import check_random_state
...

    # in __init__
    self.random_state = random_state

...

    # then in fit
    self.random_state = check_random_state(self.random_state)

    if self.budget:
        py_random_state = Random(self.random_state.rand())
        self.rolled_out_grid = list(IterGrid(self.param_grid))
        py_random_state.shuffle(self.rolled_out_grid)
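A self-contained sketch of the pattern proposed in this comment, with two assumptions flagged: check_random_state_sketch is a hypothetical, simplified stand-in for sklearn.utils.check_random_state (it only accepts None, an int, or a NumPy RandomState), and the Python Random is seeded with an integer drawn from the NumPy generator rather than with rand(). The converted generator is kept in a local variable, so an estimator's random_state attribute would not be overwritten during fit.

```python
from random import Random

import numpy as np


def check_random_state_sketch(seed):
    """Hypothetical, simplified stand-in for sklearn.utils.check_random_state."""
    if seed is None:
        return np.random.RandomState()  # unseeded: order is not reproducible
    if isinstance(seed, (int, np.integer)):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError("%r cannot be used to seed a numpy RandomState" % (seed,))


def shuffled_grid(grid_points, budget, random_state=None):
    """Return at most `budget` grid points in a random but reproducible order."""
    # Local variable: the caller's random_state setting is left untouched.
    rng = check_random_state_sketch(random_state)
    # Seed a Python Random from the NumPy generator (assumed integer seed),
    # then shuffle a copy of the list in place.
    py_random_state = Random(int(rng.randint(0, 2**31 - 1)))
    points = list(grid_points)
    py_random_state.shuffle(points)
    return points[:budget]


grid = [{"C": c} for c in (0.1, 1, 10, 100)]
print(shuffled_grid(grid, budget=2, random_state=42))
```

Calling shuffled_grid twice with the same integer random_state yields the same subset in the same order, which is the reproducibility the review asks for.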

Member:

What is the purpose of py_random_state? Is it to avoid changing the estimator during fit? I would do
random_state = check_random_state(self.random_state) and then use that local random_state afterwards. Or does check_random_state not make a copy? Maybe it should.

Member:

Because, AFAIK, the NumPy RNG cannot shuffle a Python list in place.

Member:

OK, that was the subtlety I knew I had overlooked, thanks :)
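A related option, not what this PR or thread settled on: instead of shuffling the Python list itself, draw a random permutation of its indices with numpy.random.RandomState.permutation and index into the list. This sidesteps the question of whether the NumPy RNG can shuffle a list and needs no Python random.Random at all. A minimal sketch, where sample_grid_by_permutation is a hypothetical helper:

```python
import numpy as np


def sample_grid_by_permutation(grid_points, budget=None, seed=None):
    """Hypothetical helper: up to `budget` grid points in random order.

    NumPy permutes an array of indices rather than the list itself, so the
    input list is never modified.
    """
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(grid_points))  # random ordering of 0..n-1
    return [grid_points[i] for i in order[:budget]]  # budget=None keeps all


grid = [{"C": c, "kernel": k} for c in (0.1, 1, 10) for k in ("linear", "rbf")]
picked = sample_grid_by_permutation(grid, budget=3, seed=0)
print(len(picked))  # 3
```

Because the indices come from a permutation, the selected points are always distinct, just as with shuffle-then-truncate.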

+            self.rolled_out_grid = self.rolled_out_grid[:self.budget]
+        else:
+            self.rolled_out_grid = None
         self.loss_func = loss_func
         self.score_func = score_func
         self.n_jobs = n_jobs
@@ -298,8 +311,11 @@ def fit(self, X, y=None, **params):
                     % (len(y), n_samples))
             y = np.asarray(y)
         cv = check_cv(cv, X, y, classifier=is_classifier(estimator))
-
-        grid = IterGrid(self.param_grid)
+
+        if self.budget:
+            grid = self.rolled_out_grid
+        else:
+            grid = IterGrid(self.param_grid)
         base_clf = clone(self.estimator)
         pre_dispatch = self.pre_dispatch
         out = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,