[MRG+2] ENH/MNT results_ --> cv_results_; test_mean_score --> mean_test_score et al. #7324

Merged (1 commit) on Sep 7, 2016
@@ -55,12 +55,12 @@

# TASK: print the mean and std for each candidate along with the parameter
# settings for all the candidates explored by grid search.
n_candidates = len(grid_search.results_['params'])
n_candidates = len(grid_search.cv_results_['params'])
for i in range(n_candidates):
print(i, 'params - %s; mean - %0.2f; std - %0.2f'
% (grid_search.results_['params'][i],
grid_search.results_['test_mean_score'][i],
grid_search.results_['test_std_score'][i]))
% (grid_search.cv_results_['params'][i],
grid_search.cv_results_['mean_test_score'][i],
grid_search.cv_results_['std_test_score'][i]))

# TASK: Predict the outcome on the testing set and store it in a variable
# named y_predicted
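
For orientation, a minimal self-contained sketch of the renamed keys in runnable form; the dataset and parameter grid here are illustrative, not part of this diff::

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    iris = load_iris()
    grid_search = GridSearchCV(SVC(), param_grid={'C': [0.1, 1, 10]}, cv=3)
    grid_search.fit(iris.data, iris.target)

    # same reporting loop as the exercise above, with the new key names
    n_candidates = len(grid_search.cv_results_['params'])
    for i in range(n_candidates):
        print(i, 'params - %s; mean - %0.2f; std - %0.2f'
              % (grid_search.cv_results_['params'][i],
                 grid_search.cv_results_['mean_test_score'][i],
                 grid_search.cv_results_['std_test_score'][i]))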
4 changes: 2 additions & 2 deletions doc/tutorial/text_analytics/working_with_text_data.rst
@@ -458,9 +458,9 @@ mean score and the parameter settings corresponding to that score::
tfidf__use_idf: True
vect__ngram_range: (1, 1)

A more detailed summary of the search is available at ``gs_clf.results_``.
A more detailed summary of the search is available at ``gs_clf.cv_results_``.

The ``results_`` attribute can be easily imported into pandas as a
The ``cv_results_`` attribute can be easily imported into pandas as a
``DataFrame`` for further inspection.
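
A minimal sketch of that import, assuming ``gs_clf`` is the fitted ``GridSearchCV`` instance from the tutorial; sorting by rank is just one convenient way to inspect the table::

    import pandas as pd

    results_df = pd.DataFrame(gs_clf.cv_results_)
    # one row per candidate; sort by cross-validated rank to see the winners
    print(results_df.sort_values('rank_test_score').head())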

.. note::
20 changes: 10 additions & 10 deletions doc/whats_new.rst
@@ -39,27 +39,27 @@ Model Selection Enhancements and API Changes
:class:`model_selection.GridSearchCV` and
:class:`model_selection.RandomizedSearchCV` utilities.

- **The enhanced `results_` attribute**
- **The enhanced ``cv_results_`` attribute**

The new ``results_`` attribute (of :class:`model_selection.GridSearchCV`
The new ``cv_results_`` attribute (of :class:`model_selection.GridSearchCV`
and :class:`model_selection.RandomizedSearchCV`) introduced in lieu of the
``grid_scores_`` attribute is a dict of 1D arrays with elements in each
array corresponding to the parameter settings (i.e. search candidates).

The ``results_`` dict can be easily imported into ``pandas`` as a
The ``cv_results_`` dict can be easily imported into ``pandas`` as a
``DataFrame`` for exploring the search results.

The ``results_`` arrays include scores for each cross-validation split
(with keys such as ``test_split0_score``), as well as their mean
(``test_mean_score``) and standard deviation (``test_std_score``).
The ``cv_results_`` arrays include scores for each cross-validation split
(with keys such as ``'split0_test_score'``), as well as their mean
(``'mean_test_score'``) and standard deviation (``'std_test_score'``).

The ranks for the search candidates (based on their mean
cross-validation score) are available at ``results_['test_rank_score']``.
cross-validation score) are available at ``cv_results_['rank_test_score']``.

The values for each parameter are stored separately as numpy
masked object arrays. The value for a given search candidate is masked if
the corresponding parameter is not applicable. Additionally, a list of all
the parameter dicts is stored at ``results_['params']``.
the parameter dicts is stored at ``cv_results_['params']``.
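
To illustrate the structure just described, a short sketch; ``search`` stands in for any fitted search object, and ``param_kernel`` assumes the grid contained a ``kernel`` parameter::

    import numpy as np

    results = search.cv_results_
    # every key maps to a 1D array with one entry per candidate
    best = np.flatnonzero(results['rank_test_score'] == 1)[0]
    print(results['params'][best])   # parameter dict of the top-ranked candidate
    print(results['param_kernel'])   # masked where 'kernel' was not applicable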

- **Parameters ``n_folds`` and ``n_iter`` renamed to ``n_splits``**

@@ -235,7 +235,7 @@ Enhancements
- The :func:`ignore_warnings` now accepts a category argument to ignore only
the warnings of a specified type. By `Thierry Guillemot`_.

- The new ``results_`` attribute of :class:`model_selection.GridSearchCV`
- The new ``cv_results_`` attribute of :class:`model_selection.GridSearchCV`
(and :class:`model_selection.RandomizedSearchCV`) can be easily imported
into pandas as a ``DataFrame``. See :ref:`model_selection_changes` for
more information.
@@ -419,7 +419,7 @@ API changes summary

- The ``grid_scores_`` attribute of :class:`model_selection.GridSearchCV`
and :class:`model_selection.RandomizedSearchCV` is deprecated in favor of
the attribute ``results_``.
the attribute ``cv_results_``.
See :ref:`model_selection_changes` for more information.
(`#6697 <https://github.com/scikit-learn/scikit-learn/pull/6697>`_) by
`Raghav R V`_.
8 changes: 4 additions & 4 deletions examples/model_selection/grid_search_digits.py
@@ -60,11 +60,11 @@
print()
print("Grid scores on development set:")
print()
means = clf.results_['test_mean_score']
stds = clf.results_['test_std_score']
for i in range(len(clf.results_['params'])):
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
Review comment (Member, Author): I used this chance to address an old review ;)
print("%0.3f (+/-%0.03f) for %r"
% (means[i], stds[i] * 2, clf.results_['params'][i]))
% (mean, std * 2, params))
print()

print("Detailed classification report:")
12 changes: 6 additions & 6 deletions examples/model_selection/randomized_search.py
@@ -41,12 +41,12 @@
# Utility function to report best scores
def report(results, n_top=3):
for i in range(1, n_top + 1):
candidates = np.flatnonzero(results['test_rank_score'] == i)
candidates = np.flatnonzero(results['rank_test_score'] == i)
for candidate in candidates:
print("Model with rank: {0}".format(i))
print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
results['test_mean_score'][candidate],
results['test_std_score'][candidate]))
results['mean_test_score'][candidate],
results['std_test_score'][candidate]))
print("Parameters: {0}".format(results['params'][candidate]))
print("")

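Note that ``rank_test_score`` uses 'min'-style ranking (see the ``_search.py`` changes below), so tied candidates share a rank and ``np.flatnonzero`` can return several candidates for one rank; this is why ``report`` loops over ``candidates``. A tiny sketch::

    import numpy as np

    ranks = np.array([1, 2, 2])        # two candidates tied at rank 2
    print(np.flatnonzero(ranks == 2))  # [1 2]: both are reported for that rank
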
@@ -68,7 +68,7 @@ def report(results, n_top=3):
random_search.fit(X, y)
print("RandomizedSearchCV took %.2f seconds for %d candidates"
" parameter settings." % ((time() - start), n_iter_search))
report(random_search.results_)
report(random_search.cv_results_)

# use a full grid over all parameters
param_grid = {"max_depth": [3, None],
@@ -84,5 +84,5 @@
grid_search.fit(X, y)

print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
% (time() - start, len(grid_search.results_['params'])))
report(grid_search.results_)
% (time() - start, len(grid_search.cv_results_['params'])))
report(grid_search.cv_results_)
2 changes: 1 addition & 1 deletion examples/plot_compare_reduction.py
@@ -53,7 +53,7 @@
digits = load_digits()
grid.fit(digits.data, digits.target)

mean_scores = np.array(grid.results_['test_mean_score'])
mean_scores = np.array(grid.cv_results_['mean_test_score'])
# scores are in the order of param_grid iteration, which is alphabetical
mean_scores = mean_scores.reshape(len(C_OPTIONS), -1, len(N_FEATURES_OPTIONS))
# select score for best C
4 changes: 2 additions & 2 deletions examples/svm/plot_rbf_parameters.py
@@ -171,8 +171,8 @@ def __call__(self, value, clip=None):
plt.yticks(())
plt.axis('tight')

scores = grid.results_['test_mean_score'].reshape(len(C_range),
len(gamma_range))
scores = grid.cv_results_['mean_test_score'].reshape(len(C_range),
len(gamma_range))

# Draw heatmap of the validation accuracy as a function of gamma and C
#
2 changes: 1 addition & 1 deletion examples/svm/plot_svm_scale_c.py
@@ -131,7 +131,7 @@
cv=ShuffleSplit(train_size=train_size,
n_splits=250, random_state=1))
grid.fit(X, y)
scores = grid.results_['test_mean_score']
scores = grid.cv_results_['mean_test_score']

scales = [(1, 'No scaling'),
((n_samples * train_size), '1/n_samples'),
86 changes: 44 additions & 42 deletions sklearn/model_selection/_search.py
@@ -573,17 +573,18 @@ def _fit(self, X, y, labels, parameter_iterable):
stds = np.sqrt(np.average((test_scores - means[:, np.newaxis]) ** 2,
axis=1, weights=weights))

results = dict()
cv_results = dict()
for split_i in range(n_splits):
results["test_split%d_score" % split_i] = test_scores[:, split_i]
results["test_mean_score"] = means
results["test_std_score"] = stds
cv_results["split%d_test_score" % split_i] = test_scores[:,
split_i]
cv_results["mean_test_score"] = means
cv_results["std_test_score"] = stds

ranks = np.asarray(rankdata(-means, method='min'), dtype=np.int32)

best_index = np.flatnonzero(ranks == 1)[0]
best_parameters = candidate_params[best_index]
results["test_rank_score"] = ranks
cv_results["rank_test_score"] = ranks

# Use one np.MaskedArray and mask all the places where the param is not
# applicable for that candidate. Use defaultdict as each candidate may
@@ -597,12 +598,12 @@ def _fit(self, X, y, labels, parameter_iterable):
# Setting the value at an index also unmasks that index
param_results["param_%s" % name][cand_i] = value

results.update(param_results)
cv_results.update(param_results)

# Store a list of param dicts at the key 'params'
results['params'] = candidate_params
cv_results['params'] = candidate_params

self.results_ = results
self.cv_results_ = cv_results
self.best_index_ = best_index
self.n_splits_ = n_splits

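A self-contained sketch of the construction above, using made-up scores for four candidates over three splits and uniform split weights::

    import numpy as np
    from scipy.stats import rankdata

    test_scores = np.array([[0.80, 0.82, 0.81],   # 4 candidates x 3 splits
                            [0.70, 0.50, 0.60],
                            [0.80, 0.70, 0.75],
                            [0.90, 0.78, 0.84]])
    n_splits = test_scores.shape[1]
    means = test_scores.mean(axis=1)
    stds = test_scores.std(axis=1)

    cv_results = {}
    for split_i in range(n_splits):
        cv_results["split%d_test_score" % split_i] = test_scores[:, split_i]
    cv_results["mean_test_score"] = means
    cv_results["std_test_score"] = stds
    # the best mean gets rank 1; ties share the lowest applicable rank
    cv_results["rank_test_score"] = np.asarray(
        rankdata(-means, method='min'), dtype=np.int32)
    print(cv_results["rank_test_score"])  # [2 4 3 1]
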
@@ -620,30 +621,31 @@ def _fit(self, X, y, labels, parameter_iterable):

@property
def best_params_(self):
check_is_fitted(self, 'results_')
return self.results_['params'][self.best_index_]
check_is_fitted(self, 'cv_results_')
return self.cv_results_['params'][self.best_index_]

@property
def best_score_(self):
check_is_fitted(self, 'results_')
return self.results_['test_mean_score'][self.best_index_]
check_is_fitted(self, 'cv_results_')
return self.cv_results_['mean_test_score'][self.best_index_]

@property
def grid_scores_(self):
warnings.warn(
"The grid_scores_ attribute was deprecated in version 0.18"
" in favor of the more elaborate results_ attribute."
" in favor of the more elaborate cv_results_ attribute."
" The grid_scores_ attribute will not be available from 0.20",
DeprecationWarning)

check_is_fitted(self, 'results_')
check_is_fitted(self, 'cv_results_')
grid_scores = list()

for i, (params, mean, std) in enumerate(zip(
self.results_['params'],
self.results_['test_mean_score'],
self.results_['test_std_score'])):
scores = np.array(list(self.results_['test_split%d_score' % s][i]
self.cv_results_['params'],
self.cv_results_['mean_test_score'],
self.cv_results_['std_test_score'])):
scores = np.array(list(self.cv_results_['split%d_test_score'
% s][i]
for s in range(self.n_splits_)),
dtype=np.float64)
grid_scores.append(_CVScoreTuple(params, mean, scores))
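
The deprecation path above can be exercised directly; a sketch, where ``clf`` stands in for any fitted search object::

    import warnings

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        legacy = clf.grid_scores_  # triggers the DeprecationWarning above
    assert any(issubclass(w.category, DeprecationWarning) for w in caught)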
@@ -763,22 +765,22 @@ class GridSearchCV(BaseSearchCV):
fit_params={}, iid=..., n_jobs=1,
param_grid=..., pre_dispatch=..., refit=...,
scoring=..., verbose=...)
>>> sorted(clf.results_.keys())
>>> sorted(clf.cv_results_.keys())
... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
['param_C', 'param_kernel', 'params', 'test_mean_score',...
'test_rank_score', 'test_split0_score', 'test_split1_score',...
'test_split2_score', 'test_std_score']
['mean_test_score', 'param_C', 'param_kernel', 'params',...
'rank_test_score', 'split0_test_score', 'split1_test_score',...
'split2_test_score', 'std_test_score']

Attributes
----------
results_ : dict of numpy (masked) ndarrays
cv_results_ : dict of numpy (masked) ndarrays
A dict with keys as column headers and values as columns, that can be
imported into a pandas ``DataFrame``.

For instance, the table below

+------------+-----------+------------+-----------------+---+---------+
|param_kernel|param_gamma|param_degree|test_split0_score|...|...rank..|
|param_kernel|param_gamma|param_degree|split0_test_score|...|rank_....|
+============+===========+============+=================+===+=========+
| 'poly' | -- | 2 | 0.8 |...| 2 |
+------------+-----------+------------+-----------------+---+---------+
@@ -789,7 +791,7 @@ class GridSearchCV(BaseSearchCV):
| 'rbf' | 0.2 | -- | 0.9 |...| 1 |
+------------+-----------+------------+-----------------+---+---------+

will be represented by a ``results_`` dict of::
will be represented by a ``cv_results_`` dict of::

{
'param_kernel': masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
@@ -798,11 +800,11 @@
mask = [ True True False False]...),
'param_degree': masked_array(data = [2.0 3.0 -- --],
mask = [False False True True]...),
'test_split0_score' : [0.8, 0.7, 0.8, 0.9],
'test_split1_score' : [0.82, 0.5, 0.7, 0.78],
'test_mean_score' : [0.81, 0.60, 0.75, 0.82],
'test_std_score' : [0.02, 0.01, 0.03, 0.03],
'test_rank_score' : [2, 4, 3, 1],
'split0_test_score' : [0.8, 0.7, 0.8, 0.9],
Review comment (Member): Confirming with @amueller: these names work better for you?
Review comment (Member): yeah I think that's better
'split1_test_score' : [0.82, 0.5, 0.7, 0.78],
'mean_test_score' : [0.81, 0.60, 0.75, 0.82],
'std_test_score' : [0.02, 0.01, 0.03, 0.03],
'rank_test_score' : [2, 4, 3, 1],
'params' : [{'kernel': 'poly', 'degree': 2}, ...],
}

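The masked parameter columns can be reproduced in isolation; a sketch of the ``param_degree`` column from the table above, where ``degree`` applies only to the two ``'poly'`` candidates::

    import numpy.ma as ma

    param_degree = ma.masked_array(data=[2.0, 3.0, 0.0, 0.0],
                                   mask=[False, False, True, True])
    print(param_degree)  # [2.0 3.0 -- --]
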
Expand All @@ -821,10 +823,10 @@ class GridSearchCV(BaseSearchCV):
Parameter setting that gave the best results on the hold out data.

best_index_ : int
The index (of the ``results_`` arrays) which corresponds to the best
The index (of the ``cv_results_`` arrays) which corresponds to the best
candidate parameter setting.

The dict at ``search.results_['params'][search.best_index_]`` gives
The dict at ``search.cv_results_['params'][search.best_index_]`` gives
the parameter setting for the best model, i.e. the one that gives the
highest mean score (``search.best_score_``).

Expand Down Expand Up @@ -1005,14 +1007,14 @@ class RandomizedSearchCV(BaseSearchCV):

Attributes
----------
results_ : dict of numpy (masked) ndarrays
cv_results_ : dict of numpy (masked) ndarrays
A dict with keys as column headers and values as columns, that can be
imported into a pandas ``DataFrame``.

For instance, the table below

+--------------+-------------+-------------------+---+---------------+
| param_kernel | param_gamma | test_split0_score |...|test_rank_score|
| param_kernel | param_gamma | split0_test_score |...|rank_test_score|
+==============+=============+===================+===+===============+
| 'rbf' | 0.1 | 0.8 |...| 2 |
+--------------+-------------+-------------------+---+---------------+
@@ -1021,17 +1023,17 @@ class RandomizedSearchCV(BaseSearchCV):
| 'rbf' | 0.3 | 0.7 |...| 1 |
+--------------+-------------+-------------------+---+---------------+

will be represented by a ``results_`` dict of::
will be represented by a ``cv_results_`` dict of::

{
'param_kernel' : masked_array(data = ['rbf', 'rbf', 'rbf'],
mask = False),
'param_gamma' : masked_array(data = [0.1 0.2 0.3], mask = False),
'test_split0_score' : [0.8, 0.9, 0.7],
'test_split1_score' : [0.82, 0.5, 0.7],
'test_mean_score' : [0.81, 0.7, 0.7],
'test_std_score' : [0.01, 0.2, 0.],
'test_rank_score' : [1, 2, 2],
'split0_test_score' : [0.8, 0.9, 0.7],
'split1_test_score' : [0.82, 0.5, 0.7],
'mean_test_score' : [0.81, 0.7, 0.7],
'std_test_score' : [0.01, 0.2, 0.],
'rank_test_score' : [1, 2, 2],
'params' : [{'kernel' : 'rbf', 'gamma' : 0.1}, ...],
}

@@ -1050,10 +1052,10 @@
Parameter setting that gave the best results on the hold out data.

best_index_ : int
The index (of the ``results_`` arrays) which corresponds to the best
The index (of the ``cv_results_`` arrays) which corresponds to the best
candidate parameter setting.

The dict at ``search.results_['params'][search.best_index_]`` gives
The dict at ``search.cv_results_['params'][search.best_index_]`` gives
the parameter setting for the best model, i.e. the one that gives the
highest mean score (``search.best_score_``).
