Enhancement: Learning curves

I hope this is not a duplicate issue, but it's something I've been thinking about for a while.  It would be nice to have a utility function that generates learning curves: both hyperparameter value vs. score/error, and number of training samples vs. score/error.  It's something I do by hand very often.  I think an interface similar to grid search would work well, something like this:

Some setup code:

``` python
from sklearn.linear_model import RidgeClassifier
clf = RidgeClassifier()

from sklearn.datasets import load_digits
digits = load_digits()
X, y = digits.data, digits.target
```

Here's the first function we should create (arguments have the same meaning as in `GridSearchCV`):

``` python
import numpy as np

# validation_curve is the proposed function; it does not exist yet
alpha_range = np.logspace(-2, 2, 50)
train_score, test_score = validation_curve(clf, X, y,
                               param_grid={'alpha': alpha_range},
                               scoring='accuracy', cv=10)

import matplotlib.pyplot as plt
plt.semilogx(alpha_range, train_score, label='train')
plt.semilogx(alpha_range, test_score, label='test')
plt.legend()
```
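
For comparison, this is roughly what I end up writing by hand for the first curve today. It is only a sketch: `manual_validation_curve` is a hypothetical helper name rather than an existing scikit-learn function, it assumes a recent scikit-learn where the splitters live in `sklearn.model_selection`, and it simply averages accuracy over stratified folds.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import StratifiedKFold


def manual_validation_curve(estimator, X, y, param_name, param_range, cv=5):
    """Hypothetical helper: mean train/test accuracy per parameter value."""
    # Reuse the same folds for every parameter value so scores are comparable
    folds = list(StratifiedKFold(n_splits=cv).split(X, y))
    train_scores, test_scores = [], []
    for value in param_range:
        fold_train, fold_test = [], []
        for train_idx, test_idx in folds:
            # Fresh, unfitted copy of the estimator with the parameter set
            model = clone(estimator).set_params(**{param_name: value})
            model.fit(X[train_idx], y[train_idx])
            fold_train.append(model.score(X[train_idx], y[train_idx]))
            fold_test.append(model.score(X[test_idx], y[test_idx]))
        train_scores.append(np.mean(fold_train))
        test_scores.append(np.mean(fold_test))
    return np.array(train_scores), np.array(test_scores)


X, y = load_digits(return_X_y=True)
alpha_range = np.logspace(-2, 2, 5)  # coarse grid to keep the example quick
train_score, test_score = manual_validation_curve(
    RidgeClassifier(), X, y, "alpha", alpha_range, cv=3)
```

The two returned arrays line up with `alpha_range`, so they can be passed straight to `plt.semilogx` as in the snippet above.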

Here's the second function we should create (same argument meanings):

``` python
# learning_curve is the proposed function; it does not exist yet
N_range = np.arange(10, int(0.8 * X.shape[0]), 10)
train_score, test_score = learning_curve(clf, X, y, N_range,
                               scoring='accuracy', cv=10)

plt.plot(N_range, train_score, label='train')
plt.plot(N_range, test_score, label='test')
plt.legend()
```
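
Again for comparison, a by-hand version of the second curve might look like the sketch below. `manual_learning_curve` is a hypothetical name, and the sketch uses a single stratified hold-out split instead of the `cv=10` from the proposal, purely to keep it short.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split


def manual_learning_curve(estimator, X, y, n_range, test_size=0.2, seed=0):
    """Hypothetical helper: train/test accuracy for growing training subsets."""
    # One shuffled, stratified hold-out split (an assumption of this sketch)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    train_scores, test_scores = [], []
    for n in n_range:
        # Fit a fresh copy on the first n training samples only
        model = clone(estimator).fit(X_train[:n], y_train[:n])
        train_scores.append(model.score(X_train[:n], y_train[:n]))
        test_scores.append(model.score(X_test, y_test))
    return np.array(train_scores), np.array(test_scores)


X, y = load_digits(return_X_y=True)
N_range = np.arange(100, int(0.8 * X.shape[0]), 300)  # coarse steps for speed
train_score, test_score = manual_learning_curve(RidgeClassifier(), X, y, N_range)
```

Both loops share the same clone/fit/score boilerplate, which is the part a convenience routine would take over.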

I use this sort of thing all the time both in tutorials and in practice; it would be nice to have this functionality available in a convenience routine.  Any thoughts on this?

Enhancement: Learning curves #2584