.. _learning_curves:

=====================================================
Validation curves: plotting scores to evaluate models
=====================================================

.. currentmodule:: sklearn.learning_curve

Every estimator has its advantages and drawbacks. Its generalization error
can be decomposed in terms of bias, variance and noise. The **bias** of an
estimator is its average error for different training sets. The **variance**
of an estimator indicates how sensitive it is to varying training sets. Noise
is a property of the data.

In the following plot, we see a function :math:`f(x) = \cos (\frac{3}{2} \pi x)`
and some noisy samples from that function. We use three different estimators
to fit the function: linear regression with polynomial features of degree 1,
4 and 15. The first estimator can at best provide only a poor fit to the
samples and to the true function because it is too simple (high bias). The
second estimator approximates the true function almost perfectly. The last
estimator fits the training data perfectly but does not approximate the true
function well, i.e. it is very sensitive to varying training data (high
variance).

.. figure:: ../auto_examples/images/plot_polynomial_regression_1.png
   :target: ../auto_examples/plot_polynomial_regression.html
   :align: center
   :scale: 50%

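The curves above come from the example script linked in the figure. A minimal
sketch of the same comparison (the sample size and noise level used here are
illustrative assumptions, not the exact values of the example script)::

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    def true_fun(x):
        return np.cos(1.5 * np.pi * x)

    np.random.seed(0)
    n_samples = 30                  # illustrative assumption
    x = np.sort(np.random.rand(n_samples))
    y = true_fun(x) + np.random.randn(n_samples) * 0.1   # noisy samples

    for degree in (1, 4, 15):
        # degree 1 underfits (high bias), degree 15 overfits (high variance)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x[:, np.newaxis], y)
        print(degree, model.score(x[:, np.newaxis], y))  # R^2 on training data
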
Bias and variance are inherent properties of estimators and we usually have to
select learning algorithms and hyperparameters so that both bias and variance
are as low as possible (see `Bias-variance dilemma
<http://en.wikipedia.org/wiki/Bias-variance_dilemma>`_). Another way to reduce
the variance of a model is to use more training data. However, you should only
collect more training data if the true function is too complex to be
approximated by an estimator with a lower variance.

In the simple one-dimensional problem that we have seen in the example it is
easy to see whether the estimator suffers from bias or variance. However, in
high-dimensional spaces, models can become very difficult to visualize. For
this reason, it is often helpful to use the tools described below.

.. topic:: Examples:

   * :ref:`example_plot_polynomial_regression.py`
   * :ref:`example_plot_validation_curve.py`
   * :ref:`example_plot_learning_curve.py`


.. _validation_curve:

Validation curve
================

To validate a model we need a scoring function (see :ref:`model_evaluation`),
for example accuracy for classifiers. The proper way of choosing multiple
hyperparameters of an estimator is of course grid search or similar methods
(see :ref:`grid_search`) that select the hyperparameters with the maximum
score on a validation set or multiple validation sets. Note that if we
optimize the hyperparameters based on a validation score, that validation
score is biased and no longer a good estimate of the generalization error.
To get a proper estimate of the generalization error we have to compute the
score on another test set.

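As an illustration, here is a hedged sketch of that workflow using
:class:`sklearn.grid_search.GridSearchCV` and a held-out test set (the
estimator and parameter grid are arbitrary choices for the example)::

    from sklearn.cross_validation import train_test_split
    from sklearn.datasets import load_iris
    from sklearn.grid_search import GridSearchCV
    from sklearn.svm import SVC

    iris = load_iris()
    # hold out a test set that plays no role in hyperparameter selection
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.25, random_state=0)

    # grid search picks the hyperparameters with the best validation score ...
    search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
    search.fit(X_train, y_train)

    # ... while the untouched test set gives an unbiased estimate of the
    # generalization error
    print(search.score(X_test, y_test))
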
However, it is sometimes helpful to plot the influence of a single
hyperparameter on the training score and the validation score to find out
whether the estimator is overfitting or underfitting for some hyperparameter
values.

The function :func:`validation_curve` can help in this case::

    >>> import numpy as np
    >>> from sklearn.learning_curve import validation_curve
    >>> from sklearn.datasets import load_iris
    >>> from sklearn.linear_model import Ridge

    >>> np.random.seed(0)
    >>> iris = load_iris()
    >>> X, y = iris.data, iris.target
    >>> indices = np.arange(y.shape[0])
    >>> np.random.shuffle(indices)
    >>> X, y = X[indices], y[indices]

    >>> train_scores, valid_scores = validation_curve(Ridge(), X, y, "alpha",
    ...                                               np.logspace(-7, 3, 3))
    >>> train_scores           # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
    array([[ 0.94..., 0.92..., 0.92...],
           [ 0.94..., 0.92..., 0.92...],
           [ 0.47..., 0.45..., 0.42...]])
    >>> valid_scores           # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
    array([[ 0.90..., 0.92..., 0.94...],
           [ 0.90..., 0.92..., 0.94...],
           [ 0.44..., 0.39..., 0.45...]])

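Each row of the returned arrays corresponds to one parameter value and each
column to one cross-validation fold. To plot the curve, one could average the
scores over the folds, for example (a matplotlib sketch reusing the variables
from the snippet above; it is not part of the example scripts)::

    import matplotlib.pyplot as plt

    param_range = np.logspace(-7, 3, 3)
    # average the scores over the cross-validation folds (axis 1)
    plt.semilogx(param_range, train_scores.mean(axis=1), label="training score")
    plt.semilogx(param_range, valid_scores.mean(axis=1), label="validation score")
    plt.xlabel("alpha")
    plt.ylabel("score")
    plt.legend(loc="best")
    plt.show()
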
If the training score and the validation score are both low, the estimator
is underfitting. If the training score is high and the validation score is
low, the estimator is overfitting. Otherwise it is working very well. A low
training score together with a high validation score is usually not possible.
All three cases can be found in the plot below, where we vary the parameter
:math:`\gamma` of an SVM on the digits dataset.

.. figure:: ../auto_examples/images/plot_validation_curve_1.png
   :target: ../auto_examples/plot_validation_curve.html
   :align: center
   :scale: 50%

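The plot above is produced by the linked example script. A condensed sketch of
the same idea (the parameter range used here is an illustrative assumption and
may differ from the one in the example)::

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.learning_curve import validation_curve
    from sklearn.svm import SVC

    digits = load_digits()
    # small gamma: both scores low (underfitting); large gamma: training score
    # high but validation score low (overfitting)
    train_scores, valid_scores = validation_curve(
        SVC(), digits.data, digits.target,
        param_name="gamma", param_range=np.logspace(-6, -1, 5), cv=5)
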
.. _learning_curve:

Learning curve
==============

A learning curve shows the validation and training score of an estimator
for varying numbers of training samples. It is a tool to find out how much
we benefit from adding more training data and whether the estimator suffers
more from a variance error or a bias error. If both the validation score and
the training score converge to a value that is too low with increasing
size of the training set, we will not benefit much from more training data.
In the following plot you can see an example: naive Bayes roughly converges
to a low score.

.. figure:: ../auto_examples/images/plot_learning_curve_1.png
   :target: ../auto_examples/plot_learning_curve.html
   :align: center
   :scale: 50%

We will probably have to use an estimator or a parametrization of the
current estimator that can learn more complex concepts (i.e. has a lower
bias). If the training score is much greater than the validation score for
the maximum number of training samples, adding more training samples will
most likely increase generalization. In the following plot you can see that
the SVM could benefit from more training examples.

.. figure:: ../auto_examples/images/plot_learning_curve_2.png
   :target: ../auto_examples/plot_learning_curve.html
   :align: center
   :scale: 50%

We can use the function :func:`learning_curve` to generate the values
that are required to plot such a learning curve (number of samples
that have been used, the average scores on the training sets and the
average scores on the validation sets)::

    >>> from sklearn.learning_curve import learning_curve
    >>> from sklearn.svm import SVC

    >>> train_sizes, train_scores, valid_scores = learning_curve(
    ...     SVC(kernel='linear'), X, y, train_sizes=[50, 80, 110], cv=5)
    >>> train_sizes            # doctest: +NORMALIZE_WHITESPACE
    array([ 50, 80, 110])
    >>> train_scores           # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
    array([[ 0.98..., 0.98 , 0.98..., 0.98..., 0.98...],
           [ 0.98..., 1. , 0.98..., 0.98..., 0.98...],
           [ 0.98..., 1. , 0.98..., 0.98..., 0.99...]])
    >>> valid_scores           # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
    array([[ 1. , 0.93..., 1. , 1. , 0.96...],
           [ 1. , 0.96..., 1. , 1. , 0.96...],
           [ 1. , 0.96..., 1. , 1. , 0.96...]])

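As with the validation curve, one could plot the averaged scores against the
training set sizes, for example (a matplotlib sketch reusing the variables
from the snippet above; it is not part of the example script)::

    import matplotlib.pyplot as plt

    # average the scores over the cross-validation folds (axis 1)
    plt.plot(train_sizes, train_scores.mean(axis=1), 'o-', label="training score")
    plt.plot(train_sizes, valid_scores.mean(axis=1), 'o-', label="validation score")
    plt.xlabel("number of training samples")
    plt.ylabel("score")
    plt.legend(loc="best")
    plt.show()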