Let's discuss the method Ashenfelter used to build his model, linear regression. We'll start with one-variable linear regression, which just uses one independent variable to predict the dependent variable. This figure shows a plot of one of the independent variables, average growing season temperature, and the dependent variable, wine price. The goal of linear regression is to create a predictive line through the data.

There are many different lines that could be drawn to predict wine price using average growing season temperature. A simple option would be a flat line at the average price, in this case 7.07. The equation for this line is y = 7.07. This linear regression model would predict 7.07 regardless of the temperature. But it looks like a better line would have a positive slope, such as this line in blue. The equation for this line is y = 0.5*(AGST) - 1.25. This linear regression model would predict a higher price when the temperature is higher.
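As a quick worked example, using the blue line's equation with a hypothetical average growing season temperature of 17 degrees (a value chosen only for illustration, not taken from the data):

$$
\hat{y} = 0.5 \times 17 - 1.25 = 7.25,
$$

which is above the flat baseline prediction of 7.07.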
Let's make this idea a little more formal. In general form, a one-variable linear regression model is a linear equation to predict the dependent variable, y, using the independent variable, x. Beta 0 is the intercept term, or intercept coefficient, and Beta 1 is the slope of the line, or the coefficient for the independent variable, x. For each observation, i, we have data for the dependent variable, Yi, and data for the independent variable, Xi. Using this equation, we make a prediction, Beta 0 plus Beta 1 times Xi, for each data point i. This prediction is hopefully close to the true outcome, Yi. But since the coefficients have to be the same for all data points i, we often make a small error, which we'll call epsilon i. This error term is also often called a residual.
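Written out as equations (just a reconstruction of the model described in words above), the one-variable model, its prediction, and the resulting residual are:

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad
\hat{y}_i = \beta_0 + \beta_1 x_i, \qquad
\varepsilon_i = y_i - \hat{y}_i .
$$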
Our errors will all be 0 only if all of our points lie perfectly on the same line. This rarely happens, so we know that our model will probably make some errors. The best model, or best choice of coefficients Beta 0 and Beta 1, is the one with the smallest error terms, or smallest residuals.
This figure shows the blue line that we drew in the beginning. We can compute the residuals, or errors, of this line for each data point. For example, for this point the actual value is about 6.2, and using our regression model we predict about 6.5. So the error for this data point is negative 0.3, which is the actual value minus our prediction. As another example, for this point the actual value is about 8, and using our regression model we predict about 7.5. So the error for this data point is about 0.5, again the actual value minus our prediction.
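Plugging the two examples above into the residual formula, actual minus predicted:

$$
\varepsilon = y - \hat{y}: \qquad 6.2 - 6.5 = -0.3, \qquad 8.0 - 7.5 = 0.5 .
$$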
One measure of the quality of a regression line is the sum of squared errors, or SSE. This is the sum of the squared residuals, or error terms. Let n be the number of data points in our data set. Then the sum of squared errors is equal to the error we make on the first data point, squared, plus the error we make on the second data point, squared, and so on, up to the error we make on the n-th data point, squared.
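In symbols, using the residuals defined earlier:

$$
SSE = \varepsilon_1^2 + \varepsilon_2^2 + \cdots + \varepsilon_n^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 .
$$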
We can compute the sum of squared errors for both the red line and the blue line. As expected, the blue line is a better fit than the red line, since it has a smaller sum of squared errors. The line that gives the minimum sum of squared errors is shown in green. This is the line that our regression model will find.
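To make "finding the line with the smallest sum of squared errors" concrete, here is a minimal sketch in Python using numpy's least-squares solver. The temperature and price values are invented for illustration; they are not the actual data behind the figures, and this is not the tool used in the original analysis.

```python
import numpy as np

# Hypothetical (AGST, price) observations, for illustration only;
# these are NOT the actual wine data.
agst = np.array([15.0, 15.5, 16.2, 16.5, 16.8, 17.1])
price = np.array([6.2, 6.5, 6.9, 7.2, 7.4, 7.8])

# Design matrix: a column of ones for the intercept (beta 0)
# next to the single independent variable (AGST).
X = np.column_stack([np.ones_like(agst), agst])

# Ordinary least squares picks beta 0 and beta 1 to minimize
# the sum of squared errors (SSE).
beta, _, _, _ = np.linalg.lstsq(X, price, rcond=None)
beta0, beta1 = beta

predictions = X @ beta
residuals = price - predictions
sse = np.sum(residuals ** 2)
rmse = np.sqrt(sse / len(price))

print(f"intercept = {beta0:.3f}, slope = {beta1:.3f}")
print(f"SSE = {sse:.4f}, RMSE = {rmse:.4f}")
```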
Although the sum of squared errors allows us to compare lines on the same data set, it's hard to interpret, for two reasons. The first is that it scales with n, the number of data points. If we built the same model with twice as much data, the sum of squared errors might be twice as big, but this doesn't mean it's a worse model. The second is that the units are hard to understand: the sum of squared errors is in squared units of the dependent variable. Because of these problems, Root Mean Squared Error, or RMSE, is often used. This divides the sum of squared errors by n and then takes the square root, so it's normalized by n and is in the same units as the dependent variable.
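As a formula, following the description above:

$$
RMSE = \sqrt{\frac{SSE}{n}} .
$$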
Another common error measure for linear regression is R squared. This error measure is nice because it compares the best model to a baseline model: the model that does not use any variables, or the red line from before. The baseline model predicts the average value of the dependent variable regardless of the value of the independent variable. We can compute that the sum of squared errors for the best fit line, or the green line, is 5.73, and the sum of squared errors for the baseline, or the red line, is 10.15. The sum of squared errors for the baseline model is also known as the total sum of squares, commonly referred to as SST. Then the formula for R squared is: R squared equals 1 minus the sum of squared errors divided by the total sum of squares. In this case it equals 1 minus 5.73 divided by 10.15, which equals 0.44.
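Writing that computation out:

$$
R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{5.73}{10.15} \approx 0.44 .
$$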
R squared is nice because it captures the value added from using a linear regression model over just predicting the average outcome for every data point. So what values do we expect to see for R squared? Well, both the sum of squared errors and the total sum of squares have to be greater than or equal to zero, because they are sums of squared terms, so they can't be negative. Additionally, the sum of squared errors has to be less than or equal to the total sum of squares. This is because our linear regression model could just set the coefficient for the independent variable to 0, and then we would have the baseline model. So our linear regression model will never be worse than the baseline model. In the worst case, the sum of squared errors equals the total sum of squares, and our R squared is equal to 0, which means no improvement over the baseline. In the best case, our linear regression model makes no errors, the sum of squared errors is equal to 0, and our R squared is equal to 1. So an R squared equal to 1, or close to 1, means a perfect or almost perfect predictive model.
R squared is also nice because it's unitless and therefore interpretable regardless of the units of the problem. However, it can still be hard to compare R squared values between problems. Good models for easy problems will have an R squared close to 1, but good models for hard problems can still have an R squared close to zero. Throughout this course we will see examples of both types of problems.