Feb 17, 2019 MMW SCC Handout Least Squares Line
Feb 17, 2019 MMW SCC Handout Least Squares Line
Feb 17, 2019 MMW SCC Handout Least Squares Line
THE LEAST SQUARES LINE (other names ‘Best-Fit Line’ or ‘Regression Line’)
Problem:
A sales manager noticed that the annual sales of his employees increase with years of experience. To estimate the annual sales for his
potential new sales person he collected data concerning annual sales and years of experience of his current employees: We’ll use his data to
create a formula that will help him estimate annual sales based on years of experience.
Years of experience 1 3 8 10 13
150
Annual sales in thousands 80 100 110 120 140 140
130
120
Sales
110
100
Work: 90
80
Figure to the right shows scatter graph of his data. Each point represents data on
70
one single current employee. For example: the first employee has 1 year of
60
experience and made 80 thousands in sales, so he is represented by the point with
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
coordinates (1,80).
Years
We’ll create equation of the least squares line, which is also called best-fit
line or regression line. The line is passing in between our points while the
sum of the squares of the vertical distances from the data points to the line
is as small as possible. The picture to the left shows the least square line
passing in between our points, and the distances d1, d2, d3, d4, and d5. The
equation of the least square line is found by minimizing the sum:
(d1 ) 2 (d 2 ) 2 (d 3 ) 2 (d 4 ) 2 (d 5 ) 2
The procedure will be omitted in this paper. The final result of the
minimization are formulas that let us calculate coefficients a and b for
equation of the least square line y ax b . This equation may be used for
the prediction of sales.
In this formula n stands for number of observed cases. In our case that is 5 employees. So: n5
Symbol is called “sigma” and stands for the “sum of all”. In our case:
x 2
12 32 82 10 2 132 343 (sum of squares of x values)
That means that the equation of the least square line is y 4.388 x 79.284
Conclusion:
We created the formula y 4.388 x 79.284 that may be used to predict sales in thousands for a future employee. For example, formula
may predict that an employee with x=15 years of experience will generate:
y 4.388 15 79.284 145 thousands in sales per year.
The value of r that is close to 1 indicates that our formula will give us a reliable prediction for sales level, based on years of experience of the
employee.
The equation may be used as a source for reliable prediction if the correlation coefficient is a number that is close to 1 or 1 . That means
that your observed values are close to the least square line (second and fourth picture above).
If not, the value of the correlation coefficient is closer to 0. Such a small value for the coefficient of correlation indicates that the observed
data are widely spread around, so our formula is not reliable source of prediction (like on first and third picture above).
It is usually more convenient to numerically measure reliability of our formula using the square of correlation coefficient. Some books and
computer software are using symbol R 2 for it (read as ‘r square’) In our example;
R 2 0.9712 0.943
R square is always a number between 0 and 1. Values of R 2 that are close to 1 indicate reliable formula.
3
Problem:
A sales manager noticed that the annual sales of his employees increase with years of experience. We’ll use Excel to graph his data and create
a formula that will help him estimate annual sales based on years of experience.
Years of experience 1 3 8 10 13
Annual sales in thousands 80 100 110 120 140
The final picture shown below features the equation of the least squares line y 4.3878 x 79.286 and the value of R square R 2 0.9434
Practice questions:
1) Find the equation of the least squares line and the coefficient of correlation for the data shown:
x 3 5 7 8
y 8.7 7.9 6.2 5.8
2) The table to the right shows the relationship between MPG (miles per gallon) and HP x=HP y=MPG
(horsepower) for some sports cars produced in year 2015. Mitcubishi Lancer 148 29
Find the equation of the least squares line and use it to estimate MPG for Chevrolet Corvette that
Mazda Miata 158 23
has 460 HP engine.
Subaru WRX 268 24
Porche Boxter 315 23
Chevy Camaro 323 20
Ferrari Spider 570 14
3) Millers are buying a house in NWI so they’d like to estimate yearly house taxes in Valparaiso. Using data from ReMax web site they
obtained 2014 taxes for 5 houses in that city. Use the table to create the least squares line equation and its coefficient of correlation. Use this
equation to estimate yearly taxes for a $200,000 house in Valparaiso.
Advertising expenditures x 0 1 2 3 4 5 6
Monthly sales y 3 9 11 15 17 20 27
Create the least squares line. Examine if advertising expenditures appear to have strong effect on monthly sales by calculating and
interpreting R 2 , and if so, predict the monthly sales if 11 thousand dollars is spent on advertising.
5) A hospital conducted study to determine relation between age and blood pressure of their patients. The table bellow shows collected data.
Find the equation of the least squares line and use it to calculate blood pressure of a 50 years old patient.
Age x 43 48 56 61 67 70
Pressure y 128 120 135 143 141 152
6) Find the equation of the least squares line for the data shown below:
x 0 5 10 20
y 8 7 5 2
7) A researcher wishes to see whether there is a relationship between number of hours of study and test scores on exam, so she collected
data shown in the table bellow. Find the equation of the least squares line and use it to calculate how many hours should a student study to
obtain 93 percent on the test.
Hours of study x 6 2 1 5 2 3
Score on test y 82 63 57 88 68 75
8) The table bellow compares rents for one-bedroom and two-bedroom apartments in 7 different cities. Find the equation of the least
squares line and R square. A one-bedroom rent in a doorman building in Lower Manhattan averaged $3000 in august 2007. Calculate a two
bedroom rent using the obtained formula.
9) Thanks to the progress of science the cancer survival rate is improving over time. The table shows 10-year survival rate for ovary cancer
in England, diagnosed in year 1971, 1981, 1991 and 2001. According to the table 27% of the patients diagnosed in year 2001 survived over
10 years. Find the equation of the least squares line and R square. Use it to estimate the 10-year ovary cancer survival rate for the patients
who are diagnosed in year 2015.
10) Millers are buying a house in NWI so they’d like to estimate yearly house taxes in Crown Point. Using data from ReMax web site they
obtained 2014 taxes for 6 houses in that city. Use the table to create the least squares line equation. Use this equation to estimate yearly taxes
for a $200,000 house in Crown Point.