Lec 12


Lecture 12: Introduction to Statistics by Dr. Javed Iqbal

Coefficient of Determination (R-squared):

Total variation in y = variation in y that is explained by regression + variation in y that is not explained by regression

Coefficient of Determination: $R^2 = \dfrac{\text{variation in } y \text{ explained by regression}}{\text{total variation in } y} = \dfrac{SSR}{SST}$

Interpretation: Proportion of variation in y that is explained by regression

Range of R2 values : 0 ≤ 𝑅2 ≤ 1

R2 = 1 when all the scatter points fall on the regression line

Computing formula for R2: Weiss p-663-664. Ex:14.100

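$SSR = \dfrac{(S_{xy})^2}{S_{xx}}$, where

$S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n}, \quad SST = S_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n}, \quad S_{xy} = \sum xy - \dfrac{(\sum x)(\sum y)}{n}$

$R^2 = \dfrac{SSR}{SST} = 0.853$: 85.3% of the variation in the selling price of Orion cars is explained by their age.
Hence the age of a car is a very good predictor of its price.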
Do it for Ex 14.60 (home price and square feet data):
$n = 9$, $\sum x = 20682$, $\sum y = 3487.1$, $\sum xy = 9254378$, $\sum x^2 = 57414186$, $\sum y^2 = 1590653$

$S_{xx} = 57414186 - \dfrac{(20682)^2}{9} = 9886950$
$S_{xy} = 9254378 - \dfrac{(20682)(3487.1)}{9} \approx 1241023$
$SST = S_{yy} = 1590653 - \dfrac{(3487.1)^2}{9} \approx 239556.4$
$SSR = \dfrac{(1241023)^2}{9886950} \approx 155774.7, \quad R^2 = \dfrac{155774.7}{239556.4} = 0.650$

65% of the variation in the sale prices of homes is explained by their living area, so living area is a good predictor of house prices.
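The arithmetic in the home price example can be checked with a few lines of Python. This is only a sketch: the function name `sums_of_squares` and the variable names are illustrative choices, not from Weiss or the lecture, and the inputs are the rounded sums quoted above, so the last digits may differ slightly from the hand calculation.

```python
# Summary statistics quoted in the notes for Weiss Ex 14.60
# (x = living area in square feet, y = home price).
n = 9
sum_x, sum_y = 20682, 3487.1
sum_xy, sum_x2, sum_y2 = 9254378, 57414186, 1590653


def sums_of_squares(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (Sxx, Sxy, Syy) from raw sums via the computing formulas."""
    sxx = sum_x2 - sum_x ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    syy = sum_y2 - sum_y ** 2 / n
    return sxx, sxy, syy


sxx, sxy, syy = sums_of_squares(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)

ssr = sxy ** 2 / sxx   # variation in y explained by the regression
sst = syy              # total variation in y
r_squared = ssr / sst

print(f"Sxx = {sxx:.1f}, Sxy = {sxy:.1f}, SST = {sst:.1f}")
print(f"SSR = {ssr:.1f}, R^2 = {r_squared:.3f}")
```

The same function can be reused for any data set where only the five sums and n are available.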

Correlation Coefficient: A concept and measure closely related to R2 is the coefficient of linear correlation (r). The magnitude of this coefficient is the square root of the coefficient of determination:
$|r| = \sqrt{R^2}$
Note that $-1 \le r \le 1$.

The sign of the correlation coefficient is same as the sign of the slope coefficient (b1)
in regression.

Weiss Fig 14.18, p-669 for a visual idea of correlation and interpretation.

Guideline for interpreting correlation coefficient:

Note that while R2 is the square of the correlation (r) between y and x, the interpretation of R2 is very different from that of r.

Example 14.13, p-670: For the Orion example, R2 was 0.853, hence the magnitude of the correlation coefficient is √0.853 = 0.924. As the relationship between the age and price of a car is negative, r = -0.924. This indicates a very strong negative correlation between the age and price of Orion cars.
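As a rough illustration of the step above, the sketch below takes the square root of R2 and attaches the sign of the slope. The helper name `correlation_from_r2` is hypothetical, and the slope value passed in is a placeholder, since only its sign matters here.

```python
import math


def correlation_from_r2(r_squared, slope):
    """Return r with magnitude sqrt(R^2) and the sign of the regression slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# Orion example: R^2 = 0.853 and the slope is negative.
# The value -1.0 is a placeholder; only the sign of the slope is used.
r_orion = correlation_from_r2(0.853, slope=-1.0)
print(f"{r_orion:.3f}")  # -0.924
```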

Anderson Ex 7, p-572: For this data find and interpret the coefficient of
determination and coefficient of correlation.
$S_{xx} = 142, \quad S_{xy} = 568, \quad S_{yy} = 2442, \quad SSR = \dfrac{(S_{xy})^2}{S_{xx}} = \dfrac{(568)^2}{142} = 2272$
R2 = SSR/ SST = 2272/2442 = 0.930
R2 = 0.93: 93% of the variation in sales is explained by the salesperson's years of experience through this model.

Correlation Coefficient r = √0.930 = 0.96 (positive, since Sxy is positive). There is a very strong positive correlation between sales and the salesperson's years of experience.
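The Anderson example can be checked numerically as follows. The notes obtain r as the square root of R2. The direct formula r = Sxy / sqrt(Sxx · Syy) used below is a standard identity that the lecture does not state, so it is included here only as a cross-check.

```python
import math

# Sums of squares quoted in the notes for Anderson Ex 7.
sxx, sxy, syy = 142, 568, 2442

ssr = sxy ** 2 / sxx          # explained variation
r_squared = ssr / syy         # SST equals Syy

# Direct formula for r (not stated in the notes); its sign comes from Sxy.
r = sxy / math.sqrt(sxx * syy)

print(f"SSR = {ssr:.0f}, R^2 = {r_squared:.3f}, r = {r:.2f}")
# The square of r agrees with R^2 up to floating-point error.
print(math.isclose(r ** 2, r_squared))
```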
CW practice: home sales data of Weiss Ex 14.60 and Weiss Ex 14.100. Verify that r = 0.8063.

Both Excel and R regression output give R2.

Properties of correlation coefficient (r):

1). It is independent of units and scale i.e. it is a unit free number.

2). Its value falls in the range −1 ≤ 𝑟 ≤ 1.
3). The sign of r is same as sign of slope coefficient.
4). The coefficient of determination is the square of the correlation coefficient. Yet the interpretation of the two is very different.
5). Correlation does not imply causation, e.g. ice cream consumption and drowning incidents may be highly correlated, but no causation is implied as both may be affected by a common cause, i.e. warm summer weather.
6). The correlation coefficient is a measure of linear association only. A perfect yet nonlinear relationship may give rise to zero correlation.

Example: Find the correlation coefficient between x and y and verify that r = 0.

X   -2   -1    0    1    2
Y    4    1    0    1    4
Yet there is a perfect (but nonlinear) relationship between x and y: here y equals x squared.
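The zero correlation in this example can be verified with a short sketch. It uses the same computing formulas for Sxx, Sxy and Syy as earlier in the lecture, together with the standard identity r = Sxy / sqrt(Sxx · Syy), which the notes do not state explicitly.

```python
import math

x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]
n = len(x)

# Computing formulas for the sums of squares and cross-products.
sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
syy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

r = sxy / math.sqrt(sxx * syy)
print(r)  # 0.0

# The relationship is nevertheless exact: every y equals x squared.
print(all(yi == xi ** 2 for xi, yi in zip(x, y)))  # True
```

The positive and negative products of x and y cancel exactly, which is why Sxy, and hence r, is zero even though y is fully determined by x.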


d). Find the coefficient of determination and interpret it

e). Find the correlation coefficient and interpret it.
