Chapter 8. Correlation and Regression Analyses

Chapter 8.
CORRELATION AND REGRESSION ANALYSES
At the end of this chapter, the students should be able to:

1. Illustrate the nature of bivariate data.
2. Construct a scatter plot.
3. Describe shape (form), trend (direction), and variation (strength) based on a scatter plot.
4. Estimate strength of association between the variables based on a scatter plot.
5. Calculate the Pearson’s sample correlation coefficient.
6. Solve problems involving correlation analysis.
7. Identify the independent and dependent variables.
8. Calculate the slope and y-intercept of the regression line.
9. Interpret the calculated slope and y-intercept of the regression line.
10. Draw the best-fit line on a scatter plot.
11. Predict the value of the dependent variable given the value of the independent variable.
12. Solve problems involving regression analysis.
8.1 CORRELATION ANALYSIS
Correlation analysis is used to quantify the association between two continuous
variables, for example between an independent and a dependent variable. Simple descriptive
statistics can reveal a great deal of information, but it is often more informative to examine
the relationships within the data. Correlation measures and hypothesis testing allow these
relationships to be studied more rigorously. Together, correlation and regression analysis
describe the nature and strength of the relationship between two continuous variables.
Understanding Bivariate Data

Bivariate data are sets of data with two quantitative variables. Both variables are
measured on the same set of samples or group of individuals; that is, two values are gathered
for every individual in the sample. Examples include a person's age and IQ, the engine size and
mileage of cars, and a student's GPA and rating in a board exam. Bivariate data are often
used to determine the linear relationship and association between variables. A measure of the
strength and direction of the linear association between two variables is known as correlation.
Strength describes how closely the two variables are related. In practice, the strength of a
linear relationship can be perfect, strong, moderate, weak, or absent (no correlation). Direction
can be positive or negative. A positive relationship exists when, as the value of one variable
increases, the value of the other variable also increases, as with age and memory, income and
expense, length of time studying and score in the exam, and education and income level. A
negative relationship exists when, as the value of one variable increases, the value of the other
variable decreases, as with temperature and number of bottled waters sold, number of absences
and grade, and hours spent in the mall and savings in the bank.
To identify the relationship between variables visually, a scatter plot can be drawn. A
scatter plot is a graphical method used to examine the correlation between two quantitative
variables. It resembles a line chart: a horizontal axis and a vertical axis are sketched, and the
data points are plotted against them. The pattern of the points is then studied to see whether a
correlation exists. A positive correlation exists when the points resemble a line that rises from
left to right. A negative correlation exists when the points resemble a line that falls from left
to right.
Example 1. Below are the ages and weight of 10 randomly selected elementary pupils.
Age (X) 6 8 7 9 7 10 12 12 11 8
Weight (Y) 44 48 49 51 46 52 54 55 56 49
[Figure: scatter plot of WEIGHT (Y) against AGE (X) for the 10 pupils]
The scatter plot shows a strong positive correlation since the points rise from left to right.
Below are examples of scatter plots that can be used as a guide in interpreting correlations.
[Figures 1 to 12: sample scatter plots of Y against X, not reproduced here]
Fig 1. Perfect Positive Correlation        Fig 2. Perfect Negative Correlation
Fig 3. Very Strong Positive Correlation    Fig 4. Very Strong Negative Correlation
Fig 5. Strong Positive Correlation         Fig 6. Strong Negative Correlation
Fig 7. Moderate Positive Correlation       Fig 8. Moderate Negative Correlation
Fig 9. Weak Positive Correlation           Fig 10. Weak Negative Correlation
Fig 11. No Correlation                     Fig 12. No Correlation
Steps in Constructing Scatter Plot

1. Draw a rectangular coordinate system. Label the horizontal line “X” and the vertical line “Y”.
[Figure: empty coordinate axes]
2. Plot the data points.
X 2 3 4 5 5 6 8 10 12 8
Y 2 4 3 5 6 7 8 9 11 9
[Figure: scatter plot of the data points above]
3. Interpret the graph.
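The plotting step can also be tried without graph paper. The following is a minimal text-mode sketch, not part of the original procedure: the function name `ascii_scatter` and the grid size are assumptions chosen for illustration, and the data are the ten points from step 2.

```python
def ascii_scatter(xs, ys, width=13, height=12):
    """Return a text-mode scatter plot of the paired data as one string.

    Assumes xs and ys have equal length and that each contains at least two
    distinct values (otherwise the scaling below would divide by zero).
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Start with a blank grid; row 0 is the top of the plot.
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip so larger y is drawn higher
    return "\n".join("".join(line).rstrip() for line in grid)


# Data from step 2 of the procedure above.
x = [2, 3, 4, 5, 5, 6, 8, 10, 12, 8]
y = [2, 4, 3, 5, 6, 7, 8, 9, 11, 9]
plot = ascii_scatter(x, y)
print(plot)
```

The stars run from the lower left to the upper right of the grid, which is the pattern described earlier for a positive correlation.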
The scatter plot can help us determine the relationship between two variables. However, the
Pearson correlation coefficient (r) gives a more exact measure of the relationship, since
Pearson's r is a statistic that measures the linear correlation between two variables. Its value
ranges from -1 to +1. A positive value indicates a positive correlation, while a negative value
indicates a negative correlation. When r = ±1, the correlation is perfect. As r approaches ±1,
the correlation becomes stronger. When r = 0, there is no (zero) correlation. The table below
presents the values of r and their verbal interpretations as suggested by Evans (1996).
Interpreting Correlation (Evans, 1996)
r Verbal Interpretation
-1 Perfect Negative Correlation
-0.8 to -0.99 Very Strong Negative Correlation
-0.6 to -0.79 Strong Negative Correlation
-0.4 to -0.59 Moderate Negative Correlation
-0.2 to -0.39 Weak Negative Correlation
-0.01 to -0.19 Very Weak Negative Correlation
0 No Correlation
0.01 to 0.19 Very Weak Positive Correlation
0.2 to 0.39 Weak Positive Correlation
0.4 to 0.59 Moderate Positive Correlation
0.6 to 0.79 Strong Positive Correlation
0.8 to 0.99 Very Strong Positive Correlation
1 Perfect Positive Correlation
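The table can be applied mechanically once r is known. The sketch below is one possible reading of it: the function name `interpret_r` is hypothetical, and rounding r to two decimals before classifying is an assumption made here to cover values that fall between the printed ranges (for example, between 0.59 and 0.6).

```python
def interpret_r(r):
    """Return the verbal interpretation of r based on the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    # Assumption: classify on |r| rounded to two decimals, because the
    # printed ranges leave small gaps (e.g. between 0.59 and 0.6).
    size = abs(round(r, 2))
    if size == 0:
        return "No Correlation"
    direction = "Positive" if r > 0 else "Negative"
    if abs(r) == 1:
        strength = "Perfect"
    elif size >= 0.8:
        strength = "Very Strong"
    elif size >= 0.6:
        strength = "Strong"
    elif size >= 0.4:
        strength = "Moderate"
    elif size >= 0.2:
        strength = "Weak"
    else:
        strength = "Very Weak"
    return f"{strength} {direction} Correlation"


print(interpret_r(0.53))   # Moderate Positive Correlation
print(interpret_r(-0.95))  # Very Strong Negative Correlation
```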
The Pearson correlation coefficient r is computed using the formula:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
Where:
$n$ = the number of samples (paired observations)
$\sum x$ = the sum of x
$\sum y$ = the sum of y
$\sum xy$ = the sum of the products of x and y
$\sum x^2$ = the sum of squares of x
$(\sum x)^2$ = the square of the sum of x
$\sum y^2$ = the sum of squares of y
$(\sum y)^2$ = the square of the sum of y
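The formula translates almost term by term into code. The following is a minimal sketch, assuming the data arrive as two equal-length lists; the function name `pearson_r` is chosen here for illustration, and the sample data are the ages and weights of the 10 pupils from the scatter plot example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the computational (sums) formula.

    Assumes len(xs) == len(ys) >= 2 and that neither variable is constant
    (a constant variable makes the denominator zero).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


age = [6, 8, 7, 9, 7, 10, 12, 12, 11, 8]
weight = [44, 48, 49, 51, 46, 52, 54, 55, 56, 49]
r = pearson_r(age, weight)
print(round(r, 3))  # 0.944
```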

Example 1. Using the data of Example 1 of 8.1.1
Age (X) 6 8 7 9 7 10 12 12 11 8
Weight (Y) 44 48 49 51 46 52 54 55 56 49
Compute the Pearson correlation r and interpret.
Solution: To make the solution easier, we use a table with the columns x, y, x², y², and xy.
X Y X² Y² XY
6 44 36 1936 264
8 48 64 2304 384
7 49 49 2401 343
9 51 81 2601 459
7 46 49 2116 322
10 52 100 2704 520
12 54 144 2916 648
12 55 144 3025 660
11 56 121 3136 616
8 49 64 2401 392
SUM 90 504 852 25540 4608
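From the solution table, we have:
$n = 10$, $\sum x = 90$, $\sum y = 504$, $\sum xy = 4608$
$\sum x^2 = 852$, $(\sum x)^2 = 90^2 = 8100$
$\sum y^2 = 25540$, $(\sum y)^2 = 504^2 = 254016$

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
$$r = \frac{10(4608) - (90)(504)}{\sqrt{[10(852) - 8100][10(25540) - 254016]}} = \frac{720}{\sqrt{(420)(1384)}}$$
$$r = 0.944$$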
The computed r = 0.944 indicates a very strong positive correlation between the pupils'
age and weight. We can conclude that older pupils tend to weigh more than younger ones.
Example 2. A study was conducted to determine linear association between car’s weight and
mileage. The data is presented below:
Weight (kg) Mileage (km/L)
1080 14
988 20
1140 16
1250 12
1178 12
980 18
1050 15
1095 16
1225 11
1180 13
1010 17
1160 12
Solution: Let X = weight and Y = mileage
X Y X² Y² XY
1080 14 1166400 196 15120
988 20 976144 400 19760
1140 16 1299600 256 18240
1250 12 1562500 144 15000
1178 12 1387684 144 14136
980 18 960400 324 17640
1050 15 1102500 225 15750
1095 16 1199025 256 17520
1225 11 1500625 121 13475
1180 13 1392400 169 15340
1010 17 1020100 289 17170
1160 12 1345600 144 13920

SUM 13336 176 14912978 2668 193071
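From the solution table, we have:
$n = 12$, $\sum x = 13336$, $\sum y = 176$, $\sum xy = 193071$
$\sum x^2 = 14912978$, $(\sum x)^2 = 13336^2 = 177848896$
$\sum y^2 = 2668$, $(\sum y)^2 = 176^2 = 30976$

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
$$r = \frac{12(193071) - (13336)(176)}{\sqrt{[12(14912978) - 177848896][12(2668) - 30976]}} = \frac{-30284}{\sqrt{(1106840)(1040)}}$$
$$r = -0.893$$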
The computed r = -0.893 indicates a very strong negative correlation between a car's
weight and mileage. This implies that as a car's weight increases, its mileage tends to
decrease. It can also be concluded that heavier cars, in general, consume more gasoline.
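With a table this size, a slip in one total carries through to r, so the column sums are worth cross-checking. The sketch below rebuilds the rows and the SUM line of the solution table for the car data; the variable names are illustrative only.

```python
from math import sqrt

weight = [1080, 988, 1140, 1250, 1178, 980, 1050, 1095, 1225, 1180, 1010, 1160]
mileage = [14, 20, 16, 12, 12, 18, 15, 16, 11, 13, 17, 12]

# One row per car: x, y, x^2, y^2, xy (the columns of the solution table).
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(weight, mileage)]
# Column totals, in the same order as the SUM row.
totals = [sum(column) for column in zip(*rows)]
sum_x, sum_y, sum_x2, sum_y2, sum_xy = totals
n = len(rows)

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(totals)       # [13336, 176, 14912978, 2668, 193071]
print(round(r, 3))  # -0.893
```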
Example 3. The data represents the Self Efficacy Score (SES) and Intelligence Quotient of 10
randomly selected teenagers.
SES IQ
35 104
46 125
48 100
55 112
52 120
39 117
48 105
30 116
50 108
45 108
Solution: Let X = SES and Y = IQ
X Y X² Y² XY
35 104 1225 10816 3640
46 125 2116 15625 5750
48 100 2304 10000 4800
55 112 3025 12544 6160
52 120 2704 14400 6240
39 117 1521 13689 4563
48 105 2304 11025 5040
30 116 900 13456 3480
50 108 2500 11664 5400
45 108 2025 11664 4860
SUM 448 1115 20624 124883 49933
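From the solution table, we have:
$n = 10$, $\sum x = 448$, $\sum y = 1115$, $\sum xy = 49933$
$\sum x^2 = 20624$, $(\sum x)^2 = 448^2 = 200704$
$\sum y^2 = 124883$, $(\sum y)^2 = 1115^2 = 1243225$

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
$$r = \frac{10(49933) - (448)(1115)}{\sqrt{[10(20624) - 200704][10(124883) - 1243225]}} = \frac{-190}{\sqrt{(5536)(5605)}}$$
$$r = -0.034$$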
The computed r = -0.034 indicates a very weak negative correlation between teenagers'
Self Efficacy Score and Intelligence Quotient. Since the computed r (-0.034) is very close to
zero, we can say that there is practically no correlation between SES and IQ. Therefore, we can
conclude that there is no linear association between teenagers' SES and IQ.
Practice Exercise 8.1.1
Direction: Interpret the following scatter diagram.
[Scatter diagrams 1 to 5, each with a horizontal axis from 0 to 40 and a vertical axis from 0 to 25, are not reproduced here]
Practice Exercise 8.1.2
Direction: Interpret the following correlation coefficients r.
1. 𝑟 = 0.781
2. 𝑟 = −0.95
3. 𝑟 = −0.389
4. 𝑟 = 0.53
5. 𝑟 = −0.49
6. 𝑟 = −0.88
7. 𝑟 = −0.001
8. 𝑟 = 1.0
9. 𝑟 = 0.27
10. 𝑟 = −0.56
Practice Exercise 8.1.3
Direction: For each of the given problems,
(a) sketch and interpret the scatter diagram,
(b) compute the correlation coefficient, and
(c) draw the necessary conclusion.
1. Given are the scores of ten randomly selected Grade 11 students in their long quiz in Statistics
and Basic Math.
Statistics (X) 18 15 13 16 13 10 13 15 10 14
Basic Math (Y) 19 17 14 15 14 11 12 14 17 13
2. Chapman and Demeritt (Elements of Forest Mensuration, 2nd ed., Albany, NY, J.B. Lyon
Company [now Williams Press], 1936) reported diameters (in inches) and ages (in years) of oak
trees.
Age(X) 4 5 8 8 8 10 10 12 13 30
Diameter(Y) 0.8 0.8 1 2 3 2 3.5 4.9 3.5 6
3. Below are the supply (in kg) and prices (pesos per kg) of dragon fruit in 10 supermarkets in
Cavite.
Supply (X) 128 132 95 105 125 112 132 100 140 130
Price (Y) 85 90 120 115 110 110 100 120 95 90
4. A study was conducted to determine the relationship between daily allowance and weekly
expenses on cellphone load. The data is presented below.
Allowance (X) 1500 1200 800 750 600 750 1000 1000 900 700
Expenses (Y) 100 150 120 100 110 150 150 120 150 120
5. A recent study claims that the number of casinos and the crime rate in a city are linearly
related, such that cities with more casinos have higher crime rates. To test the claim, a group of
researchers studied 8 major cities in CALABARZON and gathered the following information:
Cities A B C D E F G H
Number of Casinos 7 9 12 11 14 5 7 9
Crime rate 1.8 1.6 2.2 2.4 2.3 1.1 1.4 1.9
8.2 REGRESSION ANALYSIS
Regression analysis is a statistical approach used to determine the relationship between
variables: the dependent and independent variables. In most cases, it is used to predict the
value of the dependent variable (Y) for given values of the independent variable (X). With
sufficient data, a researcher can use regression analysis to predict the academic performance
(as measured by GPA) of a college student given his entrance exam score or high school grade,
the height of an adult given his length at birth, the growth of mold spores given the amount of
moisture, and the like. Regression is also known as a powerful curve- or line-fitting technique
because it can generate an equation that best fits the data points. The most common regression
technique is simple linear regression, a model that describes the linear relationship between
one dependent variable and one independent variable. It aims to estimate the equation of the
line that best fits the data points. The regression equation has the form:
$$\hat{y} = a + bx$$
Where:
$a$ = the y-intercept
$b$ = the slope of the line
Formula to estimate the slope of the regression line:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
And the formula to estimate the y-intercept is:
$$a = \bar{y} - b\bar{x}$$
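The two estimates can be computed together, since the intercept needs the slope. The sketch below assumes the intercept is obtained as the mean of y minus b times the mean of x, which is the form used in the worked examples of this section. The function name `fit_line` and the four data points are hypothetical; the points were chosen to lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x.

    Assumes equal-length lists with at least two distinct x values
    (otherwise the slope's denominator is zero).
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: same numerator as Pearson's r, divided by the x-variation term.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean of y minus slope times mean of x.
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Hypothetical data lying exactly on the line y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

Because the line always passes through the point of means, the sign in the intercept formula matters: adding instead of subtracting b times the mean of x would give 11.0 here rather than 1.0.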
Example 1. Given the following data for a mother’s height and her daughter’s height in inches:
Mother’s Height 63 67 64 60 65 67 59 60
Daughter’s Height 63.6 64.7 65.3 61 65.4 67.4 60.9 63.1
a. Find the best fit linear equation that relates the mother’s height to her daughter’s height.
b. Sketch the regression line in the scatter plot.
c. What is the best predicted height for a daughter whose mother’s height is 66 inches tall?
Solution:
Mother's Height (X) Daughter's Height (Y) x² y² xy
63 63.6 3969 4045 4006.8
67 64.7 4489 4186.1 4334.9
64 65.3 4096 4264.1 4179.2
60 61 3600 3721 3660
65 65.4 4225 4277.2 4251
67 67.4 4489 4542.8 4515.8
59 60.9 3481 3708.8 3593.1
60 63.1 3600 3981.6 3786
SUM 505 511.4 31949 32726.48 32326.8
MEAN 63.125 63.925
Solving for the slope of the regression line:
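$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{8(32326.8) - (505)(511.4)}{8(31949) - 505^2}$$
$$b = \frac{357.4}{567} = 0.6303 \approx 0.63$$
Calculating the y-intercept (using the slope before rounding it to two decimals):
$$a = \bar{y} - b\bar{x} = 63.925 - 0.6303(63.125)$$
$$a = 24.14$$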
Answer:
a. The equation of the regression line is
Daughter's Height ($\hat{y}$) = 24.14 + 0.63 Mother's Height ($x$), or simply
$\hat{y} = 24.14 + 0.63x$
b. Sketch of the regression line
[Figure: scatter plot of daughter's height against mother's height with the fitted regression line]
c. What is the best predicted height for a daughter whose mother is 66 inches tall?
Daughter’s Height = 24.14 + 0.63(66) = 65.72 inches
The predicted height for a daughter whose mother is 66 inches tall is 65.72 inches.
Interpreting the slope and y-intercept of the regression line.

From the equation of the regression line $\hat{y} = a + bx$, the slope b can be interpreted as
the amount of change in y for every one-unit change in x. When b is positive, y increases as x
increases (a direct relationship). When b is negative, y decreases as x increases (an inverse
relationship).
The y-intercept is the predicted value of y when x = 0. In practice, however, the y-intercept
is meaningful only when the data contain values of x at or near 0, or when a value of 0 for x
is possible.
Example 2. From Example 1 of this section, the equation of the regression line is
Daughter's Height ($\hat{y}$) = 24.14 + 0.63 Mother's Height ($x$)
The slope of the line, b = 0.63, suggests that for every 1-inch increase in the mother's
height, the predicted daughter's height increases by 0.63 inch.
Furthermore, since b is positive, a daughter's height is directly related to her mother's
height. Thus, we can conclude that taller mothers tend to have taller daughters.
The y-intercept of 24.14 is not meaningful because a mother's height cannot be zero.
Example 3. Using the data in Example 2 of Section 8.1 (car weight and mileage), (a) determine
and interpret the equation of the regression line, and (b) estimate the mileage of a car that
weighs 1000 kg. Use weight as the independent variable and mileage as the dependent variable.
Solution: Let X = weight and Y = mileage
X Y X² Y² XY
1080 14 1166400 196 15120
988 20 976144 400 19760
1140 16 1299600 256 18240
1250 12 1562500 144 15000
1178 12 1387684 144 14136
980 18 960400 324 17640
1050 15 1102500 225 15750
1095 16 1199025 256 17520
1225 11 1500625 121 13475
1180 13 1392400 169 15340
1010 17 1020100 289 17170
1160 12 1345600 144 13920

SUM 13336 176 14912978 2668 193071
MEAN 1111.33 14.67
Solving for the slope of the regression line:
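$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{12(193071) - (13336)(176)}{12(14912978) - 13336^2}$$
$$b = \frac{-30284}{1106840} = -0.027361 \approx -0.027$$
Calculating the y-intercept (using the unrounded slope and means):
$$a = \bar{y} - b\bar{x} = 14.6667 - (-0.027361)(1111.3333)$$
$$a = 45.07$$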
Answer: The equation of the regression line is
Mileage ($\hat{y}$) = 45.07 - 0.027 Weight ($x$), or simply $\hat{y} = 45.07 - 0.027x$
The slope b = -0.027 suggests that a car's mileage is inversely related to its weight: an
increase of 1 kg in a car's weight is associated with a decrease of about 0.027 km per liter
in its mileage.
Estimating the mileage of a car that weighs 1000 kg, we have:
Mileage ($\hat{y}$) = 45.07 - 0.027(1000) = 18.07
Therefore, the estimated mileage of a 1000 kg car is 18.07 km/L.
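Because the weights are in the thousands, rounding the slope to three decimals is multiplied a thousandfold in the prediction. The sketch below, which uses illustrative variable names, compares the estimate from the rounded equation with the one obtained by carrying the unrounded slope and intercept through to the end.

```python
weight = [1080, 988, 1140, 1250, 1178, 980, 1050, 1095, 1225, 1180, 1010, 1160]
mileage = [14, 20, 16, 12, 12, 18, 15, 16, 11, 13, 17, 12]

n = len(weight)
sum_x = sum(weight)
sum_y = sum(mileage)
sum_xy = sum(x * y for x, y in zip(weight, mileage))
sum_x2 = sum(x * x for x in weight)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # unrounded slope
a = sum_y / n - b * (sum_x / n)                               # unrounded intercept

# Prediction at 1000 kg with unrounded versus rounded coefficients.
unrounded_estimate = a + b * 1000
rounded_estimate = round(a, 2) + round(b, 3) * 1000
print(round(unrounded_estimate, 2))  # 17.71
print(round(rounded_estimate, 2))    # 18.07
```

The two estimates differ by about 0.36 km/L. When x takes large values, it is safer to keep more decimal places in b (or to round only the final answer).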
Performing Correlation and Regression using Calculator
Source: https://edu.casio.com/support/qsg/pdf/991EX_570EX/05_CASIO_QuickStartGuide_fx-991EX_fx-570EX_STATISTICS.pdf
Performing correlation and regression analysis using Excel
1. Click the “Data Analysis” icon and select Regression.
2. Input the “Y” data range and the “X” data range, and choose the cell for the output range.
3. The output shows the value of the correlation coefficient “r” and the values of the
y-intercept “a” and the slope “b”.
From the output (for the mother's and daughter's height data):
$r = 0.893986 \approx 0.89$
$a = 24.1351$ and $b = 0.6303$
The equation of the regression line is:
Daughter's height ($\hat{y}$) = 24.1351 + 0.6303 Mother's height ($x$)
Direction: Identify the two variables considered in each statement and indicate which is the
independent variable and the dependent variable.
1. The score in an examination of a student tends to increase as he/she studies longer.
2. The more hours a child spends playing computer/online games, the more prone he is to eye
problems in the near future.
3. An increase in the number of dengue patients in a locality increased the number of Dengue
Awareness programs in the locality.
4. Fewer incidents of stress in a workplace produce more productive employees.
5. The popularity of a politician depends on his/her exposure on television and social media.
6. The life of a light bulb depends on the length of time it is turned on daily.
7. The number of enrollees in a university increased as the number of board passers from the
university increased.
8. The number of foreign tourists increased as islands in the country were developed.
9. The sale of cars decreased with the implementation of the new taxation scheme for car
sales.
10. The number of motorcycle incidents increases as the sales of motorcycles increase.

Direction. Solve the following problems as indicated.
1. Chapman and Demeritt (Elements of Forest Mensuration, 2nd ed., Albany, NY, J.B. Lyon
Company [now Williams Press], 1936) reported diameters (in inches) and ages (in years) of oak
trees.
Age(X) 4 5 8 8 8 10 10 12 13 30
Diameter(Y) 0.8 0.8 1 2 3 2 3.5 4.9 3.5 6
a. Estimate the equation of the regression line. What conclusion can be made?
b. Sketch the graph of the computed regression line.
c. What is the estimated diameter of a 20-year old oak tree?
2. Giovanni L. Nazareno, a businessman from Cavite, owns 10 fast-food restaurants in 10 towns
of Cavite. He wants to know if a town's population affects the monthly sales. The average
monthly sales, in millions of pesos, and the towns' populations, in hundred thousands, are given
below:
TOWN A B C D E F G H I J
POPULATION 3.25 7.72 8.65 9.74 5.76 4.38 6.41 8.53 9.12 6.15
SALES 1.12 1.56 1.75 1.98 1.64 1.21 1.48 1.73 2.07 1.78
a. Find the best-fitting equation of the regression line.
b. How does population affect the sales of the fast-food restaurants?
c. If the population of Town A increases by 120,000, what are the expected monthly sales?
3. The systolic and diastolic pressure readings of 14 randomly selected senior citizens (aged
60-70 years old) were recorded.
Systolic 135 130 135 140 120 125 120 130 130 144 143 140 125 150
Diastolic 102 100 105 110 80 90 80 95 80 98 105 112 88 120
Using the output of Microsoft Excel,
(a) Estimate the equation of the regression line.
(b) How does senior citizens' systolic blood pressure relate to their diastolic blood pressure?
4. Data on Biological Oxygen Demand, Dissolved Oxygen and Diversity is available for 16 sites
on the Calder Catchment. It is hypothesized that the level of Diversity depends on the level of
BOD - the higher the level of BOD, the more polluted the river and the less Diversity of life
(insects, fish, plants etc.)
Sites BOD Diversity

1 2.1 5.3
2 1.3 5.1
3 3.5 2.5
4 3.1 3
5 1.9 5.6
6 1.3 5.3
7 2.3 3.1
8 1.8 4.6
9 1.5 5.6
10 3.5 2.6
11 2.7 3.1
12 1.1 7.1
13 9.3 1.4
14 7.4 1.8
15 12.3 1
16 1.4 6.3
Using the output of Microsoft Excel,
a. Determine the best fit linear equation that relates Diversity to BOD.
b. Is the hypothesis supported? What conclusion can be made?

SUPPLEMENTAL MATERIALS FOR CHAPTER 8.
https://www.khanacademy.org/math/ap-statistics/bivariate-data-ap/correlation-coefficient-r/v/correlation-coefficient-intuition-examples
https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/more-on-regression/v/regression-line-example
https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Correlation-Regression/BS704_Correlation-Regression_print.html
Answer to Practice Exercise in Chapter 8
Practice Exercise 8.1.1
1. Moderate negative correlation
2. Very strong positive correlation
3. Very weak positive correlation
4. Very strong negative correlation
5. No correlation
Practice Exercise 8.1.2
1. Strong positive correlation
2. Very strong negative correlation
3. Weak negative correlation
4. Moderate positive correlation
5. Moderate negative correlation
6. Very strong negative correlation
7. No correlation
8. Perfect positive correlation
9. Weak positive correlation
10. Moderate negative correlation
Practice Exercise 8.1.3
1. [Scatter plot]
Correlation coefficient: r = 0.521
Interpretation: There is a moderate positive correlation between the scores in
Statistics and Basic Math. This implies that as the score in Statistics increases,
the score in Basic Math tends to increase as well. It can also be concluded
that there is a direct relationship between the scores in Statistics and Basic Math.
2. [Scatter plot]
Interpretation: There is a very strong positive correlation between the age and
diameter of oak trees. This implies that as the age of an oak tree increases,
its diameter tends to increase as well. It can also be concluded that
there is a direct relationship between the age and diameter of oak trees.
3. [Scatter plot]
Correlation coefficient: r = -0.863
Interpretation: There is a very strong negative correlation between the supply and
price of dragon fruit. This implies that as the supply of dragon fruit increases,
its price tends to decrease. It can also be concluded that, in general, a higher
supply of dragon fruit goes with a lower price.
4. [Scatter plot]
Interpretation: There is no correlation between the students’ daily allowance and
weekly expenses on load. This implies that there is no linear association between
the students’ daily allowance and weekly expenses on load.
5. [Scatter plot]
Interpretation: There is a very strong positive correlation between the number of
casinos and the crime rate in a city. This implies that as the number of casinos
increases, the crime rate tends to increase as well. It can also be concluded that there is
a direct relationship between the number of casinos and the crime rate in a city.
Independent variable | Dependent variable
1. Time spent studying | Score in exam
2. Number of hours of play | Proneness to eye problems
3. Number of dengue patients | Number of dengue awareness programs
4. Number of incidents of stress | Productivity of employees
5. Exposure on TV and social media | Popularity of politician
6. Length of time turned on | Life of light bulb
7. Number of board passers | Number of enrollees
8. Number of islands developed | Number of foreign tourists
9. New taxation scheme | Sales of cars
10. Sales of motorcycles | Motorcycle incidents
1. a. Equation of the regression line:
diameter ($\hat{y}$) = 0.554 + 0.203(age)
Conclusion: The slope of the line, b = 0.203, suggests that for every 1-year increase in the
age of an oak tree, its diameter increases by about 0.203 inch.
The y-intercept of 0.554 is not meaningful because we cannot assume a zero value for the
age of the tree.
b. [Graph]
c. Estimated diameter of a 20-year-old oak:
diameter ($\hat{y}$) = 0.554 + 0.203(20) = 4.614 inches
Therefore, the estimated diameter of a 20-year-old oak is 4.614 inches.
2. a. The best-fitting equation of the regression line:
sales ($\hat{y}$) = 0.758 + 0.125(population)
b. How does population affect the sales of the fast-food restaurants?
For every one-unit (100,000-person) increase in population, monthly sales increase by 0.125
million pesos on average.
c. Expected monthly sales if the population of Town A increases by 120,000 (from 3.25 to 4.45
hundred thousand):
sales ($\hat{y}$) = 0.758 + 0.125(4.45) = 1.314
Therefore, the expected monthly sales if the population of Town A increases by 120,000 are
P1,314,000.
3. a. Equation of the regression line:
systolic ($\hat{y}$) = 71.61 + 0.63(diastolic)
b. How does senior citizens' systolic blood pressure relate to their diastolic blood pressure?
For every one-unit increase in diastolic blood pressure, there is an average increase of
0.63 in systolic blood pressure.
4. a. Equation of the regression line:
Diversity ($\hat{y}$) = 5.60 - 0.46(BOD)
b. The hypothesis is supported. It can be concluded that for every one-unit increase in BOD,
the Diversity of life decreases by 0.46 on average.
