Pearson R
Pearson’s r
CORRELATION
2
OBJECTIVES:
At the end of the lesson the students should be able to:
3
Correlation is a statistical method used to determine whether a
relationship between variables exists. Regression is a statistical
method used to describe the nature of the relationship between
variables, that is, positive or negative, linear or nonlinear.
4
Are two variables related? If so, what is the strength of the
relationship?
5
For example, there are many variables that contribute to
heart disease, among them lack of exercise, smoking, heredity,
age, stress, and diet. Of these variables, some are more important
than others; therefore, a physician who wants to help a patient
must know which factors are most important.
6
What type of relationship exists?
7
In a simple relationship, there are two variables—an
independent variable, also called an explanatory
variable or a predictor variable, and a dependent
variable, also called a response variable.
8
Example of Simple Regression
For example, a manager may wish to see whether the
number of years the salespeople have been working for
the company has anything to do with the amount of
sales they make. This type of study involves a simple
relationship, since there are only two variables—years of
experience and amount of sales.
9
In multiple regression, two or more independent
variables are used to predict one dependent variable.
11
In a negative relationship, as one variable
increases, the other variable decreases, and
vice versa.
12
Pearson’s r Correlation Coefficient
Statisticians use a measure called the correlation
coefficient to determine the strength of the linear
relationship between two variables. There are several
types of correlation coefficients. The one explained in this
lesson is called the Pearson product moment correlation
coefficient (PPMC), named after statistician Karl Pearson,
who pioneered the research in this area.
13
Pearson’s r Correlation Coefficient
The correlation coefficient computed from the
sample data measures the strength and direction of a
linear relationship between two variables. The symbol for
the sample correlation coefficient is r. The symbol for the
population correlation coefficient is ρ (Greek letter rho).
14
Pearson’s r Correlation Coefficient
The range of the correlation coefficient is from -1 to +1. If
there is a strong positive linear relationship between the
variables, the value of r will be close to +1. If there is a strong
negative linear relationship between the variables, the value of r
will be close to -1. When there is no linear relationship between
the variables or only a weak relationship, the value of r will be
close to 0.
15
The graphs show the relationship
between the correlation coefficients
and their corresponding scatter plots.
Notice that as the value of the
correlation coefficient increases from
0 to +1 (parts a, b, and c), the data
values become closer to a straight line,
which suggests an increasingly strong
positive relationship. As the value of
the correlation coefficient decreases
from 0 to -1 (parts d, e, and f), the
data values also become closer to a
straight line, and again this suggests a
stronger negative relationship.
16
Formula for the Correlation Coefficient r
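The computational form of r that matches the substitution in the absences and final grades example can be written as below. Here n is the number of data pairs, and each sum runs over all pairs (x, y). This is a reconstruction from that substitution, so the notation is an assumption.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[\,n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[\,n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```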
17
EXAMPLE:
Absences and Final Grades. Compute the value of the correlation coefficient
for the data obtained in the study of the number of absences and the final
grade of the seven students in the statistics class.

Student   No. of absences (x)   Final grade (y)   xy      x²    y²
A         6                     82                492     36    6,724
B         2                     86                172     4     7,396
C         15                    43                645     225   1,849
D         9                     74                666     81    5,476
E         12                    58                696     144   3,364
F         5                     90                450     25    8,100
G         8                     78                624     64    6,084
Total     57                    511               3,745   579   38,993

$$r = \frac{(7)(3{,}745) - (57)(511)}{\sqrt{[(7)(579) - (57)^2]\,[(7)(38{,}993) - (511)^2]}} = -0.944$$

The value of r suggests a strong negative relationship between a student’s
final grade and the number of absences a student has. That is, the more
absences a student has, the lower is his or her grade.
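The computation in this example can be checked with a short Python sketch. It is a minimal illustration, not part of the original lesson: the function name `pearson_r` and the variable names are assumptions chosen here, and only the standard library is used.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from raw sums (hypothetical helper for illustration)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return numerator / denominator


# Data from the example: students A to G
absences = [6, 2, 15, 9, 12, 5, 8]
final_grades = [82, 86, 43, 74, 58, 90, 78]

r = pearson_r(absences, final_grades)
print(round(r, 3))  # -0.944
```

The intermediate sums (57, 511, 3,745, 579, 38,993) are the same totals shown in the table, so the sketch mirrors the hand computation step by step.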
18
The Significance of the Correlation
Coefficient
• The variables x and y are linearly related.
• The variables are random variables.
• The two variables have a bivariate normal distribution.

Bivariate data are data on two variables, where each value of one
variable is paired with a value of the other variable.
Example: absences of students to grade, calories to cholesterol, age
to blood pressure, height to weight, etc.
19
Compute the value of the correlation coefficient for the data obtained in the
study of the number of absences and the final grade of the seven students in
the statistics class, and perform the hypothesis test at α = 0.01.

Step 1:
H₀: ρ = 0
There is no correlation between the x and y variables in the population.
Hₐ: ρ ≠ 0
There is a significant correlation between the variables in the population.