Pearson R


1

Pearson’s r
CORRELATION

2
OBJECTIVES:
At the end of the lesson, the students should be able to:

1. calculate Pearson’s sample correlation coefficient.
2. solve problems involving correlation analysis.
3. identify the independent and dependent variables.
4. illustrate the nature of bivariate data.
5. construct a scatter plot.

3
Correlation is a statistical method used to determine whether a
relationship between variables exists. Regression is a statistical
method used to describe the nature of the relationship between
variables, that is, positive or negative, linear or nonlinear.

The purpose of this chapter is to answer these questions statistically:

• Are two variables related?
• If so, what is the strength of the relationship?
• What type of relationship exists?

4
Are two variables related? If so, what is the strength of the
relationship?

Statisticians use a numerical measure to determine
whether two or more variables are related and to
determine the strength of the relationship between or
among the variables. This measure is called a correlation
coefficient.

5
For example, there are many variables that contribute to
heart disease, among them lack of exercise, smoking, heredity,
age, stress, and diet. Of these variables, some are more important
than others; therefore, a physician who wants to help a patient
must know which factors are most important.

6
What type of relationship exists?

There are two types of relationships: simple and
multiple.

7
In a simple relationship, there are two variables—an
independent variable, also called an explanatory
variable or a predictor variable, and a dependent
variable, also called a response variable.

8
Example of Simple Regression
For example, a manager may wish to see whether the
number of years the salespeople have been working for
the company has anything to do with the amount of
sales they make. This type of study involves a simple
relationship, since there are only two variables—years of
experience and amount of sales.

9
In a multiple relationship, called multiple regression, two or
more independent variables are used to predict one dependent
variable.

For example, an educator may wish to
investigate the relationship between a student’s
success in college and factors such as the
number of hours devoted to studying, the
student’s GPA, and the student’s high school
background. This type of study involves several
variables.
10
A positive relationship exists when both variables
increase or decrease at the same time.
For instance, a person’s height and weight are
related; and the relationship is positive, since the
taller a person is, generally, the more the person
weighs.

11
In a negative relationship, as one variable
increases, the other variable decreases, and
vice versa.

For example, if you measure the strength
of people over 60 years of age, you will find
that as age increases, strength generally
decreases.

12
Pearson’s r Correlation Coefficient
Statisticians use a measure called the correlation
coefficient to determine the strength of the linear
relationship between two variables. There are several
types of correlation coefficients. The one explained in this
lesson is called the Pearson product moment correlation
coefficient (PPMC), named after statistician Karl Pearson,
who pioneered the research in this area.

13
Pearson’s r Correlation Coefficient
The correlation coefficient computed from the
sample data measures the strength and direction of a
linear relationship between two variables. The symbol for
the sample correlation coefficient is r. The symbol for the
population correlation coefficient is ρ (Greek letter rho).

14
Pearson’s r Correlation Coefficient
The range of the correlation coefficient is from -1 to +1. If
there is a strong positive linear relationship between the
variables, the value of r will be close to +1. If there is a strong
negative linear relationship between the variables, the value of r
will be close to -1. When there is no linear relationship between
the variables or only a weak relationship, the value of r will be
close to 0.
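
As a small illustration of these three cases, the Python sketch below computes r for three made-up data sets (they are not part of the lesson): one lying exactly on a rising line, one lying exactly on a falling line, and one following a curve. The helper name pearson_r and all data values are assumptions chosen for illustration; the helper uses the standard computational formula for the sample correlation coefficient.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Made-up illustrative data, not taken from the lesson.
x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]      # points exactly on a rising line
falling = [-3 * v + 10 for v in x]   # points exactly on a falling line
curved = [(v - 3) ** 2 for v in x]   # symmetric curve, no linear trend

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_curved = pearson_r(x, curved)

print(r_rising, r_falling, r_curved)
```

The curved data set gives r = 0 even though y clearly depends on x, which is why r is described as a measure of the linear relationship only.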

15
The graphs show the relationship
between the correlation coefficients
and their corresponding scatter plots.
Notice that as the value of the
correlation coefficient increases from
0 to +1 (parts a, b, and c), the data values
become closer to a straight line, which
suggests an increasingly strong positive
relationship. As the value of the
correlation coefficient decreases from
0 to -1 (parts d, e, and f), the
data values also become closer to a
straight line, and again this suggests a
stronger negative relationship.

16
Formula for the Correlation Coefficient r

r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}

where n is the number of data pairs.

17
EXAMPLE:
Absences and Final Grades. Compute the value of the correlation coefficient
for the data obtained in the study of the number of absences and the final
grade of the seven students in the statistics class.

Student   No. of absences (x)   Final grade (y)   xy      x²    y²
A         6                     82                492     36    6,724
B         2                     86                172     4     7,396
C         15                    43                645     225   1,849
D         9                     74                666     81    5,476
E         12                    58                696     144   3,364
F         5                     90                450     25    8,100
G         8                     78                624     64    6,084
Total     57                    511               3,745   579   38,993

r = \frac{(7)(3{,}745) - (57)(511)}{\sqrt{[7(579) - (57)^2][7(38{,}993) - (511)^2]}}

r = -0.944

The value of r suggests a strong negative relationship between a student’s
final grade and the number of absences a student has. That is, the more
absences a student has, the lower is his or her grade.
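
The arithmetic in this example can be checked with a short Python sketch. It rebuilds the column totals from the seven data pairs and substitutes them into the formula for r; the variable names are illustrative assumptions, and the data are the values given in the example.

```python
import math

# Data from the example: absences (x) and final grades (y) for students A to G.
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

n = len(absences)
sum_x = sum(absences)                                   # 57
sum_y = sum(grades)                                     # 511
sum_xy = sum(x * y for x, y in zip(absences, grades))   # 3,745
sum_x2 = sum(x * x for x in absences)                   # 579
sum_y2 = sum(y * y for y in grades)                     # 38,993

# Substitute the totals into the computational formula for r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(f"r = {r:.3f}")  # r = -0.944
```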
18
The Significance of the Correlation Coefficient

• The variables x and y are linearly related.
• The variables are random variables.
• The two variables have a bivariate normal distribution.

Bivariate data is data on two variables, where each value of one
variable is paired with a value of the other variable.

Example: absences of students to grade, calories to cholesterol, age
to blood pressure, height to weight, etc.

19
Compute the value of the correlation coefficient for the data obtained in the
study of the number of absences and the final grade of the seven students in
the statistics class, and perform the hypothesis test at α = 0.01.

Student   No. of absences (x)   Final grade (y)
A         6                     82
B         2                     86
C         15                    43
D         9                     74
E         12                    58
F         5                     90
G         8                     78
Total     57                    511

Step 1:
𝐻𝑜: ρ = 0
There is no correlation between the x and y variables in the population.
𝐻𝑎: ρ ≠ 0
There is a significant correlation between the variables in the population.

Step 2:
α = 0.01; n = 7
T.V. = ± 0.875

Step 3:
r = -0.944

Step 4:
|-0.944| ≥ |0.875|
The decision is to reject the null hypothesis.

Step 5:
Therefore, there is a significant correlation between the number of
absences and final grade of the students.
21
Activity
22
1. Calories and Cholesterol. The number of calories and the number of
milligrams of cholesterol for a random sample of fast-food chicken sandwiches
from seven restaurants are shown here. Is there a relationship between the
variables? Perform the hypothesis test at α = 0.05.

Random sample   Calories (x)   Cholesterol (y)
A               390            43
B               535            45
C               720            80
D               300            50
E               430            55
F               500            52
G               440            60

Step 1:
𝐻𝑜: ρ = 0
𝐻𝑎: ρ ≠ 0

Step 2:
α =
n =
T.V. =

Step 3:

Step 4:

Step 5:
23
2. Ten (10) randomly selected Grade 11 students took a Math Aptitude Test
before they started their Statistics and Probability subject. Perform the
hypothesis test at α = 0.05.

Student   Aptitude (x)   Periodical (y)
A         38             25
B         35             20
C         30             17
D         28             15
E         25             12
F         24             15
G         20             18
H         18             10
I         16             12
J         15             10

Step 1:
𝐻𝑜: ρ = 0
𝐻𝑎: ρ ≠ 0

Step 2:
α =
n =
T.V. =

Step 3:

Step 4:

Step 5:
24
25
