Pearson and Correlation

EUNICE F. ORTILE
NOVEMBER 27, 2020
BSA-IIIX
RESEARCH ON THE FOLLOWING:
1. MEASURES OF VARIABILITY
Variability refers to how spread out the scores in a distribution are; that
is, it refers to the amount of spread of the scores around the mean. For
example, distributions with the same mean can have different amounts of
variability or dispersion.
Measures of Variability
There are many ways to describe variability or spread, including:
• Range
• Interquartile range (IQR)
• Variance and Standard Deviation
Range
The range is the difference between the maximum and minimum values of a data
set. The maximum is the largest value in the data set and the minimum is the
smallest value. The range is easy to calculate, but it is strongly affected by
extreme values.
Range=maximum−minimum
Like the range, the IQR is a measure of variability, but you must find the
quartiles in order to compute its value.
Interquartile Range (IQR)

The interquartile range is the difference between upper and lower quartiles
and denoted as IQR.
IQR=Q3−Q1=upper quartile−lower quartile=75th percentile−25th percentile
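The range and IQR formulas above can be sketched in Python as follows. This is only an illustration: the data are the history scores from the Spearman example in this document, and the quartile convention (`method="inclusive"`) is an assumption, since different textbooks and software packages compute quartiles in slightly different ways and may give slightly different IQR values.

```python
import statistics

# History scores from the Spearman example in this document.
scores = [35, 23, 47, 17, 10, 43, 9, 6, 28]

# Range = maximum - minimum
data_range = max(scores) - min(scores)

# statistics.quantiles with n=4 returns the three quartile cut points.
# method="inclusive" is an assumed convention; other conventions exist.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# IQR = Q3 - Q1
iqr = q3 - q1

print("range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```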
Variance and Standard Deviation

One way to describe spread or variability is to compute the standard
deviation. In the following section, we are going to talk about how to
compute the sample variance and the sample standard deviation for a data
set. The standard deviation is the square root of the variance.
Variance
The variance is the average squared distance from the mean.
Population variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where μ is the population mean, N is the population size, and the summation
runs over all values in the population.
σ² is often estimated by using the sample variance.
Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$$
where n is the sample size and x̄ is the sample mean.
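As a rough check of the two variance formulas, the sketch below computes them by hand and compares them with Python's standard `statistics` module. The data are the algebra marks from the Pearson example in this document; treating the same five marks as a whole population for the population variance is only for illustration.

```python
import math
import statistics

# Algebra marks from the Pearson example in this document.
x = [15, 16, 12, 10, 8]
n = len(x)
mean = sum(x) / n

# Definition: sum of squared deviations from the mean.
ss = sum((xi - mean) ** 2 for xi in x)
pop_var = ss / n            # population variance (divide by N)
sample_var = ss / (n - 1)   # sample variance (divide by n - 1)

# Shortcut form: (sum of x_i^2 - n * mean^2) / (n - 1)
shortcut_var = (sum(xi ** 2 for xi in x) - n * mean ** 2) / (n - 1)

# Standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_var)

print("population variance:", pop_var)
print("sample variance:", sample_var, "shortcut:", shortcut_var)
print("sample standard deviation:", sample_sd)
```

The standard library functions `statistics.pvariance` and `statistics.variance` should agree with `pop_var` and `sample_var` respectively, up to rounding.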
Why do we divide by n−1 instead of by n?
When we calculate the sample variance, we estimate the population mean with
the sample mean. The squared deviations from the sample mean are, on average,
smaller than those from the population mean, and dividing by n−1 rather than
n corrects for this. The result has a special property: s² is an "unbiased
estimator" of the population variance.
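One way to see the effect of dividing by n−1 is to list every possible sample from a very small population and average the estimates. The sketch below does this for a made-up population of three values (a hypothetical example, not from the source), drawing samples of size 2 with replacement.

```python
import itertools
import math
import statistics

# Hypothetical toy population, chosen only to keep the enumeration small.
population = [1, 2, 3]
sigma2 = statistics.pvariance(population)  # true population variance

# Every ordered sample of size 2 drawn with replacement (9 samples).
samples = list(itertools.product(population, repeat=2))

# statistics.variance divides by n - 1; statistics.pvariance divides by n.
est_n_minus_1 = [statistics.variance(s) for s in samples]
est_n = [statistics.pvariance(s) for s in samples]

# Average each estimator over all equally likely samples.
mean_n_minus_1 = sum(est_n_minus_1) / len(samples)
mean_n = sum(est_n) / len(samples)

print("population variance:", sigma2)
print("average estimate dividing by n-1:", mean_n_minus_1)
print("average estimate dividing by n:", mean_n)
```

In this toy case the n−1 version averages to the population variance, while the version that divides by n averages to a smaller value.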
The sample variance (and therefore the sample standard deviation) is the
common default calculation used by software. When asked to calculate the
variance or standard deviation of a set of data, assume, unless otherwise
instructed, that it is sample data and calculate the sample variance and
sample standard deviation.
2. PEARSON AND SPEARMAN CORRELATION
What is correlation?
A correlation coefficient measures the extent to which two variables
tend to change together. The coefficient describes both the strength and
the direction of the relationship. Minitab offers two different correlation
analyses:
Pearson product moment correlation

The Pearson correlation evaluates the linear relationship between two
continuous variables. A relationship is linear when a change in one variable is
associated with a proportional change in the other variable.
For example, you might use a Pearson correlation to evaluate whether
increases in temperature at your production facility are associated with
decreasing thickness of your chocolate coating.
Spearman rank-order correlation
The Spearman correlation evaluates the monotonic relationship between two
continuous or ordinal variables. In a monotonic relationship, the variables
tend to change together, but not necessarily at a constant rate. The
Spearman correlation coefficient is based on the ranked values for each
variable rather than the raw data.
Spearman correlation is often used to evaluate relationships involving ordinal
variables. For example, you might use a Spearman correlation to evaluate
whether the order in which employees complete a test exercise is related to
the number of months they have been employed.
It is always a good idea to examine the relationship between variables with a
scatterplot. Correlation coefficients only measure linear (Pearson) or
monotonic (Spearman) relationships. Other relationships are possible.
Comparison of Pearson and Spearman coefficients
The Pearson and Spearman correlation coefficients can range in value from −1
to +1. The Pearson correlation coefficient is +1 when an increase in one
variable is always matched by an increase of a consistent amount in the other
variable. This relationship forms a perfect line. The Spearman correlation
coefficient is also +1 in this case.
Pearson = +1, Spearman = +1
If the relationship is that one variable increases when the other increases, but
the amount is not consistent, the Pearson correlation coefficient is positive
but less than +1. The Spearman coefficient still equals +1 in this case.
Pearson = +0.851, Spearman = +1

When a relationship is random or non-existent, then both correlation
coefficients are nearly zero.
Pearson = −0.093, Spearman = −0.093

If the relationship is a perfect line for a decreasing relationship, then both
correlation coefficients are −1.
Pearson = −1, Spearman = −1
If the relationship is that one variable decreases when the other increases,
but the amount is not consistent, then the Pearson correlation coefficient is
negative but greater than −1. The Spearman coefficient still equals −1 in this
case.
Pearson = −0.799, Spearman = −1

Pearson correlation values of −1 or 1 imply an exact linear relationship, like
that between a circle's radius and circumference. However, the real value of
correlation values is in quantifying less than perfect relationships. Finding that
two variables are correlated often informs a regression analysis, which tries to
describe this type of relationship in more detail.
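The cases compared above can be reproduced with a small sketch. The helper functions `pearson`, `ranks`, and `spearman` are written here only for illustration (they are not Minitab's implementation), the data are made up, and the ranking helper assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from smallest (1) to largest; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman coefficient = Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
linear = [2 * v + 1 for v in x]    # increases by a consistent amount
cubic = [v ** 3 for v in x]        # increases, but not at a constant rate
falling = [-(v ** 3) for v in x]   # decreases, but not at a constant rate

results = {
    "linear": (pearson(x, linear), spearman(x, linear)),
    "cubic": (pearson(x, cubic), spearman(x, cubic)),
    "falling": (pearson(x, falling), spearman(x, falling)),
}
for name, (r, rho) in results.items():
    print(f"{name}: Pearson = {r:.3f}, Spearman = {rho:.3f}")
```

For the two curved data sets the Spearman coefficient reaches +1 or −1 because the ranks line up perfectly, while the Pearson coefficient stays strictly between −1 and +1.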
Other nonlinear relationships
Pearson correlation coefficients measure only linear relationships. Spearman
correlation coefficients measure only monotonic relationships. So a meaningful
relationship can exist even if the correlation coefficients are 0. Examine a
scatterplot to determine the form of the relationship.
Coefficient of 0
This graph shows a very strong relationship. The Pearson coefficient and
Spearman coefficient are both approximately 0.
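A simple made-up example of such a relationship is y = x² over values of x that are symmetric about zero: y is completely determined by x, yet both coefficients come out as 0. The sketch below is illustrative only; the helper names are hypothetical, and tied values are given the average of their ranks, which is a common convention.

```python
import math

def pearson(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank from smallest to largest; tied values share their mean rank."""
    order = sorted(values)
    # A value first found at 0-based index i and occurring c times
    # occupies ranks i+1 .. i+c, whose mean is i + (c + 1) / 2.
    return [order.index(v) + (order.count(v) + 1) / 2 for v in values]

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]   # exact, but neither linear nor monotonic

r = pearson(x, y)
rho = pearson(average_ranks(x), average_ranks(y))

print(f"Pearson = {r:.3f}, Spearman = {rho:.3f}")
```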
3. EXAMPLE OF EACH TOOL
EXAMPLE QUESTION FOR SPEARMAN CORRELATION
The scores for nine students in history and algebra are as follows:
• History: 35, 23, 47, 17, 10, 43, 9, 6, 28
• Algebra: 30, 33, 45, 23, 8, 49, 12, 4, 31
Compute the Spearman rank correlation.
Answer
1. Step 1: Rank each student's score in each subject (1 = highest score).
History   Rank (History)   Algebra   Rank (Algebra)
35        3                30        5
23        5                33        3
47        1                45        2
17        6                23        6
10        7                8         8
43        2                49        1
9         8                12        7
6         9                4         9
28        4                31        4
2. Step 2: Calculate the difference between the ranks (d, sign ignored since d is squared) and square d.
History   Rank (History)   Algebra   Rank (Algebra)   d   d²
35        3                30        5                2   4
23        5                33        3                2   4
47        1                45        2                1   1
17        6                23        6                0   0
10        7                8         8                1   1
43        2                49        1                1   1
9         8                12        7                1   1
6         9                4         9                0   0
28        4                31        4                0   0
3. Step 3: Sum (add up) all the d² values.
Σd² = 4 + 4 + 1 + 0 + 1 + 1 + 1 + 0 + 0 = 12
4. Step 4: Insert the values into the formula. These ranks are not tied, so use
the formula for untied ranks:
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$
= 1 − (6 × 12) / (9 × (81 − 1))
= 1 − 72/720
= 1 − 0.1
= 0.9
The Spearman rank correlation for this set of data is 0.9.
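The four steps can be checked with a short script. This is a minimal sketch that follows the hand calculation (rank 1 for the highest score, no tied values); the helper name `rank_descending` is made up for this example.

```python
def rank_descending(values):
    """Rank 1 = highest score, as in the tables above; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

history = [35, 23, 47, 17, 10, 43, 9, 6, 28]
algebra = [30, 33, 45, 23, 8, 49, 12, 4, 31]

# Step 1: rank each student's score in each subject.
rank_h = rank_descending(history)
rank_a = rank_descending(algebra)

# Step 2: squared difference between the two ranks of each student.
d_squared = [(h - a) ** 2 for h, a in zip(rank_h, rank_a)]

# Step 3: sum of the squared differences.
sum_d2 = sum(d_squared)

# Step 4: Spearman formula for untied ranks.
n = len(history)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("sum of d^2:", sum_d2)
print("Spearman rho:", round(rho, 3))
```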
EXAMPLE OF PEARSON CORRELATION
The Pearson correlation coefficient is denoted by the letter "r". The formula for
the Pearson correlation coefficient r is given by:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
where
r = Pearson correlation coefficient
x = values in the first set of data
y = values in the second set of data
n = total number of pairs of values
Solved Example
Question: The marks obtained by 5 students in algebra and trigonometry are given
below:
Algebra        15   16   12   10   8
Trigonometry   18   11   10   20   17
Calculate the Pearson correlation coefficient.
Solution:
Construct the following table:
x         y         x²          y²           xy
15        18        225         324          270
16        11        256         121          176
12        10        144         100          120
10        20        100         400          200
8         17        64          289          136
∑x = 61   ∑y = 76   ∑x² = 789   ∑y² = 1234   ∑xy = 902
Formula for Pearson correlation coefficient is given by:
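$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
$$r = \frac{5 \times 902 - 61 \times 76}{\sqrt{\left[5 \times 789 - 61^2\right]\left[5 \times 1234 - 76^2\right]}} = \frac{-126}{\sqrt{224 \times 394}}$$
r ≈ −0.424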
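The same calculation can be checked with a short script. This is a minimal sketch that mirrors the table and formula above; the variable names are chosen only for this illustration.

```python
import math

algebra = [15, 16, 12, 10, 8]
trigonometry = [18, 11, 10, 20, 17]
n = len(algebra)

# Column totals, as in the table above.
sum_x = sum(algebra)
sum_y = sum(trigonometry)
sum_x2 = sum(x ** 2 for x in algebra)
sum_y2 = sum(y ** 2 for y in trigonometry)
sum_xy = sum(x * y for x, y in zip(algebra, trigonometry))

# Pearson formula: numerator and denominator computed separately.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print("sums:", sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print("r:", round(r, 3))
```

The negative sign indicates that, in this small data set, higher algebra marks tend to go with lower trigonometry marks.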
