Pearson and Correlation


EUNICE F. ORTILE
NOVEMBER 27, 2020
BSA-IIIX

RESEARCH ON THE FOLLOWING:

1. MEASURES OF VARIABILITY
Variability refers to how spread out the scores in a distribution are; that
is, it refers to the amount of spread of the scores around the mean. For
example, distributions with the same mean can have different amounts of
variability or dispersion.

Measures of Variability

There are many ways to describe variability or spread, including:

• Range
• Interquartile range (IQR)
• Variance and Standard Deviation

Range
The range is the difference between the maximum and minimum values of a data
set. The maximum is the largest value in the dataset and the minimum is the
smallest value. The range is easy to calculate, but it is strongly affected by
extreme values.
Range = maximum − minimum

Interquartile Range (IQR)

The interquartile range, denoted IQR, is the difference between the upper and
lower quartiles. Like the range, the IQR is a measure of variability, but you
must find the quartiles in order to compute its value.
IQR = Q3 − Q1 = upper quartile − lower quartile = 75th percentile − 25th percentile
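
As a rough illustration of the two formulas above, the following Python sketch computes the range and the IQR for a small made-up data set. The values in `scores` are hypothetical, and the helper names `value_range` and `interquartile_range` are chosen here for illustration only. The sketch assumes the "inclusive" quartile method of the standard-library function `statistics.quantiles`; other quartile conventions can give slightly different values for Q1 and Q3.

```python
import statistics

def value_range(data):
    """Range = maximum - minimum."""
    return max(data) - min(data)

def interquartile_range(data):
    """IQR = Q3 - Q1, using the 'inclusive' quartile convention (an assumption)."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

# Hypothetical scores; 50 is a deliberately extreme value.
scores = [2, 4, 4, 5, 7, 9, 10, 12, 50]
trimmed = scores[:-1]  # the same scores without the extreme value

print(value_range(scores), value_range(trimmed))
print(interquartile_range(scores), interquartile_range(trimmed))
```

With these numbers, dropping the single extreme value cuts the range from 48 to 10, while the IQR only moves from 6.0 to 5.25, which is why the IQR is often preferred when extreme values are present.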

Variance and Standard Deviation


One way to describe spread or variability is to compute the standard
deviation. In the following section, we are going to talk about how to
compute the sample variance and the sample standard deviation for a data
set. The standard deviation is the square root of the variance.

Variance
The variance is the average squared distance from the mean.
Population variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where μ is the population mean, the summation is over all values in the
population, and N is the population size.
σ² is often estimated by using the sample variance.

Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$$
where n is the sample size and x̄ is the sample mean.

Why do we divide by n−1 instead of by n?

When we calculate the sample variance, we estimate the population mean with
the sample mean. The squared deviations from the sample mean tend to be
smaller than those from the true population mean, so dividing by n would
underestimate the variance on average. Dividing by n−1 corrects for this and
gives s² a special property: it is an "unbiased estimator" of the population
variance.

The sample variance (and therefore the sample standard deviation) is the
common default calculation used by software. When asked to calculate the
variance or standard deviation of a set of data, assume, unless otherwise
instructed, that the data are a sample and calculate the sample variance and
sample standard deviation.
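
To make the difference between dividing by N and dividing by n−1 concrete, here is a minimal Python sketch. The data values are hypothetical and the function names are chosen here for illustration. It computes the population variance, the sample variance in both of its equivalent forms, and cross-checks the results against `statistics.pvariance` and `statistics.variance` from the Python standard library.

```python
import math
import statistics

def population_variance(data):
    """Average squared distance from the mean, dividing by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """Sum of squared deviations from the sample mean, dividing by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

def sample_variance_shortcut(data):
    """Equivalent form: (sum of x^2 - n * x_bar^2) / (n - 1)."""
    n = len(data)
    x_bar = sum(data) / n
    return (sum(x * x for x in data) - n * x_bar ** 2) / (n - 1)

data = [4, 8, 6, 5, 3]  # hypothetical sample

s2 = sample_variance(data)
print(population_variance(data))      # divides by N
print(s2, math.sqrt(s2))              # sample variance and standard deviation
print(sample_variance_shortcut(data)) # same value as s2, up to rounding

# Cross-check against the standard library.
print(statistics.pvariance(data), statistics.variance(data))
```

For this sample the sum of squared deviations is 14.8, so the population formula gives 14.8/5 = 2.96 and the sample formula gives 14.8/4 = 3.7.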

2. PEARSON AND SPEARMAN CORRELATION

What is correlation?

A correlation coefficient measures the extent to which two variables tend to
change together. The coefficient describes both the strength and the direction
of the relationship. Minitab offers two different correlation analyses:

Pearson product moment correlation

The Pearson correlation evaluates the linear relationship between two
continuous variables. A relationship is linear when a change in one variable is
associated with a proportional change in the other variable.

For example, you might use a Pearson correlation to evaluate whether increases
in temperature at your production facility are associated with decreasing
thickness of your chocolate coating.

Spearman rank-order correlation

The Spearman correlation evaluates the monotonic relationship between two
continuous or ordinal variables. In a monotonic relationship, the variables
tend to change together, but not necessarily at a constant rate. The
Spearman correlation coefficient is based on the ranked values for each
variable rather than the raw data.

Spearman correlation is often used to evaluate relationships involving ordinal
variables. For example, you might use a Spearman correlation to evaluate
whether the order in which employees complete a test exercise is related to
the number of months they have been employed.

It is always a good idea to examine the relationship between variables with a
scatterplot. Correlation coefficients only measure linear (Pearson) or
monotonic (Spearman) relationships. Other relationships are possible.

Comparison of Pearson and Spearman coefficients

The Pearson and Spearman correlation coefficients can range in value from −1
to +1. The Pearson correlation coefficient is +1 when, as one variable
increases, the other variable increases by a consistent amount. This
relationship forms a perfect line. The Spearman correlation coefficient is also
+1 in this case.
Pearson = +1, Spearman = +1
If the relationship is that one variable increases when the other increases, but
the amount is not consistent, the Pearson correlation coefficient is positive
but less than +1. The Spearman coefficient still equals +1 in this case.

Pearson = +0.851, Spearman = +1


When a relationship is random or non-existent, both correlation coefficients
are nearly zero.

Pearson = −0.093, Spearman = −0.093

If the relationship is a perfect line for a decreasing relationship, then both
correlation coefficients are −1.

Pearson = −1, Spearman = −1

If the relationship is that one variable decreases when the other increases,
but the amount is not consistent, then the Pearson correlation coefficient is
negative but greater than −1. The Spearman coefficient still equals −1 in this
case.

Pearson = −0.799, Spearman = −1


A Pearson correlation of −1 or +1 implies an exact linear relationship, like
that between a circle's radius and circumference. However, the real value of
correlation coefficients is in quantifying less than perfect relationships.
Finding that two variables are correlated often informs a regression analysis,
which tries to describe this type of relationship in more detail.
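
The case where one variable always increases with the other, but not by a consistent amount, can be checked with a short Python sketch. The data here are hypothetical (y is simply the cube of x), the helper names are chosen for illustration, and the Spearman coefficient is computed as the Pearson coefficient of the ranks, which assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson r computed from deviations about the two means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman coefficient as the Pearson r of the ranks."""
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
y = [v ** 3 for v in x]  # hypothetical: always increasing, but not at a constant rate

print(pearson(x, y))   # positive but below 1
print(spearman(x, y))  # 1, up to floating-point rounding
```

Because the ranks of x and y are identical, the Spearman coefficient reaches +1 even though the points do not lie on a straight line.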

Other nonlinear relationships

Pearson correlation coefficients measure only linear relationships. Spearman
correlation coefficients measure only monotonic relationships. So a meaningful
relationship can exist even if the correlation coefficients are 0. Examine a
scatterplot to determine the form of the relationship.

Coefficient of 0

This graph shows a very strong relationship. The Pearson coefficient and
Spearman coefficient are both approximately 0.
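
One way to see how a strong relationship can still give coefficients of 0 is the following Python sketch. It uses hypothetical data in which y is exactly the square of x over a range symmetric about zero; this is an assumed example and not necessarily the relationship shown in the graph. Tied y values receive the average of their rank positions, and the helper names are chosen for illustration.

```python
import math

def pearson(x, y):
    """Pearson r computed from deviations about the two means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # hypothetical: y is completely determined by x

r = pearson(x, y)                                   # Pearson on the raw values
rho = pearson(average_ranks(x), average_ranks(y))   # Spearman via the ranks
print(r, rho)
```

Both coefficients come out as zero here because the decreasing left half of the curve cancels the increasing right half, even though knowing x tells you y exactly.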

3. EXAMPLE OF EACH TOOL

EXAMPLE QUESTION FOR SPEARMAN CORRELATION

The scores for nine students in history and algebra are as follows:

• History: 35, 23, 47, 17, 10, 43, 9, 6, 28
• Algebra: 30, 33, 45, 23, 8, 49, 12, 4, 31

Compute the Spearman rank correlation.

Answer

1. Step 1: rank each student's score in each subject (rank 1 = highest score)

History   Rank   Algebra   Rank
35        3      30        5
23        5      33        3
47        1      45        2
17        6      23        6
10        7      8         8
43        2      49        1
9         8      12        7
6         9      4         9
28        4      31        4

2. Step 2: calculate the difference between the ranks (d) and square it (d²)

History   Rank   Algebra   Rank   d   d²
35        3      30        5      2   4
23        5      33        3      2   4
47        1      45        2      1   1
17        6      23        6      0   0
10        7      8         8      1   1
43        2      49        1      1   1
9         8      12        7      1   1
6         9      4         9      0   0
28        4      31        4      0   0

3. Step 3: sum (add up) all the d² scores

• Σd² = 4 + 4 + 1 + 0 + 1 + 1 + 1 + 0 + 0 = 12

4. Step 4: insert the values in the formula. These ranks are not tied, so use
the formula for untied ranks:

$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

= 1 − (6 × 12)/(9(81 − 1))
= 1 − 72/720
= 1 − 0.1
= 0.9

The Spearman rank correlation for this set of data is 0.9.
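
The four steps above can be sketched in Python as follows. The scores are the ones given in the question; the function names are chosen here for illustration, and the ranking helper assumes, as in this example, that no two scores in a subject are tied.

```python
history = [35, 23, 47, 17, 10, 43, 9, 6, 28]
algebra = [30, 33, 45, 23, 8, 49, 12, 4, 31]

def descending_ranks(values):
    """Step 1: rank 1 goes to the highest score; assumes no tied scores."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Steps 2 to 4: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rank_x, rank_y = descending_ranks(x), descending_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(descending_ranks(history))  # [3, 5, 1, 6, 7, 2, 8, 9, 4]
print(descending_ranks(algebra))  # [5, 3, 2, 6, 8, 1, 7, 9, 4]
print(round(spearman_rho(history, algebra), 3))  # 0.9
```

The printed ranks match the table in step 1, and the result agrees with the hand calculation. If ties were present, this shortcut formula would no longer be appropriate and the ranks would need tie handling.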

EXAMPLE OF PEARSON CORRELATION

The Pearson correlation coefficient is denoted by the letter "r". The formula
for the Pearson correlation coefficient r is given by:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

where
r = Pearson correlation coefficient
x = values in the first set of data
y = values in the second set of data
n = total number of paired values

Solved Example

Question: Marks obtained by 5 students in algebra and trigonometry are given
below:

Algebra        15   16   12   10    8
Trigonometry   18   11   10   20   17

Calculate the Pearson correlation coefficient.

Solution:

Construct the following table:

x     y     x²     y²     xy
15    18    225    324    270
16    11    256    121    176
12    10    144    100    120
10    20    100    400    200
8     17    64     289    136

∑x = 61    ∑y = 76    ∑x² = 789    ∑y² = 1234    ∑xy = 902

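The formula for the Pearson correlation coefficient is given by:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$
$$r = \frac{5 \times 902 - 61 \times 76}{\sqrt{[5 \times 789 - (61)^2][5 \times 1234 - (76)^2]}}$$
$$r = \frac{-126}{\sqrt{224 \times 394}} \approx -0.424$$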
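
The same calculation can be reproduced with a short Python sketch that builds the five sums from the table and substitutes them into the formula. The marks are those given in the question; the function name `pearson_r` is chosen here for illustration.

```python
import math

algebra = [15, 16, 12, 10, 8]
trigonometry = [18, 11, 10, 20, 17]

def pearson_r(x, y):
    """Pearson r from the raw sums used in the worked solution."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

print(round(pearson_r(algebra, trigonometry), 3))  # -0.424
```

The negative sign indicates that, in this small sample, higher algebra marks tend to go with lower trigonometry marks, although the relationship is weak.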
