Pearson and Correlation
BSA-IIIX
1. MEASURES OF VARIABILITY
Variability refers to how spread out the scores in a distribution are; that
is, it refers to the amount of spread of the scores around the mean. For
example, distributions with the same mean can have different amounts of
variability or dispersion.
Measures of Variability
Range
Interquartile range (IQR)
Variance and Standard Deviation
Range
The range is the difference between the maximum and minimum values of a data
set. The maximum is the largest value in the dataset and the minimum is the
smallest value. The range is easy to calculate, but it is strongly affected by
extreme values.
Range = maximum − minimum
Interquartile range (IQR)
Like the range, the IQR is a measure of variability, but you must find the
quartiles in order to compute its value.
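As a rough illustration of both measures, the sketch below computes the range and the IQR for the nine history scores used in the Spearman example in these notes. It assumes the IQR is the distance between the third and first quartiles, and it relies on `statistics.quantiles` from Python's standard library, whose default "exclusive" method is only one of several quartile conventions; other textbooks or software may give slightly different quartiles.

```python
import statistics

# Illustrative data: the nine history scores from the Spearman example in these notes.
scores = [35, 23, 47, 17, 10, 43, 9, 6, 28]

# Range = maximum - minimum
data_range = max(scores) - min(scores)

# With n=4, statistics.quantiles returns the three quartile cut points [Q1, Q2, Q3].
# The default method ("exclusive") is one convention among several.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# Assumed definition: IQR = Q3 - Q1
iqr = q3 - q1

print("range:", data_range)
print("Q1:", q1, "Q3:", q3)
print("IQR:", iqr)
```

A single extreme score would change the range directly, while the IQR depends only on the middle half of the data.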
Variance
The variance is the average squared distance from the mean.
Population variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where μ is the population mean and the summation is over all possible values
of the population and N is the population size.
σ² is often estimated by using the sample variance.
Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$$
where n is the sample size and x̄ is the sample mean.
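The two variance formulas can be sketched in a few lines of Python. The function names below are illustrative, not from the notes, and the data are the five trigonometry scores from the Pearson example later in this handout. The standard-library functions `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import statistics

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Equivalent computational form: (sum of x^2 - n * x_bar^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return (sum(x * x for x in values) - n * x_bar ** 2) / (n - 1)

# Trigonometry scores from the Pearson example in these notes.
data = [18, 11, 10, 20, 17]

print("population variance:", population_variance(data))
print("sample variance:", sample_variance(data))
print("shortcut form:", sample_variance_shortcut(data))
print("statistics.pvariance:", statistics.pvariance(data))
print("statistics.variance:", statistics.variance(data))
```

Up to floating-point rounding, the definitional and shortcut forms of the sample variance agree, and the sample variance is larger than the population variance computed from the same numbers because of the smaller divisor.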
Why do we divide by n−1 instead of by n?
When we calculate the sample variance, we estimate the population mean with
the sample mean. Dividing by n − 1 rather than n compensates for this and
gives s² a special property: it is an "unbiased estimator" of the population
variance. That is, averaged over all possible samples, s² equals σ².
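One way to see the unbiasedness claim concretely is to enumerate every possible sample from a tiny population. The sketch below is an illustration under stated assumptions, not a proof: the population `[1, 2, 3, 4]` is hypothetical, samples of size 2 are drawn with replacement, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, chosen only to keep the enumeration small.
population = [1, 2, 3, 4]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

n = 2  # sample size

def sum_squared_deviations(sample):
    """Sum of squared deviations of a sample from its own mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)

# All 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

# Average of the two candidate estimators over every possible sample.
mean_dividing_by_n_minus_1 = sum(
    sum_squared_deviations(s) / (n - 1) for s in samples
) / len(samples)
mean_dividing_by_n = sum(
    sum_squared_deviations(s) / n for s in samples
) / len(samples)

print("population variance:", sigma2)
print("average estimate, divisor n - 1:", mean_dividing_by_n_minus_1)
print("average estimate, divisor n:", mean_dividing_by_n)
```

In this setup the divisor n − 1 averages out to the population variance exactly, while the divisor n underestimates it on average.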
The sample variance (and therefore the sample standard deviation) is a
common default calculation in statistical software. When asked to calculate
the variance or standard deviation of a set of data, assume, unless otherwise
instructed, that it is sample data and calculate the sample variance and
sample standard deviation.
What is correlation?
The Pearson and Spearman correlation coefficients can range in value from −1
to +1. The Pearson correlation coefficient is +1 when an increase in one
variable always comes with an increase of a consistent amount in the other.
This relationship forms a perfect line. The Spearman correlation coefficient
is also +1 in this case.
Pearson = +1, Spearman = +1
If the relationship is that one variable increases when the other increases, but
the amount is not consistent, the Pearson correlation coefficient is positive
but less than +1. The Spearman coefficient still equals +1 in this case.
If the relationship is that one variable decreases when the other increases,
but the amount is not consistent, then the Pearson correlation coefficient is
negative but greater than −1. The Spearman coefficient still equals −1 in this
case.
Coefficient of 0
The graph for this case shows a very strong relationship that is neither
linear nor monotonic, so the Pearson coefficient and Spearman coefficient are
both approximately 0.
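The cases above can be checked numerically with a small sketch. Everything here is illustrative: the x values and the three y patterns (linear, exponential, U-shaped) are made-up data, and the Spearman coefficient is computed as the Pearson coefficient of the ranks, with tied values sharing their average rank.

```python
import math

def pearson(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]        # increases by a consistent amount
monotone = [math.exp(v) for v in x]    # always increases, but unevenly
u_shaped = [v * v for v in x]          # strong relationship, not monotonic

for name, y in [("linear", linear), ("monotone", monotone), ("u-shaped", u_shaped)]:
    print(name, "Pearson:", round(pearson(x, y), 3), "Spearman:", round(spearman(x, y), 3))
```

The U-shaped case is the reason to look at a scatterplot before trusting either coefficient: y is completely determined by x, yet neither coefficient detects the relationship.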
The scores for nine students in history and algebra are as follows:
Answer
History   Rank   Algebra   Rank
35        3      30        5
23        5      33        3
47        1      45        2
17        6      23        6
10        7      8         8
43        2      49        1
9         8      12        7
6         9      4         9
28        4      31        4
History   Rank   Algebra   Rank   d   d²
35        3      30        5      2   4
23        5      33        3      2   4
47        1      45        2      1   1
17        6      23        6      0   0
10        7      8         8      1   1
43        2      49        1      1   1
9         8      12        7      1   1
6         9      4         9      0   0
28        4      31        4      0   0
Σ d² = 4 + 4 + 1 + 0 + 1 + 1 + 1 + 0 + 0 = 12
Step 4: Insert the values into the formula. These ranks are not tied, so use
the first formula:
$$
\begin{aligned}
\rho &= 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} \\
     &= 1 - \frac{6 \times 12}{9(81 - 1)} \\
     &= 1 - \frac{72}{720} \\
     &= 1 - 0.1 \\
     &= 0.9
\end{aligned}
$$
The Spearman rank correlation for this set of data is 0.9.
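The worked example can be reproduced with a short sketch. It uses the nine pairs of scores from the table, ranks them with rank 1 for the highest score as the table does, and applies the no-ties formula. The helper name `descending_ranks` is illustrative, and the function assumes there are no tied scores, which holds for this data.

```python
# Scores for the nine students, in the order they appear in the table.
history = [35, 23, 47, 17, 10, 43, 9, 6, 28]
algebra = [30, 33, 45, 23, 8, 49, 12, 4, 31]

def descending_ranks(scores):
    """Rank 1 goes to the highest score. Assumes no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

rank_history = descending_ranks(history)
rank_algebra = descending_ranks(algebra)

# Squared rank differences for each student.
d_squared = [(rh - ra) ** 2 for rh, ra in zip(rank_history, rank_algebra)]
sum_d_squared = sum(d_squared)

# Spearman formula for untied ranks: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
n = len(history)
rho = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

print("history ranks:", rank_history)
print("algebra ranks:", rank_algebra)
print("sum of d^2:", sum_d_squared)
print("rho:", round(rho, 3))
```

If the data contained tied scores, the ranks would need to be averaged and this shortcut formula would no longer be exact.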
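The Pearson correlation coefficient is denoted by the letter "r". The formula
for the Pearson correlation coefficient r is given by:
$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Where,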
r = Pearson correlation coefficient
x = Values in the first set of data
y = Values in the second set of data
n = Total number of values.
Solved Example
Trigonometry 18 11 10 20 17
Solution:
x    y    x²    y²    xy
8    17   64    289   136
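$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
$$r = \frac{5 \times 902 - 61 \times 76}{\sqrt{\left[5 \times 789 - (61)^2\right]\left[5 \times 1234 - (76)^2\right]}}$$
$$r = \frac{-126}{\sqrt{224 \times 394}} \approx -0.424$$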
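The arithmetic in the solution can be double-checked with a small sketch that plugs the column totals into the formula. The function name `pearson_from_sums` is illustrative; the totals passed in are the ones that appear in the substitution step above.

```python
import math

def pearson_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r computed from column totals, following the formula in the notes."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Totals taken from the substitution step of the solved example.
r = pearson_from_sums(n=5, sum_x=61, sum_y=76, sum_x2=789, sum_y2=1234, sum_xy=902)

print("r:", round(r, 3))
```

The negative sign comes entirely from the numerator (4510 − 4636 = −126), so in this example higher values of x tend to go with lower values of y, though only weakly.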