Central Tendency
Measure of Central Tendency:
A single summary score that best describes the central location of an entire distribution of scores.
The typical score. The center of the distribution.
Mean
The most commonly used measure of central tendency. When people ask about the average of a group of scores, they are usually referring to the mean. The mean is the sum of all the scores in the distribution divided by the number of scores (the arithmetic average). It is the balance point of a distribution: the differences of the scores above the mean exactly cancel the differences of the scores below it.
Mean (cont)
Population
$$\mu = \frac{\sum X}{N}$$
$N$ is the total number of scores in the population; $\sum X$ (sigma X) is the sum of X, i.e., add up all scores.
Sample
$$\bar{X} = \frac{\sum X}{n}$$
$\bar{X}$ is read "X bar"; $n$ is the number of scores in the sample.
Mean (cont)
Exam Scores: 75, 82, 72, 68, 89, 91, 78, 94, 88, 75
$$\bar{X} = \frac{\sum X}{n} = \frac{812}{10} = 81.2$$
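As a quick check of the arithmetic above, here is a minimal Python sketch. The helper name `mean` is our own; Python's built-in `statistics.mean` would give the same result.

```python
def mean(scores):
    """Return the arithmetic mean: the sum of the scores divided by how many there are."""
    if not scores:
        raise ValueError("the mean needs at least one score")
    return sum(scores) / len(scores)


exam_scores = [75, 82, 72, 68, 89, 91, 78, 94, 88, 75]
total = sum(exam_scores)       # sigma X
average = mean(exam_scores)    # sigma X / n

print(total)    # 812
print(average)  # 81.2
```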
Mean (cont)
Performance and Memory Study
Scores: 2, 2, 3, 3, 4, 4, 4, 4, 4, 10
[Figure: frequency histogram of the scores]
$$\bar{X} = \frac{\sum X}{n} = \frac{40}{10} = 4$$
Cons
Influenced by extreme scores. May not exist in the data.
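Both cons can be seen with data already on these slides. The sketch below uses Python's standard `statistics` module: it drops the single extreme score of 10 from the Performance and Memory Study scores to show how much that one score pulls the mean, and it checks that the exam mean of 81.2 is not one of the actual exam scores.

```python
from statistics import mean

memory_scores = [2, 2, 3, 3, 4, 4, 4, 4, 4, 10]
without_extreme = [x for x in memory_scores if x != 10]  # drop the one extreme score

print(mean(memory_scores))              # 4
print(round(mean(without_extreme), 2))  # 3.33

exam_scores = [75, 82, 72, 68, 89, 91, 78, 94, 88, 75]
exam_mean = mean(exam_scores)
print(exam_mean in exam_scores)         # False: 81.2 is not an actual score
```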
Median
The middle score of the distribution when all the scores have been ranked in either ascending or descending order. It represents the exact center or middle of the distribution and is appropriate for variables measured at least at the ordinal level.
Odd number of cases: the median is the ((n + 1)/2)th score.
Even number of cases: the median is the average of the two middle values, the (n/2)th and the (n/2 + 1)th scores.
What is the median suicide rate for the nine largest U.S. cities?
City            Rate
New York        7.44
Los Angeles     13.38
Chicago         10.00
Houston         14.11
Philadelphia    14.78
San Diego       12.61
Detroit         12.26
Dallas          14.30
Phoenix         18.37
Total (N)       9
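n is odd, so the median is the (9 + 1)/2 = 5th case. Ranking the rates from lowest to highest gives 7.44, 10.00, 12.26, 12.61, 13.38, 14.11, 14.30, 14.78, 18.37, and the 5th case is 13.38. The median suicide rate for the nine largest U.S. cities is therefore 13.38 (the value of the 5th case, not 5).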
Median (cont)
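Scores: 2, 2, 3, 3, 4, 4, 4, 4, 4, 10
n = 10 is even, so the median is the average of the 5th and 6th scores: (4 + 4)/2 = 4.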
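The odd/even rule can be written out directly. The sketch below is a plain-Python version of that rule (the function name `median` is our own; the standard library's `statistics.median` follows the same rule), applied to the suicide-rate data and to the scores above.

```python
def median(scores):
    """Middle score of the ranked data, following the odd/even rule."""
    if not scores:
        raise ValueError("the median needs at least one score")
    ordered = sorted(scores)  # rank from lowest to highest first
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th score; subtract 1 for Python's 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the (n / 2)th and (n / 2 + 1)th scores.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


suicide_rates = [7.44, 13.38, 10.00, 14.11, 14.78, 12.61, 12.26, 14.30, 18.37]
print(median(suicide_rates))                    # 13.38 (the 5th of 9 ranked rates)
print(median([2, 2, 3, 3, 4, 4, 4, 4, 4, 10]))  # 4.0 (average of the 5th and 6th scores)
```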
To locate the median, arrange the responses in order from lowest to highest (or highest to lowest):
Response
very dissatisfied
very dissatisfied
somewhat dissatisfied
somewhat satisfied (the middle case, the (7 + 1)/2 = 4th response = Median)
somewhat satisfied
very satisfied
very satisfied
Total (N): 7
Cons
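May not exist in the data (with an even number of cases, the median is the average of two scores). Doesn't take the actual values of the scores into account, only their order.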
Mode
The most frequent score in the distribution. A distribution in which a single score is most frequent has one mode and is called unimodal. When there are ties for the most frequent score, the distribution is bimodal if two scores tie or multimodal if more than two scores tie.
Applications: printing, manufacturing, etc.
For example, it is important to print more copies of the most popular books, because printing different books in equal numbers would cause a shortage of some books and an oversupply of others. Likewise, it is important to manufacture more of the most popular shoes, because manufacturing different shoes in equal numbers would cause a shortage of some shoes and an oversupply of others.
Mode (cont)
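Scores: 2, 2, 3, 3, 4, 4, 4, 4, 4, 10
Mode = 4 (it occurs five times, more than any other score), so this distribution is unimodal.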
Mode (cont)
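Scores: 72, 72, 73, 76, 78, 81, 83, 85, 85, 86, 87, 88, 90, 91, 92
Both 72 and 85 occur twice, more than any other score, so this distribution is bimodal (modes = 72 and 85).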
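Counting frequencies makes the unimodal/bimodal/multimodal labels mechanical. This is a sketch using `collections.Counter`; the helper names `modes` and `modality` are our own, and returning no mode when no score repeats is an assumed convention that matches the note that small samples may not have a mode. (The standard library's `statistics.multimode` is similar, but it reports every value as a mode when nothing repeats.)

```python
from collections import Counter


def modes(scores):
    """All scores tied for the highest frequency, in ascending order."""
    counts = Counter(scores)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: no score repeats, so there is no mode
    return sorted(score for score, count in counts.items() if count == top)


def modality(scores):
    """Label a distribution by how many modes it has."""
    k = len(modes(scores))
    if k == 0:
        return "no mode"
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"


first = [2, 2, 3, 3, 4, 4, 4, 4, 4, 10]
second = [72, 72, 73, 76, 78, 81, 83, 85, 85, 86, 87, 88, 90, 91, 92]
print(modes(first), modality(first))    # [4] unimodal
print(modes(second), modality(second))  # [72, 85] bimodal
```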
Mode (cont)
The mode is the best measure of central tendency when the data are not ordered, like the colors of cars in a parking lot.
Cons
Ignores most of the information in a distribution. Small samples may not have a mode.
Scales of Measurement
Nominal scale = mode
Ordinal scale = median
Ratio scale = mean, median, or mode
Interval scale = mean, median, or mode
Symmetrical Distribution
[Figure: frequency histogram of scores]
A skewed distribution is asymmetrical: on one side of the distribution, the frequency of scores tapers off gradually into a long tail.
Skewed Distribution
[Figure: Positive Skew, frequency histogram of scores]
Skewed Distribution
[Figure: Negative Skew, frequency histogram of scores]
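The mean is pulled toward the long tail of a skewed distribution, so it will overestimate the center of a positively skewed distribution and underestimate the center of a negatively skewed one.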
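To see the mean being pulled toward a tail, compare it with the median for the Las Vegas hotel rates used in the Range and Interquartile Range examples, which have a long right tail (a handful of rates above 600). This sketch uses Python's standard `statistics` module.

```python
from statistics import mean, median

# Las Vegas hotel rates (the 35 values used in the Range and IQR examples)
hotel_rates = [52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
               303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
               643, 693, 732, 749, 750, 791, 891]

avg = mean(hotel_rates)
mid = median(hotel_rates)   # median() sorts the data itself
print(round(avg, 2), mid)   # 379.69 317
print("positive skew: mean > median" if avg > mid else "mean <= median")
```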
[Figure: Positive Skew and Negative Skew frequency histograms of scores, side by side]
Dispersion
The spread of a set of scores around some central value.
Why it is important:
It gives us additional information that enables us to judge the reliability of our measure of central tendency.
Two datasets can have the same average but very different variability.
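A small illustration with two made-up data sets (hypothetical values chosen only for this example, not from the slides): both average 50, yet their spread, measured by the range, differs by a factor of 20.

```python
from statistics import mean

# Hypothetical data sets chosen only for illustration
data_a = [48, 49, 50, 51, 52]
data_b = [10, 30, 50, 70, 90]

for name, data in (("A", data_a), ("B", data_b)):
    spread = max(data) - min(data)  # range: highest minus lowest
    print(name, mean(data), spread)
# A 50 4
# B 50 80
```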
Applications
Stock market, quality control
[Figure: Data set A and Data set B]
Measures of Variability
Range, Interquartile Range, Variance, Standard Deviation
Range
The difference between the highest and lowest score in a distribution.
Range = highest value - lowest value
Las Vegas Hotel Rates: 52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
Range: 891 - 52 = 839
Cons
Value depends only on two scores. Very sensitive to outliers. Influenced by sample size (the larger the sample, the larger the range tends to be).
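The range calculation, and its sensitivity to a single extreme value, in a short Python sketch (the function name `score_range` is our own): removing only the highest rate, 891, shrinks the range by 100.

```python
def score_range(scores):
    """Range = highest value - lowest value."""
    if not scores:
        raise ValueError("the range needs at least one score")
    return max(scores) - min(scores)


hotel_rates = [52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
               303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
               643, 693, 732, 749, 750, 791, 891]

print(score_range(hotel_rates))       # 839
print(score_range(hotel_rates[:-1]))  # 739: drops only the top rate, 891 (last in the list)
```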
Interquartile Range
The range of the middle half of the scores.
IQR = Q3 (third quartile) - Q1 (first quartile)
Using the Las Vegas hotel rates above, ranked from lowest to highest (n = 35):
Q1 is the score at position (35 + 1)/4 = 9, which is 257.
Q3 is the score at position 3(35 + 1)/4 = 27, which is 472.
Interquartile Range: 472 - 257 = 215
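A sketch of the position method used above, in Python. The helper `quartile` is our own: it locates the score at position p(n + 1) in the ordered data and, as an assumption for positions that are not whole numbers, interpolates linearly between the two neighboring scores. For this data set the positions are whole numbers (9 and 27), and the result agrees with `statistics.quantiles`, whose default `'exclusive'` method also uses (n + 1)-based positions.

```python
import math
from statistics import quantiles


def quartile(scores, p):
    """Score at 1-based position p * (n + 1) of the ordered data (p = 0.25 for Q1, 0.75 for Q3)."""
    ordered = sorted(scores)
    n = len(ordered)
    position = min(max(p * (n + 1), 1), n)  # keep the position inside 1..n
    whole = math.floor(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    # Assumed convention: interpolate linearly between neighboring scores.
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


hotel_rates = [52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
               303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
               643, 693, 732, 749, 750, 791, 891]

q1 = quartile(hotel_rates, 0.25)    # position (35 + 1) / 4 = 9
q3 = quartile(hotel_rates, 0.75)    # position 3 * (35 + 1) / 4 = 27
print(q1, q3, q3 - q1)              # 257 472 215
print(quantiles(hotel_rates, n=4))  # [257.0, 317.0, 472.0]
```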
Cons
Discards much of the data.
Variance
The mean of all squared deviations from the mean. It summarizes how far scores typically deviate from the mean (the typical score), in squared units.
Score - Mean = Difference score. The average of the difference scores is always 0, because the positive and negative differences cancel out. To make this number not 0, square the difference scores so that there are no negatives to cancel out the positives.
Variance: Formula
Population
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Sample
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Scores: 3, 4, 4, 4, 6, 7, 7, 8, 8, 9
$$\bar{X} = \frac{\sum X}{n} = \frac{60}{10} = 6$$
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{9} = \frac{40}{9} \approx 4.44$$
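The same computation in Python, written out step by step so the zero sum of the difference scores is visible. `difference_scores` and `sample_variance` are our own helper names; the standard library's `statistics.variance` (divides by n - 1) and `statistics.pvariance` (divides by N) serve as cross-checks.

```python
from statistics import pvariance, variance


def difference_scores(scores):
    """Score - mean for every score."""
    x_bar = sum(scores) / len(scores)
    return [x - x_bar for x in scores]


def sample_variance(scores):
    """S^2 = sum of squared difference scores divided by n - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("the sample variance needs at least two scores")
    return sum(d * d for d in difference_scores(scores)) / (n - 1)


scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
print(sum(difference_scores(scores)))   # 0.0: positives and negatives cancel
print(f"{sample_variance(scores):.2f}") # 4.44 (= 40 / 9)
print(f"{variance(scores):.2f}")        # 4.44, library sample variance
print(f"{pvariance(scores):.2f}")       # 4.00 (= 40 / 10), dividing by N instead
```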
Cons
Hard to interpret, because it is expressed in squared units. Can be influenced by extreme scores.
Standard Deviation
The square root of the average of the squared distances of the observations from the mean. To undo the squaring of the difference scores, take the square root of the variance; this returns the measure to the original units rather than squared units.
Population
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\sigma^2}$$
Sample
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{S^2}$$
Example
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{9}} = \sqrt{\frac{40}{9}} \approx 2.11$$
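A minimal Python check of the example: take the square root of the sample variance, and compare with the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
from statistics import stdev

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
n = len(scores)
x_bar = sum(scores) / n
s_squared = sum((x - x_bar) ** 2 for x in scores) / (n - 1)  # sample variance, 40 / 9
s = math.sqrt(s_squared)                                      # back to the original units

print(f"{s:.2f}")              # 2.11
print(f"{stdev(scores):.2f}")  # 2.11, standard-library cross-check
```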
Cons
Influenced by extreme scores.
And
If $\bar{X}$ = mean, $s$ = standard deviation, and $x$ is a value in the data set, then for a roughly normal (bell-shaped) distribution:
about 68% of the data lie in the interval $\bar{X} - s < x < \bar{X} + s$
about 95% of the data lie in the interval $\bar{X} - 2s < x < \bar{X} + 2s$
about 99.7% of the data lie in the interval $\bar{X} - 3s < x < \bar{X} + 3s$
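The percentages in this rule come from the normal curve. The sketch below reads them off Python's `statistics.NormalDist` and builds the three intervals for the scores from the variance example (mean 6, s of about 2.11); `interval` is our own helper name. For a small or non-normal sample, the actual share of scores inside each interval can differ noticeably from these percentages.

```python
from statistics import NormalDist, mean, stdev

standard_normal = NormalDist()  # mean 0, standard deviation 1


def interval(x_bar, s, k):
    """Return the interval (x_bar - k*s, x_bar + k*s)."""
    return x_bar - k * s, x_bar + k * s


# Share of a normal distribution within k standard deviations of the mean
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
x_bar, s = mean(scores), stdev(scores)
for k in (1, 2, 3):
    low, high = interval(x_bar, s, k)
    print(f"within {k}s: {shares[k]:.1%} of a normal distribution; interval {low:.2f} to {high:.2f}")
```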
Coefficient of Variation
It relates the standard deviation and the mean by expressing the standard deviation as a percentage of the mean.
Population
$$CV = \frac{\sigma}{\mu}(100)$$
Sample
$$CV = \frac{S}{\bar{X}}(100)$$
$$CV = \frac{500}{40} = 12.5\%$$
$$CV = \frac{1500}{160} \approx 9.4\%$$
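A small sketch of the formula in Python. Reading the slide's 500/40 and 1500/160 as (standard deviation × 100)/mean implies standard deviations of 5 and 15 with means of 40 and 160; those inputs are our interpretation of the slide, and `coefficient_of_variation` is our own helper name.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean: (sd / mean) * 100."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    return sd / mean * 100


print(coefficient_of_variation(5, 40))              # 12.5
print(round(coefficient_of_variation(15, 160), 1))  # 9.4
```

Under that reading, the second data set has the larger standard deviation (15 vs. 5) but the smaller spread relative to its mean (9.4% vs. 12.5%), which is exactly the comparison the coefficient of variation is meant to make.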