Descriptive Measure 241122 125046
Data Analysis
NUMERICAL DESCRIPTIVE MEASURES
MEASURES OF CENTRAL TENDENCY FOR UNGROUPED DATA
Mean
Median
Mode
Relationships among the Mean, Median, and Mode
Figure 1
Mean
The mean for ungrouped data is obtained by dividing the
sum of all values by the number of values in the data set. Thus,
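$$\mu = \frac{\sum x}{N} \qquad \text{and} \qquad \bar{x} = \frac{\sum x}{n}$$

where N is the population size and n is the sample size.

For a sample of eight values (in millions of dollars):

$$\sum x = x_1 + x_2 + \cdots + x_8 = 319 + 199 + 110 + 63 + 21 + 315 + 26 + 63 = 1116$$

$$\bar{x} = \frac{\sum x}{n} = \frac{1116}{8} = 139.5 = \$139.5 \text{ million}$$

For a population of N = 8 values (in years) with Σx = 362:

$$\mu = \frac{\sum x}{N} = \frac{362}{8} = 45.25 \text{ years}$$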
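The sample-mean arithmetic can be checked in a few lines of Python. The list holds the eight values summed above (millions of dollars); the variable names are illustrative, and `statistics.mean` serves only as a cross-check.

```python
import statistics

# The eight values summed above (millions of dollars)
revenues = [319, 199, 110, 63, 21, 315, 26, 63]

total = sum(revenues)   # Σx
n = len(revenues)       # number of values in the sample
mean = total / n        # x̄ = Σx / n

print(total, n, mean)   # 1116 8 139.5
assert mean == statistics.mean(revenues)
```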
21.6 21.7 22.9 25.2 26.5 28.0 28.2 32.6 32.9 70.1 76.1 84.5
There are 12 values in this data set. Because there are an even
number of values in the data set, the median is given by the
average of the two middle values.
Example 5: Solution
The two middle values are the sixth and seventh in the
arranged data, and these two values are 28.0 and 28.2.
$$\text{Median} = \frac{28.0 + 28.2}{2} = \frac{56.2}{2} = 28.1 = \$28.1 \text{ million}$$
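The median rule above (average the two middle values when n is even) translates directly into code. This Python sketch, with an illustrative `median` function, is cross-checked against the standard library's `statistics.median`.

```python
import statistics

# 2010 total compensations (millions of dollars), already in increasing order
compensations = [21.6, 21.7, 22.9, 25.2, 26.5, 28.0, 28.2, 32.6, 32.9, 70.1, 76.1, 84.5]

def median(values):
    """Middle value of the ranked data; average of the two middle values when n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    # For n = 12, mid = 6, so this averages the 6th and 7th values (indices 5 and 6)
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median(compensations))
assert median(compensations) == statistics.median(compensations)
```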
77 82 74 81 79 84 74 78
Thus, the total areas of these four states are spread over a range of
217,626 square miles.
Range
Disadvantages
The range, like the mean, has the disadvantage of being
influenced by outliers. Consequently, the range is not a good
measure of dispersion to use for a data set that contains
outliers.
Its calculation is based on two values only: the largest and
the smallest. All other values in a data set are ignored when
calculating the range. Thus, the range is not a very
satisfactory measure of dispersion.
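Both disadvantages show up in a quick Python sketch. It reuses the 12 compensation values from the median example and, for illustration only, treats the three largest values (70.1, 76.1, 84.5) as outliers; `value_range` is an illustrative name.

```python
# 2010 total compensations (millions of dollars), in increasing order
compensations = [21.6, 21.7, 22.9, 25.2, 26.5, 28.0, 28.2, 32.6, 32.9, 70.1, 76.1, 84.5]

def value_range(values):
    """Range = largest value - smallest value; every other value is ignored."""
    return max(values) - min(values)

full = value_range(compensations)              # depends only on 84.5 and 21.6
without_top3 = value_range(compensations[:9])  # drop 70.1, 76.1, 84.5
print(round(full, 1), round(without_top3, 1))
```

Dropping those three values shrinks the range from 62.9 to 11.3, while the nine remaining values play no part in either result.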
Variance and Standard Deviation
The standard deviation is the most-used measure of
dispersion.
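Basic formulas:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} \qquad \text{and} \qquad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \qquad \text{and} \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Short-cut formulas:

$$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} \qquad \text{and} \qquad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

$$\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} \qquad \text{and} \qquad s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$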
where σ² is the population variance, s² is the sample variance,
σ is the population standard deviation, and s is the sample
standard deviation.
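To see that the basic and short-cut formulas give the same result, the following Python sketch implements both and checks them against the standard library's `statistics.variance` (divides by n − 1) and `statistics.pvariance` (divides by N). The function names are illustrative, and the eight values from the mean example are reused purely as test data.

```python
import math
import statistics

def variance_basic(values, sample=True):
    """Basic formula: squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)
    return ss / (n - 1) if sample else ss / n

def variance_shortcut(values, sample=True):
    """Short-cut formula: (Σx² - (Σx)²/n) divided by n - 1 (sample) or N (population)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    ss = sum_x2 - sum_x ** 2 / n
    return ss / (n - 1) if sample else ss / n

data = [319, 199, 110, 63, 21, 315, 26, 63]  # reused from the mean example
s2 = variance_shortcut(data)
s = math.sqrt(s2)  # standard deviation is the positive square root
print(s2, s)

# Both formulas agree with each other and with the standard library (up to rounding)
assert math.isclose(variance_basic(data), s2)
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(variance_shortcut(data, sample=False), statistics.pvariance(data))
```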
Example 12
Until about 2009, airline passengers were not charged for checked
baggage. Around 2009, however, many U.S. airlines started charging a
fee for bags. According to the Bureau of Transportation Statistics, U.S.
airlines collected more than $3 billion in baggage fee revenue in 2010.
The following table lists the baggage fee revenues of six U.S. airlines
for the year 2010. (Note that Delta’s revenue reflects a merger with
Northwest. Also note that since then United and Continental have
merged, and American filed for bankruptcy and may merge with
another airline.)
Find the variance and standard deviation for these data.
Example 12: Solution
Let x denote the 2010 baggage fee revenue (in millions of
dollars) of an airline. The values of Σx and Σx² are calculated
in Table 6.
Step 1. Calculate Σx
The sum of the values in the first column of Table 6 gives Σx = 2,854.
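Step 2. Calculate Σx²
From Table 6, Σx² = 1,746,098.
Step 3. Calculate the variance
Substituting n = 6, Σx = 2,854, and Σx² = 1,746,098 in the short-cut formula for the sample variance:

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{1{,}746{,}098 - \frac{(2{,}854)^2}{6}}{6 - 1} = \frac{1{,}746{,}098 - 1{,}357{,}552.667}{5} = 77{,}709.06666$$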
Step 4. Obtain the standard deviation
The standard deviation is obtained by taking the (positive) square root
of the variance:
$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{77{,}709.06666} = 278.7634601 = \$278.76 \text{ million}$$
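The variance and standard deviation computations above need only n, Σx, and Σx². A minimal Python sketch using the Table 6 totals quoted above:

```python
import math

# Summary values for Example 12 (from Table 6)
n = 6               # number of airlines
sum_x = 2854        # Σx, millions of dollars
sum_x2 = 1746098    # Σx²

# Sample variance by the short-cut formula
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
# Sample standard deviation is the positive square root of the variance
s = math.sqrt(s2)

print(f"s^2 = {s2:,.5f}")              # s^2 = 77,709.06667
print(f"s = {s:.2f} million dollars")  # s = 278.76 million dollars
```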
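For a population of N = 6 values (in thousands of dollars) with Σx = 449.30 and Σx² = 35,978.51:

$$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{35{,}978.51 - \frac{(449.30)^2}{6}}{6} = 388.90$$

$$\sigma = \sqrt{388.90} = \$19.721 \text{ thousand} = \$19{,}721$$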
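$$\mu = \frac{\sum mf}{N} = \frac{535}{25} = 21.40 \text{ minutes}$$

$$\bar{x} = \frac{\sum mf}{n} = \frac{832}{50} = 16.64 \text{ orders}$$

where m is the midpoint and f is the frequency of a class.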
Thus, this mail-order company received an average of
16.64 orders per day during these 50 days.
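The grouped mean multiplies each class midpoint m by its frequency f, sums the products, and divides by the total frequency. Because the frequency tables behind these two examples are not reproduced here, this Python sketch uses a small hypothetical table; `grouped_mean` and the class values are illustrative only.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: Σmf / Σf (Σf is N for a population, n for a sample)."""
    if len(midpoints) != len(frequencies):
        raise ValueError("need one frequency per class midpoint")
    sum_mf = sum(m * f for m, f in zip(midpoints, frequencies))
    total = sum(frequencies)
    return sum_mf / total

# Hypothetical classes 0 to under 10, 10 to under 20, 20 to under 30 (not the textbook's table)
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
print(grouped_mean(midpoints, frequencies))  # (10 + 75 + 75) / 10 = 16.0
```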
Variance and Standard Deviation for Grouped Data
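Basic Formulas for the Variance and Standard Deviation for Grouped Data

$$\sigma^2 = \frac{\sum f(m - \mu)^2}{N} \qquad \text{and} \qquad s^2 = \frac{\sum f(m - \bar{x})^2}{n - 1}$$

Short-cut formulas:

$$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N} \qquad \text{and} \qquad s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1}$$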
$$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N} = \frac{14{,}825 - \frac{(535)^2}{25}}{25} = \frac{3376}{25} = 135.04$$

$$\sigma = \sqrt{135.04} = 11.62 \text{ minutes}$$
Thus, the standard deviation of the daily commuting times for these
employees is 11.62 minutes.
Example 17
The following data, reproduced from Table 10 of Example 15,
give the frequency distribution of the number of orders
received each day during the past 50 days at the office of a
mail-order company.
$$s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1} = \frac{14{,}216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820$$
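Both grouped-data calculations above use the short-cut formula with only three summary numbers. The Python sketch below (the function name `grouped_variance` is illustrative) reproduces σ² = 135.04 for the commuting times and s² = 7.5820 for the mail-order data from the quoted sums.

```python
import math

def grouped_variance(sum_mf, sum_m2f, count, sample):
    """Short-cut grouped-data variance from Σmf, Σm²f and the total frequency.

    Divides by count - 1 for a sample (s²) and by count for a population (σ²).
    """
    ss = sum_m2f - sum_mf ** 2 / count
    return ss / (count - 1) if sample else ss / count

# Commuting times: population, N = 25
pop_var = grouped_variance(535, 14825, 25, sample=False)
# Orders received per day: sample, n = 50
samp_var = grouped_variance(832, 14216, 50, sample=True)

print(pop_var, math.sqrt(pop_var))
print(samp_var, math.sqrt(samp_var))
```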
Each of the two points, 16 and 64, is 24 units away from the
mean.
(a) Find the values of the three quartiles. Where does the total
compensation of Michael D. White (CEO of DirecTV) fall in
relation to these quartiles?
(a) Find the values of the three quartiles. Where does the age of
28 years fall in relation to the ages of the employees?
$$P_k = \text{Value of the } \left(\frac{kn}{100}\right)\text{th term in a ranked data set}$$
21.6 21.7 22.9 25.2 26.5 28.0 28.2 32.6 32.9 70.1 76.1 84.5
$$\frac{kn}{100} = \frac{(60)(12)}{100} = 7.20\text{th term} \approx 7\text{th term}$$
Example 22: Solution
The value of the 7.20th term can be approximated by the value
of the 7th term in the ranked data. Therefore,

$$P_{60} = \text{60th percentile} = 28.2 = \$28.2 \text{ million}$$
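The rule used here, taking the (kn/100)th term and approximating a fractional position by the nearest term, can be sketched in Python as follows. Rounding half-way positions up and clamping the position to the range 1 to n are assumptions of this sketch, not something the example specifies; `percentile_nearest_term` is a hypothetical name.

```python
import math

# 2010 total compensations (millions of dollars), ranked
compensations = [21.6, 21.7, 22.9, 25.2, 26.5, 28.0, 28.2, 32.6, 32.9, 70.1, 76.1, 84.5]

def percentile_nearest_term(values, k):
    """Approximate P_k by the value of the (kn/100)th term, rounded to the nearest term.

    Assumption: half-way positions round up, and the position is clamped to 1..n.
    """
    ranked = sorted(values)
    n = len(ranked)
    position = k * n / 100             # e.g. 60 * 12 / 100 = 7.2
    term = math.floor(position + 0.5)  # 7.2 -> 7th term
    term = min(max(term, 1), n)
    return ranked[term - 1]            # terms are 1-based, list indices are 0-based

print(percentile_nearest_term(compensations, 60))  # 28.2
```

Library routines such as NumPy's `numpy.percentile` interpolate between neighboring terms by default, so they may return a different value for the same k.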
$$\text{Percentile rank of } x_i = \frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100$$
Example 23
Refer to the data on total compensations (in millions of
dollars) for the year 2010 of the 12 highest-paid CEOs of U.S.
companies given in Example 20. Find the percentile rank for
$26.5 million (2010 total compensation of Alan Mulally, CEO
of Ford Motor). Give a brief interpretation of this percentile
rank.
Example 23: Solution
The data on total compensations arranged in increasing order are as
follows:
21.6 21.7 22.9 25.2 26.5 28.0 28.2 32.6 32.9 70.1 76.1 84.5
In this data set, 4 of the 12 values are less than $26.5 million.
Hence,

$$\text{Percentile rank of } 26.5 = \frac{4}{12} \times 100 = 33.33\%$$
Rounding this answer to the nearest whole number, we can
state that about 33% of these 12 CEOs had 2010 total
compensations of less than $26.5 million. Hence, about 67% of these
12 CEOs had 2010 total compensations of $26.5 million or higher.
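The percentile-rank formula counts only the values strictly less than x_i. A minimal Python sketch (the function name is illustrative) applied to the same 12 compensations:

```python
# 2010 total compensations (millions of dollars)
compensations = [21.6, 21.7, 22.9, 25.2, 26.5, 28.0, 28.2, 32.6, 32.9, 70.1, 76.1, 84.5]

def percentile_rank(values, x):
    """(Number of values less than x) / (total number of values) * 100."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100

rank = percentile_rank(compensations, 26.5)
print(round(rank, 2))  # 33.33
```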
BOX-AND-WHISKER PLOT
Definition
A plot that shows the center, spread, and skewness of a data
set. It is constructed by drawing a box and two whiskers that
use the median, the first quartile, the third quartile, and the
smallest and the largest values in the data set between the
lower and the upper inner fences.
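A box-and-whisker plot is drawn from the median, the two quartiles, and the extreme values that lie inside the inner fences. The Python sketch below computes these for the 12 CEO compensations used earlier. Two points are assumptions, since this definition does not spell them out: quartiles are taken as medians of the lower and upper halves of the ranked data, and the inner fences are placed at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, where IQR = Q3 − Q1. `box_plot_values` is a hypothetical helper; drawing the plot itself is left to a plotting library.

```python
import statistics

# 2010 total compensations (millions of dollars)
compensations = [21.6, 21.7, 22.9, 25.2, 26.5, 28.0, 28.2, 32.6, 32.9, 70.1, 76.1, 84.5]

def box_plot_values(values):
    """Return the numbers a box-and-whisker plot is drawn from.

    Assumptions: Q1 and Q3 are medians of the values below and above the median
    (the median itself is excluded when n is odd); inner fences use 1.5 * IQR.
    """
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two values")
    q2 = statistics.median(ranked)
    q1 = statistics.median(ranked[: n // 2])
    q3 = statistics.median(ranked[(n + 1) // 2 :])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in ranked if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "lower_whisker": min(inside),   # smallest value inside the inner fences
        "upper_whisker": max(inside),   # largest value inside the inner fences
        "outliers": [v for v in ranked if v < lower_fence or v > upper_fence],
    }

print(box_plot_values(compensations))
```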
Example 24
The following data are the incomes (in thousands of dollars)
for a sample of 12 households.