Chapter 3 Descriptive Measures
GENERAL OBJECTIVE
In the last chapter, you learned how to summarize and organize data using
tables, graphs, and charts. Descriptive measures are numbers calculated from
the data that describe certain characteristics of the data. This chapter will
focus on calculating some common descriptive measures. You should be
familiar with Chapter 3 of your textbook before beginning this chapter.

LESSON OUTLINE
3.1 Measures of Center
3.2 Measures of Variation
3.3 The Five-Number Summary; Boxplots
3.4 Descriptive Measures for Populations; Use of Samples
3.5 Problems
Weekly Salaries: Professor Hassett spent one summer working for a small
mathematical consulting firm. The firm employed a few senior consultants,
who made between $800 and $1050 per week; a few junior consultants, who
made between $400 and $450 per week; and several clerical workers, who
made $300 per week.
Because the first half of the summer was busier than the second half, more
employees were required during the first half. Table 3-1 displays typical
lists of weekly earnings for the two halves of the summer.
Table 3-1
Data Set I
300
300
300
400
300
450
940
800
300
450
Data Set II
300
400
300
300
940
300
450
1050
450
300
300
1050
400
Determine the mean, median, and mode for both sets of weekly salaries.
Solution
To calculate the mean, median, and mode for Data Set I, we use the now
familiar Frequencies dialog box.
1. Enter the data into a variable named SET_I.
2. Choose Analyze > Descriptive Statistics > Frequencies to open the
Frequencies dialog box.
3. Paste the variable SET_I into the Variable(s) box.
Statistics
SET_I
N   Valid       13
    Missing     0
Mean            483.8462
Median          400.0000
Mode            300.00
Sum             6290.00
Next, we calculate the same descriptive measures for the second half of the
summer. We are naturally interested in knowing whether the typical weekly
salary for the first half of summer is less than, more than, or the same as the
typical salary in the second half of summer. The mean, median, and mode can
give us insight into this question and tell us more about the two data sets.
1. Enter the data into a variable named SET_II.
2. Follow the same procedure as above to calculate the mean, median, and
mode for this data set.
The descriptive measures for the variable SET_II are displayed in Figure
3-3.
Figure 3-3 Descriptive measures for Data Set II
Figures 3-2 and 3-3 give the number of cases in each data set that are Valid
and the number that are Missing. A case is Valid for a variable when it has a
value for that variable, and only valid cases are used in the calculations.
There are no missing observations in either data set, so all the observations
are valid.
We can see from the descriptive measures in Figures 3-2 and 3-3 that the
mean and median weekly salaries in the first half of summer are both larger
than those in the second half of summer. We also note that the mode (the
most common weekly salary) is $300. This is the salary of the clerical
workers.
Another observation is that both data sets appear to be right-skewed: in both
data sets, the mean is larger than the median. This happens because the mean
is more strongly affected than the median by the comparatively large salaries
of the senior consultants.
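The three measures of center, and the mean-versus-median comparison used above as a rough skewness signal, can be cross-checked outside SPSS. Below is a minimal sketch using Python's standard-library statistics module. The helper name describe_center and the salary list in the usage line are hypothetical illustrations. They are not the data of Table 3-1.

```python
import statistics


def describe_center(values):
    """Return the mean, median, and mode of a list of numbers,
    plus a rough skewness hint from comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)  # averages the two middle values when n is even
    mode = statistics.mode(values)      # most common value (first one found if tied, Python 3.8+)
    if mean > median:
        hint = "mean > median: possibly right-skewed"
    elif mean < median:
        hint = "mean < median: possibly left-skewed"
    else:
        hint = "mean = median"
    return mean, median, mode, hint


# Hypothetical weekly salaries, for illustration only.
salaries = [300, 300, 300, 400, 450, 1050]
print(describe_center(salaries))
```

The mean-versus-median comparison is only a heuristic. A graph of the data, such as the boxplot later in this chapter, gives a better picture of the shape.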
Table 3-2 Arterial blood pressures
81.6 82.0 84.6 69.4 84.1 88.9 104.9 78.9
87.6 86.7 90.8 75.2 82.8 96.4 94.0 91.0
Solution
Type the data into a new data file named PRESSURE. The measures of
variation can be calculated by choosing Analyze > Descriptive Statistics >
Frequencies, as we did in the previous example. Simply choose the
checkboxes for Mean, Std. deviation, Variance, and Range in the
Frequencies: Statistics dialog box (see Figure 3-1).
Alternatively, the Explore dialog box will calculate several common
measures of center and variation automatically.
1. Choose Analyze > Descriptive Statistics > Explore to open the
Explore dialog box (Figure 3-4).
3. Click the Statistics button to open the Explore: Statistics dialog box
(Figure 3-5).
Checking the Descriptives checkbox has SPSS calculate the mean, median,
variance, standard deviation, minimum, maximum, range, and a number of
other descriptive measures, some of which will be discussed later in the text.
4. Choose the checkbox for Descriptives and click the Continue button.
Figure 3-5 Explore: Statistics dialog box
Figure 3-6 Descriptives table from the Explore procedure
The mean and median (measures of the center) are displayed along with the
variance, standard deviation, and range (measures of variation). In later
chapters, we will discuss some of the other descriptive measures that are
displayed.
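The measures of variation in the Descriptives table can be cross-checked with a short Python sketch. It assumes the sixteen values exactly as they appear in Table 3-2. It uses the standard library's statistics.variance and statistics.stdev, which divide by n - 1 as the sample formulas do. The helper name describe_variation is hypothetical.

```python
import statistics

# Arterial blood pressures from Table 3-2 (16 values).
pressures = [81.6, 82.0, 84.6, 69.4, 84.1, 88.9, 104.9, 78.9,
             87.6, 86.7, 90.8, 75.2, 82.8, 96.4, 94.0, 91.0]


def describe_variation(values):
    """Return the range, sample variance, and sample standard deviation."""
    data_range = max(values) - min(values)
    variance = statistics.variance(values)  # sum of squared deviations / (n - 1)
    std_dev = statistics.stdev(values)      # square root of the sample variance
    return data_range, variance, std_dev


data_range, variance, std_dev = describe_variation(pressures)
print(f"range = {data_range:.2f}, variance = {variance:.4f}, "
      f"std. deviation = {std_dev:.4f}")
```

For these values, the range works out to 104.9 - 69.4 = 35.5. Up to rounding, the figures should agree with the Range, Variance, and Std. Deviation rows that SPSS reports.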
TV-viewing Times: The A.C. Nielsen Company publishes data on TV-viewing
habits of Americans by various characteristics in Nielsen Report on
Television. Table 3-3 shows the weekly viewing times, in hours, for a
sample of 20 people. Determine and interpret the five-number summary for
these data.
Table 3-3 Weekly TV-viewing times
25 41 27 32 43
66 35 31 15 5
34 26 32 38 16
30 38 30 20 21
Solution
Type the data into a new data file named TIMES. The five-number summary
can be calculated by choosing Analyze > Descriptive Statistics >
Frequencies, as before:
1. Choose Analyze > Descriptive Statistics > Frequencies to open the
Frequencies dialog box.
2. Paste the variable TIMES into the Variable(s) box.
3. Click the Statistics button to open the Frequencies: Statistics dialog
box (Figure 3-1).
4. Choose the checkboxes for Quartiles, Median, Minimum, and
Maximum and then click the Continue button.
5. Click the OK button.
The five-number summary will be displayed in the Viewer window
(Figure 3-7).
Figure 3-7 Five-number summary: Weekly TV-viewing times
Figure 3-7 gives the five-number summary as 5.00, 22.00, 30.50, 37.25, and
66.00. We did not have to choose the checkbox for Median, since the median
and Q2 are the same number. The median, a measure of center, implies that
half of the TV-viewing times are less than 30.50 hours and half are greater.
We can further infer from the results that roughly 25% of the TV-viewing
times are between 5.0 hours and 22.0 hours, 25% are between 22.0 hours and
30.5 hours, 25% are between 30.5 hours and 37.25 hours, and 25% are
between 37.25 hours and 66.0 hours. The interquartile range is
IQR = Q3 - Q1 = 37.25 - 22.00 = 15.25 hours. Notice that the variation in the
fourth quarter, Max - Q3 = 66.00 - 37.25 = 28.75, is larger than the variation
in the first quarter, Q1 - Min = 22.00 - 5.00 = 17.00. Right-skewed data tend
to have more variation in the fourth quarter than in the first quarter, so this
data set may have a right-skewed distribution, but further analysis is needed.
It is easier to see the shape of the distribution from a boxplot.
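The quartiles in Figure 3-7 match a weighted-average rule that interpolates between the sorted observations at position (n + 1)p. The Python sketch below assumes that rule. The helper names percentile_n_plus_1 and five_number_summary are hypothetical. Other quartile conventions can give slightly different values. For example, taking the medians of the lower and upper halves gives Q1 = 23 and Q3 = 36.5 for these data.

```python
import math


def percentile_n_plus_1(sorted_values, p):
    """Percentile by linear interpolation at 1-based position (n + 1) * p,
    clamped to the smallest and largest observations."""
    n = len(sorted_values)
    pos = (n + 1) * p
    if pos <= 1:
        return float(sorted_values[0])
    if pos >= n:
        return float(sorted_values[-1])
    k = math.floor(pos)   # 1-based index of the lower neighbour
    frac = pos - k        # weight given to the upper neighbour
    lower = sorted_values[k - 1]
    upper = sorted_values[k]
    return lower + frac * (upper - lower)


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the (n + 1) * p rule."""
    s = sorted(values)
    q1 = percentile_n_plus_1(s, 0.25)
    med = percentile_n_plus_1(s, 0.50)
    q3 = percentile_n_plus_1(s, 0.75)
    return float(s[0]), q1, med, q3, float(s[-1])


# Weekly TV-viewing times (hours) from Table 3-3.
times = [25, 41, 27, 32, 43, 66, 35, 31, 15, 5,
         34, 26, 32, 38, 16, 30, 38, 30, 20, 21]

mn, q1, med, q3, mx = five_number_summary(times)
print("five-number summary:", (mn, q1, med, q3, mx))
print("IQR:", q3 - q1, " Max - Q3:", mx - q3, " Q1 - Min:", q1 - mn)
```

With n = 20, Q1 sits at position 21 × 0.25 = 5.25. That is a quarter of the way from the 5th sorted value (21) to the 6th (25), which gives 22.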
Example 3.19 Constructing a Boxplot
Solution
Figure 3-8 Explore: Plots dialog box
The Boxplots section of the Explore: Plots dialog box controls how boxplots
are grouped when there is more than one dependent variable or a factor
variable. The bullet for Factor levels together generates a separate chart for
each dependent variable, with one boxplot for each group defined by the
factor variable. The bullet for Dependents together generates a separate
chart for each group defined by a factor variable, with the boxplots of the
dependent variables side by side.
5. Choose the bullet for Factor levels together and click the Continue
button to return to the Explore dialog box (Figure 3-4).
6. Click the OK button to display the boxplot in the Viewer window
(Figure 3-9).
Figure 3-9 Boxplot: Weekly TV-viewing times
SPSS makes a modified boxplot. The circle with a 6 beside it indicates that
the 6th case, which has TIMES equal to 66, is an outlier. In Example 3.17, we
suspected that the distribution might be right-skewed, but now it is clear that
the data is left-skewed with an outlier. This reminds us that a picture is worth
a thousand words.
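The outlier flag can be checked by hand with the usual lower and upper limits, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. The Python sketch below assumes those 1.5 × IQR limits and quartiles computed with statistics.quantiles(method="exclusive"). For these data, that method reproduces the quartiles in Figure 3-7. SPSS may draw the box edges using a slightly different quartile rule, so its limits can differ a little. Even so, the median-of-halves quartiles (23 and 36.5) also flag only case 6. The helper name boxplot_fences is hypothetical.

```python
import statistics

# Weekly TV-viewing times (hours) from Table 3-3, in case order.
times = [25, 41, 27, 32, 43, 66, 35, 31, 15, 5,
         34, 26, 32, 38, 16, 30, 38, 30, 20, 21]


def boxplot_fences(values):
    """Return the lower and upper limits (Q1 - 1.5*IQR, Q3 + 1.5*IQR),
    the potential outliers as (1-based case number, value) pairs,
    and the adjacent values where the whiskers end."""
    # method="exclusive" interpolates at position (n + 1) * p,
    # which matches the quartiles reported in Figure 3-7.
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    outliers = [(case, x) for case, x in enumerate(values, start=1)
                if x < lower_limit or x > upper_limit]
    inside = [x for x in values if lower_limit <= x <= upper_limit]
    return lower_limit, upper_limit, outliers, min(inside), max(inside)


low, high, outliers, whisker_low, whisker_high = boxplot_fences(times)
print(f"limits: {low} to {high}")
print("potential outliers (case, value):", outliers)
print(f"whiskers run from {whisker_low} to {whisker_high}")
```

The whiskers end at 5 and 43, the most extreme observations inside the limits. The long lower whisker and the short upper one are what make the boxplot look left-skewed apart from the outlier.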
A parameter is a descriptive measure for a population. For example, the
following are parameters:
μ = population mean
σ = population standard deviation
σ² = population variance
A statistic is a descriptive measure for a sample. For example, the following
are statistics:
x̄ = sample mean
s = sample standard deviation
s² = sample variance
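One practical difference between a parameter and a statistic appears in the variance formula. The population variance σ² divides the sum of squared deviations by the population size N, while the sample variance s² divides by n - 1. The minimal Python sketch below uses a small hypothetical list to show both. It calls the standard library's pvariance/pstdev (population) and variance/stdev (sample) functions.

```python
import statistics

# Hypothetical measurements, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treated as an entire population: divide the sum of squared deviations by N.
sigma_squared = statistics.pvariance(data)
sigma = statistics.pstdev(data)

# Treated as a sample: divide by n - 1, so s^2 comes out slightly larger.
s_squared = statistics.variance(data)
s = statistics.stdev(data)

print("population: mu =", statistics.mean(data),
      "sigma^2 =", sigma_squared, "sigma =", sigma)
print("sample:     x-bar =", statistics.mean(data),
      "s^2 =", s_squared, "s =", s)
```

Here the squared deviations from the mean of 5 sum to 32. That gives σ² = 32/8 = 4 but s² = 32/7, or about 4.57. The mean is computed the same way in both cases.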
3.5 Problems
Problem 3.15
Table 3-4 Time to Hatch
11 11
Problem 3.71
Refer to Problem 3.15. Determine the range and sample standard deviation.
Problem 3.22
Table 3-5 Router Horsepower
1.75 2.25 2.25 2.25 1.75 2.00 1.50
Problem 3.78
Refer to Problem 3.22. Determine the range and sample standard deviation.
Problem 3.16
Table 3-6 Ammonia Fluxes
96 116 66 57 147 154 147 88 175 154
Problem 3.72
Refer to Problem 3.16. Use SPSS to determine the standard deviation and
range of the sample of ammonia fluxes in the first year after Hugo.
Problem 3.123
Hospital Stays: The U.S. National Center for Health Statistics compiles data
on the length of stay by patients in short-term hospitals and publishes its
findings in Vital and Health Statistics. A random sample of 21 patients
yielded the data on length of stay, in days, given in Table 3-7.
Table 3-7 Length of Stay
4 3 10 4 6 13 12
15 5 18 7 7 9 3
1 6 55 23 12 1 9
Obtain and interpret the quartiles, determine and interpret the interquartile
range, find and interpret the five-number summary. Then identify potential
outliers, if any, and construct and interpret a boxplot.