Lecture Notes 3

Chapter 3 discusses the display and description of quantitative data, focusing on graphical and tabular representations such as histograms and stem-and-leaf plots. It includes exercises on analyzing patient ages and soil pH levels, as well as understanding the shape of histograms and measures of central tendency like mean and median. The chapter emphasizes the importance of visualizing data to identify patterns and summarize key statistics.

Uploaded by

seid yimer


Chapter 3

Displaying and Describing Quantitative Data

After a sample is collected, statistical data are often first analyzed descriptively. In particular, quantitative data are described in both graphical and tabular form.

3.1 Displaying Quantitative Variables

Exercise 3.1 (Displaying Quantitative Variables)

1. Histogram: CEO Ages. The distribution table and histogram of age when first becoming Chief Executive Officer (CEO) of established companies are given below.

32, 37, 39, 40, 41, 41, 41, 42, 42, 43,
44, 45, 45, 45, 46, 47, 47, 49, 50, 51

bin      frequency   relative frequency
30–35    1           1/20 = 0.05
35–40    2           0.10
40–45    8           0.40
45–50    7           0.35
50–55    2           0.10

Import chapter3.CEO.ages text file into R: Environment panel, Import Dataset.

data <- chapter3.CEO.ages; attach(data); head(data)

data.freq <- as.data.frame(table(factor(cut(age, right=FALSE, breaks=c(30,35,40,45,50,55)))))
transform(data.freq, relative = prop.table(Freq))
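As a self-contained check on the distribution table above, the sketch below types the 20 ages into a vector instead of importing the data file. The object names `age`, `bins` and `ceo.table` are our own choices, not part of the original notes; as in the code above, it assumes left-closed bins [30, 35), ..., [50, 55).

```r
# CEO ages from the exercise, typed in so the example runs without importing a file
age <- c(32, 37, 39, 40, 41, 41, 41, 42, 42, 43,
         44, 45, 45, 45, 46, 47, 47, 49, 50, 51)

# right = FALSE gives left-closed bins [30,35), [35,40), ..., [50,55)
bins <- cut(age, breaks = seq(30, 55, by = 5), right = FALSE)

freq <- table(bins)        # frequency: number of ages in each bin
rel  <- prop.table(freq)   # relative frequency: frequency / n

ceo.table <- data.frame(bin = names(freq),
                        frequency = as.vector(freq),
                        relative = as.vector(rel))
ceo.table
```

Using `cut()` directly on a typed-in vector avoids `attach()`, so the column names cannot clash with objects from other exercises.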

26 Chapter 3. Organizing and Summarizing Data (lecture notes 3)

[Figure 3.1 panels: (a) and (b) frequency histograms of age (years), 30 to 55; (c) and (d) percentage histograms of age (y axis labeled Density), 30 to 55.]

Figure 3.1: Histogram for CEO Ages

par(mfrow=c(2,2)) # set for 4 graphs per panel

hist(age,right=FALSE,breaks=c(30,35,40,45,50,55),xlab="age (years)",col="green")
hist(age,right=FALSE,breaks=c(30,32.5,35,37.5,40,42.5,45,47.5,50,52.5,55),xlab="age (years)",col="green")
h <- hist(age,right=FALSE,breaks=c(30,35,40,45,50,55),plot=FALSE) # percentage histogram
h$density <- h$counts/sum(h$counts)*100
plot(h,freq=FALSE, col="green")
h <- hist(age,right=FALSE,breaks=c(30,32.5,35,37.5,40,42.5,45,47.5,50,52.5,55),plot=FALSE)
h$density <- h$counts/sum(h$counts)*100
plot(h,freq=FALSE, col="green")
par(mfrow=c(1,1)) # reset to 1 graph per panel

(a) Histogram Figure 3.1(a),

Number of bins is 3 / 4 / 5 / 10.
First bin is [30, 35) / [30, 32.5) / [40, 44).
Lower limit of first bin is 30 / 32.5 / 34 / 35.
Upper limit of first bin is almost 30 / 32.5 / 34 / 35.
Width of first bin is 35 − 30 = (circle one) 2.5 / 4 / 5 / 6 years.
Number of CEOs in [30,35) age bin is 1 / 2 / 3 / 4.
(b) Histogram Figure 3.1(b),
Number of bins is 3 / 4 / 5 / 10.
First bin is [30, 35) / [30, 32.5) / [40, 44).
Lower limit of first bin is 30 / 32.5 / 34 / 35.
Upper limit of first bin is almost 30 / 32.5 / 34 / 35.
Width of first bin is 32.5 − 30 = (circle one) 2.5 / 4 / 5 / 6 years.
Number of CEOs in [30,32.5) age bin is 1 / 2 / 3 / 4.
Section 1. Displaying Quantitative Variables (lecture notes 3) 27

Increasing number of bins changes / does not change the shape of the
histogram.
(c) Histogram Figure 3.1(c),
Percentage of CEOs in [30,35) age bin: 5% / 10% / 35% / 40%
Shape of density histogram in Figure 3.1(c) is the same as / different from the shape of the frequency histogram in Figure 3.1(a).
(d) Histogram Figure 3.1(d),
Percentage of CEOs in [30,32.5) age bin: 5% / 10% / 35% / 40%.

2. Histogram: pH levels. Consider the distribution table and histogram of 28 soil pH levels below.

4.3 5 5.9 6.5 7.6 7.7 7.7 8.2 8.3 9.5

10.4 10.4 10.5 10.8 11.5 12 12 12.3 12.6 12.6
13 13.1 13.2 13.5 13.6 14.1 14.1 15.1

bin      frequency   relative frequency
4–6      3           3/28 ≈ 0.107
6–8      4           0.143
8–10     3           0.107
10–12    5           0.179
12–14    ____        ____
14–16    ____        ____

[Figure 3.2 panel: percentage histogram of pH (y axis labeled Density, 0 to 35), pH 4 to 16.]

Figure 3.2: Histogram for pH Level Data

Import chapter3.pH.soil text file into R: Environment panel, Import Dataset.


data <- chapter3.pH.soil; attach(data); head(data)

data.freq <- as.data.frame(table(factor(cut(pH, right=FALSE, breaks=c(4,6,8,10,12,14,16)))))
transform(data.freq, relative = prop.table(Freq))
h <- hist(pH,right=FALSE,breaks=c(4,6,8,10,12,14,16),plot=FALSE) # percentage histogram
h$density <- h$counts/sum(h$counts)*100
plot(h,freq=FALSE, col="green")

(a) Fill in blanks in distribution table. Hint: 28 readings total.

(b) Number of bins is (circle one) 3 / 4 / 5 / 6.
(c) Width of each bin is (circle one) 2 / 3 / 4 / 5 pH units.
(d) The bin containing the most pH readings is
[8, 10) / [10, 12) / [12, 14) / [14, 16).

3. Stem-and-leaf: CEO ages. A stem-and-leaf plot for CEO ages is given below.

32, 37, 39, 40, 41, 41, 41, 42, 42, 43,
44, 45, 45, 45, 46, 47, 47, 49, 50, 51

Import chapter3.CEO.ages text file into R: Environment panel, Import Dataset.

3 ‖ 2 7 9∗
4 ‖ 0 1 1 1∗∗ 2 2 3 4 5 5 5 6 7 7 9    stem: 10s
5 ‖ 0 1                                leaf: 1s

stem(age, scale=0.5)

(a) Starred number, 9∗ , represents age (circle one) 39 / 93 / 9.

Double–starred number, 1∗∗ , represents age (circle one) 41 / 14 / 1.
(b) Numbers left of double line (in first column) are called stems / leaves;
numbers to right are called (circle one) stems / leaves.
(c) Starred number 9∗ is a leaf of stem (circle one) 3 / 4 / 5.
(d) True / False Note to right of stem-and-leaf plot specifies numbers used
as stems are “tens” (or “10s”) and numbers used as leaves are “ones”
(or “1s”). So, for instance, stem “3” represents 3x10 = 30 and leaf “2”
represents 1x2 = 2.
(e) The stem-and-leaf plot is ordered; in the first stem, for example, 32 is followed by (circle one) 37 / 39 / 40.
(f) Stem–and–leaf plot is useful in identifying “center” of data, or, where
“most” data values are located. In this case, this is 30s / 40s / 50s.
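The stem/leaf rule in (d) amounts to integer division and remainder by 10. A minimal sketch, assuming the 20 CEO ages are typed in directly (the objects `stems`, `leaves` and `stem.lines` are hypothetical names), rebuilds the unsplit display line by line; R's own `stem()` formats its output somewhat differently.

```r
age <- c(32, 37, 39, 40, 41, 41, 41, 42, 42, 43,
         44, 45, 45, 45, 46, 47, 47, 49, 50, 51)

stems  <- age %/% 10   # integer division: 32 -> 3 (the "10s")
leaves <- age %% 10    # remainder:        32 -> 2 (the "1s")

# One line per stem, with that stem's leaves listed in increasing order
stem.lines <- sapply(sort(unique(stems)), function(s) {
  paste(s, "|", paste(sort(leaves[stems == s]), collapse = " "))
})
cat(stem.lines, sep = "\n")
```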

4. Split stem-and-leaf plots: patient ages. Sometimes, to spread data out, stems
are split as, for example, in following table.

3 2
3 7 9
4 0 1 1 1 2 2 3 4
4 5 5 5 6 7 7 9
5 0 1 stem: 10s
5 leaf: 1s

stem(age)

(a) True / False Low stem 3 contains one half of leaves, 0, 1, 2, 3 or 4; high
stem 3 contains other half of leaves, 5, 6, 7, 8 and 9.
(b) True / False Stem-and-leaf plots can have stems split not only twice,
but also three or more times. Splitting each stem three times might, say,
consist of a low stem which contains leaves 0, 1 and 2; a middle stem with
leaves 3, 4, 5 and 6 and a high stem with leaves 7, 8 and 9.
(c) A stem and leaf plot with 10s as stems can be split at most
(circle one) 5 / 7 / 10 / 100 times.
(d) True / False Although there is no one “best” way of constructing a stem-and-leaf plot, most stem-and-leaf plots consist of 5 to 20 stems.
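Splitting stems as in this exercise only adds a low/high label to each leaf. The sketch below (object names are illustrative, not from the notes) puts leaves 0 to 4 on the low stem and 5 to 9 on the high stem, which reproduces the split table above.

```r
age <- c(32, 37, 39, 40, 41, 41, 41, 42, 42, 43,
         44, 45, 45, 45, 46, 47, 47, 49, 50, 51)

stems  <- age %/% 10
leaves <- age %% 10
half   <- ifelse(leaves <= 4, "low", "high")   # low stem: leaves 0-4, high stem: leaves 5-9

# Each stem appears twice: first its low half, then its high half (possibly empty)
split.lines <- character(0)
for (s in seq(min(stems), max(stems))) {
  for (h in c("low", "high")) {
    lv <- sort(leaves[stems == s & half == h])
    split.lines <- c(split.lines, paste(s, "|", paste(lv, collapse = " ")))
  }
}
cat(split.lines, sep = "\n")
```

Splitting three or more times, as in (b), would only need a different rule for `half`.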

3.2 Shape
The shape of histograms is discussed, including symmetry, skewness, modes and outliers.

Exercise 3.2 (Shape)

1. Shapes of Histograms for Continuous Quantitative Data.
Describe the shape, symmetry, skewness, mode and outliers of the histograms
in Figure 3.3.
(a) Histogram Figure 3.3(a),
shape is symmetric / skewed left / skewed right / none
number of modes is one / two / more than two
has no / one / two / more than two outlier(s)
(b) Histogram Figure 3.3(b),
shape is symmetric / skewed left / skewed right / none
number of modes is one / two / more than two
has no / one / two / more than two outlier(s)

[Figure 3.3 panels (a) to (d): four histograms, each with relative frequency from 0.00 to 0.40 on the y axis.]

Figure 3.3: Shapes of Histograms

(c) Histogram Figure 3.3(c),

shape is symmetric / skewed left / skewed right / none
number of modes is one / two / more than two
has no / one / two / more than two outlier(s)
(d) Histogram Figure 3.3(d),
shape is symmetric / skewed left / skewed right / none
number of modes is one / two / more than two
has no / one / two / more than two outlier(s)

3.3 Center
Two measures of central tendency are the average (or, equivalently, the mean) and the median. Each is either a statistic (for a sample) or a parameter (for a population).

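measure: average (mean)
  statistic (sample, n members): $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_i x_i}{n}$
  parameter (population, N members): $\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_i x_i}{N}$
measure: median
  statistic (sample, n members): M, the middle, $\frac{n+1}{2}$th, observation of the ordered sample
  parameter (population, N members): the middle, $\frac{N+1}{2}$th, observation of the ordered population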
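The two definitions above can be coded directly. This sketch (the function names `my.mean` and `my.median` are hypothetical) averages the two neighbouring observations whenever (n + 1)/2 is not a whole number, which is how the median of an even number of observations is found in the exercises below; it should agree with R's built-in `mean()` and `median()`.

```r
# Mean: sum of the observations divided by how many there are
my.mean <- function(x) sum(x) / length(x)

# Median: observation at position (n + 1)/2 of the ordered data;
# if that position is not a whole number (n even), average the two neighbours
my.median <- function(x) {
  x <- sort(x)
  pos <- (length(x) + 1) / 2
  (x[floor(pos)] + x[ceiling(pos)]) / 2
}

goals.s2 <- c(0, 0, 1, 2, 3, 4)   # n = 6, so the median sits at position 3.5
c(mean = my.mean(goals.s2), median = my.median(goals.s2))
```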

Exercise 3.3 (Center)

1. Mean and Median: Bidding on Land. Consider a small population of the numbers of bidders for N = 9 parcels of land:

0, 0, 0, 0, 1, 1, 2, 2, 3.

(a) Population average is
$\mu = \frac{0+0+0+0+1+1+2+2+3}{9} = \frac{9}{9} =$ (choose one) 0 / 1 / 1.5 / 2.
(b) Population median is the middle of the 9 ordered bid numbers: the $\frac{N+1}{2} = \frac{9+1}{2} = 5$th observation. Since the first bid number is $x_1 = 0$, the second is $x_2 = 0$, the third is $x_3 = 0$ and the fourth is $x_4 = 0$, the fifth bid number is 0 / 1 / 1.5 / 2.
(c) If sample n = 5 bid numbers {0, 1, 2, 2, 3},
sample average $\bar{x} = \frac{0+1+2+2+3}{5} = \frac{8}{5} =$ (choose one) 0 / 1 / 1.6 / 2,
sample median M is middle, 3rd, of 5: (choose one) 0 / 1 / 1.6 / 2,
(d) If sample n = 5 bid numbers {0, 0, 1, 2, 3},
sample average $\bar{x} = \frac{0+0+1+2+3}{5} = \frac{6}{5} =$ 0 / 1 / 1.2 / 1.4,
sample median M is the $\frac{n+1}{2} = \frac{5+1}{2} = 3$rd observation: 0 / 1 / 1.2 / 1.4,

2. Mean and Median: Goals Scored.

Consider a small population of the numbers of goals scored in N = 9 soccer games:

0, 1, 1, 2, 2, 2, 3, 3, 4.

goals <- c(0,1,1,2,2,2,3,3,4)

(a) Population average is $\mu = \frac{0+1+1+2+2+2+3+3+4}{9} = \frac{18}{9} =$ 0 / 1 / 1.5 / 2,
Population median is the $\frac{N+1}{2} = \frac{9+1}{2} = 5$th observation: 0 / 1 / 1.5 / 2,
mean(goals); median(goals)

(b) If sample n = 5 of {0, 1, 2, 3, 4} goals scored,
sample average $\bar{x} = \frac{0+1+2+3+4}{5} = \frac{10}{5} =$ 1 / 1.6 / 1.8 / 2,
sample median M is the $\frac{n+1}{2} = \frac{5+1}{2} = 3$rd observation: 1 / 1.6 / 1.8 / 2,
goals.s1 <- c(0,1,2,3,4); mean(goals.s1); median(goals.s1)

(c) If sample n = 6 of {0, 0, 1, 2, 3, 4} goals scored,
sample average $\bar{x} = \frac{0+0+1+2+3+4}{6} = \frac{10}{6} \approx$ 0 / 1.5 / 1.7 / 2,
sample median M is the $\frac{n+1}{2} = \frac{6+1}{2} = 3.5$th observation; in other words, the average of the 3rd and 4th observations, $\frac{1+2}{2} =$ 0 / 1.5 / 1.7 / 2,
goals.s2 <- c(0,0,1,2,3,4); mean(goals.s2); median(goals.s2)

3.4 Spread of the Distribution

In addition to standard deviation and variance, two versions of the lower quartile,
upper quartile and interquartile range are discussed.

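measure: standard deviation
  statistic (sample, n members): $s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n-1}}$
  parameter (population, N members): $\sigma = \sqrt{\frac{\sum_i (x_i - \mu)^2}{N}}$
measure: variance
  statistic (sample, n members): $s^2$
  parameter (population, N members): $\sigma^2$
measure: range
  statistic (sample, n members): R = maximum − minimum
  parameter (population, N members): maximum − minimum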
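The only difference between the sample and population formulas above is the divisor, n − 1 versus N. A short sketch (the function names are our own) makes the difference visible on the goals data from the exercise below; note that R's built-in `var()` and `sd()` use the n − 1 divisor.

```r
sample.var <- function(x) sum((x - mean(x))^2) / (length(x) - 1)   # s^2: divide by n - 1
pop.var    <- function(x) sum((x - mean(x))^2) / length(x)         # sigma^2: divide by N
data.range <- function(x) max(x) - min(x)                          # maximum - minimum

goals <- c(0, 1, 1, 2, 2, 2, 3, 3, 4)
round(c(range  = data.range(goals),
        s      = sqrt(sample.var(goals)), s2     = sample.var(goals),
        sigma  = sqrt(pop.var(goals)),    sigma2 = pop.var(goals)), 3)
```

Because N > n − 1, the population version is always a little smaller for the same numbers.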

Exercise 3.4 (Spread of the Distribution)

1. Range, Standard Deviation and Variance: Goals Scored.

Consider a small population of the numbers of goals scored in N = 9 soccer games:

0, 1, 1, 2, 2, 2, 3, 3, 4.

(a) Measuring dispersion (spread) in the entire set of goals scored.

range = Max − Min = 4 − 0 = (circle one) 1 / 2 / 4 goals scored
standard deviation (SD) $\sigma = \sqrt{\frac{\sum_i (x_i - \mu)^2}{N}} \approx$ 1.05 / 1.15 / 1.22 goals
variance $\sigma^2 \approx 1.15^2 \approx$ 1.11 / 1.33 / 1.5 goals²

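goals <- c(0,1,1,2,2,2,3,3,4); range(goals); sd(goals); var(goals) # sd() and var() divide by n - 1
sigma2 <- mean((goals - mean(goals))^2); sqrt(sigma2); sigma2 # population SD and variance (divide by N)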

(b) If sample n = 5 games with {0, 1, 2, 3, 4} goals scored,

R = Max − Min = 4 − 0 = (circle one) 1 / 2 / 4 goals scored
standard deviation (SD) $s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n-1}} \approx$ 1.05 / 1.41 / 1.58 goals scored
variance $s^2 \approx 1.58^2 \approx$ 1 / 1.5 / 2.5 goals scored²
goals.s1 <- c(0,1,2,3,4); range(goals.s1); sd(goals.s1); var(goals.s1)

(c) If sample n = 6 games with {0, 0, 1, 2, 3, 4} goals scored,

R = Max − Min = 4 − 0 = (circle one) 1 / 2 / 4 goals scored
standard deviation (SD) $s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n-1}} \approx$ 1.05 / 1.41 / 1.63 goals scored
variance $s^2 \approx 1.63^2 \approx$ 1.56 / 1.87 / 2.67 goals scored²

goals.s2 <- c(0,0,1,2,3,4); range(goals.s2); sd(goals.s2); var(goals.s2)

2. Tukey quartiles and interquartile range: temperatures.

Consider small sample of n = 10 temperatures, set A:

0, 1, 1, 2, 2, 2, 3, 3, 5, 7.

temperature.A <- c(0,1,1,2,2,2,3,3,5,7)

quart.tukey <- fivenum(temperature.A) # Tukey five number summary
c(Q1.tukey=quart.tukey[2],Q3.tukey=quart.tukey[4],IQR.tukey=quart.tukey[4] - quart.tukey[2])

(a) Since
$0, 1, 1, 2, \underbrace{2, 2}_{M}, 3, 3, 5, 7,$
the median temperature is located at the $\frac{10+1}{2} = 5.5$th position and so M = 1 / 2 / 3
median(temperature.A)

(b) Since the lower half of the ordered data set is
$0, 1, \underbrace{1}_{Q_1}, 2, 2,$
the first (lower) quartile is located at the $\frac{5+1}{2} = 3$rd position, Q1 = 1 / 2 / 3
(c) Since the upper half of the ordered data set is
$2, 3, \underbrace{3}_{Q_3}, 5, 7,$
the third (upper) quartile is located at the $\frac{5+1}{2} = 3$rd position, Q3 = 3 / 5 / 7
(d) Interquartile range is IQR = Q3 − Q1 = 1 / 2 / 3
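The halving rule used in (a) to (d) can be written as a small function. In this sketch (`tukey.quartiles` is a hypothetical name, not from the notes), when n is odd the median is included in both halves; on these data sets it gives the same quartiles as the `fivenum()` hinges used above.

```r
tukey.quartiles <- function(x) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(n / 2)            # size of each half; includes the median when n is odd
  lower <- x[1:k]                # lower half of the ordered data
  upper <- x[(n - k + 1):n]      # upper half of the ordered data
  q1 <- median(lower)
  q3 <- median(upper)
  c(Q1 = q1, Q3 = q3, IQR = q3 - q1)
}

temperature.A <- c(0, 1, 1, 2, 2, 2, 3, 3, 5, 7)
tukey.quartiles(temperature.A)
```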

3. Tukey quartiles and interquartile range: more temperatures.

Another sample of n = 9 temperatures, set B

0, 0, 0, 0, 1, 1, 1, 2, 3
is compared to the first set of temperatures.
temperature.B <- c(0,0,0,0,1,1,1,2,3)
quart.tukey <- fivenum(temperature.B) # Tukey five-number summary
c(Q1.tukey=quart.tukey[2],Q3.tukey=quart.tukey[4],IQR.tukey=quart.tukey[4] - quart.tukey[2])

(a) Since
$0, 0, 0, 0, \underbrace{1}_{M}, 1, 1, 2, 3,$
the median temperature is located at the $\frac{9+1}{2} = 5$th position and so M = 1 / 2 / 3
median(temperature.B)

(b) Since the lower half of the ordered data set (including the median) is
$0, 0, \underbrace{0}_{Q_1}, 0, 1,$
the first (lower) quartile is located at the $\frac{5+1}{2} = 3$rd position, Q1 = 0 / 2 / 3

(c) Since the upper half of the ordered data set (including the median) is
$1, 1, \underbrace{1}_{Q_3}, 2, 3,$
the third (upper) quartile is located at the $\frac{5+1}{2} = 3$rd position, Q3 = 1 / 2 / 5 / 7
(d) Interquartile range is IQR = Q3 − Q1 = 1 / 2 / 3

4. TI calculator quartiles and interquartile range: more temperatures.

Reconsider temperatures, set B

0, 0, 0, 0, 1, 1, 1, 2, 3

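quart.TI <- function(x) {
  # TI-calculator quartiles: medians of the lower and upper halves, excluding the median when n is odd
  x <- sort(x)
  n <- length(x)
  m <- (n+1)/2                  # position of the median
  if (floor(m) != m) {          # n even: halves are x[1:(n/2)] and x[(n/2+1):n]
    l <- m-1/2; u <- m+1/2
  } else {                      # n odd: leave the median out of both halves
    l <- m-1; u <- m+1
  }
  c(Q1.TI=median(x[1:l]), Q3.TI=median(x[u:n]), IQR.TI=median(x[u:n]) - median(x[1:l]))
}
quart.TI(temperature.B)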

(a) Since
$0, 0, 0, 0, \underbrace{1}_{M}, 1, 1, 2, 3,$
the median temperature is located at the $\frac{9+1}{2} = 5$th position and so M = 1 / 2 / 3
temperature.B <- c(0,0,0,0,1,1,1,2,3)
median(temperature.B)

(b) Since the lower half of the ordered data set, excluding the median, is
$0, \underbrace{0, 0}_{Q_1}, 0,$
the first (lower) quartile is located at the $\frac{4+1}{2} = 2.5$th position, Q1 = 0 / 2 / 3
(c) Since the upper half of the ordered data set, excluding the median, is
$1, \underbrace{1, 2}_{Q_3}, 3,$
the third (upper) quartile is located at the $\frac{4+1}{2} = 2.5$th position, Q3 = 1.5 / 5 / 7
(d) Interquartile range is IQR = Q3 − Q1 = 1.5 / 2 / 3

3.5 Shape, Center and Spread–A Summary

This material is covered in the previous sections.

3.6 Standardizing Variables

We look at a measure of position, the z-score:
$z = \frac{x - \bar{x}}{s}$ (sample), $\quad z = \frac{x - \mu}{\sigma}$ (population)
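Both directions of the z-score formula, $z = (x - \mu)/\sigma$ and its inverse $x = z\sigma + \mu$, fit in two one-line functions. A minimal sketch with hypothetical function names, using the IQ means and SDs from the exercise that follows:

```r
# z-score: number of SDs an observation lies above (+) or below (-) the mean
z.score <- function(x, mu, sigma) (x - mu) / sigma

# Inverse: recover the raw score from a z-score
from.z <- function(z, mu, sigma) z * sigma + mu

z.score(132, mu = 100, sigma = 16)   # 16-year-old with IQ 132
z.score(132, mu = 120, sigma = 20)   # 20-year-old with the same IQ
from.z(-2, mu = 120, sigma = 20)     # 20-year-old two SDs below average
```

The same raw score gives different z-scores in the two age groups, which is exactly the comparison the z-score is designed for.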

Exercise 3.6 (Standardizing Variables)

1. z-scores: IQ scores. IQ scores differ for different ages. The mean and SD for 16-year-olds are µ = 100 and σ = 16; the mean and SD for 20-year-olds are µ = 120 and σ = 20.

[Figure 3.4 panels: for 16-year-olds (mean 100, SD 16), X = 52, 68, 84, 100, 116, 132, 148 corresponds to Z = −3, −2, −1, 0, 1, 2, 3, with IQs 84 and 132 marked; for 20-year-olds (mean 120, SD 20), X = 60, 80, 100, 120, 140, 160, 180 corresponds to Z = −3, −2, −1, 0, 1, 2, 3.]

Figure 3.4: Comparing IQ Scores with Z-Scores

(a) A 16 year old with IQ 132 has z-score $z = \frac{132-100}{16} =$ 0 / 1 / 2.
This IQ is two SDs above average.
(b) A 16 year old with IQ 84 has z-score $z = \frac{84-100}{16} =$ −2 / −1 / 0.
This IQ is one SD below average.
(c) A 20 year old with IQ 132 has z-score $z = \frac{132-120}{20} =$ 0 / 0.6 / 2.
This IQ is 0.6 of an SD above average.

(d) A 20 year old with IQ 84 has z-score $z = \frac{84-120}{20} =$ −2 / −1.8 / −1.
This IQ is 1.8 SDs below average.
(e) True / False. z-scores allow comparison of position of data points in
different data sets, data sets with different averages and SDs.
(f) If $z = \frac{x-\mu}{\sigma}$, then $x = z\sigma + \mu$, so a 16 year old with IQ three SDs above average has IQ x = 3(16) + 100 = (choose one) 116 / 132 / 148.
(g) A 20 year old with IQ two SDs below average has IQ
x = −2(20) + 120 = (choose one) 60 / 80 / 100.

2. Using z-scores to find outliers: Temperatures.

Consider small sample of n = 10 temperatures, set A:

0, 1, 1, 2, 2, 2, 3, 3, 5, 7.

(a) average temperature, x̄ = (choose one) 0 / 1.6 / 2.6 degrees.

SD in temperature, s ≈ (choose one) 1.15 / 1.23 / 2.07 degrees.
temperature.A <- c(0,1,1,2,2,2,3,3,5,7); mean(temperature.A); sd(temperature.A)

(b) Temperature 0° has z-score $z \approx \frac{0-2.6}{2.07} \approx$ −1.98 / −1.27 / −0.56.
This temperature is roughly 1.3 SDs below average.
(c) Temperature 7° has z-score $z \approx \frac{7-2.6}{2.07} \approx$ 1.68 / 1.97 / 2.13.
This temperature is roughly 2.1 SDs above average.
(d) As a rule of thumb, z-scores less than z = −2 or greater than z = 2 are considered outliers.
So, 7° is / is not an outlier because it is more than 2 SDs above average.
(e) Temperature 1.5 SDs above average is
x ≈ 1.5(2.07) + 2.6 = (choose one) 3.335 / 3.745 / 5.705.
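The |z| > 2 rule of thumb in (d) can be applied to every observation at once. This sketch uses the sample mean and SD, as in (a); R's `sd()` divides by n − 1. The object names `z` and `outliers` are our own.

```r
temperature.A <- c(0, 1, 1, 2, 2, 2, 3, 3, 5, 7)

# Sample z-scores: (x - xbar) / s for every temperature
z <- (temperature.A - mean(temperature.A)) / sd(temperature.A)
round(z, 2)

outliers <- temperature.A[abs(z) > 2]   # values more than 2 SDs from the mean
outliers
```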

3.7 Five-Number Summary and Boxplots

We look at five-number summary

{min, P25 = Q1 , M = P50 = Q2 , P75 = Q3 , max},

and related boxplots.

Exercise 3.7 (Five-Number Summary and Boxplots)

1. Five Number Summary and Boxplot: Temperatures.

Consider small sample of n = 10 temperatures, set A:

0, 1, 1, 2, 2, 2, 3, 3, 5, 7.

(a) The five-number summary for the temperatures is (choose one)

(i) {0, 1, 1.5, 3, 4}
(ii) {0, 0, 1.5, 3, 6}
(iii) {0, 1, 2, 3, 7}
temperature.A <- c(0,1,1,2,2,2,3,3,5,7); fivenum(temperature.A)

(b) Consider boxplot for temperatures.

[Figure 3.5 annotations: lower quartile (1), median (2), upper quartile (3); IQR = 3 − 1 = 2; lower fence 1 − 1.5(2) = −2 and upper fence 3 + 1.5(2) = 6; whiskers run from the smallest value above the lower fence (0) to the largest value below the upper fence (5); the maximum (7) is an outlier.]

Figure 3.5: Boxplot for temperatures

The boxplot indicates the data are symmetric / skewed right / skewed left.

boxplot(temperature.A, xlab="temperature",col="green",horizontal=TRUE)
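The fences in Figure 3.5 follow from the five-number summary: lower fence = Q1 − 1.5 × IQR and upper fence = Q3 + 1.5 × IQR. This sketch computes them from `fivenum()` and compares the result with R's `boxplot.stats()`, which uses the same 1.5 × IQR rule by default (`coef = 1.5`); the object names are our own.

```r
temperature.A <- c(0, 1, 1, 2, 2, 2, 3, 3, 5, 7)

q <- fivenum(temperature.A)      # min, lower hinge (Q1), median, upper hinge (Q3), max
iqr <- q[4] - q[2]               # IQR = Q3 - Q1
lower.fence <- q[2] - 1.5 * iqr
upper.fence <- q[4] + 1.5 * iqr
c(lower.fence = lower.fence, upper.fence = upper.fence)

# Values beyond either fence are drawn as individual points (outliers)
outside <- temperature.A[temperature.A < lower.fence | temperature.A > upper.fence]
outside

boxplot.stats(temperature.A)$out   # R's built-in rule, for comparison
```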

2. Five Number Summary and Boxplot: More Temperatures.

Another sample of n = 9 temperatures, set B

0, 0, 0, 0, 1, 1, 2, 2, 3

is compared to the first set of temperatures.

(a) The five-number summary for this set of temperatures is (choose one)
(i) {0, 1, 1.5, 3, 4}
(ii) {0, 0, 1, 2, 3}
(iii) {0, 1, 2, 3, 7}
temperature.B <- c(0,0,0,0,1,1,2,2,3); fivenum(temperature.B)

(b) Consider the side-by-side boxplot for the temperatures. Set A has a warmer / same / colder median temperature than set B.
Set A has smaller / same / larger IQR in temperature than set B.
response <- c(temperature.A,temperature.B)
temperature <- rep(c("A","B"), times=c(length(temperature.A),length(temperature.B))) # 10 "A" labels, then 9 "B" labels
data <- data.frame(temperature,response); head(data)
boxplot(response~temperature, data=data, xlab="temperature",col="green",horizontal=TRUE)

[Figure 3.6 panels: horizontal boxplots of temperature for sets A and B, temperature axis 0 to 7.]

Figure 3.6: Side-by-side boxplot for two sets of temperatures

3.8 Comparing Groups

This material is covered in the previous sections.

3.9 Identifying Outliers

This material is covered in the previous sections.

3.10 Time Series

Exercise 3.9 (Time Series)

month 1 2 3 4 5 6 7 8 9 10 11 12 13
gold 80 70 90 75 95 100 105 107 109 112 111 115 120
technology 70 60 58 55 58 60 70 60 58 55 58 60 70

Two stock prices (in dollars) are measured monthly over a 13 month period.
Import chapter3.gold.tech.timeseries text file into R: Environment panel, Import Dataset.

data <- chapter3.gold.tech.timeseries; attach(data); head(data)

data.freq <- as.data.frame(table(factor(cut(gold, right=FALSE, include.lowest=TRUE, breaks=c(70,80,90,100,110,120))))); data.freq # include.lowest=TRUE keeps the top price (120) in the last bin
hist(gold,right=FALSE,breaks=c(70,80,90,100,110,120),xlab="Price ($)",col="red")
data.freq <- as.data.frame(table(factor(cut(technology, right=FALSE, include.lowest=TRUE, breaks=c(55,60,65,70))))); data.freq # keeps the top price (70) in the last bin
hist(technology,right=FALSE,breaks=c(55,60,65,70),xlab="Price ($)",col="blue")

gold.ts <- as.ts(data$gold)

technology.ts <- as.ts(data$technology)
ts.plot(gold.ts, technology.ts, gpars=list(xlab="Month", ylab="Price ($)", lty=c(1:2), col=c("red","blue")))

[Figure 3.7 panels: histogram of gold prices (Price ($), 70 to 120) and histogram of technology prices (Price ($), 55 to 70), both with Frequency on the y axis; line graph of both stock prices (Price ($), 60 to 120) against Month.]

Figure 3.7: Line graph of two stocks

1. Histograms.
Histogram of gold prices
least frequent: 70 − 80 / 80 − 90 / 90 − 100
unimodal / bimodal
symmetric / asymmetric
Histogram of technology prices
least frequent: 55 − 65 / 65 − 70 / 70 − 75
unimodal / bimodal
symmetric / asymmetric

2. Time Series.
Overall trend in gold stock
increasing / stationary / decreasing;
there is / is not a seasonal variation.
Overall trend in technology stock is
increasing / stationary / decreasing;
there is / is not a seasonal variation.

3. Histograms are more informative for stationary (no strong trend or variability) than for non-stationary time series.

True / False

3.11 Transforming Skewed Data

Transformations of data x, in particular the natural log transformation, ln(x), can make right-skewed histograms more symmetric.
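Why ln(x) helps with right skew can be seen on a small, hypothetical set of values (not data from the notes) in which each value doubles the previous one: the raw values have a long right tail that pulls the mean above the median, while their logarithms are evenly spaced, so mean and median coincide.

```r
# Hypothetical right-skewed values: each value is twice the previous one
x <- c(1, 2, 4, 8, 16, 32, 64)
c(mean = mean(x), median = median(x))     # mean pulled well above the median

lx <- log(x)                              # natural log, ln(x)
c(mean = mean(lx), median = median(lx))   # equal: ln(x) is evenly spaced
```

Real data are rarely this tidy, so in practice one compares histograms of several transformations, as in the exercise below.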

Exercise 3.10 (Transforming Skewed Data) Histograms of hours watching TV, as well as of various transformations of hours watching TV, are given below.

[Figure 3.8 panels: histograms (Frequency) of Hours Watching TV (0 to 20), ln(Hours) (−0.5 to 2.5), sqrt(Hours) (1 to 4) and 1/Hours (0.0 to 1.5).]

Figure 3.8: Hours and Transformations of Hours Watching TV

Import chapter3.tv.hours text file into R: Environment panel, Import Dataset.

data <- chapter3.tv.hours; attach(data); head(data)
par(mfrow=c(2,2))
hist(hours,right=FALSE,breaks=0:20,xlab="Hours Watching TV",col="gold") # 1-hour bins from 0 to 20
hours.ln <- log(hours) # natural log, ln(hours)
hours.sqrt <- sqrt(hours)
hours.inverse <- 1/hours
hist(hours.ln,right=FALSE,xlab="ln(Hours)",col="gold")
hist(hours.sqrt,right=FALSE,xlab="sqrt(Hours)",col="gold")
hist(hours.inverse,right=FALSE,xlab="1/Hours",col="gold")
par(mfrow=c(1,1))

The transformation of hours watching TV that made the original histogram most symmetric is ln(hours) / sqrt(hours) / 1/hours.
