Unit 3 Descriptive Statistics Part 1


Unit 3

Descriptive Statistics

XAVIER –ATENEO
MATHEMATICS DEPARTMENT
Introduction

In addition to tabular and graphical methods of summarizing data, one also finds it useful to summarize data by methods that lead to numerical results, called descriptive measures.
After completing this session, you will be able to:
- Summarize data using measures of central tendency, such as the mean, median, mode, and midrange.
- Describe data using measures of variation, such as the range, variance, and standard deviation.
- Appreciate the properties and limitations of summary values.
Descriptive Statistics – numerical measures that are used to describe certain characteristics of the data

1. Measures of Central Location
2. Measures of Variability
3. Measures of Shape
4. Measures of Location
5. Box-and-Whisker Plot (5-number summary)
Measures of Central Location
- Any single value that is used to identify the "center" or typical value of a data set; it is often referred to as the average.

Mean

Median

Mode
Measures of Central Location

1. Mean – the sum of all values of the observations divided by the number of observations in the data set
Example
The following table shows the length of waiting time (in minutes) for 10 randomly selected customers at a certain fast food chain before the order is served.

Customer   Time
A          10
B          16
C          14
D          20
E          18
F          18
G          8
H          14
I          18
J          10

Determine the mean waiting time of the customers using the formula.
Solution
Given: n = 10
x_1 = 10; x_2 = 16; x_3 = 14; x_4 = 20; x_5 = 18; ...; x_10 = 10

$$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{x_1 + x_2 + x_3 + \cdots + x_9 + x_{10}}{10}$$

$$\bar{x} = \frac{10 + 16 + 14 + 20 + 18 + 18 + 8 + 14 + 18 + 10}{10} = \frac{146}{10}$$

x̄ = 14.6 mins
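The same arithmetic can be checked with a short Python sketch. The list name `waiting_times` is only an illustrative label for the ten times in the table; Python's standard `statistics.mean` is used as a cross-check on the formula.

```python
import statistics

# Waiting times (minutes) for customers A through J, from the table above
waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]

# Mean = (sum of all observations) / (number of observations)
total = sum(waiting_times)           # 146
n = len(waiting_times)               # 10
mean_by_formula = total / n

print(total, n, mean_by_formula)     # 146 10 14.6
print(statistics.mean(waiting_times))  # 14.6
```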
The arithmetic mean is, in general, a very natural measure of central location. One of its principal limitations, however, is that it is overly sensitive to extreme values. In such cases it may not be representative of the location of the great majority of the sample points.

Common examples occur in distributions of income and of selling prices of houses. Because a few extreme values at the upper end inflate the mean, the median gives a better picture of central tendency.
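To see this sensitivity concretely, the sketch below takes the waiting-time data and, purely as a hypothetical, replaces customer D's 20 minutes with an extreme 200 minutes. The mean roughly doubles while the median does not move.

```python
import statistics

waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]

# Hypothetical extreme value: customer D waits 200 minutes instead of 20
with_outlier = [10, 16, 14, 200, 18, 18, 8, 14, 18, 10]

for label, data in [("original", waiting_times), ("with outlier", with_outlier)]:
    print(label, statistics.mean(data), statistics.median(data))
# original 14.6 15.0
# with outlier 32.6 15.0
```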
Measures of Central Location
2. Median – a value that divides an ordered set of data (array) into two equal parts (usually denoted by Md)
It is sometimes called the positional average because it is the value that lies exactly in the middle of the data set after the values have been placed in an ordered array.

Md = middle value in the array when n is odd
Md = mean of the two middle values when n is even
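The two-case rule above translates directly into code. This is a minimal sketch with a helper named `median` of our own choosing; Python's built-in `statistics.median` follows the same rule. The usage lines apply it to the waiting-time data and to the exam scores in Example 2 below.

```python
def median(values):
    """Median by the positional rule: sort the data, then take the middle
    value (n odd) or the mean of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the ((n + 1)/2)-th value
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of the (n/2)-th and (n/2 + 1)-th values

print(median([10, 16, 14, 20, 18, 18, 8, 14, 18, 10]))       # 15.0
print(median([86, 95, 84, 87, 91, 90, 99, 84, 83, 88, 96]))  # 88
```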
Example 1
Using the waiting times (in minutes) of the 10 randomly selected customers at the fast food chain, determine the median waiting time.
Solution
Arrange the times in an ordered array:

Customer   Time
G          8
A          10
J          10
C          14
H          14
B          16
E          18
F          18
I          18
D          20

Since n = 10 is even, the median is the mean of the two middle values (the 5th and 6th values):

$$Md = \frac{14 + 16}{2} = 15$$

Md = 15 mins
Example 2
The exam scores of a sample of 11 students in a statistics class are shown below:

86, 95, 84, 87, 91, 90, 99, 84, 83, 88, 96

Determine the median score of the students.

Solution
STEP 1: Arrange the scores in order: 83, 84, 84, 86, 87, 88, 90, 91, 95, 96, 99.
STEP 2: Locate the middle value in the data set. Since n = 11 is odd, the median is the (n + 1)/2 = 6th value.
STEP 3: 83, 84, 84, 86, 87, 88, 90, 91, 95, 96, 99

Md = 88
The principal strength of the sample median is that it is insensitive to very large or very small values.
The principal weakness of the sample median is that it is determined mainly by the middle points in a sample and is less sensitive to the actual numerical values of the remaining data points.
Measures of Central Location

3. Mode – the value in the data set that occurs with the greatest frequency

If all the observations occur with equal frequency, the data has no mode. In some instances, however, there can be more than one mode.
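The "no mode" convention in these notes differs from Python's `statistics.multimode`, which returns every value when all values occur equally often. The sketch below therefore uses a small hypothetical helper, `modes`, built on `collections.Counter`, that returns an empty list in that case and all tied values otherwise. The usage lines reproduce Examples 1 and 2 that follow.

```python
from collections import Counter

def modes(values):
    """Return the mode(s) as a sorted list, or [] when there is no mode.

    Follows the convention in these notes: if every value occurs
    equally often, the data set has no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return []                      # no mode: all frequencies are equal
    return sorted(v for v, c in counts.items() if c == top)

print(modes([10, 16, 14, 20, 18, 18, 8, 14, 18, 10]))  # [18]
print(modes([2, 19, 20, 20, 22, 19]))                  # [19, 20]
print(modes([22, 66, 69, 70, 73]))                     # []
```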
Example 1
Using the waiting times (in minutes) of the 10 randomly selected customers at the fast food chain, determine the modal waiting time.
Solution
Customer   Time
A          10
B          16
C          14
D          20
E          18
F          18
G          8
H          14
I          18
J          10

Since 18 appears with the greatest frequency (3 times), the mode is 18.

Mo = 18 mins
Example 2
Find the modal values for the following data:
a) 2, 19, 20, 20, 22, 19
b) 22, 66, 69, 70, 73

Solution
a. The modal values are 19 and 20, since both appear twice and the rest appear once.
Mo = 19 and 20
b. Since each data value appears with the same frequency (that is, once), the data set has no mode.
POSSIBILITIES
No Mode – each data value occurs the same number of times
Unimodal – one data value occurs with the greatest frequency
Bimodal – two data values share the greatest frequency
Multimodal – more than two data values share the greatest frequency
EXCEL: Data – Data Analysis – Descriptive Statistics

Customer   Time
A          10
B          16
C          14
D          20
E          18
F          18
G          8
H          14
I          18
J          10

Excel output (Time):
Mean                 14.6           sum of the observations divided by 10
Standard Error       1.30128142
Median               15             average of the two middle values, 14 and 16
Mode                 18             appears three times in the data set
Standard Deviation   4.115013163
Sample Variance      16.93333333
Kurtosis             -1.257893944
Skewness             -0.407572294
Range                12
Minimum              8
Maximum              20
Sum                  146
Count                10
Available Excel functions:
=average(data_range)
=median(data_range)
=mode(data_range)
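For readers working outside Excel, the sketch below reproduces most rows of the Descriptive Statistics output with Python's standard `statistics` and `math` modules. The function name `describe` is a hypothetical choice, and kurtosis and skewness are deliberately omitted. Note that `statistics.mode` (Python 3.8+) returns a single value even when several values tie.

```python
import math
import statistics

def describe(data):
    """Rough Python counterpart of several rows in Excel's
    Descriptive Statistics output (kurtosis and skewness omitted)."""
    n = len(data)
    s = statistics.stdev(data)               # sample standard deviation (divides by n - 1)
    return {
        "Mean": statistics.mean(data),
        "Standard Error": s / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),       # a single mode only
        "Standard Deviation": s,
        "Sample Variance": statistics.variance(data),
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }

waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]
for name, value in describe(waiting_times).items():
    print(f"{name:<20}{value}")
```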
Measures of Variability/Dispersion

Dispersion refers to the spread or variability in the data. A measure of dispersion indicates to what degree the individual observations are dispersed or spread out around the mean.

Common Measures of Variability

1. Range
2. Variance
3. Standard deviation
4. Standard Error
5. Coefficient of Variation
Measures of Variability/Dispersion

1. Range – the difference between the maximum and the minimum values in the data set
Maximum – the highest value in the data set
Minimum – the lowest value in the data set
Example 1
Determine the MAX, MIN and RANGE of the data below.

Customer   Time
A          10
B          16
C          14
D          20
E          18
F          18
G          8
H          14
I          18
J          10

Solution
MAX = 20 minutes
MIN = 8 minutes
RANGE = 20 - 8 = 12 minutes
Measures of Variability/Dispersion

2. Standard deviation – a measure of dispersion that indicates the extent to which the observations are scattered about the mean

3. Variance – the square of the standard deviation
- the average squared deviation of the observations from the mean (for a sample, the sum of squared deviations is divided by n - 1 rather than n)
Measures of Variability/Dispersion
Standard deviation and variance

Example 1
Compute the variance and standard deviation of the waiting time of customers using the formula.

Solution (mean = 14.6)
Customer   Time   x - mean   (x - mean)^2
A          10     -4.6       21.16
B          16      1.4        1.96
C          14     -0.6        0.36
D          20      5.4       29.16
E          18      3.4       11.56
F          18      3.4       11.56
G          8      -6.6       43.56
H          14     -0.6        0.36
I          18      3.4       11.56
J          10     -4.6       21.16
Sum                          152.4
Solution (continued)

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{152.4}{9} = 16.933$$

Sample variance: s^2 = 16.933 minutes^2

$$s = \sqrt{s^2} = \sqrt{16.933} = 4.115$$

Sample standard deviation: s = 4.115 minutes
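The deviation table and the n - 1 formula can be sketched in a few lines of Python. The variable names are illustrative only; the columns `deviations` and `squared` mirror the "x - mean" and "(x - mean)^2" columns of the table.

```python
import math

waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]
n = len(waiting_times)
mean = sum(waiting_times) / n                  # 14.6

# Table columns: x - mean and (x - mean)^2
deviations = [x - mean for x in waiting_times]
squared = [d ** 2 for d in deviations]

ss = sum(squared)                    # sum of squared deviations, about 152.4
sample_variance = ss / (n - 1)       # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

print(round(ss, 4), round(sample_variance, 3), round(sample_sd, 3))
# 152.4 16.933 4.115
```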
Measures of Variability/Dispersion

4. Standard error (of the mean) – measures how well the obtained statistic (here, the sample mean) estimates the target parameter (the population mean). The smaller the standard error, the better the statistic estimates the parameter.
- It is estimated by dividing the sample standard deviation by the square root of the sample size:

$$\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$$
Example
Determine the standard error of the mean waiting time.

Customer   Time
A          10
B          16
C          14
D          20
E          18
F          18
G          8
H          14
I          18
J          10

Solution
Note that the computed sample standard deviation is s = 4.115 minutes.

$$\sigma_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4.115}{\sqrt{10}} = 1.301$$

σ_x̄ = 1.301 minutes
The standard deviation of sample means is known as
the standard error of the mean (SE).
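As a quick check of the example above, the standard error is just the sample standard deviation divided by the square root of n. This sketch uses Python's `statistics.stdev`, which divides by n - 1 like the hand computation.

```python
import math
import statistics

waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]

s = statistics.stdev(waiting_times)       # sample standard deviation, about 4.115
n = len(waiting_times)
standard_error = s / math.sqrt(n)         # s / sqrt(n)

print(round(standard_error, 3))           # 1.301
```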
Excel: Data – Data Analysis – Descriptive Statistics
Excel Output. Measures of Variability
Time
Mean                 14.6
Standard Error       1.30128142     standard deviation divided by the square root of n
Median               15
Mode                 18
Standard Deviation   4.115013163    square root of the sample variance; a typical distance of the values from the mean
Sample Variance      16.93333333    square of the standard deviation
Kurtosis             -1.257893944
Skewness             -0.407572294
Range                12             highest value - lowest value
Minimum              8              lowest value
Maximum              20             highest value
Sum                  146
Count                10
Available Excel functions:
=stdev(data_range)
=var(data_range)
=max(data_range)
=min(data_range)
Measures of Variability/Dispersion
The coefficient of variation (CV) measures how scattered the data are relative to the mean. It is a relative measure of variation that is always expressed as a percentage.

Formula:
$$CV = \frac{s}{\bar{x}} \cdot 100\%$$

The coefficient of variation is very useful when comparing two or more data sets that have different means and/or are measured in different units of measurement.
Example
The following table shows the waiting time (in minutes) of customers in the Jolly A and Jolly B fast food chains. Which fast food chain has a lower coefficient of variation?

Jolly A   Jolly B
10        10
16        16
14        15
20        25
18        20
18        18
8         9
14        14
18        21
10        12
Solution
          Jolly A   Jolly B
Stdev     4.115     5.077
Mean      14.6      16
CV        28.19%    31.73%

CV (Jolly A) = (4.115 / 14.6) · 100% = 28.19%
CV (Jolly B) = (5.077 / 16) · 100% = 31.73%

Conclusion
Therefore, Jolly A has the lower coefficient of variation. This means that the waiting times at Jolly A are more consistent, relative to their mean, than those at Jolly B.
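The comparison can be reproduced with a short Python sketch. The helper name `coefficient_of_variation` is hypothetical; it uses the sample standard deviation (`statistics.stdev`, dividing by n - 1), matching the Stdev row above.

```python
import statistics

jolly_a = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]
jolly_b = [10, 16, 15, 25, 20, 18, 9, 14, 21, 12]

def coefficient_of_variation(data):
    """CV = (sample standard deviation / mean) * 100%."""
    return statistics.stdev(data) / statistics.mean(data) * 100

for name, data in [("Jolly A", jolly_a), ("Jolly B", jolly_b)]:
    print(f"{name}: CV = {coefficient_of_variation(data):.2f}%")
# Jolly A: CV = 28.19%
# Jolly B: CV = 31.73%
```

Because the CV divides out each chain's own mean, it compares relative spread even though Jolly B's mean (16) is higher than Jolly A's (14.6).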
Mean, Median and Mode
Range, Variance, Standard Deviation, Standard Error and Coefficient of Variation
