(Measures of Location) - Lec#1 - Chapter 1 - Part1


BAS131 / BAS 115

Probability – Statistics
Lecture 1
Chapter 1 – Part 1
Measures of Location
(Central Tendency)
Chapter 1: Introduction to Statistics and Data Analysis

Copyright © 2017 Pearson Education, Ltd. All rights reserved.


What is Statistics?
Statistics is the science of conducting studies to collect, organize, present, summarize, analyze, and draw conclusions (decisions) from data.
Types of Data
(i) Quantitative: consists of numbers representing counts or measurements.
(ii) Qualitative: can be separated into categories that are distinguished by nonnumeric characteristics (blood types, colours, genders (male/female), letter grades in an exam, etc.).
Types of Data (cont.)
Throughout this course we are mostly concerned with quantitative data.
Quantitative data assume numeric values:
◦ Discrete: the number of possible values is finite or countable, e.g., the number of students in a class.
◦ Continuous: the data can take infinitely many values on a scale that covers a range of values without gaps, e.g., temperature, height, weight.
Types of Statistical Applications
There are two possible types of studies, depending on why the study is conducted:
1. Descriptive statistics
2. Inferential statistics.
1-Descriptive Statistics
• Involves only the collection, organization, presentation, and summarization of data.
• The point is to describe a certain situation as represented by a particular data set.
2-Inferential Statistics
• Involves drawing conclusions from data.
• The point is to make inferences about a certain situation (typically a whole population) on the basis of a particular data set (a sample).
Populations and Samples
A population is the complete collection of all elements to be studied.
Populations are normally so large that it is logistically impossible to examine all the individuals. So…
• One takes a subgroup of a population and examines the desired characteristic for the subcollection.

• Such subcollections are called samples.
• A major problem is to ensure that a selected sample is representative of the population.
So the major task of inferential statistics is to draw conclusions about a whole population on the basis of analyzing a sample.
Descriptive Statistics: Summarization
• We’ll consider three aspects:
1. Measures of Central Location (Tendency).
2. Measures of Variation (Dispersion).
3. Measures of Position.
Types of Statistical Measures
1. Measures of Central Location:
The purpose of a measure of location is to pinpoint the center of a distribution of data. It shows the central value of the data.
(What is the value around which the data are concentrated?)

2. Measures of Variation (Dispersion):
Often called the variation or the spread, it measures the extent to which the data values differ from each other.

3. Measures of Position
There are other ways of describing the variation or spread in a set of
data. The purpose of a measure of position is to determine the location
of values that divide a set of observations into equal parts.
These measures include quartiles, deciles, and percentiles.
[1] Measures of Location (Central Tendency)
• We’re interested in a value that represents the center of the distribution.
We’ll study three measures:
1- Mean (Arithmetic Mean)
2- Median
3- Mode
We will first explain the meaning of these measures and then calculate them from the raw data.
(i) The Arithmetic Mean
• For a population of size N, the mean, , is
given by N


 = i 1
Xi

• For a sample of size n, the mean, X , is given


by n

X i
X i 1
n
Example (1): The data represent the number of days off per year for a sample of individuals selected from 9 different countries. Find the sample mean.

20 26 40 36 23
42 35 24 30
Solution: n = 9
$$\bar{X} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} \approx 30.7$$
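The sample-mean formula translates directly into code. The following is a minimal Python sketch; the helper name `sample_mean` is hypothetical (not from the lecture), and Python's standard library offers the same computation as `statistics.mean`.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Example (1): days off per year for a sample of individuals from 9 countries
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]
print(sum(days_off))                    # 276
print(round(sample_mean(days_off), 1))  # 30.7
```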
Example (2): There are 42 exits on I-75 through the state of Kentucky. Listed below are the distances between exits (in miles).
11 4 10 4 9 3 8 10 3 14 1 10 3 5
2 2 5 6 1 2 2 3 7 1 3 7 8 10
1 4 7 5 2 2 5 1 1 3 3 1 2 1
Why is this information a population? What is the mean number of miles between exits?

Solution: This is a population because we are considering all the exits in Kentucky.
$$\mu = \frac{\sum X}{N} = \frac{11 + 4 + 10 + \cdots + 1}{42} = \frac{192}{42} \approx 4.57$$
This is the typical number of miles between exits.
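A similar sketch for the population mean of Example (2), assuming the 42 distances are entered exactly as listed above; the variable names are illustrative only.

```python
# Example (2): distances (in miles) between exits on I-75 in Kentucky.
# All exits are included, so this is a population and we compute mu = sum(X) / N.
distances = [
    11, 4, 10, 4, 9, 3, 8, 10, 3, 14, 1, 10, 3, 5,
    2, 2, 5, 6, 1, 2, 2, 3, 7, 1, 3, 7, 8, 10,
    1, 4, 7, 5, 2, 2, 5, 1, 1, 3, 3, 1, 2, 1,
]

N = len(distances)      # population size
total = sum(distances)  # sum of all X values
mu = total / N          # population mean

print(N, total, round(mu, 2))  # 42 192 4.57
```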
Properties of the Arithmetic Mean
The arithmetic mean is a widely used measure of location. It has several
important properties:

1. The data must be quantitative: computing the mean requires adding the values, so they must be numeric.
2. All the values are included in computing the mean.
3. The mean is unique. That is, there is only one mean in a set of data.
4. The sum of the deviations of each value from the mean is zero. Expressed symbolically:
$$\sum (X - \bar{X}) = 0$$
Properties of the Arithmetic Mean (cont.)
As an example, the mean of 3, 8, and 4 is
$$\bar{X} = \frac{\sum X}{n} = \frac{15}{3} = 5$$
Then:
$$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = (-2) + (3) + (-1) = 0$$
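Property 4 can also be checked numerically. This sketch (the helper name `deviations_from_mean` is hypothetical) uses Python's `fractions.Fraction` so the arithmetic is exact; with ordinary floating-point numbers the sum of deviations may come out as a tiny nonzero value purely because of rounding.

```python
from fractions import Fraction


def deviations_from_mean(values):
    """Return each value's deviation X - X̄, using exact fractions so no rounding occurs."""
    mean = Fraction(sum(values), len(values))
    return [Fraction(x) - mean for x in values]


small = [3, 8, 4]
print([int(d) for d in deviations_from_mean(small)])  # [-2, 3, -1]
print(sum(deviations_from_mean(small)))               # 0

# The property also holds when the mean is not a whole number,
# e.g. Example (1), where X̄ = 276/9
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]
print(sum(deviations_from_mean(days_off)))            # 0
```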


(ii) The Median
The median is the midpoint of the values after they have been ordered from the minimum to the maximum value.

• The median is calculated by placing all the observations in order; the observation that falls in the middle is the median.
To determine the median:
1. Sort the data values.
2. Pick the value in the middle. For n data values:
(i) If n is odd, then MD = the middle value, i.e., the value in position $\frac{n+1}{2}$.
(ii) If n is even, then MD = (sum of the two middle values) / 2, i.e., the average of the two values in positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.

(Note: MD need not be a data value.)
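The two-case rule above can be sketched in Python as follows. The function name `median` is chosen here for illustration; positions in the lecture count from 1 while Python lists are indexed from 0, hence the `- 1` adjustments. Python's `statistics.median` follows the same odd/even rule.

```python
def median(values):
    """Median by the two-case rule: sort, then take the middle value (n odd)
    or the average of the two middle values (n even)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)  # Step 1: sort the data values
    n = len(ordered)
    if n % 2 == 1:
        # n odd: position (n + 1) / 2, counting from 1, is index (n + 1) // 2 - 1
        return ordered[(n + 1) // 2 - 1]
    # n even: average the values in positions n/2 and n/2 + 1 (indices n//2 - 1 and n//2)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median([20.5, 48, 36.5, 44, 32, 23.5, 50]))  # 36.5  (Example 3)
print(median([1, 2, 6, 7, 12, 13, 2, 6, 9, 5,
              18, 7, 3, 15, 15, 4, 17, 1, 14, 5]))  # 6.5   (Example 4)
```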


Example (3): The following data give the weights (in kg) of a sample of 7 children. Find the median weight.
20.5, 48, 36.5, 44, 32, 23.5, 50

Solution: n = 7, which is odd.

First, sort the data from smallest to largest:
20.5, 23.5, 32, 36.5, 44, 48, 50

Rank of median = (n + 1)/2 = (7 + 1)/2 = 4, i.e., the 4th position.

Then MD = 36.5.
Example (4): Find the median for the following sample:

1 2 6 7 12

13 2 6 9 5

18 7 3 15 15

4 17 1 14 5
Step 1: Sorting…

1 1 2 2 3

4 5 5 6 6

7 7 9 12 13

14 15 15 17 18
Step 2: n = 20, which is even, so the two middle values are in positions n/2 = 20/2 = 10 and (n/2) + 1 = 11. In the sorted data, the 10th value is 6 and the 11th value is 7.

Then the median MD = (6 + 7) / 2 = 6.5


The Median: Remarks
• Accordingly, half of the data (50% of the data) are less than the median and half of the data (50% of the data) are greater than the median.
• In Example (3), 3 values are less than the median and 3 values are greater than the median (and the median is one of the observations).
• In Example (4), 10 values are less than the median and 10 values are greater than the median (and the median is not one of the observations).
• Sample and population medians are computed in the same way.
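The remarks can be checked for the data of Examples (3) and (4) by counting the observations strictly below and strictly above the median. A minimal sketch; the helper `count_below_above` is hypothetical.

```python
def count_below_above(values, md):
    """Count how many observations lie strictly below and strictly above md."""
    below = sum(1 for x in values if x < md)
    above = sum(1 for x in values if x > md)
    return below, above


weights = [20.5, 48, 36.5, 44, 32, 23.5, 50]  # Example (3), median 36.5
print(count_below_above(weights, 36.5))      # (3, 3)

sample = [1, 2, 6, 7, 12, 13, 2, 6, 9, 5,
          18, 7, 3, 15, 15, 4, 17, 1, 14, 5]  # Example (4), median 6.5
print(count_below_above(sample, 6.5))        # (10, 10)
```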
Properties of the Median
1. Unlike the mean, it does not use all the data values: it depends only on the value in the middle of the ordered data (or the two middle values when n is even).

2. Accordingly, it is not affected by extremely large or small values (it is not affected by outliers).
Properties of the Median (cont.)
It is not affected by outliers
Example (5):
Facebook is a popular social networking website. Users can
add friends and send them messages, and update their
personal profiles to notify friends about themselves and
their activities. A sample of 10 adults revealed they spent
the following number of hours last month using Facebook.

3 5 7 5 9 1 3 9 170 10

Find the median number of hours.


Solution:
Note that the number of adults sampled is even (n = 10).
1] Sort (170 is an outlier):

1 3 3 5 5 7 9 9 10 170

2] The median is the average of the two values in the 5th and 6th positions:
$$\text{Median} = \frac{5 + 7}{2} = 6$$

We conclude that the typical Facebook user spends 6 hours per month at the website.
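To see why the median is the better summary here, compare it with the mean of the same data. This sketch uses Python's standard `statistics` module; the comparison with the outlier removed is added for illustration and is not part of the original example.

```python
from statistics import mean, median

# Example (5): hours spent on Facebook last month by 10 adults; 170 is an outlier
hours = [3, 5, 7, 5, 9, 1, 3, 9, 170, 10]

print(median(hours))  # 6.0
print(mean(hours))    # 22.2  (pulled far above every other value by the outlier)

# Drop the outlier: the mean falls sharply, the median barely moves
without_outlier = [h for h in hours if h != 170]
print(median(without_outlier))          # 5
print(round(mean(without_outlier), 2))  # 5.78
```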
(iii) The Mode
• The mode is a data value that has the
highest frequency in a data set.
• A distribution may have one, more than
one, or no mode at all.
• Defined for both qualitative and
quantitative data.
• Not affected by extreme values.
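A sketch of a mode function that allows for one, several, or no modes and works for both qualitative and quantitative data. It assumes the convention used in Example (7): when every value occurs exactly once there is no mode, so the function (hypothetically named `modes`) returns an empty list. That is why it does not simply call Python's `statistics.multimode`, which in that case returns every value.

```python
from collections import Counter


def modes(values):
    """Return every value with the highest frequency, in order of first appearance.

    Assumed convention: if every value occurs exactly once, there is no mode,
    and an empty list is returned (the mode is not 0).
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so the data set has no mode
    return [value for value, freq in counts.items() if freq == top]


shuttle = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]
print(modes(shuttle))   # [8]

coal = [110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752]
print(modes(coal))      # []

subjects = ["CS", "Math", "Math", "Physics", "Physics",
            "Math", "CS", "CS", "Chemistry", "Physics"]
print(modes(subjects))  # ['CS', 'Math', 'Physics']
```

The three calls use the data of Examples (6), (7) and (8), which follow.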
Example (6): The following data represent the duration (in days) of U.S. Space Shuttle voyages for the years 1992–1994. Find the mode.
8 9 9 14 8 8
10 7 6 9 7 8
10 14 11 8 14 11
Solution: Sort the data for convenience:
6 7 7 8 8 8
8 8 9 9 9 10
10 11 11 14 14 14
Identify the value with the highest frequency: 8 occurs 5 times, more than any other value (9 and 14 each occur 3 times).

Then, the mode is 8.


Example (7): The following data represent the number of coal employees per county for 10 selected counties in southwestern Pennsylvania. Find the mode.
110 731 1031 84 20
118 1162 1977 103 752
Solution: Since each value occurs exactly once, there is no mode.
Note that the mode is not 0.
Example (8): The following data represent the favorite subject of 10 MIU students.

CS Math Math Physics Physics
Math CS CS Chemistry Physics

Solution: Math, CS, and Physics each occur 3 times (Chemistry occurs once), so the distribution has three modes: Math, CS and Physics.
