Chap1 Student
CHAPTER 1 :
DATA DESCRIPTION AND NUMERICAL MEASURES
OBJECTIVES
INTRODUCTION
WHAT IS STATISTICS
TYPES OF STATISTICS
Statistics can be divided into two types:
(i) Descriptive Statistics
(ii) Inferential Statistics
Population refers to the complete set of elements that are of interest for data
collection.
For example, if data on final exam results at UTeM are needed, every student in UTeM
forms the population.
__________________________________________________________________________________
Statistics and Probability 1
BITI 2233 Chapter 1 : Data Description and Numerical Measures .
Sample refers to a certain number of elements that have been chosen from a population
for observation. A sample is a subset of the population.
For example, choose any 100 students in UTeM for interviews. The sample size is 100.
BASIC TERMS
TYPES OF VARIABLE
There are two types of variable:
(i) Quantitative variables - measured numerically
- e.g. heights, weights, etc.
(ii) Qualitative variables - non-numeric variables that can be placed into
distinct categories according to some characteristic or attribute
- e.g. gender, eye color, etc.
ORGANIZING DATA
RAW DATA
Once data have been collected, it is crucial that they be well presented for analysis and
interpretation.
Data that have been collected but not yet processed or ranked are called raw data.
Raw data are also called individual data.
FREQUENCY DISTRIBUTION
Numerical data can be presented in the form of a table. To do so, the data must first be classified.
A frequency distribution lists all categories or classes together with the number of elements
or values that belong to each category or class.
Two types :
(i) category type - ungrouped frequency
Class boundary is the midpoint of the upper limit of one class and the lower limit of the
next class. Formulas for finding the class boundaries are as follows :
Class midpoint or class mark is the average of the lower class limit and the upper class limit.
Formula:
$$\text{Class midpoint} = \frac{\text{upper class limit} + \text{lower class limit}}{2}$$
Range is equal to highest value minus lowest value. R = H - L
Class size (class width) can be obtained by dividing the range by the number of classes.
$$\text{Class size} = \frac{\text{Range}}{\text{number of classes}}$$
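As a quick illustration of the definitions above, the following Python sketch computes the range, a class size, and a class midpoint. The function names, the hypothetical sample values, and the choice to round the class size up to a whole number are assumptions made for illustration, not part of the notes.

```python
import math

def data_range(values):
    """Range R = H - L (highest value minus lowest value)."""
    return max(values) - min(values)

def class_width(values, num_classes):
    """Class size = range / number of classes, rounded up (assumed convention)."""
    return math.ceil(data_range(values) / num_classes)

def class_midpoint(lower_limit, upper_limit):
    """Class midpoint = (upper class limit + lower class limit) / 2."""
    return (upper_limit + lower_limit) / 2

sample = [72, 75, 79, 81, 86]    # hypothetical heights in inches
print(data_range(sample))        # 14
print(class_width(sample, 5))    # 3
print(class_midpoint(72, 74))    # 73.0
```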
Tally marks are used to count class frequencies: one stroke is marked against a class for each
data value that falls in that class.
HISTOGRAMS
A histogram is a graphical representation of a grouped frequency distribution.
class intervals - horizontal axis
frequency - vertical axis
It is obtained by adjoining rectangles: the width of each rectangle is the size of its class
and the height of each rectangle is the frequency of the class interval. When all classes have
the same size, the area of each rectangle is therefore proportional to the class frequency.
[Figure: HISTOGRAM. Horizontal axis: Height (in inches), classes 72-74, 75-77, 78-80, 81-83, 84-86. Vertical axis: Frequency, 0 to 12.]
POLYGON
[Figure: frequency polygon. Horizontal axis: Height (in inches), classes 72-74, 75-77, 78-80, 81-83, 84-86. Vertical axis: Frequency, 0 to 12.]
Formula:
$$\text{Cumulative relative frequency} = \frac{\text{cumulative frequency of each class}}{\text{sum of all frequencies}}$$
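The formula can be sketched in Python as follows. The function name is illustrative, and the class frequencies are borrowed from the distance example worked later in this chapter.

```python
from itertools import accumulate

def cumulative_relative_frequency(frequencies):
    """Cumulative frequency of each class divided by the sum of all frequencies."""
    total = sum(frequencies)
    return [c / total for c in accumulate(frequencies)]

freqs = [10, 14, 10, 6, 5, 5]               # class frequencies from the distance example
print(list(accumulate(freqs)))              # [10, 24, 34, 40, 45, 50]
print(cumulative_relative_frequency(freqs)) # [0.2, 0.48, 0.68, 0.8, 0.9, 1.0]
```

The last cumulative relative frequency is always 1, which is a useful check on the table.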
OGIVES
An ogive is the graphical representation of a cumulative frequency distribution. It is
drawn by marking a dot above the upper boundary of each class at a height equal to the
cumulative frequency up to that boundary, then joining the dots with straight lines.
[Figure: OGIVE. Horizontal axis: Height (in inches), class boundaries 71.5, 74.5, 77.5, 80.5, 83.5, 86.5. Vertical axis: Cumulative Frequency, 0 to 35.]
Example
Below are the distances (in km) traveled to work each day by a random sample of 50
employees of company Z.
1 2 6 7 12 13 2 6 9 5
18 7 3 15 15 4 17 1 14 5
4 16 4 5 8 6 5 18 5 2
9 11 12 1 9 2 10 11 4 10
9 18 8 8 4 14 7 3 2 6
i) Construct a frequency distribution table.
ii) Construct a histogram.
iii) Construct a frequency polygon
iv) Construct a relative frequency polygon.
v) Construct an ogive.
vi) Find the mean, variance and standard deviation for this data set. (Give the answers
to 4 decimal places.)
Solution
a.i) Determine the number of classes and class width using Sturges' formula.
Highest value = 18
Lowest value = 1
Number of classes = 1 + 3.3 log n, where n = the number of observations in the data set
= 1 + 3.3 log 50
= 6.61
Take 6 classes.
Class width = (18 - 1) / 6 = 2.83, rounded up to 3
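The calculation in a.i) can be checked with a short Python sketch. The function name is illustrative, and rounding the class width up to 3 is an assumption that is consistent with the class boundaries 0.5-3.5, 3.5-6.5, ... used in the rest of the solution.

```python
import math

def sturges_classes(n):
    """Number of classes by the rule used in the solution: 1 + 3.3 log10(n)."""
    return 1 + 3.3 * math.log10(n)

k = sturges_classes(50)
print(round(k, 2))         # 6.61

num_classes = int(k)       # the solution takes 6 classes
width = math.ceil((18 - 1) / num_classes)  # range / number of classes, rounded up
print(num_classes, width)  # 6 3
```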
a.ii) Histogram
[Figure: Histogram. Horizontal axis: Distance in km. Vertical axis: Frequency, 0 to 16.]
a.iii) Frequency Polygon
[Figure: Frequency Polygon. Horizontal axis: Distance in km, class midpoints 2, 5, 8, 11, 14, 17. Vertical axis: Frequency, 0 to 16.]
a.iv)
Class boundary   Midpoint   Frequency   Relative Frequency
0.5-3.5          2          10          0.20
3.5-6.5          5          14          0.28
6.5-9.5          8          10          0.20
9.5-12.5         11         6           0.12
12.5-15.5        14         5           0.10
15.5-18.5        17         5           0.10
[Figure: relative frequency polygon. Horizontal axis: Distance in km, class midpoints 2, 5, 8, 11, 14, 17. Vertical axis: relative frequency, 0.00 to 0.28.]
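The frequency and relative frequency columns in a.iv) can be reproduced from the raw data with a sketch like the one below. The helper name `frequency_table` and its parameters are illustrative; the first lower class limit (1), the class width (3) and the number of classes (6) are read off the class boundaries in the table, and the boundaries are assumed to lie 0.5 outside the class limits.

```python
data = [
    1, 2, 6, 7, 12, 13, 2, 6, 9, 5,
    18, 7, 3, 15, 15, 4, 17, 1, 14, 5,
    4, 16, 4, 5, 8, 6, 5, 18, 5, 2,
    9, 11, 12, 1, 9, 2, 10, 11, 4, 10,
    9, 18, 8, 8, 4, 14, 7, 3, 2, 6,
]

def frequency_table(values, first_lower, width, num_classes):
    """Build one row per class: boundaries, midpoint, frequency, relative frequency."""
    rows = []
    for i in range(num_classes):
        lower = first_lower + i * width   # lower class limit
        upper = lower + width - 1         # upper class limit (integer data)
        freq = sum(lower <= v <= upper for v in values)
        rows.append({
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "frequency": freq,
            "relative": freq / len(values),
        })
    return rows

table = frequency_table(data, first_lower=1, width=3, num_classes=6)
for row in table:
    print(row)
```

The frequencies printed should sum to 50, the sample size.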
a.v)
[Figure: Ogive. Horizontal axis: Distance in km, class boundaries 0.5, 3.5, 6.5, 9.5, 12.5, 15.5, 18.5. Vertical axis: Cumulative Frequency, 0 to 60.]
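a.vi)
Sample mean:
$$\bar{x} = \frac{\sum fm}{\sum f} = \frac{391}{50} = 7.8200$$
Variance:
$$s^2 = \frac{\sum fm^2}{\sum f - 1} - \frac{\left(\sum fm\right)^2}{\sum f\left(\sum f - 1\right)} = \frac{4181}{50 - 1} - \frac{391^2}{50(49)} = 22.9261$$
Standard deviation:
$$s = \sqrt{22.9261} = 4.7881$$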
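The arithmetic in a.vi) can be verified from the class midpoints and frequencies. This is only a checking sketch; the variable names are illustrative, and the sample (n - 1) form of the variance is used because the data are described as a random sample.

```python
import math

midpoints = [2, 5, 8, 11, 14, 17]
frequencies = [10, 14, 10, 6, 5, 5]

n = sum(frequencies)                                              # sum of f
sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))       # sum of f*m
sum_fm2 = sum(f * m ** 2 for f, m in zip(frequencies, midpoints)) # sum of f*m^2

mean = sum_fm / n
variance = sum_fm2 / (n - 1) - sum_fm ** 2 / (n * (n - 1))
std_dev = math.sqrt(variance)

print(n, sum_fm, sum_fm2)                            # 50 391 4181
print(f"{mean:.4f} {variance:.4f} {std_dev:.4f}")    # 7.8200 22.9261 4.7881
```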
MEAN
The mean is the average of the data values.
The mean of a sample is denoted by $\bar{x}$.
The mean of a population is denoted by $\mu$.
Calculation of mean
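For individual data:
$$\bar{x} = \frac{\sum x}{n} \quad \text{or} \quad \mu = \frac{\sum x}{N}$$
For data with frequencies:
$$\bar{x} = \frac{\sum fx}{\sum f}$$
x     f
45    2
50    4
60    1
85    5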
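A minimal Python sketch of both ways of computing the mean is given below, using the values 45, 50, 60, 85 with frequencies 2, 4, 1, 5 from this chapter. The function names are illustrative. Expanding the frequency table into individual values is done only to show that the two formulas agree.

```python
def mean(values):
    """Mean of individual data: sum of x divided by n."""
    return sum(values) / len(values)

def frequency_mean(xs, fs):
    """Mean of data with frequencies: sum of f*x divided by sum of f."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

xs = [45, 50, 60, 85]
fs = [2, 4, 1, 5]

print(round(frequency_mean(xs, fs), 4))   # 64.5833

# Writing each value out f times gives the same mean
expanded = [x for x, f in zip(xs, fs) for _ in range(f)]
print(round(mean(expanded), 4))           # 64.5833
```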
MEDIAN
The median is the value of the item which is located at the center of the distribution.
Calculation of the median.
Example
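Ten customers purchased the following numbers of magazines: 1, 7, 5, 3, 6, 2, 3, 1, 5, 8. Find
the median.
Solution
Arrange the data in order: 1, 1, 2, 3, 3, 5, 5, 6, 7, 8
There are 10 values, so the median is the average of the 5th and 6th values:
$$\text{Median} = \frac{3 + 5}{2} = 4$$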
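The same steps (sort, locate the middle, average the two middle values when the count is even) can be sketched in Python. The function name is illustrative; Python's standard library also provides `statistics.median`, which gives the same result.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

magazines = [1, 7, 5, 3, 6, 2, 3, 1, 5, 8]
print(sorted(magazines))   # [1, 1, 2, 3, 3, 5, 5, 6, 7, 8]
print(median(magazines))   # 4.0
```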
MODE
The mode is the value, which occurs most frequently in a distribution.
(i) Individual data
Identify the data with the highest occurrence.
Note:
A set of data may have no mode, one mode, or more than one mode.
Example
Solution:
Since each value occurs only once, there is no mode.
Note: Do not say that the mode is zero. That would be incorrect, because in some data,
zero can be an actual value.
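The three cases in the note (no mode, one mode, more than one mode) can be illustrated with a small Python sketch. The function name and the second and third data sets are hypothetical; returning an empty list for "no mode" is a design choice that avoids reporting zero, in line with the remark above.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest occurrence, or [] if every value occurs once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # each value occurs only once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 7, 5, 3, 6, 2, 3, 1, 5, 8]))  # [1, 3, 5]  (more than one mode)
print(modes([4, 9, 2, 7]))                    # []         (no mode)
print(modes([2, 2, 3]))                       # [2]        (one mode)
```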
RANGE
The range is the difference between highest and lowest value in the distribution.
Formula: R = H - L
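Formula for the variance of individual data (population, then sample):
$$\sigma^2 = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2 \qquad s^2 = \frac{\sum x^2}{n - 1} - \frac{\left(\sum x\right)^2}{n(n - 1)}$$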
Data, x    Frequency, f
45         2
50         4
60         1
85         5
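Formula for the variance of data with frequencies (population, then sample):
$$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 \qquad s^2 = \frac{\sum fx^2}{\sum f - 1} - \frac{\left(\sum fx\right)^2}{\sum f\left(\sum f - 1\right)}$$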
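Applied to the frequency table above (x = 45, 50, 60, 85 with f = 2, 4, 1, 5), the population and sample forms of the variance can be sketched as follows. The function names are illustrative, and whether the table represents a population or a sample is not stated in the notes, so both are computed.

```python
def population_variance(xs, fs):
    """sum(f*x^2)/sum(f) - (sum(f*x)/sum(f))^2"""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x ** 2 for x, f in zip(xs, fs))
    return sum_fx2 / n - (sum_fx / n) ** 2

def sample_variance(xs, fs):
    """sum(f*x^2)/(sum(f) - 1) - (sum(f*x))^2 / (sum(f) * (sum(f) - 1))"""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x ** 2 for x, f in zip(xs, fs))
    return sum_fx2 / (n - 1) - sum_fx ** 2 / (n * (n - 1))

xs = [45, 50, 60, 85]
fs = [2, 4, 1, 5]
print(round(population_variance(xs, fs), 4))
print(round(sample_variance(xs, fs), 4))
```

The two results differ only by the factor n / (n - 1), so the sample variance is always slightly larger than the population variance for the same data.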
Example
The following exam score frequency distribution was obtained from all the students in
ABC college.
Find the (a) mean (b) median (c) mode (d) standard deviation
Solution
(a) mean,
$$\mu = \frac{\sum fm}{\sum f} = \frac{12204}{108} = 113$$
(b) Location of median = $\frac{n + 1}{2}$th term $= \frac{108 + 1}{2} = 54.5$
The median class is 107.5-116.5. Sometimes the class limits are used instead; hence, the
median class could also be given as 108-116.
(c) The modal class is 107.5-116.5 since it has the largest frequency.
Note: Sometimes the midpoint of the class is used rather than the boundaries;
hence, the mode could also be given as 112.0.
(d) standard deviation,
$$\sigma = \sqrt{\frac{\sum fm^2}{\sum f} - \left(\frac{\sum fm}{\sum f}\right)^2} = \sqrt{\frac{1387854}{108} - \left(\frac{12204}{108}\right)^2} = \sqrt{81.5} = 9.03$$
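The mean and standard deviation in this example can be checked directly from the three totals quoted in the solution. This sketch assumes the population form of the variance, since the distribution is said to cover all the students in the college; the variable names are illustrative.

```python
import math

sum_f = 108        # total frequency (number of students)
sum_fm = 12204     # sum of f*m
sum_fm2 = 1387854  # sum of f*m^2

mean = sum_fm / sum_f
variance = sum_fm2 / sum_f - mean ** 2

print(mean)                           # 113.0
print(variance)                       # 81.5
print(round(math.sqrt(variance), 2))  # 9.03
```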