
BITI 2233 Chapter 1 : Data Description and Numerical Measures .

CHAPTER 1 :
DATA DESCRIPTION AND NUMERICAL MEASURES

OBJECTIVES

- To study the basic introductory concepts of statistics, including the branches of statistics (descriptive and inferential statistics), the basic terms of statistics, and the types of variables.

- To be able to use graphical methods, such as bar graphs and pie charts, and numerical methods to describe a data set.

INTRODUCTION

WHAT IS STATISTICS

Statistics is a field of study concerned with collecting, presenting, analyzing and interpreting data as a basis for explanation, description and comparison.

TYPES OF STATISTICS
Statistics can be divided into two branches:
(i) Descriptive Statistics
(ii) Inferential Statistics

Descriptive Statistics is a field of study which involves organizing, displaying and describing data by using tables, graphs and summary measures.

Inferential Statistics consists of generalizing from samples to populations, performing hypothesis tests, determining relationships among variables, and making predictions.

POPULATION VERSUS SAMPLE

Population refers to all the elements that are of interest for data collection.
For example, if data on final exam results at UTeM are needed, every student at UTeM forms the population.

__________________________________________________________________________________
Statistics and Probability 1
Sample refers to a number of elements that have been chosen from a population for observation; a sample is a subset of the population.
For example, if any 100 students at UTeM are chosen for interviews, the sample size is 100.
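As a minimal sketch of the population/sample idea, the Python snippet below draws a simple random sample of 100 students without replacement. The population of 5,000 student IDs is hypothetical (the actual number of UTeM students is not given here), and the seed is fixed only so that the same draw can be repeated.

```python
import random

# Hypothetical population: 5,000 student IDs (an assumed size, for illustration only)
population = [f"S{i:04d}" for i in range(1, 5001)]

# Simple random sample of size n = 100, drawn without replacement
rng = random.Random(2233)  # fixed seed so the same sample is drawn on every run
sample = rng.sample(population, k=100)

print("Population size N =", len(population))
print("Sample size n =", len(sample))
```

Every sampled ID comes from the population, so the sample is a subset of it.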

BASIC TERMS

An element or member of a sample or population is a specific subject or object about which the information is collected.

A variable is a characteristic or attribute that can assume different values.

The value of the variable for an element is called an observation or measurement.

A data set is a collection of observations on one or more variables.

TYPES OF VARIABLE
There are two types of variable:
(i) Quantitative variables - measured numerically
- eg: heights, weights, etc.

(ii) Qualitative variables - nonnumeric categories, variables that can be placed into
distinct categories, according to some characteristic or attribute
- eg: gender, color of eyes, etc.

Variables can also be divided into:
(i) Discrete variables - countable values
- eg: number of students, etc.

(ii) Continuous variables - can assume any value within a certain interval
- eg: weights, time taken, etc.

ORGANIZING DATA

RAW DATA
Once data have been collected, it is crucial that they be well presented for analysis and interpretation.

Data that have been collected but not yet processed or ranked are called raw data. Raw data are also called individual data.


ORGANIZING AND GRAPHING QUALITATIVE DATA

FREQUENCY DISTRIBUTION
Data can be presented in the form of a table; to do this, the data must first be classified.

A frequency distribution lists all the categories or classes and the number of elements or values that belong to each category or class.

There are two types:
(i) category type - ungrouped frequency distribution
(ii) interval type - grouped frequency distribution
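A category-type (ungrouped) frequency distribution simply counts how many elements fall into each category. A minimal Python sketch using `collections.Counter`; the eye-colour data are invented for illustration only.

```python
from collections import Counter

# Hypothetical qualitative data: eye colour of 10 students (illustrative only)
eye_colour = ["brown", "black", "brown", "blue", "brown",
              "black", "green", "brown", "black", "blue"]

# Ungrouped (category-type) frequency distribution: category -> number of elements
freq = Counter(eye_colour)
for category, count in freq.most_common():
    print(f"{category:<6} {count}")
```

The counts always add up to the number of elements in the data set.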

ORGANIZING AND GRAPHING QUANTITATIVE DATA

GROUPED FREQUENCY DISTRIBUTION


A class interval is a range of values defined by the lower class limit and upper class limit.

Class boundary is the midpoint between the upper limit of one class and the lower limit of the next class. The class boundaries are found as follows:

For whole-number data:
( lower class limit) - 0.5 = ( lower class boundary)
( upper class limit) + 0.5 = ( upper class boundary)
For data recorded to one decimal place:
( lower class limit) - 0.05 = ( lower class boundary)
( upper class limit) + 0.05 = ( upper class boundary)

Class midpoint or class mark is the average of the lower class limit and the upper class limit.
Formula:
$$\text{Class midpoint} = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$
Range is equal to the highest value minus the lowest value: R = H - L

Number of classes can be obtained by using Sturges' formula:
$$\text{Number of classes} = 1 + 3.3\log_{10} n$$
where n is the number of observations in the data set.

Class size (class width) can be obtained by dividing the range by the number of classes:
$$\text{Class size} = \frac{\text{Range}}{\text{Number of classes}}$$

Tally marks are used to count the class frequencies: a stroke is marked against a class for each data value that falls in that class.
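The steps above (range, Sturges' formula, class width, class limits, boundaries and midpoints) can be sketched in Python as follows. This is an illustration under stated assumptions: `class_plan` is a hypothetical helper name, the data are whole numbers (so boundaries are limits ± 0.5), the number of classes is taken by truncating the Sturges value (which reproduces the 6 classes used in the Z company example later in this chapter; other rounding rules are possible), and the class width is rounded up to a whole number.

```python
import math

def class_plan(lowest, highest, n):
    """Hypothetical helper: plan equal-width classes for whole-number data."""
    data_range = highest - lowest              # R = H - L
    sturges = 1 + 3.3 * math.log10(n)          # Sturges' formula, log base 10
    k = int(sturges)                           # truncate, as in the chapter's example
    width = math.ceil(data_range / k)          # class width rounded up to a whole number
    if lowest + k * width - 1 < highest:       # guard: classes must reach the highest value
        width += 1                             # (happens when R / k is a whole number)
    classes = []
    for i in range(k):
        lower = lowest + i * width             # lower class limit
        upper = lower + width - 1              # upper class limit
        classes.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # whole-number data: +/- 0.5
            "midpoint": (lower + upper) / 2,           # class mark
        })
    return data_range, sturges, k, width, classes

# Values from the Z company distance example: H = 18, L = 1, n = 50
R, s, k, w, classes = class_plan(lowest=1, highest=18, n=50)
print(f"R = {R}, Sturges = {s:.2f}, classes = {k}, width = {w}")
for c in classes:
    print(c["limits"], c["boundaries"], c["midpoint"])
```

With these inputs the sketch gives R = 17, 6 classes of width 3 (1-3, 4-6, ..., 16-18), boundaries 0.5-3.5, 3.5-6.5, ..., and midpoints 2, 5, ..., 17.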

GRAPHING GROUPED DATA


After the data have been organized into a frequency distribution, they can be represented in graphic forms such as histograms, frequency polygons and ogives.

HISTOGRAMS
A histogram is a graphical representation of a grouped frequency distribution:
class intervals - horizontal axis
frequency - vertical axis.
It is drawn as adjoining rectangles: the width of each rectangle is the class width and the height of each rectangle is the frequency of the class. When all classes have the same width, the area of each rectangle is proportional to the class frequency.

[Figure: Histogram. Horizontal axis: Height (in inches), classes 72-74, 75-77, 78-80, 81-83, 84-86. Vertical axis: Frequency (0 to 12).]

FREQUENCY POLYGONS AND CURVES

A frequency polygon is obtained by joining with straight lines the points plotted at the class midpoints at heights equal to the class frequencies (that is, the midpoints of the tops of the histogram bars).
A frequency curve is obtained by smoothing the corners of a frequency polygon.

$$\text{Relative frequency} = \frac{\text{frequency of each class}}{\text{sum of all frequencies}}$$

Either the frequency or the relative frequency can be used on the vertical axis.


[Figure: Frequency polygon. Horizontal axis: Height (in inches), classes 72-74, 75-77, 78-80, 81-83, 84-86. Vertical axis: Frequency (0 to 12).]
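Relative frequencies follow directly from the relative frequency formula given above. A short Python sketch, using the class frequencies from the Z company distance example in this chapter (10, 14, 10, 6, 5, 5):

```python
# Class frequencies from the Z company distance example (classes 1-3, 4-6, ..., 16-18)
freqs = [10, 14, 10, 6, 5, 5]
total = sum(freqs)                                # sum of all frequencies = 50

# Relative frequency = frequency of each class / sum of all frequencies
rel_freqs = [f / total for f in freqs]
print([round(r, 2) for r in rel_freqs])
print("sum =", round(sum(rel_freqs), 10))        # relative frequencies add up to 1
```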

CUMULATIVE FREQUENCY DISTRIBUTIONS


Cumulative frequencies are obtained by finding the total number of values (the sum of the frequencies) that fall below the upper class boundary of each class.

Formula:
$$\text{Cumulative relative frequency} = \frac{\text{cumulative frequency of each class}}{\text{sum of all frequencies}}$$

Either the cumulative frequency or the cumulative relative frequency can be used on the vertical axis.
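Cumulative frequencies are running totals of the class frequencies. A minimal Python sketch with `itertools.accumulate`, again assuming the boundaries and frequencies of the Z company distance example in this chapter:

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the Z company distance example
upper_boundaries = [3.5, 6.5, 9.5, 12.5, 15.5, 18.5]
freqs = [10, 14, 10, 6, 5, 5]

cum_freqs = list(accumulate(freqs))       # running totals of the frequencies
total = cum_freqs[-1]                     # last cumulative frequency = sum of all frequencies
cum_rel = [c / total for c in cum_freqs]  # cumulative relative frequency

for b, c, r in zip(upper_boundaries, cum_freqs, cum_rel):
    print(f"less than {b}: {c} ({r:.2f})")
```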


OGIVES
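An ogive is a graphical representation of a cumulative frequency distribution. It is drawn by marking a point above each upper class boundary at a height equal to the cumulative frequency of that class, starting from the lower boundary of the first class at a cumulative frequency of 0, and joining the points with straight lines.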

[Figure: Ogive. Horizontal axis: Height (in inches), class boundaries 71.5, 74.5, 77.5, 80.5, 83.5, 86.5. Vertical axis: Cumulative Frequency (0 to 35).]
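The plotting points of an ogive can be generated from the class boundaries and frequencies. A minimal sketch that leaves out the plotting itself; `ogive_points` is a hypothetical helper name, and the inputs are the boundaries and frequencies of the Z company distance example in this chapter:

```python
from itertools import accumulate

def ogive_points(boundaries, freqs):
    """Hypothetical helper: (x, y) points of an ogive.

    boundaries holds every class boundary in order (one more entry than freqs).
    The curve starts at the first lower boundary with cumulative frequency 0,
    then has one point above each upper class boundary.
    """
    if len(boundaries) != len(freqs) + 1:
        raise ValueError("need exactly one more boundary than class frequencies")
    return [(boundaries[0], 0)] + list(zip(boundaries[1:], accumulate(freqs)))

points = ogive_points([0.5, 3.5, 6.5, 9.5, 12.5, 15.5, 18.5], [10, 14, 10, 6, 5, 5])
for x, y in points:
    print(x, y)
```

Joining these points with straight lines gives the ogive.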

Example
Below are the distances (in km) traveled to work each day by a random sample of 50 employees of Z company.

1 2 6 7 12 13 2 6 9 5
18 7 3 15 15 4 17 1 14 5
4 16 4 5 8 6 5 18 5 2
9 11 12 1 9 2 10 11 4 10
9 18 8 8 4 14 7 3 2 6
i) Construct a frequency distribution table.
ii) Construct a histogram.
iii) Construct a frequency polygon
iv) Construct a relative frequency polygon.
v) Construct an ogive.
vi) Find the mean, variance and standard deviation of this data set (give the answers to 4 decimal places).


Solution
a.i) Determine the number of classes and the class width using Sturges' formula.
Highest value = 18
Lowest value = 1
$$\text{Number of classes} = 1 + 3.3\log_{10} n = 1 + 3.3\log_{10} 50 = 6.61 \Rightarrow 6 \text{ classes}$$
where n = 50 is the number of observations in the data set.

$$\text{Class width} = \frac{\text{Highest value} - \text{Lowest value}}{\text{Number of classes}} = \frac{18 - 1}{6} = 2.83 \approx 3$$

Class    Tally              Frequency,  Cumulative     Class boundaries  Midpoint,  fm    fm²
limits                      f           frequency, c                     m
1-3      ||||/ ||||/        10          10             0.5-3.5           2          20    40
4-6      ||||/ ||||/ ||||   14          24             3.5-6.5           5          70    350
7-9      ||||/ ||||/        10          34             6.5-9.5           8          80    640
10-12    ||||/ |            6           40             9.5-12.5          11         66    726
13-15    ||||/              5           45             12.5-15.5         14         70    980
16-18    ||||/              5           50             15.5-18.5         17         85    1445
Total                       Σf = 50                                                 Σfm = 391   Σfm² = 4181
(In the tally column, ||||/ stands for a group of five strokes.)

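As a cross-check of the table, the Python sketch below counts the 50 raw distances into the six classes (the programmatic equivalent of tallying) and recomputes f, c, the class boundaries, m, fm and fm². The class set-up (first lower limit 1, width 3, 6 classes) is taken from step a.i.

```python
# The 50 distances (km) from the Z company example above
distances = [
    1, 2, 6, 7, 12, 13, 2, 6, 9, 5,
    18, 7, 3, 15, 15, 4, 17, 1, 14, 5,
    4, 16, 4, 5, 8, 6, 5, 18, 5, 2,
    9, 11, 12, 1, 9, 2, 10, 11, 4, 10,
    9, 18, 8, 8, 4, 14, 7, 3, 2, 6,
]

start, width, k = 1, 3, 6   # first lower limit, class width, number of classes (step a.i)

rows = []
cumulative = 0
for i in range(k):
    lower = start + i * width
    upper = lower + width - 1
    f = sum(1 for x in distances if lower <= x <= upper)   # the "tally" for this class
    cumulative += f                                         # cumulative frequency, c
    m = (lower + upper) / 2                                 # class midpoint
    rows.append({"limits": f"{lower}-{upper}", "f": f, "c": cumulative,
                 "boundaries": (lower - 0.5, upper + 0.5),
                 "m": m, "fm": f * m, "fm2": f * m * m})

for r in rows:
    print(r["limits"], r["f"], r["c"], r["boundaries"], r["m"], r["fm"], r["fm2"])
print("sum f =", sum(r["f"] for r in rows),
      "sum fm =", sum(r["fm"] for r in rows),
      "sum fm^2 =", sum(r["fm2"] for r in rows))
```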
a.ii) Histogram

[Figure: Histogram. Horizontal axis: Distance in km. Vertical axis: Frequency (0 to 16).]


a.iii) Frequency Polygon

[Figure: Frequency polygon. Horizontal axis: Distance in km, class midpoints 2, 5, 8, 11, 14, 17. Vertical axis: Frequency (0 to 16).]

a.iv)
Class boundary   Midpoint   Frequency   Relative frequency
0.5-3.5          2          10          0.20
3.5-6.5          5          14          0.28
6.5-9.5          8          10          0.20
9.5-12.5         11         6           0.12
12.5-15.5        14         5           0.10
15.5-18.5        17         5           0.10

[Figure: Relative frequency polygon. Horizontal axis: Distance in km, class midpoints 2, 5, 8, 11, 14, 17. Vertical axis: Relative frequency (0.00 to 0.32).]


a.v)
Class boundary   Midpoint   Frequency   Relative frequency   Cumulative relative frequency
0.5-3.5          2          10          0.20                 0.20
3.5-6.5          5          14          0.28                 0.48
6.5-9.5          8          10          0.20                 0.68
9.5-12.5         11         6           0.12                 0.80
12.5-15.5        14         5           0.10                 0.90
15.5-18.5        17         5           0.10                 1.00

[Figure: Ogive. Horizontal axis: Distance in km, class boundaries 0.5, 3.5, 6.5, 9.5, 12.5, 15.5, 18.5. Vertical axis: Cumulative Frequency (0 to 60).]

a.vi)
Sample mean:
$$\bar{x} = \frac{\sum fm}{\sum f} = \frac{391}{50} = 7.8200$$
Variance:
$$s^2 = \frac{\sum fm^2 - \frac{(\sum fm)^2}{\sum f}}{\sum f - 1} = \frac{4181 - \frac{391^2}{50}}{50 - 1} = 22.9261$$
Standard deviation:
$$s = \sqrt{22.9261} = 4.7881$$
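The a.vi computations can be checked with a short Python sketch that works directly from the class midpoints and frequencies in the table. It uses the sample-variance formula with divisor Σf - 1, as above.

```python
import math

m = [2, 5, 8, 11, 14, 17]   # class midpoints from the table
f = [10, 14, 10, 6, 5, 5]   # class frequencies

n = sum(f)                                            # sum f = 50
sum_fm = sum(fi * mi for fi, mi in zip(f, m))         # sum fm = 391
sum_fm2 = sum(fi * mi ** 2 for fi, mi in zip(f, m))   # sum fm^2 = 4181

mean = sum_fm / n                                     # sample mean
variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)      # sample variance, divisor n - 1
sd = math.sqrt(variance)                              # sample standard deviation

print(f"mean = {mean:.4f}, variance = {variance:.4f}, sd = {sd:.4f}")
```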


NUMERICAL DESCRIPTIVE MEASURES

3.1 MEASURES OF CENTRAL TENDENCY


The three common measures of central tendency are mean, median and mode.

MEAN
The mean is the arithmetic average: the sum of the values divided by the number of values.
The mean of a sample is denoted by x̄.
The mean of a population is denoted by μ.
Calculation of the mean

(i) Individual data, e.g. 50, 60, 40, 35, 25, 40, 15, 60, 50

Formula:
$$\bar{x} = \frac{\sum x}{n} \quad \text{or} \quad \mu = \frac{\sum x}{N}$$

(ii) Ungrouped frequency

Formula:
$$\bar{x} = \frac{\sum fx}{\sum f}$$

Data, x   Frequency, f
45        2
50        4
60        1
85        5

(iii) Grouped frequency

Formula:
$$\bar{x} = \frac{\sum fm}{\sum f}$$
where m is the class midpoint.

Data      Midpoint, m   Frequency, f
20 - 30   25            2
30 - 40   35            5
40 - 50   45            3
50 - 60   55            1
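The three mean formulas can be sketched in Python as below, applied to the three example data sets above. The function names are hypothetical, and for grouped data the midpoint is computed from the class limits as (lower + upper) / 2, as in this chapter.

```python
def mean_individual(xs):
    """Individual data: x-bar = sum x / n."""
    return sum(xs) / len(xs)

def mean_ungrouped(values, freqs):
    """Ungrouped frequency: x-bar = sum fx / sum f."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def mean_grouped(classes, freqs):
    """Grouped frequency: x-bar = sum fm / sum f, with m the class midpoint."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * m for m, f in zip(mids, freqs)) / sum(freqs)

print(round(mean_individual([50, 60, 40, 35, 25, 40, 15, 60, 50]), 4))
print(round(mean_ungrouped([45, 50, 60, 85], [2, 4, 1, 5]), 4))
print(round(mean_grouped([(20, 30), (30, 40), (40, 50), (50, 60)], [2, 5, 3, 1]), 4))
```

The three results are 375/9, 775/12 and 415/11 respectively.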

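MEDIAN
The median is the value located at the center of the distribution when the data are arranged in order.
Calculation of the median (first arrange the data in ascending order):

(i) Individual data
$$\text{Location of median} = \frac{n+1}{2}\text{th term}$$

(ii) Ungrouped data
$$\text{Location of median} = \frac{n+1}{2}\text{th term}$$
where n = Σf is the total frequency.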


Example
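Ten customers purchased the following numbers of magazines: 1, 7, 5, 3, 6, 2, 3, 1, 5, 8. Find the median.

Solution
Arranged in order: 1, 1, 2, 3, 3, 5, 5, 6, 7, 8
Location of median = (10 + 1)/2 = 5.5th term, so the median is the average of the 5th and 6th terms:
$$\text{Median} = \frac{3 + 5}{2} = 4$$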
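The (n + 1)/2 location rule can be sketched in Python as follows: when the location is not a whole number, the two neighbouring terms are averaged, as in the magazine example. The helper names are hypothetical, and the ungrouped version assumes the frequency table is expanded back into individual values before the rule is applied.

```python
import math

def median(values):
    """Median using the (n + 1)/2 location rule on the ordered data."""
    xs = sorted(values)
    loc = (len(xs) + 1) / 2            # location of the median (1-based position)
    lower = xs[math.floor(loc) - 1]    # term at floor(loc)
    upper = xs[math.ceil(loc) - 1]     # term at ceil(loc); same term when loc is whole
    return (lower + upper) / 2         # average of the two middle terms (or the middle term)

def median_ungrouped(values, freqs):
    """Ungrouped frequency data: expand each value f times, then apply the same rule."""
    return median([x for x, f in zip(values, freqs) for _ in range(f)])

print(median([1, 7, 5, 3, 6, 2, 3, 1, 5, 8]))            # magazine example
print(median([50, 60, 40, 35, 25, 40, 15, 60, 50]))
print(median_ungrouped([45, 50, 60, 85], [2, 4, 1, 5]))
```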

MODE
The mode is the value that occurs most frequently in a distribution.
(i) Individual data
Identify the value with the highest occurrence.
Note:
A data set may have no mode, one mode, or more than one mode.

(ii) Ungrouped frequency
Identify the value with the highest frequency.

Example

Find the mode of the following numbers:

110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752

Solution:
Since each value occurs only once, there is no mode.
Note: Do not say that the mode is zero. That would be incorrect, because in some data,
zero can be an actual value.
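A minimal Python sketch of finding the mode(s), returning an empty list rather than 0 when there is no mode. The helper names are hypothetical, and the sketch assumes the convention that a data set has no mode only when every value occurs exactly once.

```python
from collections import Counter

def modes(values):
    """All values that occur most often; [] means the data set has no mode.

    Assumed convention: 'no mode' is returned only when every value occurs once.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # no mode (not 'mode = 0': 0 could be a real data value)
    return sorted(v for v, c in counts.items() if c == top)

def mode_ungrouped(values, freqs):
    """Ungrouped frequency data: the value(s) with the highest frequency."""
    top = max(freqs)
    return [x for x, f in zip(values, freqs) if f == top]

print(modes([110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752]))
print(modes([1, 7, 5, 3, 6, 2, 3, 1, 5, 8]))
print(mode_ungrouped([45, 50, 60, 85], [2, 4, 1, 5]))
```

The second call shows a data set with more than one mode (1, 3 and 5 each occur twice).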

3.2 MEASURES OF DISPERSION

RANGE
The range is the difference between the highest and lowest values in the distribution.

Formula:

Range = highest value - lowest value


VARIANCE AND STANDARD DEVIATION


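The standard deviation measures how spread out the data are about the mean. The population variance is denoted by σ² and the sample variance by s²; the standard deviation is the square root of the variance (σ = √σ², s = √s²).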

(i) Individual data

Example: 50, 60, 40, 35, 25, 40, 15, 60, 50

Formula:
$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{n\sum x^2 - (\sum x)^2}{n(n - 1)}$$
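The individual-data formulas can be applied to the example with a short Python sketch; Python's standard `statistics` module (`pvariance` uses divisor n, `variance` uses divisor n - 1) serves as an independent cross-check.

```python
import math
import statistics

x = [50, 60, 40, 35, 25, 40, 15, 60, 50]
n = len(x)
sum_x = sum(x)                        # sum x = 375
sum_x2 = sum(v * v for v in x)        # sum x^2 = 17475

pop_var = sum_x2 / n - (sum_x / n) ** 2                 # sigma^2
samp_var = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))    # s^2

print("sigma^2 =", round(pop_var, 4), " sigma =", round(math.sqrt(pop_var), 4))
print("s^2 =", round(samp_var, 4), " s =", round(math.sqrt(samp_var), 4))

# Cross-check against the standard library
print(statistics.pvariance(x), statistics.variance(x))
```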

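(ii) Ungrouped frequency

Example:

Data, x   Frequency, f
45        2
50        4
60        1
85        5

Formula:
$$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$$
$$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{\sum f}}{\sum f - 1} = \frac{\sum f \sum fx^2 - (\sum fx)^2}{\sum f(\sum f - 1)}$$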

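(iii) Grouped frequency

Example:

Data      Midpoint, m   Frequency, f
20 - 30   25            2
30 - 40   35            5
40 - 50   45            3
50 - 60   55            1

Formula:
$$\sigma^2 = \frac{\sum fm^2}{\sum f} - \left(\frac{\sum fm}{\sum f}\right)^2$$
$$s^2 = \frac{\sum fm^2 - \frac{(\sum fm)^2}{\sum f}}{\sum f - 1} = \frac{\sum f \sum fm^2 - (\sum fm)^2}{\sum f(\sum f - 1)}$$
where m is the class midpoint.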
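For grouped data the class midpoints m take the place of the individual values. A short Python sketch for the example table, computing the midpoints from the class limits as (lower + upper) / 2:

```python
import math

classes = [(20, 30), (30, 40), (40, 50), (50, 60)]
f = [2, 5, 3, 1]
m = [(lo + hi) / 2 for lo, hi in classes]   # midpoints 25, 35, 45, 55

n = sum(f)                                           # sum f = 11
sum_fm = sum(fi * mi for fi, mi in zip(f, m))        # sum fm = 415
sum_fm2 = sum(fi * mi * mi for fi, mi in zip(f, m))  # sum fm^2 = 16475

pop_var = sum_fm2 / n - (sum_fm / n) ** 2                  # sigma^2
samp_var = (n * sum_fm2 - sum_fm ** 2) / (n * (n - 1))     # s^2
print("sigma^2 =", round(pop_var, 4), " s^2 =", round(samp_var, 4),
      " s =", round(math.sqrt(samp_var), 4))
```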


Example
The following frequency distribution of exam scores was obtained from all the students in ABC college.

Class     Frequency,  Cumulative   Midpoint,  fm      fm²
limits    f           frequency    m
90-98     6           6            94         564     53 016
99-107    22          28           103        2266    233 398
108-116   43          71           112        4816    539 392
117-125   28          99           121        3388    409 948
126-134   9           108          130        1170    152 100
Total     Σf = 108                            Σfm = 12204   Σfm² = 1 387 854

Find (a) the mean, (b) the median, (c) the mode and (d) the standard deviation.

Solution

(a) Mean:
$$\mu = \frac{\sum fm}{\sum f} = \frac{12204}{108} = 113$$

(b)
$$\text{Location of median} = \frac{n+1}{2}\text{th term} = \frac{108+1}{2} = 54.5\text{th term}$$

From the cumulative frequencies, the 54.5th term lies in the third class (28 < 54.5 ≤ 71), so the median class is 107.5-116.5. Sometimes the class limits are used instead; hence, the median class could also be given as 108-116.

(c) The modal class is 107.5-116.5, since it has the largest frequency (43).
Note: Sometimes the midpoint of the class is used rather than the boundaries; hence, the mode could also be given as 112.0.

(d) Standard deviation (the data cover all the students, so the population formula is used):
$$\sigma = \sqrt{\frac{\sum fm^2}{\sum f} - \left(\frac{\sum fm}{\sum f}\right)^2} = \sqrt{\frac{1387854}{108} - \left(\frac{12204}{108}\right)^2} = \sqrt{12850.5 - 12769} = \sqrt{81.5} = 9.03$$

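The whole exam-score solution can be reproduced with a short Python sketch: population mean and standard deviation (divisor Σf, since all students are included), the median class located through the cumulative frequencies, and the modal class. The helper `boundaries` is a hypothetical name that converts whole-number class limits to boundaries by subtracting and adding 0.5.

```python
import math
from itertools import accumulate

limits = [(90, 98), (99, 107), (108, 116), (117, 125), (126, 134)]
f = [6, 22, 43, 28, 9]
m = [(lo + hi) / 2 for lo, hi in limits]                # midpoints 94, 103, ..., 130
N = sum(f)                                               # 108 students (the whole population)

mu = sum(fi * mi for fi, mi in zip(f, m)) / N           # population mean
sigma = math.sqrt(sum(fi * mi * mi for fi, mi in zip(f, m)) / N - mu ** 2)

loc = (N + 1) / 2                                        # location of the median
cum = list(accumulate(f))                                # cumulative frequencies
median_idx = next(i for i, c in enumerate(cum) if c >= loc)   # first class reaching loc
modal_idx = f.index(max(f))                              # class with the largest frequency

def boundaries(i):
    lo, hi = limits[i]
    return (lo - 0.5, hi + 0.5)

print("mean =", mu, " sigma =", round(sigma, 2))
print("median location =", loc, " median class =", boundaries(median_idx))
print("modal class =", boundaries(modal_idx), " midpoint =", m[modal_idx])
```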