Sta416 Chapter 1


UiTM Terengganu STA416

CHAPTER 1: DESCRIPTIVE STATISTICS

By the end of this topic, you should be able to:

• Understand why people study statistics
• Distinguish between descriptive and inferential statistics
• Distinguish between qualitative and quantitative variables
• Understand some statistical terms
• Understand various data collection methods
• Distinguish between sampling techniques: random and non-random sampling
• Present data in the form of graphs: bar chart, pie chart, histogram, frequency polygon and ogive
• Present data in the form of a stem-and-leaf display

1.1 Types of Statistics

Statistics is the method of collecting, organizing, summarizing, presenting, analyzing and interpreting data
(information) in a convenient and informative way to assist in making more effective decisions.

Statistics can be categorized as descriptive statistics and inferential/inductive statistics.

Descriptive statistics is designed to describe the data without going any further; that is, without
attempting to infer or conclude anything that goes beyond the data themselves.

Inferential statistics is a method used to determine something about a population based on a
sample.

1.2 Variables and Types of Data

1.2.1 Statistical terms

i. Research/Survey – A study done using statistical methods in order to understand a
certain problem.
ii. Element – The respondent or object on which data is taken.
iii. Population – All elements under study, whether living or non-living objects.
iv. Sample – A subset or part of the population.
v. Sampling frame – A complete list of all elements in a population.
vi. Pilot survey – A study done on a small scale before the actual survey.
vii. Census – A study done on the entire population.
viii. Parameter – A summary measure/characteristic obtained from a population.
ix. Statistic – A summary measure/characteristic obtained from a sample.
x. Variable/Attribute – A characteristic of the population under study.

1.2.2 Types of Variable

1. Qualitative variable – measured according to specific categories or characteristics.
Example: gender (male, female), marital status (single, married), race (Malay, Indian,
Chinese), grade (A, B, C)

2. Quantitative variable – the variable studied takes numerical values.
Example: number of students, total income, distance traveled, test mark, etc.

A quantitative variable can be further classified as:
a. Discrete – can assume only exact (countable) values
Example: no. of students, annual sales, total income, shoe size, etc.
b. Continuous – measured on a continuous scale and expressed to a certain degree of accuracy
Example: distance traveled, litres of petrol, weight and height of children, etc.

Exercise 1:

1. Determine whether you would use descriptive statistics or inferential
statistics for each of the following.
a. A trainer wanted to determine the minimum time taken by his swimmers to swim 100 m.
b. An economist uses a bar chart to illustrate the loss made by an airline company from 1990
to 2000.
c. A few botanists do research on the relation between durian production and the usage of
cow manure as fertilizer.
d. Psychologists study whether urban students are higher achievers compared to
suburban students.
e. Dewan Bandaraya Kuala Lumpur formed a committee to investigate the relation between flash
flood occurrence and the amount of rubbish in Sungai Gombak and Sungai Kelang.

2. Determine whether each of the following is a constant or a variable. If it is a variable, determine
whether it is quantitative or qualitative. If it is quantitative, determine whether it is discrete or
continuous.
a. Number of days in February.
b. Marks to get grade B.
c. Maximum marks to get grade B.
d. Marital status of the workers in a firm.
e. The length of 2000 screws in a production line.

1.3 Data Collection Method

Generally, there are six methods that can be used to collect
primary data. They are:

i. Personal interview
The researcher talks to the respondent face to face.

ii. Telephone interview
The interviewer asks questions from a prepared questionnaire.

iii. Mailing
A questionnaire is sent to each respondent with a stamped
addressed envelope attached.

iv. Direct observation
Respondents are observed without their knowledge.

v. Direct questionnaire
The researcher gives the questionnaire directly to the respondents and waits for
them to complete it.

vi. Other methods
E-mail, internet survey and short messaging service (SMS).

1.4 Sampling Techniques

Sampling is the process of selecting a sample from a population. The researcher needs to select an
appropriate sample so that the results obtained from the sample can be
generalized to the whole population.

1.4.1 Random Sampling (Probability Sampling) – Every element in the population has an equal
chance of being selected for the sample.

a. Simple Random Sampling (SRS) - The sample is selected at random using either
the lucky draw method or random numbers.
b. Systematic Random Sampling (SYRS) - The sample is selected by
choosing every kth element in the population.
c. Stratified Random Sampling - Suitable if the population can be
categorized. A sample is chosen from each of the categories so that the
sample represents the population.
d. Cluster Sampling - Also suitable for a categorized population.
However, the sample is formed by taking all elements in the
selected categories only.
e. Multi-Stage Sampling - The sample is selected in stages. This technique is
suitable for a large population.
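
As a rough illustration of the first three random sampling techniques, the Python sketch below draws samples from a hypothetical sampling frame of 100 numbered elements. The function names, the frame, the two strata and the 10% sampling fraction are assumptions made for this example only.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """SRS: every element has the same chance of selection (like a lucky draw)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """SYRS: pick a random start, then take every k-th element."""
    k = len(frame) // n                # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)           # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=None):
    """Stratified: draw a simple random sample from every category."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical sampling frame: elements numbered 1 to 100
frame = list(range(1, 101))
srs = simple_random_sample(frame, 10, seed=1)
sys_sample = systematic_sample(frame, 10, seed=1)

# Hypothetical strata: 60 elements in one category, 40 in the other
strata = {"A": list(range(1, 61)), "B": list(range(61, 101))}
strat = stratified_sample(strata, 0.10, seed=1)

print(sorted(srs))
print(sys_sample)
print(sorted(strat))
```

In the stratified case, each category contributes in proportion to its size (6 and 4 elements here), which is one common way of making the sample represent the population.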

1.4.2 Non-random Sampling (Non-probability Sampling) – Not all elements in the population
have an equal chance of being selected for the sample.

a. Quota Sampling - The researcher has the flexibility to choose whomever he wants
as long as the specifications set are met.
b. Convenience Sampling - The researcher has the flexibility to select anybody
he wants or meets until the required sample size is obtained.
c. Judgmental Sampling - The researcher selects respondents whom he thinks have
certain characteristics that he wants to study.
d. Snowball Sampling - An initial group of respondents is selected, usually at random.
After being interviewed, these respondents are asked to identify others who
belong to the target population of interest.

1.5 Organizing Data

Data can be summarized in tabular forms and presented in pictorial form using graphs so that
important features can be grasped quickly and effectively. Some of these methods will be
discussed in the following sections.

1.5.1 Organizing and graphing qualitative data

After data is collected, it is processed, organized and presented. Charts, tables and
graphs can be used to enhance the presentation.

Some considerations in drawing charts/graphs:

a. Indicate the title
b. Draw the axes properly
c. Use proper size and scale
d. Use colors/shading if needed
e. Indicate the sources

1.5.1.1 Bar Chart

a) Single bar chart/simple bar chart
• One chart presents only one subject

b) Multiple bar chart
• One chart presents more than one subject
• Color/shading is needed

c) Component bar chart
• Each bar contains more than one piece of information
• Shading is needed

1.5.1.2 Pie Chart

A pie chart can be used to represent categorical data. It consists of one or more circles, each divided
into sectors. The sectors show the percentage of each group or category.

Example: Using the data in the table below, present:

i. Single bar chart for the year 2000
ii. Multiple bar chart for the years 2000 and 2001
iii. Component bar chart for the years 2000 and 2001
iv. Pie chart for the year 2000

Program    No. of students
           Year 2000    Year 2001
DIB           450          600
DIA          1200         1500
DBS           800         1100
DEE           300          400
DCS           650          800

i. Single bar chart for year 2000



ii. Multiple bar chart for the year 2000 and 2001

iii. Component bar chart for the year 2000 and 2001

Program    No. of students
           Year 2000    Year 2001    Total
DIB           450          600        1050
DIA          1200         1500        2700
DBS           800         1100        1900
DEE           300          400         700
DCS           650          800        1450

iv. Pie chart for the year 2000

Program    No. of students          Percentage (%)    Degrees (out of 360)
           (Year 2001 figures)
DIB            600                      14             600/4400 x 360 = 49
DIA           1500                      34             123
DBS           1100                      25              90
DEE            400                       9              33
DCS            800                      18              65
Total         4400                     100             360
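
The percentage and sector angle in each row come from the same ratio, count/total. The following Python sketch reproduces the table under that reading; the function name `pie_shares` is chosen for this example only.

```python
def pie_shares(counts):
    """Return {category: (percentage, degrees)} rounded to whole numbers."""
    total = sum(counts.values())
    return {
        name: (round(n / total * 100), round(n / total * 360))
        for name, n in counts.items()
    }

# Counts used in the pie chart table above (total 4400)
counts = {"DIB": 600, "DIA": 1500, "DBS": 1100, "DEE": 400, "DCS": 800}
shares = pie_shares(counts)

for name, (pct, deg) in shares.items():
    print(f"{name}: {pct}% -> {deg} degrees")
```

Rounded angles do not always add up to exactly 360, so it is worth checking the total before drawing the sectors.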

1.5.2 Organizing and graphing quantitative data

1.5.2.1 Frequency distribution (table)

A frequency distribution is a table that lists the data values (or classes) and their frequencies.
The frequency is the number of times a value occurs.

1.5.2.2 Terminologies of frequency distribution

i. Class limit
The end values of each class interval.
Example: 80 – 90: the lower limit is 80 and the upper limit is 90

ii. Class boundary
The value that falls halfway between the upper limit of one class and the lower
limit of the next class.

             Class interval    Class boundary
Example 1    30 – < 50         30 – 50
             50 – < 70         50 – 70
             70 – < 90         70 – 90
Example 2    30 – 49           29.5 – 49.5
             50 – 69           49.5 – 69.5
             70 – 89           69.5 – 89.5
Example 3    30 – 50           30 – 50
             50 – 70           50 – 70
             70 – 90           70 – 90

iii. Class midpoint
The middle value of a class interval, obtained by averaging the upper and lower limits or
the upper and lower boundaries.

iv. Cumulative frequency
The total frequency of a particular class and all the preceding classes.
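
To see how these terms relate, the short Python sketch below derives the class boundaries, midpoints and cumulative frequencies for the class limits 30 – 49, 50 – 69 and 70 – 89. The frequencies are hypothetical values invented for the illustration.

```python
from itertools import accumulate

limits = [(30, 49), (50, 69), (70, 89)]   # class limits (lower, upper)
freqs = [5, 8, 7]                          # hypothetical class frequencies

# Gap between the upper limit of one class and the lower limit of the next
gap = limits[1][0] - limits[0][1]          # 50 - 49 = 1

# Boundaries sit halfway across that gap
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]

# Midpoint: average of the two limits (or, equivalently, the two boundaries)
midpoints = [(lo + hi) / 2 for lo, hi in limits]

# Cumulative frequency: running total of the class frequencies
cumulative = list(accumulate(freqs))

print(boundaries)   # [(29.5, 49.5), (49.5, 69.5), (69.5, 89.5)]
print(midpoints)    # [39.5, 59.5, 79.5]
print(cumulative)   # [5, 13, 20]
```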

1.5.2.3 Graphic presentation of frequency distribution

i. Histogram
• Y-axis: class frequency
• X-axis: class boundary

The age distribution of the employees in ANZ Manufacturing is as follows.
Construct a histogram to show the distribution of the employees' ages.

ii. Frequency polygon
• Y-axis: class frequency
• X-axis: class midpoint

iii. Ogive/Cumulative frequency curve (less than/more than)
• Y-axis: cumulative frequency
• X-axis: "less than" class boundary

The distribution of parking charges collected per car at a car park facility
in Bukit Jelutong in one day is given as follows. Construct the less-than ogive.

1.6 Measures of Central Tendency


1.6.1 Mean

Ungrouped data

Mean, $\bar{x}$ = (sum of all values) / (no. of data)

For a set of data $x_1, x_2, x_3, \dots, x_n$:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example 5:
Ages of 8 lecturers: 53, 32, 61, 27, 39, 44, 49, 57. Find the mean age.

Solution 5:
$$\bar{x} = \frac{53 + 32 + 61 + 27 + 39 + 44 + 49 + 57}{8} = \frac{362}{8} = 45.25 \text{ years}$$

Meaning: On average, the age of the lecturers is 45.25 years.
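
The same calculation can be checked in a few lines of Python. This is only a sketch; `statistics.mean` from the standard library is used as a cross-check.

```python
from statistics import mean

ages = [53, 32, 61, 27, 39, 44, 49, 57]   # ages of the 8 lecturers

mean_age = sum(ages) / len(ages)          # sum of all values / no. of data
print(mean_age)                           # 45.25

# The standard-library function gives the same result
assert mean_age == mean(ages)
```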

Grouped data

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}, \quad i = 1, 2, 3, \dots, n$$

where
$x_i$ – midpoint of the ith class
$f_i$ – frequency of the ith class

Example 2:
Number of books borrowed by 50 students in December 2002 from the UiTMCTKD library.
Find the mean number of books borrowed.

No. of books borrowed    No. of students, f_i
10 – 12                     4
13 – 15                    12
16 – 18                    20
19 – 21                    14

Solution 2:
No. of books borrowed    No. of students, f_i    Midpoint, x_i    f_i x_i
10 – 12                     4                      11                44
13 – 15                    12                      14               168
16 – 18                    20                      17               340
19 – 21                    14                      20               280
                         Σf_i = 50                                Σf_i x_i = 832

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{832}{50} = 16.64 \approx 17 \text{ books}$$

Meaning: On average, the number of books borrowed is 17 books.

Example 6:
Calculate the mean for the following data:

No. of cars    No. of households
0                 5
1               250
2               170
3                60
4                12
5                 3

Solution 6:
No. of cars, x_i    No. of households, f_i    f_i x_i
0                      5
1                    250
2                    170
3                     60
4                     12
5                      3
                    Σf_i = 500              Σf_i x_i = 833

$$\bar{x} = \frac{833}{500} = 1.67 \approx 2 \text{ cars}$$

Meaning: On average, the number of cars owned per household is 2 cars.
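
The grouped-data mean is a frequency-weighted average, which the following Python sketch applies to both the book-borrowing data (using class midpoints) and the car-ownership data. The function name `grouped_mean` is an assumption for this example.

```python
def grouped_mean(x, f):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i)."""
    return sum(fi * xi for fi, xi in zip(f, x)) / sum(f)

# Books borrowed: class midpoints and frequencies
book_midpoints = [11, 14, 17, 20]
book_freqs = [4, 12, 20, 14]
books_mean = grouped_mean(book_midpoints, book_freqs)   # 832 / 50

# Cars per household: the values themselves act as x_i
cars = [0, 1, 2, 3, 4, 5]
households = [5, 250, 170, 60, 12, 3]
cars_mean = grouped_mean(cars, households)              # 833 / 500

print(round(books_mean, 2), round(cars_mean, 2))        # 16.64 1.67
```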



1.6.2 Median

The median is the value of the middle term in a data set that has been ranked in increasing order.

Ungrouped data

Step 1: Rank the data in increasing order.
Step 2: Find the position of the middle term (median position):

$$\text{Median position} = \frac{n + 1}{2}, \quad \text{where } n \text{ is the no. of data}$$

The value of the middle term is the MEDIAN.

* MEDIAN = value of the $\left(\frac{n+1}{2}\right)$th term in a ranked data set

Example 7:
Weight (in kg) of 12 students:
42, 35, 52, 64, 48, 65, 37, 44, 50, 41, 46, 60.
Find the median weight.

Solution 7:
Step 1: Rank the given data: 35, 37, 41, 42, 44, 46, 48, 50, 52, 60, 64, 65 (n = 12)
Step 2: Position of the middle term = (12 + 1)/2 = 6.5
Therefore the MEDIAN is the value of the 6.5th term in the ranked data, halfway between the 6th term (46) and the 7th term (48).
Median, $\tilde{x}$ = 46 + 0.5(48 – 46)
= 47 kg

Meaning: Half (50%) of the students weigh less than 47 kg and the other half weigh more.

Example 8:
Age data of 8 lecturers;
53, 32, 61, 27, 39, 44, 49, 57.
Find the median age.

Solution 8:
Step 1: 27, 32, 39, 44, 49, 53, 57, 61 (n = 8)
Step 2: Median position = (8 + 1)/2 = 4.5
Therefore the median is the mean of the 4th and the 5th terms.
Median, $\tilde{x}$ = (44 + 49)/2 = 46.5 years
Meaning: Half (50%) of the lecturers are younger than 46.5 years and the other half are older.
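
The two-step procedure can be sketched in Python as follows. The function name `median_by_position` is an assumption for this example; it ranks the data, finds the (n + 1)/2 position and interpolates when the position falls between two terms.

```python
from statistics import median

def median_by_position(values):
    """Median via the (n + 1)/2 position in the ranked data."""
    ranked = sorted(values)                 # Step 1: rank the data
    pos = (len(ranked) + 1) / 2             # Step 2: 1-based median position
    whole = int(pos)
    lower = ranked[whole - 1]               # term at the whole-number position
    if pos == whole:                        # odd n: the middle term itself
        return lower
    upper = ranked[whole]                   # even n: go halfway to the next term
    return lower + (pos - whole) * (upper - lower)

weights = [42, 35, 52, 64, 48, 65, 37, 44, 50, 41, 46, 60]
ages = [53, 32, 61, 27, 39, 44, 49, 57]

print(median_by_position(weights))   # 47.0
print(median_by_position(ages))      # 46.5

# Cross-check against the standard library
assert median_by_position(weights) == median(weights)
assert median_by_position(ages) == median(ages)
```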

Grouped data

By formula:

Step 1: Find the median position and the median class.
Step 2: Find the median by using the formula

$$\tilde{x} = L_m + \left( \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right) C_m$$

$L_m$ - lower boundary of the median class
$f_m$ - frequency of the median class
$\sum f_{m-1}$ - cumulative frequency before the median class
$C_m$ - size (width) of the median class
$\sum f$ - number of observations / total frequency

Example 9:
Number of books borrowed by 50 students in December 2002 from the UiTMCTKD library.
Find the median number of books borrowed.

No. of books borrowed    No. of students, f_i
10 – 12                     4
13 – 15                    12
16 – 18                    20
19 – 21                    14

Solution 9:
No. of books borrowed    No. of students, f_i    Class boundary    Cumulative frequency
10 – 12                     4
13 – 15                    12
16 – 18                    20
19 – 21                    14

Step 1: Median position = $\frac{n}{2} = \frac{\sum f}{2}$ =
Therefore the median class =

Step 2: $L_m$ =
$f_m$ =
$\sum f_{m-1}$ =
$C_m$ =
$\sum f$ =

$$\tilde{x} = L_m + \left( \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right) C_m = 15.5 + \left( \frac{\frac{50}{2} - 16}{20} \right) \times 3 = 16.85 \approx 17 \text{ books}$$

Meaning:
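
The grouped-median formula can be sketched in Python as below. The function name `grouped_median` and the way the median class is located (the first class whose cumulative frequency reaches Σf/2) are assumptions consistent with the worked values above.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data: L_m + ((sum_f/2 - cum_before) / f_m) * C_m."""
    half = sum(freqs) / 2                   # median position, sum_f / 2
    cum_before = 0                          # cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        if cum_before + f >= half:          # this is the median class
            width = upper - lower           # C_m, size of the median class
            return lower + (half - cum_before) / f * width
        cum_before += f

# Books borrowed: class boundaries and frequencies
boundaries = [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5)]
freqs = [4, 12, 20, 14]

books_median = grouped_median(boundaries, freqs)
print(round(books_median, 2))               # 16.85
```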

Example 10:
Calculate the median for the following data:

No. of cars    No. of households
0                 5
1               250
2               170
3                60
4                12
5                 3

Solution 10:
No. of cars, x_i    No. of households, f_i    Cumulative frequency
0                      5
1                    250
2                    170
3                     60
4                     12
5                      3

Step 1: Median position = $\frac{n}{2} = \frac{\sum f}{2}$ =
Therefore, median, $\tilde{x}$ =

Meaning:

Example 11:
Find the median for the following data by using an ogive:

Investment (RM)    No. of students
1 – 100               5
101 – 200             8
201 – 300            10
301 – 400            12
401 – 500             3
501 – 600             2

Solution 11:
Investment (RM)    Class boundary    No. of students, f_i    Cumulative frequency
1 – 100
101 – 200
201 – 300
301 – 400
401 – 500
501 – 600

Step 1: Median position = $\frac{n}{2} = \frac{\sum f}{2}$ =
Therefore the median class =

Step 2: $L_m$ =
$f_m$ =
$\sum f_{m-1}$ =
$C_m$ =
$\sum f$ =

$$\tilde{x} = L_m + \left( \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right) C_m$$
=

Meaning:

1.6.3 Mode

The mode is the value that occurs with the highest frequency in a data set.

Ungrouped data

Example 12:
Find the mode for the following data.

i. Speeds (in miles per hour): 77, 69, 74, 81, 71, 68, 74, 73
ii. Ages of 8 lecturers: 53, 32, 61, 27, 39, 44, 49, 57
iii. Incomes of seven randomly selected families: RM65000, RM66000, RM70000,
RM78000, RM70000, RM65000, RM75000

Solution 12:

i. Mode, $\hat{x}$ = 74 miles per hour

ii. Mode, $\hat{x}$ =

iii. Mode, $\hat{x}$ =

Grouped data

By formula:
Step 1: Find the highest frequency and the modal class.
Step 2: Find the mode by using the following formula:

$$\hat{x} = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) C_{mo}$$

$L_{mo}$ : lower boundary of the modal class
$d_1$ : difference between the modal class frequency and the previous class
frequency
$d_2$ : difference between the modal class frequency and the next class
frequency
$C_{mo}$ : size (width) of the modal class

Example 13:
Number of books borrowed by 50 students in December 2002 from the UiTMCTKD library.
Find the mode of the number of books borrowed.

No. of books borrowed    No. of students, f_i
10 – 12                     4
13 – 15                    12
16 – 18                    20
19 – 21                    14

Solution 13:
No. of books borrowed    Class boundary    No. of students, f_i
10 – 12
13 – 15
16 – 18
19 – 21

Step 1: Highest frequency = 20.
Therefore the modal class = 16 – 18

Step 2: $L_{mo}$ = 15.5
$d_1$ = 20 – 12 = 8
$d_2$ = 20 – 14 = 6
$C_{mo}$ = 18.5 – 15.5 = 3

$$\hat{x} = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) C_{mo} = 15.5 + \left( \frac{8}{8 + 6} \right) \times 3 = 17.21 \approx 17 \text{ books}$$

Meaning: Most of the students borrowed 17 books.
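
A minimal Python sketch of the grouped-mode formula follows. The function name `grouped_mode` is an assumption, as is the choice to treat a missing neighbouring class (when the modal class is the first or last one) as having frequency 0.

```python
def grouped_mode(boundaries, freqs):
    """Mode of grouped data: L_mo + (d1 / (d1 + d2)) * C_mo."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])      # modal class index
    prev_f = freqs[i - 1] if i > 0 else 0                   # assumption: 0 if no class before
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0      # assumption: 0 if no class after
    d1 = freqs[i] - prev_f
    d2 = freqs[i] - next_f
    lower, upper = boundaries[i]
    return lower + d1 / (d1 + d2) * (upper - lower)

# Books borrowed: class boundaries and frequencies
boundaries = [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5)]
freqs = [4, 12, 20, 14]

books_mode = grouped_mode(boundaries, freqs)
print(round(books_mode, 2))                                 # 17.21
```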


Example 14:
Calculate the mode for the following data:

No. of cars    No. of households
0                 5
1               250
2               170
3                60
4                12
5                 3

Solution 14:
Highest frequency =
Therefore the mode =

Meaning:

Example 15:
Find the mode for the data below by using a histogram:

Investment (RM)    No. of students    Class boundary
0 – <100              5
100 – <200            8
200 – <300           10
300 – <400           12
400 – <500            3
500 – <600            2

1.7 Measures of Dispersion

1.7.1 Variance

The variance approximates the average squared deviation of each data value (measurement) from the mean.
Deviation: the difference between each data value and the mean, $(x_i - \bar{x})$

Ungrouped data

Sample variance,
$$s^2 = \frac{1}{n - 1} \left[ \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n} \right]$$

Grouped data

Sample variance,
$$s^2 = \frac{1}{n - 1} \left[ \sum f_i x_i^2 - \frac{\left( \sum f_i x_i \right)^2}{n} \right]$$

where $n = \sum f_i$ = sample size

1.7.2 Standard Deviation

Standard deviation, $s = \sqrt{\text{variance}}$

Ungrouped data

$$s = \sqrt{ \frac{1}{n - 1} \left[ \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n} \right] }$$

Grouped data

$$s = \sqrt{ \frac{1}{n - 1} \left[ \sum f_i x_i^2 - \frac{\left( \sum f_i x_i \right)^2}{n} \right] }$$

Example 16:
Calculate the sample variance and standard deviation for the following data;
Age data of 8 lecturers; 53, 32, 61, 27, 39, 44, 49, 57

Solution 16:
x_i      x_i^2
53       2809
32       1024
61       3721
27        729
39       1521
44       1936
49       2401
57       3249
Σx_i = 362    Σx_i^2 = 17390

$$s^2 = \frac{1}{n - 1} \left[ \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n} \right] = \frac{1}{8 - 1} \left[ 17390 - \frac{362^2}{8} \right] = 144.21 \text{ years}^2$$

$$s = \sqrt{144.21} = 12.01 \text{ years}$$

Meaning: On average, the ages of the lecturers deviate by as much as _______ years from
the mean of ________ years.
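
The computational formula for the sample variance can be checked with a short Python sketch. Variable names are chosen for this example; `statistics.variance` (which also divides by n – 1) is used as a cross-check.

```python
import math
from statistics import variance

ages = [53, 32, 61, 27, 39, 44, 49, 57]

n = len(ages)
sum_x = sum(ages)                       # 362
sum_x2 = sum(x * x for x in ages)       # 17390

# s^2 = 1/(n-1) * [ sum(x^2) - (sum(x))^2 / n ]
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(round(s2, 2), round(s, 2))        # 144.21 12.01

# The standard library's sample variance agrees
assert abs(s2 - variance(ages)) < 1e-9
```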

Example 17:
Number of books borrowed by 50 students in December 2002 from the UiTMT library. Find
the variance and standard deviation.

No. of books borrowed    No. of students, f_i
10 – 12                     4
13 – 15                    12
16 – 18                    20
19 – 21                    14

Solution 17:
No. of books borrowed    No. of students, f_i    Midpoint, x_i    x_i^2    f_i x_i    f_i x_i^2
10 – 12                     4
13 – 15                    12
16 – 18                    20
19 – 21                    14
                         Σf_i = 50                                Σf_i x_i = 832    Σf_i x_i^2 = 14216

$$s^2 = \frac{1}{n - 1} \left[ \sum f_i x_i^2 - \frac{\left( \sum f_i x_i \right)^2}{n} \right] = \frac{1}{50 - 1} \left[ 14216 - \frac{832^2}{50} \right] = 7.58 \text{ books}^2$$

$$s = \sqrt{7.58} = 2.75 \approx 3 \text{ books}$$

Meaning: On the average,
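
For grouped data the same computational formula is applied with frequency weights. The sketch below uses the class midpoints of the book-borrowing data; the function name `grouped_variance` is an assumption for this example.

```python
import math

def grouped_variance(x, f):
    """Sample variance of grouped data, with n = sum of frequencies."""
    n = sum(f)
    sum_fx = sum(fi * xi for fi, xi in zip(f, x))
    sum_fx2 = sum(fi * xi ** 2 for fi, xi in zip(f, x))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

midpoints = [11, 14, 17, 20]
freqs = [4, 12, 20, 14]

books_var = grouped_variance(midpoints, freqs)
books_sd = math.sqrt(books_var)

print(round(books_var, 2), round(books_sd, 2))   # 7.58 2.75
```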

Example 18:
Calculate the variance and standard deviation for the following data:

No. of car No. of household


0 5
1 250
2 170
3 60
4 12
5 3

Solution 18:
No. of cars, x_i    No. of households, f_i    x_i^2    f_i x_i    f_i x_i^2
0                      5
1                    250
2                    170
3                     60
4                     12
5                      3
                    Σf_i = 500                Σf_i x_i = 833    Σf_i x_i^2 = 1737

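$$s^2 = \frac{1}{n - 1} \left[ \sum f_i x_i^2 - \frac{\left( \sum f_i x_i \right)^2}{n} \right] = \frac{1}{500 - 1} \left[ 1737 - \frac{833^2}{500} \right] = 0.70 \text{ cars}^2$$

$$s = \sqrt{0.70} = 0.84 \approx 1 \text{ car}$$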

Meaning:

1.8 Coefficient of Variation

$$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\% = \frac{s}{\bar{x}} \times 100\%$$

* The larger the percentage, the greater the variation.
* Large variation implies less consistency; small variation implies better consistency.

Example 19: Refer to Example 17.
Find the coefficient of variation.

Solution 19:
Mean, $\bar{x}$ = 16.64 books
Standard deviation, s = 2.75 books

$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{2.75}{16.64} \times 100\% = 16.53\%$$

Meaning: The no. of books borrowed by the 50 students deviates by approximately _______ from the
mean.
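
Because the coefficient of variation is unit-free, it allows data sets measured in different units to be compared. The Python sketch below computes it for the book-borrowing data and, for comparison, for the car-ownership data using the rounded mean (1.67) and standard deviation (0.84) found earlier; the function name is an assumption for this example.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) x 100%."""
    return sd / mean * 100

cv_books = coefficient_of_variation(2.75, 16.64)
cv_cars = coefficient_of_variation(0.84, 1.67)

print(round(cv_books, 2))    # 16.53
print(cv_cars > cv_books)    # True: car ownership varies more relative to its mean
```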

1.9 Measures of Skewness

1.9.1 Shape of distribution

Symmetric (normal) distribution

$$\bar{x} = \tilde{x} = \hat{x}$$

Positively skewed / skewed to the right

$$\hat{x} < \tilde{x} < \bar{x}$$

Negatively skewed / skewed to the left

$$\bar{x} < \tilde{x} < \hat{x}$$

1.9.2 Determining the skewness

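Pearson's coefficient of skewness, $P_{cs}$:

$$P_{cs} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{\bar{x} - \hat{x}}{s} \quad \text{or} \quad P_{cs} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} = \frac{3(\bar{x} - \tilde{x})}{s}$$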

Example 20:
The table below shows the number of books borrowed by 50 students in December 2002
from the UiTMT library.

No. of books borrowed    No. of students, f_i
10 – 12                     4
13 – 15                    12
16 – 18                    20
19 – 21                    14

Determine the shape of the distribution for the data.

Solution 20:
Mean, $\bar{x}$ = 16.64 books
Median, $\tilde{x}$ = 16.85 books
Standard deviation, s = 2.75 books

$$P_{cs} = \frac{3(\bar{x} - \tilde{x})}{s} = \frac{3(16.64 - 16.85)}{2.75} = -0.23$$

Meaning: Since the coefficient is negative, the distribution is negatively skewed / skewed to the left.
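
The median-based version of Pearson's coefficient and the sign rule for the shape can be sketched in Python as follows. The function names are assumptions for this example, and a coefficient of exactly zero is taken to indicate a symmetric distribution.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's coefficient of skewness: 3(mean - median) / sd."""
    return 3 * (mean - median) / sd

def describe_shape(pcs):
    """Classify the shape from the sign of the coefficient."""
    if pcs > 0:
        return "positively skewed (skewed to the right)"
    if pcs < 0:
        return "negatively skewed (skewed to the left)"
    return "symmetric"

pcs = pearson_skewness(16.64, 16.85, 2.75)
print(round(pcs, 2))          # -0.23
print(describe_shape(pcs))    # negatively skewed (skewed to the left)
```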

Exercise 1:

Given below is the frequency distribution of the amount of money spent weekly by some
students in UiTM Dungun.

Amount spent (RM)        No. of students
less than 95                6
95 and less than 110       12
110 and less than 125      22
125 and less than 140      15
140 and less than 155       5
155 and less than 170       7
more than 170               3

i. Calculate the mean and standard deviation for the above distribution.
ii. Calculate the modal value and explain its meaning.
iii. Draw a less-than ogive for the above data and then estimate the value of the median.
iv. A similar survey on some students in UiTM Seri Iskandar showed that the mean and
variance of the amount spent weekly are RM 125.85 and RM 676. Determine which group of
students shows more stability in the amount spent weekly.
