Sta416 Chapter 1
Descriptive statistics is designed to describe the data and go no further; that is, it does not
attempt to infer or conclude anything that goes beyond the data themselves.
2. Quantitative variable – a variable whose values come in terms of numbers (numerical
values).
Example: number of students, total income, distance travelled, test mark, etc.
Exercise 1:
1. Determine whether you would use descriptive statistics or inferential statistics in each of
the following situations.
a. A trainer wants to determine the minimum time taken by his swimmers to swim 100 m.
b. An economist uses a bar chart to illustrate the losses made by an airline company from 1990
to 2000.
c. A few botanists conduct research on the relation between durian production and the use of
cow manure as fertilizer.
d. Psychologists study whether urban students are higher achievers than
suburban students.
e. Dewan Bandaraya Kuala Lumpur formed a committee to investigate the relation between the
occurrence of flash floods and the amount of rubbish in Sungai Gombak and Sungai Kelang.
Generally, there are six methods of data collection that can be used to collect
primary data. They are:
i. Personal interview
The researcher talks to the respondent face to face.
iii. Mailing
A questionnaire is sent to each respondent with a stamped
addressed envelope attached.
v. Direct Questionnaire
The researcher gives the questionnaire directly to the respondent and waits for
them to complete it.
1.4.1 Random Sampling (Probability Sampling) – Every element in the population has an equal
chance of being selected for the sample.
a. Simple Random Sampling (SRS) - The sample is selected at random by one of two
methods: the lucky draw method or random numbers.
b. Systematic Random Sampling (SYRS) - The sample is selected by
choosing every kth element in the population.
c. Stratified Random Sampling - Suitable when the population can be
categorized. Elements are chosen from each of the categories so that the
sample represents the population.
d. Cluster Sampling - This technique is also suitable for a categorized population.
However, the sample is formed by taking all elements in the
selected categories only.
e. Multi-Stage Sampling - The sample is selected in stages. This technique is
suitable for a large population.
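The first three random sampling techniques above can be sketched in Python as follows. This is only an illustrative sketch: the population of 100 student IDs, the two strata and the function names are assumptions made for the example, and the systematic sampler assumes the sample size does not exceed the population size.

```python
import random

def simple_random_sample(population, n, seed=None):
    """SRS: draw n distinct elements, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """SYRS: random start among the first k elements, then every kth element."""
    rng = random.Random(seed)
    k = len(population) // n           # sampling interval (assumes n <= len(population))
    start = rng.randrange(k)           # random start index in 0..k-1
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=None):
    """Stratified: draw the same fraction at random from every category."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

students = list(range(1, 101))                               # hypothetical student IDs
strata = {"group A": students[:60], "group B": students[60:]}  # hypothetical categories

print(simple_random_sample(students, 10, seed=1))
print(systematic_sample(students, 10, seed=1))
print(stratified_sample(strata, 0.1, seed=1))
```

Passing a seed only makes the draws repeatable for checking; in practice the seed would normally be left out.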
UiTM Terengganu STA416
1.4.2 Non-random Sampling (Non-probability Sampling) – Not all elements in the population
have an equal chance of being selected for the sample.
a. Quota Sampling - The researcher has the flexibility to choose whomever he wants
as long as the specifications set are met.
b. Convenience Sampling - The researcher has the flexibility to select anybody that
he wants or meets until the required sample is obtained.
c. Judgmental Sampling - The researcher selects respondents who he thinks have the
characteristics that he wants to study.
d. Snowball Sampling - An initial group of respondents is selected, usually at random.
After being interviewed, these respondents are asked to identify others who
belong to the target population of interest.
Data can be summarized in tabular forms and presented in pictorial form using graphs so that
important features can be grasped quickly and effectively. Some of these methods will be
discussed in the following sections.
After data are collected, they are processed, organized and presented. Charts, tables and
graphs can be used to enhance the presentation.
A pie chart can be used to represent categorical data. It consists of one or more circles divided
into sectors. The sectors show the percentage of each group or category.
ii. Multiple bar chart for the year 2000 and 2001
iii. Component bar chart for the year 2000 and 2001
A frequency distribution is a table that lists the data values and their frequencies.
Frequency is the number of times a value occurs.
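As a small illustration of this definition, a frequency distribution can be tallied in Python with `collections.Counter`. The list of marks below is hypothetical data made up for the example.

```python
from collections import Counter

marks = [3, 5, 4, 3, 5, 5, 2, 4, 5, 3]   # hypothetical test marks

# Counter maps each data value to the number of times it occurs.
freq = Counter(marks)

print("value  frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}")
```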
i. Class limit
The end values of each class interval.
Example: for the class 80 – 90, the upper limit is 90 and the lower limit is 80.
i. Histogram
Y – axis: class frequency
X – axis: class boundary
The distribution of parking charges collected per car at a car park facility
in Bukit Jelutong in one day is given as follows. Construct the less-than ogive.
Ungrouped data
Mean, $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
Example 5:
Age of 8 lecturers; 53, 32, 61, 27, 39, 44, 49, 57. Find the mean age.
Solution 5:
Mean age, $\bar{x} = \frac{53 + 32 + 61 + 27 + 39 + 44 + 49 + 57}{8} = \frac{362}{8} = 45.25$ years
Meaning: On average, the age of the lecturers is 45.25 years.
Meaning:
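The calculation in Example 5 can be checked with a few lines of Python. This is only a sketch; the variable names are chosen for illustration.

```python
from statistics import mean

ages = [53, 32, 61, 27, 39, 44, 49, 57]   # ages of the 8 lecturers (Example 5)

# Mean = sum of the observations divided by the number of observations.
mean_age = sum(ages) / len(ages)

print(sum(ages), len(ages), mean_age)   # 362 8 45.25
```

The library function `statistics.mean(ages)` gives the same value.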
Grouped data
Mean, $\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$ ; where i = 1, 2, 3, 4, ..., n
$x_i$ – midpoint of the ith class
$f_i$ – frequency of the ith class
Example 2:
Number of books borrowed by 50 students in December 2002 from the UiTMCTKD library.
Find the mean number of books borrowed.
Solution 2:
No. of books borrowed   No. of students, $f_i$   Midpoint, $x_i$   $f_i x_i$
10 – 12                 4                        11                44
13 – 15                 12                       14                168
16 – 18                 20                       17                340
19 – 21                 14                       20                280
                        $\sum f_i = 50$                            $\sum f_i x_i = 832$
Mean, $\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{832}{50} = 16.64$ books
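The grouped-data mean for the book-borrowing table can be reproduced with the short Python sketch below. The tuple layout (lower limit, upper limit, frequency) is simply a convenient assumption for the example.

```python
# (lower limit, upper limit, frequency) for each class in Example 2
classes = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]   # x_i = class midpoint
freqs = [f for _, _, f in classes]                     # f_i = class frequency

sum_f = sum(freqs)                                       # sum of f_i
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))    # sum of f_i * x_i
grouped_mean = sum_fx / sum_f

print(sum_f, sum_fx, round(grouped_mean, 2))   # 50 832.0 16.64
```

The same code also works for a discrete variable if each value is entered as a class whose lower and upper limits are equal.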
Example 6:
Calculate the mean for the following data:
Solution 6:
No. of cars, $x_i$   No. of households, $f_i$   $f_i x_i$
0                    5
1                    250
2                    170
3                    60
4                    12
5                    3
                     $\sum f_i = 500$           $\sum f_i x_i = 833$
1.6.2 Median
Value of the middle term in a set of data that has been ranked in increasing order.
Ungrouped data
* MEDIAN = value of the $\left(\frac{n+1}{2}\right)$th term in a ranked data set
Example 7:
Weight (in kg) of 12 students:
42, 35, 52, 64, 48, 65, 37, 44, 50, 41, 46, 60.
Find the median weight.
Solution 7:
Step 1: Rank the given data: 35, 37, 41, 42, 44, 46, 48, 50, 52, 60, 64, 65 (n = 12)
Step 2: Position of the middle term = $\frac{12 + 1}{2} = 6.5$
Therefore the MEDIAN is the value of the 6.5th term in the ranked data, which lies halfway
between the 6th term (46) and the 7th term (48).
Median, $\tilde{x} = 46 + 0.5(48 - 46)$
= 47 kg
Meaning: Half (50%) of the students weigh more than 47 kg and half weigh less.
Example 8:
Age data of 8 lecturers;
53, 32, 61, 27, 39, 44, 49, 57.
Find the median age.
Solution 8:
Step 1: 27, 32, 39, 44, 49, 53, 57, 61 (n = 8)
Step 2: Median position = $\frac{8 + 1}{2} = 4.5$
Therefore the median is given by the mean of the 4th and the 5th terms.
Median, $\tilde{x} = \frac{44 + 49}{2} = 46.5$ years
Meaning: Half (50%) of the lecturers are older than 46.5 years and half are younger.
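The two-step procedure used in Examples 7 and 8 (rank the data, then locate the (n + 1)/2 th term) can be sketched in Python as follows. The function name is an assumption made for illustration; `statistics.median` is imported only to cross-check the result.

```python
from statistics import median

def median_by_position(data):
    """Median as the value of the (n + 1)/2 th term of the ranked data."""
    ranked = sorted(data)                 # Step 1: rank the data
    n = len(ranked)
    position = (n + 1) / 2                # Step 2: 1-based position of the middle term
    lower = ranked[int(position) - 1]     # term at the whole-number part of the position
    if position.is_integer():
        return lower                      # odd n: the middle term itself
    upper = ranked[int(position)]         # even n: next term up
    return lower + 0.5 * (upper - lower)  # halfway between the two middle terms

weights = [42, 35, 52, 64, 48, 65, 37, 44, 50, 41, 46, 60]   # Example 7
ages = [53, 32, 61, 27, 39, 44, 49, 57]                      # Example 8

print(median_by_position(weights))   # 47.0
print(median_by_position(ages))      # 46.5
```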
Grouped data
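By formula:
Median, $\tilde{x} = L_m + \left( \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right) C_m$
$L_m$ - lower boundary of the median class
$f_m$ - frequency of the median class
$\sum f_{m-1}$ - cumulative frequency before the median class
$C_m$ - size of the median class
$\sum f$ - number of observations / total frequency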
Example 9:
Number of books borrowed by 50 students in December 2002 from the UiTMCTKD library.
Find the median number of books borrowed.
Solution 9:
No. of books borrowed   No. of students, $f_i$   Class boundary   Cumulative frequency
10 – 12                 4
13 – 15                 12
16 – 18                 20
19 – 21                 14
Step 2: $L_m$ =
$f_m$ =
$\sum f_{m-1}$ =
$C_m$ =
$\sum f$ =
Median, $\tilde{x} = L_m + \left( \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right) C_m$
= $15.5 + \left( \frac{\frac{50}{2} - 16}{20} \right) \times 3$
= 16.85 books ≈ 17 books
Meaning:
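The grouped-data median calculation in Example 9 can be reproduced with the Python sketch below. The function name is hypothetical, and the class boundaries (9.5, 12.5, ...) are assumed to lie halfway between adjacent class limits, which agrees with the lower boundary 15.5 used in the solution.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data by interpolation inside the median class.

    boundaries: list of (lower boundary, upper boundary), one pair per class
    freqs:      list of class frequencies
    """
    half = sum(freqs) / 2                 # (sum of f) / 2
    cumulative = 0                        # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= half:        # first class whose cumulative frequency reaches half
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequency table is empty")

# Book-borrowing data (Example 9); boundaries are an assumption, see above.
boundaries = [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5)]
freqs = [4, 12, 20, 14]

print(round(grouped_median(boundaries, freqs), 2))   # 16.85
```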
Example 10:
Calculate the median for the following data:
Solution 10:
No. of cars, $x_i$   No. of households, $f_i$   Cumulative frequency
0                    5
1                    250
2                    170
3                    60
4                    12
5                    3
Meaning:
Example 11:
Find the median for the above data by using an ogive:
Solution 11:
Investment (RM)   Class boundary   No. of students, $f_i$   Cumulative frequency
1 – 100
101 – 200
201 – 300
301 – 400
401 – 500
501 – 600
Step 2: $L_m$ =
$f_m$ =
$\sum f_{m-1}$ =
$C_m$ =
$\sum f$ =
Median, $\tilde{x} = L_m + \left( \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right) C_m$
=
Meaning:
1.6.3 Mode
Ungrouped data
Example 12:
Find the mode for each of the following data sets.
i. Speeds (in miles per hour): 77, 69, 74, 81, 71, 68, 74, 73
ii. Age data of 8 lecturers: 53, 32, 61, 27, 39, 44, 49, 57
iii. Incomes of seven randomly selected families: RM65000, RM66000, RM70000,
RM78000, RM70000, RM65000, RM75000
Solution 12:
ii. Mode, x̂ =
iii. Mode, x̂ =
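One way to check answers for ungrouped data is to count how often each value occurs and keep the value or values with the highest count. The sketch below is only illustrative; the function name is hypothetical, and it assumes the convention that a data set in which every value occurs once has no mode.

```python
from collections import Counter

def modes(data):
    """Return the list of modes; an empty list means the data have no mode."""
    counts = Counter(data)                # value -> number of occurrences
    highest = max(counts.values())
    if highest == 1:
        return []                         # every value occurs once: no mode
    return [value for value, c in counts.items() if c == highest]

speeds = [77, 69, 74, 81, 71, 68, 74, 73]
ages = [53, 32, 61, 27, 39, 44, 49, 57]
incomes = [65000, 66000, 70000, 78000, 70000, 65000, 75000]

print(modes(speeds))    # [74]
print(modes(ages))      # []
print(modes(incomes))   # [65000, 70000]
```

A data set with two modes, as in the third list, is called bimodal.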
Grouped data
By formula:
Step 1: Find the highest frequency and the modal class.
Step 2: Find the mode by using the following formula:
Mode, $\hat{x} = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) C_{mo}$
$L_{mo}$ : lower boundary of the modal class
$d_1$ : difference between the modal class frequency and the previous class
frequency
$d_2$ : difference between the modal class frequency and the next class
frequency
$C_{mo}$ : size of the modal class
Example 13:
Number of books borrowed by 50 students in December 2002 from the UiTMCTKD library.
Find the mode of the number of books borrowed.
Solution 13:
No. of books borrowed   Class boundary   No. of students, $f_i$
10 – 12
13 – 15
16 – 18
19 – 21
Mode, $\hat{x} = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) C_{mo}$
= $15.5 + \left( \frac{8}{8 + 6} \right) \times 3$
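The two steps for the grouped-data mode can be sketched in Python as below, using the book-borrowing frequencies (4, 12, 20, 14). The function name is hypothetical, the class boundaries are assumed to lie halfway between adjacent class limits, and a missing neighbouring class is treated as having frequency 0, which is an assumption not stated in these notes.

```python
def grouped_mode(boundaries, freqs):
    """Mode of grouped data: L_mo + d1 / (d1 + d2) * C_mo."""
    m = max(range(len(freqs)), key=freqs.__getitem__)        # Step 1: index of the modal class
    prev_f = freqs[m - 1] if m > 0 else 0                    # assumed 0 if there is no previous class
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0       # assumed 0 if there is no next class
    d1 = freqs[m] - prev_f
    d2 = freqs[m] - next_f
    lower, upper = boundaries[m]
    return lower + d1 / (d1 + d2) * (upper - lower)          # Step 2: apply the formula

boundaries = [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5)]
freqs = [4, 12, 20, 14]

print(round(grouped_mode(boundaries, freqs), 2))   # 17.21
```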
Solution 14:
Highest frequency =
Therefore the mode =
Meaning:
Example 15:
Find the mode for the data below by using a histogram:
1.7.1 Variance
Variance approximates the average squared deviation of each data value (measurement) from the mean.
Deviation: the difference between each data value and the mean, $(x_i - \bar{x})$
Ungrouped data
Sample variance, $s^2 = \frac{1}{n-1} \left[ \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n} \right]$
Grouped data
Sample variance, $s^2 = \frac{1}{n-1} \left[ \sum f_i x_i^2 - \frac{\left( \sum f_i x_i \right)^2}{n} \right]$
where $n = \sum f$ = sample size
1.7.2 Standard Deviation
Ungrouped data
$s = \sqrt{ \frac{1}{n-1} \left[ \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n} \right] }$
Grouped data
$s = \sqrt{ \frac{1}{n-1} \left[ \sum f_i x_i^2 - \frac{\left( \sum f_i x_i \right)^2}{n} \right] }$
Example 16:
Calculate the sample variance and standard deviation for the following data;
Age data of 8 lecturers; 53, 32, 61, 27, 39, 44, 49, 57
Solution 16:
$x_i$              $x_i^2$
53                 2809
32                 1024
61                 3721
27                 729
39                 1521
44                 1936
49                 2401
57                 3249
$\sum x_i = 362$   $\sum x_i^2 = 17390$
$s^2 = \frac{1}{n-1} \left[ \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n} \right]$
= $\frac{1}{8-1} \left[ 17390 - \frac{(362)^2}{8} \right]$
= 144.21 years²
Meaning: On average, the ages of the lecturers deviate by as much as _______ years from
their mean of ________ years.
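The computational formula for the sample variance and standard deviation of ungrouped data can be checked in Python as follows. This is a sketch with illustrative variable names; `statistics.variance` is imported only as a cross-check.

```python
from math import sqrt
from statistics import variance

ages = [53, 32, 61, 27, 39, 44, 49, 57]   # Example 16

n = len(ages)
sum_x = sum(ages)                          # sum of x_i
sum_x2 = sum(x * x for x in ages)          # sum of x_i squared

# s^2 = [ sum(x^2) - (sum x)^2 / n ] / (n - 1)
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = sqrt(s2)                               # standard deviation = square root of variance

print(sum_x, sum_x2, round(s2, 2), round(s, 2))   # 362 17390 144.21 12.01
```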
Example 17:
Number of books borrowed by 50 students in December 2002 from the UiTMT library. Find
the variance and standard deviation.
Solution 17:
No. of books borrowed   No. of students, $f_i$   Midpoint, $x_i$   $x_i^2$   $f_i x_i$              $f_i x_i^2$
10 – 12                 4
13 – 15                 12
16 – 18                 20
19 – 21                 14
                        $\sum f_i = 50$                                      $\sum f_i x_i = 832$   $\sum f_i x_i^2 = 14216$
$s^2 = \frac{1}{n-1} \left[ \sum f_i x_i^2 - \frac{\left( \sum f_i x_i \right)^2}{n} \right]$
= $\frac{1}{50-1} \left[ 14216 - \frac{(832)^2}{50} \right]$
= 7.58 books²
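The grouped-data variance for the book-borrowing data can be reproduced with the Python sketch below. The function name and its return layout are assumptions made for illustration.

```python
from math import sqrt

def grouped_variance(values, freqs):
    """Sample variance for grouped data from midpoints (or values) and frequencies."""
    n = sum(freqs)                                             # n = sum of f
    sum_fx = sum(f * x for f, x in zip(freqs, values))         # sum of f_i * x_i
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, values))    # sum of f_i * x_i^2
    s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return n, sum_fx, sum_fx2, s2

midpoints = [11, 14, 17, 20]      # class midpoints for 10-12, 13-15, 16-18, 19-21
freqs = [4, 12, 20, 14]

n, sum_fx, sum_fx2, s2 = grouped_variance(midpoints, freqs)
print(n, sum_fx, sum_fx2, round(s2, 2), round(sqrt(s2), 2))   # 50 832 14216 7.58 2.75
```

For a discrete variable, the values themselves are passed in place of the class midpoints.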
Example 18:
Calculate the variance and standard deviation for the following data:
Solution 18:
No. of cars, $x_i$   No. of households, $f_i$   $x_i^2$   $f_i x_i$              $f_i x_i^2$
0                    5
1                    250
2                    170
3                    60
4                    12
5                    3
                     $\sum f_i = 500$                     $\sum f_i x_i = 833$   $\sum f_i x_i^2 = 1737$
$s^2 = \frac{1}{n-1} \left[ \sum f_i x_i^2 - \frac{\left( \sum f_i x_i \right)^2}{n} \right]$
= $\frac{1}{500-1} \left[ 1737 - \frac{(833)^2}{500} \right]$
= 0.7 cars²
Meaning:
Solution 19:
Mean, x = 16.64 books
Standard deviation, s = 2.75 books
Coefficient of variation, CV = $\frac{s}{\bar{x}} \times 100\%$
= $\frac{2.75}{16.64} \times 100\%$
= 16.53%
Meaning: The number of books borrowed by the 50 students deviated approximately _______ from its
mean.
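The coefficient of variation is a one-line calculation, sketched here in Python with the values from Solution 19. The function name is hypothetical.

```python
def coefficient_of_variation(mean, sd):
    """CV = (standard deviation / mean) x 100, expressed as a percentage."""
    return sd / mean * 100

cv_books = coefficient_of_variation(16.64, 2.75)   # mean and s from Solution 19
print(round(cv_books, 2))   # 16.53
```

Because the CV has no units, it can be used to compare the relative spread of two data sets; the data set with the smaller CV is the more consistent one.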
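[Figure: distribution curves showing the relative positions of the mean $\bar{x}$, median $\tilde{x}$ and mode $\hat{x}$]
Pearson's coefficient of skewness, Pcs = $\frac{\bar{x} - \hat{x}}{s}$ or $\frac{3(\bar{x} - \tilde{x})}{s}$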
Example 20:
The table below shows the number of books borrowed by 50 students in December 2002
from the UiTMT library.
Solution 20:
Mean, $\bar{x}$ = 16.64 books
Median, $\tilde{x}$ = 16.85 books
Standard deviation, s = 2.75 books
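With the three values listed in Solution 20, Pearson's coefficient of skewness can be evaluated from the median-based form of the formula. The Python sketch below is illustrative only; the function names are hypothetical, and the mode-based form is included for completeness without being applied, since no mode is listed in this solution.

```python
def pearson_skew_median(mean, median, sd):
    """Pcs = 3 (mean - median) / s"""
    return 3 * (mean - median) / sd

def pearson_skew_mode(mean, mode, sd):
    """Pcs = (mean - mode) / s"""
    return (mean - mode) / sd

pcs = pearson_skew_median(16.64, 16.85, 2.75)   # values from Solution 20
print(round(pcs, 2))   # -0.23
```

A negative value indicates that the distribution is skewed to the left (negatively skewed), a positive value that it is skewed to the right, and a value of zero that it is symmetric.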
Exercise 1:
Given below is the frequency distribution of the amount of money spent weekly by some
students in UiTM Dungun.
i. Calculate the mean and standard deviation for the above distribution.
ii. Calculate the modal value and explain its meaning.
iii. Draw a less-than ogive for the above data and then estimate the value of the median.
iv. A similar survey of some students in UiTM Seri Iskandar showed that the mean and
variance of the amount spent weekly are RM 125.85 and 676 RM² respectively. Determine which
group of students shows greater stability in the amount spent weekly.