Partnership
statistics
Chapter 1 – Basic Concepts
1.1 Introduction
Without noticing it, we often apply statistics in our daily lives.
When we record the number of pieces of clothing (shirts, blouses, pants, skirts, etc.) and their colors in our cabinets, we are recording data about our clothing collection for future use.
When we record our monthly expenses, we have one way of determining what our budget should be for the succeeding months.
When a nurse records the results of a patient's physical examination, he or she is collecting data to aid the doctor/physician in diagnosing the patient's illness and in determining the appropriate medical treatment to prescribe.
When watchers of a basketball game talk about the number of rebounds and
assists done by a player, they are talking about some vital statistics of the
game.
When a teacher records the examination scores of the students in a particular subject, he or she is recording data which can be used later on to determine whether the method of teaching has been effective, to find out how the students performed in the test, or to gauge the degree of ease or difficulty of the test.
1.2 Definition:
Statistics
Is a branch of applied mathematics concerned with scientific methods for the collection, organization, presentation, analysis, and interpretation of data.
Its essential purpose is to describe and draw inferences about the numerical properties of populations.
Is a science of learning from data.
The data are numerical or qualitative descriptions of the objects that we want to study (examples: averages and population statistics).
The term “data” means factual information or observations that may
either be quantitative or qualitative.
The collection of data entails gathering of information through
interview schedules, structured questionnaires, observations,
experimentations, use of existing records and other methods.
The data are then organized in an orderly fashion as a requisite for
data presentation.
More often than not, the data gathered are presented in graphs and tables to give the readers a quick picture of the data distribution.
Textual presentation is also being utilized when few data are to be
presented and explained.
Data analysis comes after processing of data as guided by statistical
principles.
This may involve the use of any statistical method, the choice of which depends upon the nature or purpose of the statistical problem at hand.
Drawing valid conclusions and making reasonable decisions are based on such analysis.
For example, a political analyst can use data from a portion of the voting population to predict the political preferences of the entire voting population.
Importance / Value of Statistics
Familiarization with statistics enables one to make sense of the many things one reads in newspapers and magazines and on the internet. Statistics is valuable in:
health and medicine,
sports,
finance,
education, and
other areas.
A statistical inquiry is a process of transforming raw data into
useful information that can tell us more about a subject and
allow us to make recommendations and possibly make
predictions of future outcomes. It consists of six stages:
1. posing questions - involves coming up with questions that, if
answered, would lead to meaningful information that would
allow us to draw a conclusion and to make recommendations.
For example,
suppose you were in charge of the school’s funds that have been set
aside for the development of a new sports field, but aren’t sure which
type of field (e.g. cricket pitch, basketball court) would be of greatest
benefit to students.
To investigate this issue, you would need to ask questions
such as
“What is the most popular sport among students?”
(you want to construct a type of field that would satisfy the
majority of students),
“Are there enough funds to construct the students’ preferred
type of field?”
(you can’t construct a type of field that you can’t afford) and
“How long will it take to construct?”
The military’s research and development team would like to find out both
the ideal type of uniform for its soldiers to wear in combat and the ideal
type for them to wear in parades. Pose some questions that would need to be answered by the team in order to draw a conclusion and to make recommendations.
A restaurant owner would like to know how many chefs, waiters, cashiers
and managers to hire for his new restaurant and would like to know how
many staff to roster on during each time of the day. Pose some questions that would need to be answered by the investigation.
2. collecting data - Once we have posed questions,
we need to collect data to answer them.
Before we do the actual collecting, we have to
decide on how we will collect the data, the type of
data we will collect and the sources from which we
will collect them.
The sources can be either primary or secondary.
Collecting from a primary source involves
collecting the data directly yourself by interviewing
or observing others or even conducting
experiments.
When collecting data using any such methods, it is
important to ensure that the data to be collected
can be organised easily.
For example, when creating a questionnaire, it would be better to include
questions that are not open-ended, but rather have a limited number of options
from which participants can choose their answers.
This way, the answers collected can be easily tallied and organised.
For instance, instead of asking someone “What is your favourite colour?”,
it would be better to ask “Which of the following colours is your favourite?”
and to list a few common colours that they can choose from, including an option
of “Other” in case they would like to answer with a colour that is not one of
those listed.
Using a secondary source involves gathering data that has
already been collected or generated by others.
This could involve gathering data from books or the internet. It is
important that the data to be collected are from a reliable source
and not from some obscure website or outdated book, otherwise
the data may not be accurate.
Some reliable sources of note are government organisations
such as the Australian Bureau of Statistics and the Bureau of
Meteorology, which have strict data collection methodologies in
place to ensure the accuracy and reliability of their data.
Determine whether the data to be gathered to investigate the following would be from a primary or a secondary source.
Also state the method (e.g. questionnaire, interview, observation, experiment) you would use to gather the data if the source is a primary one, or the source (e.g. books, newspapers, internet) you would use if it is a secondary one.
1. the most popular subject among students at school
2. the average daily temperature in Sydney over the last month
3. the number of traffic accidents in the country each year
4. the number of visitors to the local library in an afternoon
5. the average daily temperature in your home over the past week
6. the number of goals scored by the Socceroos since the last World Cup
7. students’ main qualm with the school principal
3. organizing data
Here we arrange the data we have collected into a form that gives structure and order to the data.
A common way of accomplishing this is to use a table, e.g. a frequency table.
How the data are organized will vary with the nature of the statistical investigation.
For example,
if the data collected were the incomes of a group of workers, it
would make more sense to organize the data into categories
of income ranges i.e. to tally up the number of workers within
certain income ranges such as $50,000-$60,000 rather than
tally up the number of workers with an income of a particular
value e.g. the number of workers with an income of $54,682.
The following are the HSC results of a class of 30 Year 12 physics students.
81 90 93 79 71 88 64 75 59 80
84 72 77 80 73 67 85 76 71 91
78 82 70 75 89 83 74 72 81 80
Draw up a frequency table of the results with suitable groupings.
(HINT: HSC results are usually grouped into bands.)
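A minimal Python sketch of one possible answer follows. It reads the split entries in the list as 75, 76 and 72, and it groups the scores into 10-mark groups with 90-100 as the top group. Both are assumptions: the actual HSC band cut-offs may differ, and the helper name `band` is hypothetical.

```python
from collections import Counter

# HSC physics results of the 30 students (split entries read as 75, 76 and 72)
results = [81, 90, 93, 79, 71, 88, 64, 75, 59, 80,
           84, 72, 77, 80, 73, 67, 85, 76, 71, 91,
           78, 82, 70, 75, 89, 83, 74, 72, 81, 80]


def band(score: int) -> str:
    """Assign a score to an assumed 10-mark group; 90-100 forms the top group."""
    if score >= 90:
        return "90-100"
    lower = (score // 10) * 10
    return f"{lower}-{lower + 9}"


freq = Counter(band(s) for s in results)

# Print the table from the lowest group to the highest, with tally marks
for label in sorted(freq, key=lambda lab: int(lab.split("-")[0])):
    print(f"{label:>6} | {'|' * freq[label]:<13} | {freq[label]}")
```

With this grouping, most of the class falls in the 70-79 and 80-89 groups.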
4. summarizing and displaying data
Once we have organized the data, we need to present the
data in a form that will be easy to read, understand and
analyze.
Most often this will be accomplished by using a graph such
as a column graph, bar graph, pie chart, dot plot or line
chart.
The particular type of graph to be used will depend on the
purpose of the investigation.
For example,
in order to present data on the proportion of
students with a particular type of favourite sport,
it may be more appropriate to use a pie chart
than a dot plot. Besides displaying the data in a
graph, it may also be beneficial to summarize
the data using statistical quantities such as the
mean, median, mode and range.
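As an illustration, Python's standard `statistics` module can compute the summary quantities named above. The sketch uses the HSC results from the organizing-data exercise and again reads the split entries as 75, 76 and 72. This is one convenient tool, not the only way to compute these values.

```python
import statistics

# HSC physics results of the 30 students (split entries read as 75, 76 and 72)
results = [81, 90, 93, 79, 71, 88, 64, 75, 59, 80,
           84, 72, 77, 80, 73, 67, 85, 76, 71, 91,
           78, 82, 70, 75, 89, 83, 74, 72, 81, 80]

mean = statistics.mean(results)
median = statistics.median(results)        # average of the 15th and 16th sorted values
modes = statistics.multimode(results)      # all most-frequent values (Python 3.8+)
data_range = max(results) - min(results)

print(f"mean={mean}, median={median}, mode(s)={modes}, range={data_range}")
```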
5. analyzing data and drawing conclusions
After we have finished summarizing and displaying the data, it is time to
examine and interpret the data, to decide on what it means and to ultimately
draw conclusions from it.
This may involve identifying trends and patterns from the graph, and
identifying how those trends and patterns change over time or across
categories (such as across different populations).
From these trends, we can then draw conclusions and possibly make
predictions about future outcomes.
6. writing a report
Once we have finished analyzing the data, it is time to put everything together
in a written report.
Any report should address the background and aim of the statistical inquiry and
the questions it sought to answer, detail the data collection method (including
sources and type of data), involve a thorough discussion of the findings, list
and explain the reasoning behind the conclusions, and, if appropriate, include
recommendations for the future.
It should also include the tables and graphs from steps 3 and 4 of the inquiry (even if only as part of the appendix).
1.2.1 Categories of Statistics:
1. Descriptive Statistics
Deals with the collection and presentation of data and the summarizing
values that describe the group’s characteristics.
Most common summarizing values are the measures of central
tendency and variation.
This means that no attempt is made to generalize to a larger set of data.
This branch of statistics lays the foundation for all statistical knowledge.
For example, if we measure the heights of the complete population of students in a particular FEU and compute the mean height, that mean is a descriptive measure because it describes a characteristic of the complete population. If, on the other hand, we measure the heights of a sample of 100 students and compute the mean height for the sample, that mean is also a descriptive statistic because it describes a characteristic of the sample.
Example:
Describing the performance of a class of 40 students using the mean score in a given examination in statistics.
Suppose the mean score is 56 and the passing score is 45; then you can say that, on average, the students passed the test.
If, in another class, only 12 out of 60 students obtained scores above 45, this may mean that the exam was too difficult or that the teaching strategy was not effective, so either the teacher has to give a less difficult exam or reteach the topic.
2. Inferential Statistics
Deals with predictions and inferences based on the analysis and
interpretation of the results of the information gathered by the
statistician.
Common statistical tools of inferential statistics are the t-test, z-test,
analysis of variance, chi-square, and Pearson r.
Patterns in the data may be modeled, in a way that accounts for
randomness and uncertainty in
the observations, to draw inferences about the process or population
being studied.
Suppose we wish to make a statement about the mean height of the complete population of students in a particular FEU from the knowledge of the mean computed on the sample of 100, and to estimate the error involved in this statement; then we should use procedures from inferential statistics. The application of these procedures provides information about the accuracy of the sample mean as an estimate of the population mean; that is, it indicates the degree of assurance we may place in the inferences we draw from the sample to the population.
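As a rough illustration of how a sample mean is reported together with an estimate of its error, the sketch below computes the sample mean, its standard error s/√n, and an approximate 95% interval using z = 1.96. The heights are hypothetical, and the function name `mean_with_interval` is illustrative. The normal approximation assumes a reasonably large sample, such as the 100 students mentioned above; for small samples a t-based interval would be more appropriate.

```python
import math
import statistics


def mean_with_interval(sample: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Return (sample mean, lower bound, upper bound) of an approximate interval.

    Uses the normal approximation: mean +/- z * s / sqrt(n).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (divisor n - 1)
    se = s / math.sqrt(n)                 # standard error of the mean
    return xbar, xbar - z * se, xbar + z * se


# Hypothetical heights (cm) of a few sampled students
heights = [160.0, 165.0, 170.0, 158.0, 172.0, 168.0, 162.0, 175.0]
xbar, low, high = mean_with_interval(heights)
print(f"sample mean = {xbar:.2f} cm, approx. 95% interval = ({low:.2f}, {high:.2f})")
```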
Examples: gender, eye color, political preferences, religion, blood
type, civil status, year level, course, profession and
socioeconomic status
Qualitative variables are used extensively in observational
studies
Examples: a college program (BSCS, ACT, BSCOE, BSN, etc.), courses enrolled (Mathematics, English, Filipino, Science, etc.), religion (Catholic, Protestant, Iglesia ni Cristo, etc.), gender (male, female).
Categories may be ordered, but specific numerical values may or may not be assigned.
Example: performance rating (poor, fair, good, very good, excellent), score (low, average, high), public opinion about the performance of the President (1 means Poor, 2 means Average, and 3 means Good)
1.2.2 Types of Data
Data or information sources can be classified into two types: primary and secondary.
a. Primary Data: data which have been acquired directly from the source.
They are also called eyewitness accounts, written by people who experienced the particular event or behavior, and are collected especially for the task at hand. Thus, we collect primary data when, for example, we observe a certain enrollment backlog and determine what causes it. Other examples of primary data are minutes of meetings, office memos, financial records, membership lists, etc.
Example:
minutes of the meeting
office memos
financial records
membership lists
data obtained by measuring the height of students in
Statistics class
b. Secondary Data: non-primary data or existing records
For example:
census data (published statistics) on demographics from the National Statistical Coordination Board (NSCB) represent a secondary source of data.
1.2.3 Variable
is a characteristic or attribute associated with the population being studied;
a particular attribute of interest that is measurable or observable on each and every individual or object.
A variable can be either qualitative or quantitative.
This indicates that measurement is not confined to numerical or quantitative specification.
Different types of variables
a. Categorical or qualitative variables
Are variables that are classified according to some attribute or
categories.
have labels or names rather than numbers, assigned to their
categories.
Used extensively in observational studies
Examples of categorical or qualitative variables:
gender (male, female)
courses enrolled (Mathematics, English, Filipino, Science,
etc.)
Religion (Catholic, Protestant, Iglesia ni Cristo, etc.)
a college program (BSCS, ACT, BSCOE, BSN, etc.).
b. Numerical-valued or quantitative variables.
Are variables that are classified according to numerical
characteristics.
are measured in numbers
Examples: height, weight, age, pulse rate, number of children, speed, grade point average (GPA), and number of academic units enrolled.
Can be treated as categorical variables when they are grouped into class intervals.
Examples of numerical-valued or quantitative variables grouped into class intervals:
Age in years (5-9, 10-14, 15-19, and 20 and above), Height in cm (100-149, 150-199, 200-249), Grade in Math (1.00-1.49, 1.50-1.99, 2.00-2.49, 2.50-2.99, 3.00-3.49, 3.50-3.99, and 4.00-5.00)
Numerical-valued variables can be classified as follows:
b.1 Discrete Variables
assume only a finite or countable number of values.
variables whose values are obtained by counting
Example:
Number of subjects/courses enrolled
number of students per class
number of children
number of persons with blue eyes
number of patients with TB
number of males and females in a Statistics class
b.2 Continuous Variables
variables whose values are obtained by measuring
may take on any value in a given interval or
continuum of values
Example:
temperature, distance, area, density, age, height, weight, and GPA, none of which can be put into a list because they can take any value in some interval of real numbers
The room number is a numerical variable that is treated as a categorical
variable because the numbers are assigned only as codes or as identifiers.
The same is true with civil status.
Age is a continuous numerical variable which is usually rounded off to the nearest whole year.
Gender, job title, and type of illness are categorical variables.
1.2.4 Levels of Measurement
Categorical Data
Nominal: ordering does not exist, e.g., gender, SSN, eye color
Ordinal: ordering does exist, e.g., military rank, class levels, rating scales
Numerical Data
Interval:
differences (distances) are meaningful, but ratios are not;
zero is arbitrary and does not indicate absence of the quantity being measured;
e.g., temperature scales, IQ scores, GPA
Ratio:
ratios exist, and zero indicates an absence of the quantity being measured
1.2.5 Scales of Measurement
In selecting the statistical tool to be used for drawing inferences from a random sample, the measurement scale of the data must be carefully considered.
Measurements are classified into four scales:
a. Nominal Scale
Is a measurement scale that classifies elements into two or more categories or classes.
The numbers indicate that the elements are different, but the
difference is not according to order or magnitude.
data collected are simply labels or names or categories
without any implicit or explicit ordering of the labels;
observations with the same label belong to the same
category;
lowest level of measurement;
frequencies or counts of observations belonging to the
same category can be obtained.
arithmetic operations cannot be performed.
Example:
1. Length 10 m to 20 m
2. Height 4’ to 7’
3. Area 12 m² to 25 m²
c. Recognition Type
Example:
d. Dichotomous Type
Example:
Do you live alone? Yes ____ No ____
e. Multiple-Choice Type
Example:
Which of the following means abattoir?
a) dungeon
b) cave
c) house
d) chateau
e) none of these
f. Multiple-Response Type
Example:
What appliances/devices do you have at home? Encircle the numbers.
1. Television
2. Refrigerator
3. DVD/VCD player
4. Piano/organ
5. Electric stove
6. Gas range
7. Vacuum cleaner
8. Personal computer
9. Fax machine
10. Telephone
g. Free-Response Type
The respondent is not guided in giving his reply. He can answer in his own style and in his own way.
h. Rating Scale Type
Example:
2.2.3 Empirical Observation Method
is a method of obtaining data by seeing, hearing,
testing, touching, and smelling.
This method is commonly used in psychological and anthropological studies.
Through observation, additional information,
which cannot be obtained using the other
methods like the questionnaire, may be gathered.
The observer may participate in the activities of
the group being studied (participant observation)
or he may just be a bystander (nonparticipant
observation).
When an observation is done in a laboratory,
as in the case of experimental studies, the
type of observation is controlled
observation.
2.2.4 Test Method
This method is widely used in psychological research and psychiatry.
Standardized tests are used because of their validity, reliability, and usability.
2.2.5 Registration Method
Examples of data gathered using this method are those obtained from the National Statistics Office, Department of Education, CHED, SEC, Supreme Court, and other government agencies.
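2. A researcher plans to get 588 sample units from a population N using a 4% margin of error. What is the value of the population N?
Given: n = 588; e = 4% or 0.04
Find: N
Formula (Slovin's formula, solved for N):
$n = \frac{N}{1 + Ne^2} \quad\Rightarrow\quad N = \frac{n}{1 - ne^2}$
Solution:
$N = \frac{588}{1 - 588(0.04)^2} = \frac{588}{1 - 0.9408} = \frac{588}{0.0592}$
N = 9,932 f.a.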
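The arithmetic can be checked with a short Python sketch. It assumes, as the stated answer indicates, that the formula used is Slovin's formula n = N / (1 + Ne²). The function names are hypothetical.

```python
def slovin_sample_size(N: float, e: float) -> float:
    """Slovin's formula: sample size n for a population N and margin of error e."""
    return N / (1 + N * e**2)


def slovin_population(n: float, e: float) -> float:
    """Solve Slovin's formula for N:  n(1 + N e^2) = N  ->  N = n / (1 - n e^2)."""
    denom = 1 - n * e**2
    if denom <= 0:
        raise ValueError("n * e^2 must be less than 1")
    return n / denom


N = slovin_population(588, 0.04)
print(round(N))                              # 9932
print(round(slovin_sample_size(N, 0.04)))    # 588, recovering the sample size
```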
2.3.3 Purposive Sampling
The respondents of the study will be chosen based
on their knowledge of the information required by
the researcher.
Example:
The horizontal axis lays out that portion of the scale of exam scores that includes all
12 of the listed values, and each student's individual score on the exam is
represented by a box placed at the appropriate point on the scale.
Thus, the box for student 'a' is placed at 61 on the scale, the box representing
student 'b' falls at 69, and so forth.
The type of graph shown in Figure 1.1 is useful when you are interested in
conveying detailed information about each and every measure in the distribution,
though with larger numbers of measures it can become quite cumbersome.
Besides, for most practical statistical purposes your interest is not so much in the
individual identity of your measures as in the overall shape and texture of the
distribution that they compose.
3.2 Raw Data and Frequency Distribution
A special table that may be constructed for any variable is the
Frequency Distribution Table (FDT).
Whether the variable is quantitative or qualitative, an FDT may be
constructed for it.
Thus we may have a quantitative FDT and a qualitative FDT.
Constructing a qualitative FDT only requires identifying the categories
where each datum may be classified and counting how many of the
data belong to each category.
Constructing a quantitative FDT, on the other hand, requires the
creation of classes where each datum may be classified as belonging to
one of these classes.
Raw data are not organized in any meaningful manner.
When data are quantitative, we can use the frequency distribution
and histogram.
Frequency distribution is a table that divides the data values
into classes and shows the number of observed values that fall
into each class.
By converting data to a frequency distribution, we gain
perspective that helps us see the forest instead of the individual
trees.
A histogram is a more visual representation.
It describes a frequency distribution using a series of adjacent rectangles, each of which has a height proportional to the frequency of the observations within the range of values it represents.
The Frequency Distribution
A frequency distribution table is a device for organizing and presenting
data.
When the set of data contains more than 30 cases, a frequency
distribution table may be constructed to make the task more manageable
and to save time in calculating different statistics.
Key Terms:
Class: each category of the frequency distribution.
Frequency: the number of data values falling within each class.
Class limits: the boundaries for each class.
These determine which data values are assigned to that class.
3. Class size: c = 9
4. The lowest score, 6, becomes the lower limit of the first interval.
Upper limit of the first interval = 6 + (9 − 1) = 14
5. Lower limit of the second interval = 6 + 9 = 15
Upper limit of the second interval = 15 + (9 − 1) = 23
and so on.
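The steps above can be sketched in Python, assuming integer scores so that each class spans c consecutive values. The number of classes k (3 here) is a hypothetical choice, since it is not given in this excerpt, and `class_limits` is an illustrative name. The class marks (midpoints) are computed as well.

```python
def class_limits(lowest: int, c: int, k: int) -> list[tuple[int, int]]:
    """Return k (lower, upper) class limits of size c, starting at the lowest score."""
    limits = []
    lower = lowest
    for _ in range(k):
        upper = lower + (c - 1)        # e.g. 6 + (9 - 1) = 14
        limits.append((lower, upper))
        lower += c                     # next lower limit, e.g. 6 + 9 = 15
    return limits


limits = class_limits(lowest=6, c=9, k=3)
class_marks = [(lo + hi) / 2 for lo, hi in limits]   # midpoints of the intervals
print(limits)        # [(6, 14), (15, 23), (24, 32)]
print(class_marks)   # [10.0, 19.0, 28.0]
```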
Example 2:
Solution to Exercise 1
Solution:
1. $R = 9 - 2 = 7$
2. N = 77, hence use $2^k \ge N$. Since $2^6 = 64 < 77$ but $2^7 = 128 \ge 77$, k = 7.
3. $c = \frac{R}{k} = \frac{7}{7} = 1$
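Steps 2 and 3 of the solution can be sketched in Python. The first step chooses k as the smallest integer with 2^k ≥ N; the second computes c = R/k. In practice c is often rounded up to a convenient width, which this sketch leaves to the user. The function names are hypothetical.

```python
def number_of_classes(N: int) -> int:
    """Smallest k such that 2**k >= N (the 2^k rule)."""
    k = 0
    while 2 ** k < N:
        k += 1
    return k


def class_size(data_range: float, k: int) -> float:
    """Class size c = R / k (round up if a whole-number width is wanted)."""
    return data_range / k


R = 9 - 2
k = number_of_classes(77)
c = class_size(R, k)
print(k, c)   # 7 1.0
```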
a. Class Mark: the midpoint of a class interval, i.e. (lower limit + upper limit) / 2.
Note that the last value of the variable resource is “Other”, which
includes all other online resources that were given as selection
options. For data sets that have a large number of values for a
categorical variable, we often create a category such as this that
includes categories that have relatively small counts or percents.
Careful judgment is needed when doing this. You don’t want to cover
up some important piece of information contained in the data by
combining data in this way.
3.5.4 Frequency Histogram
One of the kinds of graphs that can be applied to grouped data.
The frequency of each class is represented by the height of a bar drawn over the class interval.
3.5.6 Frequency Polygon
Unlike the frequency histogram, which uses bars drawn side by side, the frequency polygon uses points connected by line segments.
3.5.7 Cumulative Frequency Ogive
Used in statistical reports.
Determine the cumulative frequencies (CF):
a. “Less than” cumulative frequency (<CF) is the total number of observations whose values do not exceed the upper limit of the class.
Each entry in the <CF column (read as “less than cumulative frequency”) is obtained by accumulating the frequencies, starting from the frequency of the lowest score interval up to the highest score interval.
b. “Greater than” cumulative frequency (>CF) is the total number of observations whose values are not less than the lower limit of the class.
The >CF column (read as “greater than cumulative frequency”) is accumulated starting from the highest score interval down to the lowest score interval.
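Both cumulative columns can be computed with `itertools.accumulate`. The frequencies below (7, 12, 21, 35, 22, 14, 9, lowest class first) are an assumption. They are the class frequencies implied by the <CF column 7, 19, 40, 75, 97, 111, 120 used in the median example of Chapter 4.

```python
from itertools import accumulate

# Class frequencies, lowest score interval first (assumed, see above)
freqs = [7, 12, 21, 35, 22, 14, 9]

# <CF: accumulate from the lowest interval upward
less_than_cf = list(accumulate(freqs))

# >CF: accumulate from the highest interval downward, then restore the original order
greater_than_cf = list(accumulate(reversed(freqs)))[::-1]

print(less_than_cf)      # [7, 19, 40, 75, 97, 111, 120]
print(greater_than_cf)   # [120, 113, 101, 80, 45, 23, 9]
```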
c = 1680/5 = 336
2. The cumulative frequency distribution (ogive) for the amount spent on groceries last month is computed using the construction guidelines outlined above for the ogive.
Based on the numeric frequency distribution in Table 3.15, an additional
interval (0 – < 400) is included. The cumulative frequency count for this
interval is zero, since no shopper spent less than 400 on groceries last
month. Referring to the upper limits for each successive interval above
400, the following cumulative counts are derived:
7 shoppers spent up to 800
21 (= 7 + 14) shoppers spent up to 1200
26 (= 21 + 5) shoppers spent up to 1600
29 (= 26 + 3) shoppers spent up to 2000
all 30 shoppers (= 29 + 1) spent no more than 2400 on groceries
last month.
The ogives for both the frequency counts and percentages are shown in
Table 3.15.
Figure 3.9 shows the percentage ogive graph. Note that the %
cumulative frequency is 0% at 400 (the upper limit of the extra
interval) and 100% at the upper limit of 2400 for the last interval.
This means that no shopper spent less than 400 or more than 2400
last month on groceries.
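The cumulative counts and percentages listed above can be reproduced with a short Python sketch. The class frequencies 7, 14, 5, 3, 1 are not stated directly; they are inferred from the differences between successive cumulative counts. The last lines answer an "at least" question by subtracting the cumulative count from n, as the note below describes.

```python
from itertools import accumulate

upper_limits = [800, 1200, 1600, 2000, 2400]   # upper limits of the grocery-spend intervals
freqs = [7, 14, 5, 3, 1]                       # inferred from the cumulative counts

cum_counts = list(accumulate(freqs))
n = cum_counts[-1]                              # sample size (30 shoppers)
cum_pct = [100 * cf / n for cf in cum_counts]

# The extra interval (0 - < 400) contributes the starting point (400, 0, 0%)
ogive_points = [(400, 0, 0.0)] + list(zip(upper_limits, cum_counts, cum_pct))
for limit, cf, pct in ogive_points:
    print(f"up to {limit:>4}: {cf:>2} shoppers ({pct:5.1f}%)")

# Shoppers who spent 1200 or more: n minus the cumulative count up to 1200
at_least_1200 = n - cum_counts[upper_limits.index(1200)]
print(f"spent 1200 or more: {at_least_1200} shoppers")
```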
Management Interpretation
Note: The ogive is a less than cumulative frequency graph, but it can
also be used to answer questions of a more than nature (by
subtracting the less than cumulative percentage from 100%, or the
cumulative count from n, the sample size).
3.5.10 The Scatter Diagram
A scatter diagram is a graphical presentation of the
relationship between two quantitative variables, and a
trendline is a line that provides an approximation of the
relationship.
The diagram plots pairs of known or observed values of two variables, generally referred to as x and y.
The two variables are referred to as the independent (x) and dependent (y) variables.
The typical purpose of this type of analysis is to estimate or predict what y will be for a given value of x.
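The notes do not say how the trendline is fitted. A common choice is an ordinary least-squares line ŷ = a + bx, sketched below in Python with hypothetical (x, y) pairs. This is one possible approach, not necessarily the one used in the source, and `least_squares_line` is an illustrative name.

```python
def least_squares_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: x = independent variable, y = dependent variable
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x; prediction at x = 6: {a + b * 6:.2f}")
```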
3.5.11 Stem-and-Leaf Diagram
A stem-and-leaf diagram is a graphical display of ungrouped data.
It is also called a stemplot.
The data are arranged by their stems and leaves.
The leading digits are called stems; the final digits are the leaves.
This form is best for a small number of observations with values greater than 0.
2. For each datum, identify its leaf (the units digit) and its stem (all the digits except the last, or units, digit).
Example: for 24, 4 is the leaf and 2 is the stem; for 79, 9 is the leaf and 7 is the stem.
3. List the stems vertically in increasing order from top to bottom.
4. Draw a vertical line to the right of the stems.
5. List the leaves of each stem to the right of the line, in increasing order.
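The steps above can be sketched in Python for non-negative whole numbers. The sketch takes the units digit as the leaf and the remaining digits as the stem. The data values are hypothetical, apart from 24 and 79 taken from the example, and `stem_and_leaf` is an illustrative name. Stems with no leaves are still listed so that the display keeps an even scale.

```python
from collections import defaultdict


def stem_and_leaf(data: list[int]) -> list[str]:
    """Build stem-and-leaf rows: stem = all digits but the last, leaf = units digit."""
    if not data:
        return []
    leaves = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)          # e.g. 24 -> stem 2, leaf 4
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):   # stems in increasing order
        row = " ".join(str(leaf) for leaf in sorted(leaves.get(stem, [])))
        rows.append(f"{stem:>3} | {row}")
    return rows


for line in stem_and_leaf([24, 79, 31, 45, 27, 38, 72, 41]):
    print(line)
```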
Chapter 4: Descriptive Measures
For large data sets (N ≥ 30), use $2^k \ge N$,
where k = number of classes (class intervals)
N = total number of observations
Descriptive Measures:
1. Central tendency
2. Dispersion
3. Location
4. Skewness
5. Kurtosis
Central Tendency:
A measure of central tendency or location describes the “center” of a
given set of data.
A value within the range of data set which describes its location or
position relative to the entire set of data.
It is referred to either as a measure of central tendency or as a measure of position or location.
It is a single value about which the observations tend to cluster.
The common measures of location are:
1. Arithmetic Mean
2. Median
3. Mode
Ungrouped Data
4.1 Mean
Arithmetic mean is defined as the sum of the data values divided by the number of observations.
It is one of the most common measures of central tendency.
Also referred to as arithmetic average or simply mean.
Expressed as μ (the population mean, pronounced “myew”) or x̄ (the sample mean, “x bar”); for a population of N observations, $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$.
b.
Mode = 3.8
Example: Consider the scores listed below.
Table 1:
The use of an array is a very effective tool for facilitating the construction of an FDT. One thing which might be done is to rewrite these 120 post-test scores in order of magnitude from highest to lowest or, if preferred, from lowest to highest.
Table 1a:
Mean: μ = 33
For Grouped Mean:
$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_m}{\sum_{i=1}^{k} f_i}$
where:
$X_m$ = class mark
$f_i$ = ith class frequency
k = number of classes
$\sum_{i=1}^{k} f_i$ = total number of observations (N)
$\bar{X} = 32.46$
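The grouped mean can be checked in Python. The frequency distribution used here is a reconstruction, not a table printed in this excerpt: classes 15-19, 20-24, ..., 45-49 with frequencies 7, 12, 21, 35, 22, 14, 9. These values are consistent with the <CF column, the class boundaries, and Σ f X_m² = 133675 that appear elsewhere in this chapter.

```python
# Reconstructed FDT of the 120 post-test scores (assumption, see above)
lower_limits = [15, 20, 25, 30, 35, 40, 45]
c = 5                                       # class size
freqs = [7, 12, 21, 35, 22, 14, 9]

# Class mark X_m = midpoint of each class, e.g. (15 + 19) / 2 = 17
class_marks = [lo + (c - 1) / 2 for lo in lower_limits]

N = sum(freqs)
grouped_mean = sum(f * xm for f, xm in zip(freqs, class_marks)) / N
print(N, round(grouped_mean, 2))   # 120 32.46
```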
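b. Median is the middle value of an array, denoted by Md (for the population median). For grouped data:
$Md = LCB_{Md} + \left(\frac{\frac{N}{2} - F_b}{f_{Md}}\right)c$
where:
$LCB_{Md}$ = lower class boundary of the median class
c = class size
$F_b$ = <CF immediately before the median class
$f_{Md}$ = frequency of the median class
Note: The median class is the class containing the $\frac{N}{2}$th value of the arranged data set.
Consider the FDT with the <CF column 7, 19, 40, 75, 97, 111, 120:
$\frac{N}{2} = \frac{120}{2} = 60$th observation, which falls in the class with <CF = 75 (the median class); thus $F_b = 40$.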
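The modal class is the class with the highest frequency.
$Mo_G = LCB_{Mo} + \left(\frac{f_{Mo} - f_b}{2f_{Mo} - f_b - f_a}\right)c$
where $f_{Mo}$ is the frequency of the modal class, and $f_b$ and $f_a$ are the frequencies of the classes immediately before and after it.
$Mo_G = 29.5 + \left(\frac{35 - 21}{2(35) - 21 - 22}\right)(5) = 29.5 + \left(\frac{14}{27}\right)(5)$
$Mo_G = 32.09$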
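Both grouped formulas can be sketched in Python. The lower class boundaries and frequencies are an assumed reconstruction of this chapter's 120 scores. In that reconstruction the median class is 29.5-34.5 with F_b = 40 and f = 35, consistent with the numbers above. The helper names are hypothetical. At the first or last class, a missing neighbouring class is treated as having frequency 0, which is a convention choice.

```python
def grouped_median(lcbs: list[float], freqs: list[int], c: float) -> float:
    """Md = LCB_Md + ((N/2 - F_b) / f_Md) * c, scanning <CF to find the median class."""
    target = sum(freqs) / 2
    cf_before = 0
    for lcb, f in zip(lcbs, freqs):
        if cf_before + f >= target:          # this class contains the (N/2)th value
            return lcb + (target - cf_before) / f * c
        cf_before += f
    raise ValueError("median class not found")


def grouped_mode(lcbs: list[float], freqs: list[int], c: float) -> float:
    """Mo = LCB_Mo + ((f_Mo - f_b) / (2 f_Mo - f_b - f_a)) * c for the modal class."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])   # first class with highest frequency
    f_mo = freqs[i]
    f_b = freqs[i - 1] if i > 0 else 0
    f_a = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lcbs[i] + (f_mo - f_b) / (2 * f_mo - f_b - f_a) * c


lcbs = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]   # lower class boundaries (reconstructed)
freqs = [7, 12, 21, 35, 22, 14, 9]
print(round(grouped_median(lcbs, freqs, 5), 2), round(grouped_mode(lcbs, freqs, 5), 2))
```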
Graphical Presentation of the given data
Histogram: It is a bar diagram where the bars are adjacent, and the
base extends from the lower true class boundary to upper true class
boundary. The height of the bar represents the number of cases within the
interval. The class boundaries are marked off along the horizontal axis and
the scale of frequency is shown on the vertical axis.
A curve drawn over the figure approximates the trend (Symmetric or normal
or bell-shaped) of the distribution of the data.
Measures of Dispersion
A quantitative measure that describes the extent to which the data are dispersed is generally known as a measure of dispersion.
It is a single value that describes how widely dispersed or spread out the data are.
a. Variance
Variance is the mean of the squared differences of the observations from their mean and is denoted by s².
b. Standard Deviation
Standard deviation is the positive square root of the variance, denoted by s.
For Ungrouped Data:
Ungrouped Variance: $s^2$
$s^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2$
where $x_i$ = ith observation
N = total number of observations
μ = mean of the ungrouped data
Standard Deviation: s
$s = \sqrt{s^2}$
Example:
Consider the previous example; compute the ungrouped variance and standard deviation using the table.
Solution:
a. Ungrouped variance:
$s^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2$
$= \frac{15^2 + 16^2 + 17^2 + \cdots + 48^2}{120} - 33^2$
$= \frac{137782}{120} - 1089$
$= 1148.18 - 1089$
$s^2 = 59.18$ f.a.
b. Ungrouped standard deviation:
$s = \sqrt{s^2} = \sqrt{59.18}$
$s = 7.69 \approx 7.7$ f.a.
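For Grouped Data:
a. Variance: $s^2$
$s^2 = \frac{\sum_{i=1}^{k} f_i (X_m - \bar{X})^2}{N} = \frac{\sum_{i=1}^{k} f_i X_m^2}{\sum_{i=1}^{k} f_i} - \bar{X}^2$
From the FDT, N = 120 and $\sum_{i=1}^{k} f_i X_m^2 = 133675$.
Thus, keeping the grouped mean unrounded ($\bar{X} = 32.458\ldots$):
Variance: $s^2 = \frac{133675}{120} - (32.458\ldots)^2 \approx 1113.96 - 1053.54 \approx 60.4$ f.a.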
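The grouped variance can be checked in Python, using an assumed reconstruction of the FDT (class marks 17, 22, ..., 47 with frequencies 7, 12, 21, 35, 22, 14, 9). The shortcut formula subtracts two large, nearly equal numbers. Rounding the mean before squaring it (for example to 32.4) therefore changes the result noticeably, so the sketch keeps the mean unrounded and checks the shortcut against the definition.

```python
import math

# Reconstructed FDT of the 120 scores (assumption): class marks and frequencies
class_marks = [17, 22, 27, 32, 37, 42, 47]
freqs = [7, 12, 21, 35, 22, 14, 9]

N = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))          # sum of f * X_m
sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))     # sum of f * X_m^2
mean = sum_fx / N

# Shortcut form: sum(f X_m^2) / N - mean^2 (mean kept unrounded)
var_shortcut = sum_fx2 / N - mean ** 2
# Definition form: sum(f (X_m - mean)^2) / N
var_definition = sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_marks)) / N

print(round(var_shortcut, 2), round(math.sqrt(var_shortcut), 2))
```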
c. Quartiles are numerical quantities which divide the array of data into four (4) equal parts.
Finding the kth Quartile ($Q_k$):
The 1st quartile is the 25th percentile, the 2nd quartile is the 50th percentile (also the 5th decile and the median), the 3rd quartile is the 75th percentile, and the 4th quartile is the 100th percentile.
Example:
Find the 3rd Quartile ($Q_3$).
Solution:
Take note that $Q_3 = P_{75}$; thus $N \times \frac{j}{100}$ will be evaluated:
$N \times \frac{j}{100} = 120 \times \frac{75}{100} = 90$
Since $N \times \frac{j}{100} = 90$ is a whole number, we take $P_{75}$ as the average of the values located in the 90th and 91st positions. Therefore,
$P_{75} = \frac{39 + 39}{2} = 39$
$Q_3 = 39$
Grouped Data
1. Percentile
$P_j = LCB_{P_j} + \left(\frac{\frac{jN}{100} - {<}CF_b}{f_{P_j}}\right)c$
3. Quartile
$Q_k = LCB_{Q_k} + \left(\frac{\frac{kN}{4} - {<}CF_b}{f_{Q_k}}\right)c$
where: $Q_k$ = kth quartile
$LCB_{Q_k}$ = lower class boundary of the kth quartile class
${<}CF_b$ = less than cumulative frequency before the kth quartile class
$f_{Q_k}$ = frequency of the kth quartile class
Location: the $\frac{kN}{4}$th position
𝟒
Example:
a) $P_{20}$
Location: $\frac{jN}{100} = \frac{20(120)}{100} = 24$th position
$P_j = LCB_{P_j} + \left(\frac{\frac{jN}{100} - {<}CF_b}{f_{P_j}}\right)c$
$P_{20} = 24.5 + \left(\frac{24 - 19}{21}\right)(5) = 24.5 + 0.2381(5) = 24.5 + 1.19$
$P_{20} = 25.69$
b) $D_6$
Location: $\frac{iN}{10} = \frac{6(120)}{10} = 72$nd observation
$D_i = LCB_{D_i} + \left(\frac{\frac{iN}{10} - {<}CF_b}{f_{D_i}}\right)c$
$D_6 = 29.5 + \left(\frac{72 - 40}{35}\right)(5) = 29.5 + 4.57$
$D_6 = 34.07$
c) 3rd Quartile
Location: $\frac{kN}{4} = \frac{3(120)}{4} = 90$th observation
$Q_k = LCB_{Q_k} + \left(\frac{\frac{kN}{4} - {<}CF_b}{f_{Q_k}}\right)c$
$Q_3 = 34.5 + \left(\frac{90 - 75}{22}\right)(5) = 34.5 + 3.41$
$Q_3 = 37.91$
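The percentile, decile and quartile formulas share one pattern: locate the position jN/parts, then interpolate within the class that contains it. A Python sketch follows. It assumes reconstructed lower class boundaries (14.5, 19.5, ..., 44.5) and frequencies (7, 12, 21, 35, 22, 14, 9) that agree with the worked examples, e.g. LCB = 24.5, <CF_b = 19 and f = 21 for P20. `grouped_quantile` is a hypothetical name.

```python
def grouped_quantile(lcbs: list[float], freqs: list[int], c: float, j: int, parts: int) -> float:
    """Value at position j*N/parts: parts=100 (percentile), 10 (decile), 4 (quartile)."""
    position = j * sum(freqs) / parts
    cf_before = 0                            # <CF before the current class
    for lcb, f in zip(lcbs, freqs):
        if cf_before + f >= position:        # this class contains the position
            return lcb + (position - cf_before) / f * c
        cf_before += f
    raise ValueError("position lies beyond the last class")


lcbs = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]   # reconstructed class boundaries
freqs = [7, 12, 21, 35, 22, 14, 9]
p20 = grouped_quantile(lcbs, freqs, 5, j=20, parts=100)
d6 = grouped_quantile(lcbs, freqs, 5, j=6, parts=10)
q3 = grouped_quantile(lcbs, freqs, 5, j=3, parts=4)
print(round(p20, 2), round(d6, 2), round(q3, 2))   # 25.69 34.07 37.91
```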
Chebyshev’s Theorem
Chebyshev (1821–1894) was a Russian mathematician who worked on the theory of prime numbers as well as a wide range of other subjects.
One of those subjects was probability, and his theorem applies to any data set, not only to normally distributed data sets.
His theorem states that, for any set of data (population or sample) and any constant k > 1, the proportion of the data that lies within k standard deviations on either side of the mean is always at least
$1 - \frac{1}{k^2}$
For k = 2: $1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4}$, which means that at least 75% of the data must always be within two (2) standard deviations of the mean.
For k = 3: $1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9}$, which means that at least 88.9% (about 89%) of the data must always be within three (3) standard deviations of the mean.
Chebyshev's rule: for any data set with mean μ and standard deviation σ,
1. At least 75% of the observations are within 2σ of the mean μ.
2. At least 88.9% of the observations are within 3σ of the mean μ.
Empirical Rule
For a normally distributed data set with mean (μ) and standard deviation (σ), the empirical rule states that:
1. Approximately 68% of the observations are within 1σ of the mean μ.
2. Approximately 95% of the observations are within 2σ of the mean μ.
3. Approximately 99.7% of the observations are within 3σ of the mean μ.
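A small Python sketch compares Chebyshev's guaranteed minimum with the proportion actually observed in a data set. The data set is hypothetical (mean 5, population standard deviation 2), and the function names are illustrative. Unlike the empirical-rule percentages, which hold only approximately and only for normal data, the Chebyshev bound holds for any data set.

```python
import statistics


def chebyshev_min_proportion(k: float) -> float:
    """Chebyshev's lower bound 1 - 1/k^2 on the proportion within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


def proportion_within(data: list[float], k: float) -> float:
    """Observed proportion of data within k (population) standard deviations of the mean."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)


data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data set (mean 5, sigma 2)
for k in (2, 3):
    print(k, round(chebyshev_min_proportion(k), 3), proportion_within(data, k))
```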
Measure of Skewness
A measure of skewness describes the extent to which the data distribution deviates from symmetry.
It is measured by the coefficient of skewness, denoted by SK and defined as:
$SK = \frac{3(\text{Mean} - \text{Median})}{\sigma}$
K = -0.4
Therefore, the distribution of the ungrouped data is relatively flat, since K < 0.
Example:
Consider the following grouped data; determine the coefficient of kurtosis.
Given the following quantities: N = 120; σ = 8; $\sum_{i=1}^{k} f_i (X_m - \bar{X})^4 = 1089648.70$
Then,
$K = \frac{\sum_{i=1}^{k} f_i (X_m - \bar{X})^4}{N\sigma^4} - 3 = \frac{1089648.70}{120(8)^4} - 3 = 2.22 - 3$
K = -0.78
Therefore, the distribution of the grouped data is relatively flat, since K < 0.
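The arithmetic can be reproduced with a one-function Python sketch. It uses the given quantities: N = 120, σ = 8, and the stated sum of fourth-power deviations. The function name is hypothetical.

```python
def coefficient_of_kurtosis(sum_fourth_dev: float, N: int, sigma: float) -> float:
    """K = sum f (X_m - mean)^4 / (N * sigma^4) - 3; K < 0 suggests a flatter-than-normal shape."""
    return sum_fourth_dev / (N * sigma**4) - 3


K = coefficient_of_kurtosis(1089648.70, N=120, sigma=8)
print(round(K, 2))   # -0.78
```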
Comparison of Mean, Median, and Mode
A bell-shaped curve (symmetric or normal curve) is
generated when the mean, median and mode
coincide.
However, the mean, median, and mode are affected
by what is called skewness (i.e., lack of symmetry) in
the data.
Symmetric or Normal Curve
The figure above shows a normal curve, a negatively skewed curve, and a positively skewed curve.
Take note that when a variable is normally distributed, the mean, median, and mode are the same number.
When the variable is skewed to the left (i.e., negatively skewed), the mean shifts to the left the most, the median shifts to the left the second most, and the mode is the least affected by the presence of skew in the data.
An example of a negatively skewed graph would be the graph of the scores for a test that was too easy for the students.
Seat work:
The following are the entrance test scores of 100 freshmen students in ABC College.