Notes in Statistics

Statistics

STATISTICS is defined as a branch of mathematics or science that deals with the collection, organization, presentation, analysis and interpretation of numerical data.

Statistics changes numbers/data into information; from information to knowledge; and from knowledge to insight.

Statistics is the art and science of:
- deciding what are the appropriate data to collect,
- deciding how to collect data efficiently,
- using data to give information,
- using data to answer questions,
- using data to make decisions.

Statistics are data obtained by collecting, processing, compiling, analyzing, publishing and disseminating results, gathered from respondents through statistical collections or from administrative data.

Statistics is a mathematical science pertaining to the
- collection,
- organization,
- presentation,
- analysis, and
- interpretation or explanation of data.

Development of Statistics

- The word statistics is believed to have been derived from the phrase “ratio status”, meaning the study of practical politics. The administration of states required the collection and analysis of data on population and wealth for the purposes of war and finance.

Uses of statistics
Statistics is a discipline which was developed to extract relevant facts from a large body of information and to help people make decisions when uncertainty exists concerning the information.

General Uses of statistics
- It aids in decision making
- Provides comparison
- Explains an action that has taken place
- Justifies a claim or assertion
- Predicts future outcome
- Estimates an unknown quantity

Fields of Statistics

A. Statistical Methods of Applied Statistics – refer to procedures and techniques used in the collection, presentation, analysis, and interpretation of data

1. Descriptive Statistics
– comprise those methods concerned with the collection, description, and analysis of a set of data without drawing conclusions or inferences about a larger set.
– the main concern is simply to describe the set of data such that otherwise vague information is brought out clearly
– conclusions apply only to the data on hand

2. Inferential Statistics
– comprise those methods concerned with making predictions or inferences about a larger set of data (population) using only the information gathered from a subset (sample) of the larger set
– the main concern is not merely to describe but actually to predict and make inferences based on the information gathered

Examples of Descriptive Statistics:
- In social research, socio-demographic characteristics of respondents are summarized and presented in tables and [...]
- A researcher measures the total gain in weight and mortality of fish cultured in a hatchery

Examples of Inferential Statistics:
- A researcher intends to establish the relationship between poverty level and socio-economic status of respondents
- A researcher wants to compare the gain in weight and mortality of fish cultured in aquaria using two feed [...]
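
To make the contrast between the two fields concrete, here is a minimal Python sketch. The weight-gain values are hypothetical (they do not come from these notes), and the interval uses a normal approximation, which is an assumption; a t-based interval would be somewhat wider for a sample this small.

```python
import math
import statistics

# Hypothetical weight gains (grams) of 10 sampled fish; illustrative values only.
gains = [12.1, 10.4, 11.8, 13.0, 9.7, 12.5, 11.2, 10.9, 12.8, 11.6]

# Descriptive statistics: summarize the data on hand, nothing more.
mean_gain = statistics.mean(gains)
gain_range = max(gains) - min(gains)
sd_gain = statistics.stdev(gains)  # sample standard deviation

# Inferential statistics: use the sample to say something about the population.
# Rough 95% interval for the population mean under a normal approximation
# (an assumption made for this sketch).
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * sd_gain / math.sqrt(len(gains))
ci = (mean_gain - margin, mean_gain + margin)

print(f"Descriptive: mean = {mean_gain:.2f}, range = {gain_range:.2f}")
print(f"Inferential: 95% CI for the population mean = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The first print statement describes only the ten fish measured; the second makes a claim about all fish the sample is meant to represent.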
Descriptive Statistics
- Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way, that is, it allows simpler interpretation of data
- When we use descriptive statistics it is useful to summarize our group of data using a combination of tabulated description (i.e., tables), graphical description (i.e., graphs and charts) and statistical discussion of the results by using descriptive numerical measures such as average, range, etc.

Inferential Statistics
- Inferential statistics are techniques that allow us to use a subset of a larger set of data to make conclusions about said larger set of data
- Its applications are indicated in the use of statistics as an aid in decision making in the face of uncertainty.
- Used to estimate, compare, and determine characteristics from data

Population and Parameter

Population
- A group of all individuals or entities (it can be people, animals, etc.) that we would like to know something about

Parameter
- A numerical characteristic of the population in which we have a particular interest
- Often denoted with Greek letters (e.g., μ, σ, σ², P)
Examples:
- The population proportion (P) that would respond to a certain drug
- The population mean (μ) height of college students

Some basic terms

Variable – a characteristic or attribute of persons or objects which can assume different values or labels for different persons or objects under consideration

Measurement – a process of determining the value or label of a particular variable for a particular experimental unit

Experimental unit/case – the individual or object on which a variable is measured

Data – are actual values or labels of a variable. They may be numbers or they may be words. Datum is a single value.

Examples:

Observing the color of cars entering a parking lot
- Variable – color (Note: Color is the characteristic)
- Experimental unit – car (Note: Car is the object on which color is to be observed)

Measuring the IQ of school children in the province of Albay
- Population – all school children in the province of Albay
- Variable – IQ
- Experimental unit – school child

Measuring the weights of college students in University X
- Population – all college students in University X
- Experimental unit/case – college student
- Variable – weight

Classification of Variables

Qualitative vs. Quantitative Variable

Qualitative variable – a variable that yields categorical responses
Example:
- Courses offered in BUTC (BSFT, BSN, BSSW, BSFi, BSEd, BS Entrep)
- Religion (Protestant, Catholic, INC, etc.)
- Type of fishing gear

Quantitative variable – a variable that takes on numerical values representing an amount or quantity
Example:
- Score in an exam
- pH level
- Protein content
Discrete vs. Continuous Variable

Continuous variable – a variable which can assume the infinitely many values corresponding to a number line.
Example:
- Time
- Weight
- Length
- Body mass index

Discrete variable – a variable which can assume a finite or at most countably infinite number of values, usually measured by counting or enumeration
Example:
- Number of siblings
- Household size
- Number of COVID-19 patients

Levels or Scales of Measurement

Nominal Level (or Classificatory Scale)
- The nominal level is the weakest level of measurement where numbers or symbols are used simply for categorizing subjects into different mutually exclusive unordered categories or groups.
- Categorical or nominal variables have no inherent order or ranking sequence, such as names or classes (e.g., gender, marital status, religion, racial color). Values may be numerals, but they carry no numerical meaning (e.g., I, II, III). The only operation that can be applied to nominal variables is enumeration.

Ordinal Level (or Ranking Scale)
- The ordinal level of measurement contains the properties of the nominal level, and in addition the numbers assigned to categories of any variable may be ranked or ordered in some low-to-high manner.
- Variables with an inherent rank or order, e.g. mild, moderate, severe. Values can be compared for equality, or greater or less, but not how much greater or less.
Example:
- Performance rating (1-Poor, 2-Fair, 3-Good, 4-Excellent)
- Size of T-shirt (Small, Medium, Large, XL)
- Adverse events (Mild, Moderate, Severe, Life-threatening, Death)
- Income (Low, Medium, High)

Interval Level
- The interval level is that which has the properties of the nominal and ordinal levels, and in addition, the distances between any two numbers on the scale are of known sizes. An interval scale must have a common and constant unit of measurement. Addition and subtraction, but not multiplication and division, are meaningful operations.
- The unit of measurement is arbitrary and there is no “true zero” point.
Example:
- IQ
- Temperature (in Celsius or Fahrenheit)
- pH level

Ratio Level
- The ratio level of measurement contains all the properties of the interval level, and in addition, it has a “true zero” point. Addition, subtraction, multiplication, and division are all meaningful operations.
- Variables whose values represent counts or amounts are measured in ratio level
Example:
- Age (in years)
- Number of siblings
- Volume (in cu. cm)
- Temperature (in Kelvin scale)

Classification of Data

1. Primary vs. Secondary

a. Primary source – data measured by the researcher/agency that published it
Example: Data collected by the researcher himself from the experiment or survey he conducted
b. Secondary source – any republication of data by another agency
- The publications of the National Statistics Office are primary sources and all subsequent publications of other agencies are secondary sources.
- Often data is picked from reports and publications of researchers, institutions and organizations. Such data is referred to as secondary.

2. External vs. Internal
- Internal data – information that relates to the operations and functions of the organization collecting the data
- External data – information that relates to some activity outside the organization collecting the data

Example: The sales data of SM is internal data for SM but external data for any other organization such as Robinson’s.

COLLECTION OF DATA

Data Collection Methods

1. Survey Method – questions are asked to obtain information, either through self-administered questionnaire (indirect method) or personal interview (direct method).

2. Observation Method – makes possible the recording of behavior but only at the time of occurrence (e.g., observing reactions to a particular stimulus, traffic count)

Advantages over Survey Method:
- Does not rely on the respondent’s willingness to provide the desired data
- Certain types of data can be collected only by observation (e.g. behavior patterns of which the subject is not aware or is ashamed to admit)
- The potential bias caused by the interviewing process is reduced or eliminated

Disadvantages over Survey Method:
- Things such as awareness, beliefs, feelings and preferences cannot be observed
- The observed behavior patterns can be rare or too unpredictable, thus increasing the data collection costs and time requirements

3. Experimental method – a method designed for collecting data under controlled conditions. An experiment is an operation where there is actual human interference with the conditions that can affect the variable under study. This is an excellent method of collecting data for causation studies. If properly designed and executed, experiments will reveal with a good deal of accuracy the effect of a change in one variable on another variable.

4. Use of existing studies – e.g., census, health statistics, weather bureau reports, etc. (secondary source of data)

Two types:
- Documentary sources – published or written reports, periodicals, unpublished documents, etc.
- Field sources – researchers who have done studies on the area of interest are asked personally or directly for information needed

5. Registration method – e.g., car registration, student registration, hospital admission, etc.
GENERAL CLASSIFICATION OF COLLECTING DATA

Census or complete enumeration is the process of gathering information from every unit in the population.
- Not always possible to get timely, accurate and economical data
- Costly, if the number of units in the population is too large

Survey sampling is the process of obtaining information from the units in the selected sample.

Advantages of Survey Sampling:
- Reduced cost
- Greater scope
- Greater accuracy

- The population of interest is usually too large to attempt to survey all of its members.
- A carefully chosen sample can be used to represent the population.
- The sample reflects the characteristics of the population from which it is drawn.

PROBABILITY AND NON-PROBABILITY SAMPLING

A sampling procedure that gives every element of the population a (known) nonzero chance of being selected in the sample is called probability sampling. Otherwise, the sampling procedure is called non-probability sampling.

Terminologies

The target population is the population from which information is desired.

The sampled population is the collection of elements from which the sample is actually taken.

The population frame is a listing of all the individual units in the population.

An element (experimental unit) is an object on which a measurement is taken.

A population is a collection of elements about which we wish to make an inference.

Sampling units are nonoverlapping collections of elements from the population that cover the entire population.

A sampling frame is a list of sampling units.

A sample is a collection of sampling units drawn from a sampling frame.

Parameter – numerical characteristic of a population

Statistic – numerical characteristic of a sample

VARIABILITY

Factors can cause variability:

Sampling error: the difference between a sample statistic and its population parameter.
- Random sampling allows us to estimate the typical size of the sampling error.

Non-sampling error: comes from other sources, can be systematically biased, and is difficult to estimate.
- Examples of non-sampling error include undercoverage, nonresponse, question wording (e.g., response bias), question order.

ERRORS OF NONOBSERVATION

- The deviation between an estimate from an ideal sample and the true population value is the sampling error.

- Almost always, the sampling frame does not match up perfectly with the target population, leading to errors of coverage.

Nonresponse is probably the most serious of these
- Arises in three ways:
- Inability of the person responding to come up with the answer
- Refusal to answer
- Inability to contact the sampled elements
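
The idea that random sampling lets us estimate the typical size of the sampling error can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 1,000 values is hypothetical and generated in the code, and "typical size" is measured here as the standard deviation of the errors over repeated samples.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1,000 values (e.g., weights in kg); not from the notes.
population = [rng.gauss(60, 10) for _ in range(1000)]
mu = statistics.mean(population)  # population parameter


def sampling_error(n):
    """Sample statistic minus population parameter for one random sample of size n."""
    sample = rng.sample(population, n)
    return statistics.mean(sample) - mu


# Repeat the sampling many times for a small and a larger sample size.
errors_small = [sampling_error(10) for _ in range(500)]
errors_large = [sampling_error(100) for _ in range(500)]

# Typical size of the sampling error, measured as the spread of the errors.
print("n = 10 :", statistics.pstdev(errors_small))
print("n = 100:", statistics.pstdev(errors_large))
```

The spread of the errors shrinks as the sample size grows, which is the behavior that makes sampling error estimable. Non-sampling errors such as nonresponse do not shrink this way.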
Errors of observation can be classified as due to the interviewer, respondent, instrument, or method of data collection.

METHODS OF NON-PROBABILITY SAMPLING

1. Purposive sampling – sets out to make the sample agree with the profile of the population based on some preselected characteristic

2. Quota sampling – selects a specified number (quota) of sampling units possessing certain characteristics

3. Convenience, haphazard or accidental sampling – selects sampling units that come to hand or are convenient to get information from

4. Judgment or expert sampling – selects the sample in accordance with the judgment of an expert with known or demonstrable experience and expertise in some area.

5. Snowball sampling – the process starts by identifying someone who meets the criteria for inclusion in the study. The respondent is then asked to recommend others whom they may know who also meet the criteria. It is especially useful when populations that are inaccessible or hard to find are the target population.

METHODS OF PROBABILITY SAMPLING

Description of the Design

1. SIMPLE RANDOM SAMPLING
- Simple random sampling is a method of selecting n units out of the N units in the population in such a way that every distinct sample of size n has an equal chance of being drawn. The process of selecting the sample must give an equal chance of selection to any one of the remaining elements in the population at any one of the n draws.

Random sampling may be with replacement (SRSWR) or without replacement (SRSWOR). In SRSWR, a chosen element is always replaced before the next selection is made, so that an element may be chosen more than once.

Advantages
- The theory involved is much easier to understand than the theory behind other sampling designs
- Inferential methods are simple and easy.

Disadvantages
- The sample chosen may be widely spread, thus entailing high transportation costs.
- A population frame, or list, is needed.
- Less precise estimates result if the population is heterogeneous with respect to the characteristic under study

2. STRATIFIED RANDOM SAMPLING
- In stratified random sampling, the population of N units is first divided into subpopulations called strata. Then a simple random sample is drawn from each stratum, the selection being made independently in different strata

Advantages
- Stratification may produce a gain in precision in the estimates of characteristics of the population
- It allows for more comprehensive data analysis since information is provided for each stratum
- It is administratively convenient

Disadvantages
- A listing of the population for each stratum is needed
- The stratification of the population may require additional prior information about the population and its strata.

3. SYSTEMATIC SAMPLING
- Systematic sampling with a “random start” is a method of selecting a sample by taking every kth unit from an ordered population, the first unit being selected at random. Here k is called the sampling interval; the reciprocal 1/k is the sampling fraction.
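
The simple random, stratified, and systematic designs described so far can be sketched in Python with the standard `random` module. This is an illustrative sketch, not a prescribed procedure: the function names, the population of unit labels 1 to 100, and the two strata are all hypothetical.

```python
import random


def srs_without_replacement(population, n, rng):
    """SRSWOR: every distinct sample of size n is equally likely."""
    return rng.sample(population, n)


def srs_with_replacement(population, n, rng):
    """SRSWR: each draw is made from the full population, so repeats can occur."""
    return [rng.choice(population) for _ in range(n)]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw an independent simple random sample from each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def systematic_sample(population, k, rng):
    """Take every k-th unit of an ordered population after a random start."""
    start = rng.randrange(k)  # random start among the first k units
    return population[start::k]


rng = random.Random(42)                            # fixed seed for reproducibility
units = list(range(1, 101))                        # hypothetical population, N = 100
strata = {"low": units[:50], "high": units[50:]}   # hypothetical strata

print(srs_without_replacement(units, 10, rng))
print(srs_with_replacement(units, 10, rng))
print(stratified_sample(strata, 5, rng))
print(systematic_sample(units, 10, rng))  # k = 10, sampling fraction 1/10
```

Note that the systematic sample needs only one random number (the start), while the other designs need a random draw for every selected unit.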
Advantages
- It is easier to draw the sample and often easier to execute without mistakes than simple random sampling
- It is possible to select a sample in the field without a sampling frame
- The systematic sample is spread more evenly over the population

Disadvantages
- If periodic regularities are found in the list, a systematic sample may consist only of units of a similar type. (Example: Store sales over seven days of the week – estimating total sales based on a systematic sample every Tuesday would be unwise)
- Knowledge of the structure of the population is necessary for its most effective use.

4. CLUSTER SAMPLING
- Cluster sampling is a method of sampling where a sample of distinct groups, or clusters, of elements is selected and then a census of every element in the selected clusters is taken.
- Similar to strata in stratified sampling, clusters are non-overlapping sub-populations which together comprise the entire population. For example, a household is a cluster of individuals living together, or a city block might also be considered as a cluster.

Advantages
- A population list of elements is not needed; only a population list of clusters is required. Listing cost is reduced.
- Transportation cost is reduced.

Disadvantages
- The costs and problems of statistical analysis are greater
- Estimation procedures are more difficult.

5. MULTISTAGE SAMPLING
- In multistage sampling, the population is divided into a hierarchy of sampling units corresponding to the different sampling stages. In the first stage of sampling, the population is divided into primary stage units (PSU), then a sample of PSUs is drawn.
- In the second stage of sampling, each selected PSU is subdivided into second-stage units (SSU), then a sample of SSUs is drawn. The process of subsampling can be carried to a third stage, fourth stage and so on, by sampling the subunits instead of enumerating them completely at each stage.

Advantages
- Listing cost is reduced.
- Transportation cost is reduced.

Disadvantages
- Estimation procedure is difficult, especially when the primary stage units are not of the same size.
- Estimation procedure gets more complicated as the number of sampling stages increases
- The sampling procedure entails much planning before selection is done.

Methods of Presenting Data

Constructing or arranging data in an array may be cumbersome and not practical in dealing with a much larger set of data. Hence, we may apply other methods of presenting data in a more concise and informative manner. These methods are the following:

1. Textual method
- summarizes the data in paragraph or narrative form.

2. Tabular method
- summarizes data in a systematic and logical arrangement into rows and columns called a statistical table

3. Graphical method
- refers to the pictorial representation of data through the use of graphs or charts. Graphs or charts are nothing else but illustrations of numerical data.

Types of Graphs

A bar graph consists of bars of equal width either all vertical or all horizontal. The length represents the magnitude of the quantities being compared. Vertical bars are generally used for chronological comparison or comparing data taken at a particular time. Horizontal bars are used to show categorical comparison.
❑ Simple bar graph - the bars stand singly apart from one another.
❑ Compound (multiple) bar graph - two or more bars are drawn for each item.
❑ Component bar graph - used to show proportional variation or changes of the segments of a whole and the whole itself.

❑ The line graph is an effective device used to portray changes in values with respect to time. Variations in the data are indicated by a series of line segments formed by joining consecutive points plotted above the categories.
❑ The pie chart or circle graph is appropriate for portraying the relative magnitude of the component parts of a whole.
❑ Pictograph uses picture symbols to represent values. The symbols used should be appropriate to the data being represented.

Frequency Distribution Table (FDT)

An array is an arrangement of data in ascending or descending order of magnitude. This is usually used for small numbers of observations.

A frequency distribution table is a systematic arrangement of data that consists of reducing the data to forms that are manageable without losing informative details. It is a tabular presentation of qualitative data grouped into categories (Table 1) or quantitative data grouped into non-overlapping numerical intervals called classes (Table 2) together with the number of observations in each category or class.

Components of a Quantitative Frequency Distribution Table

❑ Consider the FDT in Table 2. The groupings in terms of numerical intervals are called classes or class intervals. The interval 10-14 is the lowest class interval and 40-44 is the highest class.
❑ Class limits refer to the lowest and highest value that can be entered in a class.
❑ The lowest value along with the highest value in each class is known as the lower class limit (LL) and upper class limit (UL).
❑ It is always assumed that the values are evenly distributed within the intervals. There are times when an interval has to be represented by a single value. This single value that serves as the representative of the given class interval or class boundaries is called the class midpoint or class mark. The class midpoint is obtained by adding the lower and upper class limits (or class boundaries) and then dividing the sum by 2. Thus, if we let X_i be the class midpoint and LL_i and UL_i the lower and upper class limits of a particular class interval, then the ith class mark (CM) is
X_i = (LL_i + UL_i) / 2, where i = 1, 2, 3, ..., k and k = number of classes

Making Your Frequency Distribution

R = range
K = number of classes
C = class size

Step 1: Calculate the range, the number of classes, and the class size of the data set
R = HL - LL (highest value - lowest value in the data set)
K = 1 + 3.322 log N (log is base 10; N represents the number of observations)
C = R/K

Step 3: Use the class width to create your groups
Like 12-21, 22-31, 32-41

Step 4: Find the frequency for each group

Step 5: Compute the CM (class mark) and the CB (class boundaries)
CM: add the two class limits and divide by 2
CM = (LL + UL)/2
CB: subtract 0.5 from the lower limit and add 0.5 to the upper limit. Example for the class 12-21:
12 - 0.5 = 11.5
21 + 0.5 = 21.5

Step 6: Get the Relative Frequency
RF(%) = f/N x 100
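
The steps for building a frequency distribution can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the data are hypothetical integer scores, and K and C are rounded up to whole numbers, a rounding choice the notes do not specify. The function name `frequency_distribution` is illustrative only.

```python
import math


def frequency_distribution(data):
    """Grouped frequency table for integer data, following the steps above."""
    n = len(data)
    lowest, highest = min(data), max(data)
    r = highest - lowest                        # Step 1: range R
    k = math.ceil(1 + 3.322 * math.log10(n))    # number of classes K (rounded up)
    c = max(1, math.ceil(r / k))                # class size C (rounded up)

    table = []
    ll = lowest
    while ll <= highest:                        # Step 3: build the classes
        ul = ll + c - 1
        f = sum(ll <= x <= ul for x in data)    # Step 4: frequency of the class
        table.append({
            "class": (ll, ul),
            "f": f,
            "CM": (ll + ul) / 2,                # Step 5: class mark
            "CB": (ll - 0.5, ul + 0.5),         # Step 5: class boundaries
            "RF": f / n * 100,                  # Step 6: relative frequency (%)
        })
        ll = ul + 1
    return table


# Hypothetical exam scores; not taken from the notes.
scores = [12, 15, 17, 21, 22, 24, 25, 27, 28, 30,
          31, 33, 34, 35, 36, 38, 40, 41, 43, 45]

for row in frequency_distribution(scores):
    print(row)
```

Because of the rounding, the loop keeps adding classes until the highest value is covered, so the number of classes produced can occasionally be one more than K.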
Also compute the less-than and greater-than cumulative frequencies (<CF and >CF).

Statistical Measures
- Measures of Variability
- Standard Score
- Skewness
- Kurtosis
- Boxplot

Most Commonly Used Measures of Central Tendency

The most commonly used measures of position or central tendency or average are the ARITHMETIC MEAN, MEDIAN, and MODE. These are characteristics of a distribution or an array of numbers that express the center or the typical value of individual characteristics.

Consider the following set of five observations: 15, 15, 16, ...

The ARITHMETIC MEAN is the sum of all items or terms divided by the total number of items.
The MEDIAN is the value of the middle item after arranging the data in an ascending or descending order of magnitude.
The MODE is defined as the value of the term that appears most frequently. Hence,

Mean = 17
Median = 16
Mode = 15
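
The three measures of central tendency can be computed with Python's standard `statistics` module. In this sketch the data set is partly hypothetical: the first three values appear in the notes, while the last two (19 and 20) are assumed here so that the results agree with the stated mean, median, and mode.

```python
import statistics

# First three values are from the notes; 19 and 20 are assumed for illustration.
observations = [15, 15, 16, 19, 20]

mean = statistics.mean(observations)      # sum of all items / number of items
median = statistics.median(observations)  # middle item of the sorted data
mode = statistics.mode(observations)      # value that appears most frequently

print("Mean =", mean)
print("Median =", median)
print("Mode =", mode)
```

With an even number of observations, `statistics.median` returns the average of the two middle items.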
Measures of relative dispersion are unitless and are used when one wishes to compare the scatter of one distribution with another distribution.

1. Robert got a grade of 75 in Stat 101 and a grade of 90 in Major 1. The mean grade in Stat 101 is 70 and the standard deviation is 10, whereas in Major 1, the mean grade is 80 and the standard deviation is 20. Relative to the other students, where did he perform better?

Definition: A measure of skewness shows the degree of asymmetry, or departure from symmetry, of a distribution. It indicates not only the amount of skewness but also the direction.
Recall: Types of distribution (symmetric, skewed to the right, skewed to the left)

Normal Distribution

Normal/Gaussian Distribution is a bell-shaped graph which encompasses two basic terms: mean and standard deviation. It is a symmetrical arrangement of a data set in which most values cluster around the mean and the rest taper off symmetrically towards either extreme. It commonly describes traits that are influenced by numerous genetic and environmental factors.
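
One way to approach the exercise on Robert's grades is through the standard score, z = (x - mean) / standard deviation, which expresses a grade as a number of standard deviations from its class mean. The sketch below assumes this is the intended method; the function name `standard_score` is illustrative.

```python
def standard_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Values taken from the exercise on Robert's grades.
z_stat101 = standard_score(75, mean=70, sd=10)
z_major1 = standard_score(90, mean=80, sd=20)

print("Stat 101:", z_stat101)
print("Major 1 :", z_major1)
```

Both standard scores come out to 0.5, so by this measure Robert performed equally well, relative to the other students, in the two subjects, even though his raw grade in Major 1 is higher.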

