Notes in Statistics
Classification of Variables
Qualitative vs. Quantitative Variable

Population
- A group of all individuals or entities (people, animals, etc.) that we would
  like to know something about
Census or complete enumeration is the process of gathering information from
every unit in the population.
- Not always possible to get timely, accurate and economical data
- Costly, if the number of units in the population is too large

Survey sampling is the process of obtaining information from the units in the
selected sample.

Sampling units are nonoverlapping collections of elements from the population
that cover the entire population.

A sampling frame is a list of sampling units.

A sample is a collection of sampling units drawn from a sampling frame.

Parameter - numerical characteristic of a population
The target population is the population from which information is desired.

The sampled population is the collection of elements from which the sample is
actually taken.

The population frame is a listing of all the individual units in the
population.

An element (experimental unit) is an object on which a measurement is taken.

- Almost always, the sampling frame does not match up perfectly with the
  target population, leading to errors of coverage.

Nonresponse is probably the most serious of these.
- Arises in three ways:
  - Inability of the person responding to come up with the answer
  - Refusal to answer
  - Inability to contact the sampled elements
These errors can be classified as due to the interviewer, respondent,
instrument, or method of data collection.

Advantages
- The theory involved is much easier to understand than the theory behind
  other sampling designs
- Inferential methods are simple and easy.

Disadvantages
- The sample chosen may be widely spread, thus entailing high transportation
  costs.
- A population frame, or list, is needed.
- Less precise estimates result if the population is heterogeneous with
  respect to the characteristic under study

METHODS OF NONPROBABILITY SAMPLING
1. Purposive sampling – sets out to make the sample agree with the profile of
the population based on some preselected characteristic

2. Quota sampling – selects a specified number (quota) of sampling units
possessing certain characteristics
3. Convenience, haphazard or accidental sampling – selects sampling units that
come to hand or are convenient to get information from

4. Judgment or expert sampling – selects the sample in accordance with the
judgment of an expert with known or demonstrable experience and expertise in
some area.

5. Snowball sampling – the process starts by identifying someone who meets the
criteria for inclusion in the study. The respondent is then asked to recommend
others whom they may know who also meet the criteria. It is especially useful
when the target population is inaccessible or hard to find.

- In stratified random sampling, the population of N units is first divided
into subpopulations called strata. Then a simple random sample is drawn from
each stratum, the selection being made independently in different strata.

Advantages
- Stratification may produce a gain in precision in the estimates of
  characteristics of the population
- It allows for more comprehensive data analysis since information is provided
  for each stratum
- It is administratively convenient
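The stratified random sampling design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population, the stratum labels ("urban", "rural"), the per-stratum sample sizes, and the function name `stratified_sample` are all hypothetical and do not come from the notes.

```python
import random


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Draw an independent simple random sample from each stratum.

    units         : list of population units
    stratum_of    : function mapping a unit to its stratum label
    n_per_stratum : dict {stratum label: sample size for that stratum}
    """
    rng = random.Random(seed)

    # Divide the population into non-overlapping subpopulations (strata).
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)

    # Select a simple random sample (without replacement) within each stratum.
    return {
        label: rng.sample(members, n_per_stratum[label])
        for label, members in strata.items()
    }


# Hypothetical population of 30 units: (id, region)
population = [(i, "urban" if i % 3 else "rural") for i in range(1, 31)]
result = stratified_sample(
    population, lambda u: u[1], {"urban": 4, "rural": 2}, seed=1
)
print(result)
```

Because each stratum is sampled separately, every stratum is guaranteed to appear in the sample, which is what makes per-stratum analysis possible.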
Disadvantages
- A listing of the population for each stratum is needed
- The stratification of the population may require additional prior
  information about the population and its strata.

METHODS OF PROBABILITY SAMPLING

Description of the Design

1. SIMPLE RANDOM SAMPLING
- Simple random sampling is a method of selecting n units out of the N units
in the population in such a way that every distinct sample of size n has an
equal chance of being drawn. The process of selecting the sample must give an
equal chance of selection to any one of the remaining elements in the
population at any one of the n draws.

3. SYSTEMATIC SAMPLING
- Systematic sampling with a “random start” is a method of selecting a sample
by taking every kth unit from an ordered population, the first unit being
selected at random. Here k is called the sampling interval; its reciprocal 1/k
is the sampling fraction.
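As a rough illustration of the two designs described above, the sketch below draws a simple random sample and a systematic sample with a random start from the same population. The population of 100 numbered units, the sample size, the interval k = 10, and the function names are assumptions made for the example only.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Select n of the N units without replacement, so that every
    distinct sample of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, k, seed=None):
    """Take every kth unit from an ordered population, starting from a
    unit chosen at random among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # random start: index 0 .. k-1
    return population[start::k]   # then every kth unit after it


units = list(range(1, 101))       # hypothetical ordered population, N = 100
srs = simple_random_sample(units, 10, seed=42)
sys_sample = systematic_sample(units, 10, seed=42)   # k = 10, fraction 1/10

print(sorted(srs))
print(sys_sample)
```

With N = 100 and k = 10 the systematic sample always contains 10 units, and consecutive selected units are exactly k positions apart.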
5. MULTISTAGE SAMPLING
- In multistage sampling, the population is divided into a hierarchy of
sampling units corresponding to the different sampling stages. In the first
stage of sampling, the population is divided into primary stage units (PSUs),
then a sample of PSUs is drawn.

A bar graph consists of bars of equal width, either all vertical or all
horizontal. The length of each bar represents the magnitude of the quantity
being compared. Vertical bars are generally used for chronological comparison
or for comparing data taken at a particular time. Horizontal bars are used to
show categorical comparison.
❑ Simple bar graph - the bars stand singly apart from one another.
❑ Compound (multiple) bar graph - two or more bars are drawn for each item.
❑ Component bar graph - used to show proportional variation or changes of the
segments of a whole and the whole itself.

❑ The line graph is an effective device used to portray changes in values with
respect to time. Variations in the data are indicated by a series of line
segments formed by joining consecutive points plotted above the categories.

❑ The pie chart or circle graph is appropriate for portraying the relative
magnitude of the component parts of a whole.

❑ Pictograph uses picture symbols to represent values. The symbols used should
be appropriate to the data being represented.

Frequency Distribution Table (FDT)
An array is an arrangement of data in ascending or descending order of
magnitude. This is usually used for small numbers of observations.

A frequency distribution table is a systematic arrangement of data that
consists of reducing the data to forms that are manageable without losing
informative details. It is a tabular presentation of qualitative data grouped
into categories (Table 1) or quantitative data grouped into non-overlapping
numerical intervals called classes (Table 2) together with the number of
observations in each category or class.

Components of a Quantitative Frequency Distribution Table

❑ It is always assumed that the values are evenly distributed within the
intervals. There are times when an interval has to be represented by a single
value. This single value that serves as the representative of the given class
interval or class boundaries is called the class midpoint or class mark. The
class midpoint is obtained by adding the lower and upper class limits (or class
boundaries) and then dividing the sum by 2. Thus, if we let X_i be the class
midpoint and LL_i and UL_i the lower and upper class limits of a particular
class interval, then the ith class mark (CM) is
    X_i = (LL_i + UL_i) / 2
where i = 1, 2, 3, ..., k and k = number of classes.

Making Your Frequency Distribution

R = range
K = number of classes
C = class size

Step 1: Calculate the range, the number of classes, and the class size of the
data set
R = highest value – lowest value
K = 1 + 3.322 log N (log is base 10; N represents the number of observations)
C = R/K

Step 3: Use the class size to create your groups

Like 12-21… 22-31… 32-41

Step 4: Find the frequency for each group

Step 5: Compute the class mark (CM) of each class

Add the two class limits and divide by 2
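The steps for building a frequency distribution can be tried out with a short Python sketch. The data set below is made up for illustration. The rounding choices are assumptions, since the notes do not state them: K is rounded to the nearest whole number, C is rounded up, the data are treated as integers with inclusive class limits, and classes are added until the highest value is covered.

```python
import math


def frequency_distribution(data):
    """Return a list of (lower limit, upper limit, frequency, class mark)."""
    n = len(data)
    lowest, highest = min(data), max(data)

    r = highest - lowest                       # R = highest value - lowest value
    k = round(1 + 3.322 * math.log10(n))       # K, rounded (assumption)
    c = max(1, math.ceil(r / k))               # C = R / K, rounded up (assumption)

    table = []
    lower = lowest
    while lower <= highest:                    # add classes until all data are covered
        upper = lower + c - 1                  # inclusive limits for integer data
        freq = sum(lower <= x <= upper for x in data)
        class_mark = (lower + upper) / 2       # add the two class limits, divide by 2
        table.append((lower, upper, freq, class_mark))
        lower = upper + 1                      # next class starts right after this one
    return table


# Hypothetical data set of N = 30 integer observations
data = [12, 15, 15, 18, 19, 21, 22, 22, 23, 25,
        26, 27, 27, 28, 29, 30, 30, 31, 31, 32,
        33, 34, 35, 36, 37, 38, 39, 40, 41, 41]

for lower, upper, freq, mark in frequency_distribution(data):
    print(f"{lower}-{upper}  f = {freq}  CM = {mark}")
```

For this data set R = 29, K = 6 and C = 5, so the first class is 12-16 with class mark 14. The frequencies add up to N because the classes do not overlap.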
Mean = 17
Median = 16
Mode = 15
Statistical Measures
- Measures of Variability
- Standard Score
- Skewness
- Kurtosis
- Boxplot

Most Commonly Used Measures of Central Tendency
The most commonly used measures of position or central tendency or average are
the ARITHMETIC MEAN, MEDIAN, and MODE. These are characteristics of a
distribution or an array of numbers.
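A quick way to check the three measures is Python's standard `statistics` module. The five observations below are made up. They were chosen only so that the results agree with the values Mean = 17, Median = 16 and Mode = 15 listed earlier in these notes, whose underlying data are not shown.

```python
import statistics

# Hypothetical data set (not from the notes)
scores = [15, 15, 16, 18, 21]

mean = statistics.mean(scores)      # sum / count = 85 / 5 = 17
median = statistics.median(scores)  # middle value of the sorted data = 16
mode = statistics.mode(scores)      # most frequent value = 15

print("Mean =", mean)
print("Median =", median)
print("Mode =", mode)
```

Here mean > median > mode, the ordering typically seen in a distribution that is skewed to the right.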
Measures of relative dispersion are unitless and are used when one wishes to
compare the scatter of one distribution with another distribution.

Recall: Types of distribution (symmetric, skewed to the right, skewed to the
left)

Definition: A measure of skewness shows the degree of asymmetry, or departure
from symmetry, of a distribution. It indicates not only the amount of skewness
but also the direction.
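The notes do not say which formulas are intended, so the sketch below picks two common ones as assumptions. The coefficient of variation stands in as a unitless measure of relative dispersion, and Pearson's second coefficient, 3(mean – median)/standard deviation, stands in as a measure of skewness. All data values are hypothetical.

```python
import statistics


def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean (unitless)."""
    return statistics.stdev(data) / statistics.mean(data) * 100


def pearson_skewness(data):
    """Pearson's second coefficient of skewness.

    Positive: skewed to the right. Negative: skewed to the left.
    Zero: mean and median coincide, as in a symmetric distribution.
    """
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.stdev(data)


right_skewed = [15, 15, 16, 18, 21]   # long tail toward larger values
left_skewed = [13, 16, 18, 19, 19]    # mirror image of the set above
symmetric = [15, 16, 17, 18, 19]

for name, values in [("right", right_skewed),
                     ("left", left_skewed),
                     ("symmetric", symmetric)]:
    print(name,
          "CV = %.2f%%" % coefficient_of_variation(values),
          "skewness = %.3f" % pearson_skewness(values))
```

The sign of the skewness coefficient gives the direction and its size gives the amount. Because the coefficient of variation has no units, it can compare the scatter of distributions measured on different scales.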