Applied Statistics
The word Statistics is derived from the Latin word status meaning “state”.
Early uses of statistics involved compilation of data and graphs describing various aspects of the state or
country.
Statistics – the word has two senses: one refers to the actual numbers derived from data, and the
other refers to statistics as a method of analysis
Statistics – a science that deals with the collection, presentation, analysis, and interpretation of data
Various jobs require aptitude in statistics, and you must be knowledgeable in it because you are
expected to say something about data. You may also be asked to do research in your field. You can also
use the knowledge gained from studying statistics to become a better consumer and citizen.
Data – the values (measurements or observations) that the variables can assume.
Descriptive statistics – consists of the collection, organization, summarization, and presentation of data.
The statistician tries to describe a situation, e.g. a national census
Inferential statistics – generalizing from samples to populations, performing estimations and hypothesis
tests, determining relationships among variables, making predictions; uses probability
Descriptive or Inferential?
Descriptive – data summary; bar graphs, histograms, pie charts, shape of graph and skewness
Inferential – uses probability to determine how confident we can be that the conclusions we make are
correct (confidence intervals, margins of error)
Average – also called mean, a number that describes the central tendency of the data
Categorical variable – a variable that takes on values that are names or labels
Qualitative variables – variables that can be placed into categories, according to some characteristic or
attribute: religious preference, geographic locations, gender, year level, subjects enrolled, student
number
Independent variable: academic ability, study habits, economic status, teacher, food; dependent
variable: academic achievement
Independent variable: traditional feeds, commercial feeds, temperature/season/environment;
dependent variable: hogs and poultry
Nominal – aka categorical; the categories of a qualitative variable are unordered; the categories are
merely names; used when we want to distinguish one object from another for identification purposes:
gender, nationality, civil status, names, colors, labels, sex, preferred chocolate; numbers labeled do not
imply order; can be displayed as a pie chart, column or bar chart, or stacked column or bar chart
Labeled
Ordinal – can be qualitative/quantitative; the categories can be put in order; we can say that one is
greater than the other, but we cannot tell how much one has over the other/differences cannot be
measured: ranking of contestants in a beauty contest; of honor students; of siblings in a family;
satisfaction; fanciness; can be displayed as column or bar chart
Interval – order matters, but there is no true zero point; one can compare the differences between
measurements of the variable meaningfully, but not the ratio of the measurements; not only can we say
that one is over the other, we can also specify the amount of difference: scores, temperature in Celsius
or Fahrenheit (Kelvin has a true zero point, so it is a ratio scale)
Ratio – order matters; one can meaningfully compare both the differences between measurements of the
variable and their ratios; like the interval scale, but it always starts from the absolute or true zero
point: height, weight, area, test score
Interval/ratio can be discrete, with whole numbers, or continuous with fractional numbers
Interval/ratio best represented as bar chart or histogram, box plots, line chart (for data that occurs over
time)
M1 Study Guide
*Quantitative vs Qualitative
*Levels of measurements
Data Collection
Data – needed whenever we undertake studies or research; used to solve particular problems or to
provide a basis from which certain decisions are generated
Types of data
Primary data – information collected from an original source, which is first-hand in nature: interviews,
surveys
Secondary data – information collected from published or unpublished sources like books, newspapers,
and theses
Direct/interview method – researcher obtains the information needed by asking the interviewee
questions; advantages: gives precise and consistent information because clarifications can be made, high
degree of originality and reliability; disadvantages: time consuming, expensive, limited field coverage,
prone to personal bias
Indirect personal interview – interviewing third persons who are directly/indirectly concerned with
the subject matter of enquiry: witnesses, informants
Telephone interview – questions are asked over the phone; random telephone calls to prospective
informants; cheap, can reach customers all over a geographical area; impossible to employ visual aids
(can be overcome by video conferencing); the part of the population without a telephone connection is
excluded
Registration method – governed by laws: birth & death rates; registered cars; registered voters; involves
pre-collected and organized data from agencies or organizations
Experimental method – used to find out cause-and-effect relationships; often used by scientific
researchers: agriculturists would like to know the effect of a new brand of fertilizer on the growth of
plants
Sampling Techniques
The entire population is seldom used in research because of the cost and time involved.
A sample, a small representative part of a population, is used instead.
The characteristics of the whole or entire population are described using the characteristics observed
from the sample.
Slovin’s formula is used to determine the sample size from a given population size
n = N / (1 + Ne^2)
n = sample size
N = population size
e = margin of error (as a decimal, e.g. 6% = 0.06)
A group of researchers will conduct a survey to find out the opinion of residents of a particular
community regarding the oil price hike. If there are 11,060 residents in the community and the
researchers plan to use a sample with a 6% margin of error, what should the sample size be? – 271
N = 16,825, e = 8% – 155
N = 1,016, e = 8% – 135
N = 12,028, e = 7% – 201
N = 48,089, e = 1% – 8,279
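The worked answers above can be checked with a short script. This is a minimal sketch: the function name slovin_sample_size is made up for illustration, and rounding to the nearest whole number is an assumption chosen because it reproduces the answers in these notes (some textbooks round up instead, which would give 136 for the third case).

```python
def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Slovin's formula: n = N / (1 + N * e^2).

    margin_of_error is a decimal (6% -> 0.06).
    Assumption: the result is rounded to the nearest whole number,
    which matches the answers given in the notes.
    """
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return round(n)


# (N, e) pairs taken from the exercises above
exercises = [(11060, 0.06), (16825, 0.08), (1016, 0.08), (12028, 0.07), (48089, 0.01)]
for N, e in exercises:
    print(f"N = {N}, e = {e:.0%} -> n = {slovin_sample_size(N, e)}")
```

For the oil price survey, N = 11,060 and e = 0.06 give 11,060 / (1 + 11,060 × 0.0036) = 11,060 / 40.816, which is about 270.97, so n = 271.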
To obtain samples that are unbiased (that give each subject in the population an equally likely chance of
being selected) statisticians use four basic methods of probability or random sampling: random,
systematic, stratified, and cluster sampling
Simple random sampling – subjects are selected by random numbers; number each subject in the
population, then draw lots or generate random numbers; unbiased; works for a small geographic area
and a homogeneous sample
Example: students are selected using random numbers to partake in a survey
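Drawing lots can be imitated in code. The sketch below assumes a made-up list of 50 numbered students and uses Python's standard random module; the function name simple_random_sample and the seed value are illustrative only.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct subjects, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: students numbered 1 to 50
students = [f"student_{i}" for i in range(1, 51)]
chosen = simple_random_sample(students, 5, seed=1)
print(chosen)
```

Because random.sample draws without replacement, no student can be selected twice.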
Stratified sampling – dividing the population into groups (strata) according to some characteristic that is
important to the study (same characteristic), then sampling from each group; samples within each
stratum should be randomly selected
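A minimal sketch of stratified sampling follows. It assumes proportional allocation (the same fraction is drawn from every stratum), which is one common choice and is not stated in these notes; the student data, the function name stratified_sample, and the 10% fraction are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(subjects, stratum_of, fraction, seed=None):
    """Group subjects into strata, then randomly sample within each stratum.

    Assumption: proportional allocation, with at least one subject per stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)

    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))  # random selection inside the stratum
    return sample


# Hypothetical population: 40 first-year and 20 second-year students
students = [(f"S{i:03d}", "Year 1") for i in range(40)]
students += [(f"S{i:03d}", "Year 2") for i in range(40, 60)]

sample = stratified_sample(students, stratum_of=lambda s: s[1], fraction=0.1, seed=42)
print(sample)
```

Here the year level is the characteristic that defines the strata, so the sample contains 4 first-year and 2 second-year students.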
Systematic sampling – numbering each subject of the population and then selecting every kth subject;
must be careful about how the subjects in the population are numbered, because the numbering must
not follow a pattern that matches the sampling interval; used when the targets are moving, i.e. the
respondents needed are building visitors on a given day
Example: every 100th hamburger manufactured is checked to determine its fat content
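The hamburger example can be sketched as follows. The random starting point within the first k items is an assumption (a common convention, not something these notes specify), and the production run of 1,000 hamburgers is made up for illustration.

```python
import random


def systematic_sample(population, k, seed=None):
    """Select every kth subject, starting from a random position in the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumption: random start between 0 and k - 1
    return population[start::k]


# Hypothetical production run: hamburgers numbered 1 to 1000
hamburgers = list(range(1, 1001))
checked = systematic_sample(hamburgers, k=100, seed=0)
print(checked)
```

With 1,000 hamburgers and k = 100, exactly 10 are checked, and consecutive selections are always 100 apart.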
Cluster sampling – population is divided into groups called clusters by some means, then the researcher
randomly selects some of these clusters and uses all members of the selected clusters as the subjects of
the sample; used when the population is large or when it involves subjects residing in a large
geographic area; e.g. instead of selecting a random sample of patients, select a few hospitals at random
and include all of their patients
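The hospital example can be sketched in code. The hospital names, patient labels, and the function name cluster_sample below are all hypothetical; the point is only that whole clusters are drawn at random and every member of a chosen cluster is kept.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and return all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    subjects = [member for name in chosen for member in clusters[name]]
    return chosen, subjects


# Hypothetical clusters: patients grouped by hospital
hospitals = {
    "Hospital A": ["A1", "A2", "A3"],
    "Hospital B": ["B1", "B2"],
    "Hospital C": ["C1", "C2", "C3", "C4"],
    "Hospital D": ["D1"],
}

chosen, patients = cluster_sample(hospitals, n_clusters=2, seed=3)
print(chosen, patients)
```

Unlike stratified sampling, no sampling happens inside a cluster: the randomness lies entirely in which clusters are picked.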
Convenience sampling – the sample is whatever is convenient or easy to obtain; prone to bias, including
self-selection bias when the people selected are already interested in the study
Observational study – researcher merely observes what is happening or what has happened in the past
and tries to draw conclusions based on these observations
Experimental study – researcher manipulates one of the variables and tries to determine how the
manipulation influences other variables