Applied Statistics

Introduction to Statistics
Basic Concepts and Definitions
The word Statistics is derived from the Latin word status meaning “state”.
Early uses of statistics involved compilation of data and graphs describing various aspects of the state or
country.
Statistics – the word is used in two senses: one refers to the actual numbers derived from data, and the
other refers to statistics as a method of analysis
Statistics – a science that deals with the collection, presentation, analysis, and interpretation of data
The four pillars of statistics
Collection – gathering of information or data
Organization or presentation – summarizing data in textual, graphical or tabular form
Analysis – describing the data by statistical methods or procedures
Interpretation – the process of making conclusions based on the analyzed data
Results from analysis will be used to interpret the data collected
Why study statistics?
Many jobs require an aptitude for statistics because you are expected to say something about data. You
may also be asked to do research in your field. You can also use the knowledge gained from studying
statistics to become a better consumer and citizen.
Objects in the Study of statistics
Variable – a characteristic or attribute that can assume different values
Data – the values (measurements or observations) that the variables can assume.
Random variables – variables whose values are determined by chance
Data set – a collection of data values
Data value or datum – each value in a data set
Descriptive statistics – consists of the collection, organization, summarization, and presentation of data.
The statistician tries to describe a situation; example: a national census
Inferential statistics – generalizing from samples to populations, performing estimations and hypothesis
tests, determining relationships among variables, making predictions; uses probability
Population – consists of all subjects that are being studied
Sample – a group of subjects selected from a population

Parameter – numerical summary or any measurement coming from a population
Statistic – a measurement from a sample
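The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population below is made up (1,000 hypothetical heights in centimeters), and the seed and sample size of 50 are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of all 1,000 subjects being studied.
population = [rng.randint(150, 190) for _ in range(1000)]

# Parameter: a numerical summary computed from the whole population.
parameter = statistics.mean(population)

# Statistic: the same summary computed from a sample of the population.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)

print("population mean (parameter):", round(parameter, 2))
print("sample mean (statistic):    ", round(statistic, 2))
```

The two means are usually close but not equal; inferential statistics deals with how far apart they are likely to be.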
Descriptive or Inferential?
Descriptive
Organizing and summarizing data using numbers and graphs
Data summary; bar graphs, histograms, pie charts, shape of graph and skewness
Measures of central tendency: mean, median, mode
Measures of variability: range, variance, standard deviation
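The measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch; the quiz scores are hypothetical data used only for illustration.

```python
import statistics

# Hypothetical quiz scores for ten students (illustrative data only).
scores = [7, 8, 8, 9, 10, 6, 8, 7, 9, 8]

# Measures of central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

# Measures of variability
data_range = max(scores) - min(scores)  # largest value minus smallest value
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", data_range)
print("variance:", round(variance, 2), "standard deviation:", round(std_dev, 2))
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `statistics.pvariance` and `statistics.pstdev` are the population versions.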
Inferential
Using sample data to make an inference or draw a conclusion about the population
Uses probability to determine how confident we can be that the conclusions we make are correct
(confidence intervals, margins of error)
Increasing the sample size narrows the confidence interval (at the same confidence level), so the estimate is more precise
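The effect of sample size on the margin of error can be sketched as follows. The example assumes the usual approximation for a sample mean, margin of error = z * s / sqrt(n), with z = 1.96 for a 95% confidence level; the standard deviation of 10 is a hypothetical value.

```python
import math

def margin_of_error(sample_sd, n, z=1.96):
    """Approximate margin of error for a sample mean: z * s / sqrt(n)."""
    return z * sample_sd / math.sqrt(n)

sample_sd = 10.0  # hypothetical sample standard deviation

# Each time the sample size is quadrupled, the margin of error is halved.
for n in (25, 100, 400):
    print("n =", n, "margin of error =", round(margin_of_error(sample_sd, n), 2))
```

The confidence interval is the sample mean plus or minus this margin, so a larger sample gives a narrower interval at the same confidence level.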
Average – also called mean, a number that describes the central tendency of the data
Categorical variable – variables that take on values that are names or labels
Numerical variable – variables that take on numerical values obtained from counting or measuring
Data and Variables
Observation – each thing we collect data about
Variables – record the measurements we are interested in
Variables can be classified as qualitative or quantitative
Qualitative variables – variables that can be placed into categories, according to some characteristic or
attribute: religious preference, geographic locations, gender, year level, subjects enrolled, student
number
Quantitative variables – numerical in nature, obtained from counting or measuring. Meaningful
arithmetic operations can be done with these kinds of data: age, weight, height
Variables can also be classified as dependent or independent
Dependent variable – variable affected or influenced by another variable
Independent variable – one that affects, or influences another variable
Independent variables: academic ability, study habits, economic status, teacher, food; dependent
variable: academic achievement
Independent variables: traditional feeds, commercial feeds, temperature/season/environment;
dependent variable: hogs and poultry
Independent variables: political party, economic status, attitude, connections/power; dependent
variable: politics
Independent variables: government officials, world market, attitude/sense of responsibility of people;
dependent variable: economy
Four levels/scales of measurement
Nominal – aka categorical; the categories of a qualitative variable are unordered; the categories are
merely names; used when we want to distinguish one object from another for identification purposes:
gender, nationality, civil status, names, colors, labels, preferred chocolate; numbers used as labels do
not imply order; can be displayed as a pie chart, column or bar chart, or stacked column or bar chart
Column chart is best to use for single set of nominal data
Labeled
Ordinal – can be qualitative/quantitative; the categories can be put in order; we can say that one is
greater than the other, but we cannot tell how much one has over the other/differences cannot be
measured: ranking of contestants in a beauty contest; of honor students; of siblings in a family;
satisfaction; fanciness; can be displayed as column or bar chart
Labeled, meaningful order
Interval – order matters, no true zero point; one can compare the differences between measurements
of the variable meaningfully, but not the ratio of the measurements; we can say not only that one is
over the other, but also by how much: scores, temperature in Celsius or Fahrenheit (Kelvin has a true
zero point, so it is on the ratio scale)
Labeled, meaningful order, measurable difference
Ratio – order matters; one can meaningfully compare both the differences between measurements of
the variable and their ratios; like the interval scale, but it always starts from the absolute or true zero
point: height, weight, area, test score
Labeled, meaningful order, measurable difference, true zero point
Interval/ratio can be discrete, with whole numbers, or continuous with fractional numbers
Interval/ratio best represented as bar chart or histogram, box plots, line chart (for data that occurs over
time)
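A short worked example shows why ratios are not meaningful on an interval scale. It compares two hypothetical temperatures in Celsius (no true zero point) and in Kelvin (which starts at absolute zero); the helper name `celsius_to_kelvin` is made up for this sketch.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

a_c, b_c = 10.0, 20.0  # two hypothetical temperatures in Celsius
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are meaningful on both scales and agree with each other.
difference_c = b_c - a_c
difference_k = b_k - a_k

# The Celsius ratio suggests "twice as hot", but that depends on where zero
# was placed. Only the Kelvin ratio, measured from a true zero, is meaningful.
ratio_c = b_c / a_c
ratio_k = b_k / a_k

print("difference:", difference_c, "C,", round(difference_k, 2), "K")
print("ratio in Celsius:", ratio_c, "ratio in Kelvin:", round(ratio_k, 3))
```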
M1 Study Guide
1. Basic Terminologies in Statistics
*Population vs sample
*Parameter vs statistic
*Descriptive vs inferential statistics
2. Data and variables
3. Classification of variables
*Quantitative vs Qualitative
*Levels of measurement
Data Collection
Data – needed whenever we undertake studies or research; used to solve particular problems or to
provide a basis from which certain decisions are generated
Types of data
Primary data – information collected from an original source, which is first-hand in nature: interviews,
surveys
Secondary data – information collected from published or unpublished sources like books, newspapers,
and theses
Methods of collecting primary data
Direct/interview method – researcher obtains the information needed by asking the interviewee
questions; gives precise and consistent information because clarifications can be made; high degree of
originality and reliability; however, it is time consuming, expensive, has limited field coverage, and is
prone to personal bias
Indirect personal interview – interviewing third persons who are directly/indirectly concerned with
the subject matter of enquiry: witnesses, informants
Telephone interview – questions asked through phone; random telephone calls to prospective
informants; cheap, can reach respondents across a wide geographical area; visual aids cannot be used
(video conferencing overcomes this), and people without a telephone connection are excluded
Indirect/questionnaire method – makes use of a written questionnaire; researcher distributes the
questionnaire to the respondents either by personal delivery or by mail; saves time and money because
questionnaires can be given to a large number of respondents at the same time, and respondents have
sufficient time to answer; however, some respondents simply ignore the questionnaires; clarification
cannot be made if the respondent does not understand the question; success depends on how the
questionnaire is drafted; low response rate; unreliable; limited accuracy
Registration method – governed by laws: birth & death rates; registered cars; registered voters; involves
pre-collected and organized data from agencies or organizations
Experimental method – used to find out cause-and-effect relationships; often used by scientific
researchers; example: agriculturists would like to know the effect of a new brand of fertilizer on the
growth of plants
Sampling Techniques
The entire population is seldom used in research because of the cost and time involved.
A sample, a small representative subset of the population, is used instead.
The characteristics of the entire population are described using the characteristics observed
in the sample.
Slovin’s formula is used to determine the sample size from a given population size
n = N / (1 + Ne^2)
where n = sample size
N = population size
e = margin of error
A group of researchers will conduct a survey to find out the opinion of residents of a particular
community regarding the oil price hike. If there are 11,060 residents in the community and the
researchers plan to use a sample using a 6% margin of error, what should the sample size be? – 271
N = 16,825, e = 8%: n = 155
N = 1,016, e = 8%: n = 135
N = 12,028, e = 7%: n = 201
N = 48,089, e = 1%: n = 8,279
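The sample sizes above can be checked with a short Python sketch of Slovin's formula. The function name `slovin_sample_size` is made up for this example, and rounding to the nearest whole number is an assumption chosen because it reproduces every answer listed above (some texts round up instead, which would give 136 for N = 1,016).

```python
import math

def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula n = N / (1 + N * e^2), rounded to the nearest whole number."""
    raw = population_size / (1 + population_size * margin_of_error ** 2)
    return math.floor(raw + 0.5)  # assumption: round to nearest, not always up

# (N, e) pairs taken from the examples above, with e written as a decimal.
examples = [(11060, 0.06), (16825, 0.08), (1016, 0.08), (12028, 0.07), (48089, 0.01)]

for N, e in examples:
    print("N =", N, "e =", e, "n =", slovin_sample_size(N, e))
```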
Sampling Techniques
To obtain samples that are unbiased (that give each subject in the population an equally likely chance of
being selected) statisticians use four basic methods of probability or random sampling: random,
systematic, stratified, and cluster sampling
Simple random sampling – subjects are selected by random numbers; number each subject in the
population, then draw lots or generate random numbers; unbiased; works for a small geographic area
and a homogeneous population
Example: students are selected using random numbers to take part in a survey
Stratified sampling – dividing the population into groups (strata) according to some characteristic that is
important to the study (same characteristic), then sampling from each group; samples within each
stratum should be randomly selected
Systematic sampling – numbering each subject of the population and then selecting every kth subject;
must be careful about how the subjects in the population are numbered, because the numbering must
not follow a pattern; used when the targets are moving, e.g. the respondents needed are building
visitors on a given day
Example: every 100th hamburger manufactured is checked to determine its fat content
Cluster sampling – population is divided into groups called clusters by some means, then the researcher
randomly selects some of these clusters and uses all members of the selected clusters as the subjects of
the sample; used when the population is large or when it involves subjects residing in a large
geographic area; example: instead of selecting a random sample of patients, select a few hospitals at
random
Convenience sampling – uses whatever sample is easiest to obtain; prone to bias, including
self-selection bias when the people selected are already interested in the study
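The four probability sampling methods can be sketched with Python's standard `random` module. Everything here is illustrative: the population is simply the numbers 1 to 100, and the strata names, strata sizes, cluster size, and sample sizes are assumptions chosen for the example.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical subjects numbered 1..100

# Simple random sampling: every subject has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic sampling: pick a random start, then take every kth subject.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: split into strata, then sample randomly within each.
strata = {
    "first_year": population[:40],
    "second_year": population[40:70],
    "third_year": population[70:],
}
stratified = {name: rng.sample(members, 3) for name, members in strata.items()}

# Cluster sampling: randomly choose whole clusters and keep all their members.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [subject for cluster in chosen_clusters for subject in cluster]

print("simple random:", sorted(simple))
print("systematic:   ", systematic)
print("stratified:   ", stratified)
print("cluster:      ", cluster_sample)
```

Stratified sampling takes some subjects from every group, while cluster sampling takes every subject from a few groups.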
Ways to classify statistical studies
Observational study – researcher merely observes what is happening or what has happened in the past
and tries to draw conclusions based on these observations
Experimental study – researcher manipulates one of the variables and tries to determine how the
manipulation influences other variables
