Applied Statistics

Introduction to Statistics

Basic Concepts and Definitions

The word Statistics is derived from the Latin word status meaning “state”.

Early uses of statistics involved compilation of data and graphs describing various aspects of the state or
country.

Statistics – the word has two senses: it can refer to the actual numbers derived from data, or to
statistics as a method of analysis

Statistics – a science that deals with the collection, presentation, analysis, and interpretation of data

The four pillars of statistics

Collection – gathering of information or data

Organization or presentation – summarizing data in textual, graphical or tabular form

Analysis – describing the data by statistical methods or procedures

Interpretation – the process of making conclusions based on the analyzed data

Results from analysis will be used to interpret the data collected

Why study statistics?

Many jobs require an aptitude for statistics, because you are expected to say something about data.
You may also be asked to do research in your field. You can also use the knowledge gained from
studying statistics to become a better consumer and citizen.

Objects in the Study of statistics

Variable – a characteristic or attribute that can assume different values

Data – the values (measurements or observations) that the variables can assume.

Random variables – variables whose values are determined by chance

Data set – a collection of data values

Data value or datum – each value in a data set

Descriptive statistics – consists of the collection, organization, summarization, and presentation of data.
The statistician tries to describe a situation; an example is the national census

Inferential statistics – generalizing from samples to populations, performing estimations and hypothesis
tests, determining relationships among variables, making predictions; uses probability

Population – consists of all subjects that are being studied

Sample – a group of subjects selected from a population


Parameter – numerical summary or any measurement coming from a population

Statistic – a measurement from a sample
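The difference between a parameter and a statistic can be sketched in a few lines of Python. The population below (ages of 500 club members) is made-up data, and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of all 500 members of a club (made-up data).
population = [random.randint(18, 65) for _ in range(500)]

# Parameter: a numerical summary computed from the whole population.
population_mean = statistics.mean(population)

# Statistic: the same summary computed from a sample of the population.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two means are usually close but not equal; inferential statistics deals with how far apart they are likely to be.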

Descriptive or Inferential?

Descriptive

Organizing and summarizing data using numbers and graphs

Data summary; bar graphs, histograms, pie charts, shape of graph and skewness

Measures of central tendency: mean, median, mode

Measures of variability: range, variance, standard deviation
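A minimal sketch of the measures listed above, using Python's standard `statistics` module. The quiz scores are hypothetical, and the variance and standard deviation here treat the data as a sample (dividing by n - 1).

```python
import statistics

# Hypothetical quiz scores, made up for illustration.
scores = [7, 8, 8, 9, 10, 6, 8]

# Measures of central tendency
mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

# Measures of variability (treating the scores as a sample)
score_range = max(scores) - min(scores)   # largest value minus smallest value
variance = statistics.variance(scores)    # sample variance, divides by n - 1
std_dev = statistics.stdev(scores)        # sample standard deviation

print(mean_score, median_score, mode_score)
print(score_range, variance, std_dev)
```

For data that cover an entire population, `statistics.pvariance` and `statistics.pstdev` divide by n instead.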

Inferential

Using sample data to make an inference or draw a conclusion about the population

Uses probability to determine how confident we can be that the conclusions we make are correct
(confidence intervals, margins of error)

Increasing the sample size narrows the confidence interval (smaller margin of error), so the estimate becomes more precise
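The effect of sample size on the margin of error can be illustrated with a short Python sketch. It assumes the normal approximation for a sample mean, a 95% confidence level (z of about 1.96), and a made-up standard deviation of 10; none of these values come from the notes.

```python
import math

Z_95 = 1.96  # approximate z-value for a 95% confidence level


def margin_of_error(std_dev, n, z=Z_95):
    """Margin of error for a sample mean, using the normal approximation."""
    return z * std_dev / math.sqrt(n)


# Hypothetical standard deviation of 10 units, made up for illustration.
for n in (25, 100, 400):
    moe = margin_of_error(10, n)
    print(f"n = {n:3d}: margin of error = {moe:.2f}")
```

Under these assumptions, each fourfold increase in sample size halves the margin of error, so the interval around the sample mean gets narrower.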

Average – also called mean, a number that describes the central tendency of the data

Categorical variable – variables that take on values that are names or labels

Numerical variable – variables that take on numerical values, obtained from counting or measuring

Data and Variables

Observation – each thing we collect data about

Variables – record the measurements we are interested in

Variables can be classified as qualitative or quantitative

Qualitative variables – variables that can be placed into categories, according to some characteristic or
attribute: religious preference, geographic locations, gender, year level, subjects enrolled, student
number

Quantitative variables – numerical in nature, obtained from counting or measuring. Meaningful
arithmetic operations can be done with these kinds of data: age, weight, height

Variables can also be classified as dependent or independent

Dependent variable – variable affected or influenced by another variable

Independent variable – one that affects, or influences another variable

Independent variable: academic ability, study habits, economic status, teacher, food; dependent
variable: academic achievement

Independent variable: traditional feeds, commercial feeds, temperature/season/environment;
dependent variable: hogs and poultry

Independent variable: political party, economic status, attitude, connections/power; dependent
variable: politics

Independent variable: government officials, world market, attitude/sense of responsibility of people;
dependent variable: economy

Four levels/scales of measurements

Nominal – also known as categorical; the categories of a qualitative variable are unordered; the categories are
merely names; used when we want to distinguish one object from another for identification purposes:
gender, nationality, civil status, names, colors, labels, preferred chocolate; any numbers used as labels do not
imply order; can be displayed as a pie chart, column or bar chart, or stacked column or bar chart

A column chart is best for a single set of nominal data

Labeled

Ordinal – can be qualitative/quantitative; the categories can be put in order; we can say that one is
greater than the other, but we cannot tell how much one has over the other/differences cannot be
measured: ranking of contestants in a beauty contest; of honor students; of siblings in a family;
satisfaction; fanciness; can be displayed as column or bar chart

Labeled, meaningful order

Interval – order matters, no true zero point; one can meaningfully compare the differences between measurements
of the variable, but not their ratios; we can say not only that one is over the other, but also
specify the amount of difference: scores, temperature in Celsius or Fahrenheit (Kelvin has a true zero point, so it is a ratio scale)

Labeled, meaningful order, measurable difference

Ratio – order matters; one can meaningfully compare both the differences between measurements of the variable
and their ratios; like the interval scale, but it always starts from the absolute or true zero
point: height, weight, area, test score

Labeled, meaningful order, measurable difference, true zero point
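The practical difference between the interval and ratio scales can be shown with temperature. The sketch below uses two made-up Celsius readings; Celsius has no true zero (interval scale), while Kelvin does (ratio scale).

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15


cool, warm = 10.0, 20.0  # hypothetical temperatures in degrees Celsius

# Differences are meaningful on an interval scale.
difference = warm - cool

# Ratios are not: 20 degrees C is not "twice as hot" as 10 degrees C.
naive_ratio = warm / cool

# On the Kelvin scale, which has a true zero, the ratio is meaningful.
true_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

print(difference, naive_ratio, round(true_ratio, 3))
```

The Celsius ratio of 2 is an artifact of where the zero was placed; on the Kelvin scale the warmer reading is only a few percent higher.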

Interval/ratio can be discrete, with whole numbers, or continuous with fractional numbers

Interval/ratio data are best represented as a bar chart or histogram, box plot, or line chart (for data that
occur over time)

M1 Study Guide

1. Basic Terminologies in Statistics

*Population vs sample
*Parameter vs statistic
*Descriptive vs inferential statistics

2. Data and variables

3. Classification of variables

*Quantitative vs Qualitative
*Levels of measurements

Data Collection

Data – needed whenever we undertake studies or research; used to solve particular problems or to
provide a basis from which certain decisions are generated

Types of data

Primary data – information collected from an original source, which is first-hand in nature: interviews,
surveys

Secondary data – information collected from published or unpublished sources like books, newspapers,
and theses

Methods of collecting primary data

Direct/interview method – the researcher obtains the needed information by asking the interviewee questions;
gives precise and consistent information because clarifications can be made; high degree of originality
and reliability; however, it is time consuming, expensive, has limited field coverage, and is open to personal bias

Indirect personal interview – interviewing third persons who are directly or indirectly concerned with
the subject matter of the enquiry: witnesses, informants

Telephone interview – questions are asked by phone; random telephone calls to prospective
informants; cheap, and can reach respondents across a wide geographical area; visual aids cannot be used
(overcome by video conferencing), and people without a telephone connection are excluded

Indirect/questionnaire method – makes use of a written questionnaire; the researcher distributes the
questionnaire to the respondents either by personal delivery or by mail; saves time and money because
questionnaires can be given to a large number of respondents at the same time, and respondents have
sufficient time to answer; however, some respondents simply ignore the questionnaire; clarification
cannot be made if the respondent does not understand a question; success depends on how the
questionnaire is drafted; low response rate; unreliable; limited accuracy

Registration method – governed by laws: birth and death rates; registered cars; registered voters; involves
pre-collected and organized data from agencies or organizations

Experimental method – used to find out cause-and-effect relationships; often used by scientific
researchers: agriculturists would like to know the effect of a new brand of fertilizer on the growth of
plants

Sampling Techniques

The entire population is seldom used in research because of the cost and time involved.
A sample, a small representative part of a population, is used instead

The characteristics of the whole population are described using the characteristics observed
from the sample

Slovin’s formula is used to determine the sample size from a given population size

n = N / (1 + Ne^2)

where n = sample size

N = population size

e = margin of error

A group of researchers will conduct a survey to find out the opinion of residents of a particular
community regarding the oil price hike. If there are 11,060 residents in the community and the
researchers plan to use a sample using a 6% margin of error, what should the sample size be? – 271

N = 16825, e = 8% – 155

N = 1016, e = 8% – 135

N = 12028, e = 7% – 201

N = 48089, e = 1% – 8279
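The exercises above can be checked with a short Python sketch of Slovin's formula. The function name is hypothetical, and rounding to the nearest whole number is an assumption chosen because it matches the answers given; some texts round up instead.

```python
def slovin_sample_size(population_size, margin_of_error):
    """Sample size from Slovin's formula, n = N / (1 + N * e^2).

    The result is rounded to the nearest whole number (an assumption that
    matches the answers in the exercises; some texts round up instead).
    """
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return round(n)


# (N, e) pairs taken from the exercises, with e written as a decimal.
exercises = [(11060, 0.06), (16825, 0.08), (1016, 0.08), (12028, 0.07), (48089, 0.01)]
for N, e in exercises:
    print(f"N = {N}, e = {e:.0%}: n = {slovin_sample_size(N, e)}")
```

For the first exercise, 1 + 11060(0.06)^2 = 40.816, and 11060 / 40.816 is about 270.97, which rounds to 271.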

Sampling Techniques

To obtain samples that are unbiased (that give each subject in the population an equally likely chance of
being selected) statisticians use four basic methods of probability or random sampling: random,
systematic, stratified, and cluster sampling

Simple random sampling – subjects are selected by random numbers; number each subject in the
population, then draw lots or generate random numbers; works for a small geographic area; unbiased; homogeneous
sample. Example: students are selected using random numbers to partake in a survey
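A minimal sketch of simple random sampling in Python, assuming a hypothetical list of 200 numbered students and a sample of 20. `random.sample` draws without replacement, so no student is picked twice.

```python
import random

random.seed(42)  # fixed seed so the draw is repeatable

# Hypothetical sampling frame: student ID numbers 1 to 200.
students = list(range(1, 201))

# Every student has the same chance of selection; no one is drawn twice.
sample = random.sample(students, 20)
print(sorted(sample))
```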

Stratified sampling – dividing the population into groups (strata) according to some characteristic that is
important to the study (same characteristic), then sampling from each group; samples within each
stratum should be randomly selected
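Stratified sampling can be sketched as follows. The strata (year levels), their sizes, and the 10% sampling fraction are made up for illustration, and taking the same fraction from every stratum (proportional allocation) is only one possible choice.

```python
import random

random.seed(42)  # fixed seed so the draw is repeatable

# Hypothetical population grouped by year level (the stratum).
population = {
    "first year": [f"FY-{i}" for i in range(1, 121)],   # 120 students
    "second year": [f"SY-{i}" for i in range(1, 81)],   # 80 students
}


def stratified_sample(strata, fraction):
    """Draw a simple random sample of the given fraction from each stratum."""
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)
        sample[name] = random.sample(members, k)
    return sample


chosen = stratified_sample(population, 0.10)
for name, members in chosen.items():
    print(name, len(members))
```

Each stratum contributes in proportion to its size, so both year levels are represented in the final sample.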

Systematic sampling – numbering each subject of the population and then selecting every kth subject;
must be careful about how the subjects in the population are numbered, because the numbering must not
follow a pattern; used when the targets are moving, e.g. the respondents
needed are building visitors on a given day. Example:
every 100th hamburger manufactured is checked to determine its fat content
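Selecting every kth subject maps directly onto list slicing in Python. In this sketch the production run of 1000 hamburgers is hypothetical, and the optional random starting point is an added assumption that is common practice but is not stated in the notes.

```python
import random


def systematic_sample(subjects, k, start=None):
    """Select every kth subject, beginning at a random start within the first k."""
    if start is None:
        start = random.randrange(k)  # random position 0 .. k-1
    return subjects[start::k]


# Hypothetical production run: hamburgers numbered 1 to 1000.
hamburgers = list(range(1, 1001))

# start=99 is the 100th item (zero-based index), so this picks the
# 100th, 200th, ... hamburger.
checked = systematic_sample(hamburgers, k=100, start=99)
print(checked)
```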

Cluster sampling – population is divided into groups called clusters by some means, then the researcher
randomly selects some of these clusters and uses all members of the selected clusters as the subjects of
the sample; used when the population is large or when it involves subjects residing in a large
geographic area. Example: instead of selecting a random sample of patients, select a few hospitals
at random
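A sketch of cluster sampling based on the hospital example. The ten hospitals, their 30 patients each, and the choice of three clusters are all made up for illustration.

```python
import random

random.seed(7)  # fixed seed so the draw is repeatable

# Hypothetical clusters: 10 hospitals with 30 patients each.
hospitals = {
    f"Hospital {h}": [f"H{h}-patient-{p}" for p in range(1, 31)]
    for h in range(1, 11)
}


def cluster_sample(clusters, n_clusters):
    """Randomly pick whole clusters and keep every member of each one."""
    picked = random.sample(sorted(clusters), n_clusters)
    subjects = [member for name in picked for member in clusters[name]]
    return picked, subjects


picked, subjects = cluster_sample(hospitals, 3)
print(picked, len(subjects))
```

Unlike stratified sampling, which samples within every group, the randomness here is in which groups are chosen; every member of a chosen group is included.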

Convenience sampling – a convenient sample; do what is easy; prone to bias, including self-selection bias when the people
selected are already interested in the study

Ways to classify statistical studies

Observational study – researcher merely observes what is happening or what has happened in the past
and tries to draw conclusions based on these observations

Experimental study – researcher manipulates one of the variables and tries to determine how the
manipulation influences other variables
