2 Collecting Data


Collecting Data Sensibly

Consider the following headlines, which
appeared on September 25, 2009.

“Spanking lowers a child’s IQ” (Los Angeles
Times)

“Do you spank? Studies indicate it could lower
your kids’ IQ.” (SciGuy, Houston Chronicle)

“Spanking can lower IQ” (NBC4i, Columbus,
Ohio)

“Smacking hits kids’ IQ” (newscientist.com)


Observation versus Experimentation
Look at the following two examples:

• A social scientist studying a rural community wants to
determine whether gender and attitudes toward abortion
are related. Using a telephone survey, 100 residents are
contacted at random and their gender and attitude toward
abortion are recorded.

• A professor might wonder what would happen to final test
scores if the required lab time for a chemistry course is
increased from 3 hours to 6 hours. Of 100 chemistry
students, half were randomly assigned to the 3-hour lab
and half to the 6-hour lab. The rest of the course
remained the same for the two groups. The difference in
their final test scores will be examined.
Definitions:
Observational study – a study in which the
researcher observes characteristics of a
sample selected from one or more
populations.

Experiment – a study in which the
researcher observes how a response
variable behaves when one or more
explanatory variables (factors) are
manipulated.
Let’s return to the study on spanking and IQ.
In this study, two groups of children were followed for 4 years: 806
children ages 2 to 4 and 704 children ages 5 to 9. IQ was measured
at the beginning of the study and again four years later. Researchers
found that, among children ages 2 to 4, the average IQ of those who
were not spanked was 5 points higher than that of those who were
spanked; among children ages 5 to 9, it was 2.8 points higher.

Does spanking “CAUSE” a decrease in IQ?
Why or why not?

Are there other variables connected to both the response
(decreased IQ) and the groups of children?
Definition:
Confounding variable – a variable that is
related to both group membership and
the response variable of interest in a
research study
• Observational studies CAN be
generalized to the population if the
sample is randomly selected from the
population of interest, but CANNOT
show cause-effect relationships.

• Well-designed experiments CAN show
cause-effect relationships, but CANNOT
be generalized to the population if the
subjects are volunteers or are not
randomly selected from the population
of interest.
Sampling

Section 2.2
Census versus Sample
Why might we prefer to select a
sample rather than perform a census?
1. Measurements that require destroying
the item
– Measuring how long batteries last
– Safety ratings of cars
2. Difficult to find the entire population
– Length of fish in a lake
3. Limited resources
– Time and money
Methods of selecting random samples

Simple Random Sample (SRS)

A sample of size n is selected from the
population in a way that ensures that every
different possible sample of the desired size
has the same chance of being selected.
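
As a rough illustration of the definition above, the sketch below draws a simple random sample with Python's standard `random` module. The population of 500 labeled students, the function name, and the seed are assumptions made up for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every possible
    sample of size n has the same chance of being selected."""
    rng = random.Random(seed)          # a seed makes the draw repeatable
    return rng.sample(list(population), n)

# Hypothetical population: 500 students labeled 1..500.
population_labels = range(1, 501)
sample = simple_random_sample(population_labels, 10, seed=1)
print(sample)
```

`random.sample` selects without replacement, so no individual can appear twice in the sample.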
Methods of selecting random samples

Simple Random Sample (SRS) continued

Sampling frame – list of all the objects or
individuals in the population.
How to use a random digit table
The following is part of the random digit table
found in the back of your textbook:

Row   Digits
6     0 9 3 8 7 6 7 9 9 5 6 2 5 6 5 8 4 2 6 4
7     4 1 0 1 0 2 2 0 4 7 5 1 1 9 4 7 9 7 5 1
8     6 4 7 3 6 3 4 5 1 2 3 1 1 8 0 0 4 8 2 0
9     8 0 2 8 7 9 3 8 4 0 4 2 0 8 9 1 2 3 3 2
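
The slide does not spell out the reading procedure, so the sketch below assumes one common approach: label the individuals 01, 02, ..., read the table in two-digit groups, and skip any group that is out of range or already chosen. The population size of 50 and the function name are hypothetical.

```python
def sample_from_digit_table(digits, population_size, n, width=2):
    """Read `width`-digit labels left to right, skipping labels that are
    out of range (0 or above population_size) or already chosen.
    Returns fewer than n labels if the digits run out."""
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

# Row 6 of the table above (row label dropped, spaces removed).
row_6 = "09387679956256584264"
print(sample_from_digit_table(row_6, population_size=50, n=3))  # [9, 38, 42]
```

The two-digit groups in row 6 are 09, 38, 76, 79, 95, 62, 56, 58, 42, 64, so for 50 individuals the first three usable labels are 9, 38, and 42.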
Methods of selecting random samples
Simple Random Sample (SRS) continued

Although sampling with and without replacement
are different, they can be treated as the same
when the sample size n is relatively small compared
to the population size (no more than 10% of the
population).
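
To see why the 10% guideline is reasonable, the sketch below compares one probability computed both ways: the chance that all n draws have some trait. The populations (1000 and 20 individuals, 40% with the trait) and the function names are invented for illustration.

```python
def meets_ten_percent_condition(n, population_size):
    """True when the sample is no more than 10% of the population."""
    return n <= 0.10 * population_size

def prob_all_successes(n, population_size, successes, replace):
    """Probability that all n draws are 'successes'."""
    prob = 1.0
    for i in range(n):
        if replace:
            prob *= successes / population_size
        else:
            # One fewer success and one fewer individual after each draw.
            prob *= (successes - i) / (population_size - i)
    return prob

# Hypothetical populations in which 40% of individuals have the trait.
for size in (1000, 20):
    successes = size * 2 // 5
    with_repl = prob_all_successes(5, size, successes, replace=True)
    without_repl = prob_all_successes(5, size, successes, replace=False)
    print(size, meets_ten_percent_condition(5, size),
          round(with_repl, 5), round(without_repl, 5))
```

With 5 draws from 1000 the two probabilities nearly agree; with 5 draws from 20 (25% of the population) they differ noticeably.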
Methods of selecting random samples
Stratified Random Sample

• Population is divided into non-overlapping
subgroups called strata
• Simple random samples are selected from each
stratum
• Sometimes easier to implement and is more cost
effective than simple random sampling
• Sometimes allows more accurate inferences
about a population than simple random sampling
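
A minimal sketch of the procedure in the bullets above: group the population into strata, then take a simple random sample within each stratum. The college data, the six type labels, and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into strata, then draw a simple random
    sample of n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 10 colleges of each of six types.
kinds = ["small public", "small private", "medium public",
         "medium private", "large public", "large private"]
colleges = [(f"{kind} college {i}", kind) for kind in kinds for i in range(1, 11)]
picked = stratified_sample(colleges, stratum_of=lambda c: c[1],
                           n_per_stratum=3, seed=0)
for kind, chosen in picked.items():
    print(kind, [name for name, _ in chosen])
```

Every stratum is guaranteed to be represented, which a single simple random sample of the whole population does not guarantee.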
Methods of selecting random samples

Cluster Sampling

• Population is divided into non-overlapping
subgroups called clusters

• Randomly select clusters and then all the
individuals in the clusters are included in the
sample

• Cluster sampling is often easier to perform
and more cost effective.
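
The sketch below follows the steps above: choose whole clusters at random, then include everyone in the chosen clusters. The 12 city blocks of 4 residents each and the function name are assumptions for the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and include every
    individual from each chosen cluster in the sample."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample

# Hypothetical population: 12 city blocks with 4 residents each.
clusters = {f"Block {b}": [f"Block {b} resident {r}" for r in range(1, 5)]
            for b in range(1, 13)}
chosen_blocks, sample = cluster_sample(clusters, 3, seed=0)
print(chosen_blocks)
print(len(sample))
```

Randomness enters only at the cluster level here, whereas a stratified sample draws randomly inside every subgroup.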
Methods of selecting random samples
Systematic Sampling

• A value k is specified (for example k = 50 or
k = 200).
• One of the first k individuals is selected at
random.
• Then every kth individual in the sequence is
included in the sample.
• This method works reasonably well as long as
there are no repeating patterns in the
population list.
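
The steps above can be sketched as follows, assuming the population is available as an ordered list. The list of 1000 labels and the function name are made up for illustration.

```python
import random

def systematic_sample(sequence, k, seed=None):
    """Pick one of the first k individuals at random, then take
    every kth individual after that one."""
    rng = random.Random(seed)
    start = rng.randrange(k)           # index 0..k-1: one of the first k
    return list(sequence[start::k])

# Hypothetical ordered population list: labels 1..1000, with k = 50.
population_list = list(range(1, 1001))
sample = systematic_sample(population_list, k=50, seed=3)
print(sample)
```

With 1000 individuals and k = 50 the sample always has 20 members, spaced exactly 50 apart in the list.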
Identify the sampling design
1) The Educational Testing Service (ETS)
needed a sample of colleges. ETS first
divided all colleges into groups of similar
types (small public, small private, medium
public, medium private, large public, and
large private). Then they randomly selected
3 colleges from each group.

Stratified random sample
Identify the sampling design
2) A county commissioner wants to survey
people in her district to determine their
opinions on a particular law up for adoption.
She decides to randomly select blocks in
her district and then survey all who live on
those blocks.

Cluster sampling
Identify the sampling design
3) A local restaurant manager wants to survey
customers about the service they receive.
Each night the manager randomly chooses a
number between 1 and 10. He then asks that
customer, and every 10th customer after
them, to fill out a survey before they leave.

Systematic sampling
Consider the following example:

In 1936, Franklin Delano Roosevelt had been President
for one term. The magazine, The Literary Digest,
predicted that Alf Landon would beat FDR in that
year's election by 57 to 43 percent. The Digest mailed
over 10 million questionnaires to names drawn from
lists of automobile and telephone owners, and over 2.3
million people responded - a huge sample.

At the same time, a young man named George Gallup
sampled only 50,000 people and predicted that
Roosevelt would win. Gallup's prediction was ridiculed
as naive. After all, the Digest had predicted the
winner in every election since 1916, and had based its
predictions on the largest response to any poll in
history. But Roosevelt won with 62% of the vote. The
size of the Digest's error is staggering.
Sources of bias
Selection bias

• Occurs when the way the sample is selected
systematically excludes some part of the
population of interest (called undercoverage)

• May also occur if only volunteers or
self-selected individuals are used in a study
Sources of bias

Convenience sampling

• Using an easily available or convenient group to
form a sample.
– The group may not be representative of the
population of interest
– Results should not be generalized to the population

• Bias can also occur when samples rely entirely on
volunteers to be part of the sample (called
voluntary response)
Sources of bias
Measurement or Response bias

• Occurs when the method of observation tends
to produce values that systematically differ
from the true value in some way
– Improperly calibrated scale is used to weigh items
– Tendency of people not to be completely honest when
asked about illegal behavior or unpopular beliefs
– Appearance or behavior of the person asking the
questions
– Questions on a survey are worded in a way that tends
to influence the response
Sources of bias

Nonresponse

• Occurs when responses are not obtained from all
individuals selected for inclusion in the sample

• To minimize nonresponse bias, it is critical that a
serious effort be made to follow up with
individuals who did not respond to the initial
request for information
Identify a potential source of bias.
1) Before the 1936 presidential election between
FDR and Republican Alf Landon, the magazine
Literary Digest predicted that Landon would win
the election in a 3-to-2 victory, based on a survey
of 2.3 million people. George Gallup surveyed only
50,000 people and predicted that Roosevelt
would win. The Digest’s survey sample came from
magazine subscribers, car owners, telephone
directories, etc.
Identify a potential source of bias.

2) Suppose that you want to estimate the
total amount of money spent by students
on textbooks each semester at a local
college. You collect register receipts for
students as they leave the bookstore
during lunch one day.
Identify a potential source of bias.

3) To find the average value of a home in Plano,
one averages the price of homes that are
listed for sale with a realtor.
Comparative Experiments

Sections 2.3 & 2.4


Room temperature experiment continued . . .
We decide to use two temperature settings, 65°
and 75°.

How many treatments would our experiment
have?

The 2 treatments are the 2 temperature settings.
Room temperature experiment continued . . .
Suppose we have 10 sections of first-semester
calculus that have agreed to participate in our study.

On who or what will we impose the treatments?

The 10 sections of calculus.

How would we determine which sections would be in
rooms with the temperature set at 65° and which
sections in rooms set at 75°?

We need to randomly assign them to the treatments.
Room temperature experiment continued . . .
To randomly assign the 10 sections of first-semester
calculus to the 2 treatment groups, we would first
number the classes 1-10. Five of the numbers are then
selected at random for Treatment 1; the remaining
five sections receive Treatment 2.

                    Sections assigned
Treatment 1 (65°)   9 7 5 8 3
Treatment 2 (75°)   1 2 4 6 10
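
One way to carry out this kind of random assignment in software is to shuffle the section numbers and split them evenly. This is only a sketch: the function name and seed are assumptions, and it assumes the number of sections divides evenly among the treatments.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Shuffle the units and deal them into equal-sized treatment
    groups (assumes len(units) is a multiple of len(treatments))."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {t: sorted(shuffled[i * size:(i + 1) * size])
            for i, t in enumerate(treatments)}

sections = range(1, 11)                      # classes numbered 1-10
groups = randomly_assign(sections, ["65°", "75°"], seed=7)
print(groups)
```

A different seed gives a different split, but each treatment always receives five sections.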
Room temperature experiment continued . . .
Notice that there are five sections assigned to
each treatment.

This is called replication.


Sections assigned
Treatment 1 (65°) 9 7 5 8 3
Treatment 2 (75°) 1 2 4 6 10
Room temperature experiment continued . . .
Remember – the explanatory variable is the
room temperature setting, 65° and 75°. The
response variable is the grade on the calculus
exam.

Are there other variables that could affect the
response?

Instructor?
Time of day?
Textbook?
Ability level of students?
Room temperature experiment continued . . .
Suppose that there were five instructors who
teach first-semester calculus. We do not
have direct control of this variable; however, we
could have each instructor teach 2 sections.
Then, for each instructor, we could randomly assign
one of the 2 sections to a temperature setting of
65° and the other to a temperature setting of 75°.
Using instructor to form groups in this way is called
blocking.
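
The instructor-by-instructor assignment described above can be sketched as a randomization carried out separately inside each block. The instructor names, the pairing of section numbers to instructors, and the function name are hypothetical.

```python
import random

def assign_within_blocks(blocks, treatments, seed=None):
    """For each block, randomly order the treatments and hand one to
    each unit, so every treatment appears once per block."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        order = list(treatments)
        rng.shuffle(order)                 # separate coin flip per block
        for unit, treatment in zip(units, order):
            assignment[unit] = (block, treatment)
    return assignment

# Hypothetical pairing: each of five instructors teaches 2 of the 10 sections.
blocks = {f"Instructor {name}": [2 * i + 1, 2 * i + 2]
          for i, name in enumerate("ABCDE")}
assignment = assign_within_blocks(blocks, ["65°", "75°"], seed=5)
for section, (instructor, temp) in sorted(assignment.items()):
    print(section, instructor, temp)
```

Because every instructor teaches one section at each temperature, differences between instructors cannot be mistaken for a temperature effect.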
Room temperature experiment continued . . .
What about extraneous variables that we
cannot control directly or that we cannot block
for or that we don’t even think about?

Random assignment should spread all extraneous
variables that are not controlled directly and
not blocked evenly across the treatment
groups. We expect these variables to affect all
the experimental groups in the same way;
therefore, their effects are not confounding.
Room temperature experiment continued . . .
Would the students in each section of calculus
know to which treatment group, 65° or 75°, they
were assigned?

If the students knew about the experiment,
they would probably know which treatment
group they were in.

So this experiment is probably NOT blinded.
In the room temperature experiment, we only
have 2 treatment groups, 65° and 75°. We do
NOT have a control group.

A control group is an experimental group that
does NOT receive any treatment.

The use of a control group allows the
experimenter to assess how the response
variable behaves when the treatment is not
used.
This provides a baseline against which the
treatment groups can be compared to determine
whether the treatment had an effect.
Consider Anna, a waitress. She decides to
perform an experiment to determine if writing
“Thank you” on the receipt increases her tip
percentage.

She plans on having two groups. For one group
she will write “Thank you” on the receipt, and for
the other group she will not write “Thank you”
on the receipt.
Suppose we want to test an herbal supplement
to determine if it aids in weight loss.

Why would it not be beneficial to have two
groups in the experiment: one that takes the
supplement and a control group that takes
nothing?

What could be done to remedy this problem?

Give one group the supplement and give the other
group a pill that is the same size, color, taste, smell,
etc. as the supplement, but contains no active
ingredient. Such a pill is called a placebo.
Let’s recap some ideas:

Random assignment removes the
potential for confounding variables.

Blocking uses extraneous variables to
create groups (blocks) that are similar.
All treatments are then tried in each
block.

Direct control holds extraneous variables
constant so their effects are not
confounded with the treatments.
The ONLY way to show a
cause-effect relationship
is with a well-designed,
well-controlled
experiment!!!
