PSYCH 240: Statistics For Psychologists: Interval Estimation: Understanding The T Distribution


PSYCH 240: Statistics for Psychologists
Interval Estimation: Understanding the t Distribution
A Probability Distribution
[Figure: bar chart of a probability distribution, with Relative Frequency (0.00 to .30) plotted against Number of Tails (0 to 10)]
Probability Distribution with
Means

 We want to assess the likelihood of getting a particular result (rare events are interesting)
 Based on population parameters and sample statistics, we can use sampling distributions to get this likelihood
 But, just as with scores, sampling distributions of the mean based on raw data are not helpful
 Thus, we standardize our sample means to get a more interpretable result
Standardizing Means

 The standard normal distribution of z scores told us the likelihood of getting individual scores
 Now, we can get the likelihood of getting particular sample means using what we know about sampling distributions and the central limit theorem
How rare is too rare?
Inferential Statistics

 Definition: Numbers that will help us to make conclusions about populations on the basis of our samples
 General Techniques
 Hypothesis Testing: Determine statistical significance (uniqueness) of a parameter
 Confidence Interval Estimation: Estimate the likely interval for a parameter
Logic of Hypothesis Tests

 Hypothesize a parameter value upon which to center the sampling distribution
 Use the Central Limit Theorem to identify the shape of the sampling distribution
 Identify where the data lie on the sampling distribution
 Decide whether the data seem consistent with the hypothesized parameter
Conventional Decision Rules
 But at what probability is an outcome considered too rare to have occurred purely by chance?
 Alpha (α) is the cutoff probability below which we say the event has occurred because of some characteristic of the study
 Convention suggests that an appropriate value for alpha is either .05 or .01
 We almost always use .05
Characterizing Decision
(Rejection) Regions
 The location of the cutoffs on the distribution can be identified in two distinct but equivalent ways
 Using the probability (alpha)
 Using the z-score (now called a critical
value) associated with α
Determining Critical Values

[Figure: two-tailed rejection regions. The α/2 areas in each tail are labeled Significant; the middle region is Not Significant; the cutoffs are the Lower CV and Upper CV.]
The Use of the Critical
Values
 Statistical Significance refers to the rarity of an event
 If a sample mean is particularly rare, we call it statistically significant
 Over samples of size N, a sample mean is non-significant when it falls within:

µ − (CVz)(σM) ≤ M ≤ µ + (CVz)(σM)
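That interval can be sketched with Python's standard library; the population values here (µ = 100, σ = 15, N = 25) are hypothetical, chosen only for illustration:

```python
from statistics import NormalDist
import math

# Hypothetical population: mu = 100, sigma = 15, samples of size N = 25
mu, sigma, N = 100, 15, 25

sigma_m = sigma / math.sqrt(N)                # standard error of the mean
cv_z = NormalDist().inv_cdf(1 - 0.05 / 2)     # two-tailed critical z for alpha = .05

lower = mu - cv_z * sigma_m                   # lower cutoff
upper = mu + cv_z * sigma_m                   # upper cutoff
print(round(lower, 2), round(upper, 2))       # sample means in between are not significant
```

For these values the non-significant region runs from about 94.12 to 105.88, so a sample mean outside that range would be called significant.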
The p value revisited

 Statistical significance is telling us that there is something different or unique about our sample
 But compared to what?
 Since we need a comparison, we use what is called the Null Hypothesis

H0: M = µ
H1: M ≠ µ
The p value defined

The p value tells you the probability of getting a particular result if the null hypothesis were true

Thus, given an alpha of .05, any p value less than .05 (p < .05) is called “statistically significant”
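A minimal sketch of that definition using Python's standard library (the observed z of 2.17 is hypothetical):

```python
from statistics import NormalDist

z = 2.17                                  # hypothetical observed z for a sample mean
p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p value under the null hypothesis
print(round(p, 3), p < 0.05)              # p is below alpha = .05, so "statistically significant"
```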
Steps of Hypothesis Testing

1. Calculate the appropriate statistic (such as z)
2. Estimate the p value (such as p < .05 or p > .05) by using critical values
3. Calculate the p value (if possible)
4. Make the appropriate conclusion about statistical significance
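Assuming the population standard deviation is known (so z applies), the four steps might look like this; all of the numbers are hypothetical:

```python
from statistics import NormalDist
import math

mu, sigma, N, M = 100, 15, 25, 106            # hypothetical population and sample

z = (M - mu) / (sigma / math.sqrt(N))         # step 1: calculate the statistic
cv_z = NormalDist().inv_cdf(1 - 0.05 / 2)     # step 2: estimate p via the critical value
p = 2 * (1 - NormalDist().cdf(abs(z)))        # step 3: calculate the exact p value
significant = abs(z) >= cv_z                  # step 4: draw the conclusion
print(round(z, 2), round(p, 4), significant)
```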
Sampling as an Inexact
Science
 In most cases, we do not know the
population parameters
 The very reason we are sampling is
that we wish to infer the parameters
 Both our samples’ means AND
standard deviations will have
sampling error
When Variances are
Unknown
 When the population variance is unknown,
we need to estimate it on the basis of our
sample
 Thus, our standard error of the mean is
directly influenced:
Standard Error based on Population Scores:  σM = σ / √N

Standard Error estimated from Sample Scores:  SEM = SD / √N
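The two formulas differ only in the numerator: the known population σ versus the sample SD. A quick sketch with hypothetical values:

```python
import math

N = 16
sigma = 2.0    # population SD, rarely known (hypothetical value)
SD = 1.5       # sample SD (hypothetical value)

sigma_m = sigma / math.sqrt(N)   # standard error based on population scores
sem = SD / math.sqrt(N)          # standard error estimated from sample scores
print(sigma_m, sem)
```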
Distinguishing Sampling
Distributions

z = (M − µM) / σM        t = (M − µM) / SEM
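Side by side in code (all values hypothetical; σ goes in the z denominator, the sample SD in the t denominator):

```python
import math

M, mu, N = 105, 100, 36
sigma = 15     # known population SD (z case, hypothetical)
SD = 14        # estimated sample SD (t case, hypothetical)

z = (M - mu) / (sigma / math.sqrt(N))   # z: denominator uses sigma_M
t = (M - mu) / (SD / math.sqrt(N))      # t: denominator uses SEM
print(round(z, 3), round(t, 3))
```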
What is the t distribution???

 When we don’t know the population characteristics, we cannot use z scores
 We know this because our samples are imperfect and would probably cause us to have biased results
 Thus, we make a correction with a family of distributions that differ in shape depending on sample size
Shape and Estimation

 Is the overall shape affected in any systematic way by estimation? Yes . . .
 Degrees of freedom are what determine the shape of the sampling distribution
 Every statistic in a specific context has a certain number of degrees of freedom
 The degrees of freedom (df) for any statistic is the number of components in its calculation that are free to vary
 Practically speaking, df = N − 1
Shape of the t Distribution
[Figure: t distributions for df = 1, 9, 25, and infinity, plotted as Relative Frequency from −4 to 4. As df increases, the tails thin and the curve approaches the normal distribution.]
Using the t Distribution
for Probabilities
 Therefore, we can NOT use the normal distribution for cases in which we estimate the population variance
 Instead, we use the t distribution
 Directly calculating probabilities is too difficult because we no longer have 1 distribution in total, but instead 1 distribution for every degree of freedom
 Often, we are only interested in certain locations (i.e., critical values) in the distribution
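Because each df has its own distribution, practice relies on looking up those critical values in a t table. A minimal sketch; the values below are copied from a standard two-tailed α = .05 t table and the dictionary covers only the df used in this deck:

```python
# Two-tailed critical values of t at alpha = .05, from a standard t table
T_CRIT_05 = {1: 12.706, 8: 2.306, 9: 2.262, 15: 2.131, 24: 2.064, 28: 2.048, 99: 1.984}

def is_significant(t, df, table=T_CRIT_05):
    """A sample mean is significant when |t| reaches the critical value for its df."""
    return abs(t) >= table[df]

print(is_significant(1.333, 15))   # inside the non-significant region
print(is_significant(2.31, 24))    # beyond the upper critical value
```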
Determining Critical Values

[Figure: two-tailed rejection regions. The α/2 areas in each tail are labeled Significant; the middle region is Not Significant; the cutoffs are the Lower CV and Upper CV.]
Calculating Probability of a
Statistic
 In order to calculate the probability of a statistic, we need to know
 The mean of the possible means
 The standard deviation of the possible means
 The shape of the distribution of means
Comparison of z scores for
individuals and samples

 Using the z Formula for Scores:

z = (Y − M) / SD = ?, p = ?

 Using the z Formula for Means:

z = (M − µ) / σM = ?, p = ?
Comparison of z and t

 Using the z Formula for Means:

z = (M − µ) / σM = ?, p = ?

 Using the t Formula for Means:

t(df) = (M − µM) / SEM = ?, p = ?
Using the t Distribution

Example: College students exercise 3 times per week on average. I select a sample of 16 students who exercise 3.5 times per week (SD = 1.5). What was the probability of getting a sample this distinct?

t(df) = (M − µM) / SEM = ?, p = ?
Using the t Distribution

Example: College students exercise 3 times per week on average. I select a sample of 16 students who exercise 3.5 times per week (SD = 1.5). What was the probability of getting a sample this distinct?

t(15) = (3.5 − 3) / (1.5 / √16) = 1.333, p ≅ .20
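Checking the slide's arithmetic in code (the critical value 2.131 for df = 15 is taken from a standard two-tailed .05 t table):

```python
import math

M, mu, SD, N = 3.5, 3.0, 1.5, 16
sem = SD / math.sqrt(N)              # 1.5 / 4 = 0.375
t = (M - mu) / sem                   # the t statistic on df = 15
print(round(t, 3), abs(t) < 2.131)   # not beyond the .05 critical value, so p > .05
```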
Some Problems

 Determine the two-tailed probabilities for each of the following samples (μ = 3.0):
 N = 100, M = 3.2, SD = 1.0
 N = 25, M = 3.6, SD = 1.3
 N = 9, M = 2.75, SD = 1.9
 What conclusions would you make about the likelihood of each of these samples?
Problems
For N = 100, M = 3.2, SD = 1.0

t(99) = (3.2 − 3.0) / (1.0 / √100) = 0.2 / 0.1 = 2.0

t(99) = 2.00, p < .05
Problems
For N = 25, M = 3.6, SD = 1.3

t(24) = (3.6 − 3.0) / (1.3 / √25) = 0.6 / 0.26 = 2.308

t(24) = 2.31, p < .05   Or you could report: t(24) = 2.31, p ≅ .03
Problems
For N = 9, M = 2.75, SD = 1.9

t(8) = (2.75 − 3.0) / (1.9 / √9) = −0.25 / 0.633 = −0.395

t(8) = −0.39, p > .05
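The three problems can be run in one loop; `results` collects the df and rounded t statistic for each sample:

```python
import math

mu = 3.0
samples = [(100, 3.2, 1.0), (25, 3.6, 1.3), (9, 2.75, 1.9)]   # (N, M, SD) from the slides

results = []
for N, M, SD in samples:
    t = (M - mu) / (SD / math.sqrt(N))   # t statistic on df = N - 1
    results.append((N - 1, round(t, 2)))

print(results)
```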
How Can We Interpret These
Outcomes?
 What implications might these have
for how we think about our samples?
 Possibilities:
 Our sample happened by total chance
 Our sample is not representative of the
population
Reporting non-significance
 There are several ways of reporting non-significant inferential stats

t(8) = −0.39, p ≅ .70   (the most specific)
t(8) = −0.39, p > .05   (the most common)
t(8) = −0.39, n.s.      (the least specific)
Reporting non-significance

t < 1

This is the most succinct for t values < 1. No t value less than 1 for any level of df can be significant, so why bother reporting everything?
Point vs. Interval Estimation

 Point Estimation: The process of deriving a single best estimate of a parameter (such as a mean)
 Interval Estimation: The process of deriving a range of values that may include a parameter (such as a mean)
 Example: Polling and the prediction of election results
Confidence Intervals
 This creates a range of values that
is expected to include the
population parameter
 Width is therefore determined by:
 Standard Error (Standard deviation
and Sample size)
 Probability level
The Concept of Sampling as an
Inexact Process

 The “center” is the parameter – the value of interest
 The “arrows” are the samples taken
One-Sample Problem #1

The Scenario: An instructor is interested in the exam performance of his students. His newest class (N = 29) had a M = 88.14 and a SD = 10.53.

With 95% confidence, what is the likely mean for the population exam score?
The Logic Behind Calculating a
Confidence Interval

 In research, “skill level” is equivalent to the standard error of the mean
 The “radius” is therefore determined by our confidence level AND the standard error
Steps in Creating the
Confidence Interval

1. Establish the center of the interval.
2. Obtain the t score appropriate for the level of confidence.
3. Estimate the standard error.
4. Calculate the confidence limits in raw score values by using the following equation:

CIM = M ± (CVt)(SEM)
One-Sample Problem #1

The Scenario: An instructor is interested in the exam performance of his students. His newest class (N = 29) had a M = 88.14 and a SD = 10.53.

With 95% confidence, what is the likely mean for the population exam score?
Calculating Our Example

CIM = M ± (CVt)(SEM)

 Since CVt = 2.048 and SEM = 1.96, the 95% confidence interval ranges from 84.14 to 92.14.
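The interval can be checked in code; the critical t of 2.048 for df = 28 comes from a standard two-tailed .05 t table:

```python
import math

M, SD, N = 88.14, 10.53, 29      # the instructor's class
cv_t = 2.048                     # two-tailed .05 critical t for df = 28 (t table)

sem = SD / math.sqrt(N)          # roughly 1.96
lower = M - cv_t * sem
upper = M + cv_t * sem
print(round(lower, 2), round(upper, 2))
```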
One-Sample Problem #2

The Psychology Department is interested in learning about its statistics teachers. A sample of 15 psychologists has a mean teacher evaluation of 1.2 and a SD = .3.

With 99% confidence, what is the likely mean for the population evaluation?
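One way to check your answer, sketched in code; the critical t of 2.977 (two-tailed .01, df = 14) is taken from a standard t table:

```python
import math

M, SD, N = 1.2, 0.3, 15          # the evaluation sample
cv_t = 2.977                     # two-tailed .01 critical t for df = 14 (t table)

sem = SD / math.sqrt(N)
lower = M - cv_t * sem
upper = M + cv_t * sem
print(round(lower, 2), round(upper, 2))
```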
Interpreting the Meaning of
Confidence Intervals

 Each interval is an estimate of the parameter
 Not every interval is guaranteed to include the parameter
