
CHAPTER 3

INFERENTIAL STATISTICS

3.1 Hypothesis testing
3.2 Types of statistical hypotheses
3.3 Types of error
3.4 The probability value (p-value)
3.5 Statistical power
3.6 Confidence interval for the mean

The purpose of inferential statistics is to determine how likely it is that a given
finding is simply the result of chance. Inferential statistics would not be
necessary if investigators could study all members of a population. However,
because we can rarely observe and study entire populations, we try to select
samples that are representative of the entire population so that we can
generalize the results from the sample to the population.

3.1 HYPOTHESIS TESTING

Here we introduce some basic concepts related to hypothesis testing.

The Purpose of Hypothesis Testing

Research typically involves measuring one or more variables for a sample and
computing descriptive statistics for that sample. In general, however, the
researcher’s goal is not to draw conclusions about that sample but to draw
conclusions about the population that the sample was selected from. Thus,
researchers must use sample statistics to draw conclusions about the
corresponding values in the population. These population values that
correspond to the variables measured in a study are called parameters.

For example, a researcher measures the number of depressive symptoms
exhibited by each of 50 clinically depressed adults and computes the mean
number of symptoms. The researcher probably wants to use this sample
statistic (the mean number of symptoms for the sample) to draw conclusions
about the corresponding population parameter (the mean number of symptoms
for clinically depressed adults).

Example 3.1:

For example, the mean, median, and variance are parameters of a population,
and they are denoted by 𝜇, M, and 𝜎² respectively.

The corresponding values for the sample are called the sample mean, sample
median, and sample variance, and they are denoted by 𝑋̅, m, and S²
respectively.

3.2 TYPES OF STATISTICAL HYPOTHESES

There are two types of statistical hypotheses: the null hypothesis, which is
denoted by 𝐻0 , and the alternative hypothesis, which is denoted by 𝐻1 .

• The null hypothesis is the statement that the value of the parameter is
equal to the claimed value. We assume that the null hypothesis is true
until the data provide evidence that it is not.

• The alternative hypothesis is the statement that the value of the
parameter differs in some way from the value claimed in the null hypothesis.

Example 3.2:

Suppose the data represent heart rate values (in beats per minute, bpm) for a
sample of 100 patients.
Let 𝜇 denote the mean of heart rate beats per minute for the population of all
patients. The following are examples of statistical hypotheses

1. 𝐻0 : 𝜇 = 65
𝐻1 : 𝜇 ≠ 65 (Two-sided)

2. 𝐻0 : 𝜇 = 65
𝐻1 : 𝜇 > 65 (One-sided)

3. 𝐻0 : 𝜇 = 65
𝐻1 : 𝜇 < 65 (One-sided)

Hypothesis testing is an approach to inferential statistics. It involves using
sampling distributions and the laws of probability to make an objective
decision about whether to reject or fail to reject the null hypothesis.
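As an illustration (not part of the original notes), the hypotheses in Example 3.2
could be tested in Python with a one-sample t-test; the heart-rate data below are
simulated, so the sample, the random seed, and the resulting p-values are purely
hypothetical.

# A minimal sketch, assuming simulated heart-rate data for 100 patients.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
heart_rate = rng.normal(loc=68, scale=10, size=100)  # hypothetical sample, not real patients

# 1. Two-sided test of H0: mu = 65 versus H1: mu != 65
t_stat, p_two_sided = stats.ttest_1samp(heart_rate, popmean=65)

# 2. One-sided alternative H1: mu > 65
_, p_greater = stats.ttest_1samp(heart_rate, popmean=65, alternative='greater')

# 3. One-sided alternative H1: mu < 65
_, p_less = stats.ttest_1samp(heart_rate, popmean=65, alternative='less')

print(t_stat, p_two_sided, p_greater, p_less)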

3.3 TYPES OF ERRORS

There are two types of errors that can occur when we decide whether or not to
reject the null hypothesis.

1. Type I error
This error happens if we reject 𝐻0 when it is true.

2. Type II error
This error happens if we fail to reject (accept) 𝐻0 when it is false.

The level of significance is the probability of a Type I error, and it is denoted
by 𝛼. Usually, the value of 𝛼 is small; most researchers take 𝛼 equal to one of
the following values: 0.01, 0.05, or 0.10.
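To make the meaning of 𝛼 concrete, the following sketch (not from the original
notes; the data are simulated under an assumed normal model) repeatedly tests a
null hypothesis that is actually true and records how often it is rejected; in the
long run, the rejection rate, i.e., the Type I error rate, is close to 𝛼.

# A minimal simulation sketch: H0 is true in every trial, so each rejection is a Type I error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_trials = 10_000
rejections = 0

t_crit = stats.t.ppf(1 - alpha / 2, df=29)  # two-sided critical value for n = 30

for _ in range(n_trials):
    sample = rng.normal(loc=65, scale=10, size=30)  # true mean is 65, so H0: mu = 65 holds
    t_stat, _ = stats.ttest_1samp(sample, popmean=65)
    if abs(t_stat) >= t_crit:
        rejections += 1  # a Type I error

print(rejections / n_trials)  # approximately 0.05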

3.4 THE PROBABILITY VALUE (P-VALUE)

The probability value (p-value) is the smallest value of 𝛼 for which we can
reject the null hypothesis 𝐻0 .

Using the p-value as a decision tool

1. If the p-value is less than or equal to α (p-value ≤ α), we reject the null
hypothesis.
2. If the p-value is greater than α (p-value > α), we do not reject the null
hypothesis.

A test statistic is a number calculated from a statistical test of a hypothesis.
It shows how closely your observed data match the distribution expected
under the null hypothesis of that statistical test.

The test statistic is used to calculate the p-value of your results, helping to
decide whether to reject your null hypothesis.
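A minimal sketch of this workflow, assuming a one-sample z-test with a known
population standard deviation; the numbers below are hypothetical and only
illustrate how a test statistic is turned into a p-value and compared with 𝛼.

import math
from scipy import stats

# Hypothetical inputs: sample mean 68, hypothesized mean 65,
# known population SD 10, sample size 100, significance level 0.05
x_bar, mu_0, sigma, n, alpha = 68.0, 65.0, 10.0, 100, 0.05

# z test statistic for H0: mu = 65
z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - stats.norm.cdf(abs(z)))

print(z, p_value)
print("Reject H0" if p_value <= alpha else "Do not reject H0")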

Types of statistical tests


There are two types of statistical tests used in hypothesis testing:

1. Parametric tests: In parametric tests, the probability distribution of the
population must be known.

2. Non-parametric tests: In non-parametric tests, the probability
distribution of the population does not need to be known.

3.5 STATISTICAL POWER

Statistical power

i. In statistics, power is the probability of detecting a difference when one
truly exists.

ii. Increasing statistical power makes it more likely that a real effect in the
data will be detected.

iii. Power is directly related to the Type II error rate 𝛽: Power = 1 − 𝛽.

iv. There are a number of ways to increase statistical power. The most
common is to increase the sample size, as the sketch below illustrates.
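As an illustration not taken from the original notes, the sketch below computes
the power of a two-sided one-sample z-test under an assumed true mean and shows
how power grows with sample size; all of the numbers are hypothetical.

import math
from scipy import stats

def z_test_power(mu_0, mu_true, sigma, n, alpha=0.05):
    # Power of a two-sided one-sample z-test when the true mean is mu_true
    z_crit = stats.norm.ppf(1 - alpha / 2)
    shift = (mu_true - mu_0) / (sigma / math.sqrt(n))
    # Probability of landing in the rejection region when the true mean is mu_true
    return (1 - stats.norm.cdf(z_crit - shift)) + stats.norm.cdf(-z_crit - shift)

# Power increases as the sample size grows (hypothetical means and SD)
for n in (10, 30, 100):
    print(n, round(z_test_power(mu_0=65, mu_true=68, sigma=10, n=n), 3))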

Example 3.3:

Answer: E (The power of the test)


Example 3.4:

Type II error 𝛽 = 0.20

The power of the test = 1 − 𝛽 = 0.80

Answer: D

Example 3.5:

Since the 95% CI does not include 1, alcohol consumption plays a role in the
occurrence of breast carcinoma; therefore, the null hypothesis is rejected at
the 5% level. Rejection at the 5% level requires p-value ≤ 0.05, so the
p-value is 0.03.

Answer: A
Example 3.6:

Answer: A

3.6 CONFIDENCE INTERVAL FOR THE MEAN

Confidence Intervals

Confidence intervals are a way of acknowledging that any measurement from a
sample is only an estimate of the population value. Although the estimate from
the sample is likely to be close, the true population value may be above or
below the sample value. A confidence interval specifies a range, from a
possible high to a possible low, within which the population value is likely to
lie relative to the sample-based value. The true value, therefore, is most
likely to be somewhere within the specified range.

Practice Questions

1. Assuming the graph presents 95% confidence intervals, which groups, if
any, are statistically different from each other?

Answer: When comparing two groups, any overlap of confidence intervals
means the groups are not significantly different. Therefore, if the graph
represents 95% confidence intervals, Drugs B and C are no different in their
effects; Drug B is no different from Drug A; Drug A has a better effect than
Drug C.

Confidence interval for the mean of a normally distributed population

A (1 − 𝛼)100% confidence interval (CI) for the unknown mean of a normally
distributed population is given by

Mean ± 𝑧𝛼/2 × SD/√𝑛

where

Mean = 𝑋̅ (the mean of the sample)
SD = standard deviation
𝑛 = sample size
𝑧𝛼/2 = z-score for the chosen confidence level

Confidence level     90%       95%       99%
𝛼                    0.10      0.05      0.01
𝛼/2                  0.05      0.025     0.005
𝑧𝛼/2                 1.645     1.96      2.58

A 90% confidence interval is Mean ± 1.645 × SD/√𝑛

A 95% confidence interval is Mean ± 1.96 × SD/√𝑛

A 99% confidence interval is Mean ± 2.58 × SD/√𝑛

Example 3.7:

Answer: B.

Example 3.8:

A physical therapist wished to estimate, with 99 percent confidence, the mean
maximal strength of a particular muscle in a certain group of individuals. He
is willing to assume that strength scores are approximately normally
distributed with a variance of 144. A sample of 15 subjects who participated
in the experiment yielded a mean of 84.3.

The 99% confidence interval is 84.3 ± 2.58 × 12/√15 = 84.3 ± 8.0.

Answer: 76.3 to 92.3, i.e., (76.3, 92.3)

We say we are 99 percent confident that the population mean is between 76.3
and 92.3 since, in repeated sampling, 99 percent of all intervals that could be
constructed in the manner just described would include the population mean.

Nonnormal Populations

It will not always be possible to assume that the population of interest is
normally distributed. Using the central limit theorem, we have learned that for
large samples (greater than 30), the sampling distribution of 𝑥̅ is
approximately normally distributed regardless of how the population is
distributed.

Example 3.9:

Punctuality of patients in keeping appointments is of interest to a research
team. In a study of patient flow through the offices of general practitioners,
it was found that a sample of 35 patients was, on average, 17.2 minutes late
for appointments. Previous research had shown the standard deviation to be
about 8 minutes. The population distribution was felt to be nonnormal. What
is the 90 percent confidence interval for 𝜇, the true mean amount of time late
for appointments?

The 90% confidence interval is 17.2 ± 1.645 × 8/√35 = 17.2 ± 2.2.

Answer: 15.0 to 19.4 (15.0, 19.4)

Remark

Frequently, when the sample is large enough for the application of the central
limit theorem, the population variance is unknown. In that case we use the
sample variance as a replacement for the unknown population variance in the
formula for constructing a confidence interval for the population mean.

Confidence intervals for relative risk and odds ratios

If the given confidence interval contains 1.0, then there is no statistically
significant effect of exposure.

Example 3.10:

1. If RR > 1.0, then subtract 1.0 and read as a percent increase. So, 1.77
means one group has 77% more cases than the other.

2. If RR < 1.0, then subtract from 1.0 and read as a reduction in risk. So, 0.78
means one group has a 22% reduction in risk.
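As an illustration not taken from the original notes, the sketch below computes
a relative risk from a hypothetical 2×2 table together with an approximate 95%
confidence interval on the log scale; all counts are made up, and the
log-transformation approach is a standard approximation rather than the
specific method of the text.

import math
from scipy import stats

# Hypothetical 2x2 table (all counts are made up for illustration):
#                 Disease   No disease
# Exposed            a=30        b=70
# Unexposed          c=20        d=80
a, b, c, d = 30, 70, 20, 80

risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
rr = risk_exposed / risk_unexposed

# Approximate 95% CI for RR using the log transformation (normal approximation)
se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
z = stats.norm.ppf(0.975)  # about 1.96
lower = math.exp(math.log(rr) - z * se_log_rr)
upper = math.exp(math.log(rr) + z * se_log_rr)

print(round(rr, 2), (round(lower, 2), round(upper, 2)))
# If the interval contains 1.0, the effect of exposure is not statistically significant.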
