Biostateunit 3
Simple Hypothesis
Suppose the company claims that the sales are in the range
of 900 to 1000 units
Composite Hypothesis
Hypothesis testing: Null and alternative hypotheses
• H0: µ ≤ µ0 Ha : µ > µ0
Define:
The power of a test is the probability, using that test, that we will
reject H0 when it is false.
• Thus, Power = 1 − β, where β is the probability of a Type II error (failing to reject H0 when it is false).
• Some tests have higher power than others, but often at the
price of more restrictive assumptions
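As a rough illustration of the definition above, the sketch below computes the power of a one-sided z-test of H0: µ ≤ µ0 against Ha: µ > µ0. It assumes a known population standard deviation; the function name `power_one_sided_z` and all numeric inputs are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the z-test of H0: mu <= mu0 vs Ha: mu > mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)      # H0 is rejected when z_stat > z_crit
    shift = (mu1 - mu0) / (sigma / n ** 0.5)    # mean of z_stat when the true mean is mu1
    beta = std_normal.cdf(z_crit - shift)       # P(fail to reject H0 | H0 is false)
    return 1 - beta

# Illustrative values only (not taken from the notes)
power = power_one_sided_z(mu0=100, mu1=105, sigma=15, n=36, alpha=0.05)
print(f"power = {power:.3f}")
```

With these made-up inputs the power comes out at roughly 0.64; a larger sample size or a larger true difference would raise it.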
Hypothesis testing: Six-step procedure
1. Formulate H0 and Ha, based on the scientific question of
interest.
Test statistic: t = (X̄ − µ0) / (s / √n), compared with the critical value:
• ±t1−α/2,n−1 if Ha is two-sided
• t1−α,n−1 if Ha is µ > µ0
• tα,n−1 if Ha is µ < µ0
A researcher wanted to test the hypothesis that the mean body temperature
of African elephants was 96.0°F. He has no prior notion
about which direction the mean body temperature of elephants will
differ from 96.0°F if it is not equal to 96.0°F.
● The critical region (or rejection region) is the set of all values of
the test statistic that cause us to reject the null hypothesis
P-value
● The P-value is also known as the probability value.
● It represents the probability, assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed.
● If the P-value is small, then there is stronger evidence in
favour of the alternative hypothesis
Decision Criterion for hypothesis testing
From the P-value table, the p-value is 0.0219. Because this is less than the
significance level of 0.05, we reject the null hypothesis.
Sample_1 : 535.5
Sample_2 : 495.2
Sample_3 : 510.5
Sample_4 : 497.7
Sample_5 : 504.3
P-value = 0.001 * 2
P-value = 0.002
Given the problem statement and the hypothesis, a sample mean
of 535.5 has a probability of 0.002 (0.2%), which is
far below the significance level (0.05, or 5%). The sample mean is
therefore considered too far from the population mean (500).
Z=-4.96
|Z| > Zα
The null hypothesis is rejected and the hospital is efficient in bringing down the fatality rate.
Statistical tests enable us to make decisions on the basis of patterns observed in data. There is
a wide range of statistical tests. The choice of test depends on:
• the structure of the data
• the distribution of the data
• the variable type
Parametric tests are used if the data is normally distributed.
A parametric statistical test makes an assumption about the population parameters and the
distributions that the data came from.
These tests include:
• t-tests
• z-tests
• ANOVA tests, which assume the data come from a normal distribution.
Chi-square test (χ2 test): The chi-square test is used to test whether two categorical variables are associated.
Calculating the chi-square statistic and comparing it against a critical value from the chi-square distribution
allows us to assess whether the observed frequencies differ significantly from the expected frequencies.
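The comparison of observed and expected frequencies can be sketched as follows for a test of independence. The 2 × 2 table of counts is made up for illustration, the function name `chi_square_independence` is hypothetical, and the critical value 3.841 is the standard tabulated 5% value for 1 degree of freedom.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = group A / group B, columns = yes / no
observed_table = [[30, 20], [20, 30]]
chi2, df = chi_square_independence(observed_table)

CRITICAL_5_PERCENT_DF1 = 3.841   # from a standard chi-square table
print(chi2, df, chi2 > CRITICAL_5_PERCENT_DF1)
```

Here every expected count is 25, so the statistic is 4.0 with 1 degree of freedom, which exceeds 3.841 and would lead to rejecting independence at the 5% level.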
Z-test: A z-test is a statistical test used to determine whether two population means
differ when the variances are known and the sample size is large. The z-test compares
means, using the population mean and the population standard deviation as parameters.
A z-test is also used to test the hypothesis that a sample was drawn from the same
population.
T-test: A t-test compares the means of two given samples. It is used when the
population parameters (mean and standard deviation) are not known. There are several
types, including the one-sample t-test, the independent t-test, and the paired t-test.
Analysis of variance (ANOVA) is a statistical technique used to check whether the means
of two or more groups differ significantly from each other. ANOVA checks the
impact of one or more factors by comparing the means of different samples. When there are
more than two samples, using repeated t-tests instead of ANOVA is unreliable, because each
additional comparison increases the chance of a Type I error.
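To make the idea of comparing several group means concrete, here is a minimal sketch of the one-way ANOVA F statistic (between-group mean square divided by within-group mean square). The three groups are invented for illustration, the function name `one_way_anova_f` is hypothetical, and 5.14 is the standard tabulated 5% critical value for F with (2, 6) degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three small groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(groups)

F_CRITICAL_5_PERCENT = 5.14   # F(2, 6) at the 5% level, from a standard F table
print(round(f_stat, 3), df_b, df_w, f_stat > F_CRITICAL_5_PERCENT)
```

For these made-up groups the F statistic is about 13 on (2, 6) degrees of freedom, so the hypothesis of equal means would be rejected at the 5% level.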
P-Value Method: The data set lists a sample of 106 body temperatures having a mean of 98.2°F. Assume that
the sample is a simple random sample and that the population standard deviation σ is known to be 0.62°F.
Use a 0.05 significance level to test the common belief that the mean body temperature of healthy
adults is equal to 98.6°F.
n = 106
Sample mean = 98.2
Standard deviation = 0.62
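One way to work this example is a two-sided z-test, since the standard deviation is treated as known. The sketch below uses the values listed above (n = 106, sample mean 98.2, standard deviation 0.62, hypothesized mean 98.6, α = 0.05); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sigma = 106, 98.2, 0.62
mu0, alpha = 98.6, 0.05

# z statistic: how many standard errors the sample mean lies from mu0
z_stat = (sample_mean - mu0) / (sigma / sqrt(n))

# Two-sided P-value: area in both tails beyond |z_stat|
p_value = 2 * NormalDist().cdf(-abs(z_stat))

reject_h0 = p_value <= alpha
print(f"z = {z_stat:.2f}, reject H0: {reject_h0}")
```

The z statistic is about −6.64, so the P-value is far below 0.05 and the claim that the mean is 98.6°F is rejected.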
Testing one population mean:
Z-test statistic
This test is used when the variable is numerical and only one population or group is being
studied. For example, Dr. Phil says that the average time that working mothers spend talking
to their children is 11 minutes per day.
Conditions to be met:
• σ known (not from data)
• Population approximately Normal or large sample (central limit theorem)
• Data valid
Steps to be followed:
• In the 1970s, 20–29-year-old men in the U.S. had a
mean μ body weight of 170 pounds. Standard
deviation σ was 40 pounds. We test whether mean
body weight in the population now differs.
1. Null hypothesis H0: μ = 170 (“no difference”)
2. The alternative hypothesis can be either
Ha: μ > 170 (one-sided test) or
Ha: μ ≠ 170 (two-sided test)
3. Test statistic:
z_stat = (x̄ − μ0) / SE_x̄
where μ0 = population mean assuming H0 is true
and SE_x̄ = σ / √n
4. Finding the P-value: What is the probability of the
observed test statistic or one more extreme when H0 is
true?
• This corresponds to the AUC in the tail of the Standard
Normal distribution beyond the zstat.
• Convert z statistics to P-value :
For Ha: μ > μ0 P = Pr(Z > zstat) = right-tail beyond zstat
For Ha: μ < μ0 P = Pr(Z < zstat) = left tail beyond zstat
For Ha: μ ≠ μ0 P = 2 × one-tailed P-value
If we found a sample mean of
185, then Zstat = 3
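The conversion from a z statistic to a P-value can be sketched with the standard Normal distribution from Python's standard library. The function name `z_to_p` and the labels "greater", "less" and "two-sided" are illustrative choices, not notation from the notes.

```python
from statistics import NormalDist

def z_to_p(z_stat, alternative):
    """Convert a z statistic to a P-value for the given alternative hypothesis."""
    std_normal = NormalDist()
    if alternative == "greater":       # Ha: mu > mu0, right tail
        return 1 - std_normal.cdf(z_stat)
    if alternative == "less":          # Ha: mu < mu0, left tail
        return std_normal.cdf(z_stat)
    if alternative == "two-sided":     # Ha: mu != mu0, twice the one-tailed area
        return 2 * (1 - std_normal.cdf(abs(z_stat)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# The z statistic from the example above
p_one_sided = z_to_p(3, "greater")
p_two_sided = z_to_p(3, "two-sided")
print(f"one-sided P = {p_one_sided:.4f}, two-sided P = {p_two_sided:.4f}")
```

For Zstat = 3 the one-sided P-value is about 0.0013 and the two-sided P-value is about 0.0027.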
Examples
P =.27 non-significant evidence against H0
P =.001 highly significant evidence against H0
Reject H0 when P ≤ α
Retain H0 when P > α
Set α = .10.
For P = 0.27 retain H0
For P = .001 reject H0
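The decision rule is simple enough to express directly. This small sketch applies it to the two P-values above with α = 0.10; the helper name `decide` is hypothetical.

```python
def decide(p_value, alpha):
    """Apply the rule: reject H0 when P <= alpha, otherwise retain H0."""
    return "reject H0" if p_value <= alpha else "retain H0"

alpha = 0.10
decisions = {p: decide(p, alpha) for p in (0.27, 0.001)}
for p, decision in decisions.items():
    print(f"P = {p}: {decision}")
```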
Testing one population proportion
• This test is used when the variable is categorical (for example, gender or political
party) and only one population is being studied (for example, all U.S. citizens).
• The test looks at the proportion (p) of individuals in the population who
have a certain characteristic.
• For example, the proportion of people who carry cellphones.
• Suppose Cavifree toothpaste claims that four out
of five dentists recommend Cavifree toothpaste to
their patients. In this case, the population is all
dentists, and p is the proportion of all dentists who
recommended Cavifree to their patients. The claim
is that p is equal to “four out of five,” which means
that p0 is 4/5 = 0.80. You suspect that the
proportion is actually less than 0.80.
• Your hypotheses are H0: p = 0.80 versus
• Ha: p < 0.80.
• Suppose that 150 out of 200 dental patients
sampled received a recommendation for Cavifree.
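A common way to carry out this test is the one-proportion z-test, which measures how far the sample proportion lies from p0 in standard-error units. The notes do not state the formula, so the sketch below assumes the usual large-sample version, with the standard error computed under H0; the significance level of 0.05 is also an assumption.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.80                    # claimed proportion under H0
successes, n = 150, 200      # sample result from the example
alpha = 0.05                 # assumed significance level (not given in the notes)

p_hat = successes / n                    # sample proportion
se = sqrt(p0 * (1 - p0) / n)             # standard error assuming H0 is true
z_stat = (p_hat - p0) / se

# Ha: p < 0.80, so the P-value is the left-tail area below z_stat
p_value = NormalDist().cdf(z_stat)
reject_h0 = p_value <= alpha
print(f"p_hat = {p_hat}, z = {z_stat:.2f}, P = {p_value:.3f}, reject H0: {reject_h0}")
```

The sample proportion is 0.75, giving z of about −1.77 and a left-tail P-value of roughly 0.04, so H0 would be rejected at the assumed 5% level.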
Test statistic comparing two
population means
• This test is used when the variable is numerical (for
example, income, cholesterol level, or miles per gallon) and
two populations or groups are being compared (for
example, cars versus SUVs).
• Two separate random samples need to be selected, one
from each population, in order to collect the data needed
for this test.
• The null hypothesis is that the two population means are
the same; in other words, that their difference is equal to 0.
• The notation for the null hypothesis is H0: μ1 − μ2 = 0.
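As a sketch of how two means might be compared, the code below computes a two-sample z statistic for the difference in means. The notes do not give the formula, so this assumes independent samples with known (or large-sample) standard deviations; the function name `two_sample_z` and the miles-per-gallon figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """z statistic and two-sided P-value for H0: mu1 - mu2 = 0."""
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)   # standard error of the difference
    z_stat = (mean1 - mean2) / se
    p_value = 2 * NormalDist().cdf(-abs(z_stat))
    return z_stat, p_value

# Hypothetical miles-per-gallon summaries: cars vs SUVs
z_stat, p_value = two_sample_z(mean1=30.0, mean2=27.0,
                               sigma1=4.0, sigma2=5.0, n1=50, n2=50)
print(f"z = {z_stat:.2f}, P = {p_value:.4f}")
```

With these invented numbers z is about 3.31, which would count as strong evidence against equal means.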
Testing the mean difference: Paired data
• This test is used when the variable is numerical (for example,
cholesterol level or miles per gallon), and the individuals in the
sample are either paired up in some way (identical twins are
often used) or the same people are used twice (for example,
using a pretest and post-test).
• Paired tests are used for comparisons where you want to
minimize the chance of the treatment and control groups being
too different (and hence biased).
• Testing paired data amounts to testing one population mean,
where the null hypothesis is that the mean (of the paired
differences) is 0, and the alternative hypothesis is that the mean
(of the paired differences) is greater than, less than, or not equal to 0.
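Because a paired test reduces to a one-sample test on the differences, the t statistic can be sketched in a few lines. The before/after values are invented for illustration, and 2.776 is the standard tabulated two-sided 5% critical value for t with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical pretest / post-test measurements on the same five people
before = [200, 210, 190, 220, 205]
after = [190, 205, 185, 210, 200]

diffs = [b - a for b, a in zip(before, after)]   # one difference per pair
d_bar = mean(diffs)                              # mean of the paired differences
s_d = stdev(diffs)                               # sample standard deviation of the differences
n = len(diffs)

# One-sample t statistic for H0: mean difference = 0
t_stat = d_bar / (s_d / sqrt(n))
df = n - 1

T_CRITICAL = 2.776   # two-sided 5% critical value for 4 df, from a standard t table
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

For these made-up data the mean difference is 7 and t is about 5.72 on 4 degrees of freedom, which exceeds the critical value.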
Chi-Square Test
12 + 8 + 20 + 2 + 14 + 10 + 15 + 6 + 9 + 4 = 100
● χ2 value of 4.726
Solution