2 Parametric Test Part I


STATISTICS

WITH SOFTWARE APPLICATION


PART I

MS. CYPRESS C. SAVILLA.


Instructor
Parametric Statistics
Parametric statistical procedures are
inferential procedures that rely on testing
claims regarding parameters such as the
population mean, the population standard
deviation, or the population proportion.
In some circumstances, the use of
parametric procedures requires that
certain requirements regarding the
distribution of the population, such as
normality, be satisfied.
Parametric Statistics
✦ Assume underlying statistical
distributions in the data. Therefore,
several conditions of validity must be
met so that the result of a parametric
test is reliable.
✦ Apply to data in ratio scale, and some
apply to data in interval scale.
Two Common Forms of
Statistical Inference

1. Estimation
2. Hypothesis Testing
Estimating the Value of
a Parameter

In statistics, an estimate is used to
approximate the value of an
unknown population parameter.
Two Types of
Estimation

1. Point estimation (a single value that
is used to infer the parameter directly).

2. Interval estimation (also called a
confidence interval for the parameter).
Parameter and Statistic
A parameter is a
numerical characteristic
of the population. Any
numerical characteristic of a
population is called a
parameter.
A statistic is a
numerical value that
describes a sample, that is, a number computed
from sample data.
What Properties make a
Good Point Estimator?
1. It's desirable that the sampling
distribution be centered around the true
population parameter. An estimator with
this property is called unbiased.

2. It's desirable that our chosen estimator
have a small standard error in comparison
with other estimators we might have
chosen.
Confidence Interval
A confidence interval provides more
information than a point estimate; it consists
of an interval of numbers.

The level of confidence represents the expected
proportion of intervals that will contain the
parameter if a large number of different
samples is obtained.

The level of confidence is denoted (1 − 𝛼) × 100%.


Confidence Interval
Confidence interval estimates are of the form
Point estimate ± Margin of error

Lower bound = Estimate − Margin of Error
Upper bound = Estimate + Margin of Error
Margin of Error
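The margin of error of the estimate can
be computed using this formula:
$E = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ or $E = z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$

The factor $\dfrac{\sigma}{\sqrt{n}}$ (or $\dfrac{s}{\sqrt{n}}$) is the standard error of the estimate.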


Margin of Error
The margin of error of a confidence
interval estimate of a parameter
depends on three factors:
1. Level of Confidence
2. Sample Size
3. Standard Deviation
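The effect of the three factors can be checked numerically. The following is a minimal R sketch, assuming the z-based formula E = z(α/2) · sd / √n; the helper name margin_of_error() and the input values are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: margin of error for a mean, E = z_{alpha/2} * sd / sqrt(n)
margin_of_error <- function(conf_level, sd, n) {
  alpha <- 1 - conf_level
  z <- qnorm(1 - alpha / 2)   # critical value z_{alpha/2}
  z * sd / sqrt(n)
}

# Illustrative values only (not taken from the slides)
base        <- margin_of_error(conf_level = 0.95, sd = 10, n = 100)
higher_conf <- margin_of_error(conf_level = 0.99, sd = 10, n = 100)  # more confidence
larger_n    <- margin_of_error(conf_level = 0.95, sd = 10, n = 400)  # larger sample
larger_sd   <- margin_of_error(conf_level = 0.95, sd = 20, n = 100)  # more variability

print(round(c(base = base, higher_conf = higher_conf,
              larger_n = larger_n, larger_sd = larger_sd), 4))
```

In this sketch, raising the level of confidence or the standard deviation widens the interval, while quadrupling the sample size halves the margin of error.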
Interpretation of
Confidence Interval
A (1 − 𝛼) × 100% confidence interval
indicates that, if we obtained
many simple random samples of size n
from the population whose mean, 𝜇, is
unknown, then approximately
(1 − 𝛼) × 100% of the intervals will
contain 𝜇.
Interpretation of
Confidence Interval

In other words:
We are (insert level of confidence) confident
that the population mean is between (lower
bound) and (upper bound). This is an
abbreviated way of saying the method is
correct (1 − 𝛼) × 100% of the time.
Interpretation of
Confidence Interval
Example:
If we constructed a 90% confidence interval with a
lower bound of 12 and an upper bound of 18, we
would interpret the interval as follows:

“We are 90% confident that the population mean, 𝜇,
is between 12 and 18”.
Remember:
A 95% confidence interval does not
mean that there is a 95% probability
that the interval contains the population
mean. The 95% describes how often the
method captures the mean over many samples.
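The long-run meaning of the confidence level can be illustrated with a small simulation. This is only a sketch under assumed values: the population mean, standard deviation, sample size, number of repetitions, and seed below are all hypothetical.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
mu    <- 50                 # hypothetical population mean
sigma <- 10                 # hypothetical (known) population standard deviation
n     <- 25                 # sample size
reps  <- 2000               # number of repeated samples
z     <- qnorm(0.975)       # critical value for 95% confidence

# For each repeated sample, build the z-interval and record whether it contains mu
covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  half_width <- z * sigma / sqrt(n)
  (mean(x) - half_width <= mu) && (mu <= mean(x) + half_width)
})

coverage <- mean(covered)   # proportion of intervals that captured mu
print(coverage)
```

The printed proportion should be close to 0.95. Each individual interval either contains 𝜇 or does not; the 95% refers to the procedure.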
Estimating the Value of a Parameter
Using Confidence Intervals

1. Constructing confidence intervals about
a population mean where the
population standard deviation is known
or unknown.
2. Constructing confidence intervals about
a population proportion.
3. Constructing confidence intervals about
a population standard deviation.
Confidence intervals about a
population Mean where the Population
Standard Deviation is Known
Case 1:
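𝜎 is known and 𝑛 ≥ 30
$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

Here $\bar{x}$ is the point estimator and $z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ is the margin of error.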
Confidence intervals about a
population Mean where the Population
Standard Deviation is Unknown
Case 2:
𝜎 is unknown and 𝑛 ≥ 30
$\bar{x} \pm z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
Note:
If the sample size is large (n ≥ 30), then the sample
standard deviation s can be used to estimate the
population standard deviation 𝜎.
Confidence intervals about a
population Mean where the Population
Standard Deviation is Unknown
Case 3:
𝜎 is unknown and 𝑛 < 30
$\bar{x} \pm t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$

where $t_{\alpha/2}$ is computed with n − 1 degrees of
freedom.
Example 1
How much do Filipinos sleep each night? Based
on a random sample of 1120 Filipinos 15 years of
age or older, the mean amount of sleep per night
is 8.17 hours according to the Filipino Time Use
Survey conducted by the Bureau of Labor
Statistics. Assuming the population standard
deviation for amount of sleep per night is 1.2
hours, construct and interpret a 95% confidence
interval for the mean amount of sleep per night of
Filipinos 15 years of age or older.
Solution:
Given:
𝑛 = 1,120    x̄ = 8.17    𝜎 = 1.2
The z-score for a 95% confidence level in
the z-table is 1.96. Apply Case 1.

$8.17 \pm 1.96\,\dfrac{1.2}{\sqrt{1120}}$
8.17 ± 0.0703 = (8.0997, 8.2403)

“We are 95% confident that the population mean
is between 8.10 and 8.24”.
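The Case 1 computation can be reproduced in R. This is a minimal sketch that uses the values given in the example; the variable names are illustrative, and qnorm() supplies the critical value instead of the z-table.

```r
n     <- 1120    # sample size
xbar  <- 8.17    # sample mean (hours)
sigma <- 1.2     # assumed population standard deviation (hours)

z  <- qnorm(1 - 0.05 / 2)     # critical value for 95% confidence, about 1.96
E  <- z * sigma / sqrt(n)     # margin of error
ci <- c(lower = xbar - E, upper = xbar + E)

print(round(E, 4))
print(round(ci, 4))
```

The result agrees with the hand computation, approximately (8.10, 8.24).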


Example 2
Suppose we would like
to estimate the mean
amount of money spent
on books by BS
Statistics students in a
semester. We have
data from 20 randomly
selected
students. Construct
and interpret a 95%
confidence interval.
Solution:
We will apply Case 3, since n < 30 and 𝜎 is
unknown.

To determine the confidence interval, we will use
RStudio.
The t.test() command is used to find confidence intervals; its default level of
confidence is 95%.
“We are 95% confident that the
population mean for the amount of
money spent on books is between
Php. 132.35 and Php. 173.35”.
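The students' data are not reproduced in this text, so the sketch below uses made-up amounts purely to show the call. The vector books is hypothetical and will not reproduce the interval quoted above. The sketch also recomputes the interval from the Case 3 formula to show that t.test() gives the same result.

```r
# Hypothetical amounts (Php) for 20 students; NOT the data used on the slide
books <- c(120, 185, 150, 142, 160, 175, 130, 155, 148, 190,
           110, 165, 138, 172, 158, 145, 125, 180, 152, 168)

res <- t.test(books)          # one-sample t procedure, default conf.level = 0.95
print(res$conf.int)

# Same interval from the Case 3 formula: xbar +/- t_{alpha/2} * s / sqrt(n)
n      <- length(books)
t_crit <- qt(0.975, df = n - 1)
manual <- mean(books) + c(-1, 1) * t_crit * sd(books) / sqrt(n)
print(manual)
```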
Example 3
A simple random sample of size n = 40
is drawn from a population. The
sample mean is found to be 20.1, and
the sample standard deviation is found
to be 3.2. Construct and interpret a
90% confidence interval about the
population mean.
Solution:
Given:
𝑛 = 40    x̄ = 20.1    𝑠 = 3.2
The z-score for a 90% confidence level in
the z-table is 1.645. Apply Case 2.

$20.1 \pm 1.645\,\dfrac{3.2}{\sqrt{40}}$
20.1 ± 0.8323 = (19.2677, 20.9323)

“We are 90% confident that the population mean
is between 19.27 and 20.93”.
Example 4
A corporation monitors time spent by office
workers browsing the web on their computers
instead of working. In a sample of computer
records of 15 workers, construct a 99%
confidence interval for the mean time spent
by selected office workers in browsing the
web in an eight-hour day.
Solution:
We will apply Case 3, since n < 30
and 𝜎 is unknown.

To determine the confidence interval, we
will use RStudio.
The t.test() command can also be used to find
confidence intervals with levels of confidence
different from 95%. We can specify the desired
level of confidence using the conf.level argument.
“We are 99% confident that the
population mean time spent by selected
office workers in browsing the web is
between 24.51 minutes and 41.22 minutes”.
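A minimal sketch of the conf.level argument follows. The workers' records are not included in this text, so the vector minutes is hypothetical and will not reproduce the interval quoted above.

```r
# Hypothetical browsing times (minutes) for 15 workers; NOT the slide's data
minutes <- c(22, 35, 41, 18, 29, 47, 33, 26, 38, 30, 44, 21, 36, 27, 40)

ci95 <- t.test(minutes)$conf.int                      # default: 95%
ci99 <- t.test(minutes, conf.level = 0.99)$conf.int   # requested: 99%

print(ci95)
print(ci99)

# Widths of the two intervals for the same data
width95 <- ci95[2] - ci95[1]
width99 <- ci99[2] - ci99[1]
print(c(width95 = width95, width99 = width99))
```

For the same sample, the 99% interval is wider than the 95% interval, which is the price of the higher level of confidence.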
Confidence Intervals About
a Population Proportion

The point estimate for the
population proportion is
$\hat{p} = \dfrac{x}{n}$
where x is the number of individuals in the
sample with the specified characteristic and
n is the sample size.
Confidence Intervals About
a Population Proportion
Suppose a simple random sample of size
n is taken from a population. A confidence
interval for p is given by the following
quantities:

$\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

Note:
It must be the case that $n\hat{p}(1-\hat{p}) \ge 10$
and $n \le 0.05N$, where N is the population size,
to construct this interval.
Example 1
In a poll conducted by the Research Center for
the People and the Press, a simple random
sample of 1505 Filipino adults was asked
whether they were in favor of tighter enforcement
of government rules on TV content during hours
when children are most likely to be watching. Of
the 1,505 adults, 1,129 responded yes. Obtain
a 95% confidence interval for the proportion of
Filipinos who are in favor of tighter enforcement
of government rules on TV content during hours
when children are most likely to be watching.
Solution:
Given:
The z-score for a 95% confidence level in the
z-table is 1.96.
𝑛 = 1,505    𝑥 = 1,129
$\hat{p} = \dfrac{x}{n} = \dfrac{1,129}{1,505} = 0.7502$
Check:
(1,505)(0.7502)(1 − 0.7502) ≥ 10
282.0369 ≥ 10
Solution:
$0.7502 \pm 1.96\sqrt{\dfrac{0.7502(1 - 0.7502)}{1,505}}$
0.7502 ± 0.0219 = (0.7283, 0.7721)
“We are 95% confident that the proportion
of Filipinos who are in favor of tighter
enforcement of government rules on TV content during
hours when children are most likely to be
watching is between 0.73 and 0.77”.
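The same interval can be computed directly from the formula in R. This is a sketch using the counts given in the example; the variable names are illustrative. Because R carries full precision instead of rounding the sample proportion to 0.7502, the last decimal place can differ slightly from the hand computation.

```r
x <- 1129    # number who responded yes
n <- 1505    # sample size

p_hat <- x / n                              # point estimate of p
stopifnot(n * p_hat * (1 - p_hat) >= 10)    # requirement check from the note

z  <- qnorm(0.975)                          # critical value for 95% confidence
E  <- z * sqrt(p_hat * (1 - p_hat) / n)     # margin of error
ci <- c(lower = p_hat - E, upper = p_hat + E)

print(round(p_hat, 4))
print(round(ci, 4))
```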
Example 2
Suppose a consumer advocacy group
would like to conduct a survey to find the
proportion of buyers of the
newest generation of an MP3 player who were
happy with their purchase. The advocacy
group took a random sample of 1000
consumers who recently purchased this
MP3 player and found that 400 were happy
with their purchase. Find a 90% confidence
interval for p.
Solution:
Given:
𝑛 = 1000    𝑥 = 400
$\hat{p} = \dfrac{x}{n} = \dfrac{400}{1000} = 0.40$
Check:
(1000)(0.40)(1 − 0.40) ≥ 10
240 ≥ 10
The prop.test(x, n, p) command can also be
used to find confidence intervals with levels
of confidence different from 95%. We can
specify the desired level of confidence
using the conf.level argument.
“We are 90% confident that the population
proportion of buyers of the newest
generation of an MP3 player who were happy with their
purchase is between 0.37 and 0.43”.
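The call for this example can be sketched as follows, using the counts given above. Note that prop.test() does not use the ± formula shown earlier: by default it reports a Wilson score interval with a continuity correction, so its bounds differ slightly from the hand formula. The hand-formula (Wald) interval is computed alongside for comparison.

```r
x <- 400     # consumers happy with their purchase
n <- 1000    # sample size

res <- prop.test(x, n, conf.level = 0.90)   # score interval, continuity-corrected
ci_prop <- as.numeric(res$conf.int)
print(round(ci_prop, 4))

# Interval from the p-hat +/- z * sqrt(p-hat(1 - p-hat)/n) formula, for comparison
p_hat   <- x / n
z       <- qnorm(1 - 0.10 / 2)
ci_wald <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)
print(round(ci_wald, 4))
```

Both intervals round to (0.37, 0.43) for these data.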
Confidence Intervals About
a Population Variance
If a simple random sample of size n is
taken from a normal population with mean 𝜇 and
standard deviation 𝜎, then a (1 − 𝛼) × 100%
confidence interval about 𝜎² is given by

$\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$

with n - 1 degrees of freedom.
Remember:
A confidence interval about the
population variance or standard
deviation is not of the form “point
estimate ± margin of error” because the
sampling distribution of the sample
variance is not symmetric.
Example 1
A simple random sample of size n = 12
is drawn from a population that is
normally distributed. The sample
variance is found to be 𝑠² = 23.7.
Construct a 90% confidence
interval about the population variance.
Solution:
Given:
𝑛 = 12    𝑠² = 23.7    𝐶𝐼 = 90%
𝛼 = 1 − 𝐶𝐼 = 1 − 0.90 = 0.10
𝑑𝑓 = 𝑛 − 1 = 12 − 1 = 11
$\chi^2_{\alpha/2,\,df} = \chi^2_{0.10/2,\,11} = \chi^2_{0.05,\,11} = 19.675$
$\chi^2_{1-\alpha/2,\,df} = \chi^2_{1-0.10/2,\,11} = \chi^2_{0.95,\,11} = 4.575$
Solution:
$\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$

$\dfrac{(12-1)(23.7)}{19.675} < \sigma^2 < \dfrac{(12-1)(23.7)}{4.575}$

13.2503 < 𝜎² < 56.9836

“We are 90% confident that the population
variance is between 13.25 and 56.98”.
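The chi-square computation can be sketched in R with qchisq() in place of the table. One assumption to keep in mind: qchisq() takes a left-tail probability, while the table subscripts above are right-tail areas, so the subscript 0.05 corresponds to qchisq(0.95, df). Variable names are illustrative.

```r
n  <- 12       # sample size
s2 <- 23.7     # sample variance
conf_level <- 0.90

alpha <- 1 - conf_level
df    <- n - 1

# Table subscripts are right-tail areas; qchisq() expects left-tail probabilities
chi_upper <- qchisq(1 - alpha / 2, df)   # chi-square_{alpha/2},   about 19.675
chi_lower <- qchisq(alpha / 2, df)       # chi-square_{1-alpha/2}, about 4.575

ci_var <- c(lower = df * s2 / chi_upper, upper = df * s2 / chi_lower)
ci_sd  <- sqrt(ci_var)                   # interval for the standard deviation

print(round(ci_var, 2))
print(round(ci_sd, 2))
```

Taking square roots of the variance bounds gives the corresponding interval for the population standard deviation. Small differences from the hand computation come from the table values being rounded to three decimals.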
Example 2
A jar of peanuts is supposed to have 16 ounces of
peanuts. The filling machine inevitably experiences
fluctuations in filling, so a quality-control manager
randomly samples 12 jars of peanuts from the
storage facility and measures their contents. She
obtained the following data:

Determine the sample standard deviation and
construct a 90% confidence interval for the
population standard deviation of the number
of ounces of peanuts.
Exercises
Exercise 1:
Jane wants to estimate the
proportion of students on her campus
who eat cauliflower. After surveying
20 students, she finds 2 who eat
cauliflower. Obtain and interpret a
95% confidence interval for the
proportion of students who eat
cauliflower on Jane’s campus.
Exercise 2:
Alan wants to estimate the proportion
of adults who walk to work. In a
survey of 10 adults, he finds 1 who
walks to work. Obtain and interpret a
95% confidence interval for the
proportion of adults who walk to
work.
Exercise 3:
Suppose a sample of 30 Stats
students is given an IQ test. If the
sample has a standard deviation of
12.23 points, find a 90% confidence
interval for the population standard
deviation and interpret the result.
What is HYPOTHESIS
TESTING?

Definition:
Hypothesis testing is a
procedure, based on sample evidence and
probability, used to test claims regarding a
characteristic of one or more
populations.
What is HYPOTHESIS?
Definition:
A statement or claim regarding a
characteristic of one or more
populations.
A preconceived idea, assumed to
be true but has to be tested for
its truth or falsity.
Example of Hypothesis
✦ The mean body temperature for patients
admitted to elective surgery is not equal to
37.0 °C.
✦ A consumer advocate would like to know if
the mean lifetime of a bulb is less than 500
hours.
✦ A real estate broker believes that because
of changes in interest rates, as well as other
economic factors, the mean price has
increased since then.
Procedures for Testing
Hypothesis
1. State the null and alternative hypothesis.
2. Set the level of significance or alpha level (𝛼).
3. Determine the test distribution to use.
4. Determine the critical region.
5. State the decision rule.
6. Calculate a test statistic.
7. Make a statistical decision.
1. State the Null and
Alternative Hypothesis

Two Types of Hypothesis


1. Null Hypothesis

2. Alternative Hypothesis
Null Hypothesis
• Denoted by 𝐻𝑜
• The statement being tested.
• Assumed true until evidence
indicates otherwise.
• Must contain the condition of equality and
must be written with the symbol =, ≤ , or ≥
Example:
✦ Students who eat breakfast and students who do not
will perform the same on a math exam.
✦ Students who experience test anxiety prior to an English
exam and students who do not
will get the same scores.
✦ Motorists who talk on the phone while driving
and motorists who do not will make the same number of errors
on a driving course.
Alternative Hypothesis

• Denoted by 𝐻𝑎
• Statement that must be true if the
null hypothesis is false.
• Sometimes referred to as the
research hypothesis.
• Must not contain the condition of
equality and must be written with the
symbol ≠, <, or >
Example:
✦ Students who eat breakfast will perform better
on a math exam than students who do not eat
breakfast.
✦ Students who experience test anxiety prior to
an English exam will get higher scores than
students who do not experience test anxiety.
✦ Motorists who talk on the phone while driving
will be more likely to make errors on a driving
course than those who do not talk on the
phone.
Remember:
If you are conducting a research
study and you want to use a
hypothesis test to support your
claim, the claim must be stated in
such a way that it becomes the
alternative hypothesis, so it cannot
contain the condition of equality.
Two Types of Alternative Test

1. One - tailed test


✦ Left tailed
✦ Right tailed
2. Two - tailed test
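In R, the direction of the alternative hypothesis is selected with the alternative argument of t.test(). The sketch below uses simulated data and a hypothesized mean that are both hypothetical; it is meant only to show how the three kinds of test are requested and how their p-values relate.

```r
set.seed(2)                            # arbitrary seed, only for reproducibility
x   <- rnorm(30, mean = 52, sd = 5)    # hypothetical sample
mu0 <- 50                              # hypothetical value stated in Ho

p_two   <- t.test(x, mu = mu0, alternative = "two.sided")$p.value  # Ha: mu != mu0
p_left  <- t.test(x, mu = mu0, alternative = "less")$p.value       # Ha: mu <  mu0
p_right <- t.test(x, mu = mu0, alternative = "greater")$p.value    # Ha: mu >  mu0

print(c(two_tailed = p_two, left_tailed = p_left, right_tailed = p_right))
```

For the t-test, the two one-tailed p-values add up to 1, and the two-tailed p-value is twice the smaller of them.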
2. Set the Level of Significance
or Alpha Level

Definition:
The level of significance, 𝛼 , is
the probability of making a type
I error.
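The definition can be illustrated by simulation: when the null hypothesis is actually true, a test at level 𝛼 should reject in about 𝛼 × 100% of repeated studies. The sketch below assumes a one-sample t-test on normally distributed data; the sample size, number of repetitions, and seed are hypothetical choices.

```r
set.seed(3)        # arbitrary seed, only for reproducibility
alpha <- 0.05      # level of significance
reps  <- 5000      # number of simulated studies

# Ho (mu = 0) is TRUE in every simulated study, so each rejection is a Type I error
p_values <- replicate(reps, t.test(rnorm(20, mean = 0, sd = 1), mu = 0)$p.value)

type1_rate <- mean(p_values <= alpha)   # observed proportion of Type I errors
print(type1_rate)
```

The printed proportion should be close to 0.05.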
Exercise 5:
Investors not only desire a high return on their
money, but they would also like the rate of return to
be stable from year to year. An investment
manager invests with the goal of reducing volatility
(year-to-year fluctuations in the rate of return). The
following data represent the rate of return (in
percent) for his mutual fund for the past 12 years.
13.8, 14.9, 10.0, 12.3, 11.2, 6.7,
9.8, 12.5, 10.4, 8.9, 15.9, 6.6
Construct a 95% confidence interval for the
population standard deviation of the rate of return.
Two Types of Error
Example:
𝐻𝑜 : The defendant is innocent.

𝐻𝑎 : The defendant is not


innocent.

What happens to the defendant if the
jury makes a Type I or a Type II error?
Answer:
A Type I error is like putting
an innocent person in jail.

A Type II error is like letting a
guilty person go free.
Example:
Type I Error
BFAD allows the release of an
ineffective medicine.
Type II Error
BFAD does not allow the release of an
effective drug.
Remember:
It is important to note that we want to
set 𝛼 before we start our study
because the Type I error is the more
‘grievous’ error to make.
The smaller 𝛼 is, the smaller
the region of rejection.
3. Determine the Test
Distribution to Use

Determine the best statistical
test to be used, based on the
objective and the assumptions
that are satisfied.
List of Common
Parametric Tests
1. One Sample z - Test
2. One Sample t - Test
3. One Sample Proportion Test
4. Independent Sample z - Test
5. Independent Sample t - Test
6. Two Sample Proportion Test
List of Common
Parametric Tests
7. Paired Sample t - Test
8. Analysis of Variance (ANOVA) Test
9. Tukey Test (Post Hoc Analysis
of ANOVA)
10. Two Way Analysis of Variance
11. Pearson Product Moment Correlation
12. Regression Analysis
4. Determine the Critical Region
Definition:
The rejection region, or critical
region, is the set of all values of
the test statistic that lead to
the rejection of 𝐻𝑜.
The acceptance region is the set of
all values of the test statistic that
lead the researcher to retain 𝐻𝑜.
5. State the Decision
Rules

✦ Using confidence
interval
✦ Using p-value approach
✦ Using traditional method
Using Confidence Interval

Decision Rule:
Reject the null hypothesis if the
hypothesized value of the parameter is not within the range
specified by the confidence interval.
Using P - Value
Approach
Decision Rule:
Reject the null hypothesis if the
computed p-value is less than or equal
to the set significance level 𝛼; otherwise,
do not reject the null hypothesis.
Example:
If the level of significance 𝛼 = 0.05,
P-value   Decision
0.01      Reject Ho
0.05      Reject Ho
0.10      Fail to reject Ho
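The p-value decision rule is a single comparison, which can be sketched as a small R function. The function name decide() is hypothetical; the three p-values are the ones from the table above.

```r
# Hypothetical helper: apply the p-value decision rule at a given alpha
decide <- function(p_value, alpha = 0.05) {
  ifelse(p_value <= alpha, "Reject Ho", "Fail to reject Ho")
}

decisions <- decide(c(0.01, 0.05, 0.10))
print(decisions)
# "Reject Ho" "Reject Ho" "Fail to reject Ho"
```

A p-value exactly equal to 𝛼 leads to rejection because the rule uses "less than or equal to".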
Using Traditional Method

Decision Rule:

Reject Ho if the computed value
of the test statistic falls in the
region of rejection.
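For a test based on the standard normal distribution, the boundaries of the region of rejection can be obtained with qnorm(). The sketch below assumes 𝛼 = 0.05 and a made-up test statistic; z_stat is hypothetical and not taken from any example in these slides.

```r
alpha <- 0.05

crit_two   <- qnorm(1 - alpha / 2)   # two-tailed:   reject Ho if |z| >= about 1.96
crit_left  <- qnorm(alpha)           # left-tailed:  reject Ho if z <= about -1.645
crit_right <- qnorm(1 - alpha)       # right-tailed: reject Ho if z >= about 1.645

z_stat <- 2.10                       # hypothetical computed test statistic

reject_two   <- abs(z_stat) >= crit_two
reject_left  <- z_stat <= crit_left
reject_right <- z_stat >= crit_right

print(round(c(crit_two = crit_two, crit_left = crit_left, crit_right = crit_right), 3))
print(c(two_tailed = reject_two, left_tailed = reject_left, right_tailed = reject_right))
```

With this hypothetical statistic, Ho would be rejected in the two-tailed and right-tailed tests but not in the left-tailed test.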
6. Calculate Test Statistic

Once you have determined the appropriate
statistical test in step no.
3, calculate the test statistic. The
value computed by the chosen
statistical test is then compared to
the critical value.
Definition:
Test statistic - a statistic
computed from the sample data
that is especially sensitive to the
differences between 𝐻𝑜 and 𝐻𝑎 .
7. Make Statistical Decision

✦ Fail to reject the null
hypothesis/Retain the
null hypothesis.
✦ Reject the null
hypothesis.
Remember:
It is important to recognize that
we never accept the
null hypothesis.
We are merely saying that the
sample evidence is not strong
enough to warrant rejection of
the null hypothesis.
Normal Distribution
The graph of a normal distribution is called the normal curve, which
is a bell-shaped curve that
approximately describes many phenomena
that occur in nature, industry, and research.

Normal Curve
Properties of a Normal Curve

1. The normal curve is bell-shaped and
symmetric about the mean.
2. The mean, median and mode are equal.
3. The total area under the curve is equal to
one.
4. The normal curve approaches, but never
touches, the x-axis as it extends farther and
farther away from the mean.
Testing Normality of the Data

To determine whether the data follow a
normal distribution, we can use a
graphical or a numerical method.
Graphical:
Histogram and Normal Q-Q Plot
Numerical:
Kolmogorov-Smirnov Test
Lilliefors Test
Anderson-Darling Test
Shapiro-Wilk Test
How to Check Normality?
A histogram plots the observed values against their
frequency and gives a visual indication of whether
the distribution is bell-shaped or not.
How to Check Normality?
Q-Q probability plots display the observed values
against normally distributed data (represented by
the line).
Remember:
Graphical methods are typically not
very useful when the sample size is
small.
Common Statistical Test for Normality

Test Method          Statistic   n Range          Distribution Based
Kolmogorov-Smirnov   D           n ≥ 3            EDF
Lilliefors           L           n ≥ 4            EDF
Anderson-Darling     A-square    n ≥ 8            EDF
Shapiro-Wilk         W           3 ≤ n ≤ 5000     -


Common Statistical Test for
Normality
Kolmogorov Smirnov Test
It was first derived by Kolmogorov (1933) and later
modified and proposed as a test by Smirnov (1948).
The test is non-parametric and is agnostic to what
the reference distribution actually is.

This test has been shown to be less powerful than the
other tests in most situations. It is included only
because of its historical popularity. Some published
articles would say “The Kolmogorov-Smirnov test is
only a historical curiosity. It should never be used."

Tied scores should not be present in the data.


Common Statistical Test for
Normality
Lilliefors Test
An adaptation of the Kolmogorov-Smirnov test
for the case when the mean and variance of the
normal distribution are unknown.

It is also used as a correction for the Kolmogorov-Smirnov
test: when the parameters of the 𝐶𝐷𝐹 are
estimated from the sample, the Kolmogorov-Smirnov test becomes
conservative and loses power.
Common Statistical Test for
Normality
Anderson - Darling Test
It is a modified Kolmogorov-Smirnov test that
gives more weight to the tails of the
distribution.
This test, developed by Anderson and Darling
(1954), is popular among the tests that
are based on EDF statistics.
Common Statistical Test for
Normality
Shapiro - Wilk Test

One of the most popular tests for diagnosing the
normality assumption. It has good
power properties and is based on the correlation
between the given observations and the associated
normal scores.
The Shapiro-Wilk test statistic was derived by
Shapiro and Wilk (1965).
It does not work well if several values in the data
set are the same (tied scores occur in the data).
Hypotheses of Normality Test

Ho: The sample data follow a
normal distribution.
Ha: The sample data do not follow a normal
distribution.

When we are testing normality:
• If P value > alpha, we fail to reject Ho; the data are
treated as normal.
• If P value ≤ alpha, we reject Ho; the data are
NOT normal.
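The decision rule above can be sketched in R with the base functions shapiro.test() and ks.test(). The data here are simulated and hypothetical, standing in for a real sample. The Lilliefors and Anderson-Darling tests are not in base R; the sketch assumes they come from the add-on package nortest and runs them only if that package is installed.

```r
set.seed(4)                                # arbitrary seed, only for reproducibility
x     <- rnorm(36, mean = 6.72, sd = 0.05) # hypothetical stand-in data
alpha <- 0.05

sw <- shapiro.test(x)                      # Shapiro-Wilk test
# Kolmogorov-Smirnov against a normal CDF with mean and sd estimated from the
# sample; this estimation is the situation the Lilliefors correction addresses
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

decision <- if (sw$p.value > alpha) "Fail to reject Ho" else "Reject Ho"
print(c(shapiro_p = sw$p.value, ks_p = ks$p.value))
print(decision)

# Optional: Lilliefors and Anderson-Darling tests from the nortest package
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::lillie.test(x)$p.value)
  print(nortest::ad.test(x)$p.value)
}
```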
Example:
Use a graphical and a numerical
method to test the normality of these
data. Diameters of 36 rivet heads in
1/100 of an inch:
Normal Q - Q Plot
To construct a normal Q-Q
plot, use the commands:
qqnorm(x)
qqline(x)
"x" is a numeric vector of
data values
Histogram
To construct a histogram, use
the commands:
hist(x, probability = TRUE, col = "choose your color")
lines(density(x), col = "choose your color")

"x" is a numeric vector of data values; replace "choose your color"
with a color name such as "blue".


There is a warning message because some of the data points are the same.
Summary of Result
Test Method          P-value    Decision            Remarks
Kolmogorov-Smirnov   < 0.000    Reject Ho           Not Normal
Lilliefors           0.0571     Fail to reject Ho   Normal
Anderson-Darling     0.2178     Fail to reject Ho   Normal