VIII - Estimation


Estimation
• Methods of inference usually fall into one of two
broad categories: estimation or hypothesis testing.
• For now, we will focus on using the observations in a
sample to estimate a population parameter.
Cont’d…
• Up until this point, we have assumed that the values of
the parameters of a probability distribution are known.
• In the real world, the values of these population
parameters are usually not known.
• Instead, we will try to say something about the way in
which a random variable is distributed using the
information contained in a sample of observations.
Estimation
• Estimation is concerned with estimating the values of specific population parameters based on sample statistics.
• It uses the information in a sample to make estimates of the characteristics (parameters) of the source population.
Example
• A sample survey revealed:
• Proportion of smokers among a certain group of
population aged 15 to 24.
• Mean of SBP among sampled population
• Prevalence of HIV-positive among people
involved in the study

The next question is: what can we predict about the characteristics of the population from which the sample was drawn?
Estimation, Estimator & Estimate
• Estimation is the computation of a statistic from sample data, often yielding a value that is an approximation (guess) of its target, an unknown true population parameter value.
• The statistic itself is called an estimator and can be of two types: point or interval.
• The values that the estimator assumes are called estimates.
cont’d…
• Two methods of estimation are commonly used:
point estimation and interval estimation
• Point estimation involves the calculation of a
single number to estimate the population
parameter.
• Interval estimation specifies a range of
reasonable values for the parameter.
Point versus Interval Estimators
• An estimator that represents a "single best guess" is called a point estimator.
• When the estimate is in the form of a "range of values", it is called an interval estimator.
• Thus, a point estimate is of the form [ value ], whereas an interval estimate is of the form [ lower limit, upper limit ].
1. Point Estimate
• A single numerical value used to estimate the
corresponding population parameter.
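Sample Statistics as Estimates of Population Parameters
• Sample mean x̄ estimates the population mean µ
• Sample variance s² estimates the population variance σ²
• Sample proportion p̂ estimates the population proportion P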
2. Interval Estimation
• Interval estimation specifies a range of reasonable values for the population parameter based on a point estimate.
• A confidence interval is a particular type of interval estimator.
Confidence Intervals
• A confidence interval is a measure of the precision of a sample statistic.
• It gives a range of values around the estimate that is likely to include the “true” (population) value with a given confidence level.
• Such interval estimates are called confidence intervals.
General Formula:
The general formula for all CIs is:

point estimate ± k × SE

• Point estimate: the value of the statistic in the sample (e.g., mean, proportion, etc.).
• k: a measure of how confident we want to be, taken from a Z table or a t table (e.g. k = 1.96 at the 95% confidence level).
• SE: the standard error of the statistic.
• A confidence interval is a guess (point estimate) together with a “safety net” (interval) of guesses of a population characteristic.
• It has 3 components:
1) A point estimate (e.g. the sample mean)
2) The standard error of the point estimate (e.g. SEM = σ/√n)
3) A confidence coefficient, set by the chosen confidence level (e.g. 1.96 for 95%)
The “safety net” (confidence interval) that we construct has “lower” and “upper” limits, defined as:
• Lower limit = (point estimate) – (confidence coefficient)(SE)
• Upper limit = (point estimate) + (confidence coefficient)(SE)
Confidence Level
• The degree of confidence that the interval will contain the unknown population parameter
• A percentage (less than 100%)
• Example: 95%
• Also written (1 - α) = 0.95
Probabilistic Interpretation
• At the 95% confidence level: if we were to select 100 random samples from the population and use them to calculate 100 different CIs, approximately 95 of the intervals would include the true population parameter and 5 would not.
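The repeated-sampling idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with a made-up mean of 50 and standard deviation of 10, σ is treated as known, and the function name `coverage` and all parameter values are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, mean

def coverage(num_samples=2000, n=25, mu=50.0, sigma=10.0, level=0.95, seed=1):
    """Fraction of simulated CIs (sigma known) that contain the true mean."""
    rng = random.Random(seed)                       # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    se = sigma / n ** 0.5                           # standard error of the mean
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        # Count the interval as a "hit" when it covers the true mean.
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / num_samples

print(coverage())
```

The printed fraction should land close to 0.95; it will not be exactly 0.95 because the simulation itself is subject to sampling variation.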
Practical interpretation
• We are 95% confident that the single
computed interval contains the unknown
population parameter.
Commonly Used confidence levels
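A minimal sketch, assuming the levels in common use are 90%, 95% and 99%: the corresponding two-sided critical values (the k in the general formula) can be read from a Z table or computed with Python's standard library. The function name `z_critical` is illustrative only.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical value Z(alpha/2) for a given confidence level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```

A higher confidence level needs a larger critical value, so the interval gets wider when everything else is held fixed.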
Width of Confidence Interval
• For a given confidence level (e.g. 90%, 95%, 99%), the width of the CI depends on the standard error of the estimate.
• SE in turn depends on:
1) Sample size: the larger the sample size, the narrower the confidence interval and the more precise our estimate.
2) The amount of variation among individual values: the more the variation, the wider the CI and the less precise the estimate.
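Both effects can be seen numerically. The sketch below assumes a z-based CI for a mean, whose full width is 2 × z × sd/√n; the standard deviations and sample sizes are arbitrary illustrative values, and `ci_width` is a hypothetical helper name.

```python
from math import sqrt

def ci_width(sd, n, z=1.96):
    """Full width of a z-based CI for a mean: 2 * z * sd / sqrt(n)."""
    return 2 * z * sd / sqrt(n)

for sd in (5.0, 10.0):
    for n in (25, 100, 400):
        print(f"sd={sd:>4}, n={n:>3}: width = {ci_width(sd, n):.2f}")
```

Because the width is proportional to 1/√n, the sample size has to be quadrupled to halve the width, whereas doubling the standard deviation doubles the width.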
Calculating Confidence Intervals (CI)
• Given different scenarios you should be able to
calculate CI for:
1) Single mean
2) Difference between two means
3) Single proportions
4) Difference between two proportions
CI for Single Mean
• Consider the task of computing a CI estimate of μ for a population distribution that is normal with σ known.
• Data are available from a random sample of size n.
• The CI is x̄ ± Zα/2 × σ/√n.
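Example:
1. Waiting times (in hours) at a particular hospital are believed to be approximately normally distributed with a variance of 2.25 hr² (σ = 1.5 hr).
a. A sample of 20 outpatients revealed a mean waiting time of 1.52 hours. Construct the 95% CI for the estimate of the population mean.
• SE = σ/√n = 1.5/√20 = 0.335
• 95% CI = 1.52 ± 1.96(0.335) = (0.86, 2.18)
b. Now suppose that the mean of 1.52 hours had resulted from a sample of 32 patients. Find the 95% CI.
• SE = 1.5/√32 = 0.265
• 95% CI = 1.52 ± 1.96(0.265) = (1.00, 2.04)
c. What effect does a larger sample size have on the CI?
• The larger sample gives a smaller SE and therefore a narrower, more precise CI.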
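The waiting-time example can be checked with a few lines of Python. This is a sketch that assumes the z-based interval x̄ ± 1.96 × σ/√n with σ known; the helper name `mean_ci` is illustrative.

```python
from math import sqrt

def mean_ci(xbar, sigma, n, z=1.96):
    """z-based CI for a population mean when sigma is known."""
    se = sigma / sqrt(n)              # standard error of the mean
    return xbar - z * se, xbar + z * se

sigma = sqrt(2.25)  # variance 2.25 -> standard deviation 1.5 hours
for n in (20, 32):
    lower, upper = mean_ci(1.52, sigma, n)
    print(f"n={n}: 95% CI = ({lower:.2f}, {upper:.2f})")
# n=20: 95% CI = (0.86, 2.18)
# n=32: 95% CI = (1.00, 2.04)
```

With the same sample mean, the interval based on 32 patients is narrower than the one based on 20.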
CI for single mean…
• When constructing the CIs above, it was assumed that the standard deviation of the underlying population, σ, is known.
• If σ is not known, it can be replaced by the sample standard deviation, s, provided the sample size is large enough (n > 30), so that the SE is estimated by s/√n. With a large sample, the central limit theorem lets us treat the sample mean as approximately normally distributed.
Assumptions
• Population is normally distributed or the sample
size is fairly large.
• The population variance is known
CI for the difference between population
means (normally distributed)
A. Known variances (2 independent samples)
• When σ1 and σ2 are known and both populations are normal, or both sample sizes are large (at least 30), the test statistic is a z-value…
Assumptions
• Samples are randomly and independently drawn
• Population distributions are normal or both
sample sizes are ≥30
• Population standard deviations are known
Illustration
• A researcher performs a drug trial involving two
independent groups.
• A control group is treated with a placebo while, separately, the intervention group is treated with an active agent.
• Interest is in a comparison of the mean control
response with the mean intervention response
under the assumption that the responses are
independent.
Examples
• We are interested in the similarity of the
two groups.
1) Is mean blood pressure the same for males and
females?
2) Is body mass index (BMI) similar for breast
cancer cases versus non-cancer patients?
3) Is length of stay (LOS) for patients in hospital
“A” the same as that for similar patients in
hospital “B”?
Example
• Researchers are interested in the difference between
serum uric acid levels in patients with and without
Down’s syndrome.
• Patients without Down’s syndrome
• n = 12, sample mean = 4.5 mg/100ml, σ² = 1.0
• Patients with Down’s syndrome
• n = 15, sample mean = 3.4 mg/100ml, σ² = 1.5
• Calculate the 95% CI.
Example…
• Point estimate = 4.5 − 3.4 = 1.1
• SE = √(1.0/12 + 1.5/15) = 0.43
• 95% CI = 1.1 ± 1.96 (0.43) = (0.26, 1.94)
• We are 95% confident that the true difference between the two population means is between 0.26 and 1.94 mg/100ml.
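The uric acid example can be reproduced as follows. The sketch assumes two independent samples with known population variances, so that SE = √(σ1²/n1 + σ2²/n2); the function name `diff_means_ci` is illustrative.

```python
from math import sqrt

def diff_means_ci(mean1, var1, n1, mean2, var2, n2, z=1.96):
    """z-based CI for mu1 - mu2 with known variances, independent samples."""
    se = sqrt(var1 / n1 + var2 / n2)  # SE of the difference in means
    diff = mean1 - mean2              # point estimate
    return diff, se, (diff - z * se, diff + z * se)

# Group 1: without Down's syndrome; group 2: with Down's syndrome.
diff, se, (lower, upper) = diff_means_ci(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"difference = {diff:.2f}, SE = {se:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
# difference = 1.10, SE = 0.43
# 95% CI = (0.26, 1.94)
```

Since the interval does not include 0, the data are consistent with a real difference between the two population means.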
CIs for single population proportion, p

• Is based on the three elements of a CI:
• Point estimate
• SE of point estimate
• Confidence coefficient
Lower limit = Point Estimate - (Critical Value) x (Standard Error of Estimate)
Upper limit = Point Estimate + (Critical Value) x (Standard Error of Estimate)
Hence,
p̂ ± 1.96 √(p̂(1 − p̂)/n)
is an approximate 95% CI for the true proportion p.
Example 1
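• A random sample of 100 people showed that 25 are left-handed. Compute a 95% CI for the true proportion of left-handers.
Solution
• p̂ = 25/100 = 0.25
• SE = √(0.25 × 0.75/100) = 0.0433
• 95% CI = 0.25 ± 1.96(0.0433) = (0.165, 0.335)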
Interpretation
• We are 95% confident that the true percentage of left-
handers in the population is between 16.5% and
33.5%.
Changing the sample size
• Increasing the sample size reduces the width of the confidence interval.
• Example:
• If the sample size in the above example is doubled to 200 and 50 are left-handed, the proportion is still 0.25. However, the CI will be narrower, hence more precise.
• The 95% CI will be (0.19, 0.31).
Example 2
• It was found that 28.1% of 153 cervical-cancer cases had never had a Pap smear before the time of diagnosis. Calculate a 95% CI for the percentage of cervical-cancer cases who never had a Pap test.


Example 3
• Suppose that among 10,000 female operating-room nurses, 60 women have developed breast cancer over five years. Find the 95% CI for p based on the point estimate.
• Point estimate = 60/10,000 = 0.006
• The 95% CI for p is given by the interval:
0.006 ± 1.96 √(0.006 × 0.994/10,000)
• The 95% CI for p is: (0.0045, 0.0075)
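The three proportion examples above can be checked together. This is a sketch that assumes the large-sample (normal approximation) interval p̂ ± 1.96 √(p̂(1 − p̂)/n); the helper name `proportion_ci` and the dictionary labels are illustrative.

```python
from math import sqrt

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation CI for a single population proportion."""
    se = sqrt(p_hat * (1 - p_hat) / n)   # SE of the sample proportion
    return p_hat - z * se, p_hat + z * se

examples = {
    "Example 1 (25 of 100 left-handed)": (25 / 100, 100),
    "Example 1 with n doubled (50 of 200)": (50 / 200, 200),
    "Example 2 (28.1% of 153 cases)": (0.281, 153),
    "Example 3 (60 of 10,000 nurses)": (60 / 10_000, 10_000),
}
for label, (p_hat, n) in examples.items():
    lower, upper = proportion_ci(p_hat, n)
    print(f"{label}: ({lower:.4f}, {upper:.4f})")
```

For Example 2, the interval works out to roughly 21.0% to 35.2%. The approximation is intended for large samples; it becomes less reliable when n is small or p̂ is very close to 0 or 1.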


CIs for Two Population Proportions
• We are often interested in comparing proportions from 2 populations:
• Is the incidence of disease A the same in two populations?
• Patients are treated with either drug D or placebo. Is the proportion “improved” the same in both groups?
Confidence Interval for Two Population Proportions
• SE of the difference = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
• The confidence interval for p1 – p2 is: (p̂1 − p̂2) ± Zα/2 × SE
• An approximate 95% confidence interval can also take the form (p̂1 − p̂2) ± 1.96 × SE
Example
• In a clinical trial for a new drug to treat hypertension,
N1 = 50 patients were randomly assigned to receive
the new drug, and N2 = 50 patients to receive a
placebo. 34 of the patients receiving the drug showed
improvement, while 15 of those receiving placebo
showed improvement.

• Compute a 95% CI estimate for the difference between proportions improved.
• p̂1 = 34/50 = 0.68, p̂2 = 15/50 = 0.30
• The point estimate for the difference is: p̂1 − p̂2 = 0.68 − 0.30 = 0.38
• SE of the difference = √(0.68 × 0.32/50 + 0.30 × 0.70/50) = 0.0925

• 95% CI
• Lower = ( point estimate ) - (Zα/2) (SE)
= 0.38 – (1.96)(0.0925) = 0.20
• Upper = ( point estimate ) + (Zα/2) (SE)
= 0.38 + (1.96)(0.0925) = 0.56
• 95% CI = (0.20, 0.56)
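The hypertension trial calculation can be verified with a short script. The sketch assumes independent groups and the large-sample formula SE = √(p1(1 − p1)/n1 + p2(1 − p2)/n2) applied to the sample proportions; `diff_proportions_ci` is an illustrative name.

```python
from math import sqrt

def diff_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Normal-approximation CI for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2                            # sample proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # SE of the difference
    diff = p1 - p2                                       # point estimate
    return diff, se, (diff - z * se, diff + z * se)

# 34 of 50 improved on the drug, 15 of 50 improved on placebo.
diff, se, (lower, upper) = diff_proportions_ci(34, 50, 15, 50)
print(f"difference = {diff:.2f}, SE = {se:.4f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
# difference = 0.38, SE = 0.0925
# 95% CI = (0.20, 0.56)
```

The whole interval lies above 0, which is consistent with a higher improvement rate on the drug than on placebo.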
Sample size estimation for cross-sectional studies
• We must decide how many people need to be studied in order to answer a particular research question.
• If the sample size is too small, we may fail to detect important effects, or the estimates will be less precise.
• If the sample size is too large, resources are wasted.
Single population proportion
• Research questions such as;
1) What proportion of people adhere to anti-TB treatment in
City A?
2) What is the prevalence of HIV in Ethiopia?
3) What proportion of children under the age of 15 are
vaccinated against polio?
Such questions lead to estimation of a proportion. To answer them with an acceptable level of precision, the sample size has to be calculated.
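Steps
• Choose the confidence level, which fixes Zα/2 (1.96 for 95%).
• Obtain an anticipated value of the proportion, p, e.g. from an earlier study.
• Decide on the acceptable margin of error, d.
Formula
• n = (Zα/2)² p(1 − p) / d²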
Exercise
• We wish to estimate the proportion of Ethiopian males who smoke. What sample size do we require to achieve a 95% confidence interval with a margin of error of ± 5%? A study some years ago found that 20% were smokers.
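One way to work the exercise, assuming the usual single-proportion formula n = Z² p(1 − p)/d², where p is the anticipated proportion and d is the margin of error. The function name `sample_size_proportion` is illustrative, and the result is rounded up because a fraction of a person cannot be sampled.

```python
from math import ceil

def sample_size_proportion(p, d, z=1.96):
    """Minimum n so that a z-based CI for a proportion has margin of error d."""
    return ceil(z ** 2 * p * (1 - p) / d ** 2)

n = sample_size_proportion(p=0.20, d=0.05)
print(n)  # 246
```

If no earlier estimate of p were available, p = 0.5 is the conservative choice because it maximises p(1 − p); with the same d it gives a required sample of 385.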
NB:
The initially calculated sample size may need to be increased in accordance with the expected response rate, loss to follow-up, lack of compliance, and any other predicted reasons for loss of subjects.
Summary
• Point and interval estimation
• Confidence interval
• CI for:
• Single mean
• Difference between two means
• Single proportion
• Difference between two proportions
Thank you
