Sampling and Estimation


READING NO. 10
SAMPLING AND ESTIMATION

Disclaimer: Certain materials contained within this text are the copyright property of CFA
Institute. The following is the source for these materials: “CFA® Program Curriculum
Level I Volume 1”
LOS 10.a: Define simple random sampling and a sampling distribution.

LOS 10.b: Explain sampling error.

LOS 10.c: Distinguish between simple random and stratified random sampling.

 Simple random sampling is a method of selecting a sample in such a way that each item or person in the
population being studied has the same probability of being included in the sample.

 Stratified random sampling involves randomly selecting samples proportionally from subgroups that are formed
based on one or more distinguishing characteristics, so that the sample will have the same distribution of these
characteristics as the overall population.

• [Example: Stratified Random Sampling] A college has a total of 350 students, broken down as shown below. Construct a stratified
random sample of 40 students.
150 males studying full-time, 50 males studying part-time, 125 females studying full-time, 25 females studying part-time.
The first step is to calculate the percentage of students in each group (stratum).
• male, full-time = (150/350) × 100 = 42.86%
• male, part-time = (50/350) × 100 = 14.29%
• female, full-time = (125/350) × 100 = 35.71%
• female, part-time = (25/350) × 100 = 7.14%
Therefore, in our sample of 40, we will choose:
• (150/350) × 40 = 17.14 ≈ 17 male full-time students.
• (50/350) × 40 = 5.71 ≈ 6 male part-time students.
• (125/350) × 40 = 14.29 ≈ 14 female full-time students.
• (25/350) × 40 = 2.86 ≈ 3 female part-time students.
These counts sum to the required sample size: 17 + 6 + 14 + 3 = 40.
Remember that within each stratum, observations are selected randomly.
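The allocation arithmetic above can be sketched in Python. This is a minimal sketch, not the curriculum's procedure: the names `proportional_allocation` and `stratified_sample` are hypothetical, and the largest-remainder rounding rule is an assumption added so that the stratum counts always add up to the target sample size (plain rounding happens to work for this example but can be off by one in others). The student IDs are placeholders.

```python
import math
import random


def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.

    Assumption: leftover units after flooring go to the strata with the
    largest fractional parts (largest-remainder method), so counts sum to n.
    """
    total = sum(strata_sizes.values())
    exact = {k: n * size / total for k, size in strata_sizes.items()}
    counts = {k: math.floor(v) for k, v in exact.items()}
    shortfall = n - sum(counts.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:shortfall]:
        counts[k] += 1
    return counts


def stratified_sample(population_by_stratum, n, seed=None):
    """Allocate n proportionally, then draw a simple random sample in each stratum."""
    rng = random.Random(seed)
    sizes = {k: len(members) for k, members in population_by_stratum.items()}
    counts = proportional_allocation(sizes, n)
    # Within each stratum, members are drawn at random without replacement.
    return {k: rng.sample(population_by_stratum[k], counts[k]) for k in counts}


strata = {
    "male full-time": 150,
    "male part-time": 50,
    "female full-time": 125,
    "female part-time": 25,
}
print(proportional_allocation(strata, 40))
# {'male full-time': 17, 'male part-time': 6, 'female full-time': 14, 'female part-time': 3}

# Hypothetical student IDs, only to exercise the sampling step.
population = {k: [f"{k}-{i}" for i in range(size)] for k, size in strata.items()}
sample = stratified_sample(population, 40, seed=1)
print({k: len(v) for k, v in sample.items()})
```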

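Sampling error is the difference between a sample statistic and the population parameter it is intended to estimate. For the mean:
Sampling error of the mean = sample mean − population mean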
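A minimal sketch of the sampling-error formula, assuming a simulated (hypothetical) population of 10,000 returns rather than real data. `sampling_error_of_mean` is an illustrative name; `random.sample` draws a simple random sample without replacement, so every member has the same chance of selection.

```python
import random
import statistics

# Hypothetical population of 10,000 annual stock returns (%), simulated
# for illustration only.
rng = random.Random(42)
population = [rng.gauss(8.0, 20.0) for _ in range(10_000)]


def sampling_error_of_mean(sample, population):
    """Sampling error of the mean = sample mean - population mean."""
    return statistics.fmean(sample) - statistics.fmean(population)


# Simple random sample of 50 members, drawn without replacement.
sample = rng.sample(population, 50)
error = sampling_error_of_mean(sample, population)
print(f"sample mean {statistics.fmean(sample):.2f}, "
      f"population mean {statistics.fmean(population):.2f}, "
      f"sampling error {error:.2f}")
```

A different random sample would give a different sampling error; the error is a property of the particular sample drawn, not of the population.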

LOS 10.g: Identify and describe desirable properties of an estimator.


Unbiasedness: the expected value of the estimator equals the parameter being estimated.
Efficiency: among all unbiased estimators of the same parameter, it has the smallest variance of its sampling distribution.
Consistency: as sample size increases, the probability that the estimate is close to the parameter increases (the sampling error tends to shrink).
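To see unbiasedness in action, the sketch below (an illustration under assumed inputs, not part of the reading) repeatedly draws samples of size 5 from a standard normal population and averages two variance estimators. Dividing the sum of squared deviations by n − 1 gives an estimator whose long-run average lands near the true variance of 1; dividing by n is biased downward, toward (n − 1)/n = 0.8. The helper `variance_with_divisor` is hypothetical.

```python
import random
import statistics


def variance_with_divisor(xs, divisor_offset):
    """Sum of squared deviations from the mean, divided by (n - divisor_offset)."""
    m = statistics.fmean(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (len(xs) - divisor_offset)


rng = random.Random(7)
n, trials = 5, 20_000   # assumed: population is standard normal, so sigma^2 = 1

unbiased = []  # divide by n - 1 (the sample variance s^2)
biased = []    # divide by n
for _ in range(trials):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    unbiased.append(variance_with_divisor(xs, 1))
    biased.append(variance_with_divisor(xs, 0))

print(f"average of s^2 (divisor n-1): {statistics.fmean(unbiased):.3f}")  # close to 1.0
print(f"average with divisor n:       {statistics.fmean(biased):.3f}")    # close to 0.8
```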

LOS 10.d: Distinguish between time-series and cross-sectional data.
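Time-series data: observations of a variable taken over a series of time periods, such as monthly returns on one stock over the past five years.
Cross-sectional data: observations of a variable for many entities at a single point in time, such as the reported earnings per share of all Nasdaq companies for the same quarter.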

Longitudinal data are observations over time of multiple characteristics of the same
entity, such as unemployment, inflation, and GDP growth rates for a country over 10
years.
Panel data contain observations over time of the same characteristic for multiple entities,
such as debt/equity ratios for 20 companies over the most recent 24 quarters.
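The data layouts above can be illustrated with one small panel. In this sketch the companies, quarters, and debt/equity figures are invented placeholders, and `time_series` and `cross_section` are hypothetical helper names: fixing the company gives a time series, and fixing the quarter gives a cross-section.

```python
# Hypothetical debt/equity ratios keyed by (company, quarter).
# The figures are made up purely to show the data layout.
panel = {
    ("A", "2023Q1"): 0.42, ("A", "2023Q2"): 0.45, ("A", "2023Q3"): 0.47,
    ("B", "2023Q1"): 1.10, ("B", "2023Q2"): 1.05, ("B", "2023Q3"): 0.98,
    ("C", "2023Q1"): 0.75, ("C", "2023Q2"): 0.80, ("C", "2023Q3"): 0.78,
}


def time_series(panel, company):
    """One entity observed over time; quarter labels in this format sort chronologically."""
    return [v for (c, q), v in sorted(panel.items()) if c == company]


def cross_section(panel, quarter):
    """Many entities observed at one point in time."""
    return {c: v for (c, q), v in sorted(panel.items()) if q == quarter}


print(time_series(panel, "B"))          # [1.1, 1.05, 0.98]
print(cross_section(panel, "2023Q2"))   # {'A': 0.45, 'B': 1.05, 'C': 0.8}
```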

 LOS 10.e: Explain the central limit theorem and its importance.

LOS 10.f: Calculate and interpret the standard error of the sample mean.
A sampling distribution is the distribution of all sample values that a sample statistic can take on
when computed from samples of identical size randomly drawn from the same population.
 Suppose that a random sample of 50 stocks is selected from a population of 10,000 stocks, and the
average return on the 50-stock sample is calculated. If this process were repeated several times, say 10
times, with samples of the same size (50), the sample mean (estimate of the population mean) calculated
will be different each time due to the different individual stocks making up each sample. The distribution
of these sample means is called the sampling distribution of the mean.
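The repeated-sampling thought experiment above can be simulated directly. This sketch assumes a hypothetical population of 10,000 normally distributed returns (mean 10%, standard deviation 25%, values chosen only for illustration) and draws 10 samples of 50 stocks; each draw yields a different sample mean, and the collection of those means is a small picture of the sampling distribution of the mean.

```python
import random
import statistics

rng = random.Random(2024)
# Hypothetical population of 10,000 stock returns (%), simulated for illustration.
population = [rng.gauss(10.0, 25.0) for _ in range(10_000)]

# Repeat the experiment 10 times: draw 50 stocks without replacement and
# record the sample mean each time.
sample_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(10)]

print("population mean:", round(statistics.fmean(population), 2))
print("sample means:   ", [round(m, 2) for m in sample_means])
print("spread of the 10 means (sd):", round(statistics.stdev(sample_means), 2))
```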

The central limit theorem states that for a population with a mean μ and a finite variance σ², the
sampling distribution of the sample mean of all possible samples of size n (for n ≥ 30) will be
approximately normally distributed with a mean equal to μ and a standard deviation equal to σ/√n (a.k.a. the
standard error of the sample mean). When σ is unknown, the standard error is estimated as s/√n, where s is the sample standard deviation.
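A sketch of the standard-error formula together with a quick simulation check of the central limit theorem. The population is assumed to be exponential with rate 1 (μ = 1, σ = 1), chosen because it is clearly skewed; with n = 30 the sample means should center near μ with a spread near σ/√n ≈ 0.183. `standard_error` is a hypothetical helper name.

```python
import math
import random
import statistics


def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n).

    Pass sigma when the population standard deviation is known,
    otherwise the sample standard deviation s."""
    return sd / math.sqrt(n)


# Example: population variance 2.45, sample size 40.
print(round(standard_error(math.sqrt(2.45), 40), 3))  # 0.247

# CLT check with a right-skewed population: exponential, rate 1 (mu = sigma = 1).
rng = random.Random(0)
n, trials = 30, 10_000
means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
         for _ in range(trials)]

print("mean of sample means:", round(statistics.fmean(means), 3))  # near mu = 1
print("sd of sample means:  ", round(statistics.stdev(means), 3))  # near sigma / sqrt(n)
print("sigma / sqrt(n):     ", round(standard_error(1.0, n), 3))   # 0.183
```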

LOS 10.h: Distinguish between a point estimate and a confidence interval estimate of a population
parameter.
LOS 10.i: Describe properties of Student’s t-distribution and calculate and interpret its degrees of freedom.
LOS 10.j: Calculate and interpret a confidence interval for a population mean, given a normal distribution
with 1) a known population variance, 2) an unknown population variance, or 3) an unknown population
variance and a large sample size.
Student's t-distribution is a bell-shaped probability distribution that has the following properties:
 It is symmetrical.
 It is defined by a single parameter, the degrees of freedom (df), where degrees of freedom equal sample size
minus one (n − 1).
 It has a lower peak than the normal curve, but fatter tails.
 As the degrees of freedom increase, the shape of the t-distribution approaches the shape of the standard normal
curve.
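The t-distribution properties listed above can be checked numerically. This sketch writes out the Student's t density (using `math.lgamma` so that large df does not overflow) and compares it with the standard normal density from `statistics.NormalDist`: the t peak is lower, its tail at x = 3 is higher, and the gap closes as df grows. `t_pdf` is a hypothetical helper, and the df values are arbitrary examples (df = 29 corresponds to a sample of 30, since df = n − 1).

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


z = NormalDist()  # standard normal
for df in (2, 5, 29, 1000):
    print(f"df={df:>4}: peak {t_pdf(0, df):.4f} vs {z.pdf(0):.4f}, "
          f"density at 3: {t_pdf(3, df):.5f} vs {z.pdf(3):.5f}")
```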

Confidence interval = point estimate ± (reliability factor × standard error)

Confidence interval = sample mean ± (z value or t value × standard error)


• [Example: Confidence Intervals with the z-Distribution] 36 students are taking a mock SAT
exam to evaluate their level of preparedness for the actual test. The average score of these
students is 1750. The standard deviation of scores of all students (population) who take the
actual test is 200 points. Construct and interpret a 99% confidence interval for the average
score of all students who take the SAT given the average score of these 36 students.
• [Example: Confidence Intervals with the t-Distribution] A sample of the monthly returns of T
Ltd. stock over the last two and a half years has a mean return of 3% and a standard deviation
of 15%. Compute the 95% confidence interval for the average monthly returns on T stock.
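The two examples can be worked with the general formula, point estimate ± reliability factor × standard error. In this sketch the z critical value comes from `statistics.NormalDist().inv_cdf`; the t critical value (2.045 for df = 29 at 95%) is taken from a standard t-table because the Python standard library has no t quantile function (`scipy.stats.t.ppf(0.975, 29)` would return it if SciPy is available). `confidence_interval` is a hypothetical helper, and "two and a half years" of monthly returns is read as n = 30.

```python
import math
from statistics import NormalDist


def confidence_interval(mean, sd, n, reliability_factor):
    """Point estimate ± reliability factor × standard error (sd / sqrt(n))."""
    half_width = reliability_factor * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# z-example: population sigma known (200), n = 36, 99% confidence.
z_99 = NormalDist().inv_cdf(0.995)                 # about 2.576
print(confidence_interval(1750, 200, 36, z_99))    # roughly (1664.1, 1835.9)

# t-example: 30 monthly returns, df = 29, 95% confidence, sigma unknown.
t_95_df29 = 2.045                                  # from a t-table (assumption)
print(confidence_interval(3.0, 15.0, 30, t_95_df29))  # roughly (-2.60, 8.60)
```

Read the z result as: we are 99% confident that the mean score of all SAT takers lies between about 1,664 and 1,836, meaning that about 99% of intervals built this way from repeated samples would contain the population mean. The T Ltd. interval, roughly −2.60% to 8.60%, is wide because a 15% monthly standard deviation is large relative to only 30 observations.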

A population has a non-normal distribution with mean µ and variance σ². The sampling
distribution of the sample mean computed from samples of large size from that population
will have:
A. the same distribution as the population distribution.
B. its mean approximately equal to the population mean.
C. its variance approximately equal to the population variance.
B is correct. Given a population described by any
probability distribution (normal or non-normal) with
finite variance, the central limit theorem states that
the sampling distribution of the sample mean will be
approximately normal, with the mean approximately
equal to the population mean, when the sample size
is large.

A sample mean is computed from a population with a variance of 2.45. The sample size is
40. The standard error of the sample mean is closest to:
A. 0.039.
B. 0.247.
C. 0.387.

B is correct. The standard error of the sample mean is σ/√n = √2.45/√40 = 1.565/6.325 ≈ 0.247.

An estimator with an expected value equal to the parameter that it is intended to estimate is
described as:
A. efficient.
B. unbiased.
C. consistent.

B is correct. An unbiased estimator is one for which the expected value equals the parameter
it is intended to estimate.
If an estimator is consistent, an increase in sample size will increase the:
A. accuracy of estimates.
B. efficiency of the estimator.
C. unbiasedness of the estimator.

A is correct. A consistent estimator is one for which the probability of estimates close to
the value of the population parameter increases as sample size increases. More specifically,
a consistent estimator’s sampling distribution becomes concentrated on the value of the
parameter it is intended to estimate as the sample size approaches infinity.

For a two-sided confidence interval, an increase in the degree of confidence will result in:
A. a wider confidence interval.
B. a narrower confidence interval.
C. no change in the width of the confidence interval.

A is correct. As the degree of confidence increases (e.g., from 95% to 99%), a given
confidence interval will become wider. A confidence interval is a range for which one can
assert with a given probability 1 − α, called the degree of confidence, that it will contain
the parameter it is intended to estimate.

As the t-distribution’s degrees of freedom decrease, the t-distribution most likely:
A. exhibits tails that become fatter.
B. approaches a standard normal distribution.
C. becomes asymmetrically distributed around its mean value.

A is correct. A standard normal distribution has tails that approach zero faster than the
t-distribution. As degrees of freedom increase, the tails of the t-distribution become less
fat and the t-distribution begins to look more like a standard normal distribution. But as
degrees of freedom decrease, the tails of the t-distribution become fatter.

For a sample size of 17, with a mean of 116.23 and a variance of 245.55, the width of a
90% confidence interval using the appropriate t-distribution is closest to:
A. 13.23.
B. 13.27.
C. 13.68.

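B is correct. With n = 17, df = 16 and the 90% (two-tailed) t critical value is 1.746. The sample
standard deviation is √245.55 = 15.670, so the standard error is 15.670/√17 = 3.800. The interval
is 116.23 ± 1.746 × 3.800 = 116.23 ± 6.636, i.e., from 109.59 to 122.87, so its width is
2 × 6.636 ≈ 13.27.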

For a sample size of 65 with a mean of 31 taken from a normally distributed population
with a variance of 529, a 99% confidence interval for the population mean will have a lower
limit closest to:
A. 23.64.
B. 25.41.
C. 30.09.

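A is correct. The population variance is known, so the z-distribution applies: σ = √529 = 23,
the standard error is 23/√65 = 2.853, and the 99% reliability factor is 2.58.
Lower limit = 31 − 2.58 × 2.853 = 31 − 7.36 = 23.64.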

An increase in sample size is most likely to result in a:
A. wider confidence interval.
B. decrease in the standard error of the sample mean.
C. lower likelihood of sampling from more than one population.

B is correct. All else being equal, as the sample size increases, the standard error of the
sample mean decreases and the width of the confidence interval also decreases.

LOS 10.k: Describe the issues regarding selection of the appropriate sample size, data-
mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.
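Sample selection bias: the sample is not selected randomly, so some data are systematically excluded from the analysis.
Survivorship bias: using only funds or firms that survived to the end of the period (e.g., surviving mutual funds or hedge funds), which tends to overstate performance.
Look-ahead bias: using data that would not have been available on the date being tested.
Time-period bias: a relationship found in the period studied may not hold over other time periods.
Data-mining bias: repeatedly searching the same data set turns up apparently significant relationships that occurred by chance.
Remedy for data mining: test a potentially profitable trading rule on a data set different from the one used to
develop the rule (i.e., use out-of-sample data).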
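Data-mining bias is easy to reproduce. In the sketch below (an illustration, not a prescribed test), returns are pure noise by construction, 500 random long/short rules are tried on the first 100 periods, and the best one is then evaluated on the remaining 400 periods. The in-sample winner looks profitable only because it was selected from many tries; out of sample its average return typically falls back toward zero. The seed, sample sizes, and the `rule_return` helper are all assumptions.

```python
import random
import statistics

rng = random.Random(11)
# Hypothetical returns that are pure noise: no rule can genuinely add value.
returns = [rng.gauss(0.0, 1.0) for _ in range(500)]
in_sample, out_sample = returns[:100], returns[100:]


def rule_return(signals, rets):
    """Average return from going long (+1) or short (-1) each period."""
    return statistics.fmean([s * r for s, r in zip(signals, rets)])


# "Mine" 500 random long/short rules; keep the one with the best in-sample result.
rules = [[rng.choice((-1, 1)) for _ in range(500)] for _ in range(500)]
best = max(rules, key=lambda sig: rule_return(sig[:100], in_sample))

in_r = rule_return(best[:100], in_sample)
out_r = rule_return(best[100:], out_sample)
print(f"best rule in-sample:     {in_r:.3f}")
print(f"same rule out-of-sample: {out_r:.3f}")
```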
