Sampling and Estimation
Disclaimer: Certain materials contained within this text are the copyright property of CFA
Institute. The following is the source for these materials: “CFA® Program Curriculum
Level I Volume 1”
READING NO. 10
SAMPLING AND ESTIMATION
LOS 10.a: Define simple random sampling and a sampling distribution.
LOS 10.c: Distinguish between simple random and stratified random sampling.
Simple random sampling is a method of selecting a sample in such a way that each item or person in the
population being studied has the same probability of being included in the sample.
Stratified random sampling involves randomly selecting samples proportionally from subgroups that are formed
based on one or more distinguishing characteristics, so that the sample will have the same distribution of these
characteristics as the overall population.
• [Example: Stratified Random Sampling] A college has 350 students in total, broken down as follows:
150 males studying full-time, 50 males studying part-time, 125 females studying full-time, and 25 females studying part-time.
Construct a stratified random sample of 40 students.
The first step is to calculate the percentage of students in each group (stratum).
• male, full-time = (150/350) × 100 = 42.86%
• male, part-time = (50/350) × 100 = 14.29%
• female, full-time = (125/350) × 100 = 35.71%
• female, part-time = (25/350) × 100 = 7.14%
Therefore, in our sample of 40, we will choose:
• 43% of 40 = 17.2 ≈ 17 male full-time students.
• 14% of 40 = 5.6 ≈ 6 male part-time students.
• 36% of 40 = 14.4 ≈ 14 female full-time students.
• 7% of 40 = 2.8 ≈ 3 female part-time students.
Remember that within each stratum, observations are selected randomly.
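The allocation in this example can be reproduced with a short Python sketch. The function names and the use of integer IDs to stand in for students are illustrative assumptions, not part of the curriculum material. Note that simple rounding happens to sum to 40 here; in general the rounded counts may need a small adjustment to hit the target sample size.

```python
import random

# Stratum sizes from the example (350 students in total).
strata = {
    "male, full-time": 150,
    "male, part-time": 50,
    "female, full-time": 125,
    "female, part-time": 25,
}

def proportional_allocation(strata_sizes, sample_size):
    """Number of items to draw from each stratum, in proportion to its size."""
    population = sum(strata_sizes.values())
    return {
        name: round(size / population * sample_size)
        for name, size in strata_sizes.items()
    }

def draw_stratified_sample(strata_sizes, sample_size, seed=None):
    """Randomly select members within each stratum (without replacement)."""
    rng = random.Random(seed)
    allocation = proportional_allocation(strata_sizes, sample_size)
    sample = {}
    for name, count in allocation.items():
        # Hypothetical student IDs 0..size-1 stand in for the real students.
        sample[name] = rng.sample(range(strata_sizes[name]), count)
    return sample

allocation = proportional_allocation(strata, 40)
print(allocation)
```

Calling `draw_stratified_sample(strata, 40)` then returns the randomly chosen IDs for each stratum.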
Sampling error of the mean = sample mean − population mean
LOS 10.d: Distinguish between time-series and cross-sectional data.
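Time-series data are observations taken over a period of time at specific, equally spaced intervals.
Cross-sectional data are observations taken at a single point in time.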
Longitudinal data are observations over time of multiple characteristics of the same
entity, such as unemployment, inflation, and GDP growth rates for a country over 10
years.
Panel data contain observations over time of the same characteristic for multiple entities,
such as debt/equity ratios for 20 companies over the most recent 24 quarters.
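One way to picture the difference between these data types is by how the observations would be laid out in code. The sketch below uses plain Python dictionaries; every entity name, period label, and number is a made-up placeholder, not real data.

```python
# All values below are made-up placeholders for illustration only.

# Time-series: one series observed over successive periods.
time_series = {"2021": 1.2, "2022": 0.8, "2023": 1.5}

# Cross-sectional: many entities observed at a single point in time.
cross_section = {"Company A": 0.4, "Company B": 1.1, "Company C": 0.7}

# Longitudinal: several characteristics of ONE entity over time.
longitudinal = {
    "2022": {"unemployment": 5.0, "inflation": 2.0, "gdp_growth": 3.0},
    "2023": {"unemployment": 4.5, "inflation": 2.5, "gdp_growth": 2.0},
}

# Panel: ONE characteristic for several entities over time,
# keyed here by (entity, period).
panel = {
    ("Company A", "Q1"): 0.4,
    ("Company A", "Q2"): 0.5,
    ("Company B", "Q1"): 1.1,
    ("Company B", "Q2"): 1.0,
}

panel_entities = {entity for entity, _ in panel}
panel_periods = {period for _, period in panel}
```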
LOS 10.e: Explain the central limit theorem and its importance.
LOS 10.f: Calculate and interpret the standard error of the sample mean.
A sampling distribution is the distribution of all sample values that a sample statistic can take on
when computed from samples of identical size randomly drawn from the same population.
Suppose that a random sample of 50 stocks is selected from a population of 10,000 stocks, and the
average return on the 50-stock sample is calculated. If this process were repeated several times, say 10
times, with samples of the same size (50), the sample mean (estimate of the population mean) calculated
will be different each time due to the different individual stocks making up each sample. The distribution
of these sample means is called the sampling distribution of the mean.
The central limit theorem states that for a population with a mean μ and a finite variance σ², the
sampling distribution of the sample mean of all possible samples of size n (for n ≥ 30) will be
approximately normally distributed with a mean equal to μ and a standard deviation equal to σ/√n (a.k.a. the
standard error of the sample mean).
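The 50-stock example and the central limit theorem can be checked with a small simulation. This is a minimal sketch under stated assumptions: the "population" is 10,000 values drawn from an exponential distribution (chosen only because it is clearly non-normal), the function name is hypothetical, and the process is repeated 2,000 times rather than 10 so that the spread of the sample means can be measured.

```python
import math
import random
import statistics

def sampling_distribution_of_mean(population, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, n))  # sample without replacement
        for _ in range(num_samples)
    ]

# Assumed population: 10,000 skewed (non-normal) values.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 50
sample_means = sampling_distribution_of_mean(population, n, num_samples=2_000)

theoretical_se = sigma / math.sqrt(n)          # standard error of the sample mean
empirical_se = statistics.stdev(sample_means)  # spread of the simulated means

print(f"population mean      = {mu:.4f}")
print(f"mean of sample means = {statistics.fmean(sample_means):.4f}")
print(f"sigma / sqrt(n)      = {theoretical_se:.4f}")
print(f"sd of sample means   = {empirical_se:.4f}")
```

The mean of the sample means should land close to the population mean, and their standard deviation close to σ/√n, even though the population itself is skewed.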
LOS 10.h: Distinguish between a point estimate and a confidence interval estimate of a population
parameter.
LOS 10.i: Describe properties of Student’s t-distribution and calculate and interpret its degrees of freedom.
LOS 10.j: Calculate and interpret a confidence interval for a population mean, given a normal distribution
with 1) a known population variance, 2) an unknown population variance, or 3) an unknown population
variance and a large sample size.
Student's t-distribution is a bell-shaped probability distribution that has the following properties:
• It is symmetrical.
• It is defined by a single parameter, the degrees of freedom (df), where degrees of freedom equal sample size
minus one (n − 1).
• It has a lower peak than the normal curve, but fatter tails.
• As the degrees of freedom increase, the shape of the t-distribution approaches the shape of the standard normal
curve.
Confidence interval = point estimate ± (reliability factor × standard error)
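For the known-variance case, the confidence interval formula can be sketched in Python as follows. The function name and the numbers in the usage lines are illustrative assumptions. The reliability factor is taken from the standard normal distribution via `statistics.NormalDist`, so it is slightly more precise than rounded table values such as 1.96 or 2.58.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, population_sd, n, confidence):
    """Two-sided confidence interval for a mean with known population variance."""
    alpha = 1 - confidence
    # Reliability factor: z-value leaving alpha/2 in each tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = population_sd / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs: sample mean 10, population sd 2, n = 100.
low_95, high_95 = z_confidence_interval(10.0, 2.0, 100, 0.95)
low_99, high_99 = z_confidence_interval(10.0, 2.0, 100, 0.99)
print(f"95% CI: {low_95:.3f} to {high_95:.3f}")
print(f"99% CI: {low_99:.3f} to {high_99:.3f}")
```

Raising the degree of confidence from 95% to 99% increases the reliability factor and therefore widens the interval.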
A population has a non-normal distribution with mean µ and variance σ². The sampling
distribution of the sample mean computed from samples of large size from that population
will have:
A. the same distribution as the population distribution.
B. its mean approximately equal to the population mean.
C. its variance approximately equal to the population variance.
B is correct. Given a population described by any probability distribution (normal or
non-normal) with finite variance, the central limit theorem states that the sampling
distribution of the sample mean will be approximately normal, with the mean approximately
equal to the population mean, when the sample size is large.
A sample mean is computed from a population with a variance of 2.45. The sample size is
40. The standard error of the sample mean is closest to:
A. 0.039.
B. 0.247.
C. 0.387.
B is correct. Standard error = σ/√n = √2.45/√40 ≈ 0.247.
An estimator with an expected value equal to the parameter that it is intended to estimate is
described as:
A. efficient.
B. unbiased.
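C. consistent.
B is correct. An unbiased estimator is one whose expected value equals the parameter it is intended to estimate.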
For a two-sided confidence interval, an increase in the degree of confidence will result in:
A. a wider confidence interval.
B. a narrower confidence interval.
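C. no change in the width of the confidence interval.
A is correct. A higher degree of confidence requires a larger reliability factor, which widens the interval.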
As the t-distribution’s degrees of freedom decrease, the t-distribution most likely:
A. exhibits tails that become fatter.
B. approaches a standard normal distribution.
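C. becomes asymmetrically distributed around its mean value.
A is correct. With fewer degrees of freedom, the t-distribution has a lower peak and fatter tails; it remains symmetrical.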
For a sample size of 17, with a mean of 116.23 and a variance of 245.55, the width of a
90% confidence interval using the appropriate t-distribution is closest to:
A. 13.23.
B. 13.27.
C. 13.68.
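B is correct. With n − 1 = 16 degrees of freedom, the reliability factor is t = 1.746, so the
width is 2 × 1.746 × √245.55/√17 ≈ 13.27.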
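The answer can be verified with a few lines of Python. This is a sketch under stated assumptions: the standard library has no t-distribution, so the critical value 1.746 (df = 16, 5% in each tail) is hard-coded from a t-table, and the function name is hypothetical. The z-value 1.645 is included only to show how much narrower the interval would be if the normal distribution were used instead.

```python
from math import sqrt

def interval_width(sample_variance, n, reliability_factor):
    """Width of a two-sided interval: 2 x reliability factor x standard error."""
    standard_error = sqrt(sample_variance / n)
    return 2 * reliability_factor * standard_error

n = 17
sample_variance = 245.55

t_critical = 1.746  # assumed t-table value for df = n - 1 = 16, 90% two-sided
z_critical = 1.645  # standard normal value for 90% two-sided, for comparison

width_t = interval_width(sample_variance, n, t_critical)
width_z = interval_width(sample_variance, n, z_critical)

print(f"width using t: {width_t:.2f}")
print(f"width using z: {width_z:.2f}")
```

The t-based width is larger because the t-distribution's fatter tails push the reliability factor above the corresponding z-value.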
For a sample size of 65 with a mean of 31 taken from a normally distributed population
with a variance of 529, a 99% confidence interval for the population mean will have a lower
limit closest to:
A. 23.64.
B. 25.41.
C. 30.09.
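A is correct. Because the population variance is known, use the z reliability factor (2.58 for 99%).
Standard error = √529/√65 = 23/8.062 ≈ 2.853, so the lower limit is 31 − 2.58 × 2.853 ≈ 23.64.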
An increase in sample size is most likely to result in a:
A. wider confidence interval.
B. decrease in the standard error of the sample mean.
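C. lower likelihood of sampling from more than one population.
B is correct. The standard error of the sample mean is σ/√n, so it decreases as the sample size n increases.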
LOS 10.k: Describe the issues regarding selection of the appropriate sample size, data-mining bias,
sample selection bias, survivorship bias, look-ahead bias, and time-period bias.
• Sample selection bias: the selection of observations is non-random.
• Survivorship bias: the sample includes only surviving mutual funds, hedge funds, etc.
• Look-ahead bias: the test uses data that were not available at the time being tested.
• Time-period bias: the relation found does not hold over other time periods.
• Data-mining bias: apparently significant relationships have occurred by chance.
To guard against data-mining bias, test a potentially profitable trading rule on a data set different
from the one you used to develop the rule (i.e., use out-of-sample data).
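The value of out-of-sample testing can be illustrated with a small simulation. This is a sketch under loud assumptions: each of 200 hypothetical "trading rules" earns pure noise (zero true mean, 1% daily standard deviation), so no rule has any real edge. The rule that looks best in-sample is then checked against fresh data.

```python
import random
import statistics

def simulate_data_mining(num_rules=200, days=250, seed=1):
    """Mean in-sample and out-of-sample returns for rules with no true edge."""
    rng = random.Random(seed)
    results = []
    for rule_id in range(num_rules):
        # Both periods are independent noise with a true mean of zero.
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(days)]
        out_of_sample = [rng.gauss(0.0, 0.01) for _ in range(days)]
        results.append(
            (statistics.fmean(in_sample), statistics.fmean(out_of_sample), rule_id)
        )
    # "Data mining": keep whichever rule looked best on the development data.
    best = max(results)
    return best, results

best, results = simulate_data_mining()
best_in, best_out, best_id = best
print(f"best rule in-sample mean return:     {best_in:.5f}")
print(f"same rule out-of-sample mean return: {best_out:.5f}")
```

The winning rule shows a positive in-sample mean purely because it is the maximum of many random draws; its out-of-sample mean is just another draw around zero, which is why the apparent relationship usually fails to carry over.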