Tutorial 7 - Questions

This document provides an overview of key statistical concepts like the central limit theorem and standardization of distributions. It also discusses how random number generation works in computers using pseudorandom number generation algorithms and how setting a seed value fixes the random sequence. Examples are provided of using R commands like set.seed and for loops to replicate random samples for different distributions and calculations.

Uploaded by

Gowshika Sekar

Nanyang Business School

AB1202 – STATISTICS AND ANALYSIS

Tutorial: 7
Topics: Central Limit Theorem

1. The CLT builds the bridge from any distribution to the normal distribution: the sample
means converge to a normal distribution. In this exercise, we examine how different normal
distributions can be standardized.

Figure 1 (a) left: A normal distribution with mean=0 and variance=1 (N(0,1), the standard normal)
(b) right: A normal distribution with mean=1 and variance=0.25 (N(1,0.25))
The label (𝑋 − 1)/√0.25 in the figure marks the standardization of (b) into (a).

In Week 4, Q.11, we learned how to convert an arbitrary normal distribution to a normal
distribution with mean=0 and variance=1, the standard normal (Figure 1(a)).
Consider the normal distribution in Figure 1(b). Call it “𝑋.” We know the random
variable (𝑋 − 1)/√0.25 [1] will have the distribution of a standard normal, which we call
𝑍. The standard normal variable, also known as the Z-score, is an often-used RV in statistics.

The fact that these two random variables have the same distribution means specifically
that Pr((𝑋 − 1)/√0.25 ≤ 2) = Pr(𝑍 ≤ 2). The equality between the two probabilities holds for
any number other than 2 as well. [2]

As such, we can use this conversion to calculate the probability distribution of 𝑋, based
on the probability distribution of 𝑍 (and vice versa).
(1) Use pnorm() in R to calculate Pr (𝑋 ≤ 1.5).
(2) What is the Z-score corresponding to 𝑋 = 1.5?
(3) Use R to calculate Pr(𝑍 ≤ Z-score). Compare this answer with the answer to (1).
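
As a minimal sketch of the equality Pr((𝑋 − 1)/√0.25 ≤ 2) = Pr(𝑍 ≤ 2), the R lines below evaluate both sides at the threshold 2 used in the text. The variable names are illustrative assumptions; only pnorm() comes from the tutorial.

```r
# X ~ N(mean = 1, variance = 0.25), so its standard deviation is sqrt(0.25) = 0.5.
mu    <- 1
sigma <- sqrt(0.25)

# (X - 1)/sqrt(0.25) <= 2 is the same event as X <= 1 + 2 * 0.5.
x_cut <- mu + 2 * sigma

p_x <- pnorm(x_cut, mean = mu, sd = sigma)  # left-hand side, computed from X
p_z <- pnorm(2)                             # right-hand side, Pr(Z <= 2)

print(c(p_x, p_z))            # the two probabilities
print(all.equal(p_x, p_z))    # TRUE
```

The same pattern works for any other threshold: replace 2 by the Z-score of interest.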

[1] Please review W4.Q11 if necessary.
[2] We used the inequality (𝑋 − 1)/√0.25 ≤ 2 or 𝑍 ≤ 2 because they are continuous RVs; this
again is how probability is defined when the RV is continuous.

2. Anne, the manager of a coffee store, wants to determine the amount of money
customers spend on iced coffee. Anne is aware that customers spent an average of $4.18
on iced coffee with a standard deviation of $0.84 based on a sample of 50 customers.
While not knowing the exact distribution of customer spending, she believes from her
work experience that $4.18 and $0.84 are close to the mean and standard deviation of the
customer population. Can you find a way to help her find out approximately how likely it is
that the average customer spending on iced coffee will be $4.26 or more?
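
One possible approach, sketched here under the assumption that the CLT applies (so the sample mean of n customers is approximately normal with mean μ and standard deviation σ/√n), is to compute an upper-tail probability with pnorm(). The helper name clt_upper_tail is hypothetical, and using n = 50 for the averaged sample is an assumption, since the question does not state it explicitly.

```r
# Hypothetical helper: approximate Pr(sample mean >= x_bar) via the CLT.
clt_upper_tail <- function(x_bar, mu, sigma, n) {
  se <- sigma / sqrt(n)   # standard error of the sample mean
  pnorm(x_bar, mean = mu, sd = se, lower.tail = FALSE)
}

# Values from the question; n = 50 is an assumption about the sample size.
p <- clt_upper_tail(x_bar = 4.26, mu = 4.18, sigma = 0.84, n = 50)
print(p)
```

Setting lower.tail = FALSE asks pnorm() for the "or more" side directly, which avoids writing 1 - pnorm(...).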

3. Despite its nutritional value, seafood is a tiny part of most people’s diet. David and Nina
both work in the seafood industry, and they decide to create their own random samples
and document the average seafood consumption in their samples.
(1) David samples 42 Singaporeans and finds an average seafood consumption of 18 kg
and a sample standard deviation of 7 kg per year. Given David’s firm belief that the two
statistics must be very close to the population values, can you find a way to help him
calculate approximately how likely it is that the average is 16 kg or more?
(2) Nina samples 90 Indonesians and finds an average seafood consumption of 17.5 kg
and a sample standard deviation of 7 kg per year. Given Nina’s firm belief that the two
statistics must be very close to the population values, can you find a way to help her
calculate approximately how likely it is to get an average of 16 kg or less?
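
A CLT-based answer can be sanity-checked by simulation. The sketch below assumes, purely for illustration, that yearly consumption follows a gamma distribution with mean 18 and standard deviation 7; the question does not specify the population shape, and the seed and the number of replications are arbitrary choices.

```r
set.seed(2024)                  # arbitrary seed, for reproducibility only

mu <- 18; sigma <- 7; n <- 42   # David's figures from part (1)

# Assumed population: gamma with the stated mean and standard deviation.
shape <- (mu / sigma)^2
rate  <- mu / sigma^2

# Draw 10,000 samples of size n and record each sample mean.
sample_means <- replicate(10000, mean(rgamma(n, shape = shape, rate = rate)))

# Share of simulated sample means that are 16 or more.
p_sim <- mean(sample_means >= 16)

# CLT approximation of the same probability.
p_clt <- pnorm(16, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)

print(c(simulated = p_sim, clt = p_clt))
```

If the two numbers are close, the normal approximation is doing its job even though the assumed population is skewed.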

4. R commands such as rbinom() and rnorm() generate samples from the underlying distribution.
For example, rbinom(2,5,.5) generates 2 samples from a binomial distribution that
involves 5 trials and a success probability of 0.5. The output from this command is
equivalent to flipping a fair coin 5 times and observing the number of heads (and doing the
same experiment twice). Experiments like this are simple for us to do. But how do computers do
this? Computers cannot flip coins or throw dice. How does a computer generate a
“randomized” result?
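
To make the coin-flip analogy concrete, the sketch below produces the same kind of result in two ways: with rbinom() and by simulating the individual flips with sample(). The seed value and the helper name flip_once are arbitrary assumptions; the two approaches follow the same distribution but need not return the same numbers.

```r
set.seed(7)  # arbitrary seed so the run is repeatable

# Two draws from a binomial with 5 trials and p = 0.5, as in rbinom(2, 5, .5).
heads_rbinom <- rbinom(2, size = 5, prob = 0.5)

# The same experiment spelled out: flip a fair coin 5 times and count the heads.
flip_once <- function() sum(sample(c(0, 1), size = 5, replace = TRUE))
heads_manual <- replicate(2, flip_once())   # do the experiment twice

print(heads_rbinom)
print(heads_manual)
```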

In fact, computers, like robots, can only follow a prescribed set of instructions. Computer
scientists and mathematicians alike must define the steps to generate random numbers [3],
which can be subsequently used to generate the values for random variables. Fortunately,
scientists have developed formulas to calculate different lists of numbers that are
seemingly random, which are called pseudo-random numbers. These pseudo-random
numbers are good enough for our practice, as a proxy of random samples drawn from a
specific distribution.

There exist different ways to generate such random numbers, and also ways to convert
them into any “random variables” available in software such as R (e.g., those commands
that start with “r,” such as rnorm, runif, rbinom, etc.). To show you how it works, below is
an example, the mid-square method, which was developed by the famous mathematician
John von Neumann in the 1940s.

[3] A random number here is a draw from a discrete uniform distribution over a fixed interval.
For example, the result of a toss of a die is a random number from 1 to 6.

You can view the method as a formula that generates a long sequence of numbers by
substitution. The starting number is what we call the “seed” of the sequence. As a
simplified illustration, we pick an arbitrary 4-digit seed value (e.g., 4687) and then square
it (4687^2=21967969). Then, we extract the middle four digits (9679), use them as the
second seed, and square again (9679^2=93683041). With more of these iterations,
we can obtain a sequence of 4-digit pseudo-random numbers 9679, 6830,…
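
The steps above can be sketched in R as follows. This is an illustrative implementation for 4-digit seeds only; mid_square() is a hypothetical helper name, not an R built-in, and the square is treated as zero-padded to 8 digits when the middle digits are taken.

```r
# Sketch of the mid-square method for 4-digit seeds.
mid_square <- function(seed, n) {
  out <- numeric(n)
  current <- seed
  for (i in seq_len(n)) {
    squared <- current^2   # at most 8 digits for a 4-digit seed
    # Keep digits 3 to 6 of the 8-digit square:
    # drop the last two digits, then keep the last four of what remains.
    current <- (squared %/% 100) %% 10000
    out[i] <- current
  }
  out
}

print(mid_square(4687, 2))  # 9679 6830, matching the text
```

Values below 1000 print without their leading zeros, and the sequence can collapse to 0 and stay there, which is one known weakness of this method.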

a) If we set the initial seed as 7788, what will be the first three pseudo-random numbers
generated by the mid-square method?

Observe that the above algorithm generates a sequence of pseudo-random numbers
between 0 and 9999. They pass for random numbers because, without knowing the seed
value used, we cannot predict the next number in the sequence.

Before proceeding, it is instructive to pause here and discuss the significance of the seed
values. A random number, by the definition of random, should be entirely unpredictable.
In normal use, R will choose the initial seed value arbitrarily. Thus, without knowing the
seed value, the random numbers that will be generated will indeed appear random to us,
the users.
However, once we deliberately set a starting seed value, the sequence of random numbers
(and the random variables) to be generated is fixed and fully predictable.

For example, suppose we agree that the walking path of a drunk man is random. Since
you’re sober, you cannot randomize your steps in a way that fully mimics how a drunk man
would walk. What should you do if you, for some reason, need to analyze that? An easier way is
to use a hidden camera to film how different drunk men walk. Since their walking paths
have been recorded on file, you are able to predict how a particular drunk man will walk.
Yet you can use this footage for your research, as if the steps were random. Different drunk
men represent different seed values; the recorded footage is the sequence
of pseudo-random numbers.

For our purposes, fixing the seed lets me set standard answers for our exercises; otherwise,
if the answers depend on random samples, each of us will have different answers for the same
question.

5. Although R does not use the mid-square algorithm, the algorithm in R or other software
packages works in a similar way. In R, we can use the set.seed command to set the seed
value:
> set.seed(100)
> x = rnorm(5, 0, 1)
> x
[1] -0.50219235  0.13153117 -0.07891709  0.88678481  0.11697127

Now, if you run rnorm(5,0,1) (or rnorm(1,0,1) 5 times) after set.seed(100), you will obtain
the same five values for this normal distribution. Without calling set.seed() beforehand,
the rnorm command would generate different values each time you use it (as it is
supposed to).
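
The claim can be checked with a short sketch. The variable names are illustrative assumptions; set.seed() and rnorm() are used as in the tutorial.

```r
set.seed(100)
all_at_once <- rnorm(5, 0, 1)

set.seed(100)                      # reset to the same seed
one_by_one <- numeric(5)
for (i in 1:5) one_by_one[i] <- rnorm(1, 0, 1)

print(identical(all_at_once, one_by_one))  # TRUE: same seed, same sequence

# Without resetting the seed, the next five draws continue the sequence.
next_five <- rnorm(5, 0, 1)
print(identical(all_at_once, next_five))   # FALSE
```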

a) Replicate 6 values of a random sample from the continuous uniform distribution U[0,1] in R,
by setting the seed number = 200.
b) Replicate 3 values of a random sample from the continuous uniform distribution U[10,20] in
R, by setting the seed number = 300.
c) Replicate 6 values of a random sample from the continuous uniform distribution U[10,20] in
R, by setting the seed number = 300.
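
As a hedged illustration of the pattern (with a seed and an interval deliberately different from the exercise), runif() takes the sample size and the interval bounds. The sketch also shows that a longer sample drawn from the same seed starts with the same values as a shorter one.

```r
set.seed(123)                    # arbitrary seed, not one from the exercise
first_three <- runif(3, min = 5, max = 8)

set.seed(123)                    # same seed again
first_six <- runif(6, min = 5, max = 8)

print(first_three)
print(first_six)

# The first three of the six values repeat the earlier three draws.
print(identical(first_three, first_six[1:3]))
```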

6. A for-loop lets us repeat (loop) through the elements in a vector and run the same code
on each element. The basic syntax for creating a for-loop statement in R is
for (value in vector) {
  statements
}

(1) Loop through the sequence 1 to 5 to print the square of each number (hint: use the
“print” command to display the number in the console window). Also try the sequence 2 to
200.
(2) Use a for-loop to calculate the sum of the first 100 squares, that is,
1^2 + 2^2 + ⋯ + 100^2 = ?
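
The for-loop syntax can be illustrated with a small sketch that uses cubes instead of squares, so that it shows the pattern without being the exercise itself. The variable names k and total are arbitrary.

```r
# Print the cube of each number from 1 to 3.
for (k in 1:3) {
  print(k^3)
}

# Accumulate a running total: the sum of the first 10 cubes.
total <- 0
for (k in 1:10) {
  total <- total + k^3
}
print(total)  # 3025
```

The accumulator is initialized before the loop and updated inside it; the same two-step pattern applies to any running sum.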

7. Generate a random sample from the binomial distribution with p=0.3 and number of trials
n=10. The sample size is 3. To build the sampling distribution of the sample mean, generate
1000 observation points (that is, 1000 sample mean observations).

(1) Set the seed value = 1 prior to the for-loop. This value will fix the samples that we will
draw thereafter. Show the sampling distribution of the sample means with a histogram. If you
do this correctly, you should get the same histogram as your classmates.

(2) As a programming exercise, suppose for some funny reason we want to set the seed
equal to the serial number of the iteration (i.e., in the first iteration you call
set.seed(1); in the second, set.seed(2); etc.). That is, we want the seed value
to follow the serial number of the for-loop, and thus the set.seed command should
now be placed inside the for-loop. Show the sampling distribution of the sample means with a
histogram. Again, if you do this correctly, you should get the same histogram as your
classmates.
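
One way to organize the two variants is sketched below. It is a sketch under stated assumptions rather than the official solution: the variable names are hypothetical, and the histograms are drawn only in an interactive session so that the script also runs unattended.

```r
n_reps <- 1000   # number of sample means to collect
n_obs  <- 3      # sample size behind each mean

# Variant (1): one seed, set once before the loop.
set.seed(1)
means_one_seed <- numeric(n_reps)
for (i in 1:n_reps) {
  means_one_seed[i] <- mean(rbinom(n_obs, size = 10, prob = 0.3))
}

# Variant (2): the seed follows the iteration number.
means_loop_seed <- numeric(n_reps)
for (i in 1:n_reps) {
  set.seed(i)
  means_loop_seed[i] <- mean(rbinom(n_obs, size = 10, prob = 0.3))
}

# Draw the histograms only when running interactively.
if (interactive()) {
  hist(means_one_seed,  main = "Seed set once",          xlab = "Sample mean")
  hist(means_loop_seed, main = "Seed set per iteration", xlab = "Sample mean")
}
```

Preallocating the result vector with numeric(n_reps) and filling it by index keeps the loop simple and avoids growing a vector one element at a time.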
