Tutorial 7 - Questions

This document provides an overview of key statistical concepts like the central limit theorem and standardization of distributions. It also discusses how random number generation works in computers using pseudorandom number generation algorithms and how setting a seed value fixes the random sequence. Examples are provided of using R commands like set.seed and for loops to replicate random samples for different distributions and calculations.

Uploaded by

Gowshika Sekar


Nanyang Business School

AB1202 – STATISTICS AND ANALYSIS

Tutorial: 7
Topics: Central Limit Theorem

1. The CLT builds the bridge by which the sample means of (almost) any distribution converge to a normal distribution. In this exercise, we examine how different normal distributions can be standardized.

Figure 1 (a) left: A normal distribution with mean 0 and variance 1 (N(0,1), the standard normal).
(b) right: A normal distribution with mean 1 and variance 0.25 (N(1, 0.25)). The label (𝑋 − 1)/√0.25 marks the transformation from (b) to (a).

In Week 4, Q.11, we learned how to convert an arbitrary normal distribution into a normal distribution with mean 0 and variance 1, the standard normal (Figure 1(a)). Consider the normal distribution in Figure 1(b), and call it “𝑋.” We know that the random variable (𝑋 − 1)/√0.25 has the distribution of a standard normal,¹ which we call 𝑍. The standard normal variable, also known as the Z-score, is an often-used RV in statistics.

The fact that these two random variables have the same distribution means, specifically, that \(\Pr\left(\frac{X-1}{\sqrt{0.25}} \le 2\right) = \Pr(Z \le 2)\). The equality between the two probabilities holds for any number other than 2 as well.²

As such, we can use this conversion to calculate the probability distribution of 𝑋 based on the probability distribution of 𝑍 (and vice versa).
(1) Use pnorm() in R to calculate Pr(𝑋 ≤ 1.5).
(2) What is the Z-score corresponding to 𝑋 = 1.5?
(3) Use R to calculate Pr(𝑍 ≤ Z-score), where the Z-score is your answer to (2). Compare this answer with the answer to (1).
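As a quick numerical check of the identity above, the sketch below compares the two probabilities at the threshold 2 used in the text. It relies on the general relation \(\Pr(X \le x) = \Pr\left(Z \le \frac{x-1}{\sqrt{0.25}}\right)\). It also relies on pnorm() being parameterized by the standard deviation rather than the variance, so N(1, 0.25) corresponds to sd = 0.5. The variable names are only illustrative.

```r
# Sketch: check Pr((X - 1)/sqrt(0.25) <= 2) = Pr(Z <= 2) for X ~ N(1, 0.25).
# pnorm() takes the standard deviation, so sd = sqrt(0.25) = 0.5.
mu    <- 1
sigma <- sqrt(0.25)

z_cut <- 2                      # threshold on the standardized (Z) scale
x_cut <- mu + z_cut * sigma     # the same threshold on the X scale (= 2)

p_x <- pnorm(x_cut, mean = mu, sd = sigma)  # Pr(X <= 2)
p_z <- pnorm(z_cut)                         # Pr(Z <= 2); defaults are mean 0, sd 1

print(c(p_x, p_z))              # both are about 0.9772
all.equal(p_x, p_z)             # TRUE (up to floating-point tolerance)
```

Passing the variance (0.25) as the sd argument is a common slip. It silently gives a different distribution, N(1, 0.0625), so it is worth checking that argument first when the two sides disagree.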

¹ Please review W4.Q11 if necessary.
² We used the inequalities \(\frac{X-1}{\sqrt{0.25}} \le 2\) and \(Z \le 2\) because these are continuous RVs; this, again, is how probability is defined when the RV is continuous.

2. Anne, the manager of a coffee store, wants to determine the amount of money customers spend on iced coffee. Anne is aware that customers spent an average of $4.18 on iced coffee, with a standard deviation of $0.84, based on a sample of 50 customers. While she does not know the exact distribution of customer spending, she believes from her work experience that $4.18 and $0.84 are close to the mean and standard deviation of the customer population. Can you help her find approximately how likely it is that the average customer spending on iced coffee will be $4.26 or more?

3. Despite its nutritional value, seafood is a tiny part of most people’s diets. David and Nina both work in the seafood industry, and they decide to draw their own random samples and record the average seafood consumption in each sample.
(1) David samples 42 Singaporeans and finds an average seafood consumption of 18 kg per year, with a sample standard deviation of 7 kg. Given David’s firm belief that these two statistics must be very close to the population values, can you help him calculate approximately how likely it is that the average is 16 kg or more?
(2) Nina samples 90 Indonesians and finds an average seafood consumption of 17.5 kg per year, with a sample standard deviation of 7 kg. Given Nina’s firm belief that these two statistics must be very close to the population values, can you help her calculate approximately how likely it is to get an average of 16 kg or less?
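Questions 2 and 3 follow the same pattern. By the CLT, the sample mean of n observations is approximately normal with mean μ and standard deviation σ/√n. As Anne, David and Nina assume, the sample statistics stand in for μ and σ. A minimal sketch of that calculation follows. The function name clt_prob and its arguments are hypothetical and introduced only for illustration, and the example numbers are made up rather than taken from the questions.

```r
# Hypothetical helper: approximate Pr(sample mean >= cutoff) (upper = TRUE)
# or Pr(sample mean <= cutoff) (upper = FALSE) using the CLT, i.e. treating
# the sample mean of n observations as Normal(mu, sigma / sqrt(n)).
clt_prob <- function(cutoff, mu, sigma, n, upper = TRUE) {
  se <- sigma / sqrt(n)                 # standard error of the sample mean
  z  <- (cutoff - mu) / se              # standardize the cutoff (Z-score)
  if (upper) {
    pnorm(z, lower.tail = FALSE)        # Pr(Z >= z)
  } else {
    pnorm(z)                            # Pr(Z <= z)
  }
}

# Illustration with made-up numbers:
clt_prob(10, mu = 10, sigma = 2, n = 25)       # 0.5: the cutoff equals the mean
clt_prob(10.784, mu = 10, sigma = 2, n = 25)   # about 0.025 (z = 1.96)
```

Dividing σ by √n is the step that uses the CLT. Without it, you would be computing a probability for a single customer's spending rather than for an average.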

4. R commands such as rbinom() and rnorm() generate samples from the underlying distribution. For example, rbinom(2, 5, 0.5) generates 2 samples from a binomial distribution with 5 trials and success probability 0.5. The output of this command is equivalent to flipping a fair coin 5 times and observing the number of heads, with the same experiment done twice. Experiments like this are simple for us to do. But how do computers do this? Computers cannot flip coins or throw dice. How does a computer generate a “randomized” result?

In fact, computers, like robots, can only follow a prescribed set of instructions. Computer scientists and mathematicians alike must define the steps to generate random numbers,³ which can subsequently be used to generate values of random variables. Fortunately, scientists have developed formulas that calculate sequences of numbers that appear random; these are called pseudo-random numbers. These pseudo-random numbers are good enough for our practice as a proxy for random samples drawn from a specific distribution.
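To see how uniform pseudo-random numbers can act as a proxy for coin flips, the sketch below mimics rbinom(2, 5, 0.5). Each uniform draw below 0.5 counts as a head, and each column of five draws is one experiment. This is one simple way to turn uniform numbers into binomial draws. It is not necessarily how R's rbinom() works internally, and the seed (1) is arbitrary.

```r
# Sketch: mimic rbinom(2, 5, 0.5) using uniform pseudo-random numbers.
# Assumption: a draw U ~ U[0,1] with U < p counts as a "head" (success).
set.seed(1)                          # fix the sequence so the result is reproducible
p        <- 0.5                      # success probability (fair coin)
n_trials <- 5                        # coin flips per experiment
n_exp    <- 2                        # number of experiments

u     <- matrix(runif(n_trials * n_exp), nrow = n_trials)  # one column per experiment
heads <- colSums(u < p)              # number of heads in each experiment
heads
```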

There are different ways to generate such random numbers, and also ways to convert them into the various random variables available in software such as R (e.g., the commands that start with “r,” such as rnorm, runif, and rbinom). To show how it works, below is an example: the mid-square method, developed by the famous mathematician John von Neumann in the 1940s.

³ A random number is a draw from a discrete uniform distribution over a fixed interval. For example, the result of a die toss is a random number from 1 to 6.

You can view the method as a formula that generates a long sequence of numbers by repeated substitution. The starting number is what we call the “seed” of the sequence. As a simplified illustration, we pick an arbitrary 4-digit seed value (e.g., 4687) and square it (4687^2 = 21967969). Then we extract the middle four digits (9679), use them as the next seed, and square again (9679^2 = 93683041). Repeating these iterations, we obtain a sequence of 4-digit pseudo-random numbers: 9679, 6830, …
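The steps above can be sketched in R as follows. The function name mid_square is hypothetical. Instead of converting to text, the sketch extracts the middle four digits arithmetically, which treats each square as an 8-digit number padded with leading zeros. That padding rule is an assumption, since the text only shows squares that already have 8 digits. With the seed 4687, the sketch reproduces 9679 and 6830 from the text, and it can be used to check a hand calculation for a).

```r
# Hypothetical sketch of von Neumann's mid-square method for 4-digit seeds.
# A 4-digit number squared has at most 8 digits. Padding it on the left with
# zeros to 8 digits, the middle four digits are obtained by dropping the last
# two digits (%/% 100) and keeping the next four (%% 10000).
mid_square <- function(seed, n) {
  out <- numeric(n)                  # preallocate the output sequence
  x <- seed
  for (i in seq_len(n)) {
    x <- (x^2 %/% 100) %% 10000      # square, then extract the middle four digits
    out[i] <- x
  }
  out
}

mid_square(4687, 2)                  # 9679 6830, as in the text
```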

a) If we set the initial seed to 7788, what will be the first three pseudo-random numbers generated by the mid-square method?

Observe that the algorithm above generates a sequence of pseudo-random numbers between 0 and 9999. We treat them as random numbers because, without knowing the seed value used, we can’t predict the next number in the sequence.

Before proceeding, it is instructive to pause here and discuss the significance of seed values. A random number, by the definition of random, should be entirely unpredictable. In normal use, R chooses the initial seed value arbitrarily. Thus, without knowing the seed value, the random numbers generated will indeed appear random to us, the users. However, once we deliberately set a starting seed value, the sequence of random numbers (and the random variables) to be generated is fixed and fully predictable.

For example, suppose we agree that the walking path of a drunk man is random. Since you’re sober, you cannot randomize your steps to fully mimic how a drunk man would walk. What should you do if, for some reason, you need to analyze such walks? An easier way is to use a hidden camera to film how different drunk men walk. Since their walking paths have been recorded, you are able to predict how a particular drunk man will walk. Yet you can use this footage for your research as if the steps were random. Different drunk men represent different seed values, and the recorded footage represents the sequence of pseudo-random numbers.

For our purposes, setting the seed lets me set standard answers for our exercises; otherwise, if the answers depended on random samples, each of us would get different answers to the same question.

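5. Although R does not use the mid-square algorithm (its default generator is the Mersenne-Twister), the generators in R and other software packages work on the same principle: a deterministic formula produces the whole sequence from a seed value. In R, we can use the set.seed command to set the seed value: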
> set.seed(100)
> x <- rnorm(5, 0, 1)
> x
[1] -0.50219235  0.13153117 -0.07891709  0.88678481  0.11697127

Now, if you run rnorm(5, 0, 1) (or rnorm(1, 0, 1) five times) after set.seed(100), you will obtain the same five values from this normal distribution. Without calling set.seed upfront, the rnorm command generates different values each time you use it (as it is supposed to).
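The claims in the previous paragraph can be checked directly. The sketch below assumes R's default random number generator settings (see RNGkind()). Under those settings, one call to rnorm(5, 0, 1) and five calls to rnorm(1, 0, 1) after the same set.seed(100) should give identical values, while drawing again without resetting the seed gives new values.

```r
# Sketch: set.seed() fixes the sequence (R's default RNG settings assumed).
set.seed(100)
x_once <- rnorm(5, 0, 1)                      # five draws in a single call

set.seed(100)
x_loop <- replicate(5, rnorm(1, 0, 1))        # five separate single draws

x_next <- rnorm(5, 0, 1)                      # no reset: the sequence just continues

x_once                                        # same five values as in the transcript above
identical(x_once, x_loop)                     # TRUE: same seed, same values
identical(x_once, x_next)                     # FALSE: later values in the sequence
```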

a) Generate a reproducible random sample of 6 values from the continuous uniform distribution U[0,1] in R, setting the seed to 200.
b) Generate a reproducible random sample of 3 values from the continuous uniform distribution U[10,20] in R, setting the seed to 300.
c) Generate a reproducible random sample of 6 values from the continuous uniform distribution U[10,20] in R, setting the seed to 300.
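Two properties may help when comparing the answers to a) to c). First, with the same seed, runif(n, 10, 20) should match rescaling U[0,1] draws as 10 + 10 * runif(n). Second, with the same seed, a longer sample starts with the values of a shorter one. The sketch below checks both with an arbitrary seed (123) rather than the seeds in the questions, and it assumes R's default generator settings.

```r
# Sketch: two properties of runif() under a fixed seed (seed 123 is arbitrary).
set.seed(123)
u_scaled <- runif(3, min = 10, max = 20)   # U[10, 20] directly

set.seed(123)
u_manual <- 10 + (20 - 10) * runif(3)      # rescale U[0, 1] draws to [10, 20]

set.seed(123)
u_long <- runif(6, min = 10, max = 20)     # a longer sample with the same seed

all.equal(u_scaled, u_manual)              # TRUE: same values up to rounding
all.equal(u_scaled, u_long[1:3])           # TRUE: the first 3 of 6 match the 3-value sample
```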

6. A for-loop lets us loop through the elements of a vector and run the same code on each element. The basic syntax for a for-loop statement in R is
for (value in vector) {
  statements
}

(1) Loop through the sequence 1 to 5 and print the square of each number (hint: use the “print” command to display the number in the console window). Also try printing the squares of 2 to 200.
(2) Use a for-loop to calculate the sum of the first 100 squares, that is, \(1^2 + 2^2 + \cdots + 100^2 = ?\)
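A common for-loop pattern is to start an accumulator at zero and add to it on each pass. The sketch below applies it to a different sum, 1 + 2 + ... + 10, so the result can be compared with R's built-in sum(). The variable names total and k are arbitrary.

```r
# Sketch: accumulate a running total with a for-loop.
total <- 0                    # accumulator starts at zero
for (k in 1:10) {
  total <- total + k          # add the current element on each pass
}
print(total)                  # 55
total == sum(1:10)            # TRUE: matches the vectorized built-in
```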

7. Draw random samples of size 3 from the binomial distribution with p = 0.3 and n = 10 trials, and compute the mean of each sample. To generate the sampling distribution, generate 1000 observation points (that is, 1000 sample-mean observations).

(1) Set the seed value to 1 before the for-loop. This fixes the samples that we will draw thereafter. Show the sampling distribution of the sample means with a histogram. If you do this correctly, you should get the same histogram as your classmates.

(2) As a programming exercise, suppose that for some funny reason we want to set the seed equal to the serial number of the iteration (i.e., in the first iteration you call set.seed(1), in the second set.seed(2), etc.). That is, we want the seed value to follow the serial number of the for-loop, so the set.seed command should now be placed inside the for-loop. Show the sampling distribution of the sample means with a histogram. Again, if you do this correctly, you should get the same histogram as your classmates.
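The loop structure behind question 7 can be sketched with a stand-in distribution. In the sketch below, each iteration draws a sample of size 3 from U[0,1] with runif() and stores its mean, and the 1000 stored means are then shown in a histogram. As in part (1), the seed is set once, before the loop. U[0,1] and the variable names are assumptions for illustration; question 7 itself uses the binomial distribution.

```r
# Sketch: sampling distribution of the sample mean via a for-loop.
# Stand-in distribution: U[0,1] (question 7 itself uses a binomial).
set.seed(1)                       # set once, before the loop
n_reps      <- 1000               # number of sample-mean observations
sample_size <- 3                  # observations per sample

sample_means <- numeric(n_reps)   # preallocate storage
for (i in 1:n_reps) {
  s <- runif(sample_size)         # draw one sample
  sample_means[i] <- mean(s)      # store its mean
}

hist(sample_means, main = "Sample means (n = 3) from U[0,1]",
     xlab = "Sample mean")
```

Preallocating sample_means with numeric() and filling it by index keeps every iteration's result. For part (2), the set.seed call would move inside the loop body.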
