RMDS 2023 Q1 Q&Alecture Unit 19 20 PM_canvas



Resit Test 1 – Tomorrow

• Remindo test: including a scientific calculator
• Chromebooks of the UT
• 40 MC questions
• Therm 2
• 13:45-15:45 hrs.

• Test 1: Unit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23

Test 2 & R test – Friday 3 Nov 2023 – 13:35-16:45 hrs
• Remindo tests
• Chromebooks of the UT
• Test 2:
• 40 MC questions
• Unit 13, 24, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22
• including a scientific calculator
• R-test: similar to R practice test:

• NH205: 65 persons: surnames starting with A - G
• NH207: 60 persons: surnames starting with H - M
• NH209: 75 persons: surnames starting with N - Z
What else today..
• Unit 19 - Sampling:
Probability vs. non-probability sampling
“Could you explain the last question of the unit 19 assignment? (The researcher
adapts his plans and decides to focus on the wide variety of first year
psychology students at the University of Twente...)”

• Unit 20 – Certainty about the mean:

Population, sample, sampling distribution (of the mean)
→ Assignment Q7 (start with Q5)
Research variable(s)
I. Conceptualization
II. Operationalization
III. Measurement

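[Figure: data matrix with units (unit1, unit2, ...) as rows and variables (v1, v2, ...) as columns; sampling selects units from all units]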
Sampling

• Before sampling, you need a clear definition of the population of interest (target population)
→ Units of analysis in your RQ

• A population refers to all members of a certain group of people/companies/countries
Probability sampling

Selected randomly from the population → individuals have an equal or known chance to participate in the study.
• Goal: your sample represents the (target) population well (external validity)
• A sampling frame is necessary

Sampling frame
Types of probability sampling

1. Simple random sampling (SRS)

2. Stratified random sampling
3. Cluster random sampling
4. Multi-stage random sampling
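
The first two types can be sketched in a few lines of Python. This is only an illustration under assumptions: the sampling frame of 1,000 students, the three programme labels and the sample sizes are all made up and do not come from the course material.

```python
import random
from collections import defaultdict

rng = random.Random(19)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 1,000 students, each with a study programme (stratum).
programmes = ["PSY", "COM", "MST"]
frame = [{"id": i, "programme": programmes[i % 3]} for i in range(1000)]

# 1. Simple random sampling: every unit in the frame has the same chance of selection.
srs = rng.sample(frame, k=60)

# 2. Stratified random sampling: split the frame into strata,
#    then draw a simple random sample within each stratum.
strata = defaultdict(list)
for unit in frame:
    strata[unit["programme"]].append(unit)

stratified = []
for name, units in strata.items():
    stratified.extend(rng.sample(units, k=20))

print(len(srs), len(stratified))
```

Both approaches need the complete list `frame`, which is why a sampling frame is necessary for probability sampling.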
Non-probability sampling

Selected non-randomly: individuals have an unequal or unknown chance to participate in the study.

• No sampling frame available

• Sample is not representative of the population
• Goal: inductive reasoning (exploratory empirical research)
Types of non-probability sampling

Selected non-randomly: individuals have an unequal or unknown chance to participate in the study.

1. Convenience
2. Purposive
3. Systematic

4. Quota (purposive, with fixed final size)

5. Snowball (social network of respondents)
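
As an illustration of type 4, quota sampling can be sketched as "take whoever is available until each subgroup has reached its fixed size". The subgroup labels, quota sizes and the stream of available people below are hypothetical.

```python
import random

rng = random.Random(7)

# Hypothetical stream of people who happen to be available (no sampling frame).
groups = ["A", "B", "C"]
available = [{"id": i, "group": rng.choice(groups)} for i in range(500)]

quota = {"A": 10, "B": 10, "C": 10}   # fixed final size per subgroup
sample = []
filled = {g: 0 for g in quota}

for person in available:              # take whoever comes along first...
    g = person["group"]
    if filled[g] < quota[g]:          # ...but only while that subgroup's quota is open
        sample.append(person)
        filled[g] += 1
    if len(sample) == sum(quota.values()):
        break

print(filled)
```

Selection depends on who shows up first, so the chance of ending up in the sample is unknown, which is what makes this a non-probability method.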
Could you explain the last question of unit 19?
The researcher adapts his plans and decides to focus on the wide
variety of first year psychology students at the University of Twente.
Because of their diverse educational careers so far, the quality of the
psychology study might be improved by offering a more personalized
study program in BA-1.
The researcher wants to explore whether the present program fits
the diverse personal needs of the BA-1 students. He wants to cover a
wide variety of BA-1 students by defining subgroups.
• Which type of sampling method would you advise the researcher to use?
• Describe the sampling method in more detail.
• Unit 19 - Sampling:
Probability vs. non-probability sampling

• Unit 20 – Certainty about the mean:

Population, sample, sampling distribution (of the mean)
→ Assignment Q5 & Q7
Sample distribution vs. sampling distribution
Which statement is/are correct?
A. A sample distribution displays which values of a variable you have
obtained after drawing a sample of a given size from a population.
B. A sampling distribution displays the values of a statistic (e.g. mean, SD,
var) from repeatedly drawing samples of a given size from a population.

1. A and B are correct

2. Only A is correct
3. Only B is correct
4. A and B are not correct
5. I don’t know
Q5: Average height in the population
• Website: http://onlinestatbook.com/stat_sim/sampling_dist/

• Population: height is normally distributed, range 0-32

Population distribution

Sample distribution = distribution of height in one sample (after SRS): n=5
Sampling distribution of the mean height of 5 persons

Sampling distribution of the sample mean of 25 persons
Q5: Average height in the population
• Website: http://onlinestatbook.com/stat_sim/sampling_dist/

• Population: height is normally distributed, range 0-32

1. Session: “animated” (one by one)

• → 3rd graph: draw 1 sample of n=2 persons and determine their mean height
• → 4th graph: draw 1 sample of n=25 persons and determine their mean height
Q5: Height of a population
• Website: http://onlinestatbook.com/stat_sim/sampling_dist/

• Height is normally distributed, range 0-32

Q5a. Session: “10,000”

• → 3rd graph: draw 10,000 samples of n=2 persons and determine their mean height
• → 4th graph: draw 10,000 samples of n=25 persons and determine their mean height

Compare the mean and SD of both sampling distributions of the (sample) mean. Are they different?
Compare mean and SD of the sampling distributions of the (sample) mean

• Means are almost equal

• SD for n=25 < SD for n=2

• “With a sample size of 25 you are more confident about the value of the population mean compared to a sample size of 2” (for n=2, the range of the graph is larger)
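
The comparison above can also be reproduced offline with a small simulation. This is a sketch under assumptions: the population mean of 16 and SD of 5 are illustrative values chosen here, not figures taken from the assignment.

```python
import random
import statistics

rng = random.Random(20)
MU, SIGMA = 16.0, 5.0   # assumed population parameters, for illustration only

def sampling_distribution(n, reps=10_000):
    """Return `reps` sample means, each from a random sample of size n."""
    return [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
            for _ in range(reps)]

means_2 = sampling_distribution(2)
means_25 = sampling_distribution(25)

# Print sample size, mean of the sample means, and SD of the sample means.
for n, means in ((2, means_2), (25, means_25)):
    print(n, round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```

The two means should come out close to each other, while the SD of the sample means should be clearly smaller for n=25 than for n=2.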
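Q7/Q5c: What is the formula for the SD of the sampling distribution of the (sample) mean?

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \text{SEM (Standard Error of the Mean)}$$

where $\sigma$ is the standard deviation of the population (!) and $n$ is the sample size.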

Q5d: Compute the SD of the sampling distribution of the (sample) mean. What is the SEM?
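
A minimal sketch of this computation, assuming a population SD of 5 (an illustrative value, not taken from the assignment) and the two sample sizes used earlier, n=2 and n=25:

```python
import math

def sem(sigma, n):
    """Standard error of the mean: population SD divided by sqrt(sample size)."""
    return sigma / math.sqrt(n)

SIGMA = 5.0  # assumed population SD, for illustration only
print(round(sem(SIGMA, 2), 3))   # 3.536
print(round(sem(SIGMA, 25), 3))  # 1.0
```

Because n sits under a square root, a sample that is four times larger only halves the SEM.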
Why should you memorize this SEM formula?
The common situation in research ...
• RQ: What is the average height of all UT students?
• The distribution of height in the population is unknown
→ $\mu$ and $\sigma$ are unknown
• You just have one sample, randomly drawn from the population (SRS)
• $\bar{x}$ and $s$ (the mean and SD of a sample) can be calculated easily.

→ “Variance is the average of the squared differences from the mean”
Why should you memorize this SEM formula?
Information from one random sample (n > 30) can be used to say what the average height
of UT students probably is (inference).
Central Limit Theorem:
“If you draw many random samples from the population that are large enough (n > 30) and
calculate the sample mean each time ... the sampling distribution of the sample means is
approximately normal, irrespective of the shape of the population distribution.”
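
The theorem can be checked with a rough simulation. The sketch below assumes a clearly non-normal population (exponential, with mean 1 and SD 1) and a sample size of 50; both are choices made here for illustration.

```python
import math
import random
import statistics

rng = random.Random(30)

# Assumed, clearly non-normal population: exponential with mean 1 and SD 1.
N, REPS = 50, 10_000
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(N))
         for _ in range(REPS)]

sem = 1.0 / math.sqrt(N)   # sigma / sqrt(n)
# Share of sample means within 1.96 SEM of the population mean.
within = sum(abs(m - 1.0) <= 1.96 * sem for m in means) / REPS

print(round(statistics.fmean(means), 3))   # close to the population mean of 1
print(round(statistics.stdev(means), 3))   # close to the SEM
print(round(within, 3))                    # close to 0.95 if roughly normal
```

If the sample means were far from normally distributed, the last number would not land near 0.95.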

Why should you memorize this SEM formula?
Information from one random sample (n > 30) can be used to say what the average
height of UT students probably is (inference).
Central Limit Theorem:
“If you draw many random samples from the population that are sufficiently large (n > 30)
and calculate the sample mean each time ... the sampling distribution of the sample means is
approximately normal, irrespective of the shape of the population distribution.”

→ The shape of a normal distribution is well known and very useful.

Q7c: Empirical rule (68-95-99.7 rule)

0.954 (or 95.4%)

Why should you memorize this SEM formula?
Central Limit Theorem:
1. “The shape of the sampling distribution is normal”.
• -1 SD to +1 SD: 68% of the values
• -2 SD to +2 SD: 95.4% of the values
• -1.96 SD to +1.96 SD: 95.0% of the values
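
These three percentages can be verified with the standard normal distribution from Python's standard library. The helper name `share_within` is introduced here for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

def share_within(k):
    """Share of values between -k SD and +k SD of a normal distribution."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 1.96):
    print(k, round(share_within(k), 4))
```

This should print roughly 0.6827, 0.9545 and 0.95, matching the 68%, 95.4% and 95.0% above.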

• We can use these characteristics to express our uncertainty about the mean of the population with only one sample!
→ “95% Confidence interval of the mean”
Why should you memorize this SEM formula?
Information from one random sample (n > 30) can be used to say what the average height
of UT students probably is (inference).
Central Limit Theorem:
1. Shape of the sampling distribution is normal;
2. Mean of the sampling distribution approximately equals the population mean.

→ We are close to answering our RQ, even though we don’t know the distribution of height in the UT population.
Why should you memorize this SEM formula?
Information from one random sample (n > 30) can be used to say what the average height
of UT students probably is (inference).
Central Limit Theorem:
1. Shape of the sampling distribution is normal;
2. Mean of the sampling distribution approximately equals the population mean;
3. SD of the sampling distribution (of the sample means) = SEM = $\sigma / \sqrt{n}$.

We don’t know $\sigma$, but our best estimate is $s$.

Why should you memorize this SEM formula?
You can use information from one random sample (n > 30) to say what the
average height of UT students probably is (inference).

• 95% Confidence Interval of the mean = sample mean ± 1.96 * SEM
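
A minimal sketch of this calculation for one sample, estimating the SEM as s / sqrt(n) because the population SD is unknown. The function name `ci95_mean` and the height values are made up for illustration.

```python
import math
import statistics

def ci95_mean(sample):
    """95% CI of the mean: sample mean ± 1.96 * SEM, with SEM estimated as s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    return mean - 1.96 * sem, mean + 1.96 * sem

# Made-up heights in metres, for illustration only (n = 36).
heights = [1.60 + 0.01 * (i % 24) for i in range(36)]
low, high = ci95_mean(heights)
print(round(low, 3), round(high, 3))
```

The interval is symmetric around the sample mean and becomes narrower as n grows, because the SEM shrinks.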

Why should you memorize this SEM formula?
You can use information from one random sample (n > 30) to say what the average height
of UT students probably is (inference).

“The average height of UT students is 1.71 m (95% CI: [1.65, 1.77])”

Interpret the 95% CI of the mean:

• “95% of all sample means closest to the actual population mean are between
1.65 and 1.77 m”.
• “With repeated (random) sampling, the interval 1.65 to 1.77 m holds the actual
population mean 95% of the time”.

• “I am 95% confident that the interval 1.65 to 1.77 m holds the actual population mean”.
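
What repeated sampling means for a confidence interval can be explored with a simulation. In this sketch the population mean (1.71 m), the SD (0.20 m), the sample size and the number of repetitions are all assumed values. It counts how often an interval built from a fresh random sample contains the population mean.

```python
import math
import random
import statistics

rng = random.Random(95)
MU, SIGMA, N, REPS = 1.71, 0.20, 40, 5_000   # assumed population values (metres)

hits = 0
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(N)   # SEM estimated from the sample
    if mean - 1.96 * sem <= MU <= mean + 1.96 * sem:
        hits += 1

coverage = hits / REPS
print(round(coverage, 3))
```

Each repetition produces a different interval. The share of intervals that contain the population mean should be roughly 95%, typically a little lower because s is only an estimate of the population SD.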
