S1 Chp1 DataCollection


Stats1 Chapter 1 :: Data Collection
jfrost@tiffin.kingston.sch.uk
www.drfrostmaths.com
@DrFrostMaths

Last modified: 1st February 2018


The chapters of Stats Year 1 could be broadly organised as follows:

Experimental
i.e. Dealing with collected data.

Chp1: Data Collection
Methods of sampling, types of data, and populations vs samples.

Chp2: Measures of Location/Spread
Statistics used to summarise data, including mean, standard deviation, quartiles, percentiles. Use of linear interpolation for estimating medians/quartiles.

Chp3: Representation of Data
Producing and interpreting visual representations of data, including box plots and histograms.

Chp4: Correlation
Measuring how related two variables are, and using linear regression to predict values.

Theoretical
Deal with probabilities and modelling to make inferences about what we ‘expect’ to see or make predictions, often using this to reason about/contrast with experimentally collected data.

Chp5: Probability
Venn Diagrams, mutually exclusive + independent events, tree diagrams.

Chp6: Statistical Distributions
Common distributions used to easily find probabilities under certain modelling conditions, e.g. binomial distribution.

Chp7: Hypothesis Testing
Determining how likely observed data would have happened ‘by chance’, and making subsequent deductions.
This Chapter Overview
Interestingly, most of this chapter is from the old S3 module (a Further Maths module!) with also
some S2. There is little ‘calculation’ involved in this chapter; consider this a ‘bookwork’ one!

1:: Populations vs samples
“Suggest why we would not test all the light bulbs.”
“Identify the sampling frame.”

2:: Random Sampling
Describe the disadvantages of systematic sampling.

3:: Non-Random Sampling
Describe how a stratified sample would be conducted, including strata sizes.

4:: Types of data
Continuous vs discrete, terms such as class intervals, class boundaries, class width.

5:: Edexcel’s ‘Large Data Set’
What you’re expected to know about the ‘large data set’ of weather data, and how to use it.
Populations and samples
! A population is: the whole set of items that are of interest.
A sample is: some subset of the population intended to represent the population.

You’re probably used to a ‘population’ meaning all humans/animals within a country/ecosystem. But a population could be “all the lightbulbs in a factory” or “all the cars in the UK”.
Sampling key terms
! Each individual thing in the population that can be sampled is known as a sampling unit.

! Often sampling units of a population are individually named or numbered to form a list called the sampling frame.
Populations vs Samples
We could collect data either from a sample, or from the entire population.
Data collected from the entire population is known as a census.

Census
Advantages:
• Should give completely accurate result.
Disadvantages:
• Time consuming and expensive.
• Cannot be used when testing involves destruction.
• Large volume of data to process.

Sample
Advantages:
• Cheaper.
• Quicker.
• Less data to process.
Disadvantages:
• Data may not be accurate.
• Data may not be large enough to represent small sub-groups.

Example: A supermarket wants to test a delivery of avocados for ripeness by cutting them in half.
a. Suggest a reason why the supermarket should not test all the avocados in the delivery.
b. The supermarket tests a sample of 5 avocados and finds that 4 of them are ripe. They estimate
that 80% of the avocados in the delivery are ripe. Suggest one way that the supermarket could
improve their estimate.

a Testing the avocados destroys them (and thus they can’t be sold).

b Use a larger sample size (as this would give a better estimate of the proportion of ripe avocados).
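To see why a larger sample tends to give a better estimate, here is a minimal simulation sketch. The delivery of 1000 avocados with 700 ripe is a made-up assumption purely for illustration, as are the function name and the repeat count. It takes many samples of size 5 and of size 50 and compares how much the estimated proportion of ripe avocados varies.

```python
import random
import statistics


def estimate_spread(population, sample_size, repeats, rng):
    """Standard deviation of the ripe-proportion estimates over repeated samples."""
    estimates = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        estimates.append(sum(sample) / sample_size)
    return statistics.pstdev(estimates)


# Hypothetical delivery (an assumption): 700 ripe (1) and 300 unripe (0) avocados.
delivery = [1] * 700 + [0] * 300
rng = random.Random(1)  # seeded only so the sketch is repeatable

spread_small = estimate_spread(delivery, 5, 2000, rng)
spread_large = estimate_spread(delivery, 50, 2000, rng)

print(f"samples of 5:  spread of estimates = {spread_small:.3f}")
print(f"samples of 50: spread of estimates = {spread_large:.3f}")
```

The estimates from the larger samples cluster more tightly around the true proportion. The trade-off is the one from part (a): every avocado tested is destroyed.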
Exercise 1A
Pearson Statistics & Mechanics Year 1/AS
Page 3
Types of Sampling
I recommend laying out your notes like this for the next bit of the chapter. Use a full page.

Columns: Type, How to carry out, Advantages, Disadvantages

Random Sampling:
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling

Non-Random Sampling:
• Quota Sampling
• Opportunity Sampling
Random Sampling
Ordinarily, we would want each thing in our sampling frame to have an equal chance of being chosen, in order to avoid bias.

This is known as random sampling.

There are a few ways of doing this…
Simple Random Sampling
How to carry out
What is it: Every sample has an equal chance of being selected.
Method: In the sampling frame each item has an identifying number. Use a random number generator, or ‘lottery sampling’ (names in a hat).

Advantages
• Bias free.
• Easy and cheap to implement.
• Each number has a known equal chance of being selected.

Disadvantages
• Not suitable when population size is large.
• Sampling frame needed.

Edexcel S3 June 2004 Q1a


There are 64 girls and 56 boys in a school. Explain briefly how you could take a random
sample of 15 pupils using a simple random sample. (3)
• Mark for allocating identifier to each sampling unit.
• Mark for one (bias-free) method to select such a number.
• Mark for explicitly mentioning how that number is actually used.
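The three steps in the marking notes can be sketched in code. This is only an illustration of the idea, not a model exam answer: the numbering 1 to 120 and the seed are assumptions, and Python's `random.sample` stands in for the random number generator.

```python
import random

# Step 1: allocate an identifier to every sampling unit.
# 64 girls + 56 boys = 120 pupils, numbered 1 to 120 (numbering is an assumption).
sampling_frame = list(range(1, 121))

# Step 2: use a bias-free method to pick 15 distinct numbers.
rng = random.Random(2004)  # seeded only so the sketch is repeatable
chosen = rng.sample(sampling_frame, 15)

# Step 3: the pupils whose identifiers match the chosen numbers form the sample.
print(sorted(chosen))
```

Because `random.sample` never repeats a value, no pupil can be picked twice.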
Systematic Sampling
How to carry out
What is it: Required elements are chosen at regular intervals in an ordered list,
i.e. take every kth element, where k = (population size) ÷ (sample size), starting at a random item between 1 and k.

Advantages
• Simple and quick to use.
• Suitable for large samples/populations.

Disadvantages
• Sampling frame again needed.
• Can introduce bias if sampling frame not random.

Edexcel S3 June 2009 Q1a


A telephone directory contains 50 000 names. A researcher wishes to select a systematic
sample of 100 names from the directory. Explain in detail how the researcher should
obtain such a sample. (2)

We need a random first item.
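As a rough sketch of the procedure for this question (the variable names and the seed are illustrative assumptions): the interval is 50 000 ÷ 100 = 500, so we pick a random start between 1 and 500 and then take every 500th name.

```python
import random

population_size = 50_000  # names in the directory
sample_size = 100

k = population_size // sample_size  # interval between chosen names: 500

rng = random.Random(2009)  # seeded only so the sketch is repeatable
start = rng.randint(1, k)  # random first item, between 1 and k inclusive

# Positions in the directory of the 100 chosen names.
positions = [start + i * k for i in range(sample_size)]

print(positions[:3], "...", positions[-1])
```

The last position is at most 500 + 99 × 500 = 50 000, so the sample always stays inside the directory.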
Stratified Sampling
We want to sample 20% of the population. If the population were divided into distinct groups (e.g. age ranges), known as ‘strata’, we could randomly sample 20% from each group, ensuring each group is proportionally represented.

How to carry out
What is it: Population divided into groups (strata) and a simple random sample carried out in each group.
Same proportion sampled from each stratum.
Used when sample is large and population naturally divides into groups.

Advantages
• Reflects population structure.
• Guarantees proportional representation of groups within population.

Disadvantages
• Population must be clearly classified into distinct strata.
• Selection within each stratum suffers from same disadvantages as simple random sampling.
Example Question
Edexcel S3 Jan 2006 Q1
A school has 15 classes and a sixth form. In each class there are 30 students. In the sixth form there
are 150 students. There are equal numbers of boys and girls in each class. There are equal numbers
of boys and girls in the sixth form. The head teacher wishes to obtain the opinions of the students
about school uniforms. Explain how the head teacher would take a stratified sample of size 40. (7)

You would certainly want to know your mark scheme on this one!
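The strata sizes for the Jan 2006 question above can be worked out with a short calculation. This is a sketch under one reading of the question, treating each class and the sixth form as a stratum; the dictionary and variable names are illustrative. It covers only the arithmetic, not the full written explanation that the marks require.

```python
# Strata: 15 classes of 30 students, plus a sixth form of 150 students.
strata = {f"class {i}": 30 for i in range(1, 16)}
strata["sixth form"] = 150

total = sum(strata.values())  # 15 * 30 + 150 = 600 students
sample_size = 40

# Each stratum contributes (stratum size / total) * sample size.
# Integer division is exact here because every result is a whole number.
allocation = {name: size * sample_size // total for name, size in strata.items()}

print(allocation["class 1"], allocation["sixth form"], sum(allocation.values()))
```

Each class contributes 2 students and the sixth form contributes 10, giving 15 × 2 + 10 = 40. With equal numbers of boys and girls in every group, splitting each allocation evenly gives 1 boy and 1 girl per class and 5 boys and 5 girls from the sixth form, each chosen by simple random sampling within the group.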
Exercise 1B
Pearson Statistics & Mechanics Year 1/AS
Pages 6-7
Non-Random Sampling
[Image: Famous Lefties. OK, maybe not so famous.]

Consider the following scenario: You wish to conduct a survey in the UK on whether being left-handed affects IQ. We need to choose people to assess.

Why would random sampling be problematic?
Because we don’t know the sampling frame, i.e. don’t have a list of all left-handed (and non-left-handed) people in the UK.

For this scenario we’d likely use quota sampling, i.e.

1. As with stratified sampling, divide population into groups according to characteristic of interest, then determine size of each group in sample to reflect proportions within the population.
2. But instead of random sampling within each group, we actively choose people within each group via suitable means (e.g. advertising), until the ‘quota’ for each group is filled.

A variant of this is opportunity sampling, where we find people at the same time the survey is being carried out (e.g. exit polls at polling stations). This is not a suitable method for the left-handed example, because given the likely time-consuming nature of assessment coupled with the resources required, we’d likely arrange with the people taking part before the actual assessment tasks took place.
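The quota-filling step can be sketched as follows. Everything here is hypothetical: the function name, the volunteers, and the quota sizes are made up for illustration. In a real study the quotas would be set to reflect the proportions in the population.

```python
def quota_sample(volunteers, quotas):
    """Accept volunteers in the order they turn up until every quota is filled."""
    remaining = dict(quotas)
    chosen = []
    for name, group in volunteers:
        if remaining.get(group, 0) > 0:
            chosen.append((name, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas filled, stop recruiting
    return chosen


# Hypothetical volunteers, in order of response to an advert.
volunteers = [
    ("P1", "right"), ("P2", "right"), ("P3", "left"), ("P4", "right"),
    ("P5", "left"), ("P6", "right"), ("P7", "left"),
]
quotas = {"left": 2, "right": 2}  # illustrative quota sizes only

sample = quota_sample(volunteers, quotas)
print(sample)
```

Here P4 is turned away because the right-handed quota is already full. Selection depends on who turns up first rather than on chance, which is why quota sampling is non-random and can introduce bias.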
Quota & Opportunity Sampling
Quota Sampling

How to carry out
What is it: Population divided into groups according to characteristic. A quota of items/people in each group is set to try and reflect the group’s proportion in the whole population. Interviewer selects the actual sampling units.

Advantages
• Allows small sample to still be representative of population.
• No sampling frame required.
• Quick, easy, inexpensive.
• Allows for easy comparison between different groups in population.

Disadvantages
• Non-random sampling can introduce bias.
• Population must be divided into groups, which can be costly or inaccurate.
• Increasing scope of study increases number of groups, adding time/expense.
• Non-responses are not recorded.

Opportunity/Convenience Sampling

How to carry out
What is it: Sample taken from people who are available at time of study, who meet criteria.

Advantages
• Easy to carry out.
• Inexpensive.

Disadvantages
• Unlikely to provide a representative sample.
• Highly dependent on individual researcher.
Example Question
Edexcel S3 June 2010 Q2

Exercise 1C
Pearson Statistics & Mechanics Year 1/AS
Pages 8-9
Types of Data

Qualitative/Categorical: Non-numerical values, e.g. colour.

Quantitative: Numerical values.
• Discrete: Can only take specific values, e.g. shoe size, number of children.
• Continuous: Can take any decimal value (possibly within a specified range).

Note that while discrete variables only allow specific values, the range could still be infinite, e.g. “number of attempts before success”.

Data can be grouped for conciseness, at the expense of losing the exact original values.

Weight (kg)    Frequency
20 ≤ 𝑤 < 70    ...

20 ≤ 𝑤 < 70 is known as a class interval.

Lower class boundary = 20
Upper class boundary = 70
Class width = 70 − 20 = 50
Midpoint = 45
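As a quick check of the class interval terms, the class width and midpoint for 20 ≤ 𝑤 < 70 can be computed directly from the two class boundaries (variable names are illustrative):

```python
lower_boundary = 20  # lower class boundary (kg)
upper_boundary = 70  # upper class boundary (kg)

class_width = upper_boundary - lower_boundary     # 70 - 20
midpoint = (lower_boundary + upper_boundary) / 2  # average of the two boundaries

print(class_width, midpoint)
```

This gives a class width of 50 and a midpoint of 45, matching the value on the slide.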
Exercise 1D
Pearson Statistics & Mechanics Year 1/AS
Page 10

(This exercise could probably be skipped)


Name That Sampling Method!
Options: Simple Random Sampling, Systematic Sampling, Stratified Sampling, Quota Sampling, Opportunity Sampling

Suggest a suitable sampling method.

“You wish to test lightbulbs produced by a factory in a daily batch.”
Probably systematic sampling, as the method of choosing items is simpler than simple random sampling (where it would be time-consuming to find specifically chosen random light bulbs). Sampling frame is known.

“You wish to survey consumer opinion on your new drink FizzGuzz released in the UK.”
Quota sampling or opportunity sampling. We’d realistically not have access to the sampling frame (i.e. a list of all UK residents).

“You wish to determine students’ favourite TV programmes in your school, that is fairly representative of each year group.”
Stratified sampling. We (probably) have access to the sampling frame (i.e. a list of all students). Stratified sampling ensures that each stratum (year group) is proportionately represented.
Large Data Set
All A Level exam boards are obligated to provide a ‘large data set’. Data in exam
questions will often be from this set, and you are encouraged to explore this data
(which is publicly available) in Microsoft Excel.

It is important to note that you are expected to be familiar with this data set
before you go into your exam, including some basic geographic knowledge!

Edexcel’s data set concerns weather data from a number of weather stations. Let’s explore what you might be expected to know…

https://qualifications.pearson.com/content/dam/pdf/A%20Level/Mathematics/2017/specification-and-sample-assesment/Pearson%20Edexcel%20GCE%20AS%20and%20AL%20Mathematics%20data%20set%20-%20Issue%201%20(1).xls
What You Need To Be Familiar With…

[Map: weather station locations, Northern Hemisphere and Southern Hemisphere]

1 You should know the names and rough locations of the 5 UK weather stations, as well as the 3 international weather stations.

The data was recorded for:
• May-Oct 1987
• May-Oct 2015

2 You should be familiar with the variables involved and their respective units. All the following are daily…

Total rainfall (in mm)
tr/trace means less than 0.05mm.

Mean Windspeed
kn/knot is “nautical mile per hour”. Windspeed also given on Beaufort Scale:
• 0 = Calm
• 1-3 = Light, 1-10kn
• 4 = Moderate, 11-16kn
• 5 = Fresh, 17-21kn

Mean Visibility
How far (in metres) can be seen into the horizon during daylight hours.

Wind Direction

Mean Pressure
In hectopascals (hPa).

Mean temperature (in °C)
Textbook claims this is max temp for UK, but it is mean temp for all locations.

Total sunshine
(nearest tenth of an hour)

Maximum Gust (in kn)
The highest instantaneous wind speed.

Humidity
The % of air saturation with water vapour. 100% is the maximum % water content air can contain.

Mean Cloud Cover
Oktas means the number of eighths of the sky covered.
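The Beaufort bands listed for mean windspeed can be written as a small lookup. This is a sketch: the function name is hypothetical, and treating anything below 1kn as Calm is an assumption, since the slide gives no speed range for Calm. Only the bands that appear in the data set are covered.

```python
def beaufort_category(speed_kn):
    """Map a daily mean windspeed in knots to the Beaufort description on the slide."""
    if speed_kn < 1:
        return "Calm"      # Beaufort 0 (threshold below 1kn is an assumption)
    if speed_kn <= 10:
        return "Light"     # Beaufort 1-3, 1-10kn
    if speed_kn <= 16:
        return "Moderate"  # Beaufort 4, 11-16kn
    if speed_kn <= 21:
        return "Fresh"     # Beaufort 5, 17-21kn
    raise ValueError("speed is above the bands listed for the large data set")


print(beaufort_category(9), beaufort_category(12), beaufort_category(18))
```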
3 You should have a vague idea of the range of values for each location.

UK Location (2015)    Temp Range (°C)    Wind Speed Range (kn)
Camborne              10-20              3-18
Heathrow              8-29               3-19
Hurn                  6-24               2-19
Leeming               4-23               3-17
Leuchars              4-19               3-23

World Location (2015)    Temp Range (°C)    Wind Speed Range (kn)
Beijing                  8-33               2-9
Jacksonville             15-31              1-12
Perth                    8-25               4-14

• Beijing temp range relatively large.
• Min Jacksonville temp high.
• Perth similar to UK.

Mean wind speed in UK across full period was roughly 9 kn. But 4 kn in Beijing (i.e. lower), 5 kn in Jacksonville (again lower), 8 kn in Perth (similar to UK).

From new A Level sample assessment materials:

“A meteorologist believes that there is a relationship between the daily mean windspeed, kn, and the daily mean temperature, °C. A random sample of 9 consecutive days is taken from past records from a town in the UK in July and the relevant data is given in the table below. …
Using the same 9 days, a location from the large data set gave and .
(d) Using your knowledge of the large data set, suggest, giving your reason, the location that gave rise to these statistics.”

(Note to teachers: I will not otherwise use SAM questions in these slides. I made one exception here!)
4 You should have a vague idea of the range of values for each variable for the data set as a whole.

Variable                        Typical value(s)
Gust (UK only)                  8 – 52 kn
Rainfall                        0 – 60 mm in UK, but more extreme maximums elsewhere (e.g. 102mm in Perth)
Pressure                        988 – 1038 hPa
Wind Speed on Beaufort scale    Max is ‘fresh’ (5). Most Light or Moderate.
Sunshine (UK only)              0 – 16 hrs
Cloud Cover                     0 – 8 oktas (i.e. full spread)
Example Questions
[Textbook]
[Data table: Hurn, © Crown Copyright Met Office 1987]

a) Describe the type of data represented by daily total rainfall.

Alison is investigating daily maximum gust. She wants to select a sample of size 5 from the first 20 days in Hurn in June 1987. She uses the first two digits of the date as a sampling frame and generates five random numbers between 1 and 20.
b) State the type of sample selected by Alison.
c) Explain why Alison’s process might not generate a sample of size 5.

a Continuous quantitative data.
b Simple random sample.
c Some of the data values are not available (n/a).

As previously noted, the actual data set has mean temperature for all locations. I changed to maximum temperature for this example for consistency with the textbook.
Example Questions
[Textbook]
[Data table: Hurn, © Crown Copyright Met Office 1987]

Calculate:
a) The mean daily maximum temperature for the first five days of June in Hurn in 1987.
b) The median daily total rainfall for the week of 14th June to 20th June inclusive.
c) The median daily total rainfall for the same week in Perth was 19.00mm. Karl states that more southerly countries experience higher rainfall during June. State with a reason whether your answer to part (b) supports this statement.

b Values in ascending order: 0, 0, tr, 0.1, 3.7, 5.6, 7.4.
Median is 0.1mm.

c Perth is in Australia, which is south of the UK, and the median rainfall was higher. However, this is a very small sample from a single location in each country so does not provide enough evidence to support Karl’s statement.
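The median in part (b) can be checked with a short sketch. A trace reading has no exact value, so the stand-in of 0.025mm below is an assumption. Any value strictly between 0 and 0.05 gives the same median, because it only affects where ‘tr’ sits in the ordering.

```python
import statistics

TRACE_MM = 0.025  # assumed stand-in for 'tr' (trace means less than 0.05mm)

# The seven rainfall readings listed in the answer to part (b).
readings = [0, 0, "tr", 0.1, 3.7, 5.6, 7.4]

# Replace the trace marker with the numeric stand-in before ordering.
numeric = [TRACE_MM if r == "tr" else r for r in readings]

median_rainfall = statistics.median(numeric)  # middle (4th) of the 7 ordered values
print(median_rainfall)
```

With seven values the median is the 4th in order, which is 0.1mm, as on the slide.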
Exercise 1E
Pearson Statistics & Mechanics Year 1/AS
Pages 13-15
