STAT 111: Introduction To Statistics and Probability: Lecture 2: Data Reduction



Ridwan R.D

Department of Statistics and Actuarial Science


University of Ghana
Outline

• Populations and samples

• Data types: qualitative versus quantitative

• Data displays: pie chart, bar chart, stem-and-leaf plot, dot plot, cumulative frequency diagram, histogram

• Data numerical summaries: sample mean, sample median, sample variance and sample moments
Populations and samples

A sample is a subset taken from a larger population.

• Example: all the students in this lecture theatre form a sample taken from the population of all students in UG.

Elements in the sample are called sampling units.

• Example above: the students are the sampling units.


Populations and samples

• A random sample is a collection of sampling units chosen entirely at random. We call this fair sampling.

• If the units are selected unfairly, the result is called a biased sample.

Example: in an opinion poll, a predetermined proportion is selected from each social group.
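
Fair sampling can be illustrated with a short Python sketch. It assumes the population can be listed in full; the ID numbers, the sample size of 20 and the seed are hypothetical choices made only for illustration.

```python
import random

# Hypothetical population: ID numbers standing in for 1000 students.
population = list(range(1, 1001))

# Fixed seed (an arbitrary choice) so that the run is repeatable.
rng = random.Random(111)

# Simple random sample of 20 sampling units, drawn without replacement:
# every student has the same chance of being selected.
sample = rng.sample(population, k=20)

print(sorted(sample))
```

By contrast, taking the first 20 IDs in the list would not be fair sampling, because students later in the list could never be chosen.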
Data types
• Variates are quantities (e.g., height of students) or items observed (e.g., colour of shoes of students) which are associated with the sampling units (e.g., the students).

Qualitative (descriptive, non-numerical):

• Nominal - named, unordered categories

• Ordinal - ordered categories


Data types
Quantitative (numerical):

• Discrete - distinct values (usually integer values)

• Continuous - any real value in a range

Data consists of a collection of observations (e.g., heights of students in this lecture theatre).


Qualitative variates

• A variate is qualitative if there is no meaningful way to perform arithmetic operations with it.

Examples:

• collect autumn leaves; the variate can be the species of tree from which they come

• observe vehicles passing a census point and record their type (car, bus, bicycle); the variate here is the vehicle type
Qualitative variates
• Variates which appear numerical can still be qualitative.

• Example: look at the next 20 buses to go past and record the numbers on the front. 73 is bigger than 10 as a number, but a Number 73 bus is not "greater" than a Number 10 bus. The number simply labels the bus, so the bus number is a nominal (unordered) qualitative variate.
Qualitative variates
• All the previous examples are unordered (nominal) variates. It is also possible to have ordered (ordinal) qualitative variates:

• feedback forms with a 5-point scale from "Very poor" to "Very good"

• clothes sizes: S, M, L, XL
Quantitative variates
• A quantitative variate is a numerical variate for which a quantity such as the difference between two observations can be defined.
• Examples:
• number of passengers in vehicles passing a checkpoint
• heights of students in a class
• volume of noise made by aircraft taking off
• number of students in lecture theatres

The first and last of these examples are discrete; the second and third are continuous.
Data displays
A good visual display aids understanding and can highlight features which may be worth exploring.

• Displays should be appropriate, have impact and be accurate.
• Displays should be uncluttered and straightforward to grasp.
• A graphical summary of the data is a very useful starting point.
Pie and bar charts

• Both are useful for categorical data with a small number of categories.

• In a pie chart, the area of a segment represents the category value.

• In a bar chart, the height of a bar represents the value.

• Generally, bar charts are better for display: relative lengths are easier to judge than relative areas.
Good and bad examples: Bar chart
[Figure: good and bad examples of bar charts]
Steps in constructing a bar chart

1. Draw a horizontal line with category names at regular intervals underneath.

2. Draw a vertical axis scaled to the largest frequency value.

3. Above each category, draw a rectangle whose height equals the category's frequency.
Steps in constructing a bar chart

Notes:

• Relative frequencies may be used instead.

• All rectangles must have the same width and must not overlap.

• If the categories are integer values, centre the rectangles on the category values.
Bar chart: Example.

A Ridge Hospital ward recorded the following distribution of the number of deaths each month over a 47-month period.

Number of deaths per month:   0    1    2    3    4    5
Number of such months:        6   11   16    9    4    1


Bar chart: Example.
Solution
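
A text-based version of this bar chart can be produced with a short Python sketch. It is only an illustration, not the lecture's own solution figure: it draws horizontal bars rather than vertical ones, and the function name `text_bar_chart` is hypothetical. It also computes the relative frequencies, which may be plotted instead of the counts.

```python
# Data from the example: deaths per month and the number of such months.
deaths = [0, 1, 2, 3, 4, 5]
months = [6, 11, 16, 9, 4, 1]

total = sum(months)  # 47 months in all
relative = [m / total for m in months]  # relative frequencies

def text_bar_chart(categories, frequencies):
    """Return one line per category; the bar length equals the frequency."""
    lines = []
    for c, f in zip(categories, frequencies):
        lines.append(f"{c:>2} | {'#' * f} {f}")
    return lines

for line in text_bar_chart(deaths, months):
    print(line)

for c, p in zip(deaths, relative):
    print(f"{c} deaths: relative frequency {p:.3f}")
```

Every bar uses the same character width per observation, so bar lengths are proportional to the frequencies, as the construction steps require.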
Pie chart

Suitable for qualitative (categorical) data: data values associated with a small number of categories. Areas reflect these values.

• Example: consider the following data set (coloured balls):

Colour:   Red   Orange   Yellow   Green   Blue
Count:      9        4       12      21     17
Pie chart

• Assuming a circle of radius r = 1, what would be the area associated with the Green slice in a pie chart?

• The total count is 63, so the proportion of pie area for Green is $\frac{21}{63} = \frac{1}{3}$. Hence, the area for Green is $\frac{\pi r^2}{3} = 1.047198$.
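
The same calculation can be repeated for every colour. The Python sketch below is an illustration only; it also reports each slice's angle in degrees (360 times the proportion), which is an addition not stated on the slide, and the variable names are hypothetical.

```python
import math

# Counts of coloured balls from the example.
counts = {"Red": 9, "Orange": 4, "Yellow": 12, "Green": 21, "Blue": 17}

r = 1.0                          # radius of the pie chart
total = sum(counts.values())     # 63 balls in all
circle_area = math.pi * r ** 2   # area of the whole pie

slices = {}
for colour, n in counts.items():
    proportion = n / total
    slices[colour] = {
        "proportion": proportion,
        "angle_deg": 360 * proportion,     # angle at the centre of the slice
        "area": circle_area * proportion,  # area of the slice
    }

for colour, s in slices.items():
    print(f"{colour:<7} {s['proportion']:.4f} {s['angle_deg']:7.2f} {s['area']:.6f}")
```

For Green this reproduces the proportion 1/3 and the area 1.047198 found above.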
Pie and bar charts

Bar charts are much better than pie charts for displaying categorical data:

• relative lengths (of bars) are easier to judge than relative areas (in a pie chart)

• separate bars represent the distinct values observed

• bar heights are proportional to the frequency (number of observations)
Stem and leaf plots.

• Useful for a small set of numeric data.

• Gives an impression of the location, spread and shape of the values.

• Usually there is only one digit in each leaf value, in which case spaces are not used to separate the leaves.
Steps to construct Stem-and-Leaf Plots.
1. Select one or more leading digits to make the stem values. The remaining digits become the leaves.

2. List the possible stem values vertically.

3. Record each observation by its leaf value in the corresponding stem row.
Steps to construct Stem-and-Leaf Plots.
4. Indicate the stem and leaf units nearby.

5. If there are some large or small isolated values, list those separately near the stem plot.

6. Optionally, record the frequency or cumulative frequency in another column.

For ordered stem-and-leaf plots, sort the leaf values in each row.
Stem and leaf plots.
Example.

Consider 20 claim sizes on a certain insurance policy:

54 127 700 250 81 17 90 28 1100 320

547 14 38 51 130 31 148 600 130 200
Stem and leaf plots.
Solution: stem and leaf plot (for previous data)
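
One way to build an ordered stem-and-leaf plot for the claim sizes is sketched below in Python. The choices here are assumptions, not necessarily those of the lecture's solution: the stem unit is 100, the leaf unit is 10 (the units digit is dropped), and values of 1000 or more are listed separately as isolated high values. The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# The 20 claim sizes from the example.
claims = [54, 127, 700, 250, 81, 17, 90, 28, 1100, 320,
          547, 14, 38, 51, 130, 31, 148, 600, 130, 200]

STEM_UNIT = 100     # assumed: stems count hundreds
LEAF_UNIT = 10      # assumed: leaves count tens; the units digit is dropped
HIGH_CUTOFF = 1000  # assumed: values at or above this are listed separately

def stem_and_leaf(values):
    """Return (rows, high): rows maps each stem to its sorted leaves."""
    main = [v for v in values if v < HIGH_CUTOFF]
    high = sorted(v for v in values if v >= HIGH_CUTOFF)
    rows = defaultdict(list)
    # Processing values in ascending order leaves each row already sorted.
    for v in sorted(main):
        rows[v // STEM_UNIT].append((v % STEM_UNIT) // LEAF_UNIT)
    # Include empty stems so that gaps in the data stay visible.
    stems = range(min(rows), max(rows) + 1)
    return {s: rows.get(s, []) for s in stems}, high

rows, high = stem_and_leaf(claims)
for stem, leaves in rows.items():
    print(f"{stem:>2} | {''.join(str(d) for d in leaves)}")
print("HI:", ", ".join(str(v) for v in high))
print(f"Stem unit = {STEM_UNIT}, leaf unit = {LEAF_UNIT}")
```

Under these assumptions most claims fall in the 0 and 1 stems, which shows the strong right skew of the claim sizes at a glance.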
Variations of Stem-and-Leaf plots

• There is no unique convention for drawing stem-and-leaf plots; you will find a range of conventions in different textbooks.

• Some insist on having a cumulative frequency column; in some the stems increase going up, in others going down.
Variations of Stem-and-Leaf plots

• There are variations that "open up" the stems if the majority of the observations are in one or two leaves.

• You should be aware of these variations in case you come across them, but computer packages can output them automatically, so we will not go into their hand construction here.
