Chapter One: Basic Statistical Concepts and Notations
Objectives:
1. define statistics;
2. differentiate descriptive from inferential statistics;
3. distinguish a variable from a constant, population from a sample, parameter
from an estimate;
4. differentiate qualitative from quantitative variables;
5. distinguish nominal, ordinal, interval and ratio measurement scales; and
6. understand the different theorems on summation notation.
Introduction
Students often wonder why statistics is so difficult and why they should bother
to study the subject. Many students regard statistics as a hindrance to finishing their
studies because they have a negative attitude towards it. Based on the
feedback of students, statistics is the most feared subject since it deals with numerical
data. Furthermore, they say they are not inclined to work with numbers.
Much of the literature of any specialized field, particularly those dealing with
research, contains statistical symbols, ideas and concepts. It is therefore necessary to
build up an adequate vocabulary for statistics. Students need a good grasp of
the vocabulary of statistics so that they can follow the succeeding
lessons. When students are familiar with the language and vocabulary of statistics, they will
find the succeeding lessons less difficult.
This chapter introduces the student to some basic statistical concepts and
notations.
What is Statistics?
Statistics is a collection of methods for planning experiments, obtaining data, and
then organizing, summarizing, presenting, analyzing, interpreting, and drawing
conclusions based on the data (Triola, 1998). It consists of a body of methods for
collecting and analyzing data (Agresti & Finlay, 1997). It is a method of dealing with
data. It is a tool concerned with the collection, organization, presentation, analysis and
interpretation of numerical information. Statistics is the science of collecting,
simplifying, and describing data, as well as making inferences (drawing conclusions)
based on the analysis of data (Chase and Bown, 1997). Its essential purpose is to describe
and draw inferences about numerical properties of a group or population (Walpole,
1989).
Example 1: If we measure the Intelligence Quotient (IQ) of all the students in the
School of Graduate Studies and calculate its mean, that mean is a
descriptive statistic because it describes a characteristic of a
complete population.
In a random sample, members of the population are selected in such a way that
each has an equal chance of being selected. In the second example, the population is the
total number of students enrolled in Inferential Statistics during the first semester of
school year 2017 – 2018.
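Simple random selection can be sketched in a few lines of Python. This is only an illustration: the population of 40 student ID numbers and the sample size of 10 are assumed values, not figures from the text.

```python
import random

# Hypothetical population: ID numbers of 40 enrolled students (illustrative only).
population = list(range(1, 41))

random.seed(1)  # fixed seed so the run is repeatable
# random.sample draws without replacement; every member is equally likely to be chosen.
sample = random.sample(population, k=10)

print(sorted(sample))
```

Because the draw is made without replacement, no student can appear in the sample twice.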
Inferential statistics is a method that involves the use of sample data to make
generalizations or inferences about a population (Triola, 1998). It is concerned with
generalizing this information or, more specifically, with making inferences about populations
that are based upon samples taken from those populations (Runyon & Haber, 1986). It is
concerned with judgments (or inferences) about a population based on the properties of
some sample obtained from the population (Chase and Bown, 1997). Here, a sample is
selected with the intent of predicting what the larger population is like.
Example 1: If we wish to make a statement about the mean IQ of all students in
the School of Graduate Studies at the Liceo de Cagayan University based on
a sample of 100 students, and to estimate the error involved, we use procedures
from inferential statistics.
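The contrast between describing a complete population and inferring from a sample of 100 can be sketched as follows. The IQ scores are simulated (a hypothetical population of 1,200 students with mean 100 and standard deviation 15), and the error is estimated with the usual standard error of the mean, s / sqrt(n), ignoring any finite population correction.

```python
import math
import random
import statistics

random.seed(42)
# Simulated IQ scores for a hypothetical population of 1,200 graduate students.
population = [random.gauss(100, 15) for _ in range(1200)]

# Descriptive: the mean of the complete population describes that population exactly.
population_mean = statistics.mean(population)

# Inferential: estimate the population mean from a random sample of 100 students.
sample = random.sample(population, k=100)
sample_mean = statistics.mean(sample)
# Estimated standard error of the sample mean: s / sqrt(n).
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f} (standard error {standard_error:.2f})")
```

The sample mean will generally differ from the population mean; the standard error indicates roughly how large that difference is likely to be.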
Example 2: A researcher randomly selected 520 college students from the School
of Education, School of Arts and Sciences, School of Business
Administration and Information Technology and School of Extension
and Community Education. Each school is well represented through
stratified random sampling with proportionate allocation. If the
researcher compares the average grades of the schools,
procedures from inferential statistics are used.
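Proportionate allocation gives each stratum the same share of the sample that it has of the population. The sketch below assumes four schools with hypothetical enrollment figures; only the total sample size of 520 comes from the example.

```python
# Hypothetical enrollment per school (illustrative numbers, not from the study).
enrollment = {
    "Education": 2000,
    "Arts and Sciences": 1500,
    "Business Administration and Information Technology": 1200,
    "Extension and Community Education": 500,
}
total_sample = 520
total_enrollment = sum(enrollment.values())

# Proportionate allocation: each stratum's share of the sample equals
# its share of the population, n_h = n * N_h / N.
allocation = {
    school: round(total_sample * size / total_enrollment)
    for school, size in enrollment.items()
}

for school, n_h in allocation.items():
    print(f"{school}: {n_h}")
```

With these assumed figures the allocations are whole numbers that add up to 520. With other enrollments, rounding may leave the total one or two off, and a small manual adjustment is then needed.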
Academic records of the graduating classes during the past years at Liceo de
Cagayan University show that 90% of the entering freshmen eventually graduated. The
numerical value, 90%, is a descriptive statistic. If you are a member of the present
freshman class and conclude from this study that your chances of graduating are better
than 80%, you have made a statistical inference that is subject to uncertainty.
Some terms and concepts you will meet often are the following:
divisions, categories, characteristics, or values (Grimm & Wozniak, 1990). When we
assign names, numerals, and numbers to things that change in the real world, we have
variables. In fact, we would find it impossible to understand the world without variables.
Other examples of variables are shirt size (small, medium, large,
extra large); social class, with categories of upper, middle and lower class; and religion, with
categories of Roman Catholic, Protestant, Seventh Day Adventist, Mormon, etc.
Example: We may wish to investigate all the students' grades after this course to find out
the relationship between their Grade Point Average and their scores in other
foundation subjects.
Example: We may choose only 100 students from the School of Graduate Studies at the
Bukidnon State College. The 100 students are then the sample.
A census is the collection of data from every element in the population (Triola, 1998). A
census is also called complete enumeration.
Closely related to the concepts of population and sample are the concepts of parameter
and statistic. The following definitions are easy to remember if we recognize the
alliteration in “population parameter” and “sample statistic.”
Parameter and Estimates
Example 1. The grade point average and standard deviation of all students in the School
of Graduate Studies are examples of parameters.
Example 2. The average weekly amount spent by all the students of Bukidnon State
College.
Example 1. The grade point average and standard deviation (x̄ and s) of a random
sample of 50 students in the School of Graduate Studies are examples of
statistics.
Example 2. The average weekly amount spent by all the students of Bukidnon State
College.
Table 1.
Some common notations of the characteristics with the corresponding parameters and
statistics
Characteristic        Parameter     Statistic
Mean                  μ (mu)        X̄
Standard deviation    σ             s
Variance              σ²            s²
Number of cases       N             n
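The notation in Table 1 can be tied to a computation. In the sketch below the scores are hypothetical, and the sample is fixed by hand rather than drawn at random. It relies on Python's standard `statistics` module, in which `pstdev` and `pvariance` use the population formulas (dividing by N) while `stdev` and `variance` use the sample formulas (dividing by n - 1).

```python
import statistics

# Hypothetical scores for a small population of N = 8 students.
population = [78, 85, 90, 72, 88, 95, 81, 79]
# A sample of n = 4 taken from it (fixed here for illustration; normally drawn at random).
sample = [78, 90, 88, 79]

# Parameters use the population formulas (divide by N).
mu = statistics.mean(population)
sigma = statistics.pstdev(population)
sigma_sq = statistics.pvariance(population)

# Statistics use the sample formulas (divide by n - 1).
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)
s_sq = statistics.variance(sample)

print(f"N = {len(population)}, mu = {mu}, sigma = {sigma:.3f}, sigma^2 = {sigma_sq:.3f}")
print(f"n = {len(sample)}, x_bar = {x_bar}, s = {s:.3f}, s^2 = {s_sq:.3f}")
```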
The Nature of Data
Some data sets consist of numbers (such as heights, test scores, etc.) and others
are non-numerical (such as gender). The terms quantitative and qualitative data are often
used to distinguish between these two types.
The nominal level of measurement is characterized by data that consist of names, labels,
or categories only. The data cannot be arranged in an ordering scheme (such as low to
high). This simplest measurement scale is termed nominal or classificatory. A nominal
measurement scale is one in which the researcher assigns different numbers to mutually
exclusive categories. Mutually exclusive categories are those in which all observations
assigned to the same category share a characteristic, and differ on the
basis of that characteristic from observations in other categories. The categories of
nominal variables do not differ by quantity, degree, or amount, but only by kind.
Example 1. The two categories of the nominal variable “gender” (male and female) are
distinct, do not overlap, include all possible sexes, and cannot be ordered or
ranked.
Example 2. The same would be true of the nominal variable “region” which might be
broken into the categories of NCR, Region I, Region II, Region III, Region
IV, Region V, Region VI, Region VII, Region VIII, Region IX, Region X,
Region XI, Region XII, and ARMM, etc.
Nominal scales represent the lowest level of measurement because they allow you
only to count and compare the number of cases in each category.
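Counting cases per category is the one operation a nominal scale supports, and it can be sketched as below. The list of responses is hypothetical.

```python
from collections import Counter

# Hypothetical responses on the nominal variable "region".
regions = ["NCR", "Region X", "Region X", "ARMM", "NCR", "Region X", "Region I"]

counts = Counter(regions)
# Counting cases per category and finding the most frequent category (the mode)
# are meaningful; averaging or ranking the labels is not.
most_common_region, frequency = counts.most_common(1)[0]

print(counts)
print(most_common_region, frequency)
```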
Other examples of nominal scales are given below:
The ordinal measurement scale involves data that may be arranged in some order, but
differences between data values either cannot be determined or are meaningless. An
ordinal measurement scale classifies people or things into types or kinds, but with one
additional feature: the classes or categories can be ranked. Ordinal categories are
distinct, mutually exclusive, and exhaustive, but they are also orderable in terms of
quantity, magnitude, or some other criterion. In other words, ordinal measurement scales
have the property of magnitude but not the property of equal intervals or the property of
absolute zero. An ordinal scale allows us to rank individuals or objects but not to say anything about the
meaning of the differences between the ranks.
Example 1. The three categories of the ordinal scale “social class” (upper,
middle, and lower) are distinct, do not overlap, include the entire range of
social class, and can be ranked: the upper class is higher than the middle class
and the middle class is higher than the lower class. No statement can be made,
however, about the amount of difference between categories. The differences
between upper and middle and between middle and lower are not calculable.
Another example is ranking students by GPA. If you ranked 1st in a class of 400,
the rank indicates greater than or less than, but not how much higher or lower.
Example 2. The screening committee in a search for junior mathematicians ranks Dina 3rd,
Mercy 7th, and Kim 10th. We can find a difference between ranks of 3 and 7,
but the difference of 4 does not mean anything.
The interval level of measurement is like the ordinal level, with the additional property
that we can determine meaningful amounts of differences between data.
However, there is no inherent (natural) zero starting point (where none of the
quantity is present).
however, with ratio variables there is a true zero point where zero is equivalent to a total
absence of the variable.
Example 1. Time measured by calendars, temperature on the Fahrenheit scale,
and intelligence measured by IQ scores are interval variables because zero values do not
mean the total absence of time, temperature, or intelligence, respectively.
Example 2. Students' scores in the college admission test and their grades on the card are
measured on an interval scale.
The ratio level of measurement is the interval level modified to include an inherent
zero starting point (where zero indicates that none of the quantity is present).
For values at this level, differences and ratios are both meaningful.
For most statistical purposes interval and ratio scales are treated as a similar type
of measurement scale. Note, however, that a major difference is the fact that one cannot
form ratios with values on an interval scale. For example, it is incorrect to say that 60° is
twice as hot as 30°; but it is correct to say that PhP 60,000.00 is twice as much as PhP
30,000.00. Because of the scarcity of interval variables, the ambiguity concerning the
differences between interval and ratio scales, and their similar statistical treatment, it
makes sense to treat these two types of measurement scales as one type. In general, we
should not calculate averages for data at the nominal or ordinal levels of measurement.
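One way to see why ratios fail on an interval scale is to change the unit and check whether the ratio survives. The sketch below assumes that the temperatures above are in degrees Fahrenheit and uses a hypothetical exchange rate of 50 pesos per dollar; neither assumption comes from the text.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: the ratio of two temperatures depends on the arbitrary zero point.
ratio_fahrenheit = 60 / 30
ratio_celsius = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)

# Ratio scale: money has a true zero, so the ratio survives a change of unit.
PESOS_PER_DOLLAR = 50.0  # hypothetical exchange rate, for illustration only
ratio_pesos = 60_000 / 30_000
ratio_dollars = (60_000 / PESOS_PER_DOLLAR) / (30_000 / PESOS_PER_DOLLAR)

print(ratio_fahrenheit, round(ratio_celsius, 2))
print(ratio_pesos, ratio_dollars)
```

The temperature ratio is 2 in Fahrenheit but about -14 in Celsius, so "twice as hot" has no fixed meaning. The money ratio is 2 in either currency.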
C. Indicate whether the following variables are categorical or quantitative:
1. Favorite food.
2. Favorite profession.
3. Number of goals scored by your favorite team last season.
4. Number of students at your school.
5. The eye color of your classmates.
6. IQ of your classmates
Summation Notation:
The notation i = 1 below the summation sign and n above it indicates that i takes on the
successive values 1, 2, 3, up to n.
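The role of the index i can be sketched as a loop. The values below are hypothetical; note that Python lists are numbered from 0, so X_i is stored at position i - 1.

```python
# Hypothetical values X_1, ..., X_n (Python lists are 0-based, so X_i is x[i - 1]).
x = [4, 7, 1, 8]
n = len(x)

total = 0
for i in range(1, n + 1):  # i takes the successive values 1, 2, ..., n
    total += x[i - 1]

# The built-in sum() performs the same accumulation.
assert total == sum(x)
print(total)
```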
There are three important theorems governing the use of the summation sign. Ferguson
(1989) gives them as follows:
Theorem 1
If every variate value in a group is multiplied by a constant number or factor, that
factor may be removed from under the summation sign and written outside as a factor. In
symbols:
$\sum_{i=1}^{n} C X_i = C X_1 + C X_2 + C X_3 + \dots + C X_n$
$= C (X_1 + X_2 + X_3 + \dots + X_n)$
$= C \sum_{i=1}^{n} X_i$

Example 1. $\sum_{i=1}^{5} 5 X_i$ may be written as $5 \sum_{i=1}^{5} X_i$.
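Theorem 1 can be checked numerically. The five X values below are hypothetical; the constant 5 is the one used in Example 1.

```python
# Hypothetical values X_1, ..., X_5 and the constant C = 5 from Example 1.
x = [2, 4, 6, 8, 10]
c = 5

left = sum(c * xi for xi in x)   # sum of C * X_i
right = c * sum(x)               # C times the sum of X_i

print(left, right)
```

Both sides give the same total, whichever values are chosen for X.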
Theorem 2
$\sum_{i=1}^{n} C = C + C + C + \dots + C = nC$

For example, $\sum_{i=1}^{5} 6 = 5 \times 6 = 30$.
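The same result can be confirmed with a short sketch, using n = 5 and C = 6 as in the computation above.

```python
n = 5
c = 6

# Summing the constant C once for each i = 1, ..., n gives n * C.
left = sum(c for _ in range(n))
right = n * c

print(left, right)
```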
Theorem 3
The summation of the sum of any number of terms is the sum of the summation of
these terms taken separately.
Thus
$\sum_{i=1}^{n} (X_i + Y_i + Z_i) = (X_1 + Y_1 + Z_1) + (X_2 + Y_2 + Z_2) + (X_3 + Y_3 + Z_3) + \dots + (X_n + Y_n + Z_n)$
$= \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i + \sum_{i=1}^{n} Z_i$
Example 3. $\sum_{i=1}^{3} (A_i + B_i + C_i)$ is evaluated as
$\sum_{i=1}^{3} A_i + \sum_{i=1}^{3} B_i + \sum_{i=1}^{3} C_i$
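A numerical check of Theorem 3 in the form of Example 3 might look like this. The values of A, B and C are hypothetical.

```python
# Hypothetical values A_i, B_i, C_i for i = 1, 2, 3.
a = [1, 2, 3]
b = [10, 20, 30]
c = [100, 200, 300]

left = sum(ai + bi + ci for ai, bi, ci in zip(a, b, c))  # sum of (A_i + B_i + C_i)
right = sum(a) + sum(b) + sum(c)                          # separate sums added together

print(left, right)
```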
a. $\sum_{i=1}^{4} X_i$          c. $\sum_{i=1}^{4} 3$
b. $\sum_{i=1}^{4} 2X_i$         d. $\sum_{i=1}^{4} X_i^2$
Answers:
a. $\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4$
   = 3 + 2 + 1 + 5
   = 11
b. $\sum_{i=1}^{4} 2X_i = 2 \sum_{i=1}^{4} X_i$
c. $\sum_{i=1}^{4} 3 = 3 + 3 + 3 + 3$
   = 12
d. $\sum_{i=1}^{4} X_i^2 = X_1^2 + X_2^2 + X_3^2 + X_4^2$
   = $3^2 + 2^2 + 1^2 + 5^2$
   = 9 + 4 + 1 + 25
   = 39
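These answers can be verified with a short sketch. It assumes the values X_1 = 3, X_2 = 2, X_3 = 1 and X_4 = 5, which are the ones substituted in answers a and d.

```python
# Values implied by the worked answers: X_1 = 3, X_2 = 2, X_3 = 1, X_4 = 5.
x = [3, 2, 1, 5]

answer_a = sum(x)                      # sum of X_i
answer_b = sum(2 * xi for xi in x)     # sum of 2 * X_i, equal to 2 * answer_a
answer_c = sum(3 for _ in x)           # sum of the constant 3, four times
answer_d = sum(xi ** 2 for xi in x)    # sum of X_i squared

print(answer_a, answer_b, answer_c, answer_d)  # prints: 11 22 12 39
```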
Another concrete illustrative example may prove helpful in grasping the nature of
summation notation. Let the following be paired scores. The values of $X_i$ below may be
viewed as scores for four people on a test, and the values of $Y_i$ may be viewed as scores for
the same four people on another test.
X1 = 5 Y1 = 2
X2 = 6 Y2 = 3
X3 = 12 Y3 = 7
X4 = 15 Y4 = 10
1. $\sum X_i = 5 + 6 + 12 + 15 = 38$
2. $\sum Y_i = 2 + 3 + 7 + 10 = 22$
3. $\sum 5X_i = 5 \times 5 + 5 \times 6 + 5 \times 12 + 5 \times 15$
4. $\sum X_i^2 = 5 \times 5 + 6 \times 6 + 12 \times 12 + 15 \times 15 = 430$
5. $\sum (X_i - 5) = (5 - 5) + (6 - 5) + (12 - 5) + (15 - 5) = 18$
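The five sums can be reproduced from the paired scores given above. The variable names in this sketch are illustrative.

```python
# Paired scores from the example: X_1..X_4 and Y_1..Y_4.
x = [5, 6, 12, 15]
y = [2, 3, 7, 10]

sum_x = sum(x)                            # 1. sum of X_i
sum_y = sum(y)                            # 2. sum of Y_i
sum_5x = sum(5 * xi for xi in x)          # 3. sum of 5 * X_i
sum_x_sq = sum(xi ** 2 for xi in x)       # 4. sum of X_i squared
sum_x_minus_5 = sum(xi - 5 for xi in x)   # 5. sum of (X_i - 5)

print(sum_x, sum_y, sum_5x, sum_x_sq, sum_x_minus_5)  # prints: 38 22 190 430 18
```

Item 3 agrees with Theorem 1, since 5 × 38 = 190. Item 5 agrees with Theorems 2 and 3, since 38 - 4 × 5 = 18.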
Exercises
1. X1 + X2 + X3 + … + Xn
2. (X1 + Y1) + (X2 + Y2) + … + (X9 + Y9)
3. $X_1^2 Y_1 + X_2^2 Y_2 + X_3^2 Y_3 + X_4^2 Y_4 + \dots + X_{12}^2 Y_{12}$
7
1. +
∑ i1 i1
=
=
5
N
3
Xi ∑( ) a b
3.
i i
2. 4. Xc
∑ ∑ i
i
=
Yi Zi 1 5 i
= + 1
a. ()
∑
d.
Xi ∑2
i
7 + = Xi
i 1 1
=
5 5
3
2
b. e.
∑ Xi ∑ Xi
i 1 i
7 = + = −
1
5
4
c. ()
∑
f.
Xi Xi Xi ∑
2 i 1 i
= +− = Xi
2 1
a.
∑ ()
2
3
i =4 Xi Xi 1 −
b.
∑( )
i
1 = Xi +
2 2
c.
∑( ) X X i i
i 2
2/ =
+
X1 = 3 Y1 = 9
X2 = 5 Y2 = 2
X3 = 5 Y3 = 1
X4 = 4 Y4 = 5
X5 = 8 Y5 = 3
c. ()
∑
h. ()
XiYi ∑
22
X i Yi
i1
=5 1
d. XiYi
∑ 3
i
i.
=5
XiYi ∑2
i =5 1 i =5 1
3
e. ()
∑ Xi
j.
Xi Yi ∑ 2
i = 1 i = 1
Exercises: Calculator warm-up exercises on basic skills and concepts. Given the
expressions, use your calculator to obtain the indicated values.
4. 20! / (15! 5!)
5. 5(3! 4!) / 24
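Calculator results for factorial expressions like these can be cross-checked with Python's standard `math` module. The sketch below reads exercise 4 as 20! / (15! 5!) and exercise 5 as 5(3! 4!) / 24; both readings are assumptions about the intended layout of the fractions.

```python
import math

# Exercise 4: 20! / (15! 5!), using integer division because the quotient is exact.
value_4 = math.factorial(20) // (math.factorial(15) * math.factorial(5))
# The same quantity is the number of combinations of 20 items taken 5 at a time.
assert value_4 == math.comb(20, 5)

# Exercise 5, read here as 5(3! 4!) / 24.
value_5 = 5 * (math.factorial(3) * math.factorial(4)) / 24

print(value_4, value_5)
```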