Chapter One: Basic Statistical Concepts and Notations


Chapter One

Basic Statistical Concepts and Notations

Objectives:

At the end of this chapter, the student should be able to:

1. define statistics;
2. differentiate descriptive from inferential statistics;
3. distinguish a variable from a constant, a population from a sample, and a parameter
from an estimate;
4. differentiate qualitative from quantitative variables;
5. distinguish nominal, ordinal, interval and ratio measurement scales; and
6. understand the different theorems on summation notation.

Introduction

Students often wonder why statistics is so difficult and why they should bother to
study it at all. Many students consider statistics a hindrance to finishing their studies
because they have a negative attitude towards it. Based on student feedback, statistics is
the most feared subject because it deals with numerical data. Furthermore, students say
that they are not inclined to work with numbers.

In order to read and understand a foreign language, it is always necessary to build
up an adequate vocabulary. The beginner should regard statistics as a foreign language,
one that will not remain entirely foreign for long (Guilford & Fruchter, 1981). The
vocabulary consists of concepts that are symbolized by words and by the letter symbols
that are substituted for them. Statistics shares the ordinary symbols for numerical
operations with mathematics in general, so much of the vocabulary is already known to
students. As for the new concepts, their meanings will continue to grow as the student
uses them.

Much of the literature of any specialized field, particularly research literature,
contains statistical symbols, ideas and concepts. It is therefore necessary to build up an
adequate vocabulary for statistics. Students need a good grasp of this vocabulary so that
they can follow the succeeding lessons. Once students are familiar with the language and
vocabulary of statistics, they will not find the succeeding lessons difficult.

This chapter introduces the student to some basic statistical concepts and
notations.

What is Statistics?
Statistics is a collection of methods for planning experiments, obtaining data, and
then organizing, summarizing, presenting, analyzing, interpreting, and drawing
conclusions based on the data (Triola, 1998). It consists of a body of methods for
collecting and analyzing data (Agresti & Finlay, 1997). It is a method of dealing with
data. It is a tool concerned with the collection, organization, presentation, analysis and
interpretation of numerical information. Statistics is the science of collecting,
simplifying, and describing data, as well as making inferences (drawing conclusions)
based on the analysis of data (Chase and Bown, 1997). Its essential purpose is to describe
and draw inferences about numerical properties of a group or population (Walpole,
1989).

As the definitions suggest, there are two branches of statistics. Descriptive
statistics are methods used to summarize the key characteristics of known population
data (Triola, 1998). Descriptive statistics is concerned with the presentation of
information in a convenient, usable and understandable form (Runyon and Haber, 1986).
Furthermore, it includes any kind of data processing designed to summarize or describe
important features of the data without attempting to infer anything beyond the data
themselves. Other writers define descriptive statistics as the procedures used to describe
the properties of a sample, or of a population when complete population data are
available.

Example 1: If we measure the Intelligence Quotient (IQ) of all the students in the
School of Graduate Studies and calculate their mean, that mean is a
descriptive statistic because it describes a characteristic of a
complete population.

Example 2. The average grade of a random sample of 50 students in Inferential
Statistics during the first semester of school year 2017-2018 was 1.65,
with a standard deviation of 0.23. These two values, 1.65 and 0.23, are
called descriptive statistics because they describe the properties of a
sample.
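As a minimal sketch, the two descriptive statistics in Example 2 (a mean and a standard deviation) can be computed with Python's standard `statistics` module. The ten grades below are hypothetical values invented for illustration; they are not the 50 grades summarized in the example.

```python
import statistics

# Hypothetical grades, NOT the data behind Example 2.
grades = [1.5, 1.75, 1.25, 2.0, 1.5, 1.75, 1.5, 2.0, 1.25, 1.5]

mean_grade = statistics.mean(grades)  # sample mean
sd_grade = statistics.stdev(grades)   # sample standard deviation (divisor n - 1)

print(f"mean = {mean_grade:.2f}, standard deviation = {sd_grade:.2f}")
```

Both numbers merely describe these ten grades; nothing is claimed about students outside the list.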

In a random sample, members of the population are selected in such a way that
each has an equal chance of being selected. In the second example, the population
consists of all students enrolled in Inferential Statistics during the first semester of
school year 2017-2018.
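The equal-chance property of a simple random sample is what Python's `random.sample` provides when it draws without replacement. In this sketch the population is a hypothetical list of 400 student ID numbers; both the IDs and the population size are assumptions, not figures from the text.

```python
import random

# Hypothetical population: ID numbers of 400 enrolled students (assumed values).
population = list(range(1, 401))

rng = random.Random(2017)            # fixed seed so the draw can be reproduced
sample = rng.sample(population, 50)  # 50 distinct students, each equally likely

print(sorted(sample)[:10])           # the ten smallest selected IDs
```

Fixing the seed is only a convenience for reproducing the draw; omit it to get a fresh sample each run.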

Inferential statistics involves the use of sample data to make generalizations or
inferences about a population (Triola, 1998). It is concerned with generalizing
information or, more specifically, with making inferences about populations based on
samples taken from those populations (Runyon & Haber, 1986). It is concerned with
judgments (or inferences) about a population based on the properties of some sample
obtained from the population (Chase and Bown, 1997). Here, a sample is selected with the
intent of predicting what the larger population is like.
Example 1: If we wish to make a statement about the mean IQ of all students in
the School of Graduate Studies at Liceo de Cagayan University based on a
sample of 100 students, and to estimate the error involved, we use procedures
from inferential statistics.
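"Estimating the error involved" is commonly done with a confidence interval. The sketch below shows one common approach, a large-sample 95% interval for a mean based on the normal approximation, x̄ ± 1.96·s/√n. The 100 IQ scores are simulated (hypothetical), not data from Liceo de Cagayan University, and later chapters may prefer a different procedure (for example, one based on the t distribution).

```python
import math
import random
import statistics

# Hypothetical sample of 100 IQ scores, simulated only for illustration.
rng = random.Random(1)
iq_scores = [rng.gauss(105, 15) for _ in range(100)]

n = len(iq_scores)
x_bar = statistics.mean(iq_scores)          # sample mean (a statistic)
s = statistics.stdev(iq_scores)             # sample standard deviation
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence

margin = z * s / math.sqrt(n)               # estimated error of the estimate
low, high = x_bar - margin, x_bar + margin
print(f"estimated mean IQ = {x_bar:.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```

The interval is a statement about the unknown population mean, which is what makes this inferential rather than descriptive.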

Example 2. A researcher randomly selected 520 college students from the School
of Education, the School of Arts and Sciences, the School of Business
Administration and Information Technology, and the School of Extension
and Community Education. Each school is adequately represented through
stratified random sampling with proportionate allocation. If the
researcher compares the average grades of the schools, inferential
statistical procedures are used.
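With proportionate allocation, each stratum (here, each school) receives a share of the sample equal to its share of the population, n_h = n × N_h / N. The enrollment figures below are hypothetical; the text gives only the total sample size of 520. The function name is illustrative, and rounding is handled by the largest-remainder method, one of several reasonable choices, so that the school sample sizes add up exactly to 520.

```python
import math

def proportionate_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Allocate a sample of size n to strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())
    exact = {k: n * size / total for k, size in stratum_sizes.items()}
    alloc = {k: math.floor(v) for k, v in exact.items()}
    # Give the remaining units to the strata with the largest fractional parts.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

# Hypothetical enrollments (N_h) for the four schools named in Example 2.
enrollment = {"SED": 1200, "SAS": 900, "SBAIT": 1500, "SECE": 600}
print(proportionate_allocation(enrollment, 520))
```

Students would then be drawn at random within each school, so every school appears in the sample in proportion to its size.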

Academic records of the graduating classes in past years at Liceo de Cagayan
University show that 90% of entering freshmen eventually graduated. The numerical value
90% is a descriptive statistic. If you are a member of the present freshman class and
conclude from this study that your chances of graduating are better than 80%, you have
made a statistical inference that is subject to uncertainty.

In institutions of higher learning, particularly in the School of Graduate Studies,
statisticians with relevant degrees should be identified to direct and assist students in
their data processing, analysis and interpretation. Statistics is indeed a very powerful tool
if properly used, but the abuse and misuse of statistical procedures lead to erroneous
results. One should take care to apply the correct and most appropriate, if not the most
efficient, procedure for the given conditions in order to obtain maximum information from
the available data.
The procedure used to analyze a set of data depends to a large degree on the
method used to collect the information. For this reason, it is desirable in any investigation
to consult a statistician from the time the research study is planned until the final
results are analyzed and interpreted.

Terms and Concepts

Some terms and concepts you will meet often are the following:

Variable and Constant

Variable refers to a characteristic or phenomenon that may take on different
values. In addition, a variable is something that has two or more meaningful and useful
divisions, categories, characteristics, or values (Grimm & Wozniak, 1990). When we
assign names, numerals, and numbers to things that change in the real world, we have
variables. In fact, we would find it impossible to understand the world without variables.

Examples:
1. Grade point average
2. Height
3. Weight
4. Tribe
5. Age

These will take on different values when different individuals are observed.

Other examples of variables are shirt size (small, medium, large, extra large); social
class, with the categories upper, middle and lower class; and religion, with categories
such as Roman Catholic, Protestant, Seventh-day Adventist, and Mormon.

A variable is contrasted with a constant, whose value never changes.

Example: pi (π) is a constant that always takes the value 3.14159…, commonly rounded to 3.1416.

Population, Sample and Census

Population is a complete set of individuals, objects or measurements of interest in a
study. Sometimes the population is a clearly defined set of subjects.

Example: We may wish to investigate the grades of all students after this course to find out
the relationship between their grade point average and their scores in other
foundation subjects.

Sample is a subset of a population; it is a portion of the population. Often it is
impossible to include all members of the population because of cost, time and
manpower constraints, so a subgroup may be selected to represent the total
population.

Example: We may choose only 100 students from the School of Graduate Studies at
Bukidnon State College. These 100 students are the sample.

Census is the collection of data from every element in the population (Triola, 1998). A
census is therefore also called a complete enumeration.

Closely related to the concepts of population and sample are the concepts of parameter
and statistic. The following definitions are easy to remember if we recognize the
alliteration in “population parameter” and “sample statistic.”

Parameter and Estimates

Parameter is any measurable characteristic of the population. It is a numerical
measurement describing some characteristic of a population. Usually, parameters
(population values) are unknown, and we estimate them from sample values. In statistical
notation, Greek letters (e.g., μ and σ) are used to represent population parameters.

Example 1. The grade point average and standard deviation of all students in the School
of Graduate Studies are examples of parameters.

Example 2. The average weekly amount spent by all the students of Bukidnon State
College.

Estimate or statistic is a value calculated from a sample in order to estimate the
population parameter. It is a numerical summary of the sample data. We shall use
Roman letters (e.g., X̄ and s) to represent estimates. Different symbols are thus used for
parameters and statistics.

Example 1. The grade point average and standard deviation (X̄ and s) of a random
sample of 50 students in the School of Graduate Studies are examples of
statistics.

Example 2. The average weekly amount spent by a random sample of students of
Bukidnon State College.

Table 1. Some common notations of the characteristics with the corresponding parameters and
statistics

Characteristic                   | Parameter | Statistic
Mean                             | μ (mu)    | X̄
Standard deviation               | σ         | s
Variance                         | σ²        | s²
Pearson correlation coefficient  | ρ (rho)   | r
Number of cases                  | N         | n
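To make the parameter/statistic distinction in Table 1 concrete, the sketch below treats a small hypothetical list of weekly spending amounts as a complete population, computes μ and σ from all of it, then draws a random sample and computes X̄ and s. Note the divisors: Python's `statistics.pstdev` divides by N (population), while `statistics.stdev` divides by n − 1 (sample). The amounts are invented for illustration.

```python
import random
import statistics

# Hypothetical population: weekly spending (PhP) of all 12 students in a small class.
population = [450, 520, 380, 610, 500, 470, 530, 390, 600, 480, 510, 560]

mu = statistics.mean(population)       # parameter: population mean (mu)
sigma = statistics.pstdev(population)  # parameter: population SD (sigma), divisor N

sample = random.Random(7).sample(population, 5)
x_bar = statistics.mean(sample)        # statistic: sample mean (X-bar)
s = statistics.stdev(sample)           # statistic: sample SD (s), divisor n - 1

print(f"N = {len(population)}, mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"n = {len(sample)}, x_bar = {x_bar:.2f}, s = {s:.2f}")
```

Rerunning with a different seed changes X̄ and s but never μ or σ, which is exactly the difference between a statistic and a parameter.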

The Nature of Data

Some data sets consist of numbers (such as heights or test scores) and others
are non-numerical (such as gender). The terms quantitative and qualitative data are often
used to distinguish between these two types.

1. Quantitative data consist of numbers representing counts or measurements.

Quantitative data can be further described by distinguishing between the discrete and
continuous types.
Discrete data result from either a finite number of possible values or a countable number
of possible values, such as 0, 1, 2, and so on. Continuous data result from infinitely many
possible values that can be associated with points on a continuous scale in such a way that
there are no gaps or interruptions. When data represent counts, they are discrete; when
they represent measurements, they are continuous. The number of students in this class is
discrete data; the amount each student has in his or her wallet now is treated as continuous
data because it is a measurement that can assume any value over a continuous span.

Another way to classify data is to use four levels of measurement: nominal,
ordinal, interval and ratio.

The nominal level of measurement is characterized by data that consist of names, labels,
or categories only. The data cannot be arranged in an ordering scheme (such as low to
high). This simplest measurement scale is termed nominal or classificatory. A nominal
measurement scale is one in which the researcher assigns different numbers to mutually
exclusive categories. Mutually exclusive categories are those in which all observations
assigned to the same category share a similar characteristic and differ, on the basis of that
specific characteristic, from observations in other categories. The categories of nominal
variables do not differ by quantity, degree, or amount, but only by kind.

Example 1. The two categories of the nominal variable “gender” (male and female) are
distinct, do not overlap, include all possible sexes, and cannot be ordered or
ranked.

Example 2. The same would be true of the nominal variable “region”, which might be
broken into the categories NCR, Region I, Region II, Region III, Region
IV, Region V, Region VI, Region VII, Region VIII, Region IX, Region X,
Region XI, Region XII, ARMM, and so on.

Nominal scales represent the lowest level of measurement because they allow you
only to count and compare the number of cases in each category.

Other examples of nominal scales are given below:

The numbers on baseball players' uniforms are nominal in nature. In social
science research, groups in a sample are commonly labeled with numbers (such as 1 =
Matigsalog, 2 = Talaandig, 3 = Higaonon, 4 = Manobo). However, once these numbers
have been attached to categories, averaging them is usually not advisable. On the scale
above for ethnic groups, an average score of 1.87 would have no meaning.
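A short sketch of why numeric codes on a nominal scale should be counted rather than averaged. The 15 coded responses are hypothetical, chosen so that the mean of the codes happens to come out near 1.87; that number names no ethnic group, while the frequency count and the mode remain meaningful.

```python
from collections import Counter
import statistics

# Codes from the text: 1 = Matigsalog, 2 = Talaandig, 3 = Higaonon, 4 = Manobo.
labels = {1: "Matigsalog", 2: "Talaandig", 3: "Higaonon", 4: "Manobo"}

# Hypothetical coded responses from 15 respondents.
codes = [1, 2, 1, 4, 3, 1, 2, 1, 1, 4, 1, 3, 2, 1, 1]

counts = Counter(labels[c] for c in codes)    # frequency in each category
modal_group = labels[statistics.mode(codes)]  # most frequent category
meaningless_mean = statistics.mean(codes)     # computable, but has no meaning

print(counts.most_common())
print("mode:", modal_group)
print("mean of codes:", round(meaningless_mean, 2))
```

Relabeling the groups with different numbers would change the mean but not the counts or the modal group, which shows the mean depends only on the arbitrary coding.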

The ordinal measurement scale involves data that may be arranged in some order, but
differences between data values either cannot be determined or are meaningless. Ordinal
scales classify people or things into types or kinds, with one additional feature: the
classes or categories can be ranked. Ordinal categories are distinct, mutually exclusive,
and exhaustive, but they are also orderable in terms of quantity, magnitude, or some other
criterion. In other words, ordinal scales have the property of magnitude but neither the
property of equal intervals nor that of an absolute zero. They allow us to rank individuals
or objects but not to say anything about the meaning of the differences between the ranks.

Example 1. The three categories of the ordinal variable “social class” (upper,
middle, and lower) are distinct, do not overlap, include the entire range of
social class, and can be ranked: the upper class is higher than the middle class,
and the middle class is higher than the lower class. No statement can be made,
however, about the amount of difference between categories; the differences
between upper and middle and between middle and lower cannot be calculated.
Another example is ranking students by GPA. If you rank 1st in a class of 400,
the rank indicates greater than or less than, but not how much higher or lower.

Example 2. The screening committee in a search for junior mathematicians ranks Dina 3rd,
Mercy 7th, and Kim 10th. We can compute a difference between the ranks 3 and 7,
but this difference of 4 does not mean anything.

The interval level of measurement is like the ordinal level, with the additional property
that we can determine meaningful amounts of differences between data.
However, there is no inherent (natural) zero starting point (where none of the
quantity is present).

Although the categories of nominal and ordinal scales cannot be further
subdivided on a measurement scale, the values of an interval scale permit distances and
differences between values on the scale to be considered or measured. Some social
researchers further distinguish between interval and ratio scales. In both cases the
intervals are of equal size. With interval scales, however, there is an arbitrary zero point,
whereas with ratio variables there is a true zero point, where zero is equivalent to a total
absence of the variable.

Example 1. Time measured by calendars, temperature on the Fahrenheit scale,
and intelligence measured by IQ scores are interval variables because zero values do not
mean the total absence of time, temperature, or intelligence, respectively.

Example 2. Students' scores on the college admission test and their grades on the report
card are measured on an interval scale.

The ratio level of measurement is the interval level modified to include the inherent
zero starting point (where zero indicates that none of the quantity is present).
For values at this level, differences and ratios are both meaningful.

Example 1. Age, income, and urbanization (percent of a population living in urban
places) are ratio variables because zero values do indicate a total absence of
those attributes.

Example 2. Distances (in miles) traveled by cars in a test of fuel consumption.

For most statistical purposes, interval and ratio scales are treated as similar types
of measurement scale. Note, however, that a major difference is that one cannot form
meaningful ratios with values on an interval scale. For example, it is incorrect to say that
60° is twice as hot as 30°, but it is correct to say that PhP 60,000.00 is twice as much as
PhP 30,000.00. Because of the scarcity of interval variables, the ambiguity concerning the
differences between interval and ratio scales, and their similar statistical treatment, it
makes sense to treat these two types of measurement scale as one. In general, we
should not calculate averages for data at the nominal or ordinal levels of measurement.
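The temperature example can be checked numerically. Reading the degrees as Fahrenheit (the interval scale named in Example 1 above) and converting to kelvins, a scale with a true zero, shows that 60° is not twice as hot as 30°, while peso amounts, which lie on a ratio scale, do form meaningful ratios. The helper name is hypothetical; the conversion K = (F − 32) × 5/9 + 273.15 is the standard one.

```python
def fahrenheit_to_kelvin(f: float) -> float:
    """Convert a Fahrenheit reading to kelvins (a scale with a true zero)."""
    return (f - 32) * 5 / 9 + 273.15

# Interval scale: dividing Fahrenheit readings does not give a ratio of heat.
naive_ratio = 60 / 30
true_ratio = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

# Ratio scale: peso amounts have a true zero, so the ratio is meaningful.
peso_ratio = 60_000.00 / 30_000.00

print(f"60 / 30 taken literally: {naive_ratio:.2f}")
print(f"ratio of absolute temperatures: {true_ratio:.3f}")
print(f"PhP 60,000 / PhP 30,000: {peso_ratio:.2f}")
```

The absolute-temperature ratio is only about 1.06, far from 2, while the peso ratio is exactly 2.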

Table 2. Levels of Measurement of Data

Level    | Summary                                                      | Example
Nominal  | Categories only. Data cannot be arranged in an ordering     | Student course: SAS, SED, SECE, SBAIT
         | scheme.                                                      |
Ordinal  | Categories are ordered, but differences cannot be           | Results in a cheering contest: First, Second, Third
         | determined or they are meaningless.                          |
Interval | Differences between values can be found, but there may be   | Students' scores: 90, 87, 88, 80
         | no inherent starting point. Ratios are meaningless.          |
Ratio    | Like interval, but with an inherent starting point.         | Weights of college football players: 150 lb, 195 lb, 300 lb
         | Ratios are meaningful.                                       |

Exercises
Exercises

A. Identify each of the following as a nominal, an ordinal, or an interval/ratio
scale.
1. shirt/blouse size (small, medium, large)
2. weight in kilograms
3. students grade average in fourth year
4. undergraduate major (Social Science, Science, English, Mathematics)
5. height in meters

B. Identify each of the following underlined word(s) as either a variable or a constant.


1. Tribe of students in this class
2. Monthly allowance of students
3. Dates reflected on the calendar
4. Temperature of boiling water in °C
5. BSC teachers' educational qualification

C. Determine whether the underlined variable is continuous or discrete


1. distance traveled from Cagayan de Oro City to Malaybalay City
2. number of votes obtained by a candidate for a mayor
3. load in metric tons of a cargo boat
4. monthly expenses of students
5. number of passengers in WG & A
6. number of stocks sold every day in the stock exchange.
7. hourly temperatures recorded at an observatory.
8. lifetime of a car.
9. diameter of the wheels of several cars.
10. number of children from 50 families.
11. Annual Census of Filipinos.

D. Indicate whether the following variables are categorical or quantitative:
1. Favorite food.
2. Favorite profession.
3. Number of goals scored by your favorite team last season.
4. Number of students at your school.
5. The eye color of your classmates.
6. IQ of your classmates.

E. Classify the following variables as categorical, quantitative discrete, or quantitative continuous.


1. The nationality of a person.
2. Number of liters of water contained in a tank.
3. Number of books on a library shelf.
4. Sum of points tallied from a set of dice.
5. The profession of a person.
6. The area of the different tiles on a building.

Summation Notation:

Suppose the following quantities are to be added:

$X_1 + X_2 + X_3 + \cdots + X_n$

We would represent this sum of n quantities as follows:

$$\sum_{i=1}^{n} X_i$$

where ∑ is the summation sign (the Greek capital letter sigma), indicating that a series of
values is to be added. The notation i = 1 below the summation sign and n above it
indicates that i takes on the successive values 1, 2, 3, up to n.
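In code, the summation sign corresponds to a loop that adds X_1 through X_n, or simply Python's built-in `sum`. Python lists are indexed from 0, so this sketch keeps a 1-based loop variable only to mirror the notation; the function name and the values are arbitrary choices for illustration.

```python
def summation(x: list[float]) -> float:
    """Compute the sum of X_i for i = 1, ..., n with an explicit 1-based index."""
    n = len(x)
    total = 0.0
    for i in range(1, n + 1):  # i takes the values 1, 2, ..., n
        total += x[i - 1]      # X_i is stored at list position i - 1
    return total

X = [3, 2, 1, 5]               # arbitrary values X_1, ..., X_4
print(summation(X), sum(X))    # both equal 11
```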
There are three important theorems governing the use of the summation sign. Ferguson
(1989) gives them as follows:

Theorem 1
If every variate value in a group is multiplied by a constant number or factor, that
factor may be removed from under the summation sign and written outside as a factor. In
symbols:

$$\sum_{i=1}^{n} CX_i = CX_1 + CX_2 + CX_3 + \cdots + CX_n = C(X_1 + X_2 + X_3 + \cdots + X_n) = C\sum_{i=1}^{n} X_i$$

Example 1. $\sum_{i=1}^{5} 5X_i$

By applying Theorem 1, this can be written as $5\sum_{i=1}^{5} X_i$.
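A quick numerical check of Theorem 1, using the constant C = 5 from Example 1 and arbitrary values for X_1 through X_5 (the X values are assumptions):

```python
X = [4, 7, 1, 3, 6]  # arbitrary X_1, ..., X_5
C = 5

left = sum(C * x for x in X)  # sum of C * X_i
right = C * sum(X)            # C times the sum of X_i
print(left, right)            # 105 105
```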

Theorem 2

The summation of a constant C over n terms is equal to nC:

$$\sum_{i=1}^{n} C = C + C + C + \cdots + C = nC$$

Example 2. If C = 6 and n = 5, then

$$\sum_{i=1}^{5} 6 = 6 + 6 + 6 + 6 + 6 = 5 \times 6 = 30$$
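Example 2 can be reproduced by literally adding the constant n times and comparing with nC:

```python
C, n = 6, 5
total = sum(C for _ in range(n))  # C added n times
print(total, n * C)               # 30 30
```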

Theorem 3

The summation of the sum of any number of terms is the sum of the summations of
these terms taken separately.

Thus

$$\sum_{i=1}^{n}(X_i + Y_i + Z_i) = (X_1 + Y_1 + Z_1) + (X_2 + Y_2 + Z_2) + (X_3 + Y_3 + Z_3) + \cdots + (X_n + Y_n + Z_n)$$

$$= (X_1 + X_2 + X_3 + \cdots + X_n) + (Y_1 + Y_2 + Y_3 + \cdots + Y_n) + (Z_1 + Z_2 + Z_3 + \cdots + Z_n)$$

$$= \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i + \sum_{i=1}^{n} Z_i$$

Example 3. $\sum_{i=1}^{3}(A_i + B_i + C_i)$ is evaluated as

$$= (A_1 + B_1 + C_1) + (A_2 + B_2 + C_2) + (A_3 + B_3 + C_3)$$

$$= (A_1 + A_2 + A_3) + (B_1 + B_2 + B_3) + (C_1 + C_2 + C_3)$$

$$= \sum_{i=1}^{3} A_i + \sum_{i=1}^{3} B_i + \sum_{i=1}^{3} C_i$$
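A numerical check of Theorem 3 in the form of Example 3, with arbitrary (assumed) values for A_i, B_i and C_i:

```python
A = [2, 5, 1]
B = [4, 0, 3]
C = [7, 2, 6]

left = sum(a + b + c for a, b, c in zip(A, B, C))  # sum of (A_i + B_i + C_i)
right = sum(A) + sum(B) + sum(C)                   # separate sums, then added
print(left, right)                                 # 30 30
```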

Example 4. Applying all the theorems, answer the following:

For X1 = 3, X2 = 2, X3 = 1, X4 = 5, find the values of:

a. $\sum_{i=1}^{4} X_i$

b. $\sum_{i=1}^{4} 2X_i$

c. $\sum_{i=1}^{4} 3$

d. $\sum_{i=1}^{4} X_i^2$

Answers:

a. $\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 3 + 2 + 1 + 5 = 11$

b. $\sum_{i=1}^{4} 2X_i = 2\sum_{i=1}^{4} X_i = 2(X_1) + 2(X_2) + 2(X_3) + 2(X_4) = 2(3) + 2(2) + 2(1) + 2(5) = 6 + 4 + 2 + 10 = 22$

c. $\sum_{i=1}^{4} 3 = 3 + 3 + 3 + 3 = 12$

d. $\sum_{i=1}^{4} X_i^2 = X_1^2 + X_2^2 + X_3^2 + X_4^2 = 3^2 + 2^2 + 1^2 + 5^2 = 9 + 4 + 1 + 25 = 39$
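The four answers in Example 4 can be reproduced in a few lines, a handy way to check hand computations. The values of X are those given in the example.

```python
X = [3, 2, 1, 5]  # X_1, X_2, X_3, X_4 from Example 4

a = sum(X)                 # sum of X_i
b = sum(2 * x for x in X)  # sum of 2X_i (equals 2 * sum(X) by Theorem 1)
c = sum(3 for _ in X)      # the constant 3 summed over 4 terms (Theorem 2)
d = sum(x ** 2 for x in X) # sum of X_i squared

print(a, b, c, d)          # 11 22 12 39
```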

Another concrete example may prove helpful in grasping the nature of summation
notation. Let the following be paired scores: the values of $X_i$ below may be viewed as
the scores of four people on one test, and the values of $Y_i$ as the scores of the same four
people on another test.

X1 = 5     Y1 = 2
X2 = 6     Y2 = 3
X3 = 12    Y3 = 7
X4 = 15    Y4 = 10

The following relate to these scores:

1. $\sum X_i = 5 + 6 + 12 + 15 = 38$

2. $\sum Y_i = 2 + 3 + 7 + 10 = 22$

3. $\sum 5X_i = 5 \times 5 + 5 \times 6 + 5 \times 12 + 5 \times 15 = 190$

4. $\sum X_i^2 = 5 \times 5 + 6 \times 6 + 12 \times 12 + 15 \times 15 = 430$

5. $\sum (X_i - 5) = (5 - 5) + (6 - 5) + (12 - 5) + (15 - 5) = 18$
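The five results for the paired scores can be verified the same way; the lists hold the X and Y values given above.

```python
X = [5, 6, 12, 15]
Y = [2, 3, 7, 10]

results = {
    "sum X": sum(X),                       # 38
    "sum Y": sum(Y),                       # 22
    "sum 5X": sum(5 * x for x in X),       # 190
    "sum X^2": sum(x ** 2 for x in X),     # 430
    "sum (X - 5)": sum(x - 5 for x in X),  # 18
}
for label, value in results.items():
    print(f"{label} = {value}")
```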

Exercises

A. Write the following in summation notation:

1. $X_1 + X_2 + X_3 + \cdots + X_n$
2. $(X_1 + Y_1) + (X_2 + Y_2) + \cdots + (X_9 + Y_9)$
3. $X_1Y_1^2 + X_2Y_2^2 + X_3Y_3^2 + X_4Y_4^2 + \cdots + X_{12}Y_{12}^2$
4. $CY_1 + CY_2 + CY_3 + \cdots + CY_n$

B. Write each of the following in full:

1. $\sum_{i=1}^{7} X_i$
2. $\sum_{i=1}^{5} (Y_i + Z_i)$
3. $\sum_{i=1}^{N} (a_i + b_i)$
4. $\sum_{i=1}^{3} cX_i$

C. For X1 = 2, X2 = 6, X3 = 4, X4 = 7, X5 = 4, find the values of:

a. $\sum_{i=1}^{5} (X_i + 7)$
b. $\sum_{i=1}^{5} X_i + 7$
c. $\sum_{i=1}^{5} (X_i^2 + 2X_i - 1)$
d. $\sum_{i=1}^{3} 2X_i$
e. $\sum_{i=1}^{5} X_i^2 - 3$
f. $\sum_{i=1}^{4} X_i$

D. If X1 = 4, X2 = -3, X3 = 6, and X4 = -1, evaluate the following:

a. $\sum_{i=1}^{4} (X_i^2 - 3X_i)$
b. $\sum_{i=1}^{4} (X_i + 2)^2$
c. $\sum_{i=1}^{4} \left(X_i + \frac{X_i^2}{2}\right)$

E. Consider the following paired observations

X1 = 3 Y1 = 9
X2 = 5 Y2 = 2
X3 = 5 Y3 = 1
X4 = 4 Y4 = 5
X5 = 8 Y5 = 3

Calculate the following:

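a. $\sum X_i$
b. $\sum Y_i$
c. $\sum_{i=1}^{5} X_iY_i$
d. $\sum_{i=1}^{3} X_iY_i$
e. $\left(\sum_{i=1}^{3} X_i\right)^2$
f. $\sum X_i^2$
g. $\sum Y_i^2$
h. $\sum_{i=1}^{5} X_i^2Y_i^2$
i. $\sum_{i=1}^{5} 2X_iY_i$
j. $\left(\sum_{i=1}^{5} X_i\right)\left(\sum_{i=1}^{5} Y_i\right)$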
Exercises: Calculator warm-up exercises on basic skills and concepts. Given the
expressions, use your calculator to obtain the indicated values.

1. $\dfrac{3.40 + 2.76 + 2.91 + 1.78 + 3.41}{5}$

2. $\dfrac{12.95 + 10.45 + 14.75 + 15.35}{4}$

3. $\dfrac{(12 - 7.5)^2}{8.5} + \dfrac{(22 - 26.3)^2}{25.3}$

4. $\dfrac{20!}{15!\,5!}$

5. $\dfrac{5(3!\,4!)}{24}$

Terms and Concepts to Remember

Statistics
Descriptive Statistics
Inferential Statistics
Population
Sample
Parameter
Estimate
Variable
Constant
Qualitative Variables
Quantitative Variables
Nominal Variables
Ordinal Variables
Interval/Ratio Variables
Summation Notation
Summation Sign
