CSE 323 (1) Statistics in Education
Meaning of Statistics
Types of statistical data in education
Scales of measurement, tabular representation of data
Basic steps in using SPSS to calculate measures of central tendency
Measures of dispersion
T-test analysis, Correlation Analysis, One-way and Two-way Analysis of
Variance (ANOVA), Chi-Square Statistic.
Introduction
Statistics is a branch of applied mathematics that involves the collection, description, analysis,
and inference of conclusions from quantitative data. The mathematical theories behind statistics
rely heavily on differential and integral calculus, linear algebra, and probability theory.
Statisticians, people who do statistics, are particularly concerned with determining how to draw
reliable conclusions about large groups and general phenomena from the observable
characteristics of small samples that represent only a small portion of the large group or a limited
number of instances of a general phenomenon.
Data are the facts and figures that are collected, analyzed, and summarized for presentation and
interpretation. Data may be classified as either quantitative or qualitative. Quantitative data
measure either how much or how many of something, and qualitative data provide labels, or
names, for categories of like items. For example, suppose that a particular study is interested in
characteristics such as age, gender, marital status, and annual income for a sample of 100
individuals. These characteristics would be called the variables of the study, and data values for
each of the variables would be associated with each individual. Thus, the data values of 28, male,
single, and $30,000 would be recorded for a 28-year-old single male with an annual income of
$30,000. With 100 individuals and 4 variables, the data set would have 100 × 4 = 400 items. In
this example, age and annual income are quantitative variables; the corresponding data values
indicate how many years and how much money for each individual. Gender and marital status
are qualitative variables. The labels male and female provide the qualitative data for gender, and
the labels single, married, divorced, and widowed indicate marital status.
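To make the distinction concrete, here is a minimal sketch in Python (using the pandas library) of how the first few records of such a data set might be laid out. The names and values are hypothetical, not drawn from any real study.

```python
import pandas as pd

# Three of the hypothetical 100 individuals described above
people = pd.DataFrame({
    "age":            [28, 35, 41],                        # quantitative: how many years
    "gender":         ["male", "female", "male"],          # qualitative: category label
    "marital_status": ["single", "married", "divorced"],   # qualitative: category label
    "annual_income":  [30000, 52000, 61000],               # quantitative: how much money
})

# individuals x variables gives the number of data items (100 x 4 = 400 in the full study)
print(people.shape[0] * people.shape[1])   # 12 items for this 3-person excerpt
```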
Sample survey methods are used to collect data from observational studies, and experimental
design methods are used to collect data from experimental studies. The area of descriptive
statistics is concerned primarily with methods of presenting and interpreting data using graphs,
tables, and numerical summaries. Whenever statisticians use data from a sample—i.e., a subset
of the population—to make statements about a population, they are performing statistical
inference. Estimation and hypothesis testing are procedures used to make statistical inferences.
Fields such as health care, biology, chemistry, physics, education, engineering, business, and
economics make extensive use of statistical inference.
Methods of probability were developed initially for the analysis of gambling games. Probability
plays a key role in statistical inference; it is used to provide measures of the quality and precision
of the inferences. Many of the methods of statistical inference are described in this article. Some
of these methods are used primarily for single-variable studies, while others, such as regression
and correlation analysis, are used to make inferences about relationships among two or more
variables.
The two major areas of statistics are known as descriptive statistics, which describes the
properties of sample and population data, and inferential statistics, which uses those properties to
test hypotheses and draw conclusions.
Understanding Statistics
Statistics are used in virtually all scientific disciplines such as the physical and social sciences, as
well as in business, the humanities, government, and manufacturing. Statistics is fundamentally a
branch of applied mathematics that developed from the application of mathematical tools
including calculus and linear algebra to probability theory.
In practice, statistics is the idea that we can learn about the properties of large sets of objects or
events (a population) by studying the characteristics of a smaller number of similar objects or
events (a sample). Because in many cases gathering comprehensive data about an entire
population is too costly, difficult, or flat out impossible, statistics start with a sample that can
conveniently or affordably be observed.
Two types of statistical methods are used in analyzing data: descriptive statistics and inferential
statistics. Statisticians measure and gather data about the individuals or elements of a sample,
then analyze this data to generate descriptive statistics. They can then use these observed
characteristics of the sample data, which are properly called "statistics," to make inferences or
educated guesses about the unmeasured (or unobserved) characteristics of the broader
population, known as the parameters.
Descriptive Statistics
Descriptive statistics mostly focus on the central tendency, variability, and distribution of sample
data. Central tendency refers to the estimate of a typical element of a sample or
population, and includes descriptive statistics such as mean, median, and mode. Variability refers
to a set of statistics that show how much difference there is among the elements of a sample or
population along the characteristics measured, and includes metrics such as range, variance, and
standard deviation.
The distribution refers to the overall "shape" of the data, which can be depicted on a chart such
as a histogram or dot plot, and includes properties such as the probability distribution function,
skewness, and kurtosis. Descriptive statistics can also describe differences between observed
characteristics of the elements of a data set. Descriptive statistics help us understand the
collective properties of the elements of a data sample and form the basis for testing hypotheses
and making predictions using inferential statistics.
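As an illustration of these three groups of descriptive statistics, the short Python sketch below (using pandas and scipy) computes them for a small, made-up set of test scores; any statistical package, including SPSS, reports the same quantities.

```python
import pandas as pd
from scipy import stats

# Hypothetical test scores for a sample of 10 students
scores = pd.Series([45, 52, 55, 55, 60, 62, 65, 70, 72, 90])

# Central tendency
print(scores.mean(), scores.median(), scores.mode().tolist())

# Variability: range, sample variance, standard deviation
print(scores.max() - scores.min(), scores.var(), scores.std())

# Shape of the distribution
print(stats.skew(scores), stats.kurtosis(scores))
```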
Inferential Statistics
Inferential statistics are tools that statisticians use to draw conclusions about the characteristics
of a population from the characteristics of a sample and to decide how certain they can be of the
reliability of those conclusions. Based on the sample size and distribution of the sample data,
statisticians can calculate the probability that statistics, which measure the central tendency,
variability, distribution, and relationships between characteristics within a data sample, provide
an accurate picture of the corresponding parameters of the whole population from which the
sample is drawn.
Inferential statistics are used to make generalizations about large groups, such as estimating
average demand for a product by surveying a sample of consumers' buying habits, or to attempt
to predict future events, such as projecting the future return of a security or asset class based on
returns in a sample period.
Regression analysis is a common method of statistical inference that attempts to determine the
strength and character of the relationship (or correlation) between one dependent variable
(usually denoted by Y) and a series of other variables (known as independent variables). The
output of a regression model can be analyzed for statistical significance, which refers to the
claim that a result from findings generated by testing or experimentation is not likely to have
occurred randomly or by chance, but is instead likely to be attributable to a specific cause
elucidated by the data. Having statistical significance is important for academic disciplines or
practitioners that rely heavily on analyzing data and research.
Descriptive statistics are used to describe or summarize the characteristics of a sample or data
set, such as a variable's mean, standard deviation, or frequency. Inferential statistics, in contrast,
employs any number of techniques to relate variables in a data set to one another, for example
using correlation or regression analysis. These can then be used to estimate forecasts or infer
causality.
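A minimal illustration of this idea, sketched in Python with scipy on entirely invented data: a simple regression of exam score on hours of study (one predictor rather than the series of predictors described above) returns both the fitted line and a p-value indicating how likely the observed relationship would be if it were due to chance alone.

```python
from scipy import stats

# Hypothetical sample: hours of study and exam scores for 8 students
hours  = [2, 3, 5, 7, 9, 10, 12, 14]
scores = [52, 55, 60, 64, 70, 72, 80, 85]

result = stats.linregress(hours, scores)
print(result.slope, result.intercept)   # estimated regression line
print(result.rvalue)                    # strength and direction of the relationship
print(result.pvalue)                    # small value -> unlikely to be due to chance alone
```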
Characteristics of Statistics
1. Statistics are Aggregate of Facts: Only those facts which are capable of being studied in
relation to time, place or frequency can be called statistics. Individual, single or unconnected
figures are not statistics because they cannot be studied in relation to each other. Due to this
reason, only aggregate of facts e.g., data relating to I.Q. of a group of students, academic
achievement of students, etc. are called statistics and are studied in relation to each other.
2. Statistics are Affected to a Marked Extent by a Multiplicity of Causes: Statistical data are
more related to social sciences and as such, changes are affected to a combined effect of many
factors. We cannot study the effect of a particular cause on a phenomenon. It is only in physical
sciences that individual causes can be traced and their impact is clearly known. In statistical
study of social sciences, we come to know the combined effect of multiple causes.
For example, deterioration in the academic achievement scores of some students may not be due
only to lack of interest in school subjects, but may also be due to lack of motivation, ineffective
teaching methods, the students' attitude towards school subjects, faulty scoring procedures, etc.
Similarly scores on memory test of a group certainly depend on meaningfulness of learning
materials, maturation of the students, methods of learning, motivation, interest of the students,
etc.
3. Statistics are Numerically Expressed: Qualitative phenomena which cannot be numerically
expressed cannot be described as statistics, e.g. honesty, goodness, ability, etc. But if we assign
them a numerical expression, they may be described as ‘statistics’.
4. Statistics are Enumerated or estimated according to Reasonable Standards of Accuracy:
The standard of estimation and of accuracy differs from enquiry to enquiry or from purpose to
purpose. There cannot be one standard of uniformity for all types of enquiries and for all
purposes. A single student cannot be ignored while calculating the I.Q. of a group of 100 students,
whereas 10 soldiers can easily be ignored while finding the I.Q. of the soldiers of a whole country.
Similarly we can ignore ten deaths in a country but we cannot ignore even a single death in a
family. The amount of time and resources at our disposal also determines the degree of accuracy
of the estimates.
5. Statistics are Collected in a Systematic Manner: In order to achieve a reasonable standard of
accuracy, statistics must be collected in a very systematic manner. Any rough and haphazard
method of collection is undesirable, as it may lead to improper and wrong conclusions. The
accuracy of such data is also uncertain and therefore cannot be relied upon.
6. Statistics for a Pre-determined Purpose: The investigator must have a purpose beforehand
and then should start the work of collection. Data collected without any purpose is of no use.
Suppose we want to know the intelligence of a section of people; we should not then collect data relating to
income, attitude and interest. Without a clear idea of the purpose, we will not be in a
position to distinguish between necessary data and unnecessary data or relevant data and
irrelevant data.
7. Statistics are Capable of being Placed in Relation to Each Other: Statistics is a method used for
the purpose of comparison. Data must be capable of being compared; otherwise, they lose
much of their value and significance. Comparison can be made only if the data are homogeneous.
Data on a memory test can be compared with I.Q., not with the salary status of parents. It is with the
use of comparison only that we can depict changes which may relate to time, place, frequency or
any other character, and statistical devices are used for this purpose.
Concepts in Statistics:
1. Data: You might be reading a newspaper regularly. Almost every newspaper gives the
minimum and the maximum temperature recorded in the city on the previous day. It also
indicates the rainfall recorded, and the time of sunrise and sunset. In school, the attendance
of the students is recorded in a register regularly.
For a patient, the doctor advises recording of the body temperature at regular intervals. If we
record the minimum and maximum temperature, or rainfall, or the time of sunrise and sunset, or
attendance of children, or the body temperature of the patient, over a period of time, what we are
recording is known as data.
Here we are recording the data of minimum and maximum temperature of the city, data of
rainfall, data for the time of sunrise and sunset, and the data pertaining to the attendance of
children.
Table 2.0 gives the data for class-wise attendance of students. Here the data comprise 7
observations in all. These observations are the attendance for class VI, class VII, and so on. So, data
refers to the set of observations, values, elements or objects under consideration. The complete
set of all possible elements or objects is called a population.
Each of the elements is called a piece of data. Data also refers to known facts or things used
as a basis for inference or reckoning: facts, information, or material to be processed or stored.
2. Scores: Scores or other numbers in continuous series are to be thought of as distances along a
continuum, rather than as discrete points. An inch is the linear magnitude between two divisions
on a foot rule; and, in like manner, a score in a mental test is a unit distance between two limits.
A score of 120 upon an intelligence examination, for example, represents the interval 119.5 up to
120.5.
The exact midpoint of this score interval is 120.
Other scores may be interpreted in the same way. A score of 15, for instance, includes all values
from 14.5 to 15.5, i.e., any value from a point .5 unit below 15 to a point .5 unit above 15. This
means that 14.7, 15.0 and 15.4 would all be scored 15. “The usual mathematical meaning of a
score is an interval which extends along some dimension from .5 unit below to .5 unit above
the face value of the score.”
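This convention can be checked with a couple of lines of Python; the values are arbitrary and only illustrate the rounding rule described above.

```python
# Raw values anywhere in the interval 14.5 up to 15.5 are recorded as a score of 15
values = [14.7, 15.0, 15.4]
scores = [int(v + 0.5) for v in values]   # nearest-integer score (for positive values)
print(scores)                             # [15, 15, 15]
```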
3. Variable: In the field of education and psychology we study differences in respect of the
persons’ personality traits, abilities, aptitudes, etc. For example, college students of the same
class would differ in their performance on a particular test or on marks obtained in examinations.
In all such cases, we are dealing with characteristics that vary or fluctuate in a rather
unpredictable way. We find that shape or quality is a characteristic on which objects vary; speed
is a characteristic on which animals vary; height is a characteristic on which trees vary and
people vary in respect of various characteristics like age, sex, height, weight and personality
traits etc.
The characteristic on which individuals differ among themselves is called a variable. Thus speed,
shape, height, weight, age, sex, grades are variables in the above examples. In educational and
psychological studies we often deal with variables relating to intellectual abilities. Now, it is the
aim of every physical and behavioural science to study the nature of the variation in whatever
variable it is dealing with, and therefore, it is necessary to measure the extent and type of
variation in a variable. Statistics is a branch of science which is concerned with the study of
variables that vary in unpredictable fashion and helps in providing an understanding of the
phenomena and objects which show such variations.
4. Measurement Scales: Measurement refers to the assignment of numbers to objects and events
according to logically acceptable rules. The numbers have many properties, such as identity, order
and additivity. If we can legitimately assign numbers in describing objects and events, then the
properties of numbers should be applicable to the objects and events.
It is essential to know about the different kinds of measurement scales, as the number of
properties applicable depends upon the measurement scale applied to the objects or events.
Let us take four different situations for a class of 30 students:
i. Assigning them roll nos. from 1 to 30 on a random basis.
ii. Asking the students to stand in a queue as per their heights and assigning them position
numbers in queue from 1 to 30.
iii. Administering a test of 50 marks to all students and awarding marks from 0 to 50, as per their
performance.
iv. Measuring the height and weight of students and making student-wise record.
In the first situation, the numbers have been assigned on a purely arbitrary basis. Any student
could be assigned No. 1, while any other could be assigned No. 30. No two students can be
compared in any respect on the basis of the numbers allotted to them.
The students have been labelled from 1 to 30 in order to give each an identity. This scale refers
to nominal scale. Here the property of identity is applicable but the properties of order and
additivity are not applicable.
In the second situation, the students have been assigned their position numbers in queue from 1
to 30. Here the numbering is not on arbitrary basis. The numbers have been assigned according
to the height of the students. So the students are comparable on the basis of their heights, as there
is a sequence in this regard.
Every subsequent child is taller than the previous one. This scale refers to the ordinal
scale. Here the object or event has its identity as well as order. As the difference in height of
any two students is not known, the property of addition of numbers is not applicable to the
ordinal scale.
In the third situation, the students have been awarded marks from 0 to 50 on the basis of their
performance in the test administered to them. Consider the marks obtained by 3 students, which
are 30, 20 and 40 respectively. Here it may be interpreted that the difference between the
performance of the 1st and the 2nd student (10 marks) is the same as that between the performance
of the 1st and the 3rd student.
However, no one can say that the performance of the 3rd student is double that of the 2nd
student. This is because there is no absolute zero: a student getting 0 marks cannot be said
to have a zero achievement level. This scale refers to the interval scale. Here the properties of
identity, order and additivity are applicable.
In the fourth situation, the exact physical values pertaining to the heights and weights of all
students have been obtained. Here the values are comparable in all respects. If two students have
heights of 120 cm and 140 cm, then the difference in their heights is 20 cm and the heights are
in the ratio 6:7. This scale refers to the ratio scale.
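The four situations can be summarised in a small Python sketch; the data are invented, and the comments note which numerical operations each scale supports.

```python
import pandas as pd

students = pd.DataFrame({
    "roll_no":     [7, 21, 3],             # nominal: identity only, no order or arithmetic
    "height_rank": [2, 1, 3],              # ordinal: order, but gaps between ranks unknown
    "test_marks":  [30, 20, 40],           # interval: differences meaningful, no absolute zero
    "height_cm":   [120.0, 140.0, 131.5],  # ratio: absolute zero, so ratios are meaningful
})

# Interval scale: differences between marks can be compared
print(students["test_marks"].max() - students["test_marks"].min())   # 20 marks

# Ratio scale: 140 cm is 7/6 of 120 cm, a meaningful statement
print(students["height_cm"][1] / students["height_cm"][0])
```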
Importance of Statistics
Examples can be multiplied to show that human behaviour and statistical methods have much in
common. In fact statistical methods are so closely connected with human actions and behaviour
that practically all human activity can be explained by statistical methods. This shows how
important and universal statistics is.
(i) Statistics in Planning: Statistics is indispensable in planning—may it be in business,
economics or government level. The modern age is termed as the ‘age of planning’ and almost
all organizations in the government or business or management are resorting to planning for
efficient working and for formulating policy decision.
To achieve this end, the statistical data relating to production, consumption, birth, death,
investment, income are of paramount importance. Today efficient planning is a must for almost
all countries, particularly the developing economies for their economic development.
(ii) Statistics in Mathematics: Statistics is intimately related to and essentially dependent upon
mathematics. The modern theory of Statistics has its foundations on the theory of probability
which in turn is a particular branch of more advanced mathematical theory of Measures and
Integration. The ever-increasing role of mathematics in statistics has led to the development of a
new branch of statistics called Mathematical Statistics.
Thus Statistics may be considered to be an important member of the mathematics family. In the
words of Connor, “Statistics is a branch of applied mathematics which specialises in data.”
(iii) Statistics in Economics: Statistics and Economics are so intermixed with each other
that it seems foolish to separate them. The development of modern statistical methods has
led to an extensive use of statistics in Economics.
Statistics of public finance enable us to decide how to impose taxes, provide subsidies, spend on various
heads, and how much money to borrow or lend. So we cannot think of Statistics without
Economics or Economics without Statistics.
(iv) Statistics in Social Sciences: Every social phenomenon is affected to a marked extent by a
multiplicity of factors which bring out the variation in observations from time to time, place to
place and object to object. Statistical tools of Regression and Correlation Analysis can be used to
study and isolate the effect of each of these factors on the given observation.
Sampling Techniques and Estimation Theory are very powerful and indispensable tools for
conducting any social survey, pertaining to any strata of society and then analysing the results
and drawing valid inferences. The most important application of statistics in sociology is in the
field of Demography for studying mortality (death rates), fertility (birth rates), marriages,
population growth and so on.
(v) Statistics in Trade: As already mentioned, statistics is a body of methods to make wise
decisions in the face of uncertainties. Business is full of uncertainties and risks. We have to
forecast at every step. Speculation is just gaining or losing by way of forecasting. Can we
forecast without taking into view the past? Perhaps, no. The future trend of the market can only
be expected if we make use of statistics. Failure in anticipation will mean failure of business.
Changes in demand, supply, habits, fashion etc. can be anticipated with the help of statistics.
Statistics is of utmost significance in determining prices of the various products, determining the
phases of boom and depression etc. Use of statistics helps in smooth running of the business, in
reducing the uncertainties and thus contributes towards the success of business.
(vi) Statistics in Research Work: The job of a research worker is to present the result of his
research before the community. The effect of a variable on a particular problem, under differing
conditions, can be known by the research worker only if he makes use of statistical methods.
Statistics are everywhere basic to research activities. To keep alive his research interests and
research activities, the researcher is required to lean upon his knowledge and skills in statistical
methods.
Briefly, the advantages of statistical thinking and operations in research are as follows:
1. They permit the most exact kind of description: The goal of science is description of
phenomena. The description should be complete and accurate so that it can be useful to anyone
who can understand it when he reads the symbols. Mathematics and statistics are a part of the
descriptive language, an outgrowth of our verbal symbols.
2. They force us to be definite: Statistics makes the activities of a researcher definite and exact
—both in his procedures and thinking. Statistics systematizes the efforts of a researcher and
leads him towards the goal.
3. They help us to summarize the results: Masses of observations taken by themselves are
bewildering and almost meaningless. Statistics enables us to summarize our results in
meaningful and convenient form. Before we can see the forest as well as the trees, order must be
given to the data. Statistics provides an unrivalled device for bringing order out of chaos, of
seeing the general picture in one’s results.
4. They enable us to draw general conclusions: And the process of extracting conclusions is
carried out according to accepted rules. Furthermore, by means of statistical steps, we can say
about how much faith should be placed in any conclusion and about how far we may extend our
generalization.
5. They enable us to make predictions: Of “how much” of a thing will happen under
conditions we know and have measured. For example, we can predict the probable mark a
freshman will earn in college algebra if we know his score in a general academic aptitude test,
his score in a special algebra-aptitude test, his average mark in high-school mathematics, etc.
Our prediction may be somewhat in error, but statistical method will tell us about how much
margin of error to allow in making predictions.
6. They enable us to analyze some of the causal factors of complex and otherwise
bewildering events.
SPSS
SPSS (originally the Statistical Package for the Social Sciences, later also expanded as Statistical
Product and Service Solutions and, for a time, renamed PASW, Predictive Analytics SoftWare) was
released in its first version in 1968 after being developed by Norman H. Nie, Dale H. Bent, and
C. Hadlai Hull. Norman Nie was then a
political science postgraduate at Stanford University, and now Research Professor in the
Department of Political Science at Stanford and Professor Emeritus of Political Science at the
University of Chicago. SPSS is among the most widely used programs for statistical analysis in
social science. It is used by market researchers, health researchers, survey companies,
government, education researchers, marketing organizations and others. The original SPSS
manual (Nie, Bent & Hull, 1970) has been described as one of "sociology's most influential
books". In addition to statistical analysis, data management (case selection, file reshaping,
creating derived data) and data documentation (a metadata dictionary is stored in the data file)
are features of the base software.
The three developers incorporated as SPSS Inc. in 1975. Early versions of SPSS Statistics were
written in Fortran and designed for batch processing on mainframes, including for example IBM
and ICL versions, originally using punched cards for data and program input. A processing run
read a command file of SPSS commands and either a raw input file of fixed format data with a
single record type, or a 'getfile' of data saved by a previous run. To save precious computer time
an 'edit' run could be done to check command syntax without analyzing the data. From version
10 (SPSS-X) in 1983, data files could contain multiple record types.
SPSS Statistics versions 16.0 and later run under Windows, Mac, and Linux. The graphical user
interface is written in Java. The Mac OS version is provided as a Universal binary, making it
fully compatible with both PowerPC and Intel-based Mac hardware. Prior to SPSS 16.0, different
versions of SPSS were available for Windows, Mac OS X and Unix. The Windows version was
updated more frequently and had more features than the versions for other operating systems.
SPSS Statistics version 13.0 for Mac OS X was not compatible with Intel-based Macintosh
computers, due to the Rosetta emulation software causing errors in calculations. SPSS Statistics
15.0 for Windows needed a downloadable hotfix to be installed in order to be compatible with
Windows Vista.
SPSS Inc. announced on July 28, 2009 that it was being acquired by IBM for US$1.2 billion. Because
of a dispute about ownership of the name "SPSS", between 2009 and 2010, the product was
referred to as PASW (Predictive Analytics SoftWare). As of January 2010, it became "SPSS: An
IBM Company". Complete transfer of business to IBM was done by October 1, 2010. By that
date, SPSS: An IBM Company ceased to exist. IBM SPSS is now fully integrated into the IBM
Corporation, and is one of the brands under IBM Software Group's Business Analytics Portfolio,
together with IBM Algorithmics, IBM Cognos and IBM OpenPages.
SPSS Data Collection was acquired October 31, 2015 by UNICOM Systems, Inc., a division of
UNICOM Global, under the UNICOM Intelligence brand. The statistics included in the
base software are:
Descriptive statistics: Cross tabulation, Frequencies, Descriptive, Explore, Descriptive
Ratio Statistics
Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial, distances),
Nonparametric tests
Prediction for numerical outcomes: Linear regression
Prediction for identifying groups: Factor analysis, cluster analysis (two-step, K-means,
hierarchical), Discriminant
The “Statistical Package for the Social Sciences” is a user-friendly but powerful statistical
analysis and data management system. Please note that the program does have a very exhaustive
and generally clear online help facility that can be accessed by clicking the Help button from the
menu bar.
Launching SPSS
SPSS for Windows is activated by double-clicking on the SPSS icon. This is often either located
on the desktop or in the Start menu, under Programs, Applications, SPSS (or IBM SPSS).
Likewise, on OS X, SPSS can be launched from either the dock or the Applications folder.
SPSS is spreadsheet based: most data entry work will be done using the spreadsheets in Data
View (inputting cases) or in Variable View (defining variables). Note that, by default, SPSS creates
a new dataset upon launch entitled “Untitled1”, which appears in the title bar. If we were to
save a dataset with a new name, the title bar would update. Below the title bar is the menu bar
with a series of headings (File, Edit, View, Data, Transform, Analyze, Direct Marketing, Graphs,
Utilities, Add-ons, Window, and Help). Below the menu bar is the tool bar, which contains one-
click shortcuts to a variety of convenient features. Finally, note that, at the bottom of the screen
(in the “status bar”), the application reports that the “IBM SPSS Statistics Processor is ready”.
Data View
Data View has the appearance of a spreadsheet, with columns designated by the variable names
of the active data set and rows numbered sequentially. Each column of the grid represents a
single variable, and each row is used for a single case. With large datasets, you may have to use
the scroll bars to navigate through all of the data.
Variable View
If you are starting a project from scratch, you typically want to begin by defining your
variables. This is done in “Variable View”, where each variable can be defined with the following
properties:
Name: Variable name (appears in Data View; must begin with a letter and be unique)
Type: Defines the type of variable (commonly choose either numeric or string)
Width: Defines how many digits SPSS should store for a variable (8 digits is the default)
Decimals: Specifies the number of decimals to show in Data View for a new variable
(e.g. use 0 for integer data)
Label: A long variable name that is used to designate the variable in output
Values: Apply labels for values (useful w/ordinal data, e.g. 6 indicates “strongly
agree”)
Missing: Defines how you specify missing data (e.g. use 99 to indicate missing)
Columns: Defines the width of the column for the variable in Data View
Align: Specifies how to align values in column (“left”, “center”, “right”)
Measure: Specifies the level of measurement (“scale”, “ordinal”, “nominal”)
Role: New in SPSS v19, this property allows for the automatic inclusion of a variable in
certain dialog boxes (not recommended)
At a minimum, you should always define your variable names and measures!
The Menus
The menu bar provides access to most of SPSS's features. Some of the most useful menus are
summarized below:
View: A few system specifications, such as whether to show grid lines and whether to
show variable names or labels.
Data: Allows batch manipulation of the entries in Data View (most frequently used
features include adding variables and/or cases; split file, which is used to split the dataset
by a grouping variable; and select cases, which is used to run analyses on only a subset of
the cases).
Transform: This menu is useful if you want to manipulate a variable, e.g. to conduct a
transformation of some sort. Manipulations can be done using “Compute Variable”.
Analyze: This menu serves as the backbone of SPSS and grants access to all of the
statistical procedures included in the software. Below is a brief guide to some of these:
o Descriptive Statistics: Measures of central tendency, frequencies, general data
exploration. Also, measures of expected frequency (e.g. chi-square tests) can be
conducted using “Crosstabs”.
o Compare Means: This is where you can find t-tests (independent and repeated
measures) and one-way independent measure ANOVAs.
o General Linear Model: This is the menu for complex ANOVA designs such as
two-way (unrelated, related, or mixed), one-way with repeated measures, and
multivariate analyses of variance.
o Mixed Models: This menu can be used for running multilevel linear models.
o Correlate: This menu contains methods to calculate Pearson’s R, Spearman’s Rho
and Kendall’s tau, as well as partial correlations.
o Regression: A variety of regression models, ranging from simple linear
regression, to multiple linear regression, to more advanced techniques like logistic
regression.
o Data Reduction: For conducting factor analysis.
o Scale: For conducting reliability analysis.
o Nonparametric Tests: There are a variety of non-parametric statistics available,
such as the Mann-Whitney test, the Kruskal-Wallis test, Wilcoxon’s test, and so
on, for use if you have violated the assumptions of parametric tests.
Graphs: This menu is used to construct visual representations of your data. This can be
conducted through the more dynamic “chart builder” or the older Legacy Dialogs.
Window: This menu is primarily used to switch between windows (e.g. to go from data
view to the output window or back again).
Utilities: This menu has a few interesting options (e.g. allowing for the inclusion of data
file comments), although most are less useful.
Add-ons: SPSS sells several additional packages, such as a sample power calculator, that
can be accessed through this menu.
Help: This is an invaluable menu because it offers online help on both the software (via
“Topics” and “Tutorial”) and the underlying statistical tests (via “Statistics Coach”).
The Toolbar
The toolbar allows for quick access to a variety of useful commands. Under “View – Toolbars –
Customize” you can customize the icons that appear as well their size (large or small). Also note
that, by default, SPSS enables a feature called “ToolTips”, which will tell you what a button
does if you hover your mouse over it for a few seconds.
Select Cases: This is useful for running analyses on only a particular subset of your data
file (e.g. running an analysis only on cases where sex = 0, to look at one gender).
Value Labels: Toggles between showing either the numeric values or the value labels
specified in the Variable View when looking at your data in Data View.
Use Variable Sets and Show All Variables: This allows for the designation of multiple
“sets” or clusters of variables to toggle between (useful with a dataset that has many
variables, akin to Excel’s “hide columns”). The second button reveals the entire dataset.
Spell Check: Only available in variable view, this allows for a quick spell check of all of
your defined variable labels (not variable names), as well as value labels.
Variable type
Variable Type specifies the data type for each variable. By default, all new variables are assumed
to be numeric. You can use Variable Type to change the data type. The contents of the Variable
Type dialog box depend on the selected data type. For some data types, there are text boxes for
width and number of decimals; for other data types, you can simply select a format from a
scrollable list of examples. The available data types are as follows:
o Numeric: A variable whose values are numbers. Values are displayed in standard
numeric format.
o Comma: A numeric variable whose values are displayed with commas delimiting every
three places, and with the period as a decimal delimiter.
o Dot: A numeric variable whose values are displayed with periods delimiting every three
places, and with the comma as a decimal delimiter.
o Scientific notation: A numeric variable whose values are displayed with an embedded E
and a signed power-of-ten exponent. The exponent can be preceded either by E or D with
an optional sign, or by the sign alone--for example, 123, 1.23E2, 1.23D2, 1.23E+2, and
even 1.23+2.
o Date: A numeric variable whose values are displayed in one of several calendar date or
clock-time formats. Select a format from the list. You can enter dates with slashes,
hyphens, periods, commas, or blank spaces as delimiters.
o Dollar: A numeric variable displayed with a leading dollar sign ($), commas delimiting
every three places, and a period as the decimal delimiter. You can enter data values with
or without the leading dollar sign.
o Custom currency: A numeric variable whose values are displayed in one of the custom
currency formats that you have defined in the Currency tab of the Options dialog box.
o String: Values of a string variable are not numeric, and hence not used in calculations.
They can contain any characters up to the defined length. Uppercase and lowercase letters
are considered distinct.
In SPSS, variable names must follow these rules:
Names must begin with a letter.
Names must not end with a period.
Names must be no longer than eight characters.
Names cannot contain blanks or special characters.
Names must be unique.
Names are not case sensitive. It doesn’t matter if you call your variable CLIENT, client, or
ClIENT; it’s all client to SPSS.
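As a rough illustration only, the rules above can be captured in a small Python check; the function name and the eight-character limit follow the text and are not part of SPSS itself.

```python
import re

def valid_spss_name(name: str, max_len: int = 8) -> bool:
    """Check a candidate variable name against the rules listed above (illustrative sketch)."""
    return (
        len(name) <= max_len
        and not name.endswith(".")
        and re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name) is not None  # letter first, no blanks
    )

print(valid_spss_name("client"))     # True
print(valid_spss_name("1client"))    # False: must begin with a letter
print(valid_spss_name("my client"))  # False: no blanks allowed
```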
Measurement level
You can specify the level of measurement as scale (numeric data on an interval or ratio scale),
ordinal, or nominal. Nominal and ordinal data can be either string (alphanumeric) or numeric.
Nominal: A variable can be treated as nominal when its values represent categories with
no intrinsic ranking (for example, the department of the company in which an employee
works). Examples of nominal variables include region, zip code, and religious affiliation.
Ordinal: A variable can be treated as ordinal when its values represent categories with
some intrinsic ranking (for example, levels of service satisfaction from highly dissatisfied
to highly satisfied). Examples of ordinal variables include attitude scores representing
degree of satisfaction or confidence and preference rating scores.
Scale: A variable can be treated as scale (continuous) when its values represent ordered
categories with a meaningful metric, so that distance comparisons between values are
appropriate. Examples of scale variables include age in years and income in thousands of
dollars.
Options in SPSS
Under Edit – Options is a series of preferences you can set within the SPSS environment. By
and large, most of the default values are completely acceptable although you may be inclined to
tweak them here and there. For instance, under the “General” tab, you can change
whether to display variables by name or by their labels in output. Under “Viewer”, you
can specify font options for the output viewer. Under “File Locations”, you can change the
default directory that SPSS will look in for finding saved datasets and output. One power user
feature is to select a template for the output file under “Charts”. The default SPSS
installation includes almost 30 different styles (e.g. “APA style”) that can make your output
look much nicer!
2) Open the file in SPSS using the import wizard.
3) Open the file in its native application and simply copy and paste the data into SPSS.
Unfortunately, the first approach is not available using Microsoft Excel (but is from other
statistical applications, such as SAS). The most recommended way of importing data is to
attempt to open them directly within SPSS. This simply involves selecting File – Open – Data
and then changing the file type from *.sav to what you are trying to import (e.g. an Excel file,
*.xls, *.xlsx, and *.xlsm). If the spreadsheet you are importing contains variable names in the
first row, check the “Read variable names” option; otherwise leave it unchecked.
Exercise: 1) Use Find and Replace to change the string Sector values to numeric values.
2) Ensure each variable has the proper type and level of measurement specified.
3) Add variable labels to the six variables.
4) Add value labels to the Sector variable.
The third method of importing data has no real benefits over using the built-in Open command,
although it can be quicker. It is possible to simply highlight a column (or matrix) of data in Excel,
select Edit – Copy, and paste it into the SPSS Data Editor. The main downside of this procedure is
that variable names are not kept.
Importing from other file types follows a similar process. If you ever find yourself with a text
file that you want to import, there is an automated tool (File – Read Text Data) that walks you
through the steps of designating column length for the variables and so on.
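For comparison only, the same import steps look like this in Python with the pandas library; the file names are hypothetical, and inside SPSS you would instead use the File – Open – Data dialog or the Read Text Data wizard described above.

```python
import pandas as pd

# Spreadsheet import: the first row is read as variable names by default
survey = pd.read_excel("survey.xlsx")        # reading .xlsx files requires the openpyxl package

# Delimited text import, analogous to SPSS's Read Text Data wizard
survey_txt = pd.read_csv("survey.txt", sep="\t")

print(survey.head())
```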
WINDOWS
The main windows you will use in SPSS are the Data Editor, the Viewer, the Pivot Table Editor, the
Chart Editor and the Syntax Editor. These windows are summarized here. When you begin to analyze
your data, you will have a number of these windows open at the same time. Some students find this idea
confusing at first, but once you get the hang of it, it is really quite simple. You will always have the Data
Editor open because this contains the data file that you are analyzing. Once you start to do some
analyses, you will have the Viewer window open because this is where the results of all your analyses are
displayed, listed in the order in which you performed them.
The different windows are like pieces of paper on your desk—you can shuffle them around, so that
sometimes one is on top and at other times another. Each of the windows you have open will be listed
along the bottom of your screen. To change windows, just click on whichever window you would like to
have on top. You can also click on Window on the top menu bar. This will list all the open windows and
allow you to choose which you would like to display on the screen.
Sometimes the windows that SPSS displays do not initially fill the screen. It is much easier to have the
Viewer window (where your results are displayed) enlarged on top, filling the entire screen. To do this,
look on the top right-hand area of your screen. There should be three little buttons or icons. Click on the
middle button to maximize that window (i.e. to make your current window fill the screen). If you wish to
shrink it again, just click on this middle button.
The Data Editor window displays the contents of your data file, and in this window you can
open, save and close existing data files, create a new data file, enter data, make changes to the
existing data file, and run statistical analyses.
Viewer window
When you start to do analyses, the Viewer window should open automatically (see Figure
below). If it does not open automatically, click on Window from the menu and this should be
listed. This window displays the results of the analyses you have conducted, including tables and
charts. In this window you can modify the output, delete it, copy it, save it, or even transfer it
into a Word document.
The Viewer screen consists of two parts. On the left is an outline or navigation pane, which gives
you a full list of all the analyses you have conducted. You can use this side to quickly navigate
your way around your output (which can become very long). Just click on the section you want
to move to and it will appear on the right-hand side of the screen. On the right-hand side of the
Viewer window are the results of your analyses, which can include tables and graphs (also
referred to as charts in SPSS).
Saving output
When you save the output from SPSS, it is saved in a separate file with a .spv extension, to
distinguish it from data files, which have a .sav extension. If you are using a version of SPSS
prior to version 18, your output will be given a .spo extension. To read these older files in SPSS
Statistics 18, you will need to download a Legacy Viewer program from the SPSS website.
To save the results of your analyses, you must have the Viewer window open on the screen in
front of you. Click on File from the menu at the top of the screen. Click on Save. Choose the
directory and folder in which you wish to save your output, and then type in a file name that
uniquely identifies your output. Click on Save. To name my files, I use an abbreviation that
indicates the data file I am working on and the date I conducted the analyses. For example, the
file survey8may2009.spv would contain the analyses I conducted on 8 May 2009 using the
survey data file. I keep a log book that contains a list of all my file names, along with details of
the analyses that were performed. This makes it much easier for me to retrieve the results of
specific analyses. When you begin your own research, you will find that you can very quickly
accumulate a lot of different files containing the results of many different analyses. To prevent
confusion and frustration, get organized and keep good records of the analyses you have done
and of where you have saved the results.
It is important to note that the output file (with a .spv extension) can only be opened in SPSS.
This can be a problem if you, or someone who needs to read the output, does not have SPSS
available. To get around this problem, you may choose to ‘export’ your SPSS results. If you
wish to save the entire output, select File from the menu and then choose Export. You can
choose the format that you would like to use (e.g. pdf, Word/rtf). Saving as a Word/rtf file means
that you will be able to modify the tables in Word. Use the Browse button to identify the folder
you wish to save the file into, specify a suitable name in the Save File pop-up box that appears
and then click on Save and then OK.
If you don’t want to save the whole file, you can select specific parts of the output to export.
Select these in the Viewer window using the left-hand navigation pane. With the selections
highlighted, select File from the menu and choose Export. In the Export Output dialog box
you will need to tick the box at the top labelled Selected and then select the format of the file
and the location you wish to save to.
Printing output
You can use the navigation pane (left-hand side) of the Viewer window to select particular
sections of your results to print out. To do this, you need to highlight the sections that you want.
Click on the first section you want, hold down the Ctrl key on your keyboard and then just click on any
other sections you want. To print these sections, click on the File menu (from the top of your screen) and
choose Print. SPSS will ask whether you want to print your selected output or the whole output.
The tables you see in the Viewer window (which SPSS calls pivot tables) can be modified to suit
your needs. To modify a table you need to double-click on it, which takes you into what is
known as the Pivot Table Editor. You can use this editor to change the look of your table, the
size, the fonts used and the dimensions of the columns—you can even swap the presentation of
variables around (transpose rows and columns).
If you click the right mouse button on a table in the Viewer window, a pop-up menu of options
that are specific to that table will appear. If you double-click on a table and then click on your
right mouse button even more options appear, including the option to Create Graph using these
results. You may need to highlight the part of the table that you want to graph by holding down
the Ctrl key while you select the parts of the table you want to display.
Syntax Editor window
In the ‘good old days’, all SPSS commands were given using a special command language or
syntax. SPSS still creates these sets of commands to run each of the programs, but all you
usually see are the Windows menus that ‘write’ the commands for you. Although the options
available through the SPSS menus are usually all that most undergraduate students need to use,
there are some situations when it is useful to go behind the scenes and to take more control over
the analyses that you wish to conduct.
Syntax is a good way of keeping a record of what commands you have used, particularly when
you need to do a lot of recoding of variables or computing new variables. It is also useful when
you need to repeat a lot of analyses or generate a number of similar graphs.
You can use the normal SPSS menus to set up the basic commands of a particular statistical
technique and then ‘paste’ these to the Syntax Editor using a Paste button provided with each
procedure (see Figure below). It allows you to copy and paste commands, and to make
modifications to the commands generated by SPSS. Quite complex commands can also be
written to allow more sophisticated recoding and manipulation of the data. SPSS has a
Command Syntax Reference under the Help menu if you would like additional information.
The commands pasted to the Syntax Editor are not executed until you choose to run them. To
run the command, highlight the specific command (making sure you include the final full stop),
or select it from the left-hand side of the screen, and then click on the Run menu option or the
arrow icon from the menu bar. Extra comments can be added to the syntax file by starting them
with an asterisk (see Figure below).
Syntax is stored in a separate text file with a .sps extension. Make sure you have the syntax
editor open in front of you and then select File from the menu. Select the Save option from the
drop-down menu, choose the location you wish to save the file to and then type in a suitable file
name. Click on the Save button.
The syntax file (with the extension .sps) can only be opened using SPSS. Sometimes it may be
useful to copy and paste the syntax text from the Syntax Editor into a Word document so that
you (or others) can view it even if SPSS is not available. To do this, hold down the left mouse
button and drag the cursor over the syntax you wish to save. Choose Edit from the menu and
then select Copy from the drop-down menu. Open a Word document and paste this material
using the Edit, Paste option or hold the Ctrl key down and press V on the keyboard.
This section is broken into two main parts. First, we will look at the techniques used to explore
the relationship among variables (e.g. between age and optimism), followed by techniques you
can use when you want to explore the differences between groups (e.g. sex differences in
optimism scores). I have separated the techniques into these two sections, as this is consistent
with the way in which most basic statistics texts are structured and how the majority of students
will have been taught basic statistics. This tends to somewhat artificially emphasize the difference
between these two groups of techniques. There are, in fact, many underlying similarities between
the various statistical techniques, which is perhaps not evident on initial inspection. A full
discussion of this point is beyond the scope of this note.
I have deliberately kept the summaries of the different techniques brief and simple, to aid initial
understanding. This part certainly does not cover all the different techniques available, but it
does give you the basics to get you started and to build your confidence. Note that there are
assumptions that must be met before you use any of these statistics.
Exploring relationships
Often in survey research you will not be interested in differences between groups, but instead in
the strength of the relationship between variables. There are a number of different techniques
that you can use.
Correlation
Pearson correlation or Spearman correlation is used when you want to explore the strength of the
relationship between two continuous variables. This gives you an indication of both the direction
(positive or negative) and the strength of the relationship. A positive correlation indicates that as
one variable increases, so does the other. A negative correlation indicates that as one variable
increases, the other decreases.
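In SPSS these coefficients are obtained through the Analyze – Correlate menu described earlier. The Python sketch below, with invented data, shows the same two calculations so you can see what the output represents.

```python
from scipy import stats

# Hypothetical age and optimism scores for 8 respondents
age      = [21, 25, 30, 34, 40, 45, 52, 60]
optimism = [18, 19, 22, 21, 24, 25, 27, 29]

r, p = stats.pearsonr(age, optimism)        # Pearson r for continuous, roughly normal data
rho, p_s = stats.spearmanr(age, optimism)   # Spearman rho, the rank-based alternative
print(r, p)
print(rho, p_s)
```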
Partial correlation
Partial correlation is an extension of Pearson correlation—it allows you to control for the
possible effects of another confounding variable. Partial correlation ‘removes’ the effect of the
confounding variable (e.g. socially desirable responding), allowing you to get a more accurate
picture of the relationship between your two variables of interest.
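As a rough sketch of what the procedure does, the Python function below (with made-up variable names and data) correlates the residuals that remain after the confounding variable has been regressed out of each variable of interest; it illustrates the logic rather than SPSS's own implementation.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, z):
    """Pearson correlation between x and y after removing the linear effect of z."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    rx = x - np.polyval(np.polyfit(z, x, 1), z)   # residuals of x after regressing on z
    ry = y - np.polyval(np.polyfit(z, y, 1), z)   # residuals of y after regressing on z
    return stats.pearsonr(rx, ry)

# Hypothetical scores: optimism, life satisfaction, and socially desirable responding
optimism     = [18, 19, 22, 21, 24, 25, 27, 29]
satisfaction = [30, 33, 35, 34, 39, 40, 44, 46]
soc_desire   = [5, 6, 6, 7, 8, 8, 9, 10]
print(partial_corr(optimism, satisfaction, soc_desire))
```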
Multiple regression
Multiple regression is a more sophisticated extension of correlation and is used when you want
to explore the predictive ability of a set of independent variables on one continuous dependent
measure. Different types of multiple regression allow you to compare the predictive ability of
particular independent variables and to find the best set of variables to predict a dependent
variable.
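In SPSS this is found under the Regression menu. The sketch below, in Python with the statsmodels package and invented data, fits one such model and prints the pieces of output you would normally inspect; treat it as an illustration rather than a template for a real analysis.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: predicting exam score from study hours and an aptitude measure
df = pd.DataFrame({
    "score":    [52, 55, 60, 64, 70, 72, 80, 85, 66, 58],
    "hours":    [2, 3, 5, 7, 9, 10, 12, 14, 8, 4],
    "aptitude": [100, 98, 105, 110, 112, 108, 120, 125, 109, 102],
})

model = smf.ols("score ~ hours + aptitude", data=df).fit()
print(model.params)     # intercept and regression coefficients
print(model.rsquared)   # proportion of variance in score explained by the predictors
print(model.pvalues)    # significance of each predictor
```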
Factor analysis
Factor analysis allows you to condense a large set of variables or scale items down to a smaller,
more manageable number of dimensions or factors. It does this by summarizing the underlying
patterns of correlation and looking for ‘clumps’ or groups of closely related items. This
technique is often used when developing scales and measures, to identify the underlying
structure.
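A minimal Python sketch of the idea, using scikit-learn and simulated item responses: six made-up scale items are generated from two underlying factors, and the factor analysis recovers loadings showing which items ‘clump’ together. This is only a toy illustration of the technique, not the procedure SPSS runs under the Data Reduction menu.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 200 respondents answering 6 items driven by 2 underlying factors
latent = rng.normal(size=(200, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                     [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
items = latent @ loadings.T + rng.normal(scale=0.3, size=(200, 6))

fa = FactorAnalysis(n_components=2).fit(items)
print(np.round(fa.components_, 2))   # estimated loadings: items 1-3 and items 4-6 form two clumps
```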
Summary
29
All of the analyses described above involve exploration of the relationship between continuous
variables. If you have only categorical variables, you can use the Chi Square Test for
Relatedness or Independence to explore their relationship (e.g. if you wanted to see whether
gender influenced clients’ dropout rates from a treatment program). In this situation, you are
interested in the number of people in each category (males and females who drop out
of/complete the program) rather than their score on a scale.
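Sticking with the dropout example, the sketch below shows the calculation in Python with scipy on made-up counts; in SPSS the same test is requested through Descriptive Statistics – Crosstabs, as noted earlier.

```python
from scipy import stats

# Hypothetical counts of programme completion by gender
#                completed  dropped out
contingency = [[40,         10],          # males
               [45,         25]]          # females

chi2, p, dof, expected = stats.chi2_contingency(contingency)
print(chi2, p, dof)
print(expected)   # counts expected if gender and dropping out were independent
```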
T-tests
T-tests are used when you have two groups (e.g. males and females) or two sets of data (before
and after), and you wish to compare the mean score on some continuous variable. There are two
main types of t-tests. Paired sample t-tests (also called repeated measures) are used when you are
interested in changes in scores for participants tested at Time 1, and then again at Time 2 (often
after some intervention or event). The samples are ‘related’ because they are the same people
tested each time. Independent sample t-tests are used when you have two different (independent)
groups of people (males and females), and you are interested in comparing their scores. In this
case, you collect information on only one occasion but from two different sets of people.
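Both kinds of t-test are available in SPSS under the Compare Means menu. The Python sketch below, with invented scores, shows the two calculations side by side so the distinction is easy to see.

```python
from scipy import stats

# Independent samples: hypothetical optimism scores for males and females
males   = [22, 25, 24, 27, 30, 28]
females = [26, 29, 31, 27, 33, 30]
print(stats.ttest_ind(males, females))   # compares the two group means

# Paired samples: the same six people measured before and after an intervention
before = [20, 22, 19, 24, 23, 21]
after  = [24, 25, 22, 27, 26, 23]
print(stats.ttest_rel(before, after))    # compares each person with themselves
```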
One-way analysis of variance
One-way analysis of variance is similar to a t-test, but is used when you have two or more groups
and you wish to compare their mean scores on a continuous variable. There are two types of one-way
ANOVAs: repeated measures ANOVA (when the same people are tested on more than one occasion) and
between-groups (or independent samples) ANOVA, where you are comparing the mean scores
of two or more different groups of people.
Two-way analysis of variance
Two-way analysis of variance allows you to test the impact of two independent variables on one
dependent variable. The advantage of using a two-way ANOVA is that it allows you to test for
an interaction effect—that is, when the effect of one independent variable is influenced by
another; for example, when you suspect that optimism increases with age, but only for males.
It also tests for ‘main effects’—that is, the overall effect of each independent variable (e.g. sex,
age). There are two different two-way ANOVAs: between-groups ANOVA (when the groups are
different) and repeated measures ANOVA (when the same people are tested on more than one
occasion). Some research designs combine both between-groups and repeated measures in the
one study. These are referred to as ‘Mixed Between-Within Designs’, or ‘Split Plot’.
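As a hedged illustration with entirely invented data, the Python sketch below runs a one-way between-groups ANOVA with scipy and then a two-way between-groups ANOVA (including the interaction term) with statsmodels; in SPSS the equivalents live under Compare Means and General Linear Model.

```python
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# One-way between-groups ANOVA: optimism scores for three hypothetical teaching methods
method_a = [22, 25, 24, 27]
method_b = [30, 28, 33, 31]
method_c = [26, 24, 27, 25]
print(stats.f_oneway(method_a, method_b, method_c))

# Two-way between-groups ANOVA: sex x age group, with the interaction term
df = pd.DataFrame({
    "optimism": [22, 25, 30, 28, 26, 24, 31, 33, 27, 29, 24, 26],
    "sex":      ["m"] * 6 + ["f"] * 6,
    "age_grp":  ["young", "young", "old", "old", "mid", "mid"] * 2,
})
model = smf.ols("optimism ~ C(sex) * C(age_grp)", data=df).fit()
print(anova_lm(model, typ=2))   # main effects of sex and age group, plus their interaction
```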
Analysis of covariance
Analysis of covariance (ANCOVA) is used when you want to statistically control for the
possible effects of an additional confounding variable (covariate). This is useful when you
suspect that your groups differ on some variable that may influence the effect that your
independent variables have on your dependent variable. To be sure that it is the independent
variable that is doing the influencing, ANCOVA statistically removes the effect of the covariate.
Analysis of covariance can be used as part of a one-way, two-way or multivariate design.
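A compact sketch of the same logic in Python with statsmodels, using made-up pre-test and post-test scores: the covariate (pre-test) is entered alongside the grouping variable, so the test of the group effect is adjusted for it.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical design: two teaching methods, post-test scores, pre-test score as the covariate
df = pd.DataFrame({
    "post":   [60, 62, 65, 70, 72, 75, 68, 71, 74, 78],
    "pre":    [50, 52, 55, 60, 61, 64, 53, 56, 59, 63],
    "method": ["A"] * 5 + ["B"] * 5,
})

# ANCOVA: effect of method on post-test scores after statistically removing the pre-test effect
model = smf.ols("post ~ C(method) + pre", data=df).fit()
print(anova_lm(model, typ=2))
```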
Intensive Practicals