Introduction To IBM SPSS Statistics

Alan Taylor, Department of Psychology
Macquarie University

© 2001-2011 Macquarie University

Note: During the life of v. 17, SPSS underwent a name change, to PASW Statistics 17.
PASW stands for Predictive Analytics Software. This name change was carried forward
to v. 18. A further name change meant that v. 19 is called IBM SPSS Statistics, as
indicated by the title of this handbook.

For convenience, and to save space, we'll continue to refer to SPSS.

Although this handbook is updated for v. 19 of SPSS, the sharp-eyed reader will
notice that many of the screen-shots are from earlier versions. In order to minimise
the task of copying, editing and formatting the graphics, older versions have been
retained when there seemed to be no danger of confusing readers.

Contents

1. Introduction
2. Starting SPSS
3. What You See
4. Setting Some Options
5. Moving Between the Windows
6. Two Ways to Carry Out Analyses
7. Getting Data into SPSS

8. Exercise 1: Keying Data into SPSS and Carrying out Some Basic Analyses
   Naming the Variables
   Value Labels
   Entering the Numbers
   Saving the Data
   Descriptive Statistics
   Looking at Output
   Deleting Unwanted Output
   Printing Output
   Clearing the Output Window
   Using the Syntax Window
   Saving the Syntax
   Getting SPSS to Carry out the Commands
   Looking at and Printing the Text Output
   Saving Output
   Exiting from SPSS

9. Exercise 2: Calling up an SPSS Data File, Altering the Data, Selecting Cases and
   Carrying Out Some Analyses
   Calling up the Dataset
   Having a Look at the Variable Names
   Looking at the Distributions
   Correlations
   Factor Analysis
   Editing a Table
   The Correlations of Items with Factors
   Testing Reliability
   Reliability Output
   Creating the Scales
   Missing Values
   Correlations
   Scatterplots
   Editing a Graph and Identifying a Case
   Omitting the Outlier from the Correlation Calculation (using select cases)
   Producing a t-test
   Comparing Means for More than Two Groups
   Recoding Variables
   Analysis of Variance
   Post-hoc Tests
   Plots of Means
   Crosstabulations
   Finishing the Exercise

10. Further Reading

Appendix 1: Questions from the Work Attitudes Study
Appendix 2: Reading Raw Data into SPSS for Windows
   Free Format
   Fixed Format
   Reading a Free-Format Data File
   Reading a Fixed-Format Data File
Appendix 3: Reading Data From Excel into SPSS for Windows
Appendix 4: Finding the Syntax for a Point-and-Click Command
Appendix 5: Getting Help with Syntax
Appendix 6: Copying SPSS Output into Word
Appendix 7: Examples of Data Manipulation
Appendix 8: Some SPSS Procedures
Appendix 9: Three Useful Tips


1. Introduction
Welcome to Introduction to IBM SPSS Statistics. This is intended as a do-it-yourself
introduction, which you can read and work through alone, but it can also be used in
a class.

This handbook is based on Version 19 of IBM SPSS Statistics (referred to from here
on as SPSS), which is very similar for Windows and Macintosh (the screenshots
are from the Windows version, but see below). Much of it applies to earlier
Windows versions as well and also to recent versions for Macintosh.

The handbook is intended as an introduction only, not a comprehensive manual,
and not a statistics textbook. Once you have worked through this introduction,
you should have no trouble learning more advanced aspects of SPSS from the
various books mentioned later, some of which give detailed material on statistics
not specific to SPSS.

2. Starting SPSS
Depending on how SPSS has been set up, you can either double-click on the SPSS
icon on the desktop of your computer or click on the Start button, then on
Programs, etc, as shown below:

3. What You See


When started (and it can take a while to show itself), SPSS displays the Data
Window:

If the Data Window doesn't fill the whole screen, click on the enlarging button
at the top right of the screen.

The initial display shows a list of previously-used data and other files. If you don't
want to use this display in a session, you can click on the Cancel button; if you will
never want to use the display, you can check the Don't show this again box. An
alternative way of opening the most recently-used files is provided by the Recently
Used Data and Recently Used Files options at the bottom of the File menu.

The Data Window is where data can be entered and edited, and we will be doing that
shortly. However, before we begin, we will set up some options and defaults (or
make sure that they're already set), so that your work with SPSS will produce what we
expect it to produce.

4. Setting Some Options


To set the options, click on Edit at the top left of the screen, then on Options at the
bottom of the resulting pull-down menu. The following display will appear.

An initial choice concerns the Look and feel of SPSS:

In the interests of standardisation, we'll switch to SPSS Standard for the rest of this
handbook, but you may, like me, decide that you feel more comfortable with the
Windows (or Macintosh, on a Mac) look (and feel).

Select Display names in

Select File if you would like variables to be listed in the order they appear in the
file, or Alphabetical if you would like them to be listed alphabetically regardless of
their position in the file. (It's often possible to change this setting 'on the fly' while
you're using a procedure, so don't feel that you're entering into a commitment at this
point.)

Make sure that there is a tick beside

Having more than one dataset open at a time can be very handy (if you want to
combine a number of datasets, for example), but it can also be confusing. If you
don't want to have more than one dataset open at a time, check this box:

Check this box:

You may prefer to work in centimetres rather than inches:

Now, click on the Viewer tab at the top of the screen:

You will see the following display:


Make sure there is a tick at:

Make sure that in the Initial Output State panel, Log is set to Shown:

Now click on the Output Labels tab. The following display will appear:

Make sure that Names and Labels (for variables) or Values and Labels (for variable
values) appear in each of the slots as shown on the above display.
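
The same labelling choices can also be requested with syntax, which may be handy if you
move between computers. The following is a sketch only: it assumes that the four
keywords of the SET command shown here correspond to the four slots on the Output
Labels tab (labelling of variables and of values, in the outline and in pivot tables).

```spss
* Show both names and labels for variables, and both values and labels for values.
set ovars=both onumbers=both tvars=both tnumbers=both.
```

Settings made this way would normally need to be run again in each new session, so the
Options display remains the more permanent route.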

Finally, if you make any changes, click on Apply and OK.

To have some of the options take effect, you may have to close down SPSS
(click on File then Exit) then start the program up again.

Note: There are many other options you can set in SPSS. As an example, one useful
option, on the Charts tab, enables you to switch from using colour to using patterns
(e.g., dotted lines) in graphs, helpful when you aren't using a colour printer.

5. Moving Between the Windows in SPSS


After you have set the options as described in Section 4, SPSS will still start up with
the Data Window displayed, but a second window, the Syntax Window, will also be
available. When you carry out an analysis a third window, the Output or Viewer
Window, will appear. There are three ways to move from window to window. (1)
Using the mouse, click on Window at the top of the screen, then click on the window
you want to see. (2) Holding down the Alt key, press the Tab key once. A grey
oblong will appear in the middle of the screen. On it are icons representing the
various windows you have open in Windows (not just those for SPSS) and the title of
one of the windows (the one surrounded by a rectangle). Each time you tap the Tab
key (while continuing to hold down the Alt key), a different window is selected. Keep
tapping until the window you want to see is selected, then release both the Alt and Tab
keys. (3) Click on the appropriate SPSS icon at the bottom of the screen, then click on
the window you want. For example, with the Data Window (containing an open data
file called financial risk- …) and unnamed Syntax and Viewer Windows open, these
options appear:

Click once on one of the options to go to the window indicated.

6. Two Ways to Carry Out Analyses in SPSS


When you want SPSS to carry out an analysis of your data, or modify the data in
some way, you have to give it the appropriate instructions. There are two ways of
doing this. (1) Using the mouse to pull down the appropriate menu (e.g., Analyze) and
selecting the appropriate options. (2) Keying in written commands (which are called
syntax) in the Syntax Window and running them. In this introduction we will be
using both methods, and these will be illustrated in the exercises described below.
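
As a rough illustration of method (2), a request that would take several mouse clicks
with method (1) can be typed as a single command. The variable names used here (age
and income) are hypothetical and are only there to show the form of a command.

```spss
* One typed command does the work of a sequence of menu selections.
frequencies variables=age income.
```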

7. Getting Data into SPSS


There are a number of ways to get your data (usually numbers) into SPSS so that they
can be analysed. In the exercises below we will be using two methods: (1) The most
basic method, which is to enter the numbers directly into the Data Window and (2)
Reading the data in from a saved SPSS (.sav) data file. Two other methods, reading
data from a text or ASCII data file, and translating the data from an Excel
spreadsheet, are covered in Appendices 2 and 3 respectively, and you should
familiarise yourself with both.

8. Exercise 1: Keying Data into SPSS and Carrying out Some Basic Analyses
Before you start, make sure that you have checked the options and set them
appropriately as described in Section 4 of this handout.

We are going to enter into SPSS, via the Data Window, a very small dataset with
only four variables and 10 cases:

id gender iq gpa
1 1 105 1.33
2 1 92 3.5
3 2 104 2.67
4 2 94 2.75
5 1 106 2.75
6 1 111 2.25
7 2 100 3
8 2 95 3
9 1 120 3
10 1 90 2.5

Naming the Variables

1. Go to the Data Window (see Section 5 of this handout if you don't know how to
do this).

2. Double-click in the grey area (where it says var) at the top of the first white
column.

3. The Data Window will switch from the Data View to the Variable View, and you
will see a column with a heading Name. Type the names of the variables (id, gender,
iq and gpa) into the column, one per line.

Note: SPSS variable names must start with a letter, and contain no spaces. However,
they may contain numbers and underscores. The letters can be upper- or lower-case
or a mixture of the two. We'll use all lower-case.

We would like to have labels attached to the values for gender (1 and 2), so click in
the Values column for gender, then click once on the grey square with the three dots.
The display shown in 4. below will appear.

Value Labels

4. Type the number 1 in the window beside Value and the text Male in the Value
Label slot. Click on the Add button. Repeat the process with 2 and Female, then
click on OK.

5. Click on the Data View tab at the bottom left of the screen, and you will be able to
see the columns with the appropriate variable names at the top.
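
The value labels entered by pointing-and-clicking have a simple syntax counterpart. A
minimal sketch, using the codes and labels given in step 4:

```spss
* Attach labels to the two codes used for gender.
value labels gender 1 'Male' 2 'Female'.
```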

Entering the Numbers

6. Enter the numbers from the table at the start of this exercise into the appropriate
columns and rows of the Data Window. When you have entered a number in a cell,
press the Enter key to move to the cell below the first, or the Tab key to move to
the right of the first cell. You may also use the arrow keys to move between
cells. When you have finished, the window should look like this:
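
For comparison, the same small dataset can be created entirely with syntax rather than
by typing into the Data Window. This is a sketch only, using free-format input (the
topic of Appendix 2). Running it starts a new dataset, so save any existing work first.

```spss
* Read the ten cases as free-format data typed between begin data and end data.
data list free / id gender iq gpa.
begin data
1 1 105 1.33
2 1 92 3.5
3 2 104 2.67
4 2 94 2.75
5 1 106 2.75
6 1 111 2.25
7 2 100 3
8 2 95 3
9 1 120 3
10 1 90 2.5
end data.
* Label the codes for gender, then carry out the pending read.
value labels gender 1 'Male' 2 'Female'.
execute.
```

Variables read this way are given two decimal places by default, so id will be shown as
1.00, 2.00 and so on.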

Saving the Data

7. Before we go any further, we'll save the data you have entered, so that you can
always call it up again if something goes wrong. Pull down the File menu and
click on Save. The display you see will depend on whether you are using a stand-
alone machine, or one connected to a network server. In the stand-alone case, you
may see something like the following:

8. It's not usually a good idea to keep your data files in the SPSS directory, so you
will want to navigate to a suitable directory that you have created, or to a memory
stick. Having done that, type demo1.sav into the File name slot, and press the
Save button. This will save the data you have entered onto your hard disk or onto
your memory stick.

9. The Output Window will show the syntax equivalent of the command for saving
the data file. Go to the Data Window and you will see that, now that you have
saved the data, the name of the file appears at the top of the Data Window:
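
The syntax shown in the Output Window for step 9 will look something like the sketch
below. The folder C:\mydata is hypothetical; SPSS will show whichever location you
actually chose.

```spss
* Save the data in the Data Window as an SPSS data file.
save outfile='C:\mydata\demo1.sav'
  /compressed.
```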

Descriptive Statistics

10. We would now like to have a first look at the data we have entered. Probably the
best way is to ask for some one-way frequency distributions. Request these as
follows:

Click on Analyze at the top of the screen, then click on Descriptive Statistics,
then Frequencies, as shown here:

You will see the display below:

This is a common type of "gateway" to SPSS procedures. The immediate task is
to tell SPSS which variables you would like frequencies for. To select all the
variables you can EITHER click on a variable then click on the arrow button to
move that variable into the right-hand panel, and repeat this to select all
the variables, OR click on each of the variables while holding down the Ctrl key,
then click on the arrow, OR click on the first variable in the list then click on the
last in the list while holding down the Shift key, then click on the arrow, OR click
on the first variable and drag down the list while holding down the left mouse
button, then click on the arrow.

When you have selected the variables, the OK button will become available, and
you should click on it. SPSS will then carry out the analysis.

Looking at Output

SPSS should now show you the Viewer Window. Notice that there are two
panels. The left panel is like a "Contents" page; it's called the Outline. You can see
particular parts of the output in the right-hand window (the actual results) by
clicking once on the appropriate icon in the Outline. You can also delete bits of the
results by selecting the appropriate icon in the Outline and pressing the Delete key.
Furthermore, you can "hide" some of the output (make it disappear, but not
permanently, from the results window) by double-clicking on the appropriate icon in
the Outline. An open book icon means that the piece of output will be shown in the
right-hand panel; a closed book icon means that the output will not be shown in the
right-hand panel. By double-clicking on an open book, you can hide the
corresponding item in the right-hand panel, and by double-clicking on a closed
book, you can cause the item to be shown in the right-hand panel.

You will want to experiment with the Outline, but for this exercise we'll concentrate
on the right-hand panel and hide the Outline. To do this, use the mouse to move the
arrow cursor to the vertical line separating the Outline panel from the right-hand
panel. When the arrow is immediately over the line, the cursor will change to a pair
of short vertical lines with an arrow pointing left and another pointing right. Holding
the left mouse button down, drag the vertical line to the left of the screen so that the
Outline panel disappears. We can now see just the right-hand panel, which contains
the output or results.

11. The first part of the output shows the syntax or command equivalent of the
instructions which you have given by pointing-and-clicking. (If the commands
aren't shown, go back to Section 4 and check that you have specified that the
contents of the log are initially shown, and that you have asked for the commands
to be shown in the log.)

FREQUENCIES
VARIABLES=id gender iq gpa
/ORDER= ANALYSIS .

12. The next part of the output shows the names of the variables for which SPSS has
produced tables, the number of valid (i.e., non-missing) cases for each variable
and the number of missing cases. You should have 10 cases and no missing
values. If this isn't the case, go back to the Data Window and check your data
entry.

13. Finally, we get to the tables we're interested in. The four tables below are called
pivot tables. As we'll see later on, the tables are editable. We'll examine just one
of them, but you should make sure that each of your tables is the same as the ones
shown here. If they aren't, go to the Data Window and make any necessary
alterations. The table for gender shows that there are six males and four females
and that they make up 60% and 40% respectively of the sample. In all of these
tables, the Percent and the Valid Percent are the same, but if some values are
missing, the calculation of Percent includes the missing cases, while such cases
are excluded from the calculation of the Valid Percent.

id

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid   1.00                  1      10.0            10.0                 10.0
        2.00                  1      10.0            10.0                 20.0
        3.00                  1      10.0            10.0                 30.0
        4.00                  1      10.0            10.0                 40.0
        5.00                  1      10.0            10.0                 50.0
        6.00                  1      10.0            10.0                 60.0
        7.00                  1      10.0            10.0                 70.0
        8.00                  1      10.0            10.0                 80.0
        9.00                  1      10.0            10.0                 90.0
        10.00                 1      10.0            10.0                100.0
        Total                10     100.0           100.0

gender

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid   1.00 Male             6      60.0            60.0                 60.0
        2.00 Female           4      40.0            40.0                100.0
        Total                10     100.0           100.0

iq

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid   90.00                 1      10.0            10.0                 10.0
        92.00                 1      10.0            10.0                 20.0
        94.00                 1      10.0            10.0                 30.0
        95.00                 1      10.0            10.0                 40.0
        100.00                1      10.0            10.0                 50.0
        104.00                1      10.0            10.0                 60.0
        105.00                1      10.0            10.0                 70.0
        106.00                1      10.0            10.0                 80.0
        111.00                1      10.0            10.0                 90.0
        120.00                1      10.0            10.0                100.0
        Total                10     100.0           100.0

gpa

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid   1.33                  1      10.0            10.0                 10.0
        2.25                  1      10.0            10.0                 20.0
        2.50                  1      10.0            10.0                 30.0
        2.67                  1      10.0            10.0                 40.0
        2.75                  2      20.0            20.0                 60.0
        3.00                  3      30.0            30.0                 90.0
        3.50                  1      10.0            10.0                100.0
        Total                10     100.0           100.0

Hint: Notice that the SPSS output gives the variable names in lower case, as we
entered them. When referring to the variables (when typing commands, for example),
however, you may use upper or lower case: to SPSS, GPA is the same as gpa.
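
To illustrate the hint, the two commands below ask for exactly the same table. This is
just a sketch using one of the variables entered in this exercise.

```spss
FREQUENCIES VARIABLES=GPA.
frequencies variables=gpa.
```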

Deleting Unwanted Output

15. SPSS produces a lot of output, not all of which you'll want to print (especially if
you're using your own paper). There are various ways of selecting what is to be
printed. We'll look at two ways. One involves deleting everything you don't
want, then asking SPSS to print all visible output. The other involves selecting
what you want to print, then printing the selection.

16. Looking at the output you have produced, the first three items are:

FREQUENCIES
VARIABLES=id gender iq gpa
/ORDER= ANALYSIS .

Statistics

               id   gender   iq   gpa
N   Valid      10       10   10    10
    Missing     0        0    0     0

id

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid   1.00                  1      10.0            10.0                 10.0
        2.00                  1      10.0            10.0                 20.0
        3.00                  1      10.0            10.0                 30.0
        4.00                  1      10.0            10.0                 40.0
        5.00                  1      10.0            10.0                 50.0
        6.00                  1      10.0            10.0                 60.0
        7.00                  1      10.0            10.0                 70.0
        8.00                  1      10.0            10.0                 80.0
        9.00                  1      10.0            10.0                 90.0
        10.00                 1      10.0            10.0                100.0
        Total                10     100.0           100.0

While it was useful to have these items shown in the output, we may not want to print
them. To select them for deletion, click once on each, holding down the Ctrl key after
the first. All the selected items will have a frame around them. To delete them,
simply press the Delete key. Now only the three frequencies tables for gender, iq and
gpa will be left.

Printing Output

17. We'll now have a look at how to print selected output using both the methods
mentioned in 15. above. (a) Make sure that none of the tables remaining in the
output window is selected (i.e., that none has a frame around it). Now, click on
File and Print. Something like the following display will appear:

Notice that in the bottom left, we are being offered only one option, All visible
output. Clicking on OK will print all three tables in the Output Window. We won't
print the tables at this stage, so click Cancel.

(b) To demonstrate the selection of one item for printing, click once on any of the
tables remaining in the Output Window. Now click on File and Print. This time the
bottom left of the display looks like this:

Here you have the option of printing just the table you've selected
(Selection) or all the tables in the Output Window (All visible
output).

If you have access to an appropriate printer, you may like to print the selected table.
Otherwise, try printing when you have access to a printer.

Clearing the Output Window

18. Before going any further, we'll clear all the output from the Output Window. This
is something you'll want to do quite often, as you find that the results of an
analysis are not what you wanted and decide to run it again after an alteration.
One way of selecting all output (not just what can be seen in the Output Window)
is to click on Edit, then on Select all. A keyboard shortcut is to press the Ctrl and
a keys together. Either way, once you have selected all the output, press the
Delete key, and all the output will be deleted (not just made invisible).

Using the Syntax Window

19. For much of the time, you can use the point-and-click method (as in 10.) to carry
out analyses. However, there are some procedures and options which are
available only with syntax, or typed commands. Also, users who have learned
some syntax may find it faster and less laborious than pointing-and-clicking. So,
we’ll now try an example with syntax which will also provide good old-fashioned
text output, as opposed to the pivot table-type output which we saw above.

20. Switch to the Syntax Window. (If there isn't a Syntax Window, go to Section 4 of
this handout and set the appropriate options. Whatever the options are, you can
always open a new Syntax Window by clicking on File, New then Syntax.)

21. Type the following lines in the top part of the Syntax Window, exactly as given:

manova gpa by gender (1,2)
  /print=cellinfo(means)
  /design.

The Enhanced Syntax Window

Version 17 of SPSS introduced enhancements to the Syntax Window. When you enter a
command, SPSS isn't passive as in earlier versions, but monitors what you type.

Auto-completion

One of the first consequences of this is that the window offers helpful suggestions (auto-
completion) to help you select your command. For example, if you were using v. 17
when typing the above command (manova …. ) SPSS may have responded as follows:

You may or may not find this helpful. If you would rather go it alone, you can click on
Edit Options, select the Syntax Window tab, and remove the tick from

Colour coding

The Syntax Window colour-codes different parts of the syntax (commands themselves,
subcommands, etc) and also uses colour to indicate that you have typed a legitimate
command. You can turn off the colour coding by going to the Syntax Window tab, but I
must admit I've quickly got used to having colours and miss them when using earlier
versions of SPSS. My advice is to persevere and change the colour coding to suit your
preferences. For example, I've found it very helpful to have SPSS display comments
(text preceded by an asterisk ['*'] and ending with a full-stop, which SPSS shows in the
output but does not try to execute) in red, rather than the default grey.
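
As a small illustration of a comment, the lines below place one above the command typed
in step 21. The wording of the comment is arbitrary; only the asterisk at the start and
the full-stop at the end matter.

```spss
* Compare the mean gpa of males and females.
manova gpa by gender (1,2)
  /print=cellinfo(means)
  /design.
```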

Other features

Features such as the navigation and Error Tracking panes, line numbering, and the
ability to create bookmarks in syntax files, may well be useful under certain
circumstances, and are things for you to explore.

Bear in mind that you can remove and modify all of the features of the enhanced Syntax
Window to suit yourself.

Hint: In SPSS for Windows, all commands must end with a full stop. Slashes are used to separate
subcommands within overall commands. Unlike earlier versions of SPSS, SPSS for Windows does
not require that continuations of a command be indented, but some of us who have used SPSS for a
long time still hold to the practice. If a subcommand is started on a new line, it is an SPSS convention
that the slash appears at the beginning of the line rather than at the end of the previous line, but there is
little or no functional difference, and some of us put slashes at the end of the line out of habit.
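
The two layouts described in the hint can be sketched as follows, using variables from
this exercise. Both versions should be read by SPSS as the same command; only the
position of the slash differs.

```spss
* Slash at the beginning of the continuation line (the usual convention).
frequencies variables=gender gpa
  /statistics=mean.

* Slash at the end of the previous line.
frequencies variables=gender gpa /
  statistics=mean.
```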

Saving the Syntax

22. Before going any further, save the commands you've typed in. Like the data you
entered, the contents of the Syntax Window can be saved and called up again on
another occasion. Click on File, then Save. Use demo1.sps as the filename.

Hint: SPSS expects that SPSS files will have the following filename extensions:
SPSS data files: .sav
SPSS syntax or command files: .sps
SPSS output files: .spv
It is best to adhere to these conventions, although you don't have to.

Note: Before version 16, output files had the suffix .spo. Version 16 and later versions will
not open these files, and you will need to have the so-called Legacy Smartviewer to open
them.

Getting SPSS to Carry out the Commands

23. There are various ways to get SPSS for Windows to carry out your commands.
We'll mention the two simplest methods. Make sure that your cursor is
somewhere on the commands you've entered. For example, in this case the cursor
happens to be between the "g" and the "p" of gpa:

Then EITHER click on the Run button on the toolbar OR press the Ctrl and r keys together.

Note: If you have a number of commands which you would like to have carried out one after
the other, you can use the mouse to highlight the commands (you only have to highlight part
of each command). When you press the button, or Ctrl-r, all the highlighted
commands will be carried out.

Looking at and Printing the Text Output

24. The Output Window should look like this:


Following the listing of the commands used (which has been clicked on and therefore
selected in the above Output Window), the text output is in one piece (i.e., it does not
consist of "pivot" tables, which can be separately selected). You can select it with a
click for printing or simply print all visible output exactly as described above.

Note: In versions before 16, text output like that from MANOVA is sometimes deemed "too
big" by SPSS, and not all of it is shown. When this happens, a red arrow indicates that there
is more output, as in this example.

There are two ways of coping with this. (a) Click anywhere on the incomplete output. The
output will be framed. Scroll down to the bottom of the incomplete output until you come to
the bottom edge of the frame:

Position the cursor over the middle black square until a double-headed arrow appears.
Holding down the left mouse button, drag the bottom edge of the frame down until all of the
output is shown. This can be a tedious business, so it's worth considering method (b).

(b) Click on the incomplete output with the right mouse button. On the menu which
appears, click on SPSS Rtf Document Object, and then on Open. SPSS will then open a
new window. In this window you may scroll through the entire output and print it if you
wish. Make sure that you close the window when you've finished looking at the output.
(Hint: If you click on Edit instead of Open after clicking on SPSS Rtf Document Object,
you can edit the text output much as you would in a word processor. When you close the
new window, the altered version will be copied into the main Output Window.)

Saving Output

25. Output (either text or pivot tables) can be saved and called up and looked at later
with SPSS. Simply click on the File menu then Save. Save your MANOVA
output, calling it demo1.spv.
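
Saving output also has a syntax form. The sketch below assumes the OUTPUT SAVE command
found in recent versions of SPSS, and the folder C:\mydata is hypothetical.

```spss
* Save the contents of the current Output Window.
output save outfile='C:\mydata\demo1.spv'.
```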

Exiting from SPSS

26. For practice, close down SPSS by going to the Data Window, then either (a)
clicking on the File menu then on Exit, or (b) double-clicking on the SPSS icon
in the top left of the screen, or (c) clicking once on the cross at the top right of
the screen.

9. Exercise 2: Calling up an SPSS Data File, Altering the Data,
Selecting Cases and Carrying Out Some Analyses
This exercise is based on selected variables from a Work Attitude Survey conducted
by Psychology students in 1996. The questions on which the data are based, and the
names of the variables, are shown in Appendix 1. We will use the data to demonstrate

• recoding variables (e.g., grouping values together to create categories)
• computing new variables (e.g., combining items to create a scale score)
• selecting cases to carry out separate analyses (e.g., temporarily removing a case
  from the analysis)
• some basic and not-so-basic analyses.

Note: The instructions given here for things you have already done in Exercise 1 are
brief. If you are not sure how to do what is required, look back at Exercise 1.

Calling up the Dataset

The dataset is called workmot.sav. There are three ways of obtaining it. (1) Load or
copy it from a common directory on the Student Server. (2) Download it from the
website http://www.psy.mq.edu.au/psystat/download.htm . (3) Ask me to email it to
you.

1. Start up SPSS for Windows. Click on File, Open, Data and open workmot.sav.
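
Syntax equivalent (a sketch; the folder C:\mydata is hypothetical and should be replaced
by wherever you have put the file):

```spss
* Open the saved data file for Exercise 2.
get file='C:\mydata\workmot.sav'.
```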

Having a Look at the Variable Names

2. To see the names of the variables in the file, and to see any variable labels which
have been attached, click on Utilities and Variables. (A more compact list of
variables and variable labels can be obtained by running the syntax display
labels.)
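
The command mentioned in step 2 is shown below as it would be typed in the Syntax
Window. The second command, display dictionary, is included here as an optional extra
which gives fuller information about each variable, such as value labels.

```spss
* Compact list of variable names and labels.
display labels.
* Fuller description of every variable in the file.
display dictionary.
```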

Looking at the Distributions

3. It's always a good idea to have a look at the one-way frequency tables for your
variables before doing anything further. So, use the frequencies command (see 10
in Exercise 1 above) to get a one-way table for all variables. Inspect the results
for each variable to make sure there are no out-of-range values, etc.

Syntax equivalent:

freq all.

OR

freq id to d6h.

Correlations

4. We are going to create two scales from the variables d6a to d6h, which deal with
the respondents' assessments of the factors involved in "getting ahead" in their
organisation (see Appendix 1). In order to see whether the items can be combined
to make meaningful, reliable scales, we will examine the correlations, carry out a
factor analysis and check the reliability of the items. To ask SPSS to calculate the
correlations, click on Analyze, Correlate, Bivariate, then select items d6a to
d6h before clicking on OK.

Syntax equivalent:

corr d6a to d6h.

Inspect the correlation matrix to see if there is any pattern which tells us what items
could be combined.

A way into this is to find the largest correlation (positive or negative) in the matrix
and start from there. In this case, the strongest relationship (r = .821) is between d6a
Hard work and effort and d6e Good performance. So, employees who thought that
hard work and effort were rewarded by the organisation also thought that good
performance was recognised (and, correspondingly, those who thought hard work and
effort didn't help you to get ahead also tended to think that good performance wasn't
rewarded).

The next step is to find a variable which correlates with both of the first two variables.
There are several candidates, but d6c Natural ability stands out: it correlates .658
with hard work and effort and .709 with good performance.

Continuing with this strategy will lead to you assembling a cluster of variables which
may be seen to measure a dimension or factor which underlies the correlations
between the responses to the items.

If you run out of variables which go together, you can start the search for a second
cluster of variables by finding the pair of variables, not in your first cluster, which are
most highly correlated (d6d Who you know and d6f Office politics stand out).

This general strategy might be called factor analysis for people who don't have a
statistical package. We'll deal with the computational version in the next section.
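
Once a candidate cluster has been spotted, it can help to look at a smaller matrix
containing only those items. A sketch, using the items singled out in the text above:

```spss
* First candidate cluster.
corr d6a d6c d6e.
* Possible start of a second cluster.
corr d6d d6f.
```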

Factor Analysis

5. Factor analysis is a method which allows us to see how variables cluster together.
The model assumes that the correlations between measured variables such as d6a
to d6h occur because the variables are measures of one or more common (hence
the term "common factor analysis") underlying hypothetical "factors" or latent
variables. Factor analysis has many uses, but one is to find out how many
"dimensions" a set of items is measuring, and which items measure which
dimensions. This allows us to combine the appropriate items. For further
information see

http://www.psy.mq.edu.au/psystat/other/FactorAnalysis.PDF

To carry out a factor analysis click on Analyze, Dimension Reduction, Factor
and select variables d6a to d6h. Click on the Extraction button to get this display:

Select Principal axis factoring for the Method, and leave the other extraction options
as shown above. Then click on Rotation and select Varimax on the display:

Now click on OK.



Syntax equivalent:

factor vars=d6a to d6h/
 print=initial extraction rotation/
 extraction=paf/
 rotation=varimax.

For our purposes, the table of most interest is the one below, which shows the
correlation between each item and the two factors which the procedure has produced
(in a solution like this one, in which the factors are uncorrelated, i.e., orthogonal, the
numbers are also called "factor loadings" and show how each factor is weighted in the
model which predicts ratings from the factors).

Rotated Factor Matrix (a)

                           Factor
                           1        2
d6a Hard work&effort       .765    -.394
d6b Good luck             -.092     .503
d6c Natural ability        .780    -.169
d6d Who you know          -.204     .779
d6e Good performance       .842    -.405
d6f Office politics       -.094     .718
d6g Adaptability           .758    -.046
d6h Years of service       .421     .313

Extraction Method: Principal Axis Factoring.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

Editing a Table

Sometimes you may want to edit a table. In our case, we'd like the correlations to be
shown with two decimal places rather than three.

First, double-click on the table. The table will then be surrounded by a dashed line,
which means it is editable:

Click in the top left corner of the part of the table containing the numbers (i.e., at
.765) with the left mouse button, then drag with the mouse until all the numbers are
highlighted:

Now click once in the highlighted area with the right mouse button. The following
menu will appear:

Click on Cell Properties and the following display will appear:

Click on the Format Value tab, and you will see:

Choose two decimal places and click on OK. Click anywhere off the table to end the
editing session.

Hint: While the table is in editable mode, you can click on various parts in order to
change things. For example, if you double-clicked on the heading Rotated Factor
Matrix, you could change or delete the heading.

The Correlations of Items with Factors

6. We now have this table to consider.

Rotated Factor Matrix (a)

                           Factor
                           1        2
d6a Hard work&effort       0.77    -0.39
d6b Good luck             -0.09     0.50
d6c Natural ability        0.78    -0.17
d6d Who you know          -0.20     0.78
d6e Good performance       0.84    -0.40
d6f Office politics       -0.09     0.72
d6g Adaptability           0.76    -0.05
d6h Years of service       0.42     0.31

Extraction Method: Principal Axis Factoring.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

There is a clear pattern. The items which concern performance and ability, d6a, d6c,
d6e and d6g, have high correlations with factor 1. The items which have to do with
luck and who you know, d6b, d6d and d6f, have high correlations with factor 2. The
item d6h, "Years of service", is moderately correlated with both factors.

Testing Reliability

7. We would like to combine the items to make two scales, one of which will be
called perf and the other who (the names are arbitrary, taken from the item which
has the highest correlation with each factor). Since item d6h cannot be
identified clearly with either factor, we will not use it in either scale.

Before doing the calculation that will produce new variables to represent the scales,
we'll test the reliability of the scales. The reliability of a scale is a measure of how
stable subjects' responses to the scale are over time – essentially the correlation
between the responses of subjects on two different occasions. Cronbach's alpha,
which SPSS will calculate, estimates this correlation from the inter-correlations of the
items and the number of items (the higher the correlations, and the greater the number
of items, the higher the reliability). Click on Analyze → Scale → Reliability
Analysis. Select the items d6a, d6c, d6e and d6g. Click on the Statistics button and
make the selections shown on the following screen:

Click on Continue, then OK. Repeat the process for items d6b, d6d and d6f.

Syntax equivalent:

reliability vars=d6a to d6h/
 scale(perf)=d6a d6c d6e d6g/
 statistics=correlations/
 summary=total.

reliability vars=d6a to d6h/
 scale(who)=d6b d6d d6f/
 statistics=correlations/
 summary=total.
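
To see how the average inter-item correlation and the number of items combine, here is a minimal sketch in Python (not SPSS syntax) of the standardised form of Cronbach's alpha, k * r / (1 + (k - 1) * r), where k is the number of items and r is the average inter-item correlation. The function name and the example values are illustrative assumptions, not results from the workmot data. By default SPSS reports an alpha based on the item variances and covariances, so its figure will usually differ a little from this standardised version.

```python
def standardized_alpha(n_items, mean_r):
    """Standardised Cronbach's alpha from the number of items and the
    average inter-item correlation (illustrative helper, not an SPSS function)."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Hypothetical values: the same average correlation with 3 and then 4 items.
alpha_3_items = standardized_alpha(3, 0.5)
alpha_4_items = standardized_alpha(4, 0.5)
print(round(alpha_3_items, 3), round(alpha_4_items, 3))
```

With the average correlation held constant, adding an item raises alpha, which is the pattern described above.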

Hint: All the items we're dealing with are scored in the same direction. If you are
dealing with a scale with both positive and negative items, you will have to reverse
the numbers for some of the items before carrying out the reliability analysis and also
before creating the scales. You can use the recode command, described below, for
this purpose. For example, on a 1-to-5 scale 1 would be changed to 5, 2 => 4, 4 => 2
and 5 => 1 on the reversed scales.
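
The reversal described in the hint amounts to subtracting each rating from the sum of the lowest and highest scale values. The following Python sketch (an illustration only; the function name is hypothetical and missing responses are represented by None) shows the arithmetic for a 1-to-5 scale.

```python
def reverse_score(value, low=1, high=5):
    """Reverse a rating on a low-to-high scale; a missing value (None) stays missing."""
    if value is None:
        return None
    return low + high - value


# 1 becomes 5, 2 becomes 4, 3 stays 3, and so on.
reversed_ratings = [reverse_score(v) for v in [1, 2, 3, 4, 5, None]]
print(reversed_ratings)
```

For the 1-to-6 importance ratings used for d6a to d6h, the same function would be called with high=6.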

Reliability Output

8. The Cronbach's alpha values for the two scales are .891 and .716, which are
respectable. The output tells us that the reliability for the who scale would
increase a bit if we removed item d6b. This suggests that luck (d6b) is not quite
the same thing as who you know (d6d) and office politics (d6f), but for our
purposes we will retain item d6b.

Creating the Scales

Note: Following a factor analysis, SPSS will create factor scores for us which are in
some ways better than the ones we'll be creating. For a discussion of different types
of factor scores, see

http://www.psy.mq.edu.au/psystat/other/FactorAnalysis.PDF.

We're using the less desirable, but very commonly-used, method mainly to illustrate
the use of the compute command.

9. We would like to obtain the average of each set of items to use as the scale score.
We'll use the compute command, which has a wide range of uses in creating new
variables based on existing variables. While looking at the Data Window, click on
Transform Compute Variable. You will see the following display:

The functions are grouped under different headings, which is fine as long as you know
what heading to click on. Life is too short to worry about this, so let's click on All so
that we can have access to all available functions at once.

Enter perf in the Target Variable slot.

We can now scroll down the lower list to Mean and double-click on it to produce this
display:

Note the helpful description of the function in the lower-centre panel.

11. Either double-click on it, or click once then click the button to place the
function in the Numeric Expression window. Then use the point-and-click
method, or simply type, to enter the following in the window:
MEAN(d6a,d6c,d6e,d6g) (make sure that there is a comma between each pair of
variable names). You should see

Note that there is no full-stop after the bracket, as there would be with syntax.

Click OK, then create the who scale by the same method.

Syntax equivalent:

compute perf=mean(d6a,d6c,d6e,d6g).
compute who=mean(d6b,d6d,d6f).

To check our work (it always pays to do this), obtain one-way frequency distributions
of perf and who, this time asking for histograms as well as the tables (see Appendix 7
if you can't see how to do this). Make sure that the values of perf and who are not
outside the limits 1 and 6.

Missing Values

You'll notice that there are two missing values for perf and two missing values for
who. These come about because when SPSS is calculating perf and who using the
mean function, it sets the result to missing if all of the d6 variables on which the new
variable is based are missing (if a respondent doesn't answer a question, a blank
instead of a number is entered for that item and SPSS automatically treats it as
missing). This seems very reasonable!

In fact, though, the mean function may be a bit too liberal, because it will give a subject a
non-missing value for perf even if the subject has only answered one of d6a, d6c, d6e
and d6g. A variation of the mean function allows you to specify the minimum
number of items a subject has to answer in order to have a mean calculated. For
example, if we want anyone who has answered fewer than three of the items to be
missing on perf, we could use the expression mean.3(d6a,d6c,d6e,d6g). This can be
used in the Syntax Window and in the point-and-click window. The section on the
compute command in Appendix 7 contains further information on creating scales.
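
The difference between mean and mean.3 can be sketched outside SPSS. The Python function below is an illustration under stated assumptions, not SPSS code: the name mean_min_valid is hypothetical, and missing answers are represented by None. It returns a mean only when at least min_valid of the items have been answered.

```python
def mean_min_valid(values, min_valid=1):
    """Mean of the non-missing values, or None (missing) when fewer than
    min_valid values are present. min_valid=1 mimics mean(); 3 mimics mean.3()."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)


one_answer = [5, None, None, None]      # hypothetical respondent
three_answers = [4, 5, None, 6]         # hypothetical respondent

print(mean_min_valid(one_answer))           # liberal: a mean from one item
print(mean_min_valid(one_answer, 3))        # strict: treated as missing
print(mean_min_valid(three_answers, 3))     # enough answers for a mean
```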

Correlations

12. We are interested in how scores on the two scales vary according to a number of
other variables, including age, gender, education, level and salary. For the
purposes of this exercise, we'll look at three variables, age, sex and educ.

13. Since perf, who and age are all numeric variables, we can assess the relationship
between them with the correlation coefficient. Click on Analyze → Correlate →
Bivariate, then select the three variables and click on OK.

Syntax equivalent:

corr perf who age.

15. Have a look at the correlation matrix. It's noticeable that the only significant
correlation is between the two scales, perf and who. The value of -.39 suggests
that people who are higher on perf tend to be lower on who, which makes sense.
However, our real interest is in the relationship between the two scales and age.
To make sure that the correlation between perf and age isn't being degraded by a
few outlying points, we'll produce a scatterplot. This is always a very good idea
when calculating correlations, especially with small samples, in which a few cases
can be very influential.

Scatterplots

If you click on Graphs, this menu appears:



The Chart Builder option offers a drag-and-drop way of creating graphs, but is
somewhat complicated by the role that the scale of the variables can play. We'll take
the easy way out and click on Legacy Dialogs. Note, though, that the Chart Builder
offers some capabilities which aren't available through the Legacy Dialogs, so you
should become familiar with it.

16. Click on Graphs → Legacy Dialogs → Scatter/Dot → Define (make sure simple
is selected). Select perf for the Y Axis and age for the X Axis, then select id in the
Label Cases slot. (Selecting a variable for Label Cases is only necessary if you
want to be able to identify cases from the points in the graph, as we will want to.)
Click on OK.

Syntax equivalent:

graph/ scatterplot(bivar)=age with perf by id(identify).

See Appendix 8 for further variations of the graph command.

The scatterplot should look something like this (except for the box around one point):

It certainly looks as if there is no particular relationship between age and perf.


However, in order to demonstrate a few things about graphs, and the selection of
cases, we're going to pretend that we're concerned about the 'outlier' which has the
box around it in the above graph.

Editing a Graph and Identifying a Case

Double-click on the graph. This opens the graph in a separate window called the
Chart Editor. Any alterations you want to make to a graph (except for enlarging
it) start with this step.

Many options are available for editing graphs. First, there are the buttons at the
top of the editing window:

You can hover the mouse pointer on the icons to see their function.

Further options are available if you right-mouse click on the graph:

Probably the best way of proceeding, though, is to double-click on the part of the
graph that you want to change. For example, if you wanted to remove the decimal
places on the Y-axis numbers, you could double-click on the axis itself to obtain this
display:

If you then click on the Number Format tab, you can change the Decimal Places to
zero:

We're going to use the button at the left-hand end of the icons in the Chart Editor
window to obtain the value of the variable id for the "outlier". This is possible
because we nominated id in the Label Cases slot when we specified the graph. Click
on this button. This will turn your cursor into a "hairsight". Move the cursor to
cover the "outlier" and click once. The id of that case (161) will be displayed in the
graph.

Close the Chart Editor window. Notice that the modified version of the graph is
shown in the Output Window.

Omitting the Outlier from the Correlation Calculation (using select cases)

17. We would now like to calculate the correlation coefficient without the case for
which id = 161. We don't want to get rid of the case entirely, just omit it for one
calculation. To do this, click on Data → Select Cases. The following display will
appear:

Click on If condition is satisfied, then on If. In the display that appears, select id,
then click on the ~= (not equal to) button and the keypad buttons for 161, so that
the display looks like:

Click on the Continue Button.

Before clicking on OK, make sure that the Filter option is chosen on the bottom
of the first display:

If the Delete option is chosen, case 161 will be permanently removed from the
dataset. (Of course, this will only be a real problem if you save the workmot.sav
dataset and replace the original version of the file with the one which does not contain
case 161.)

Obtain the correlation between perf and age as you did previously (see 13. above).
The correlation is slightly less negative (-.018 versus -.060) but there's no
suggestion that the case with id equal to 161 is having a major effect on the
correlation.

Turn the selection off by clicking on Data → Select Cases, then selecting All
cases.

Syntax equivalent:

temporary.
select if (id ne 161).
corr age with perf.

Hint: The select cases (select if in syntax) command has many uses and can be as
complicated as you like. For example, in the present data you may want to select only
female subjects for an analysis:

temporary.
select if (sex eq 2).
freq perf who.

Or, you might want to look at the details of subjects who meet certain criteria:

temporary.
select if (perf >= 6 and who le 5 and (salary < 4 or level ge 2)).
list vars=id sex age perf who salary level.

As the second example shows, the same logical statement may be made with different
symbols. For example, ge [greater than or equal to] is equivalent to >=.

Note also that the second example demonstrates a method which can be used to
identify cases with out-of-range or dubious values. If you are inspecting a set of
frequencies for newly-entered data, and come across (for example) values of 0 or 3 for
sex, which should have values of 1 and 2 only, you might use the following
commands to find the id values of the cases involved, so that you can look at the
original data (questionnaires, for example) to find what the values should be.

temporary.
select if (sex eq 0 or sex eq 3).
list vars=id sex.

The list command simply lists the values of the specified variables (id and sex in this
case) for the cases which meet the criteria in the select if command.

Producing a t-test

18. We would now like to see whether male and female employees obtain different
scores on the two scales, perf and who. One way to answer this question is to use
the t-test, since the scales are numeric variables and sex has two categories. To
carry out the t-test, click on Analyze → Compare Means → Independent-Samples T Test.

In the following display, select perf and who as Test Variable(s) and sex as the
Grouping Variable. Click on Define Groups, and enter 1 and 2 as the specified
values. Click on Continue then OK.

Syntax equivalent:

t-test groups=sex (1,2) vars=perf who.

19. You will see from the t-test results (an excerpt from the full output) that there are
significant differences between males and females on both scales.

What do the means (which are shown below) tell us about the direction of the
differences?

Comparing Means for More than Two Groups

We are now going to concentrate on one of the scale scores, perf, and see how it
varies over employees in different salary groups. We'll first approach the question
descriptively, then carry out a significance test, which will require us to combine
employees to make groups which are large enough for testing to be meaningful.

20. In order to look at the mean rating on the perf scale for each salary group, we'll
use the means procedure. Click on Analyze → Compare Means → Means. As in
the display shown below, specify perf as the dependent variable and salary as the
independent variable and click OK.

Syntax equivalent:

means perf by salary.

The output suggests that there is some variation over salary groups, with lower means
occurring for the $30,001 - $40,000 and $40,001 - $50,000 groups and higher means
occurring for the $60,001 - $70,000 and > $90,000 groups.

We would like to carry out a test to see whether this variation exceeds what might
occur by chance, but some of the groups are very small, and we would like to
combine some of them to create four reasonably-sized groups. To do this we'll use
recode, which is one of the most-used SPSS data manipulation commands.

Recoding Variables

21. Click on Transform → Recode into Different Variables (we would like to keep
the original version of salary while having the new version).

Select salary as the Input Variable, then type in salrec in the Name slot of the
Output Variable and click on the Change button.
Click on the Old and New Values button.
Insert 1 in the Value slot in the Old Value panel (left part of the display) and 2 in
the Value slot of New Value panel (upper right part of the display) [i.e., we want to
change 1 to 2]. Click on the Add button.
Click on the Range radio button in the Old Value panel and insert 2 and 5 in the
appropriate slots. Click on the Copy old value(s) radio button in the New Value
panel [i.e., we want values 2, 3, 4 and 5 to retain their original values on the new
variable]. Click on the Add button.
Carry out the necessary actions to change 6, 7 and 9 (as it happens, there are no
subjects with a code of 8) to 5. You could do this by (a) specifying that each
separate value is to be changed to 5, (b) specifying that the range of values 6-9 is
to be converted to 5, or (c) simply specifying that the range 6 through [to] highest
is to be converted to 5. If you use option (c), the display should look like this:

Click on the Continue button, then on the OK button. If you click on the
button, then scroll down, you should see the new variable added to the
list of variables in the dataset.
Check on the new variable by obtaining a one-way frequency distribution. It
should look like this:
SALREC

                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid    2.00          21        12.7        12.9              12.9
         3.00          58        34.9        35.6              48.5
         4.00          66        39.8        40.5              89.0
         5.00          18        10.8        11.0             100.0
         Total        163        98.2       100.0
Missing  System         3         1.8
Total                 166       100.0

Notice that there are no value labels for the new variable. If you would like to add
some labels, you can remind yourself how to do it by looking at 4. in Exercise 1.

Syntax equivalent:

recode salary (1=2)(2 thru 5=copy)(6 thru hi=5) into salrec.
value labels salrec 2 'up to $30k' 3 '> $30k-$40k' 4 '> $40k-$50k'
 5 ' > $50k'.
freq salrec.
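
The mapping carried out by the recode command above can be spelled out as a small Python function. This is an illustrative sketch only (the function name is hypothetical and None stands for a missing value). It assumes, as in SPSS when recoding into a different variable, that any value not covered by the specifications becomes missing on the new variable.

```python
def recode_salary(salary):
    """Map the nine salary codes to the four salrec codes:
    (1=2)(2 thru 5=copy)(6 thru hi=5); anything else is left missing."""
    if salary is None:
        return None
    if salary == 1:
        return 2
    if 2 <= salary <= 5:
        return salary
    if salary >= 6:
        return 5
    return None  # values not covered by the recode stay missing


# Codes 1 to 9 from the questionnaire (see Appendix 1).
salrec_codes = [recode_salary(code) for code in range(1, 10)]
print(salrec_codes)
```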

Hint: As is sometimes the case, syntax offers more flexibility than the point-and-click
method. For example, by using the temporary command in syntax we could modify
salary without creating salrec, but do it only temporarily, so that the recoding (and the
value labels) are in force for only one procedure. So, with the commands below,
frequencies would show the modified version of salary, but any subsequent
procedures would use the original version.

temporary.
recode salary (1=2)(6 thru hi=5).
add value labels salary 2 'up to $30k' 5 ' > $50k'.
freq salary.

Note that because we're not creating a new variable, we don't have to copy the values
which aren't being changed (2, 3, 4 and 5). Also, we've used the add value labels
command (another syntax perk) to modify only the labels which need changing. With
add value labels, as opposed to value labels, the labels for values which aren't
mentioned remain the same as those in the unmodified variable.

Analysis of Variance

22. While there is a one-way analysis of variance procedure accessible through the
Analyze Compare Means One-Way ANOVA menus, it is a subset of a more
general procedure called GLM (General Linear Model), which handles much more
complicated analyses such as factorial univariate, repeated measures, and
multivariate ANOVAs, so we will use that. We'll do a basic analysis first, then
demonstrate some of the more advanced facilities of GLM in some follow-up
analyses.

23. Click on Analyze → General Linear Model → Univariate. For a basic analysis,
specify perf as the Dependent Variable and salrec as a Fixed Factor and click on
the OK button. The table of most interest, shown below, indicates that the effect of
salrec is not significant at the conventional level (p < .05), but for the purposes of
demonstration we'll pursue the analysis further anyway.

Tests of Between-Subjects Effects

Dependent Variable: PERF

Source             Type III Sum of Squares    df    Mean Square        F      Sig.
Corrected Model           9.290 (a)            3        3.097        2.298    .080
Intercept              2132.397                1     2132.397     1582.257    .000
SALREC                    9.290                3        3.097        2.298    .080
Error                   212.936              158        1.348
Total                  3003.563              162
Corrected Total         222.226              161

a. R Squared = .042 (Adjusted R Squared = .024)

Syntax equivalent:

glm perf by salrec.

Pairwise Tests

24. Again click on Analyze → General Linear Model → Univariate. The variables
perf and salrec should still be selected as the dependent variable and a fixed factor
respectively. Now click on the Options button. In the resulting display, double-click
on salrec in the Factor(s) list to select it, then click on the arrow to transfer
salrec to the Display Means for box. Then check Compare main effects and select
Bonferroni as the Confidence interval adjustment. The display should look like
that above. Click on the Continue button and OK. (Note that we're assuming that
the variance of perf is not significantly different over the four groups. This is in
fact the case, as demonstrated by an optional test in GLM. If you would like to
carry out this test for yourself, check the Homogeneity tests box in the above
display.)

25. This procedure will carry out all pairwise comparisons of the salary groups. We
are going to treat them as post-hoc, which means that we have no particular
comparisons we wish to make, but that we're interested in any of the 4!/(2! 2!) = 6
pairwise comparisons which could be made between the means of the four groups.
When carrying out such follow-up tests, we generally want to control the Type I
error level by adjusting p-values to allow for the number of tests which could be
carried out. The Bonferroni option is one way of doing this. As would be
expected, none of the comparisons is significant.

SPSS obtains the Bonferroni-adjusted p-values shown in the table above by multiplying
the unadjusted p-values by the number of post-hoc comparisons, 6. For example, the
original p-value for the comparison between category 4 (> $40K - $50K) and category 5
(> $50K) was .029. Multiplied by 6, this becomes the value shown in the table, .175.
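
The arithmetic behind the Bonferroni adjustment can be checked with a few lines of Python. This is a sketch, not SPSS output: the function name is hypothetical, and the cap at 1 is the usual convention for adjusted p-values. Note that .029 x 6 gives .174 rather than the .175 in the table, presumably because the .029 displayed by SPSS is itself a rounded value.

```python
from math import comb


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


n_comparisons = comb(4, 2)              # 4!/(2! 2!) pairs among four groups
adjusted = bonferroni(0.029, n_comparisons)
print(n_comparisons, round(adjusted, 3))
```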

Syntax equivalent:

glm perf by salrec/
 emmeans=table(salrec) compare(salrec) adjust(bonferroni).

Note: emmeans is short for estimated marginal means. These are means estimated
from the regression model underlying the GLM analysis. They are very useful in
more complicated analyses because they are adjusted for other variables in the model.
In factorial analyses, they are also used for tests of simple effects.

Plots of Means

26. One of the most useful facilities in GLM is that it will produce a graph of the mean
of the dependent variable for each level of a factor. To obtain such a plot, again
click on Analyze → General Linear Model → Univariate. Click on the Plots
button, click once on salrec, then on the arrow button beside the Horizontal Axis
slot, then click on the Add button. The display should look like this:

Now click on the Continue button and OK. The graph should look like this:

Putting aside the fact that there are no significant differences, what does this graph
show about the relationship between salary and employees' beliefs that good
performance and ability will lead to success in the company?

Syntax equivalent:

glm perf by salrec/
 plot=profile(salrec).

Crosstabulations

27. One of the most common forms of analysis is to produce a contingency table or
crosstabulation, which shows the number of cases which fall into each of the
categories defined by combinations of the values of two categorical variables. For
example, we might like to know how many male employees and how many
female employees fall into each category of the recoded salary variable, salrec.
Further, we might want to ask whether the way the employees are distributed over
the salary categories differs for males and females. A crosstabs analysis is often
accompanied by a chi-squared test of independence, which provides an answer to
this kind of question.

To carry out a crosstabs analysis, click on Analyze → Descriptive Statistics →
Crosstabs. Specify sex as the row variable and salrec as the column variable.
Click on the Statistics button, check Chi-square, and click on Continue. Click on
the Cells button and check Observed and Expected under Counts and Row under
Percentages. Click on Continue and OK. The resulting tables are as follows:

SEX Gender * SALREC Crosstabulation

                                      SALREC
                                 2.00 up    3.00 >       4.00 >       5.00 >
                                 to $30k    $30k-$40k    $40k-$50k    $50k       Total
1 Male    Count                     5          26           46           14         91
          Expected Count           11.7        32.4         36.8         10.0       91.0
          % within SEX Gender       5.5%       28.6%        50.5%        15.4%     100.0%
2 Female  Count                    16          32           20            4         72
          Expected Count            9.3        25.6         29.2          8.0       72.0
          % within SEX Gender      22.2%       44.4%        27.8%         5.6%     100.0%
Total     Count                    21          58           66           18        163
          Expected Count           21.0        58.0         66.0         18.0      163.0
          % within SEX Gender      12.9%       35.6%        40.5%        11.0%     100.0%

Chi-Square Tests

                                 Value        df    Asymp. Sig. (2-sided)
Pearson Chi-Square               20.241 (a)    3          .000
Likelihood Ratio                 20.871        3          .000
Linear-by-Linear Association     19.191        1          .000
N of Valid Cases                163

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 7.95.

Syntax equivalent:

crosstabs sex by salrec/
cells=count row expected/
statistics=chisq.

The row percentages show, separately for males and females, the proportion of
employees who fell into each salary category. It is immediately evident that the
distribution over the salrec categories is different for males and females. At one
extreme, 22.2% of females, versus only 5.5% of males, are in the lowest salary
category (up to $30 K). At the other extreme, 15.4% of males, versus only 5.6% of
females, are in the highest salary category (> $50 K).

Does this represent a systematic difference, or could it be due to sampling error? The
chi-squared test result provides an answer to this question. This statistic is based on
the discrepancy between the number of cases which would be expected in each cell if
there were no relationship between gender and salary – if the two variables were
independent – and the number actually observed in each cell. The bigger the
discrepancy, the less likely it could have occurred by chance. The expected values are
shown in the table along with the observed values. Notice that while, under
independence, around nine females would be expected in the lowest salary group,
there were actually 16. For information about the calculation of the expected
frequencies, and of the chi-squared statistic, see

http://www.psy.mq.edu.au/psystat/chi2.htm

At 20.24, with 3 degrees of freedom, the chi-squared test is highly significant (p <
.0005), so we can conclude that gender and salary are not independent and, further,
that females receive significantly lower salaries than males.
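
The expected counts and the Pearson chi-squared statistic reported above can be reproduced from the observed counts alone. The Python sketch below is an illustration rather than part of the SPSS workflow (the function names are hypothetical). Each expected count is the row total multiplied by the column total and divided by the overall N, and the statistic is the sum over cells of (observed - expected) squared divided by expected.

```python
# Observed counts from the crosstabulation: rows are sex, columns are salrec 2 to 5.
observed = [
    [5, 26, 46, 14],   # males
    [16, 32, 20, 4],   # females
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_counts(observed)
chi_square = pearson_chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Expected count for females in the lowest salary group, the statistic and its df.
print(round(expected[1][0], 1), round(chi_square, 2), df)
```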

Finishing the Exercise

28. To save the modified data (three new variables, perf, who and salrec, have been
added to it), click on File → Save As, and save the file as workmot2.sav.

Syntax equivalent:

save outfile='workmot2.sav'.

This command will save the file in the default directory. If you want to be sure that
you save the file in a particular directory (a folder called spss\data on a memory stick
with the drive letter k:, for example) this can be specified in the command:

save outfile='k:\spss\data\workmot2.sav'.

Close SPSS down by going to the Data Window then clicking on File, then Exit.

10. Further Reading


This handout contains a number of appendices with further information:

Appendix 2: Reading Raw Data into SPSS for Windows
Appendix 3: Reading Data from Excel into SPSS for Windows
Appendix 4: Finding the Syntax for a Point-and-Click Command
Appendix 5: Getting Help with Syntax
Appendix 6: Copying SPSS Output into Word
Appendix 7: Some SPSS Procedures

The website http://www.psy.mq.edu.au/psystat/ contains a number of documents about
SPSS.

There are many other books on SPSS in the Library which may be helpful. To find
them, search for SPSS in the Library catalogue and stand back.

Some of these books are excellent, but none that I know of gives an adequate
introduction to the use of syntax.

Alan Taylor
Department of Psychology
Macquarie University
10th January 2001

Latest revision 13th January 2011

Thanks to Dr David Cairns, Dr Lesley Inglis, Heather Soo and Anne Connolly for
commenting on earlier versions of this handout.

Appendix 1: Questions from the Work Attitudes Study


Variable Question

id Identification number unique to each subject
sex Which gender are you?
age How old are you?
educ What is the highest level of education you have reached?

1. Did not finish year 10
2. Finished year 10
3. Finished year 10 and completed or completing a TAFE course
4. Did the HSC or equivalent
5. Did the HSC or equivalent and completed or completing a TAFE
course
6. Doing or completed an undergraduate degree at university
7. Doing or completed postgraduate studies at university
8. Other (please specify)
level Would you describe yourself as

1. Junior management
2. Middle management
3. Senior management

Note: All other respondents were coded 0 'Worker'


salary How much is your annual salary/remuneration worth?

1. 0-$20,000
2. $20,001 - $30,000
3. $30,001 - $40,000
4. $40,001 - $50,000
5. $50,001 - $60,000
6. $60,001 - $70,000
7. $70,001 - $80,000
8. $80,001 - $90,000
9. over $90,000
b14a to Using this scale, please answer the following questions about the
b14o organisation you work for.

1 Strongly disagree
2 Disagree
3 Slightly disagree
4 Neither agree nor disagree
5 Slightly agree
6 Agree
7 Strongly agree

(a) I am willing to put in a great deal of effort beyond that normally
expected in order to help this organisation be successful

(b) There's not too much to be gained by sticking with this organisation
indefinitely
(c) I feel very little loyalty to this organisation
(d) I would accept almost any type of job assignment in order to keep
working with this organisation
(e) I find that my values and this organisation's values are very similar
(f) I am proud to tell others that I am part of this organisation
(g) I could just as well be working for a different organisation as long as
the type of work were similar
(h) I tell my friends that this is a great organisation to work for
(i) It would take very little change for me to leave this organisation
(j) I am extremely glad that I chose to work for this organisation
(k) This organisation really inspires me to perform my job to the best of
my ability
(l) Often I find it difficult to agree with this organisation's policies on
important matters relating to its employees
(m) I really care about the fate of this organisation
(n) For me this is the best of all possible organisations
(o) Deciding to work for this organisation was a definite mistake

satjob Overall, how satisfied are you with:
satorg
Your present job
The organisation you work for

1 Very dissatisfied
2 Dissatisfied
3 Neutral
4 Satisfied
5 Very satisfied
effort How would you describe the level of effort you put into your job?

1. I work less than is necessary
2. I work about what is necessary
3. I work a little harder than is necessary
4. I work quite a bit harder than is necessary
5. I work a lot harder than is necessary
d6a to d6h How important are the following factors to getting ahead in your
organisation?

Hard work and effort
Good luck
Natural ability
Who you know
Good performance
Office politics
Adaptability
Years of service

1 Not important at all
2 Slightly important
3 Moderately important
4 Quite important
5 Very important
6 Extremely important

Appendix 2: Reading Raw Data into SPSS for Windows


Sometimes data which are to be analysed come in the form of a "raw" dataset, entered
into an ASCII or text file. There are two main types of raw data file.

Free Format

The first contains the numbers, and perhaps the variable names, separated by
delimiters such as tabs, commas or spaces. The dataset below (which is the demo
dataset entered in Exercise 1) contains the variable names and numbers separated by
tabs.

Data in Free Format, Tab-Delimited

ID GENDER IQ GPA
1 1 105 1.33
2 1 92 3.5
3 2 104 2.67
4 2 94 2.75
5 1 106 2.75
6 1 111 2.25
7 2 100 3
8 2 95 3
9 1 120 3
10 1 90 2.5

The kind of raw dataset shown above is in "free" format, so-called because the numbers
for a given variable do not have to be in any particular columns in the file, although in
this case they are, because the tabs space them evenly. However, this is clearly not
the case with the dataset below, which is also in free format, this time comma-
delimited.

Data in Free Format, Comma-Delimited

ID,GENDER,IQ,GPA
1,1,105,1.33
2,1,92,3.5
3,2,104,2.67
4,2,94,2.75
5,1,106,2.75
6,1,111,2.25
7,2,100,3
8,2,95,3
9,1,120,3
10,1,90,2.5

Fixed Format

When there are many variables, raw datasets are likely to be in fixed format, which
has no delimiters, but in which the values for each variable always appear in the same
columns of the file. There is no reason why such files cannot have spaces between the
values for different variables, but usually there are none, as in the dataset below,
which makes it critical that you know what column(s) the values for each variable
occupy. In this case, id is in columns 1-2, gender in 3, iq in 4-6 and gpa in 7-10 (note
that the decimal point in gpa takes up one column). Datasets in fixed format do not
usually include variable names -- just the data.

Data in Fixed Format

0111051.33
021 923.50
0321042.67
042 942.75
0511062.75
0611112.25
0721003.00
082 953.00
0911203.00
101 902.50

Reading a Free-Format Data File

We'll suppose that the dataset is the first one shown above, which is called
demotab.dat and which is tab-delimited and contains the names of the variables.

1. Click on File, then Read Text Data.
2. At the Open File display, navigate to the appropriate directory, open the Files of
type menu and select Text.
3. Double-click on demotab.dat.
4. The following display will appear. You can see from the data in the Text file
window that SPSS has already made some intelligent assumptions about the way
the data are laid out.

5. Click on the Next button, then in the next display, tell SPSS that the variables are
delimited and that the variable names are included at the top of the file:

6. Again click on the Next button, and confirm what SPSS has already assumed, that
the first case begins on line 2 of the file (the variable names are in line 1), that
each line represents a case and that you want to read all the cases in the file into
SPSS:

7. Again click on the Next button and confirm what SPSS has assumed, that the
variables are delimited by tabs. Notice that the Data preview at the bottom of the
display shows that SPSS is on the right track:

Click on the Next button twice more to arrive at this screen:



8. Click on the Finish button to get SPSS to read the data into the Data Window.
The variables should be the same as those in 6. of Exercise 1.

Syntax equivalent:

get data/ type=txt/ file='demotab.dat'/
delimiters="\t"/
firstcase=2/
variables= id f2 gender f1 iq f3 gpa f4.2.

Hint:

If your data are in free format (tab, comma or space-delimited) and there are no
variable names at the top of the file, the following no-fuss syntax works:

data list free file='demotab.dat'/
id gender iq gpa.
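
The get data command shown earlier also handles the comma-delimited version of the dataset; only the delimiters subcommand needs to change. The following is a minimal sketch, assuming that the comma-delimited data have been saved as democomma.dat (a hypothetical file name, not one used elsewhere in this handbook) with the variable names in the first line:

```spss
* democomma.dat is a hypothetical file name for the comma-delimited dataset.
get data/ type=txt/ file='democomma.dat'/
delimiters=","/
firstcase=2/
variables= id f2 gender f1 iq f3 gpa f4.2.
```

As in the tab-delimited example, firstcase=2 tells SPSS to skip the line of variable names.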

Reading a Fixed-Format Data File

We'll suppose that the dataset is the third one shown above, which is called
demofix.dat and which is in fixed format and contains no variable names.

1. Carry out steps 1. to 4. as above (specifying the appropriate filename of course).


2. When the display below appears, tell SPSS that the variables are of fixed width
and that the variable names are not included in the file:

Click on the Next button.

3. In the next display, confirm that the data begin on the first line of the file, that
there is one line per case and that you want to import all of the cases:

Click on the Next button.

4. The resulting display allows you to tell SPSS where the breaks between variables
occur. As can be seen below, the Data preview panel shows the data
as it appears in the data file. By clicking with the mouse, you can insert lines which
show where the values for one variable end and those for the next variable begin.
Knowing that id takes up two columns, gender 1, iq 3 and gpa 4, we can easily insert
lines so that the Data preview panel looks like this:

Click on the Next button.

5. The resulting display allows us to give meaningful variable names to the four
variables we have defined. By default, SPSS names them V1 to V4. A click
on the variable name given at the head of each column in the Data preview panel
allows us to type in a new variable name for that variable, and also to ensure that each
variable is read appropriately (e.g., as a numeric variable rather than as a string or
alphanumeric variable). A click on the Next button takes us to the display shown in 8.
above, and a click on the Finish button causes SPSS to read the data into the Data
Window.
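
Syntax equivalent (a sketch written by hand, so the syntax pasted by the wizard itself may well look different): with fixed-format data, data list can be told directly which columns each variable occupies. The column positions are those given in the Fixed Format section above; the (2) after the gpa columns is an assumption that gpa should be shown with two decimal places.

```spss
* Read demofix.dat by column position.
data list fixed file='demofix.dat'/
id 1-2 gender 3 iq 4-6 gpa 7-10 (2).
* The data are not actually read until a procedure such as list is run.
list.
```

Because the values of gpa in the file contain an explicit decimal point, they are read as they stand (1.33, 3.50 and so on).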

Appendix 3: Reading Data From Excel into SPSS for Windows

It is often convenient to enter data into an Excel spreadsheet then import them into
SPSS. If the variable names are given in the first line of the spreadsheet, and conform
to SPSS rules (no more than eight characters long, begin with a letter and contain no
spaces), they will be translated into SPSS without fuss. It is also a good idea to
remove any extraneous material, such as totals, graphs and explanatory text, from the
spreadsheet before attempting to translate it into SPSS. The spreadsheet should
contain only the raw data and (optionally) the variable names. One advantage of
using Excel for data entry is its ubiquity -- many people who do not have SPSS have
access to Excel. Another advantage is that in Excel you can specify that a column, the
one containing subject identification, for example, is always visible, something which
isn't possible in SPSS. The advantage of entering data into a spreadsheet rather than
typing it into a raw data file is that data definition of the type described in Appendix 2
is unnecessary. A point to note is that there is no facility for value labels in Excel:
labelling has to wait until the data have been imported into SPSS.

Reading an Excel Spreadsheet into SPSS

For the purposes of demonstration, the demo and workmot datasets have been placed
in an Excel file called forspss.xls as separate sheets. Usually the spreadsheet you are
given will contain only one sheet but, as will be seen below, SPSS copes if there is
more than one.

1. In SPSS, click on File, Open and Data. In the File Open display, select Excel
(*.xls) in the Files of type slot:

Type in the name of the file, forspss.xls, in this case, and click on the Open button.

2. The resulting display gives the option of reading the variable names from the first
line of the spreadsheet. If the SPSS data look nonsensical following the
translation, it's probably because you've given the wrong option here. In this
example, the variable names are in the first line of the spreadsheet, so the box is
ticked. Because there are two sheets in the file, SPSS allows us to select one or
the other in the Worksheet slot. We'll stay with demo1. If there is only one sheet in
the file, this option won't appear.

3. Click on the OK button to complete the operation.

Syntax equivalent:

get data/ type=xls/
file='forspss.xls'/
sheet=name 'demo1'.

Note: Be careful to specify the directory that the file is in. For example,
file='c:\spssdata\forspss.xls'.

Hint: If SPSS denies that the file you're trying to import is an Excel spreadsheet, and
you know it is, it's likely that you're using a version of Excel which SPSS hasn't
caught up with yet. To solve the problem, use Save As in Excel to save the
spreadsheet as if from an earlier version of Excel. Version 2.1 is the lowest common
denominator, which any version of SPSS will read.

You can save an SPSS dataset as an Excel spreadsheet -- simply use Save As and
specify Excel (*.xls) in the Save as type slot. SPSS gives you the option of saving the
variable names in the spreadsheet.
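
Syntax equivalent (a hedged sketch: the directory and file name are hypothetical, and version=8 is assumed here to be the setting that corresponds to the Excel (*.xls) file type mentioned above):

```spss
* Write the working file to an Excel spreadsheet, with variable names in the first row.
save translate outfile='c:\spssdata\demoout.xls'/
type=xls/
version=8/
fieldnames/
replace.
```

The replace subcommand allows an existing file of the same name to be overwritten, so leave it out if that isn't what you want.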

Appendix 4: Finding the Syntax for a Point-and-Click Command

Sometimes you would like to know the syntax equivalent for a series of points-and-
clicks. SPSS makes this easy. Say you have used point-and-click to set up
frequencies:

If, instead of clicking on OK, you click on the Paste button, SPSS pastes the syntax
equivalent of the analysis you have specified into the Syntax Window. For example:

FREQUENCIES
VARIABLES=gender
/ORDER= ANALYSIS .

The slight drawback is that the pasted version of the command contains more detail
than you actually need to specify (we can usually rely on defaults, and omit
VARIABLES= in many commands), but it's a good start to learning the syntax.
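
As a hand-typed comparison, the following shorter command should produce the same frequency table as the pasted version above: the command name is abbreviated, VARIABLES= is omitted and the /ORDER subcommand is left to its default.

```spss
* Short form of the pasted FREQUENCIES command.
freq gender.
```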

Appendix 5: Getting Help with Syntax


There's a handy feature for anyone who isn't entirely familiar with SPSS syntax (i.e.,
everyone): If you type the name of the command, for example, frequencies, or even
just freq, into the Syntax Window, place the cursor on it and click on the Syntax Help button,
a new window containing more information than you thought possible about
the command concerned will open:

With version 19, SPSS has adopted a browser interface to display help information,
which can slow things down considerably. I've found that making Google Chrome
my default browser (rather than Internet Explorer) has shortened the wait for help
information.

Appendix 6: Copying SPSS Output into Word


You may want to copy all or part of your SPSS output into Word in order to
incorporate it into a report or in order to edit it in your word processor and/or print it
on your own printer. All elements of SPSS output -- pivot tables, graphs and text
output -- can be selected and copied in one operation. To illustrate this, open the file
demo1.sav, and run the following commands:

means gpa by gender.

manova gpa by gender (1,2)/
print=cellinfo(means)/
design.

graph scatter=gpa with iq by gender.

The output window should then contain the following output:


_____________________________________________________________________

means gpa by gender.

Means

[DataSet1] C:\writing\NEWDOC\new08\general\demo1.sav

manova gpa by gender (1,2)/
print=cellinfo(means)/
design.

The default error term in MANOVA has been changed from WITHIN CELLS to
WITHIN+RESIDUAL. Note that these are the same for all full factorial
designs.

* * * * * * * * * * * A n a l y s i s   o f   V a r i a n c e * * * * * * * * * * *

10 cases accepted.
 0 cases rejected because of out-of-range factor values.
 0 cases rejected because of missing data.
 2 non-empty cells.

 1 design will be processed.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Cell Means and Standard Deviations
Variable .. gpa
FACTOR             CODE        Mean  Std. Dev.     N   95 percent Conf. Interval

gender             Male       2.555       .738     6      1.780      3.330
gender             Female     2.855       .171     4      2.584      3.126
For entire sample             2.675       .580    10      2.260      3.090

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

* * * * * * A n a l y s i s   o f   V a r i a n c e -- Design 1 * * * * * *

Tests of Significance for gpa using UNIQUE sums of squares
Source of Variation          SS      DF        MS         F  Sig of F

WITHIN CELLS               2.81       8       .35
gender                      .22       1       .22       .61      .456

(Model)                     .22       1       .22       .61      .456
(Total)                    3.03       9       .34

R-Squared =           .071
Adjusted R-Squared =  .000

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

graph scatter=gpa with iq by gender.

Graph
[DataSet1] C:\writing\NEWDOC\new08\general\demo1.sav

The directions below allow you to copy all of your output from SPSS to Word.

Use Edit, then Select All, or Ctrl-a, to select all of the output, or just click on one item
(or multiple items, while holding down the Ctrl key). Each will be surrounded by a
frame.
1. Click on Edit, then Copy.
2. Start up Word, if you haven't already done so. If Word is already open, switch to
the Word window with Alt-Tab (or use the Window menu).
3. In Word, click on Edit, then Paste. The items will appear in the Word document.

Note: If you select Paste Special rather than Paste, then select (say) Bitmap, pivot
tables will be copied as graphics objects rather than as Word tables. You may prefer
the appearance of the former.

4. You can now work with the output in Word. (Note, though, that it's best not to
edit the output in Word – do that before you copy the items.) For example:

i. The Graph. Click once on the graph. It will be surrounded by a frame. You
can centre the graph on the page by clicking on the "centre"
button. You can also change the size of the graph by
clicking on and dragging one of the black squares on the frame. Dragging one of
the corner squares reduces or enlarges the graph while preserving the aspect ratio. If
you want to have more flexibility in positioning the table or graph, click on it with
the right mouse button, choose Format Picture from the resulting menu, then select
the Layout tab. Choosing a wrapping style other than In line with text will enable
you to drag it around the page and position it exactly where you want it, Word
willing. The choice you make will also determine how the text in your document is
arranged around (or in front of or behind) it. Experimenting with the various
kinds of wrapping is probably the best way to find what suits you.

ii. The Text Output. You may need to adjust the font and its size to retain the
formatting of the text output. To do this, highlight the text by holding down the
mouse button and dragging. When you have highlighted the text you want to alter,
choose the font and the font size on the Tool Bar. The best font is Courier New and
a size of 10 or 9 should mean that none of the lines is wrapped around.

iii. The Table. As noted in the box above, a standard Paste will result in tables
appearing as Word tables in the document. If you use Paste Special the tables
appear as graphics objects. Each of these may be the most suitable for your
purposes. The advantage of the Word tables is that they are easily editable in
Word, like any other table; tables that are graphics objects aren't readily edited once
they're out of SPSS but, on the other hand, you may find them much easier to move
around and resize than Word tables.

Appendix 7: Examples of Data Manipulation


(Based on workmot4.sav, available at
http://www.psy.mq.edu.au/psystat/download.htm)

Note: Each example given here assumes that you're starting with the unaltered workmot4.sav
dataset, except where noted.

Examples of Operators and Functions

Operators

Logical or                  or     |
Logical and                 and    &
Equal to                    eq     =
Not equal to                ne     ~=
Less than                   lt     <
Greater than                gt     >
Less than or equal to       le     <=
Greater than or equal to    ge     >=
Not                         not    ~

Add                         +
Subtract                    -
Multiply                    *
Divide                      /
Exponentiation              **

Functions

mean(x,y,z)        Calculates mean of variables
missing(x)         True if x is missing
sysmis(x)          True if x is system-missing
sum(x,y,z)         Sums variables
rnd(x)             Rounds x to the nearest whole number
rv.normal(0,1)     Creates a normal random variable with mean zero and SD one.
rv.uniform(0,1)    Creates a uniformly distributed (rectangular) random variable
                   with values in the range zero to one.

1. Selecting Cases

For the point-and-click case selection command, click on Data, then Select Cases. See
also An Introduction to SPSS, p. 30.

Selecting a subset of cases

select if (sex eq 2 and age ge 30). [Alternative: select if (sex=2 & age >=30).]

(What would happen if select if (sex eq 2 or age ge 30) was used instead?)

Selecting cases not missing on a variable

select if not(missing(educ)). [Alternative: select if ~missing(educ).]



Performing separate analyses on two subsets

temporary.
select if (sex eq 1).
freq age/format=notable/statistics=median.

temporary.
select if (sex eq 2).
freq age/format=notable/statistics=median.

Another way of doing the above:

sort cases by sex.
split file by sex.
freq age/format=notable/statistics=median.
split file off.

For the point-and-click version of split file, click on Data, then Split File.

2. Recode

For the point-and-click recode command, click on Transform, then Recode. See also An
Introduction to SPSS, p. 35.

Grouping values of a numeric variable

recode age (lo thru 25=1)(26 thru 30=2)(31 thru 40=3)(41 thru hi=4) into agerec.

What if age was stored in fractional years to two decimal places, so that (for
example), the value 25.92 was possible? The following, by overlapping the values
which are recoded, makes sure that nothing slips through the cracks. Note that once a
value has been recoded it is not recoded again, so that, for example, 25 would be
recoded to 1 by (lo thru 25=1), and would not be recoded again by (25 thru 30=2).

recode age (lo thru 25=1)(25 thru 30=2)(30 thru 40=3)(40 thru hi=4) into agerec.
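
To see the first-match rule in action without touching workmot4.sav, the overlapping recode can be tried on a few made-up ages. This is only an illustrative sketch: the seven values below are invented to sit on and around the cutpoints, and running it replaces the working file, so it is best done in a separate session.

```spss
* Made-up ages, chosen to sit on and around the cutpoints.
data list free/ age.
begin data.
24.5 25 25.92 30 30.5 40 40.01
end data.

recode age (lo thru 25=1)(25 thru 30=2)(30 thru 40=3)(40 thru hi=4) into agerec.
list.
```

In the listing, 25 should have agerec equal to 1 and 25.92 should have agerec equal to 2; 30 and 40 likewise fall into the lower of the two groups that mention them.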

Temporary recode to combine categories with small numbers

temporary.
recode salary (1=2)(6 thru hi=5).
add value labels salary 2 'up to $30K' 5 '> $50K'.

For the point-and-click command to change the value labels, see An Introduction to
SPSS, p. 7.

Median split

freq age/format=notable/statistics=median. [To find the median.]

[notable suppresses the output of the frequency table.]

recode age (lo thru 30=1)(31 thru hi=2) into agemed.

[An easier way uses Visual Binning (Transform, then Visual Binning). To perform a
median split with this procedure, you need to click on Equal Percentiles and specify
one cutpoint after selecting Make Cutpoints.]

Recoding values to missing

recode salary (1, 6 thru hi=sysmis).

Recoding missing to a value so it will be included in a table

temporary.
recode salary (missing=-9).

[The following table would show which categories of levrec the people missing on
salary are in: crosstabs salary by levrec.]

Creating dummy variables (assuming no missing data for levrec)

recode levrec (1=1)(else=0) into lr1.
recode levrec (2=1)(else=0) into lr2.

If there were missing values for levrec, you could use

recode levrec (1=1)(missing=sysmis)(0,2=0) into lr1.
recode levrec (2=1)(missing=sysmis)(0,1=0) into lr2.

Reversing a scale

recode b14b b14c b14g b14i b14l b14o (1=7)(2=6)(3=5)(5=3)(6=2)(7=1).

Conditional recode

do if (sex eq 1).
recode age (lo thru 40=1)(40 thru hi=2) into agerec.
else if (sex eq 2).
recode age (lo thru 50=1)(50 thru hi=2) into agerec.
end if.
value labels agerec 1 'Young' 2 'Old'.

[A silly example, but conditional recodes can come in handy.]


Note 1. You can make recodes conditional in point-and-click by clicking on the If
button after clicking on Transform, then Recode Into Same Variables, or Transform,
then Recode Into Different Variables.

Note 2. When using recode … into, any values on the original variable which are not
explicitly recoded will be missing on the new variable.
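
If the values which are not mentioned should be carried over unchanged rather than set to missing, (else=copy) can be added as the last specification. A minimal sketch based on the earlier salary recode; salrec is a hypothetical name for the new variable:

```spss
* salrec is a hypothetical name for the new variable.
recode salary (1=2)(6 thru hi=5)(else=copy) into salrec.
```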

3. Compute

For the point-and-click compute command, click on Transform, then Compute Variable.
See also An Introduction to SPSS, p. 25.

Transforming a variable

[Positive skew]

compute sage=sqrt(age).
compute lage=lg10(age).
compute rage=1/age. [reciprocal -- now a higher value = younger]
freq age sage to rage/format=notable/histogram/statistics=skew seskew.

[Negative skew]

compute perf2=perf**2. [perf2 is perf squared.]
freq perf perf2/format=notable/histogram/statistics=skew seskew.

Or, you could simply reverse the scale, so that a negative skew becomes a positive
skew, and apply the appropriate correction for a positive skew:

compute sperf=7 - perf. [So that 1 becomes 6, 2 becomes 5 and so on.]
compute sperf=sqrt(sperf).

Creating a scale

compute b14=mean( b14a to b14o). [Assumes the reversal of the b14 items above.]

compute b14=mean.12( b14a to b14o). [b14 is created for cases which have 12 or more
non-missing items.]

[What's the drawback (apart from the labour involved) of using

compute b14=(b14a+b14b+b14c+b14d+b14e+b14f+b14g+b14h+b14i+b14j+b14k+b14l+
b14m+b14n+b14o)/15. ?]

Creating a summed scale, but allowing for missing data

compute b14sum=(mean.12(b14a to b14o))*15. [Assumes the reversal of the b14 items above.]

This version creates a sum rounded to the nearest whole number:

compute b14sum=rnd((mean.12(b14a to b14o))*15). [Assumes the reversal of the b14 items above.]

What's wrong with the following compute if there are missing data?

compute b14sum=sum(b14a to b14o).
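
The behaviour of these different ways of adding up items can be checked on a toy dataset. The sketch below uses three made-up items (q1 to q3, hypothetical names) rather than the fifteen b14 items, with the second case missing on q2. Like any data list command, it replaces the working file, so run it in a separate session.

```spss
* Two made-up cases. The lone full stop in the second case is read as system-missing.
data list list/ q1 q2 q3.
begin data.
5 6 7
5 . 7
end data.

compute qsum=sum(q1 to q3).
compute qadd=q1+q2+q3.
compute qest=mean.2(q1 to q3)*3.
list.
```

For the complete case all three results are 18. For the second case, qsum is 12 (the missing item is simply left out), qadd is system-missing, and qest is 18, the mean of the two available items multiplied by the number of items.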

Creating dummy variables/conditional compute

do if (levrec eq 1).
compute lr1=1.
else.
compute lr1=0.
end if.

do if (levrec eq 2).
compute lr2=1.
else.
compute lr2=0.
end if.

If you were sure there were no missing data for levrec, the following abbreviated form
could be used:

compute lr1=0.
if (levrec eq 1)lr1=1.
compute lr2=0.
if (levrec eq 2)lr2=1.

Another form of conditional compute

if (id eq 83)b14f=3.

Combining variables

if (sex eq 1 and levrec eq 0)sexlev=1.
if (sex eq 1 and levrec eq 1)sexlev=2.
if (sex eq 1 and levrec eq 2)sexlev=3.
if (sex eq 2 and levrec eq 0)sexlev=4.
if (sex eq 2 and levrec eq 1)sexlev=5.
if (sex eq 2 and levrec eq 2)sexlev=6.

value labels sexlev 1 'Male Worker' 2 'Male JnrMgr' 3 'Male Mgr'
4 'Female Worker' 5 'Female JnrMgr' 6 'Female Mgr'.

Another way:

compute sexlev=sex*10+levrec.
recode sexlev (10=1)(11=2)(12=3)(20=4)(21=5)(22=6).

Reversing a scale

compute b14b=8 - b14b.
compute b14c=8 - b14c.
….
compute b14o=8 - b14o.

A more efficient way:

do repeat x=b14b b14c b14g b14i b14l b14o.
compute x=8-x.
end repeat.

Centring a variable

descriptives perf who. [To find the means of the original variables.]

compute perfcent=perf - 4.1418.
compute whocent=who - 3.8028.
[The means of these two new variables will be zero.]
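
Typing the means in by hand works, but the numbers have to be updated if the data change. One alternative, offered here as a sketch rather than as this handbook's method, is to let aggregate add each mean to the file as a new variable. The names perfmean and whomean are hypothetical, and the example assumes a version of SPSS in which aggregate accepts mode=addvariables and may be run without a break subcommand (all cases are then treated as one group).

```spss
* Add the overall means to every case, then subtract them.
aggregate outfile=* mode=addvariables/
perfmean=mean(perf)/
whomean=mean(who).
compute perfcent=perf - perfmean.
compute whocent=who - whomean.
descriptives perfcent whocent.
```

The descriptives command at the end is there only to confirm that the means of the centred variables are zero.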

4. Count

For the point-and-click count command, click on Transform, then Count.

Counting the number of times a response occurs over a set of variables

count b14agree=b14a to b14o (5,6,7). [How many items does the respondent agree with?]
freq b14agree. [A value of 2 means agreement with two items, and so on.]

Counting the number of missing values over a set of variables

count miss=sex age educ level salary (missing).
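
The nmiss function gives the same count within a compute command. A brief sketch; miss2 is a hypothetical variable name, used so that the result can be compared with miss:

```spss
* Number of missing values across the five variables.
compute miss2=nmiss(sex,age,educ,level,salary).
freq miss miss2.
```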

select if (miss eq 0). [Selects for analysis the subjects who have no missing data on
these variables.]

Appendix 8: Some SPSS Procedures

The following examples use the variables gender (0=female, 1=male), group (1,2,3),
age (1-100+), anxiety (0-30) and depress (1-20), score1 and score2 (test scores [0-20]
for the same subjects on two occasions) and agecat (1=0-20 yrs, 2=21-40 yrs, 3=41 to
60 yrs, 4=61+ yrs). The variable identifying individual subjects is id.

Command: frequencies
Produces: One-way frequencies, histograms and statistics
Syntax examples:
    freq gender age.
    freq anxiety/histogram/statistics=all.
Point-and-click: Analyze, Descriptive Statistics, Frequencies
Point-and-click options: Statistics, Format, Charts
Notes: (none)

Command: descriptives
Produces: Means and standard deviations, etc, of numeric variables
Syntax examples:
    descriptives age anxiety gender.
    descriptives age anxiety/missing=listwise.
    descriptives anxiety depress/save.
Point-and-click: Analyze, Descriptive Statistics, Descriptives
Point-and-click options: Options
Notes:
(1) missing=listwise means that the results will be based only on cases
which have complete (i.e., non-missing) data for all the variables
specified. Surprisingly, this option is available only with syntax.
(2) A weakness: descriptives doesn't calculate the median (use
frequencies or means for that).
(3) A strength: the /save subcommand produces new variables (zanxiety
and zdepress in the example above) which are the standardised (z-score)
versions of the original variables.
(4) The mean of a dichotomous variable coded 0,1 gives the proportion of
cases coded 1. The descriptives output for gender in the above example
would show the proportion of males in the sample.

Command: crosstabs
Produces: Crosstabulations (contingency tables), chi-squared test of independence
Syntax examples:
    crosstabs gender by group.
    crosstabs gender by group/cells=count col/statistics=chisq.
Point-and-click: Analyze, Descriptive Statistics, Crosstabs
Point-and-click options: Statistics, Cells
Notes: (none)

Command: graph
Produces: Scatterplots
Syntax examples:
    graph/scatterplot(bivar)=age with anxiety.
    graph/scatterplot(bivar)=age with anxiety by gender.
    graph/scatterplot(bivar)=age with anxiety by gender by id(identify).
Point-and-click: Graphs, Scatter, Simple, Define
Point-and-click options: Set Markers by, Label Cases by
Notes:
(1) Note that the x-variable is given first, and the y-variable second.
(2) by in syntax and Set Markers by in point-and-click result in the use of
a different symbol for each value of the specified variable.
(3) by id(identify) and Label Cases by mean that individual points in the
graph can be identified by values of the specified variable.

Command: correlations
Produces: Pearson correlations
Syntax examples:
    correlations age anxiety gender.
    correlations age with anxiety gender .
    correlations age anxiety gender/missing=listwise.
Point-and-click: Analyze, Correlate, Bivariate
Point-and-click options: Options
Notes:
(1) The subcommand missing=listwise means that correlations are
computed only for cases which have non-missing values on all the
specified variables. The default is missing=pairwise, which means that a
case is excluded only if it is missing on one or both of the variables for
which a correlation is being calculated.
(2) The point-and-click menu gives the option of calculating the
Spearman and Kendall non-parametric correlations, which may be useful
if the data are markedly non-normal.
(3) The correlation between a continuous numeric variable and a
dichotomous variable is referred to as a point-biserial correlation.

Command: means
Produces: Means and standard deviations of numeric variables for different
categories of grouping (categorical) variables.
Syntax examples:
    means anxiety by group.
    means anxiety by group by gender.
    means anxiety by group/statistics=anova.
Point-and-click: Analyze, Compare Means, Means
Point-and-click options: Options
Notes:
(1) Means can produce the effect-size eta and also perform a test of
linearity.
(2) Many different statistics can be produced in addition to the mean etc,
including the median. Click the Options button to see the possibilities.

Command: t-test
Produces: Tests of the significance of differences between the means of (1) two
groups or (2) two measures of the same (or paired) subjects.
Syntax examples:
    t-test groups=gender(0,1)/vars=anxiety depress.    (1)
    t-test pairs=score1 score2 .    (2)
Point-and-click:
    Analyze, Compare Means, Independent-Sample T-Test    (1)
    Analyze, Compare Means, Paired-Sample T-Test    (2)
Point-and-click options: Options
Notes: (none)

Command: npar test
Produces: A variety of tests including (1) the binomial test, used to compare an
obtained proportion with that expected from theory or knowledge of the
population, and (2) the one-sample chi-squared test, used for comparing a
sample distribution with a theoretical or population distribution.
Syntax examples:
    npar test binomial(.5) =gender.    (1)
    npar test chisquare=agecat/expected=.2 .2 .3 .3.    (2)
Point-and-click:
    Analyze, Nonparametric Tests, Binomial    (1)
    Analyze, Nonparametric Tests, Chi-Square    (2)
Point-and-click options: Options
Notes: (none)

Appendix 9: Three Useful Tips

(1) Creating a series of numbered variables with the same root name
When setting up a dataset which is to contain variables showing the responses to
items in a questionnaire (for example), it may speed things up to use this facility.

(a) In the Variable View, enter the name of the first variable. We'll imagine we're
using a 10 item questionnaire called the Statistics Phobia Subscale (SPSS).
Highlight the line containing the variable name:

(b) Right-mouse click on the numbered column next to the line containing the
variable, and select Copy:

(c) Right-mouse click on the numbered column next to the line where the new
variables will begin and select Paste Variables:

(d) From the resulting display, select the number of new variables to be created,
specify an appropriate root name (by default, SPSS uses the name of the
variable you copied, but includes the number as part of the root name, which
doesn't give the result we want), and the value from which you want SPSS to
start numbering:

(e) When you click on OK the new variables are created in a flash:

(2) Splitting the Data Window

When you are entering or inspecting data in the Data Window it is sometimes helpful
to split the screen so that (for example) you can see variables (or cases) at the
beginning and end of the file simultaneously. We'll demonstrate this capability with
workmot.sav.

At the extreme right of the display, you will see three dots on a vertical bar:
Similarly, there are three dots on a horizontal bar at the bottom of the
window:

The screen can be split vertically and horizontally respectively by dragging on these
spots. For example:

As can be seen, vertical and horizontal scrolling can be done independently either side
of the splits.

(3) Reading In and Using Summarised Frequency Data

Sometimes you may have data in a tabulated form which you would like to analyse,
such as these (from workmot4.sav):

Rather than laboriously type the individual observations into SPSS, you can use the
following syntax to read the data in and then apply the weight command, to enable an
analysis to be carried out (using crosstabs in this case):

data list list/
sex level count.
begin data.
1 0 59
1 1 14
1 2 20
2 0 62
2 1 6
2 2 5
end data.

weight by count.

crosstabs sex by level/cells=count row/statistics=chisq.
