Andrews Plot
1/12/2010
Summary
The Andrews Plot is a multivariate visualization technique that can be very useful in identifying
differences and similarities amongst observed cases when the number of dimensions is too large
to use a standard scatterplot.
Sample Data:
The file 93cars.sgd contains information on 26 variables for n = 93 makes and models of
automobiles, taken from Lock (1993). The table below shows a partial list of the data in that file:
Data Input
The data to be analyzed consist of two or more numeric columns and an optional column with
group identifiers:
Data: two or more numeric columns containing the variables to be plotted.
Group Codes: an optional column with levels used to identify groups of cases.
In the example, six variables have been selected, and the type of vehicle is used to identify
groups of cases.
Analysis Summary
The Analysis Summary shows the number of rows with complete data and summary statistics for
those rows:
Andrews Plot
Data variables:
MPG Highway (miles per gallon in highway driving)
Weight (pounds)
Wheelbase (inches)
Horsepower (maximum)
Engine Size (liters)
Cylinders
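The number of rows with complete data reflects listwise deletion: a row enters the plot only if every selected variable is present. The following Python sketch assumes that missing values are stored as None or NaN; the function names and the sample rows are hypothetical and are not taken from 93cars.sgd.

```python
import math
from typing import Optional, Sequence

# One row of input values; None or NaN marks a missing entry (assumption).
Row = Sequence[Optional[float]]


def is_complete(row: Row) -> bool:
    """Return True if every value in the row is present and not NaN."""
    return all(v is not None and not math.isnan(v) for v in row)


def complete_rows(rows: Sequence[Row]) -> list[list[float]]:
    """Keep only the rows with complete data (listwise deletion)."""
    return [[float(v) for v in row] for row in rows if is_complete(row)]


# Hypothetical rows of (MPG Highway, Weight, Cylinders), not real data.
rows = [
    (30.0, 3000.0, 4.0),
    (25.0, None, 6.0),             # missing Weight: excluded
    (28.0, 3200.0, float("nan")),  # missing Cylinders: excluded
]
kept = complete_rows(rows)
print(f"Number of complete cases: {len(kept)}")
```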
Andrews Plot
The Andrews Plot draws one line for each row with complete data:
[Figure: Andrews Plot of the six selected variables, one line per car, colored by Type (Compact, Large, Midsize, Small, Sporty, Van); horizontal axis scaled from 0 to 1.]
The line for the i-th row plots the following function:

$$f_i(t) = \frac{X_{i1}}{\sqrt{2}} + X_{i2}\sin(t) + X_{i3}\cos(t) + X_{i4}\sin(2t) + X_{i5}\cos(2t) + \cdots \qquad (1)$$

where the sum contains as many terms as there are input variables, $X_{ij}$ represents the scaled
value of the j-th variable in row i, and t ranges from $-\pi$ to $\pi$. If a group code variable is
supplied, its values are used to color the lines. In many cases, differences between groups of
cases can then be seen. For example, the plot above shows strong clustering by type of vehicle.
There are also some unusual cars, such as one small car that does not follow the pattern of the
others. Clicking a line with the left mouse button displays the row number corresponding to that
line on the analysis toolbar.
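Equation (1) can be evaluated directly. The sketch below is a minimal Python version, assuming the row has already been scaled as chosen under Analysis Options; `andrews_curve` and `sample_curve` are illustrative names, not part of any Statgraphics interface, and the default of 101 sample points is an arbitrary choice.

```python
import math
from typing import Sequence


def andrews_curve(x: Sequence[float], t: float) -> float:
    """Evaluate f(t) of equation (1) for one row of scaled values x.

    x[0] holds X_i1, x[1] holds X_i2, and so on (0-based in Python).
    """
    total = x[0] / math.sqrt(2.0)
    for j in range(1, len(x)):
        k = (j + 1) // 2  # frequency: 1, 1, 2, 2, 3, 3, ...
        if j % 2 == 1:
            total += x[j] * math.sin(k * t)  # X_i2, X_i4, ... pair with sine
        else:
            total += x[j] * math.cos(k * t)  # X_i3, X_i5, ... pair with cosine
    return total


def sample_curve(x: Sequence[float], n_points: int = 101) -> list[tuple[float, float]]:
    """Return (t, f(t)) pairs at n_points equally spaced t in [-pi, pi]."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = 2.0 * math.pi / (n_points - 1)
    return [(-math.pi + i * step, andrews_curve(x, -math.pi + i * step))
            for i in range(n_points)]


# Hypothetical scaled row with six variables, as in the example.
row = [1.0, 0.5, -0.5, 0.25, 0.0, 2.0]
points = sample_curve(row)
print(len(points), round(andrews_curve(row, 0.0), 4))
```

Each row yields one list of (t, f(t)) pairs; drawing one polyline per complete row and coloring it by its group code gives the general layout of the plot.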
NOTES:
(1) Although t ranges from $-\pi$ to $\pi$ in (1), the horizontal axis is scaled from 0 to 1 for
plotting convenience.
(2) Since the general shape of the plot is dominated by the first few variables, which enter (1)
at the lowest frequencies, the variables should be ordered so that the most important ones are
listed first.
(3) The Analysis Options dialog box allows different scalings of the X variables, which can have
a major impact on the appearance of the plot.
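Note (1) implies a change of variable between the position on the plotted axis and t. Assuming the rescaling is linear (the text states only that the axis runs from 0 to 1), an axis position u corresponds to $t = -\pi + 2\pi u$. The helper names below are hypothetical.

```python
import math


def t_from_axis(u: float) -> float:
    """Map an axis position u in [0, 1] to t in [-pi, pi] (linear mapping assumed)."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    return -math.pi + 2.0 * math.pi * u


def axis_from_t(t: float) -> float:
    """Inverse mapping: t in [-pi, pi] to an axis position in [0, 1]."""
    if not -math.pi <= t <= math.pi:
        raise ValueError("t must lie in [-pi, pi]")
    return (t + math.pi) / (2.0 * math.pi)


print(t_from_axis(0.5), axis_from_t(math.pi))
```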
Analysis Options
The Analysis Options dialog box allows you to change the scaling of the variables and the order
of the group codes:
Standardization: the variables may be scaled by subtracting the minimum value of each
variable and dividing by the range, by subtracting the mean and dividing by the standard
deviation, or left untransformed.
Group Codes: the order of the group codes in the legend block. You may drag level codes to
change their order.
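The three Standardization choices can be sketched as a per-column transformation. This Python version assumes the sample standard deviation (divisor n - 1), which the text does not specify; the method labels "range", "zscore" and "none" and the function names are hypothetical.

```python
import statistics
from typing import Sequence


def scale_column(values: Sequence[float], method: str) -> list[float]:
    """Scale one variable using one of the three Standardization options.

    "range"  -> (x - min) / (max - min)
    "zscore" -> (x - mean) / sample standard deviation
    "none"   -> values returned unchanged
    """
    if method == "none":
        return list(values)
    if method == "range":
        lo, hi = min(values), max(values)
        if hi == lo:
            raise ValueError("range is zero; cannot scale a constant column")
        return [(v - lo) / (hi - lo) for v in values]
    if method == "zscore":
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD (n - 1), an assumption
        if sd == 0:
            raise ValueError("standard deviation is zero")
        return [(v - mean) / sd for v in values]
    raise ValueError(f"unknown method: {method!r}")


def scale_rows(rows: Sequence[Sequence[float]], method: str) -> list[list[float]]:
    """Scale each column of a row-major table independently."""
    columns = [scale_column(col, method) for col in zip(*rows)]
    return [list(r) for r in zip(*columns)]


data = [[2.0, 10.0], [4.0, 20.0], [6.0, 60.0]]  # hypothetical values
print(scale_rows(data, "range"))
```

Range or z-score scaling puts every variable on a comparable footing, so a variable measured in large units (Weight in pounds) does not swamp one measured in small units (Engine Size in liters) when the terms of (1) are summed.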