Variants of Simple Correspondence Analysis
Abstract This paper presents the R package CAvariants (Lombardo and Beh, 2017). The package
performs six variants of correspondence analysis on a two-way contingency table. The main function
that shares the same name as the package – CAvariants – allows the user to choose (via a series of
input parameters) from six different correspondence analysis procedures. These include the classical
approach to (symmetrical) correspondence analysis, singly ordered correspondence analysis, doubly
ordered correspondence analysis, non symmetrical correspondence analysis, singly ordered non
symmetrical correspondence analysis and doubly ordered non symmetrical correspondence analysis.
The code provides the flexibility for constructing either a classical correspondence plot or a biplot
graphical display. It also offers features for assessing the
reliability of the graphical representations, such as the inclusion of algebraically derived elliptical
confidence regions. This paper provides R functions that elaborate more fully on the code presented
in Beh and Lombardo (2014).
Introduction
Computational procedures for detecting the association between two or more categorical variables are
important aspects of statistical theory and practice. In particular, correspondence analysis provides a
quick and simple graphical summary of how categories and variables are associated with one another.
The theoretical aspects of the analysis are well documented in the statistical and allied disciplines;
see, for example, Benzécri (1973), Greenacre (1984), Lebart et al. (1984), Beh (2004a), Nishisato (2007),
and Beh and Lombardo (2014). Despite the need for programs and functions that allow users
to perform correspondence analysis, software support for many of its variants is generally
limited. Commercially available statistical software, such as MATLAB, Minitab, SAS and SPSS, provides
a means of carrying out correspondence analysis, although these procedures often provide only the
most basic features as part of their output. Generally nothing beyond the calculation of principal
inertia values, profile coordinates, contributions to inertia and a two-dimensional correspondence plot
is provided. Other popular statistical languages, such as R, provide some packages for performing
simple and multiple correspondence analysis of the classical (symmetrical) type (Murtagh, 2005;
Nenadic and Greenacre, 2007; Alberti, 2015; De Leeuw, 2006; De Leeuw and Mair, 2009a; Ringrose,
2012; Kostov et al., 2015). Nevertheless, at present, no popular statistical packages provide functions
to perform ordered variants of symmetrical and non symmetrical correspondence analysis.
Table 1: R packages and some CA variants. CA: simple CA; NSCA: non symmetrical CA; MCA:
multiple CA; JCA: joint CA; SOCA: singly ordered CA; DOCA: doubly ordered CA; SONSCA: singly
ordered NSCA; DONSCA: doubly ordered NSCA; CCA: canonical CA; CNSCA: canonical NSCA;
DCA: discriminant CA
choice data. Baxter and Cool (2010) and Alberti (2015) provide a good overview of correspondence
analysis using R with an archaeological focus. Another R based package that can be downloaded freely
from CRAN is ExPosition (Beaton et al., 2014). It is written by Herve Abdi and his team and performs
a variety of different multivariate data analysis techniques, including correspondence analysis and
multiple correspondence analysis. Abdi’s group has also been responsible for other variations of
correspondence analysis including multi-block discriminant correspondence analysis (Williams et al., 2010)
and discriminant correspondence analysis (Abdi, 2007). Furthermore, another suite of R functions that
enables the user to perform a variety of correspondence analysis techniques is vegan (Oksanen et al.,
2016), which was developed primarily for vegetation ecologists. It includes functions that provide
the user with a large array of techniques to choose from including classic correspondence analysis,
canonical correspondence analysis and detrended correspondence analysis. One may also consider
the ade4 package (Dray and Dufour, 2007; Chessel et al., 2004; Thioulouse et al., 1997), which also
includes non symmetric correspondence analysis, to analyze ecological and environmental data in
the framework of numerous Euclidean exploratory methods. Further, the cncaGUI package (Librero
et al., 2015) allows canonical correspondence analysis and canonical non symmetrical correspondence
analysis, providing inferential results by using bootstrap methods. The PTAk package
(Leibovici, 2010, 2015) includes functions for multiway data decomposition and, in particular,
allows simple correspondence analysis and a generalization of correspondence analysis for k-way
tables. Lastly, but certainly not least, the R code of Murtagh (2005) for performing simple and multiple
correspondence analysis may also be considered.
An overview of the broad areas of correspondence analysis that these packages cover is summarized
in Table 1. While non symmetrical correspondence analysis for nominal variables is included in
some of the R packages on CRAN that perform correspondence analysis, the remaining ordered
variants have not yet been made available in any R package. However, fragments of R code for some
of these CA variants are available in Beh and Lombardo (2014). Therefore, this paper provides a
comprehensive description of R code that extends, beyond the classical approaches, the types of
correspondence analysis that one may use. The advantage of these variants is that they enable the
user to incorporate categorical predictor/response associations and the ordinal structure of a variable. For ordered
variables we can easily identify any linear and non-linear sources of association that may exist in the
data. The ordered variants also provide a visualization of non-linear trends of association; the classical
approaches to correspondence analysis do not encompass these features.
The theoretical aspects underlying all six variants of correspondence analysis considered in
this paper can be found in Beh and Lombardo (2014) and Lombardo et al. (2016). However, here we
will provide the reader with a brief overview of the theoretical aspects of these analyses. We also
describe how the algebraic confidence ellipses for polynomial biplots can be derived; this aspect of the
analysis has not been described elsewhere.
Some theory
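\[
X^2 = n \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{p_{\bullet j}}{p_{i\bullet}} \left( \frac{p_{ij}}{p_{\bullet j}} - p_{i\bullet} \right)^2
    = n \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{p_{\bullet j}}{p_{i\bullet}} \, \pi_{ij}^2 ,
\]
where $\Pi = (\pi_{ij})$ is the $I \times J$ matrix of centered column profiles. In this case, the weight matrices in $\Re^I$
and $\Re^J$ are defined by the elements of the matrices $D_I^{-1}$ and $D_J$, respectively.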
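As a minimal sketch of the relation between Pearson's statistic and the centered column profiles, the following base R code computes the statistic from the profiles and cross-checks it against chisq.test. The 3 x 3 table N is made up for illustration and is not taken from the paper; the object names are likewise assumptions.

```r
# Illustrative two-way table (made-up counts, not from the paper)
N <- matrix(c(20, 10,  5,
              10, 15, 10,
               5, 10, 25), nrow = 3, byrow = TRUE)
n  <- sum(N)
P  <- N / n          # correspondence matrix p_ij
pr <- rowSums(P)     # row masses p_i.
pc <- colSums(P)     # column masses p_.j

# Centered column profiles: pi_ij = p_ij / p_.j - p_i.
Pi <- sweep(P, 2, pc, "/") - pr

# X^2 = n * sum_ij (p_.j / p_i.) * pi_ij^2
X2 <- n * sum(outer(1 / pr, pc) * Pi^2)

# Cross-check against the built-in Pearson chi-squared statistic
X2_ref <- unname(chisq.test(N, correct = FALSE)$statistic)
```

Both X2 and X2_ref should agree up to floating-point error, which is one way to confirm that the weights 1/p_i. and p_.j have been applied on the correct sides.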
Suppose we now treat the column variable as a predictor variable and the row variable as its
response variable. When such an asymmetric association structure exists between the two categorical
variables one may consider non symmetrical correspondence analysis (Lauro and D’Ambra, 1984;
D’Ambra and Lauro, 1989; Kroonenberg and Lombardo, 1999). To quantify this asymmetric association,
consider the Goodman-Kruskal (1954) tau index
\[
\tau = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} p_{\bullet j} \left( \frac{p_{ij}}{p_{\bullet j}} - p_{i\bullet} \right)^2}{1 - \sum_{i=1}^{I} p_{i\bullet}^2}
     = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} p_{\bullet j} \, \pi_{ij}^2}{1 - \sum_{i=1}^{I} p_{i\bullet}^2}
     = \frac{\tau_{num}}{\tau_{den}} .
\]
For this asymmetric case, the weight matrices are $I$ (an $I \times I$ identity matrix) and $D_J$, respectively.
Notice that the denominator can be treated as a constant term since it does not depend on the predictor
variable. For this reason it can be neglected without losing any information about the structure
of the association. Therefore $\tau_{num}$ is the measure of association considered in non symmetrical
correspondence analysis.
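The tau index and its numerator can be computed directly from the centered column profiles. The sketch below uses a made-up 3 x 3 table (an assumption for illustration, with columns as predictor and rows as response) and checks the profile-based form against the equivalent expression in terms of the raw proportions.

```r
# Illustrative table (made-up counts): columns = predictor, rows = response
N <- matrix(c(20, 10,  5,
              10, 15, 10,
               5, 10, 25), nrow = 3, byrow = TRUE)
P  <- N / sum(N)
pr <- rowSums(P)     # p_i.
pc <- colSums(P)     # p_.j

# Centered column profiles pi_ij = p_ij / p_.j - p_i.
Pi <- sweep(P, 2, pc, "/") - pr

# Numerator and denominator of the Goodman-Kruskal tau index
tau_num <- sum(sweep(Pi^2, 2, pc, "*"))   # sum_ij p_.j * pi_ij^2
tau_den <- 1 - sum(pr^2)
tau     <- tau_num / tau_den

# Equivalent form: (sum_ij p_ij^2 / p_.j - sum_i p_i.^2) / tau_den
tau_alt <- (sum(sweep(P^2, 2, pc, "/")) - sum(pr^2)) / tau_den
```

Because tau_den depends only on the row (response) margins, comparing tau_num across different predictors of the same response gives the same ordering as comparing tau.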
In order to graphically depict the association or the prediction of the rows given the columns in a
low dimensional space, we may consider the generalized singular value decomposition of the centered
column matrix using the suitable weight matrices (Kroonenberg and Lombardo, 1999).
Suppose we consider a general framework for the symmetrical and non symmetrical variants of
CA (Lombardo et al., 2016) that considers generic weight matrices, $V_I$ and $W_J$, in $\Re^I$ and $\Re^J$. This
general framework may be defined by considering the weighted centered column profile matrix
\[ \tilde{\Pi} = V_I^{1/2} \, \Pi \, W_J^{1/2} . \]
Its generalized singular value decomposition (GSVD) is
\[ \mathrm{GSVD}(\tilde{\Pi}) = A \Lambda B^T , \]
where the columns of $A = (a_{im})$ and $B = (b_{jm})$ are the left and right singular vectors, respectively. These quantities
satisfy orthonormality properties with metrics $D_I^{-1}$ or $I$ (identity) in $\Re^I$ (depending on the symmetric
or asymmetric relationship between the rows and columns) and $D_J$ in $\Re^J$, respectively. As usual, the
elements of the diagonal matrix of singular values, $\Lambda = \mathrm{diag}(\lambda_m)$, are arranged in descending order.
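To illustrate how the choice of weight matrices switches between the symmetrical and non symmetrical analyses, the sketch below forms the weighted centered column profile matrix for both cases and applies the ordinary svd() function from base R. This is a simplification made for illustration: the rescaling of the singular vectors by the weights is omitted, the table is made up, and all object names are assumptions.

```r
# Illustrative table (made-up counts, not from the paper)
N <- matrix(c(20, 10,  5,
              10, 15, 10,
               5, 10, 25), nrow = 3, byrow = TRUE)
n  <- sum(N)
P  <- N / n
pr <- rowSums(P)
pc <- colSums(P)
Pi <- sweep(P, 2, pc, "/") - pr            # centered column profiles

# Symmetrical CA: row weights D_I^{-1}, column weights D_J
Pit_ca   <- diag(1 / sqrt(pr)) %*% Pi %*% diag(sqrt(pc))
# Non symmetrical CA: row weights I (identity), column weights D_J
Pit_nsca <- Pi %*% diag(sqrt(pc))

sv_ca   <- svd(Pit_ca)
sv_nsca <- svd(Pit_nsca)

# Sums of squared singular values give the total inertia of each analysis
inertia_ca   <- sum(sv_ca$d^2)     # equals X^2 / n
inertia_nsca <- sum(sv_nsca$d^2)   # equals the numerator of tau
```

In both cases the smallest singular value is numerically zero because the profiles are centered, so at most min(I, J) - 1 non-trivial axes are available.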
When both variables are ordered, we adapt the SVD by using polynomial basis vectors for the row and column spaces,
performing the bivariate moment decomposition (BMD) on the matrix $\tilde{\Pi}$. The BMD of $\tilde{\Pi}$ is expressed
as
\[ \mathrm{BMD}(\tilde{\Pi}) = A Z B^T , \]
where A and B are the row and column polynomial matrices defined by Emerson (1968), respectively,
and Z is the matrix of the generalized correlations (Rayner and Beh, 2009). The construction of
polynomials A and B requires the specification of a priori scores, $s_X(i)$ and $s_Y(j)$ (defined by mi and
mj in CAvariants, respectively), to reflect the ordinal structure of the row and column variables. These
polynomials are orthonormal with respect to the weight matrices. For the analysis of nominal variables,
when a symmetrical association between the variables is considered, the weights in $\Re^I$ and $\Re^J$ are
$D_I^{-1}$ and $D_J$, respectively. When an asymmetric association is considered, the weights are given by $I$
and $D_J$, respectively.
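The paper relies on Emerson's (1968) recurrence to build the polynomials. As a hedged alternative sketch, the same orthonormal polynomials (unique up to sign) can be obtained by a weighted Gram-Schmidt step, here carried out with a QR decomposition of a Vandermonde matrix. The function name orth_poly and the example scores and weights are hypothetical and are not part of CAvariants.

```r
# Orthonormal polynomials for given scores and weights (weights sum to 1).
# Swapped-in technique: QR-based Gram-Schmidt, not Emerson's recurrence.
# Assumes strictly increasing scores; intended for small numbers of categories.
orth_poly <- function(scores, weights) {
  J <- length(scores)
  V <- outer(scores, 0:(J - 1), "^")     # columns 1, s, s^2, ..., s^(J-1)
  Q <- qr.Q(qr(sqrt(weights) * V))       # orthonormal columns in the usual sense
  B <- Q / sqrt(weights)                 # now t(B) %*% diag(weights) %*% B = I
  sweep(B, 2, sign(B[J, ]), "*")         # sign convention: positive at largest score
}

scores  <- 1:4                           # consecutive integer scores
weights <- c(0.1, 0.2, 0.3, 0.4)         # e.g. column masses
B <- orth_poly(scores, weights)

# Weighted cross-product; should be the identity matrix
G <- crossprod(B, weights * B)
```

The first column of B is the trivial constant polynomial, the second is the standardized score (location), and the third captures dispersion.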
When only one of the two variables consists of ordered categories, rather than considering the
BMD or the GSVD of $\tilde{\Pi}$, one may consider instead its hybrid decomposition (HD) (Beh, 2001, 2008;
Lombardo et al., 2016). This method of decomposition consists of singular vectors for the nominal
variable and orthogonal polynomials for the ordered variable. Consider the case, as does the package
CAvariants, where the column variable consists of ordered categories and the row variable consists of
nominal categories. Then the HD of $\tilde{\Pi}$ takes the form
\[ \mathrm{HD}(\tilde{\Pi}) = A Z B^T , \]
where A is the column matrix of singular vectors for the nominal row categories and B is the column
matrix of orthogonal polynomials for the ordered column categories. The generic elements of Z, $z_{mv}$,
are the hybrid generalized correlations; for further details on these elements see Beh and Lombardo
(2014) and Lombardo et al. (2016).
The generalized correlation matrix, Z, in the BMD of $\tilde{\Pi}$ reflects the various sources of association
between the variables and is derived using orthogonal polynomials (Best and Rayner, 1996; Beh, 1997;
Rayner and Beh, 2009). For example, when the row and column scores are defined as consecutive
integers such that $s_X(i) = i$ for $i = 1, \ldots, I$ and $s_Y(j) = j$ for $j = 1, \ldots, J$, then $z_{11}$ is Pearson's
product moment correlation of N. A simple generalization of this correlation is $z_{12}$, which is a measure
of the correlation between any change in the location of the row categories and the dispersion of the
column categories. For this reason, $z_{12}$ is a generalized correlation describing the linear-by-quadratic
association between the row and column categories.
For ordered CA variants, the total inertia is
\[ \mathrm{Inertia}(\tilde{\Pi}) = \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} z_{uv}^2 . \]
From the matrix of generalized correlations Z, we can obtain the inertia of each polynomial axis
by considering the sum-of-squares of $z_{uv}$ over either u or v. Using BMD or HD, the symmetric
and asymmetric measures of association ($X^2$ and $\tau$) can be partitioned into polynomial components
that reflect various sources of variation for each of the categories. The inertias of the polynomial
components will henceforth be referred to as sources of inertia and are akin to the principal inertia
values in (symmetrical or non symmetrical) correspondence analysis.
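The following sketch illustrates these properties for the symmetric, doubly ordered case with consecutive integer scores: the linear-by-linear term equals Pearson's correlation of the table, and the squared generalized correlations sum to the total inertia. The helper orth_poly (a QR-based construction rather than Emerson's recurrence), the made-up table and the arrangement of Z with a leading trivial row and column are assumptions made for illustration, not the package's internal code.

```r
# Hypothetical helper: orthonormal polynomials w.r.t. given weights (via QR)
orth_poly <- function(scores, weights) {
  J <- length(scores)
  Q <- qr.Q(qr(sqrt(weights) * outer(scores, 0:(J - 1), "^")))
  B <- Q / sqrt(weights)
  sweep(B, 2, sign(B[J, ]), "*")
}

# Illustrative doubly ordered table (made-up counts)
N <- matrix(c(20, 10,  5,
              10, 15, 10,
               5, 10, 25), nrow = 3, byrow = TRUE)
n  <- sum(N)
P  <- N / n
pr <- rowSums(P)
pc <- colSums(P)

A <- orth_poly(seq_len(nrow(N)), pr)   # row polynomials, scores 1..I
B <- orth_poly(seq_len(ncol(N)), pc)   # column polynomials, scores 1..J

# Generalized correlations; first row and column belong to the trivial polynomial
Z <- t(A) %*% (P - outer(pr, pc)) %*% B

z11 <- Z[2, 2]                          # linear-by-linear term
z12 <- Z[2, 3]                          # linear-by-quadratic term
total_inertia <- sum(Z[-1, -1]^2)       # equals X^2 / n
row_inertia   <- rowSums(Z[-1, -1]^2)   # inertia of each row polynomial axis
col_inertia   <- colSums(Z[-1, -1]^2)   # inertia of each column polynomial axis

# Pearson correlation of the row and column scores over the n individuals
x <- rep(as.vector(row(N)), times = as.vector(N))
y <- rep(as.vector(col(N)), times = as.vector(N))
r_xy <- cor(x, y)
```

The row and column axis inertias have the same total but generally differ axis by axis.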
A formal statistical test of the $X^2$ or $\tau$ index can be made. To test the statistical significance of the
total inertia in the symmetrical and non symmetrical case, we can compare the chi-squared statistic,
or the C-statistic, $C = \tau \cdot (n - 1) \cdot (I - 1)$ (Light and Margolin, 1971), with the $\chi^2$ distribution with
$(I - 1)(J - 1)$ degrees of freedom; see, for example, Beh and Lombardo (2014) for further details.
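A minimal sketch of both tests in base R is given below. The table is made up for illustration, and the object names are assumptions; only the formula for the C-statistic and the degrees of freedom are taken from the text.

```r
# Illustrative table (made-up counts): columns = predictor, rows = response
N <- matrix(c(20, 10,  5,
              10, 15, 10,
               5, 10, 25), nrow = 3, byrow = TRUE)
n  <- sum(N)
I  <- nrow(N)
J  <- ncol(N)
P  <- N / n
pr <- rowSums(P)
pc <- colSums(P)
df <- (I - 1) * (J - 1)

# Non symmetrical case: Goodman-Kruskal tau and the C-statistic
tau    <- (sum(sweep(P^2, 2, pc, "/")) - sum(pr^2)) / (1 - sum(pr^2))
C_stat <- tau * (n - 1) * (I - 1)
p_tau  <- pchisq(C_stat, df = df, lower.tail = FALSE)

# Symmetrical case: Pearson's chi-squared statistic
X2   <- n * sum((P - outer(pr, pc))^2 / outer(pr, pc))
p_X2 <- pchisq(X2, df = df, lower.tail = FALSE)
```

A small p-value in either case indicates that the total inertia being decomposed reflects a statistically significant association.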
Unequal inertias of the row and column polynomials. When considering the BMD of $\tilde{\Pi}$, the total
inertia of the row and column spaces ($\Re^I$ and $\Re^J$, respectively) will be identical. However, the inertia
associated with each of the row and column polynomials will often be different. For the row categories,
there are $I - 1$ row inertia values – one for each of the axes – where the inertia of the uth polynomial
axis is $z_{u\bullet}^2$. Similarly, there are $J - 1$ column inertia values – one for each of the axes – where the inertia
of the vth axis is $z_{\bullet v}^2$. For this reason, we recommend constructing polynomial biplots for the ordered
variants of correspondence analysis instead of the traditional correspondence plots constructed using
principal coordinates. See Beh and Lombardo (2014) and Lombardo et al. (2016) for more details on
these features.
For the HD of $\tilde{\Pi}$, the interpretation and properties of the Z matrix are a mixture (or hybrid) of
$\Lambda$ from the GSVD and Z from the BMD. When considering the space $\Re^J$, calculating
$\sum_{m=1}^{M^*} z_{m1}^2 = z_{\bullet 1}^2$
gives the location component of the ordered columns and represents the principal inertia for this
variable along the first polynomial axis. Similarly in $\Re^I$, computing
$\sum_{v=1}^{J-1} z_{1v}^2 = z_{1\bullet}^2 = \lambda_1^2$ yields the
principal inertia of the first principal axis for the nominal row variable. Like BMD, HD yields different
sets of inertia values for each axis in the $\Re^I$ and $\Re^J$ spaces.
while the principal polynomial coordinates for the row categories are
\[ F = A Z = \tilde{\Pi} W_J B , \qquad f_{iv} = \sum_{u} \alpha_{iu} z_{uv} = \sum_{j=1}^{J} w_{\bullet j} \, \tilde{\pi}_{ij} \, \beta_{jv} . \]
In practice, the coordinates for both the row and column categories are computed using the same
orthonormal polynomial axes, i.e., the column polynomials.
The plot method for objects returned by CAvariants provides the user with the option of constructing
parametric (or algebraic) elliptical confidence regions for all six CA variants, not only
for the nominal CA variants as originally proposed by Beh (2010). We compute the semi-major and
semi-minor axis lengths of the elliptical region for the row and column categories. Here, we provide
the ellipse axis lengths for the ordered symmetric variants of correspondence analysis. For example,
the semi-major axis length of the confidence ellipse for the ith row category is
\[
x_{i(\alpha)} = z_{11} \sqrt{ \frac{\chi^2_{\alpha}}{n \times \mathrm{trace}(Z'Z)} \left( \frac{1}{p_{i\bullet}} - \sum_{m=3}^{I-1} a_{im}^2 \right) } , \qquad (1)
\]
Similar semi-axis lengths can also be derived for the column categories and for the non symmetrical
CA variants. Furthermore, note that ellipsoids can be constructed for three- or higher-dimensional
correspondence plots by considering the input parameter M > 2 in the plot method. For further details
on this issue see Beh (2010) and Beh and Lombardo (2014).
Unlike the confidence circles of Lebart et al. (1984) and the more computationally intensive
bootstrap techniques proposed in the literature (Markus, 1994; Linting et al., 2007; Ringrose, 2012;
Greenacre, 1984; Lombardo and Ringrose, 2012), constructing confidence ellipses in this manner takes
into consideration the contribution of the ith row principal polynomial coordinate in dimensions
higher than the second. In fact, since all I dimensions are reflected in the semi-major and semi-minor
axis lengths, all of the contribution that a point has to the symmetrical or asymmetrical association can
be accounted for in a two-dimensional plot using equations (1) and (2). Additional information for how
to compute the p-values of each category point can be easily found by considering a similar theoretical
development of the p-values described in Beh and Lombardo (2014, 2015) for a correspondence
analysis of a contingency table with nominal variables.
• The name of the output object, for example say res, used with the main function CAvariants.
• The horizontal polynomial or principal axis, firstaxis. By default, firstaxis = 1.
The print method for “CAvariants” objects is included in the package CAvariants and displays
the main results of the analysis specified by the user. The results displayed depend on the type of
analysis being performed. The principal inertia values, total inertia and p-values are included as part
of its output when catype = "CA", catype = "SOCA" or catype = "DOCA" and are based on Pearson’s
chi-squared statistic. The Goodman Kruskal tau-index is the association measure of interest when
catype = "NSCA", catype = "SONSCA" or catype = "DONSCA". When an ordered analysis is specified
– such as when catype = "DOCA", catype = "SOCA", catype = "SONSCA" or catype = "DONSCA" – a
table describing the significant polynomial components of inertia will also be reported.
The input parameters of the print method for “CAvariants” objects are:
• The name of the output object, for example say res, used with the main function CAvariants.
• The number of dimensions, printdims, that are used to generate the correspondence plot, or
biplot, and for summarizing the numerical output of the analysis. By default, printdims = 2.
• The flag parameter, ellprint, controls whether the characteristics of the confidence ellipses
(eccentricity, semi-axis, area, p-values) are displayed. By default, ellprint = TRUE.
• The number of axes, Mell, used for the construction of the confidence ellipses. By default,
it is equal to its maximum value, Mell = min(nrow(Xtable),ncol(Xtable)), i.e., the rank of
the data matrix. This input parameter is identical to the parameter Mell of both the function
CAvariants and the plot method for “CAvariants” objects.
• The level of significance used for the construction of the elliptical regions, alpha. By default,
alpha = 0.05.
• The minimum number of decimal places, digits, used for displaying the numerical summaries
of the analysis. By default, digits = 3.
Furthermore, the package CAvariants contains a summary method for the objects returned by CAvariants.
This method lists the names of the output objects and displays a selection of the main output
objects described in the print method for objects returned by CAvariants.
Numerical outputs
As an example of the complete set of numerical results that is obtained from performing a particular
variant of correspondence analysis, consider the case where a singly ordered non symmetrical
correspondence analysis is performed on the data table shopdataM available in the package CAvariants.
This object is the contingency table being analyzed and is described more fully in Section Application.
The output object of the main function, named res here, is the result of calling the CAvariants
function on shopdataM. The object res is obtained using
R> library(CAvariants)
R> res <- CAvariants(shopdataM, catype = "SONSCA")
The results are available in the following entries which can be obtained using
R> names(res)
which gives
[1] "Xtable" "rows" "cols" "r" "rowlabels"
[6] "collabels" "Rprinccoord" "Cprinccoord" "Rstdcoord" "Cstdcoord"
[11] "tauden" "tau" "inertiasum" "inertias" "inertias2"
[16] "comps" "catype" "mj" "mi" "pcc"
[21] "Jmass" "Imass" "Trend" "Z" "ellcomp"
[26] "risell" "Mell"
These results may be printed to the screen by using
R> print(res)
while a summary of each of these numerical features is produced by using
R> summary(res)
Application
To demonstrate the application of a variant of simple correspondence analysis described in the
CAvariants package, we present the following example. We shall confine our attention to the non
symmetrical correspondence analysis of a singly ordered contingency table. The contingency table
that we are examining is concerned with shoplifting in The Netherlands and summarizes, in part,
the results of a survey of the Dutch Central Bureau of Statistics (Israëls, 1987). The data consist of a
sample of 20819 men who were suspected of shoplifting in Dutch stores between 1977 and 1978. The
predictor variable consists of the age groups of the suspects (less than 12yrs, 12 to 14yrs, 15 to
17yrs, 18 to 20yrs, 21 to 29yrs, 30 to 39yrs, 40 to 49yrs, 50 to 64yrs, 65yrs and over) while the response
variable of the table consists of the items stolen. These items are clothing, clothing accessories, tobacco
and/or provisions, stationery (labelled stationary in the output), books, records, household goods, candy, toys, jewelry, perfume, hobby and/or tools
and other items. For an extensive description of this example, and the application of correspondence
analysis, see Lombardo et al. (2016).
After choosing the suitable variant of correspondence analysis, we create the object res that
consists of the complete features of the analysis by running the command
R> res <- CAvariants(shopdataM, catype = "SONSCA")
print(res) will return as part of its output the following numerical features:
RESULTS for SONSCA Correspondence Analysis
Data Table:
M12< M13 M16 M19 M25 M35 M45 M57 M65+
clothing 81 138 304 384 942 359 178 137 45
accessories 66 204 193 149 297 109 53 68 28
tobacco 150 340 229 151 313 136 121 171 145
stationary 667 1409 527 84 92 36 36 37 17
books 67 259 258 146 251 96 48 56 41
records 24 272 368 141 167 67 29 27 7
household 47 117 98 61 193 75 50 55 29
candy 430 637 246 40 30 11 5 17 28
toys 743 684 116 13 16 16 6 3 8
jewelry 132 408 298 71 130 31 14 11 10
perfumes 32 57 61 52 111 54 41 50 28
hobby 197 547 402 138 280 200 152 211 111
other 209 550 454 252 624 195 88 90 34
Inertias, percent inertias and cumulative percent inertias of the row space
Numerator of Tau Index predicting the rows given the column categories
[1] 0.038
[1] 0.041
** Column Components **
Component Value P-value
Location 6181.536 0
Dispersion 2642.363 0
Cubic 757.192 0
Error 750.418 0
** C-Statistic ** 10331.509 0
Inner product of coordinates (first two axes when 'firstaxis=1' and 'lastaxis=2')
M12< M13 M16 M19 M25 M35 M45 M57 M65+
clothing 0.111 0.097 0.014 -0.112 -0.150 -0.118 -0.065 -0.012 0.044
accessories 0.029 0.022 0.002 -0.024 -0.031 -0.027 -0.019 -0.011 -0.002
tobacco 0.057 0.017 -0.010 -0.003 0.004 -0.023 -0.059 -0.094 -0.122
stationary -0.132 -0.089 -0.003 0.089 0.114 0.110 0.098 0.085 0.064
books 0.023 0.015 0.000 -0.014 -0.018 -0.018 -0.018 -0.017 -0.015
records 0.011 0.008 0.000 -0.008 -0.010 -0.010 -0.008 -0.006 -0.004
household 0.023 0.014 0.000 -0.013 -0.016 -0.018 -0.018 -0.018 -0.017
candy -0.074 -0.049 -0.001 0.048 0.061 0.061 0.055 0.049 0.039
toys -0.122 -0.070 0.004 0.061 0.074 0.088 0.101 0.113 0.115
jewelry -0.021 -0.013 0.000 0.012 0.015 0.016 0.016 0.016 0.015
perfumes 0.021 0.010 -0.001 -0.008 -0.009 -0.013 -0.018 -0.023 -0.026
hobby 0.048 0.007 -0.012 0.010 0.021 -0.011 -0.055 -0.098 -0.135
other 0.026 0.030 0.007 -0.039 -0.054 -0.036 -0.010 0.017 0.043
Eccentricity of ellipses
[1] 0.757
When an ordered analysis is performed, the trend plots of the row and column categories are depicted.
For example, when performing a singly ordered NSCA, the variation, or trend, of the row categories is
examined by observing how it is affected by the ordered column categories when using a polynomial
transformation. Figure 1 shows a parabolic trend of the row category clothing. This trend highlights
that there is a greater propensity to steal clothing by people aged 25 to 45 years than those of a younger,
or older, age. Figure 2 provides an alternative visual display of these trends and is constructed by
depicting the row (items) categories using principal coordinates and the column (age) categories using
standard coordinates. Hence a row isometric biplot is constructed. Since the analysis also incorporates
the ordered nature of the column categories and the nominal structure of the row categories, Figure 2
is referred to as the row isometric polynomial biplot of the data.
The trend plot of Figure 1 and the polynomial biplot given by Figure 2 can be obtained using the
following command:
R> plot(res, plottype = "biplot", biptype = "row", scaleplot = 5, pos = 1)
When the first two polynomial axes are used to construct the biplot of Figure 2, the resulting configuration
has a parabolic shape. Observe that the explained inertia of the polynomial axes is as follows:
the first polynomial axis accounts for 59.8% of the inertia and the second polynomial axis for 25.6% of
the inertia. We can therefore see that the novelty of the polynomial biplot is based on the polynomial
representation of the predictor variable. The first linear polynomial axis represents the deviation from
the mean centered profile accounting for the ordered structure of the age groups, which is reflected in
the correct ordering of the age categories along the first polynomial axis. The second polynomial axis
shows a parabolic shape of the categories with positive concavity. Furthermore, note that the left-hand
[Figure 1 plot. Vertical axis: Increase in Predictability. Plotted rows: clothing, accessories, tobacco, stationary, toys, other.]
Figure 1: Trend of rows: A selection of rows of the centered column profile table reconstructed by
using the first two polynomials.
[Figure 2 plot. Horizontal axis: Axis 1 (59.83%). Vertical axis: Axis 2 (25.58%).]
Figure 2: Row-isometric polynomial biplot of singly ordered NSCA of shoplifting data: first two
polynomial components, Stolen goods and Age.
side of the first axis is dominated by the young age groups with adolescents and young adults at the
center of the display (who steal items consistent with the average number of thefts of all items). The
mid-adult and older age groups are on the right-hand side of Figure 2.
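The percentages of inertia attached to the two polynomial axes can be recovered from the column components reported in the printed output, since each percentage is a component divided by the C-statistic. The short sketch below only re-uses the values shown in that output; the object names are illustrative.

```r
# Column components and C-statistic copied from the print(res) output
comps  <- c(Location = 6181.536, Dispersion = 2642.363,
            Cubic = 757.192, Error = 750.418)
C_stat <- 10331.509

# Percentage of the total inertia explained by each polynomial component
pct <- 100 * comps / C_stat

# Cumulative percentage for the first two (location and dispersion) axes
cum_two <- sum(pct[c("Location", "Dispersion")])
```

The location and dispersion components give about 59.83% and 25.58%, matching the axis labels of the biplot, so the two-dimensional display retains roughly 85% of the inertia.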
The magnitude of the coordinates indicates the importance of the first two polynomial components
for modeling the trends of the items. In particular, we see that the first two polynomial coordinates are
sufficient to model the trends for most stolen goods. The reliability of the graphical representation can
be assessed by constructing elliptical confidence regions for the row categories which are depicted
using row principal polynomial coordinates. These ellipses can be obtained using the plot method for
“CAvariants” objects such that
[Figure 3 plot. Horizontal axis: Axis 1 (59.83%). Vertical axis: Axis 2 (25.58%).]
Figure 3: 95% confidence ellipses in the row isometric polynomial biplot of singly ordered NSCA of
the shoplifting data: Stolen goods and Age.
Conclusion
There are many freely downloadable programs/code available for performing classical correspondence
analysis. For example, the R code of Nenadic and Greenacre (2007) and De Leeuw and Mair (2009a)
may be considered for performing simple and joint correspondence analysis. However, the CAvariants
package provides variants of correspondence analysis which are not offered by other correspondence
analysis R packages on CRAN. To the best of our knowledge, CAvariants is the only package
available that provides the user with the option of performing six variants of two-way correspondence
analysis and, in particular, ordered symmetrical and non symmetrical correspondence analysis variants.
Indeed, symmetrical correspondence analysis for ordered variables was implemented in S-PLUS by
Beh (2004b) and has been easily adapted for R.
Subsequent versions of the function may allow for more flexibility by giving the user more tools
to assess the reliability of graphical results. These may include bootstrap confidence regions to
[Figure 4 plot. Title: 95% Confidence Ellipses. Horizontal axis: Axis 1 (59.83%). Vertical axis: Axis 2 (25.58%).]
Figure 4: A zoomed view of the origin of the row-isometric polynomial biplot given by Figure 3.
Bibliography
H. Abdi. Discriminant correspondence analysis. In N. J. Salkind, editor, Encyclopedia of Measurement
and Statistics, pages 270–275. Sage Publications, Inc., 2007.
D. Beaton, C. R. C. Fatt, and H. Abdi. An ExPosition of multivariate analysis with the singular value
decomposition in R. Computational Statistics & Data Analysis, 72:176–189, 2014.
doi: 10.1016/j.csda.2013.11.006.
E. J. Beh. Partitioning Pearson’s chi-squared statistic for singly ordered two-way contingency tables.
The Australian and New Zealand Journal of Statistics, 43:327–333, 2001. doi: 10.1111/1467-842x.00179.
E. J. Beh. Simple correspondence analysis: A bibliographic review. International Statistical Review, 72:
257–284, 2004a. doi: 10.1111/j.1751-5823.2004.tb00236.x.
E. J. Beh. S-PLUS code for ordinal correspondence analysis. Computational Statistics, 19:593–612, 2004b.
doi: 10.1007/bf02753914.
E. J. Beh. Elliptical confidence regions for simple correspondence analysis. Journal of Statistical Planning
and Inference, 140:2582–2588, 2010. doi: 10.1016/j.jspi.2010.03.018.
E. J. Beh and R. Lombardo. Correspondence Analysis: Theory, Practice and New Strategies. John Wiley &
Sons, 2014. doi: 10.1002/9781118762875.
E. J. Beh and R. Lombardo. Confidence regions and p-values for classical and non-symmetric
correspondence analysis. Communications in Statistics – Theory and Methods, 44:95–114, 2015.
doi: 10.1080/03610926.2013.768665.
D. J. Best and J. C. W. Rayner. Nonparametric analysis for doubly ordered two-way contingency tables.
Biometrics, 52:1153–1156, 1996. doi: 10.2307/2533077.
D. Chessel, A. B. Dufour, and J. Thioulouse. The ade4 package I: One-table methods. R News, 4(1):
5–10, 2004. URL https://www.R-project.org/doc/Rnews/Rnews_2004-1.pdf.
J. G. Clavel, S. Nishisato, and A. Pita. dualScale: Dual Scaling Analysis of Multiple Choice Data, 2014.
URL https://CRAN.R-project.org/package=dualScale. R package version 0.9.1.
L. D’Ambra and N. C. Lauro. Non-symmetrical correspondence analysis for three-way contingency
table. In R. Coppi and S. Bolasco, editors, Multiway Data Analysis, pages 301–315. Elsevier, Amsterdam,
1989.
J. De Leeuw and P. Mair. Simple and canonical correspondence analysis using the R package anacor.
Journal of Statistical Software, 31(5):1–18, 2009a. doi: 10.18637/jss.v031.i05.
J. De Leeuw and P. Mair. Gifi methods for optimal scaling in R: The package homals. Journal of
Statistical Software, 31(4):1–20, 2009b. doi: 10.18637/jss.v031.i04.
S. Dray and A. B. Dufour. The ade4 package: Implementing the duality diagram for ecologists. Journal
of Statistical Software, 22(4):1–20, 2007. doi: 10.18637/jss.v022.i04.
J. Gower, S. Lubbe, and N. le Roux. Understanding Biplots. John Wiley & Sons, Chichester, 2011.
doi: 10.1002/9780470973196.
M. Greenacre. Theory and Application of Correspondence Analysis. Academic Press, London,
1984.
A. Israëls. Eigenvalue Techniques for Qualitative Data. DSWO Press, Leiden, 1987.
N. C. Lauro and L. D’Ambra. L’analyse non symmetrique des correspondances. In E. Diday, editor,
Data Analysis and Informatics III, pages 433–446. Elsevier, Amsterdam, 1984.
S. Lê, J. Josse, and F. Husson. FactoMineR: An R package for multivariate analysis. Journal of Statistical
Software, 25(1):1–18, 2008. doi: 10.18637/jss.v025.i01.
L. Lebart, A. Morineau, and K. M. Warwick. Multivariate Descriptive Statistical Analysis. John Wiley &
Sons, New York, USA, 1984.
R. J. Light and B. H. Margolin. An analysis of variance for categorical data. Journal of the American
Statistical Association, 66(335):534–544, 1971. doi: 10.1080/01621459.1971.10482297.
M. Linting, J. J. Meulman, P. F. J. Groenen, and A. J. Van der Kooij. Stability of nonlinear principal
components analysis: An empirical study using the balanced bootstrap. Psychological Methods, 12(3):
359–379, 2007. doi: 10.1037/1082-989x.12.3.359.
R. Lombardo and E. J. Beh. CAvariants: Correspondence Analysis Variants, 2017.
URL https://CRAN.R-project.org/package=CAvariants. R package version 3.4.
M. T. Markus. Bootstrap Confidence Regions in Non-Linear Multivariate Analysis. DSWO Press, 1994.
F. Murtagh. Correspondence Analysis and Data Coding with Java and R. Chapman & Hall/CRC, Boca
Raton, FL, 2005. doi: 10.1201/9781420034943.
S. Nishisato. Multidimensional Nonlinear Descriptive Analysis. Taylor & Francis Group, LLC, 2007.
J. C. W. Rayner and E. J. Beh. Towards a better understanding of correlation. Statistica Neerlandica, 63:
324–333, 2009. doi: 10.1111/j.1467-9574.2009.00425.x.
T. J. Ringrose. Bootstrap confidence regions for correspondence analysis. Journal of Statistical
Computation and Simulation, 83:1397–1413, 2012. doi: 10.1080/00949655.2011.579968.
B. Ripley. MASS: Support Functions and Datasets for Venables and Ripley’s MASS, 2016.
URL https://CRAN.R-project.org/package=MASS. R package version 7.3-45.
J. Thioulouse, D. Chessel, S. Dolédec, and J. M. Olivier. ADE-4: A multivariate analysis and graphical
display software. Statistics and Computing, 7:75–83, 1997. doi: 10.1023/a:1018513530268.
W. N. Venables and B. D. Ripley. Modern Applied Statistics with S. Springer-Verlag, 4th edition, 2002.
doi: 10.1007/978-0-387-21706-2.
Rosaria Lombardo
Department of Economics, University of Naples Campania “Luigi Vanvitelli”
via Gran Priorato di Malta, Capua 81043
Italy
rosaria.lombardo@unina2.it
Eric J. Beh
School of Mathematical & Physical Sciences, University of Newcastle
University Drive, Callaghan, NSW, 2308 Australia
eric.beh@newcastle.edu.au