CITATION
Auerswald, M., & Moshagen, M. (2019, January 21). How to Determine the Number of Factors to Retain in Exploratory Factor Analysis: A Comparison of Extraction Methods Under Realistic Conditions. Psychological Methods. Advance online publication. http://dx.doi.org/10.1037/met0000200

Psychological Methods
© 2019 American Psychological Association. ISSN 1082-989X.
Abstract

Exploratory factor analyses are commonly used to determine the underlying factors of multiple observed variables. Many criteria have been suggested to determine how many factors should be retained. In this study, we present an extensive Monte Carlo simulation to investigate the performance of extraction criteria under varying sample sizes, numbers of indicators per factor, loading magnitudes, and underlying multivariate distributions of observed variables, as well as how the performance of the extraction criteria is influenced by the presence of cross-loadings and minor factors for unidimensional, orthogonal, and correlated factor models. We compared several variants of traditional parallel analysis (PA), the Kaiser-Guttman Criterion, and sequential χ² model tests (SMT) with 4 recently suggested methods: revised PA, comparison data (CD), the Hull method, and the Empirical Kaiser Criterion (EKC). No single extraction criterion performed best for every factor model. In unidimensional and orthogonal models, traditional PA, EKC, and Hull consistently displayed high hit rates even in small samples. Models with correlated factors were more challenging, where CD and SMT outperformed other methods, especially for shorter scales. Whereas the presence of cross-loadings generally increased accuracy, non-normality had virtually no effect on most criteria. We suggest researchers use a combination of SMT and either Hull, the EKC, or traditional PA, because the number of factors was almost always correctly retrieved if those methods converged. When the results of this combination rule are inconclusive, traditional PA, CD, and the EKC performed comparatively well. However, disagreement also suggests that factors will be harder to detect, increasing sample size requirements to N ≥ 500.
Translational Abstract

Exploratory factor analysis (EFA) is a statistical tool commonly used in psychological research to determine the underlying factors of questionnaire items. One of the key issues in EFA is deciding how many underlying factors researchers need to assume to account for different responses to these items. In this simulation study, we compared different extraction criteria, designed to determine this number, under conditions that are realistic in empirical practice. We investigated conditions with one underlying factor, multiple uncorrelated factors, and multiple correlated factors. In addition, we violated two assumptions of the extraction criteria. First, we included conditions with minor underlying factors that represent systematic measurement errors, for example, when different questionnaire items are phrased in a similar way. Second, many extraction criteria assume a normal distribution of responses to the questionnaire items, and we included conditions where this distribution was non-normal. We found that (1) some criteria perform better in conditions with one factor or multiple uncorrelated factors, whereas other criteria perform well in conditions with multiple correlated factors, (2) the latter criteria perform worse when minor factors are present, and (3) non-normality did not impact the performance of most criteria. We suggest that researchers use two criteria in conjunction, one suited for single/uncorrelated factors and one suited for correlated factors. If both criteria suggest the same number of factors, the result is likely correct. Otherwise, the sample size should be at least 500 because the number of underlying factors is harder to detect.
Exploratory factor analysis (EFA) is a widely used statistical method to study the underlying latent structure of a large number of observed variables, especially if there is no strong a priori justification for a particular theoretical model. EFA determines the underlying structure using a data-driven approach assuming a common factor model (Thurstone, 1947). In this model, each observed variable is conceptualized as the weighted sum of a set of (potentially correlated) factor variables and a single unique factor. The common factors account for covariances among the observed variables and, thus, are the factors of theoretical interest. Unique factors, on the other hand, exclusively account for the variances of single observed variables, which is considered to reflect measurement error with regard to the common factors.

Author note: Max Auerswald and Morten Moshagen, Institute of Psychology and Education, Department of Quantitative Methods in Psychology, Ulm University. Correspondence concerning this article should be addressed to Max Auerswald, Research Methods, Institute of Psychology and Education, Ulm University, Albert-Einstein-Allee 47, 89081 Ulm, Germany. E-mail: max.auerswald@uni-ulm.de
One of the key issues in EFA is deciding how many latent factors need to be extracted. Both under- and overestimating the number of factors (referred to as under- and overextraction, respectively) have detrimental effects on the quality of EFA (Comrey, 1978). Underextraction results in substantial error on all factor loadings, irrespective of their weight in a correctly specified model (Wood, Tataryn, & Gorsuch, 1996), and deteriorates the factor scores compared with factor scores in a correctly specified model (Fava & Velicer, 1996). In contrast, overextraction typically results in lower biases in factor scores and loadings (Fava & Velicer, 1992; Wood et al., 1996). However, overextraction can lead to factor splitting, such that manifest variables with population loadings on one factor are split on multiple factors after the rotation.

The Common Factor Model

The common factor model (for an overview, see, e.g., Jöreskog, 2007) assumes a set of m latent common factors η_1, …, η_m that explain variations in the p manifest (and standardized) random variables x_1, …, x_p. A single manifest variable x_i is assumed to be a linear combination of η_1, …, η_m and one unique factor ε_i, similar to a linear regression:

x_i = l_i1 η_1 + l_i2 η_2 + … + l_im η_m + ε_i,  1 ≤ i ≤ p,  (1)

where ε_i is uncorrelated with all η_1, …, η_m and all ε_i′ for which i ≠ i′, and l_ij is the loading of the i-th item on factor j. The goal is thus to find common latent factors, fewer in number than the

Figure 1. A common factor model with two common latent factors and seven observed variables. The common factors can be correlated whereas the unique factors are independent from other unique factors and the latent factors. The arrows from the common factors to the observed variables indicate the loadings l_i1, l_i2 for 1 ≤ i ≤ 7 (see Equation 1).
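To make the model concrete, here is a small numerical sketch of a two-factor, seven-variable structure like the one in Figure 1. The loading values (.7 and .6) and the factor correlation (.5) are illustrative assumptions, not values from the article; the code simulates data according to Equation 1 and checks that the sample correlations approach the model-implied values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical loadings: seven standardized variables, the first four
# loading on factor 1, the last three on factor 2 (values are made up).
L = np.zeros((7, 2))
L[:4, 0] = 0.7
L[4:, 1] = 0.6
Phi = np.array([[1.0, 0.5], [0.5, 1.0]])  # correlated common factors

n = 100_000
# Draw correlated common factors eta and independent unique factors eps
eta = rng.multivariate_normal(np.zeros(2), Phi, size=n)
# Unique variances chosen so that Var(x_i) = 1 (standardized variables)
uniqueness = 1 - np.einsum('ij,jk,ik->i', L, Phi, L)
eps = rng.normal(scale=np.sqrt(uniqueness), size=(n, 7))

# Equation 1: each x_i is a weighted sum of the factors plus a unique factor
X = eta @ L.T + eps
R_sample = np.corrcoef(X, rowvar=False)

# Sample correlations approach the model-implied matrix L Phi L^T (+ diagonal)
R_model = L @ Phi @ L.T
np.fill_diagonal(R_model, 1.0)
print(np.abs(R_sample - R_model).max())  # small sampling error
```

With a large simulated sample, the maximum absolute deviation between the sample and model-implied correlations is on the order of the sampling error of a correlation coefficient.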
We denote the correlation matrix between the common factors as Φ (= E(ηηᵀ)) and the covariance matrix of the unique factors ε as Δ (= E(εεᵀ)). Because η and ε are independent, the model expresses the correlation matrix as

R = ΛΦΛᵀ + Δ.  (7)

The common factor model thus becomes a statement about the correlation matrix, where the matrices Λ and Φ are only determined up to a rotation (Browne, 2001).

The matrix Δ in Equation 7 is a diagonal matrix because the common factor model assumes that all unique factors ε_i, ε_i′ are independent for i ≠ i′. The entries δ_i of Δ are called uniqueness factors and represent the part of the variance of the manifest variable x_i that is independent of the latent factors. The communalities are their counterpart, that is, the part of the variance of x_i that can be explained by the latent factors.¹ The common factor model estimates Λ such that

R̂_C ≈ ΛΛᵀ,  (8)

where R̂_C is the matrix that results when replacing the diagonal elements of R with the communalities. One least squares solution to Equation 8 estimates the loadings Λ proportional to the so-called eigenvectors of R̂_C (Jöreskog, 2007). In general, eigenvectors are vectors ν for which

Aν = λν,  ν ≠ 0  (9)

holds for an arbitrary square matrix A of size p × p, a vector ν of length p, and a scalar λ, the corresponding eigenvalue. Symmetric, positive semidefinite matrices like covariance matrices or R_C always have p (not necessarily distinct) non-negative eigenvalues. Most importantly, the j-th largest eigenvalue of R_C corresponds to the variance explained by the j-th factor in a common factor model (see the Appendix for a more technical explanation of this fact).

Principal component analysis (PCA) is also often used as a substitute for EFA based on the common factor model. However, PCA is primarily a data reduction technique that does not differentiate between common and unique variance. If the goal of the analysis is to uncover a latent structure that addresses the covariances among observed variables measured with some random error, which is the more realistic case in psychological research, EFA is usually preferred (e.g., Bentler & Kano, 1990; de Winter & Dodou, 2016). The main difference with regard to the eigenvalues is that PCA eigenvalues are calculated based on the correlation matrix R instead of R_C. However, a hypothetical population common factor model fully determines the correlation matrix R (see Equation 7) and, therefore, the associated eigenvalues of R as well (see, e.g., Braeken & van Assen, 2017, p. 465). Furthermore, there is evidence that EFA and PCA often yield comparable results in practice (Stevens, 2009). We return to this issue later in this article in the context of which matrix to choose in PA.

¹ The problem of communalities refers to the difficulty of simultaneously estimating the proportion of variance that can be explained by common factors and the common factor model itself. The common factor model approximates a correlation matrix with communalities on the diagonal, but the communalities are only known after the model is estimated (see, e.g., Harman, 1976).

General Issues in Factor Extraction

Regardless of the particular criterion employed to determine the number of retained factors, conditions can be identified that will simplify or complicate recovery of the correct number of factors. Most importantly, the correct number of factors is typically harder to detect when factor saturation is low (due to, e.g., low factor loadings, a small number of items per factor, or high factor intercorrelations), because the eigenvalues associated with true factors are numerically closer to the remaining eigenvalues (Braeken & van Assen, 2017). Given that the eigenvalues can be immediately derived from a population common factor model (see Equations 7 and 8), we can
analytically derive the effects of different factor models on the expected eigenvalues.² For a hypothetical population common factor model with m factors, p items per factor, standardized factor loadings l, and factor intercorrelation r, the eigenvalues can be derived from the model-implied correlation matrix R as

λ_1 = 1 + (p − 1)l² + (m − 1)prl²
λ_2 = … = λ_m = 1 + (p − 1)l² − prl²  (10)
λ_m+1 = … = λ_mp = 1 − l²

(Braeken & van Assen, 2017). Similarly, we can derive the eigenvalues of R_C as

λ_1 = pl² + (m − 1)prl²
λ_2 = … = λ_m = pl² − prl²  (11)
λ_m+1 = … = λ_mp = 0

Figure 2 illustrates the effects of factor loadings, number of items per factor, and factor intercorrelation on the eigenvalues of R and R_C assuming a factor model with three common factors. As can be seen, eigenvalues of true factors will generally increase with the loading magnitude and the number of items per factor because both (p − 1)l² (Equation 10) and pl² (Equation 11) will be larger. In contrast, high factor intercorrelations increase the term prl², implying that the first eigenvalue will increase with r, whereas the remaining eigenvalues associated with true factors decrease with r. Except for the first factor, highly correlated factors therefore present a difficult condition for factor extraction.

Eigenvalues estimated from a sample will deviate from the population eigenvalues of Equations 10 and 11. Braeken and van Assen (2017) demonstrated that sample dispersions tend to affect eigenvalues of various examples in three relevant respects. First, the eigenvalues λ_1, …, λ_⌊m/2⌋ (the first half of eigenvalues associated with true factors) tend to increase because the extracted factors capitalize on chance correlations in the sample correlation matrix. Second, λ_⌈m/2⌉, …, λ_m (the second half of eigenvalues associated with true factors) tend to decrease, because a larger portion of variance has already been explained by the first factors. Third, the first half of remaining eigenvalues (λ_m+1, …, λ_⌊m+(p−m)/2⌋) again tend to increase. Similar to the effect for λ_1, …, λ_⌊m/2⌋, additional factors capitalize on remaining chance correlations that were not addressed by the previous factors. Overall, this deviation leads to a more ambiguous pattern of eigenvalues, because the difference between λ_m and λ_m+1 decreases.

In sum, the above illustrates that any extraction criterion that explicitly or implicitly relies on the sample eigenvalues will perform well when there are many, high-loading indicators and when factor correlations are weak. In contrast, low loadings, few indicators per factor, and strong factor correlations pose serious challenges, especially when the sample size is small.

Methods to Decide the Number of Retained Factors

Kaiser-Guttman Criterion

One of the most prominent heuristics to determine the number of factors to retain is the KGC (Guttman, 1954; Kaiser, 1960), which extracts all factors with corresponding sample eigenvalues greater than 1. The rationale behind this rule is that a factor should at least explain as much variance as a single item. However, because sampling error leads to eigenvalues that exceed 1 even in the absence of any factor, the KGC severely overestimates the number of factors (e.g., Hakstian, Rogers, & Cattell, 1982; Lance, Butts, & Michels, 2006; Zwick & Velicer, 1986). Despite this substantial bias, the KGC is commonly used (Henson & Roberts, 2006) and is the default in several statistics programs such as SPSS (IBM Corp, 2015).

Scree Test

Cattell's (1966) scree test is a graphical method based on the plot of the successive eigenvalues in descending order (the so-called scree plot). The test is performed by searching for an elbow, a point at which the eigenvalues decrease abruptly. The method suggests extracting all factors up to the factor corresponding to the eigenvalue preceding the sharpest decline. Being a graphical approach, the method is obviously subjective and therefore rarely evaluated systematically. Furthermore, scree plots can be ambiguous, either lacking any clear elbow or showing multiple elbows in the same scree plot (Ruscio & Roche, 2012). Raîche, Riopel, and Blais (2006) suggested nongraphical solutions for Cattell's scree test that rely on the change in slope of adjacent eigenvalues. Both methods clearly outperformed the Kaiser criterion, but tended to underestimate the number of factors and were inferior to other approaches, such as PA (Raîche, Walls, Magis, Riopel, & Blais, 2013; Ruscio & Roche, 2012).

Traditional and Revised Parallel Analysis

PA (Horn, 1965) compares the empirical eigenvalues with the mean of eigenvalues obtained from random samples based on uncorrelated variables. The random samples have the same number of observations and variables as the empirical data, so the eigenvalues of the random samples take sampling error into account. PA extracts all factors with eigenvalues that exceed the average corresponding eigenvalue of the random samples (see Figure 3 for an example).

The eigenvalues in PA are typically based on the correlation matrix R of observed and random samples (PA_PCA; e.g., Finch & West, 1997; Steger, 2006), similar to a PCA. As we discussed above, a common factor model fully determines both the eigenvalues of R and R_C, so PA can also be based on the correlation matrix R_C with communalities on the diagonal, reflecting the EFA eigenvalues (PA_EFA; Humphreys & Ilgen, 1969). There has been some controversy as to which eigenvalues are appropriate for PA (Mulaik, 2010). Garrido, Abad, and Ponsoda (2013) argued that the common factor model is inappropriate for PA because the
Figure 2. Population eigenvalues of R (PCA) and R_C (EFA). The first panel displays eigenvalues for a common factor model with three orthogonal factors, four items per factor, and standardized loadings l = .6. Compared with the first panel, the second panel shows the effect of increased loadings (l = .8), the third panel displays increased items per factor (p = 6), and the fourth panel illustrates three correlated factors (r = .5).
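The closed-form population eigenvalues of Equation 10 can be checked numerically. The sketch below uses parameter values matching the fourth panel of Figure 2 (three factors, four items each, l = .6, r = .5), builds the model-implied R, and compares its spectrum with the closed-form expressions:

```python
import numpy as np

m, p, l, r = 3, 4, 0.6, 0.5   # factors, items per factor, loading, intercorrelation

# Model-implied correlation matrix R = Lambda Phi Lambda^T + Delta
Lam = np.kron(np.eye(m), np.full((p, 1), l))   # simple-structure loadings
Phi = np.full((m, m), r)
np.fill_diagonal(Phi, 1.0)
R = Lam @ Phi @ Lam.T
np.fill_diagonal(R, 1.0)                        # Delta fills the diagonal to 1

eig = np.sort(np.linalg.eigvalsh(R))[::-1]

# Closed-form eigenvalues from Equation 10 (Braeken & van Assen, 2017)
lam_first = 1 + (p - 1) * l**2 + (m - 1) * p * r * l**2
lam_other = 1 + (p - 1) * l**2 - p * r * l**2
lam_rest  = 1 - l**2

print(eig[:4])  # lam_first once, lam_other (m - 1) times, then lam_rest
```

The first eigenvalue grows with the intercorrelation r while the remaining true-factor eigenvalues shrink, which is exactly the pattern the text describes for correlated factors.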
random samples have uncorrelated variables with EFA communalities of h² = 0 in the population, whereas the common factor model assumes a common cause behind the observed variables. PCA, on the other hand, does not account for unique variance and might overestimate the explained variance of common factors. However, this overestimation is similar for empirical and random sample eigenvalues, resulting in no meaningful bias once these eigenvalues are compared with each other (Garrido et al., 2013). Furthermore, the performance of PA_EFA is also affected by the method of estimating the communalities. Crawford et al. (2010) found a higher hit rate for PA_PCA, unless factors were moderately or highly correlated, compared with a PA where communalities are estimated as the sample multiple R² between the variables and all remaining variables. Based on existing evidence (Garrido et al., 2013), PA_PCA seems to produce better results than PA_EFA.

PA is supported by strong evidence from simulation studies (Hubbard & Allen, 1987; Humphreys & Montanelli, 1975; Peres-Neto et al., 2005; Velicer et al., 2000; Zwick & Velicer, 1986) and is generally considered to be the method of choice (e.g., Hayton et al., 2004; Schmitt, 2011). However, there are two weaknesses associated with PA, initially suggested by Horn (1965). The first stems from the fact that sampling error can lead to eigenvalues above the average eigenvalue of random samples. For example, if all manifest variables are uncorrelated in the population (such that there is no common factor), the first empirical eigenvalue would exceed the first average eigenvalue from random samples in approximately 50% of all samples, which would lead to overestimation of the number of factors for PA. One possible solution is to use the 95th percentile of the eigenvalues obtained from random samples as a threshold instead of the mean (Glorfeld, 1995).

The second weakness of PA involves the choice of the reference eigenvalues for the second and following factors (Turner, 1998). Assume that the empirical data set has a single underlying factor that explains a large portion of the item covariances. Any remaining factor can only explain a fraction of the yet unexplained covariances. However, the items in the random samples that con-
Figure 3. Parallel analysis on a simulated sample with N = 100, 40 manifest variables, and five underlying factors. The filled dots represent the sorted eigenvalues of the sample correlation matrix. The empty dots represent the average eigenvalues of correlation matrices from 100 independent random samples. The solid line depicts the threshold for the Kaiser-Guttman Criterion. Parallel analysis correctly identifies the number of factors as five, while the scree test suggests either one or three. The Kaiser-Guttman Criterion suggests 14 factors and thus overestimates the number of factors severely.
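The procedure illustrated in Figure 3 can be sketched in a few lines. This is a minimal PA_PCA variant with mean reference eigenvalues; the simulated five-factor model below is an arbitrary example (orthogonal factors, loadings .6), not the exact data behind Figure 3. The Kaiser-Guttman count is shown for comparison:

```python
import numpy as np

rng = np.random.default_rng(7)

def parallel_analysis(X, n_random=100, percentile=None):
    """Horn's PA: retain factors whose sample eigenvalue exceeds the mean
    (or a percentile) of eigenvalues from uncorrelated random data."""
    n, p = X.shape
    emp = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    sims = np.empty((n_random, p))
    for s in range(n_random):
        Z = rng.normal(size=(n, p))  # same n and p as the empirical data
        sims[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    ref = np.mean(sims, axis=0) if percentile is None \
        else np.percentile(sims, percentile, axis=0)
    # count leading eigenvalues above their reference (stop at first failure)
    k = 0
    while k < p and emp[k] > ref[k]:
        k += 1
    return k, emp

# Five orthogonal factors, eight items per factor, loadings .6, N = 100
m, p_per = 5, 8
L = np.kron(np.eye(m), np.full((p_per, 1), .6))
eta = rng.normal(size=(100, m))
eps = rng.normal(scale=np.sqrt(1 - .36), size=(100, m * p_per))
X = eta @ L.T + eps

k_pa, emp = parallel_analysis(X)
k_kgc = int(np.sum(emp > 1))   # Kaiser-Guttman: all eigenvalues above 1
print(k_pa, k_kgc)             # KGC typically retains many more factors
```

Passing `percentile=95` yields Glorfeld's (1995) stricter variant mentioned in the text.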
ated by differing portions of unexplained covariance, might leave a second factor undetected and underestimate the number of factors. These two weaknesses can counteract each other, so only correcting for one weakness can lead to lower accuracy in some conditions. For example, Cho, Li, and Bandalos (2009) showed that PA was more accurate if the average eigenvalues were used as a criterion compared to using the 95th percentile. However, there is no guarantee that these deficiencies have effects to the same extent but in opposing directions, so traditional PA is typically biased because one weakness is more significant than the other.

RMSE = √( Σ_{i=1}^{p} (λ_emp,i − λ_sim,i)² ⁄ p )  (12)

for p items. This step results in two RMSEs, one for each number of underlying factors.

3. Repeat Steps 1 and 2, for example, 500 times.

4. Assess if the RMSE is significantly lower in the condition with two factors via a (one-sided) Wilcoxon's test with α = .30.

5. If the difference in RMSEs is not significant, CD suggests
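A stripped-down version of the comparison-data logic in the steps above can be sketched as follows. This is a simplification under stated assumptions: the comparison data here come from a plain simple-structure factor model with a fixed loading, whereas Ruscio and Roche's (2012) generator reproduces the empirical data far more faithfully, and a rank-sum test stands in for the Wilcoxon test mentioned in the text:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(3)

def eigs(X):
    return np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

def simulate(n, m, p_per, loading):
    """Simple-structure factor data (a crude stand-in for Ruscio & Roche's
    comparison-data generator)."""
    L = np.kron(np.eye(m), np.full((p_per, 1), loading))
    eta = rng.normal(size=(n, m))
    eps = rng.normal(scale=np.sqrt(1 - loading**2), size=(n, m * p_per))
    return eta @ L.T + eps

# "Empirical" sample: two factors, six items each, loadings .7
X = simulate(500, 2, 6, .7)
emp = eigs(X)
n, p = X.shape

def rmse_dist(m_candidate, reps=100):
    """RMSE between empirical and comparison-data eigenvalues (Equation 12),
    over repeated comparison samples with m_candidate factors."""
    out = np.empty(reps)
    for s in range(reps):
        sim = eigs(simulate(n, m_candidate, p // m_candidate, .7))
        out[s] = np.sqrt(np.mean((emp - sim) ** 2))
    return out

r1, r2 = rmse_dist(1), rmse_dist(2)
# Move to the larger number only if its RMSEs are significantly lower (alpha = .30)
pval = ranksums(r2, r1, alternative='less').pvalue
suggest = 2 if pval < .30 else 1
print(suggest)
```

Because the one-factor comparison data produce a grossly different eigenvalue profile, the two-factor RMSEs are clearly lower and the procedure moves on to the two-factor solution.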
instead of using the eigenvalues relative to the number of factors, the Hull method relies on goodness-of-fit indices relative to the model degrees of freedom of the proposed model. More specifically, the method finds the number of factors in four steps:

1. The method calculates a goodness-of-fit index GOF_j and model degrees of freedom df_j of various models with an increasing number of factors j up to a prespecified maximum J (0 ≤ j ≤ J). Figure 4 depicts the comparative fit index (CFI; Bentler, 1990) for solutions with zero to seven factors, corresponding model degrees of freedom, and a simulated sample with five underlying factors.

2. A solution s_j is considered to be unviable if a less complex model (indicating a lower number of factors) with a higher (better) fit index exists. The j-th solution is thus unviable if there is a solution s_j′ with j′ < j and GOF_j′ > GOF_j. In Figure 4, no solution is excluded at this point.

3. The remaining solutions are further identified as unviable if GOF_j is below the line connecting adjacent viable solutions in a plot of fit indices and model degrees of freedom. By this rule, Solutions 2 and 4 are excluded in Figure 4. This step is repeated until no remaining solutions can be identified as unviable.

4. The Hull method then suggests the number of factors j for which

st_j = ((GOF_j − GOF_j−1) ⁄ (df_j − df_j−1)) ⁄ ((GOF_j+1 − GOF_j) ⁄ (df_j+1 − df_j))  (13)

obtains its maximum and j is a viable solution.³ The Hull method correctly identifies the number of factors as five in the example in Figure 4.

The elbow is identified as the value where, relative to the change in the model df, model fit increases considerably compared to a lower number of factors (j − 1) but is only barely lower than the model fit associated with a higher number of factors. This criterion value is based on every viable fit value relative to both its preceding and subsequent fit values. Note that the suggested factor solution therefore cannot be the first or last factor in the range for which the model fit is estimated (unless all other solutions are unviable). This range typically includes a zero-factor model as a minimum. In order to avoid overextraction, the maximum number of factors is typically set to the number of factors extracted based on traditional PA with the 95th percentile as a criterion, plus one (Lorenzo-Seva et al., 2011). If the maximum is 1 (e.g., in the case of a zero-factor model), the Hull method cannot be applied and implicitly relies on traditional PA to identify the correct number of factors.

Figure 4. (CFI plotted against model degrees of freedom for solutions with zero to seven factors; see Step 1.)

Lorenzo-Seva et al. (2011) compared the Hull method with various goodness-of-fit indices to other selection criteria. The design of the simulation study incorporated both major and minor factors, where major factors constituted the factors of interest. Minor factors were associated with (random) loadings that accounted for 15% of the variance on average. While no method consistently outperformed the other approaches, the Hull method based on the CFI was superior to other methods, including traditional PA, in conditions where the number of observed variables or the sample size was large. However, the method has not yet been

with uncorrelated factors, but may fall short when factors are highly correlated or when some factors only account for a small
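The four Hull steps can be sketched in code. The CFI and df values below are made-up numbers shaped like the Figure 4 example (five underlying factors), not the article's actual values:

```python
import numpy as np

def hull_select(gof, df):
    """Hull method selection given goodness-of-fit values and model degrees
    of freedom for solutions with 0..J factors (sketch of Steps 2-4)."""
    keep = list(range(len(gof)))

    # Step 2: drop solutions for which a simpler model fits strictly better
    keep = [j for j in keep
            if not any(gof[i] > gof[j] for i in keep if i < j)]

    # Step 3: drop solutions below the line joining their viable neighbours
    # in the df-vs-fit plot, repeating until none remain
    changed = True
    while changed:
        changed = False
        for k in range(1, len(keep) - 1):
            i, j, l = keep[k - 1], keep[k], keep[k + 1]
            interp = gof[i] + (gof[l] - gof[i]) * (df[j] - df[i]) / (df[l] - df[i])
            if gof[j] < interp:
                del keep[k]
                changed = True
                break

    # Step 4: scree ratio st_j (Equation 13) over interior viable solutions
    best, best_st = None, -np.inf
    for k in range(1, len(keep) - 1):
        i, j, l = keep[k - 1], keep[k], keep[k + 1]
        st = ((gof[j] - gof[i]) / (df[j] - df[i])) / \
             ((gof[l] - gof[j]) / (df[l] - df[j]))
        if st > best_st:
            best, best_st = j, st
    return best

# Hypothetical CFI values resembling Figure 4 (elbow at five factors);
# df decreases as factors are added
cfi = [0.00, 0.55, 0.70, 0.85, 0.90, 0.99, 0.992, 0.993]
dof = [28, 21, 15, 10, 6, 3, 1, 0]
print(hull_select(cfi, dof))
```

Note that because df decreases with each added factor, the sign of the df differences cancels between numerator and denominator of the scree ratio, so the ratio behaves as described in the text.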
λ_1,ref = (1 + √(p/N))²,  (14)

for N observations and p items. Subsequent eigenvalues are corrected by the explained variance, expressed as the eigenvalues of previous factors. The j-th reference eigenvalue is

Braeken and van Assen (2017) derived theoretical conditions for scale reliability, number of observations, number of factors, and factor correlation under which the EKC is expected to correctly identify the number of factors. For example, for orthogonal factors, EKC is predicted to work if⁴

The fit of common factor models is often assessed with the likelihood ratio test statistic (Lawley, 1940) using maximum likelihood estimation (ML), which tests whether the model-implied covariance matrix is equal to the population covariance matrix. The associated test statistic asymptotically follows a χ² distribution if the observed variables follow a multivariate normal distribution and other assumptions are met (e.g., Bollen, 1989). This test can be sequentially applied to factor models with increasing numbers of factors, starting with a zero-factor model. If the χ² test statistic is statistically significant (with, e.g., p < .05), a model with one additional factor, in this case a unidimensional factor model, is estimated and tested. The procedure continues until a nonsignificant result is obtained, at which point the number of common factors is identified.

Simulation studies investigating the performance of sequential χ² model tests (SMT) as an extraction criterion have shown

assumptions if the number of factors for the test exceeds the true number of factors. For example, if a test of three factors is applied to samples from a population with two underlying factors, the likelihood ratio test statistic will no longer follow a χ² distribution. Note that the tests are applied sequentially, so a three-factor test is

We considered 11 methods for determining the number of retained factors.

Kaiser-Guttman Criterion (KGC). The KGC has been implemented using the eigenvalues of either the input correlation matrix (KGC_PCA) or the correlation matrix with communalities on the diagonal (KGC_EFA).

⁴ Note that assumptions about the underlying factor structure, reliabilities, and factor correlation could be used for a power analysis for EFA to determine a sample size under which EKC is expected to work.
superior to every other implementation in Lorenzo-Seva et al. (2011).

Comparison data (CD). CD (Ruscio & Roche, 2012) was implemented using an alpha level of .30 and 500 resamples, in line with the recommendations of Ruscio and Roche (2012).

Empirical Kaiser Criterion (EKC). The EKC (Braeken & van Assen, 2017) was implemented using the eigenvalues of the input correlation matrix.

Sequential χ² model tests (SMT). We implemented SMT based on the hypothesis of perfect fit with α = .05 and ML estimation.

Experimental Conditions

We attempted to cover a wide range of data conditions plausibly occurring in empirical factor analysis studies.

Number of observations. The number of observations was set to 100, 200, 500, or 1,000, thereby covering the sample sizes used in most empirical studies (Fabrigar et al., 1999; Jackson, Gillaspy, & Purc-Stephenson, 2009; Worthington & Whittaker, 2006).

Number of latent factors. Manifest variables were generated with one, three, or five underlying factors, representing the dimensionality of scales most common in psychometric measurement (DiStefano & Hess, 2005; Jackson et al., 2009).

Factor intercorrelation. The intercorrelation among latent factors was set to 0, .25, .50, or .75, covering the range from independent to highly correlated scales.

Indicators per latent factor. We examined four, eight, or 12 indicators per latent factor. While the majority of scales in psychological assessment comprise four to eight indicators (DiStefano & Hess, 2005; Fabrigar et al., 1999; Jackson et al., 2009), factor extraction criteria are especially important in the initial development of a measurement instrument. The process of constructing a scale typically involves the elimination of indicators, so a condition involving 12 indicators per factor was realized to represent a scale before the elimination process.

Loading magnitude. The standardized loadings of the observed variables on the latent factors were set to either (.65, .55, .45, .35) or (.8, .7, .6, .5) for each set of four variables (i.e., every loading was assigned three times when a factor was measured by 12 indicators). The resulting average loadings were .50 or .65, which is typical for psychological research (DiStefano & Hess, 2005). The implied McDonald's reliability coefficients (McDonald, 1999) are presented in Table 1. Whereas Fabrigar, Wegener, MacCallum, and Strahan (1999) reported slightly higher reliabili-

Presence of cross-loadings. Cross-loadings also often occur in empirical data sets (e.g., DiStefano & Hess, 2005) and were simulated with two levels in this simulation: present and absent. In the condition without cross-loadings, the loading pattern matrix only contained the primary loadings (as described above). The condition with cross-loadings included additional standardized loadings of l_1 = .2 and l_2 = −.2 for the second and fourth indicator out of each set of four indicators. For example, if a factor was based on 12 indicators, the second, sixth, and 10th indicator had cross-loadings of l_1 = .2, whereas the fourth, eighth, and 12th indicator had cross-loadings of l_2 = −.2. The cross-loading l_1 was on the first succeeding factor and l_2 was on the second succeeding factor. Note that the effect of cross-loadings potentially depends on both the number and the magnitude of the primary loadings, so that the chosen approach to use a fixed magnitude for the secondary loadings might have a stronger effect when the primary loadings are low. However, the inclusion of this condition does allow us to identify whether cross-loadings have any effect on the accuracy of factor extraction methods at all.

Presence of minor factors. We included conditions with only major factors, as described above, and two conditions with a single additional minor factor. Here, we define latent factors to be minor when they represent systematic variance that is irrelevant with respect to the factors of theoretical interest. For example, this could include minor correlations among indicators due to phrasing different items in the same direction. Specifically, we defined minor factors as factors with uniformly distributed standardized loadings on every indicator. The loadings were within the range (−.1, .1) in the condition with weak minor factors and between (−.11, −.09) or (.09, .11) for moderate minor factors (Lorenzo-Seva et al., 2011). The advantage of this conceptualization is that the explained variance of minor factors is on average equal across conditions at 0.33% or 1%. Furthermore, the random loading pattern and low explained variance ensure that such factors are not meaningful and should not be extracted when performing EFA with empirical data.

Note that the present simulation study defined minor factors in a way that they account for only a small or moderate amount of the variance. We pursued this particular approach to ensure that the additional source of systematic variance would unambiguously be considered as irrelevant in empirical practice. For example, one condition in the simulation by Green et al. (2012) realized a common factor model with two factors, loadings of l = .40 each, the same number of indicators per factor, and a factor correlation of φ = .80. In this condition, the (unrotated) second factor explains 1.6% of the common variance. By comparison, the minor factors that methods were supposed to ignore in the study conducted

was assigned two or three times (as was done for loading magni-
by Lorenzo-Seva et al. (2011) on average accounted for 15% of tudes, see above). The applied set of quantile mixtures resulted in
the common variance. Clearly, we cannot expect one statistical non-normal distributions exhibiting an average skewness of ␥3,F1 ⫽
method to differentiate between these conditions and therefore 0 共SD␥3,F1 ⫽ 0.27兲, ␥3,F2 ⫽ 0.69 共SD␥3,F2 ⫽ 0.10兲, ␥3,F3 ⫽ 0
defined minor factors as rather small sources of systematic vari- 共SD␥3,F3 ⫽ 0.49兲, ␥3, F4 ⫽ ⫺1.25 共SD␥3,F4 ⫽ 0.19兲 in both non-
ance. normality conditions. Kurtosis was on average ␥4 ⫽ 12 across all
Multivariate distribution. Three types of distributions were quantile mixtures and non-normality conditions 共SD␥4, f1 ⫽ 10.95,
used (normal, non-normal based on non-normal errors, non-normal SD␥4, f2 ⫽ 3.70, SD␥4, f3 ⫽ 3.78, SD␥4, f4 ⫽ 10.72兲. The realized levels
based on non-normal latent factors). The two types of non-normal of skewness and kurtosis are well within the boundaries commonly
distributions were included because recent evidence suggests that occurring in psychological assessment (Blanca, Arnau, López-
the performance of factor-based models may vary depending on Montiel, Bono, & Bendayan, 2013; Cain et al., 2017; Micceri, 1989).
whether the non-normality in the observed variables arises from
non-normal correlated variables (such as factors) or from non-
Data Generation and Analysis
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Moshagen, 2015; Foldnes & Grønneberg, 2015; Mair, Satorra, & In total, the design involved 4 (number of observations) ⫻ 3
Bentler, 2012). Normally distributed data were generated using (number of latent factors) ⫻ 4 (factor correlation) ⫻ 3 (number of
Cholesky decomposition. Non-normal random variables Xi were indicators per factor) ⫻ 2 (loading magnitude) ⫻ 2 (cross-load-
generated as the sum of two (standardized) random variables Li, Ei, ings) ⫻ 3 (minor factors) ⫻ 3 (underlying distribution) ⫽ 5,184
and a scalar c, with: conditions. For every condition, 500 independent random samples
were generated, leading to a total of 2,592,000 data sets. The data
Xi ⫽ cLi ⫹ Ei, 1 ⱕ i ⱕ p, (17)
sets were analyzed by all 11 extraction methods under scrutiny.
so that the resulting correlation matrix of Xi is equal to the model Analyses were performed in the statistical computing language
implied correlation matrix on the population level. The random R (R Core Team, 2016). All EFA methods used maximum likeli-
variables L1, . . . , Lp are correlated, whereas E1, . . . , Ep are hood estimation based on the package psych (Revelle, 2015). For
independent and thus uncorrelated. Furthermore, all Li are required the Hull method, we calculated the CFI based on the 2 provided
to be independent from all Ei (1 ⱕ i ⱕ p). by the psych package. We used R code provided by Ruscio and
In the condition with non-normal errors, the independent ran- Roche (2012) for the CD approach and custom implementations of
dom variables Ei are non-normal, Li ⬃ N共0, 1兲, and c ⫽ 2, whereas the KGC, traditional PA, and the EKC.5
the condition with non-normal latent factors incorporated non- We recorded the suggested number of factors for each simulated
normal Li, Ei ⬃ N共0, 1兲, and c ⫽ 2.5. The non-normal Ei and Li data set and each method. Combining this information with the
were in turn generated using the NORTA approach (Cario & population values defining each data set, we determined the bias
Nelson, 1997). As inverse cumulative distribution functions F⫺1 toward over- or underextraction for each data set. Bias was defined
for NORTA, we estimated quantile mixture distributions (Auer- as the number of suggested factors minus the actual number of
swald, 2017) with weights ai, 0 ⱕ ai ⱕ 1, 1 ⱕ i ⱕ 4, for each set factors in the population. Thus, negative values indicate underex-
of four indicators: traction, positive values indicate overextraction, and zero indicates
no bias.
• F⫺1 ⫺1 ⫺1
1 ⫽ a1FX5⫹X3 ⫹ 共1 ⫺ a1兲FN共0,1兲, where X ⬃ N共0, 1兲
• F⫺1
2 ⫽ a F ⫺1
2 Lognormal共0,1兲 ⫹ 共1 ⫺ ⫺1
a2兲FN共0,1兲,
⫺1 ⫺1 ⫺1
• F3 ⫽ a3FX ⫹ 共1 ⫺ a3兲FN共0,1兲, where X has a discrete Results
probability distribution with probability mass function fX
and Due to the complexity of our design, we evaluate the perfor-
mance of extraction criteria separately for designs with (a) only
⫺10 with probability p ⫽ .01
冦
one underlying factor, (b) multiple orthogonal factors, (c) multiple
⫺0.1 with probability p ⫽ .49 correlated factors, and (d) factor models with minor factors. The
f X(x) ⫽
0.1 with probability p ⫽ .49 latter is considered separately because the identification of a minor
10 with probability p ⫽ .01 factor would not necessarily be an argument against the theoretical
validity of a method. In each section, we emphasize three main
• F⫺1 ⫺1 ⫺1
4 ⫽ a4FY ⫹ 共1 ⫺ a4兲FN共0,1兲,
results: First, we report the accuracy and bias across all extraction
where Y is
criteria to determine which conditions are more challenging for
Y⫽ 再 X4 if X ⱖ 0
兹X if X ⬍ 0
EFA in general. Second, we determine which criteria perform
better than others under different conditions. Third, we evaluate
the change in accuracy for each extraction criterion as both N and
and X ⬃ N共0, 1兲. factor determinacy (i.e., loading magnitude and number of indica-
We chose quantile mixture distributions that allow the estima- tors per factor) increases. When reporting on average accuracies
tion of the weights ai so that the univariate kurtosis was 12 for each and biases, we use abbreviations for the different data conditions
resulting indicator. Using different quantile mixture distributions as indicated in Table 2.
ensures that the resulting marginal distributions are different for
each indicator Xi while still exhibiting the same kurtosis. When a 5
The syntax is available under https://osf.io/gqma2/?view_only⫽
factor was based on eight or 12 indicators, each quantile mixture d03efba1fd0f4c849a87db82e6705668.
Table 2
Abbreviations for Different Data Conditions
In each section, we report estimates of logistic regressions predicting whether each method suggested the correct number of factors (defined as accuracy) to quantify the effects. In these logistic regressions, all applicable conditions were effect-coded, with PAPCA-M, N = 500, three latent variables, orthogonal factors, eight indicators per factor, average loadings of .5, no cross-loadings, no minor factors, and normal distribution as the reference categories. Thus, an odds ratio (OR = e^β) of 2 would indicate that the odds of identifying the correct number of factors in this specific condition are twice as high as at the grand mean, all else being equal. In each case, the logistic regression included main effects and all possible interactions. Due to the large number of conditions (and the large number of regression terms), we applied the logistic regression to the average accuracy of each cell in our design and calculated the regression coefficients directly. Similarly, we computed a linear regression model predicting the extraction biases from the average of each cell. The linear model incorporated the same predictors as the logistic model with the same reference conditions, but used the extraction bias as the criterion.

Finally, we assessed the performance of combination rules, as we assumed that no one method would outperform every other method in all conditions. The combination rules consisted of pairs of extraction criteria, for which we calculated the degree to which the methods suggested the same number of factors and the accuracy
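The cell-level logistic regression can be illustrated with a toy two-level condition. This is a hedged sketch, not the authors' code, and the two accuracies are made up: with effect coding, a saturated model's coefficients are simple functions of the cell logits, and exponentiating a coefficient yields the reported odds ratio.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

# Hypothetical cell accuracies for one effect-coded, two-level condition
# (e.g., small vs. large N), averaged over all other design cells.
acc_low, acc_high = 0.90, 0.92

# With effect coding (-1 / +1), a saturated logistic model has
# intercept = mean of the cell logits and slope = half their difference.
b0 = (logit(acc_high) + logit(acc_low)) / 2
b1 = (logit(acc_high) - logit(acc_low)) / 2
odds_ratio = math.exp(b1)  # OR > 1: higher odds of a hit than at the grand mean
```

Plugging the coefficients back through the inverse logit reproduces the cell accuracies exactly, which is why the coefficients can be "calculated directly" from cell means.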
Figure 5. Accuracy of factor extraction criteria for unidimensional factor models depending on the number of indicators per factor and sample size. PAPCA-M = traditional parallel analysis based on the average of PCA eigenvalues; PAPCA-95 = traditional parallel analysis based on the 95th quantile of PCA eigenvalues; KGCPCA = Kaiser-Guttman Criterion based on PCA eigenvalues; PAEFA-M = traditional parallel analysis based on the average of EFA eigenvalues; PAEFA-95 = traditional parallel analysis based on the 95th quantile of EFA eigenvalues; KGCEFA = Kaiser-Guttman Criterion based on EFA eigenvalues; PA-R = revised parallel analysis; CD = comparison data; Hull = Hull method; EKC = Empirical Kaiser Criterion; SMT = sequential χ² model tests.
Figure 6. Accuracy of factor extraction criteria for orthogonal factor models depending on number of indicators per factor and sample size. PAPCA-M = traditional parallel analysis based on the average of PCA eigenvalues; PAPCA-95 = traditional parallel analysis based on the 95th quantile of PCA eigenvalues; KGCPCA = Kaiser-Guttman Criterion based on PCA eigenvalues; PAEFA-M = traditional parallel analysis based on the average of EFA eigenvalues; PAEFA-95 = traditional parallel analysis based on the 95th quantile of EFA eigenvalues; KGCEFA = Kaiser-Guttman Criterion based on EFA eigenvalues; PA-R = revised parallel analysis; CD = comparison data; Hull = Hull method; EKC = Empirical Kaiser Criterion; SMT = sequential χ² model tests.
given prior agreement. In doing so, we relied only on parameters known to investigators.

Unidimensional Factor Models

Figure 5 shows the average accuracies for unidimensional factor models. Overall, most methods displayed high accuracies (acc = 91%) and low biases (bias = 0.08). As expected, the performance of most methods increased with factor determinacy (acc_{#x=12} = 94%, OR_{#x=12} = 1.37; acc_{λ=.65} = 95%, OR_{λ=.65} = 1.64). In contrast, accuracies only marginally increased with sample size (acc_{N=1,000} = 92%, OR_{N=1,000} = 1.08; acc_{N=100} = 90%). Non-normal distributions did not negatively affect factor extraction criteria in general (acc_{Lat-NN} = 90%, OR_{Lat-NN} = 0.87; acc_{Err-NN} = 92%, OR_{Err-NN} = 1.07).

Four methods displayed very high accuracy for unidimensional factor models: PAPCA-M (acc_{PAPCA-M} = 100%), PAPCA-95 (acc_{PAPCA-95} = 100%, OR_{PAPCA-95} = 45.36), Hull (acc_{Hull} = 100%, OR_{Hull} = 55.03), and the EKC (acc_{EKC} = 100%, OR_{EKC} = 28.16). These results are in line with the theoretical expectations developed for the EKC, which predict the EKC to work in all unidimensional factor conditions in this simulation. SMT correctly identified the number of factors in 93% of unidimensional factor models, comparable with the average performance of the extraction criteria under scrutiny (OR_{χ²} = 0.29), and showed a slight tendency to overextract (bias_{χ²} = 0.04). Consistent with previous results, SMT was less accurate if data were based on non-normal latent factors (acc_{Lat-NN,χ²} = 90%, OR_{Lat-NN,χ²} = 0.74) as compared with non-normal errors (acc_{Err-NN,χ²} = 95%, OR_{Err-NN,χ²} = 1.15) or normal distributions (acc_{Normal,χ²} = 95%). As was to be expected, KGCPCA displayed low accuracies (acc_{KGCPCA} = 79%, OR_{KGCPCA} = 0.08) and consistently overestimated the number of factors (bias_{KGCPCA} = 0.29, β_{KGCPCA} = 0.21). In contrast, KGCEFA performed comparatively well (acc_{KGCEFA} = 94%, OR_{KGCEFA} = 0.30), but displayed a slight tendency to underextract
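For readers unfamiliar with the reference method family, traditional PA based on PCA eigenvalues (PAPCA-M and PAPCA-95) can be sketched as follows. This is an illustrative Python implementation under common defaults, not the custom R implementation used in the study.

```python
import numpy as np

def parallel_analysis_pca(X, n_sim=100, quantile=None, seed=0):
    """Traditional PA with PCA eigenvalues: retain components whose sample
    eigenvalue exceeds the mean (PA-M) or a quantile (e.g., .95 for PA-95)
    of eigenvalues from random normal data of the same dimensions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    sims = np.empty((n_sim, p))
    for s in range(n_sim):
        R = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
        sims[s] = np.sort(np.linalg.eigvalsh(R))[::-1]
    ref = sims.mean(axis=0) if quantile is None else np.quantile(sims, quantile, axis=0)
    exceeds = obs > ref
    # Count leading eigenvalues above the reference (stop at the first failure).
    return p if exceeds.all() else int(np.argmin(exceeds))
```

For one-factor data with reasonably high loadings, both the mean and the 95th-quantile reference typically suggest a single factor, matching the high unidimensional hit rates reported above.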
Figure 7 (opposite). Accuracy of factor extraction criteria for correlated factor models depending on number of indicators per factor, factor correlation, and sample size. The top panels display the accuracy for low factor correlations (ρ = .25), the middle panels for medium factor correlations (ρ = .50), and the bottom panels for high factor correlations (ρ = .75). PAPCA-M = traditional parallel analysis based on the average of PCA eigenvalues; PAPCA-95 = traditional parallel analysis based on the 95th quantile of PCA eigenvalues; KGCPCA = Kaiser-Guttman Criterion based on PCA eigenvalues; PAEFA-M = traditional parallel analysis based on the average of EFA eigenvalues; PAEFA-95 = traditional parallel analysis based on the 95th quantile of EFA eigenvalues; KGCEFA = Kaiser-Guttman Criterion based on EFA eigenvalues; PA-R = revised parallel analysis; CD = comparison data; Hull = Hull method; EKC = Empirical Kaiser Criterion; SMT = sequential χ² model tests.
tract 共bias #xⱖ8,PAPCA-M ⫽ 99%, acc
very high accuracies 共acc #xⱖ8,PAPCA-95 ⫽
KGCEFA ⫽ ⫺0.03, KGCEFA ⫽ ⫺0.11). All PA methods
based on EFA eigenvalues were inferior to their PCA-based counter- #xⱖ8,Hull ⫽ 100%, acc
100%, acc #xⱖ8,EKC ⫽ 99%兲. PAPCA-M and
PAEFA-M ⫽ 72%, ORPAEFA-M ⫽ 0.05, acc
parts 共acc PAEFA-95 ⫽ PAPCA-95 outperformed all other methods in conditions with four
92%, ORPAEFA-95 ⫽ 0.25兲. In line with our expectations, PA-R #x⫽4,PAPCA-M ⫽ 96%, acc
indicators 共acc #x⫽4,PAPCA-95 ⫽ 95%兲, where
outperformed other PA methods based on EFA eigenvalues Hull and EKC displayed lower hit rates 共acc #x⫽4,Hull ⫽ 90%,
OR#x⫽4,Hull ⫽ 0.26, acc
#x⫽4,EKC ⫽ 87%, OR#x⫽4,EKC ⫽ 0.29兲 and
PA-R ⫽ 94%, ORPA-R ⫽ 0.31兲. However, accuracies were con-
共acc
sistently lower compared with PCA-based PA, especially if the underestimated the number of factors 共bias #x⫽4,Hull ⫽ ⫺0.16,
bias#x⫽4,EKC ⫽ ⫺0.21兲. SMT displayed high accuracies in condi-
number of indicators per factor was small 共acc #x⫽4,PA-R ⫽
tions with larger sample sizes and at least eight indicators per
81%, OR#x⫽4,PA-R ⫽ 0.04兲. Whereas PAEFA-95 slightly underes-
factor 共acc
Nⱖ200,#xⱖ8,SMT ⫽ 92%兲, but exhibited worse results with
timated the number of factors on average 共bias PAEFA-95 ⫽ small sample sizes and short scales 共acc N⫽100,#x⫽4,SMT ⫽ 63%兲
⫺0.05,  PAEFA-95 ⫽ ⫺0.12兲, PAEFA-M regularly overextracted
where the number of factors was underestimated 共bias N⫽100,#x⫽4,SMT ⫽
共bias PAEFA-M ⫽ 0.45,  PAEFA-M ⫽ 0.37兲. CD exhibited lower than ⫺0.42兲. KGCPCA exhibited low accuracies and strongly overesti-
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
average overall accuracies 共acc CD ⫽ 81%, ORCD ⫽ 0.09) and mated the number of orthogonal factors 共acc KGCPCA ⫽ 38%,
⫽ 0.23,  ⫽ 0.15兲, especially if the
This document is copyrighted by the American Psychological Association or one of its allied publishers.
Figure 8 (opposite). Accuracy of factor extraction criteria for factor models with or without minor factors. The top panels display the accuracy for no minor factors, the middle panels for weak minor factors, and the bottom panels for moderate minor factors. PAPCA-M = traditional parallel analysis based on the average of PCA eigenvalues; PAPCA-95 = traditional parallel analysis based on the 95th quantile of PCA eigenvalues; KGCPCA = Kaiser-Guttman Criterion based on PCA eigenvalues; PAEFA-M = traditional parallel analysis based on the average of EFA eigenvalues; PAEFA-95 = traditional parallel analysis based on the 95th quantile of EFA eigenvalues; KGCEFA = Kaiser-Guttman Criterion based on EFA eigenvalues; PA-R = revised parallel analysis; CD = comparison data; Hull = Hull method; EKC = Empirical Kaiser Criterion; SMT = sequential χ² model tests.
[...] of cross-loadings (acc_{cross} = 56%, acc_{¬cross} = 55%), whereas the underlying distribution had virtually no effect overall (acc_{Normal} = 57%, acc_{Lat-NN} = 54%, acc_{Err-NN} = 57%).

When factor correlations were low and the sample size was larger, PAPCA-M and PAPCA-95 retrieved the number of factors with very high accuracy (acc_{ρ=.25,N≥200,PAPCA-M} = 99%, acc_{ρ=.25,N≥200,PAPCA-95} = 98%). Although the accuracy of PAPCA-M was lower with fewer indicators per factor and smaller sample sizes (acc_{ρ=.25,N=100,#x=4,PAPCA-M} = 72%), no other method outperformed PAPCA-M under these conditions. In line with our expectations, most of these errors were underextractions (bias_{ρ=.25,N=100,#x=4,PAPCA-M} = −0.27).

In conditions where ρ = .50, no single extraction criterion [...] KGCPCA and KGCEFA, all methods underestimated the number of factors (bias_{ρ=.50,KGCPCA} = 3.14, bias_{ρ=.50,KGCEFA} = 0.21, all other bias_{ρ=.50} < −0.06), again reflecting the effects of lower factor determinacy. With only four indicators per factor, SMT exhibited the best performance of all methods under scrutiny (acc_{ρ=.50,#x=4,SMT} = 70%). For large sample sizes, SMT displayed moderate to high accuracies (acc_{#x=4,N≥500,ρ=.50,SMT} = 92%) and virtually no bias (bias_{#x=4,N≥500,ρ=.50,SMT} = 0.03). In conditions with shorter scales, CD also performed comparatively well (acc_{ρ=.50,#x=4,CD} = 61%).

The accuracy of all extraction criteria was very low in conditions with highly correlated factors, especially with few indicators per factor (OR_{ρ=.75} = 0.23, OR_{#x=4} = 0.41, acc_{ρ=.75,#x≤8} = 17%). Only SMT achieved acceptable accuracies in these conditions, provided that the sample size was at least 500 (acc_{ρ=.75,#x≤8,N≥500,SMT} = 73%, compared with acc_{ρ=.75,#x≤8,N≤200,SMT} = 27%). For scales that consisted of 12 indicators per factor, performance was generally higher if the sample size was at least N = 500 (OR_{N=1,000,#x=12} = 1.27), particularly if PAPCA-M, PAPCA-95, or CD were employed (acc_{ρ=.75,#x=12,N≥500,PAPCA-M} = 87%, acc_{ρ=.75,#x=12,N≥500,PAPCA-95} = 82%, acc_{ρ=.75,#x=12,N≥500,CD} = 86%).

KGCPCA showed lower accuracies for increasing numbers of indicators per factor (acc_{ρ≥.25,#x=12,KGCPCA} = 12%, compared with acc_{ρ≥.25,#x=4,KGCPCA} = 60%). Irrespective of the underlying correlation, most of these errors were overextractions (bias_{ρ≥.25,#x=12,KGCPCA} = 6.05, compared with bias_{ρ≥.25,#x=4,KGCPCA} = 0.25). KGCEFA displayed decreasing performance with increasing sample size for high factor correlations and eight indicators per factor (acc_{ρ=.25,#x=8,N≤200,KGCEFA} = 45%, compared with acc_{ρ=.25,#x=8,N≥500,KGCEFA} = 16%). PAEFA-M again showed decreasing accuracies for increased sample sizes in conditions with weakly correlated factors and fewer indicators (acc_{ρ=.25,#x=4,N≤200,PAEFA-M} = 50%, compared with acc_{ρ=.25,#x=4,N≥500,PAEFA-M} = 28%).

Models With Minor Factors

Figure 8 displays the results for the conditions involving minor factors, summarized for conditions with unidimensional, orthogonal, and correlated factors. The presence of weak minor factors had no impact on average accuracies (acc_{wm} = 73%, acc_{nm} = 72%) and did not lead to overextractions (bias_{wm} = −0.12, bias_{nm} = −0.11). The performance of every extraction criterion was virtually identical in the presence of weak minor factors, even if the sample size was very large.

Accuracies generally decreased in the condition with moderate minor factors (acc_{mm} = 69%, OR_{mm} = 0.65), especially for orthogonal factor models with large sample sizes (acc_{mm,ort,N≥500} = 73%, compared with acc_{nm,ort,N≥500} = 89%, OR_{mm,ort} = 0.65). As was to be expected, most of these errors were overextractions (bias_{mm,ort,N≥500} = 0.35). Compared with the other methods, SMT and CD were particularly affected by the presence of moderate minor factors (OR_{mm,SMT} = 0.37, OR_{mm,CD} = 0.72), whereas the EKC, Hull, and PA appeared to be more robust (OR_{mm,EKC} = 1.51, OR_{mm,Hull} = 1.12; for any traditional PA method: OR_{mm,PA} ≥ 1.10).

Combination Rules

The results presented thus far indicate that PAPCA-M and PAPCA-95 displayed the highest accuracies across conditions. However, most of the other methods outperformed PAPCA-M and PAPCA-95 in at least some conditions: EKC and Hull provided very high hit rates for unidimensional or orthogonal factor models even when the sample size was small, and SMT and CD were more suitable when factors were highly correlated. The question arises whether extraction methods can be beneficially used in conjunction to determine the number of retained factors. However, a complication is that investigators obviously have no access to information regarding the true number of factors, the correlation between the factors, or the average loading magnitude before applying EFA and deciding how many factors to extract. We thus attempted to determine combination rules only considering information that is immediately available to researchers conducting an EFA, namely the number of observations, the average correlation among the observed variables, and the results of all factor extraction criteria.6 Importantly, note that the resulting overall accuracies do not necessarily reflect a method's true performance in empirical practice, because the conditions realized in our study are not necessarily equally likely to occur in the real world.7

The results using various extraction criteria can be combined according to very different schemes. In what follows, we consider a combination rule based on the idea that evidence to extract a particular number of factors is strongest when two criteria agree with respect to the suggested number of retained factors.8 Two criteria are of importance when considering the performance of combination rules: the coverage rate (cr), which expresses the degree to which the two extraction criteria involved in a combination rule agree on the number of factors, and the conditional hit rate, which is the hit rate of this particular combination, given that they agreed on the suggested number of factors. A satisfactory combination rule requires both a high conditional hit rate and a high coverage rate, because this indicates that the combination rule tends to be both correct and widely applicable. Tables 3 and 4

6 We also considered the number of indicators, but found that the resulting accuracies and coverage rates were comparable for all combinations of extraction criteria.
7 We thank Jamie DeCoster and Marcel van Assen for their insightful comments on this issue.
8 We also compared all triplets of factor extraction criteria where the resulting number of retrieved factors was equal to the median of the suggested number of each triplet. The highest overall accuracy resulted for PAPCA-M, PA-R, and SMT (83%), which was equal to PAPCA-M alone (83%).
Table 3
Percentage of Correctly Identified Factors for Pairs of Methods Given That Both Methods Agree on the Number of Factors
(Percentage of Cases for Which Pairs of Methods Agree on the Number of Factors) and N ≤ 200
Extraction method PAPCA-M PAPCA-95 KGCPCA PAEFA-M PAEFA-95 KGCEFA PA-R CD Hull EKC SMT
PAPCA-95 81 (91)
KGCPCA 99 (28) 100 (27)
PAEFA-M 83 (76) 84 (72) 87 (27)
PAEFA-95 81 (87) 79 (87) 98 (26) 81 (76)
KGCEFA 89 (58) 88 (58) 99 (25) 87 (50) 86 (58)
PA-R 83 (80) 80 (81) 99 (26) 83 (70) 78 (85) 84 (59)
CD 83 (74) 82 (74) 92 (26) 82 (65) 82 (71) 87 (51) 82 (68)
Hull 85 (81) 81 (86) 100 (27) 86 (67) 82 (80) 90 (55) 79 (79) 85 (67)
EKC 83 (81) 78 (87) 100 (26) 85 (66) 79 (81) 87 (56) 77 (79) 82 (69) 77 (86)
SMT 91 (73) 91 (72) 95 (30) 90 (64) 91 (70) 90 (54) 91 (68) 86 (68) 93 (67) 91 (67)
Single method 77 74 32 67 72 55 70 65 70 68 74
Note. PAPCA-M = traditional PA with mean PCA eigenvalues; PAPCA-95 = traditional PA with the 95th quantile of PCA eigenvalues; KGCPCA = Kaiser-Guttman Criterion with PCA eigenvalues; PAEFA-M = traditional PA with mean EFA eigenvalues; PAEFA-95 = traditional PA with the 95th quantile of EFA eigenvalues; KGCEFA = Kaiser-Guttman Criterion with EFA eigenvalues; PA-R = revised PA; CD = comparison data; Hull = Hull method; EKC = Empirical Kaiser Criterion; SMT = sequential χ² model tests.
show the coverage rates and conditional hit rates of all pairs of extraction criteria, separately for small (N ≤ 200) and large (N ≥ 500) sample sizes. As can be seen, all methods exhibited higher accuracies when used in conjunction, thereby illustrating the benefit of combining the information provided by various criteria.

Combination rules that involved SMT and either one of PAPCA-M, PAPCA-95, Hull, or the EKC provided both high accuracies and relatively high coverage rates (acc_{SMT,PAPCA-M} = 95%, cr_{SMT,PAPCA-M} = 74%; acc_{SMT,PAPCA-95} = 95%, cr_{SMT,PAPCA-95} = 72%; acc_{SMT,Hull} = 97%, cr_{SMT,Hull} = 68%; acc_{SMT,EKC} = 95%, cr_{SMT,EKC} = 68%). In large sample sizes and in conditions with above-average correlations among the observed variables (r̄ = .23), all hit rates of combination rules involving SMT and one of the aforementioned criteria were close to 100% (acc_{SMT,PAPCA-M,r>r̄} = 99%, acc_{SMT,PAPCA-95,r>r̄} = 99%, acc_{SMT,Hull,r>r̄} = 100%, acc_{SMT,EKC,r>r̄} = 100%). For larger samples, combinations of CD with either PAPCA-M, PAPCA-95, Hull, or the EKC also resulted in accuracies close to 100% and high coverage rates. Other combination rules that utilized criteria relying on similar methods (e.g., PAPCA-M and PAPCA-95) achieved higher coverage rates (cr_{PAPCA-M,PAPCA-95} = 94%) at the expense of overall accuracy (acc_{PAPCA-M,PAPCA-95} = 86%). Across conditions, combination rules involving KGCPCA consistently displayed very high hit rates even in small sample sizes. However, these hit rates were accompanied by exceptionally low coverage rates (cr_{N≤200,PAPCA-95,KGCPCA} = 27%), indicating that KGCPCA only identifies the correct number of factors if factor determinacy is high. Thus, whereas agreement between KGCPCA and (almost) any other extraction criterion allows for high confidence that the number of suggested factors is correct, combinations involving KGCPCA do not cover a sufficient number of cases to be considered a useful general combination rule. Taken together, combining SMT and either one of PAPCA-M, PAPCA-95, Hull, or the EKC consistently provided excellent hit rates (beyond what can be achieved by considering any one criterion in isolation) and covered a relatively wide range of conditions.

While concurrence between SMT and either PAPCA-M, PAPCA-95, Hull, or the EKC reliably indicated that the suggested number of factors is correct, we also examined the conditions in which these methods disagreed to evaluate whether there is an optimal strategy in situations where the proposed combination rules provide conflicting results (see Table 5). As can be seen, these conditions were associated with low hit rates for all methods.9 Particularly low hit rates were evident for small sample sizes (N ≤ 200), where the overall accuracy of the considered extraction criteria was ≤44%. Under these conditions, the highest hit rate can be obtained by relying on any of the variants of traditional PA, CD, or PA-R. Clearly, however, determining the number of factors to retain is difficult under these conditions, so increasing the sample size would be advisable. Larger sample sizes not only increase the accuracy and coverage rate of the proposed combination rules, but also improve hit rates of single extraction criteria when the combination rules provide conflicting results. In particular, in situations where SMT and either PAPCA-M, PAPCA-95, Hull, or the EKC disagree, the results of PAPCA-M, PAPCA-95, CD, or the EKC can be used to inform factor extraction (acc_{N≥500,PAPCA-M} ≥ 63%, acc_{N≥500,PAPCA-95} ≥ 60%, acc_{N≥500,CD} ≥ 58%, acc_{N≥500,EKC} ≥ 51%).

Discussion

Psychological researchers often need to determine the number of latent factors underlying multiple observed variables. EFA is often employed to this end. An important issue when performing an EFA is the number of latent factors required to adequately describe the covariance structure among the observed data. The present study subjected a large number of traditional and modern extraction criteria to a critical test by examining their performance under data conditions that are often encountered in psychological research, systematically varying the number of factors, the factor correlations, the number of indicators, the magnitude of loadings, the underlying multivariate distribution of manifest variables, as well as the presence of cross-loadings and minor factors.

9 The average reliability in these conditions was .77 and close to the average reliability of scales in empirical research (e.g., Fabrigar et al., 1999).
18 AUERSWALD AND MOSHAGEN
Table 4
Percentage of Correctly Identified Factors for Pairs of Methods Given That Both Methods Agree on the Number of Factors
(Percentage of Cases for Which Pairs of Methods Agree on the Number of Factors) and N ≥ 500
Extraction method PAPCA-M PAPCA-95 KGCPCA PAEFA-M PAEFA-95 KGCEFA PA-R CD Hull EKC SMT
PAPCA-95 91 (97)
KGCPCA 100 (60) 100 (60)
PAEFA-M 93 (63) 92 (63) 91 (46)
PAEFA-95 94 (77) 93 (77) 95 (55) 79 (72)
KGCEFA 92 (79) 92 (80) 100 (49) 90 (58) 86 (77)
PA-R 92 (78) 91 (79) 95 (55) 82 (64) 81 (82) 85 (76)
CD 97 (75) 98 (74) 94 (54) 86 (60) 91 (68) 98 (61) 92 (66)
Hull 94 (88) 92 (89) 100 (58) 92 (61) 91 (77) 89 (79) 87 (79) 98 (68)
EKC 93 (91) 91 (93) 100 (58) 92 (62) 91 (78) 90 (81) 90 (78) 98 (70) 91 (89)
SMT 99 (74) 99 (73) 93 (59) 87 (61) 91 (70) 99 (61) 91 (69) 92 (72) 100 (69) 100 (70)
Single method 90 89 63 60 74 74 73 77 82 85 80
Note. PAPCA-M = traditional PA with mean PCA eigenvalues; PAPCA-95 = traditional PA with the 95th quantile of PCA eigenvalues; KGCPCA = Kaiser-Guttman Criterion with PCA eigenvalues; PAEFA-M = traditional PA with mean EFA eigenvalues; PAEFA-95 = traditional PA with the 95th quantile of EFA eigenvalues; KGCEFA = Kaiser-Guttman Criterion with EFA eigenvalues; PA-R = revised PA; CD = comparison data; Hull = Hull method; EKC = Empirical Kaiser Criterion; SMT = sequential χ² model tests.
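The conditional statistics reported in Table 4 are straightforward to compute from per-data-set simulation results. A minimal sketch in Python (the function name and the toy vectors below are illustrative, not the study's simulation output):

```python
# For two extraction criteria, compute how often they agree on the number of
# factors (the parenthesized percentages in Table 4) and how often the
# agreed-upon number is correct (the leading percentages).

def pairwise_agreement_stats(suggested_a, suggested_b, true_k):
    """Return (percent agreement, percent correct given agreement)."""
    agreed = [(a, t) for a, b, t in zip(suggested_a, suggested_b, true_k) if a == b]
    pct_agree = 100 * len(agreed) / len(true_k)
    pct_correct = (100 * sum(a == t for a, t in agreed) / len(agreed)
                   if agreed else float("nan"))
    return pct_agree, pct_correct

# Toy example: factor counts suggested by two hypothetical criteria on six data sets
pa_95 = [3, 3, 2, 4, 3, 1]
smt   = [3, 2, 2, 4, 3, 2]
true  = [3, 3, 2, 4, 2, 1]

agree, correct_given_agree = pairwise_agreement_stats(pa_95, smt, true)
print(round(agree, 1), round(correct_given_agree, 1))  # 66.7 75.0
```

As in Table 4, the accuracy conditional on agreement (here 75%) can exceed either criterion's unconditional accuracy, at the cost of covering only the data sets on which the two criteria concur.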
The performance of all extraction criteria varied considerably Factor Determinacy and Sample Size
depending on factor determinacy and sample size. In unidimen-
The present study showed that the performance of all extraction
sional and orthogonal factor designs, traditional PA (based on the
criteria strongly depends on the signal-to-noise ratio in the data. In
sample correlation matrix and either the mean or 95th percentile of
line with previous simulation studies, extraction criteria generally
eigenvalues), Hull, and the EKC consistently retrieved the correct
deteriorated in conditions where the (expected) explained variance
number of factors. For highly correlated scales, the accuracies of of a common factor was low, either due to low loading magni-
all extraction criteria was lower due to frequent underextractions, tudes, high factor correlations, or few indicators per factor. These
which is consistent with theoretical expectations regarding sample conditions correspond to common factor models in which the
and population eigenvalues (Braeken & van Assen, 2017). In these eigenvalues associated with true underlying factors are numeri-
conditions, PAPCA-M and PAPCA-95 displayed high accuracies, cally closer to the remaining eigenvalues (Braeken & van Assen,
provided there was a sufficiently large number of indicators per 2017). Because most extraction criteria rely on the pattern of
factor and larger sample sizes. Unlike all other approaches, CD sample eigenvalues to determine how many factors to extract,
and SMT performed comparatively well in conditions with short, conditions with low factor determinacy are necessarily challeng-
highly correlated scales. ing, especially if the sample size is small. At the same time, our
Table 5
Percentage of Correctly Identified Number of Factors Given That SMT and Either PAPCA-M, PAPCA-95, Hull, or the EKC Provide
Different Solutions
N ≤ 200    N ≥ 500
SMT disagree with SMT disagree with
Extraction method PAPCA-M PAPCA-95 Hull EKC PAPCA-M PAPCA-95 Hull EKC
PAPCA-M 37 39 43 44 63 64 67 66
PAPCA-95 34 31 35 36 62 60 64 63
KGCPCA 16 17 16 18 42 42 40 43
PAEFA-M 37 39 42 42 34 33 34 33
PAEFA-95 36 35 39 38 46 45 44 42
KGCEFA 25 25 26 27 55 53 53 50
PA-R 37 37 40 39 41 40 42 40
CD 34 35 37 38 58 59 62 61
Hull 28 26 22 28 52 50 44 49
EKC 25 23 24 20 59 57 57 51
SMT 26 30 34 38 26 29 37 35
Note. PAPCA-M = traditional PA with mean PCA eigenvalues; PAPCA-95 = traditional PA with the 95th quantile of PCA eigenvalues; KGCPCA = Kaiser-Guttman Criterion with PCA eigenvalues; PAEFA-M = traditional PA with mean EFA eigenvalues; PAEFA-95 = traditional PA with the 95th quantile of EFA eigenvalues; KGCEFA = Kaiser-Guttman Criterion with EFA eigenvalues; PA-R = revised PA; CD = comparison data; Hull = Hull method; EKC = Empirical Kaiser Criterion; SMT = sequential χ² model tests.
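The decision rule that emerges from Tables 4 and 5 can be summarized in code. A hedged sketch (the function, labels, and N ≥ 500 threshold are a simplified illustration of the recommendation, not a published algorithm; the per-criterion factor counts are assumed to come from an actual EFA toolchain):

```python
# If SMT and a robust criterion (e.g., PAPCA-95, Hull, or the EKC) agree,
# accept that number of factors; otherwise, for large samples, fall back to
# CD, the EKC, or traditional PAPCA, and for small samples treat the
# dimensionality as undecided.

def combine_criteria(k_smt, k_robust, k_fallback, n):
    """Return (suggested number of factors or None, confidence label)."""
    if k_smt == k_robust:
        return k_smt, "high"           # agreement: most often the correct number
    if n >= 500:
        return k_fallback, "moderate"  # disagreement, large N: moderate accuracy
    return None, "low"                 # disagreement, small N: no confident call

print(combine_criteria(3, 3, 4, 200))  # (3, 'high')
print(combine_criteria(2, 3, 3, 800))  # (3, 'moderate')
print(combine_criteria(2, 3, 3, 150))  # (None, 'low')
```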
DETERMINING THE NUMBER OF FACTORS 19
results also indicate that lower factor determinacy adversely affects extraction criteria that are based on the fit of a structural equation model, such as SMT, to a lesser extent.

The beneficial effect of cross-loadings, implemented as additional standardized loadings on a different latent factor, on the accuracy of factor recovery can also be explained by increased factor determinacy. The additional loadings increased the explained variance of a common factor, making the pattern of sample eigenvalues more distinct and easier to identify. Consequently, the results suggest that the investigated extraction criteria can be safely applied if cross-loadings are present.

Given the overall advantage of higher factor determinacy and larger samples, it is also noteworthy that some extraction criteria displayed decreased accuracies under these conditions. The KGC severely overextracted the number of factors if unidimensional or orthogonal factors were indicated by a large number of items (see also Cattell & Vogelmann, 1977; Hakstian et al., 1982; Zwick & Velicer, 1986). The overextraction bias occurs because the KGC does not take sampling variation into account, which can lead to eigenvalues greater than one even in the absence of common factors. CD, which performed well in challenging conditions involving highly correlated factors, consistently overestimated the number of factors in unidimensional models with four indicators, especially if the sample was large. Similarly, variants of parallel analysis that relied upon the eigenvalues of an EFA model (PAEFA-M, PAEFA-95, and PA-R) did not improve with increasing sample size in conditions with fewer indicators per factor. As a result, the aforementioned methods should be applied in conjunction with other extraction criteria to protect against potential overextraction biases.

Minor Factors

The performance of all extraction criteria under scrutiny was virtually unaffected by the presence of weak minor factors, whereas moderate minor factors primarily affected SMT and CD in conditions with large sample sizes and orthogonal factors. These results are in line with the expectations derived by Braeken and van Assen (2017) in that overextractions are likely to occur only if the explained variance of a minor factor sufficiently changes the pattern of population eigenvalues. Consequently, the methods that display higher performance when factors are highly correlated, such as SMT and CD, were also more strongly affected by minor factors. Future research should consider whether the results extend to other sources of systematic variance as well, such as correlated unique factors. We expect that, similar to minor factors, correlated unique factors affect the hit rates of extraction criteria only if they represent a substantial proportion of the systematic variation, for example in conditions with few indicators.

Non-Normal Multivariate Distributions

Previous studies investigating non-normality only evaluated traditional PA and only varied the marginal distributions, without considering other extraction criteria or manipulating the multivariate distribution itself. The present study showed that most extraction criteria were highly robust under commonly observed amounts of skewness and kurtosis in the manifest variables, thereby replicating and extending previous results (Dinno, 2009; Garrido et al., 2013; Glorfeld, 1995; Peres-Neto et al., 2005). This is particularly noteworthy concerning the Hull method and the EKC, given that these methods explicitly assume a normal distribution (Braeken & van Assen, 2017; Lorenzo-Seva et al., 2011). SMT was the only extraction criterion that was adversely affected by non-normality in the latent variables in some conditions, consistent with evidence from confirmatory factor analysis (e.g., Auerswald & Moshagen, 2015; Foldnes & Grønneberg, 2015; Mair et al., 2012). In the present study, we only considered SMT using uncorrected ML-based χ² statistics, so a natural extension is to investigate SMT under non-normality with appropriate corrections, such as the Satorra-Bentler correction (Satorra & Bentler, 1994). Overall, however, the results of the present study indicate that the investigated extraction criteria can be applied safely under a wide range of distributional properties of the observed data.

Issues in Implementing PA

When PA is used to determine the number of factors, two choices need to be made. The first choice pertains to how to summarize the random reference eigenvalues to which the empirical eigenvalues are compared. Our results show that both variants of traditional PA displayed very similar hit rates for unidimensional, orthogonal, and correlated factor designs. Given the disadvantages of average-based PA in conditions with uncorrelated variables and the comparable hit rates otherwise, we recommend PA be used with the 95th percentile as a reference value.

The second choice pertains to the matrix from which the empirical and sampled eigenvalues are derived. The eigenvalues can be obtained either from the correlation matrix, corresponding to a PCA, or from a matrix in which the diagonal of the correlation matrix is replaced with the item communalities estimated by a common factor model. Because the primary purpose of empirical studies often is to uncover a set of latent variables that explain covariations among observed variables, the common factor model is usually recommended over PCA (e.g., Fabrigar et al., 1999; McArdle, 1990; Widaman, 1993). Traditional PA, on the other hand, typically uses the eigenvalues of the correlation matrix as a criterion, which could be considered inconsistent, because EFA is derived from the common factor model (Ford, MacCallum, & Tait, 1986; Humphreys & Montanelli, 1975). However, a common factor model determines both the eigenvalues used in a PCA and EFA. Indeed, Braeken and van Assen (2017) derived the distribution of eigenvalues of the correlation matrix for normally distributed observed variables from a common factor model. In contrast, the eigenvalues of a common factor model additionally depend on the method that estimates the communalities. Our simulations suggest that PA should be based on the PCA eigenvalues (see also Garrido et al., 2013). In conditions with few indicators per factor, PAEFA displayed lower hit rates than PAPCA and did not consistently improve with sample size. We see two factors that contribute to this finding. First, the comparison samples for PA seem inappropriate for EFA eigenvalues. Whereas EFA assumes common variance among the observed variables, the variables of the comparison samples are perfectly uncorrelated in the population. This leads to communality estimates close to zero for the comparison samples, but not for the empirical sample. The EFA eigenvalues, which are based on the correlation matrix with communalities on the diagonal, are adversely affected by the inappropriate comparison. Second, while PCA tends to overestimate the explained variance of common factors (Widaman, 1993), this overestimation affects both the empirical and the simulated sample alike. As we discussed in the section on general issues in factor extraction, the deciding factor appears to be the numerical difference between the last eigenvalue associated with a true factor and the first remaining eigenvalue, which is the same for EFA and PCA population eigenvalues. Note that we only used ML to estimate communalities for variants of PA based on a common factor model. Future studies should systematically vary the estimation procedure, including minimum rank factor analysis (Garrido et al., 2013) or estimating communalities as the multiple R² between one variable and the remaining variables (Crawford et al., 2010).

The inferiority of a common factor PA also explains why revised PA exhibited lower accuracies than traditional PAPCA in our study. Following the recommendations of Green et al. (2012), we implemented revised PA based on the common factor model. This difference likely resulted in the relatively low overall performance of revised PA despite its theoretical advantages. Future research should consider whether implementing revised PA based on PCA eigenvalues improves the performance of this approach.

Combination Rules

Given that no single approach displayed the highest accuracy in all conditions, we investigated whether performance can be maximized by jointly considering the outcomes of different extraction criteria. Indeed, performance increased considerably when multiple factor extraction criteria were used simultaneously. When SMT and either PAPCA-M, PAPCA-95, Hull, or the EKC agree (which occurred in 74%, 72%, 68%, and 68% of all simulated data sets, respectively), the correct number of factors is consistently identified. In the data sets where these methods disagreed, confident judgments could only be made when sample sizes were large (N ≥ 500). In these cases, CD, the EKC, or one of the variants of traditional PAPCA correctly identified the number of factors with moderate accuracies, whereas all other approaches performed more poorly.

SMT performed especially well in conditions with highly correlated factor models, but tended to overextract in the presence of moderate minor factors and displayed lower performance when data were not normally distributed. As such, a useful complement to SMT would be an extraction criterion that does not tend to overextract and is robust against both minor factors and non-normality. PAPCA-M performed well when data were non-normal or based on minor factors, but was reported to overextract in previous simulation studies (Glorfeld, 1995). Because PAPCA-95, Hull, and the EKC were both robust and did not overextract, we recommend combination rules consisting of SMT and one of the aforementioned criteria.

Limitations

The results of Monte Carlo studies should only be interpreted within the bounds of the realized conditions. One limitation of our study is that we only considered continuous response variables because we were also interested in the effect of non-normality in the observed variables. Changes in the distribution of the latent variables translate to changes in the distribution of the observed variables in a nontrivial way. Specifically, the same values of skewness and kurtosis in the observed ordinal variables can result from different skewness and kurtosis in the underlying continuous variables, depending on the thresholds chosen to obtain the ordinal variables. Nevertheless, future studies should also examine the performance of other factor extraction criteria for ordinal or dichotomous observed variables.

A second limitation pertains to the selection of the extraction criteria examined in the simulations. While we included a number of modern techniques that have not yet been thoroughly investigated, we did not consider methods that have been shown to be inferior to traditional PA in previous simulation studies (Peres-Neto et al., 2005; Raîche et al., 2013; Ruscio & Roche, 2012; Zwick & Velicer, 1986). These include fit indices of different structural equation models (e.g., Ruscio & Roche, 2012), the minimum average partial method (Velicer, 1976), and several nongraphical solutions for Cattell's scree test (e.g., Raîche et al., 2013). Overall, there are more than 40 criteria to assess the dimensionality of observed variables (Peres-Neto et al., 2005; Raîche et al., 2013; Ruscio & Roche, 2012), and our selection was based on their relevance for factor analysis in psychology and their performance in previous simulation studies. However, it might be possible that a criterion not considered here may improve combined hit rates when used in conjunction with another criterion.

Finally, the extraction criteria investigated in this study can be implemented in different ways. For example, the Hull method evaluates solutions based on a certain goodness-of-fit index, so different fit indices may lead to different results. In line with the recommendation by Lorenzo-Seva et al. (2011), we implemented the Hull method using the CFI. However, performance might improve when relying on another index (Moshagen & Auerswald, 2018). In particular, we would recommend studying the behavior of the Hull method with goodness-of-fit indices that do not account for model parsimony, such as the SRMR or McDonald's mc (McDonald, 1989). Furthermore, the chi-square statistic used in the SMT was directly derived from ML estimation. Given that the chi-square statistic is inflated in the presence of non-normal data, it would be interesting to investigate whether robust statistics (such as the SB-correction; Satorra & Bentler, 1994) improve performance in conditions involving non-normality. Likewise, the tendency of SMT to overextract in the presence of minor factors might be counteracted by altering the testing strategy, such as using balanced error probabilities (Moshagen & Erdfelder, 2016) or equivalence testing (Yuan, Chan, Marcoulides, & Bentler, 2016).

Conclusion

We investigated the performance of various criteria to determine the number of retained factors in EFA. Our results indicate that the highest accuracy can be obtained when considering the outcomes of several criteria simultaneously. In particular, we recommend that investigators compare the results of sequential χ² model tests and either PAPCA-95, Hull, or the EKC. If both methods suggest the same number of factors, this most often reflects the correct number of underlying factors. If the methods disagree, CD, the EKC, or one of the variants of traditional PAPCA are viable extraction criteria, provided that the sample is large. However, these conditions are generally associated with greater difficulties in identifying the number of factors for all approaches we investigated, so larger sample sizes are required to make confident decisions. In the suggested decision rule, disagreement between SMT and either PAPCA-95, Hull, or the EKC can thus serve as an indicator that the latent structure will be difficult to uncover.

The present study also investigated the effects of minor factors, cross-loadings, and non-normal distributions. Minor factors adversely affected extraction criteria only if they explained at least a moderate amount of variance. Cross-loadings increased the explained variance of true factors and therefore tended to increase the performance of extraction criteria. Non-normal distributions that were based on non-normal latent distributions led to decreased accuracy for SMT, while all other extraction criteria were virtually unaffected.

References

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245–276. http://dx.doi.org/10.1207/s15327906mbr0102_10
Cattell, R. B., & Vogelmann, S. (1977). A comprehensive trial of the scree and KG criteria for determining the number of factors. Multivariate Behavioral Research, 12, 289–325. http://dx.doi.org/10.1207/s15327906mbr1203_2
Ceulemans, E., & Kiers, H. A. (2006). Selecting among three-mode principal component models of different types and complexities: A numerical convex hull based method. British Journal of Mathematical and Statistical Psychology, 59, 133–150. http://dx.doi.org/10.1348/000711005X64817
Cho, S.-J., Li, F., & Bandalos, D. (2009). Accuracy of the parallel analysis procedure with polychoric correlations. Educational and Psychological Measurement.
Glorfeld, L. W. (1995). An improvement on Horn's parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55, 377–393. http://dx.doi.org/10.1177/0013164495055003002
Green, S. B., Levy, R., Thompson, M. S., Lu, M., & Lo, W.-J. (2012). A proposed solution to the problem with using completely random data to assess the number of factors with parallel analysis. Educational and Psychological Measurement, 72, 357–374. http://dx.doi.org/10.1177/0013164411422252
Green, S. B., Thompson, M. S., Levy, R., & Lo, W.-J. (2015). Type I and type II error rates and overall accuracy of the revised parallel analysis method for determining the number of factors. Educational and Psychological Measurement, 75, 428–457. http://dx.doi.org/10.1177/0013164414546566
Guttman, L. (1954). Some necessary conditions for common-factor analysis. Psychometrika, 19, 149–161. http://dx.doi.org/10.1007/BF02289162
Hakstian, A. R., Rogers, W. T., & Cattell, R. B. (1982). The behavior of number-of-factors rules with simulated data. Multivariate Behavioral Research, 17, 193–219. http://dx.doi.org/10.1207/s15327906mbr1702_3
Harman, H. H. (1976). Modern factor analysis (3rd ed.). Chicago, IL: University of Chicago Press.
Hayashi, K., Bentler, P. M., & Yuan, K.-H. (2007). On the likelihood ratio test for the number of factors in exploratory factor analysis. Structural Equation Modeling, 14, 505–526. http://dx.doi.org/10.1080/10705510701301891
Hayton, J. C., Allen, D. G., & Scarpello, V. (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7, 191–205. http://dx.doi.org/10.1177/1094428104263675
Henson, R. K., & Roberts, J. K. (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement, 66, 393–416. http://dx.doi.org/10.1177/0013164405282485
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179–185. http://dx.doi.org/10.1007/BF02289447
Hubbard, R., & Allen, S. J. (1987). An empirical comparison of alternative methods for principal component extraction. Journal of Business Research, 15, 173–190. http://dx.doi.org/10.1016/0148-2963(84)90047-X
Humphreys, L. G., & Ilgen, D. R. (1969). Note on a criterion for the number of common factors. Educational and Psychological Measurement, 29, 571–578. http://dx.doi.org/10.1177/001316446902900303
Humphreys, L. G., & Montanelli, R. G. (1975). An investigation of the parallel analysis criterion for determining the number of common factors. Multivariate Behavioral Research, 10, 193–205. http://dx.doi.org/10.1207/s15327906mbr1002_5
IBM Corp. (2015). IBM SPSS Statistics for Windows, Version 23.0. Armonk, NY: Author.
Jackson, D. L., Gillaspy, J. A., & Purc-Stephenson, R. (2009). Reporting practices in confirmatory factor analysis: An overview and some recommendations. Psychological Methods, 14, 6. http://dx.doi.org/10.1037/a0014694
Jöreskog, K. G. (2007). Factor analysis and its extensions. In R. Cudeck & R. C. MacCallum (Eds.), Factor analysis at 100: Historical developments and future directions (pp. 47–77). Mahwah, NJ: Erlbaum.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141–151. http://dx.doi.org/10.1177/001316446002000116
Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did they really say? Organizational Research Methods, 9, 202–220. http://dx.doi.org/10.1177/1094428105284919
Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60, 64–82. http://dx.doi.org/10.1017/S037016460002006X
Linn, R. L. (1968). A Monte Carlo approach to the number of factors problem. Psychometrika, 33, 37–71.
Lorenzo-Seva, U., Timmerman, M. E., & Kiers, H. A. (2011). The Hull method for selecting the number of common factors. Multivariate Behavioral Research, 46, 340–364. http://dx.doi.org/10.1080/00273171.2011.564527
Mair, P., Satorra, A., & Bentler, P. M. (2012). Generating nonnormal multivariate data using copulas: Applications to SEM. Multivariate Behavioral Research, 47, 547–565. http://dx.doi.org/10.1080/00273171.2012.692629
Marčenko, V. A., & Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1, 457–483. http://dx.doi.org/10.1070/SM1967v001n04ABEH001994
McArdle, J. J. (1990). Principles versus principals of structural factor analyses. Multivariate Behavioral Research, 25, 81–87. http://dx.doi.org/10.1207/s15327906mbr2501_10
McDonald, R. P. (1989). An index of goodness-of-fit based on noncentrality. Journal of Classification, 6, 97–103. http://dx.doi.org/10.1007/BF01908590
McDonald, R. P. (1999). Test theory: A unified treatment. Hillsdale, NJ: Erlbaum.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
Moshagen, M., & Auerswald, M. (2018). On congruence and incongruence of measures of fit in structural equation modeling. Psychological Methods, 23, 318–336. http://dx.doi.org/10.1037/met0000122
Moshagen, M., & Erdfelder, E. (2016). A new strategy for testing structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 23, 54–60. http://dx.doi.org/10.1080/10705511.2014.950896
Mulaik, S. A. (2010). Foundations of factor analysis (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.
Peres-Neto, P. R., Jackson, D. A., & Somers, K. M. (2005). How many principal components? Stopping rules for determining the number of non-trivial axes revisited. Computational Statistics & Data Analysis, 49, 974–997. http://dx.doi.org/10.1016/j.csda.2004.06.015
Raîche, G., Riopel, M., & Blais, J.-G. (2006, June). Nongraphical solutions for the Cattell's scree test. Paper presented at the annual meeting of the Psychometric Society, Montreal, Quebec, Canada.
Raîche, G., Walls, T. A., Magis, D., Riopel, M., & Blais, J.-G. (2013). Non-graphical solutions for Cattell's scree test. Methodology, 9, 23–29. http://dx.doi.org/10.1027/1614-2241/a000051
R Core Team. (2016). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria: Author. Retrieved from https://www.R-project.org/
Revelle, W. (2015). psych: Procedures for psychological, psychometric, and personality research (R package version 1.5.8) [Computer software manual]. Retrieved from http://CRAN.R-project.org/package=psych
Ruscio, J., & Kaczetow, W. (2008). Simulating multivariate nonnormal data using an iterative algorithm. Multivariate Behavioral Research, 43, 355–381. http://dx.doi.org/10.1080/00273170802285693
Ruscio, J., & Roche, B. (2012). Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure. Psychological Assessment, 24, 282–292. http://dx.doi.org/10.1037/a0025697
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. Eye & C. C. Clogg (Eds.), Latent variable analysis: Applications for developmental research (pp. 399–419). Thousand Oaks, CA: Sage.
Schmitt, T. A. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29, 304–321. http://dx.doi.org/10.1177/0734282911406653
Schönemann, P. H., & Wang, M.-M. (1972). Some new results on factor indeterminacy. Psychometrika, 37, 61–91. http://dx.doi.org/10.1007/BF02291413
Steger, M. F. (2006). An illustration of issues in factor extraction and identification of dimensionality in psychological assessment data. Journal of Personality Assessment, 86, 263–272. http://dx.doi.org/10.1207/s15327752jpa8603_03
Stevens, J. P. (2009). Applied multivariate statistics for the social sciences. New York, NY: Taylor & Francis.
Thurstone, L. L. (1947). Multiple factor analysis. Chicago, IL: University of Chicago Press.
Turner, N. E. (1998). The effect of common variance and structure pattern on random data eigenvalues: Implications for the accuracy of parallel analysis. Educational and Psychological Measurement, 58, 541–568. http://dx.doi.org/10.1177/0013164498058004001
Velicer, W. F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41, 321–327. http://dx.doi.org/10.1007/BF02293557
Velicer, W. F., Eaton, C. A., & Fava, J. L. (2000). Construct explication through factor or component analysis: A review and evaluation of alternative procedures for determining the number of factors or components. In R. D. Goffin & E. Helmes (Eds.), Problems and solutions in human assessment: Honoring Douglas N. Jackson at seventy (pp. 41–71). Boston, MA: Kluwer Academic.
Widaman, K. F. (1993). Common factor analysis versus principal component analysis: Differential bias in representing model parameters? Multivariate Behavioral Research, 28, 263–311. http://dx.doi.org/10.1207/s15327906mbr2803_1
Wood, J. M., Tataryn, D. J., & Gorsuch, R. L. (1996). Effects of under- and overextraction on principal axis factor analysis with varimax rotation. Psychological Methods, 1, 354–365. http://dx.doi.org/10.1037/1082-989X.1.4.354
Worthington, R. L., & Whittaker, T. A. (2006). Scale development research: A content analysis and recommendations for best practices. The Counseling Psychologist, 34, 806–838. http://dx.doi.org/10.1177/0011000006288127
Yuan, K.-H., Chan, W., Marcoulides, G. A., & Bentler, P. M. (2016). Assessing structural equation models by equivalence testing with adjusted fit indexes. Structural Equation Modeling: A Multidisciplinary Journal, 23, 319–330. http://dx.doi.org/10.1080/10705511.2015.1065414
Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99, 432–442. http://dx.doi.org/10.1037/0033-2909.99.3.432
Appendix
Eigenvalues and Explained Variance
The goal of this section is to explain the correspondence between the explained variance of the common factor model and the eigenvalues of the matrix of correlations $R_C$ with communalities on the diagonal, assuming that the (hypothetical) data fit the common factor model perfectly. Note that the explained variance in a PCA can be derived similarly if we use the correlation matrix $R$ instead of $R_C$.

Suppose we have standardized observed variables $X = (x_1, \ldots, x_p)^T$ with correlation matrix $R$ from which we partial out the uniqueness $\Delta$. We denote the resulting variables as $X_C = (x_{C1}, \ldots, x_{Cp})^T$, so that

$$x_i = x_{Ci} + \delta_i, \quad 1 \le i \le p, \tag{18}$$

which implies

$$E(XX^T) = E(X_C X_C^T) + \Delta. \tag{19}$$

$X$ is standardized, so

$$E(XX^T) = R, \tag{20}$$

with

$$R = R_C + \Delta. \tag{21}$$

As can be seen from Equations 19 and 21, the covariance matrix of $X_C$ is equal to $R_C$, because

$$R_C = R - \Delta \tag{22}$$
$$= E(XX^T) - \Delta \tag{23}$$
$$= E(X_C X_C^T). \tag{24}$$

We denote the values of $X_C$ associated with the observations in $X$ as $x_{Ck}$, $1 \le k \le N$ for $N$ observations, and try to find factors that linearly explain the variation in $X_C$. This is equivalent to finding lines onto which we project each $x_{Ck}$ such that the variance of the length of the projections is maximal (and the variance of the distances to the line is minimal). A line is a set of points that satisfy

$$x = \alpha\eta, \tag{25}$$

where $\eta$ is a vector of length $p$ and $\alpha \in \mathbb{R}$. The length of the projection of $x_{Ck}$ on this line is

$$\frac{\langle x_{Ck}, \eta\rangle}{\lVert\eta\rVert}. \tag{26}$$

Note that the length of $\eta$ does not change the line in Equation 25, so that we can set $\lVert\eta\rVert = 1$ without loss of generality. The length of the projections then is $\langle x_{Ck}, \eta\rangle$.
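The relations in Equations 18 through 24 can be illustrated numerically. The following sketch (the loadings are illustrative values chosen for the example, not values from the article) builds the model-implied correlation matrix of a one-factor model and checks that subtracting the uniquenesses $\Delta$ yields the reduced correlation matrix $R_C$ with the communalities on the diagonal:

```python
import numpy as np

# Illustrative one-factor model: 4 standardized indicators
# with loadings 0.8, 0.7, 0.6, 0.5 (example values only).
loadings = np.array([0.8, 0.7, 0.6, 0.5])

# Model-implied correlation matrix R with unit diagonal, and the
# diagonal matrix of uniquenesses Delta with entries 1 - h_i^2.
R = np.outer(loadings, loadings)
np.fill_diagonal(R, 1.0)
Delta = np.diag(1.0 - loadings**2)

# Reduced correlation matrix R_C = R - Delta (Equation 22):
# the communalities h_i^2 = 0.64, 0.49, 0.36, 0.25 replace the unit diagonal.
R_C = R - Delta
print(np.diag(R_C))

# Sanity check of Equation 21: R = R_C + Delta.
assert np.allclose(R, R_C + Delta)
```
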
In order to maximize the variance of $\langle x_{Ck}, \eta\rangle$, we first obtain the average of the projections. In this step, we utilize that the vector $\eta$ is part of an orthonormal basis of our vector space $\mathbb{R}^p$. An orthonormal basis is a set of $p$ linearly independent vectors (each of length 1) that can express any vector $\eta^* \in \mathbb{R}^p$ as a linear combination of elements of the basis. We denote the orthonormal basis that contains $\eta$ as

$$\{\eta, \eta_2', \ldots, \eta_p'\}. \tag{27}$$

We can rewrite every $x_{Ck}$ as

$$x_{Ck} = \alpha_{1k}\eta + \alpha_{2k}\eta_2' + \cdots + \alpha_{pk}\eta_p', \tag{28}$$

because $\{\eta, \eta_2', \ldots, \eta_p'\}$ is an orthonormal basis, so that

$$x_{Ck} = \alpha_{1k}\eta + \sum_{i=2}^{p} \alpha_{ik}\eta_i' \tag{29}$$

$$\Rightarrow \quad \eta^T x_{Ck} = \eta^T \alpha_{1k}\eta + \eta^T \sum_{i=2}^{p} \alpha_{ik}\eta_i' \tag{30}$$

$$\Rightarrow \quad \eta^T x_{Ck} = \alpha_{1k}\,\eta^T\eta + \sum_{i=2}^{p} \alpha_{ik}\,\eta^T\eta_i' \tag{31}$$

$$\Rightarrow \quad \eta^T x_{Ck} = \alpha_{1k}. \tag{32}$$

In the last step, we used that $\eta \perp \eta_i'$, $2 \le i \le p$, and $\eta^T\eta = \lVert\eta\rVert = 1$. The mean of the projections therefore is

$$\frac{1}{N}\sum_{k=1}^{N} \alpha_{1k} = \frac{1}{N}\sum_{k=1}^{N} \eta^T x_{Ck} \tag{33}$$

$$= \eta^T\left(\frac{1}{N}\sum_{k=1}^{N} x_{Ck}\right) \tag{34}$$

$$= 0 \tag{35}$$

because $x_{Ck}$ is centered. We can therefore obtain the variance of the length of the projections of $x_{Ck}$ as

$$\frac{1}{N-1}\sum_{k=1}^{N} \langle x_{Ck}, \eta\rangle^2 = \frac{1}{N-1}\sum_{k=1}^{N} (x_{Ck}^T \eta)^2 \tag{36}$$

$$= \frac{1}{N-1}\sum_{k=1}^{N} \eta^T x_{Ck} x_{Ck}^T \eta \tag{37}$$

$$= \eta^T\left(\frac{1}{N-1}\sum_{k=1}^{N} x_{Ck} x_{Ck}^T\right)\eta \tag{38}$$

$$= \eta^T R_C\, \eta. \tag{39}$$

The variance of the length of the projections is $\eta^T R_C\, \eta$, and we try to obtain its maximum. We denote the eigenvectors of $R_C$ as $e_1, \ldots, e_p$ and the corresponding eigenvalues as $\lambda_1, \ldots, \lambda_p$, such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$. If we choose $\eta = e_1$, the variance is

$$e_1^T R_C e_1 = e_1^T(\lambda_1 e_1) = \lambda_1. \tag{40}$$

The first eigenvalue corresponds to the explained variance if we choose the eigenvector $e_1$ as a projection line. Suppose we choose any other vector $\eta$ as a projection line. The eigenvectors $e_1, \ldots, e_p$ form an orthonormal basis of our space. We can therefore rewrite $\eta$ as

$$\eta = \langle e_1, \eta\rangle e_1 + \langle e_2, \eta\rangle e_2 + \cdots + \langle e_p, \eta\rangle e_p = \sum_{i=1}^{p}\langle e_i, \eta\rangle e_i. \tag{41}$$

The variance of the length of the projections for $\eta$ then is

$$\left(\sum_{i=1}^{p}\langle e_i, \eta\rangle e_i\right)^T R_C \left(\sum_{i=1}^{p}\langle e_i, \eta\rangle e_i\right) = \left(\sum_{i=1}^{p}\langle e_i, \eta\rangle e_i\right)^T \left(\sum_{i=1}^{p}\langle e_i, \eta\rangle R_C e_i\right) \tag{42}$$

$$= \left(\sum_{i=1}^{p}\langle e_i, \eta\rangle e_i\right)^T \left(\sum_{i=1}^{p}\langle e_i, \eta\rangle \lambda_i e_i\right) \tag{43}$$

$$= \sum_{i=1}^{p}\langle e_i, \eta\rangle^2 \lambda_i \lVert e_i\rVert^2. \tag{44}$$

In the last step, we used that $e_i \perp e_{i'}$ for $1 \le i, i' \le p$ and $i \ne i'$. Note that the eigenvectors are standardized, so that $\lVert e_i\rVert = 1$. Further note that $\langle e_i, \eta\rangle^2 \ge 0$ and

$$\sum_{i=1}^{p}\langle e_i, \eta\rangle^2 = 1 \tag{45}$$

because $\lVert\eta\rVert = 1$. Therefore, the variance of the length of the projections for $\eta$ is a weighted sum of eigenvalues in which the weights are all non-negative and sum to one, such that

$$\sum_{i=1}^{p}\langle e_i, \eta\rangle^2 \lambda_i \le \lambda_1. \tag{46}$$

Hence, $\eta = e_1$ obtains a maximum of explained variance. If we choose a second factor, we choose a line orthogonal to $e_1$ and, by analogy, arrive at the conclusion that $\eta = e_2$ with corresponding explained variance $\lambda_2$. For $m$ extracted factors, the explained variance is

$$\sum_{j=1}^{m} \lambda_j. \tag{47}$$

Received March 13, 2017
Revision received August 12, 2018
Accepted August 21, 2018 ■
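The maximization argument in Equations 40 through 47 can also be checked numerically. The following sketch (again with illustrative loadings, not values from the article) verifies that the projection variance $\eta^T R_C\, \eta$ equals a weighted sum of eigenvalues for any unit vector $\eta$ and is maximized by the first eigenvector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Reduced correlation matrix for a perfectly fitting one-factor model:
# R_C = loadings loadings^T, so its diagonal holds the communalities and
# it has rank 1 (lambda_1 equals the total common variance, the rest are 0).
loadings = np.array([0.8, 0.7, 0.6, 0.5])  # illustrative values
R_C = np.outer(loadings, loadings)

# Eigendecomposition; eigh returns ascending order, so sort descending
# to obtain lambda_1 >= ... >= lambda_p.
eigvals, eigvecs = np.linalg.eigh(R_C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
e1 = eigvecs[:, 0]

# Equation 40: the variance along e1 equals the first eigenvalue.
assert np.isclose(e1 @ R_C @ e1, eigvals[0])

# Equations 41-46: for any unit vector eta, eta^T R_C eta is a weighted
# sum of eigenvalues with non-negative weights summing to 1, hence <= lambda_1.
for _ in range(200):
    eta = rng.normal(size=loadings.size)
    eta /= np.linalg.norm(eta)
    weights = (eigvecs.T @ eta) ** 2
    assert np.isclose(eta @ R_C @ eta, weights @ eigvals)
    assert eta @ R_C @ eta <= eigvals[0] + 1e-12

# Equation 47: explained variance of the first m factors is the sum of
# the m largest eigenvalues (here m = 1 suffices for a one-factor model).
print(eigvals)
```

Because the example data fit a one-factor model perfectly, the first eigenvalue equals the sum of the communalities and the remaining eigenvalues are zero; with sampled data or additional factors, the same weighted-sum bound applies unchanged.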