Factor Analysis
Meeting 10 – APG
Outline
• Introduction
• Orthogonal Factor Model with 𝑚 Common Factors
• Model Estimation:
• The Principal Component (and Principal Factor) Method
• The Maximum Likelihood Method
• Factor Rotation
• Estimation of Factor Scores:
• The Weighted Least Squares Method
• The Unweighted (Ordinary) Least Squares Method
• The Regression Method
• Major Steps in Exploratory Factor Analysis (EFA)
• Perspectives and A Strategy for Factor Analysis
• Application with R
Introduction
Factor analysis is a theory-driven statistical data reduction technique used to explain the covariances
among observed random variables in terms of fewer unobserved random variables called
factors (latent variables).
There are two kinds of factor analysis:
• Exploratory Factor Analysis (EFA)
• Confirmatory Factor Analysis (CFA)
Exploratory:
• summarize data
• describe correlation structure between variables
• generate hypotheses
Confirmatory:
• Testing correlated measurement errors (Test a priori theory)
• Test reliability of measures
• Redundancy test of one-factor vs. multi-factor models
Exploratory Factor Analysis (EFA)
• The first stage consists of the estimation of the parameters in the model and the
rotation of the factors, followed by an (often heroic) attempt to interpret the fitted
model.
• The second stage is concerned with estimating latent variable scores for each
individual in the data set
Orthogonal Factor Model with 𝑚 Common Factors
The observable random vector 𝑿 (𝑝 × 1), with mean 𝝁 and covariance matrix 𝜮, is assumed to satisfy
𝑿 − 𝝁 = 𝑳𝑭 + 𝜺
where 𝑳 (𝑝 × 𝑚) is the matrix of factor loadings, 𝑭 (𝑚 × 1) holds the common factors, and 𝜺 (𝑝 × 1) the specific factors, with
𝐸(𝑭) = 𝟎, 𝐶𝑜𝑣(𝑭) = 𝑰, 𝐸(𝜺) = 𝟎, 𝐶𝑜𝑣(𝜺) = 𝚿 (diagonal), 𝐶𝑜𝑣(𝑭, 𝜺) = 𝟎
These assumptions imply the covariance structure 𝜮 = 𝑳𝑳′ + 𝚿.
Covariance Structure for the Orthogonal Factor Model
Since 𝜮 = 𝑳𝑳′ + 𝚿, the variance of 𝑋𝑖 decomposes as 𝜎𝑖𝑖 = ℎ𝑖² + 𝜓𝑖 :
• Communality of 𝑋𝑖 : ℎ𝑖² = 𝑙𝑖1² + 𝑙𝑖2² + ⋯ + 𝑙𝑖𝑚² = the share of the variance of 𝑋𝑖 explained by 𝐹1 , 𝐹2 , … , 𝐹𝑚
• Specific variance of 𝑋𝑖 : 𝜓𝑖 = 𝜎𝑖𝑖 − ℎ𝑖² , or 𝜓𝑖 = 1 − ℎ𝑖² in standardized form
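As a quick numerical check of the decomposition 𝜎𝑖𝑖 = ℎ𝑖² + 𝜓𝑖 (the loadings below are hypothetical, chosen only for illustration):

```r
# Hypothetical standardized loadings of one variable on m = 2 common factors
l <- c(0.7, 0.4)

h2  <- sum(l^2)  # communality h_i^2 = 0.7^2 + 0.4^2
psi <- 1 - h2    # specific variance in standardized form: psi_i = 1 - h_i^2
```

Here the two factors jointly explain 65% of the variance of the variable, and 35% remains specific.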
One Common Factor Model: Model Interpretations
[Path diagram: 𝐹 → 𝑋1 , 𝑋2 , 𝑋3 with loadings 𝑙1 , 𝑙2 , 𝑙3 and errors 𝜀1 , 𝜀2 , 𝜀3 ]
𝑋1 = 𝜇1 + 𝑙1 𝐹 + 𝜀1
𝑋2 = 𝜇2 + 𝑙2 𝐹 + 𝜀2
𝑋3 = 𝜇3 + 𝑙3 𝐹 + 𝜀3
Example: Spearman considered a sample of children's examination marks in three subjects, Classics (X1), French (X2), and English (X3). In this example, the underlying latent variable or common factor 𝐹 might possibly be equated with intelligence or general intellectual ability.
Two-Common Factor Model (Orthogonal)
[Path diagram: 𝐹1 and 𝐹2 → 𝑋1 , … , 𝑋6 with loadings 𝑙𝑖1 , 𝑙𝑖2 and errors 𝜀1 , … , 𝜀6 ]
Orthogonal (𝐹1 and 𝐹2 independent): 𝑐𝑜𝑣(𝐹1 , 𝐹2 ) = 0
Model Interpretation. Given all variables in standardized form, i.e. 𝑉𝑎𝑟(𝑋𝑖 ) = 𝑉𝑎𝑟(𝐹𝑗 ) = 1:
• Factor loadings: 𝑙𝑖𝑗 = 𝑐𝑜𝑟𝑟(𝑋𝑖 , 𝐹𝑗 )
• Communality of 𝑋𝑖 : ℎ𝑖² = 𝑙𝑖1² + 𝑙𝑖2² = the share of the variance of 𝑋𝑖 explained by 𝐹1 and 𝐹2
• Uniqueness (specific variance) of 𝑋𝑖 : 1 − ℎ𝑖² = residual variance of 𝑋𝑖
Model Estimation
• Goal: Does the factor model, with a small number of factors, adequately represent the data?
• If the off-diagonal elements of the sample covariance matrix 𝑺 are small, or those of the
sample correlation matrix 𝑹 are essentially zero, the variables are not related, and a factor analysis
will not prove useful.
• If covariance matrix 𝜮 appears to deviate significantly from a diagonal matrix, then a factor
model can be entertained, and the initial problem is one of estimating the factor loadings 𝑙𝑖𝑗
and specific variances 𝜓𝑖 .
• We shall consider two of the most popular methods of parameter estimation,
• the principal component (and the related principal factor) method
• and the maximum likelihood method.
• The solution from either method can be rotated in order to simplify the interpretation
of factors
• It is always prudent to try more than one method of solution; if the factor model is
appropriate for the problem at hand, the solutions should be consistent with one
another.
The Principal Component (and Principal Factor) Method
We prefer models that explain the covariance structure in terms of just a few common factors
(𝑚 < 𝑝); thus
𝜮 ≐ 𝑳𝑳′ + 𝚿
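The principal component solution takes the loadings on the 𝑗th factor to be the scaled eigenvector √λ̂𝑗 ê𝑗 of 𝑺 (or 𝑹), with specific variances taken from the residual diagonal. A minimal sketch in R (the function name and the example correlation matrix are made up for illustration):

```r
# Principal component method for the m-factor model, applied to a correlation matrix R
pc_factor <- function(R, m) {
  eig <- eigen(R, symmetric = TRUE)
  # loadings: j-th column is sqrt(lambda_j) * e_j
  L <- eig$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(eig$values[1:m]), nrow = m)
  psi  <- diag(R) - rowSums(L^2)            # specific variances psi_i = sigma_ii - h_i^2
  prop <- eig$values[1:m] / sum(diag(R))    # proportion of total variance per factor
  list(loadings = L, uniquenesses = psi, proportion = prop)
}

R <- matrix(c(1,   .63, .45,
              .63, 1,   .35,
              .45, .35, 1), nrow = 3, byrow = TRUE)
fit <- pc_factor(R, m = 1)
```

By construction the communalities and specific variances reproduce the diagonal of 𝑹 exactly, and the proportion of total variance explained by the first factor equals λ̂₁/𝑝.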
• Ideally, the contributions of the first few factors to the sample variances of the variables
should be large.
• The contribution to the sample variance 𝑠𝑖𝑖 from the first common factor is l̃𝑖1².
• The contribution to the total sample variance, 𝑠11 + 𝑠22 + ⋯ + 𝑠𝑝𝑝 = 𝑡𝑟(𝑺), from the first common factor is then
l̃11² + l̃21² + ⋯ + l̃𝑝1² = (√λ̂1 ê1 )′(√λ̂1 ê1 ) = λ̂1
since the eigenvector ê1 has unit length.
• In general,
Proportion of total sample variance due to the 𝑗th factor = λ̂𝑗 / (𝑠11 + 𝑠22 + ⋯ + 𝑠𝑝𝑝 ) for a factor analysis of 𝑺, or λ̂𝑗 / 𝑝 for a factor analysis of 𝑹.
The Maximum Likelihood Method
• If the common factors 𝐹 and the specific factors 𝜺 can be assumed to be normally
distributed, then maximum likelihood estimates of the factor loadings and specific variances
may be obtained.
• When 𝑭𝑗 and 𝜺𝑗 are jointly normal, the observations 𝑿𝑗 − 𝝁 = 𝑳𝑭𝑗 + 𝜺𝑗 are then normal, and the likelihood is
L(𝝁, 𝜮) = (2π)^(−𝑛𝑝/2) |𝜮|^(−𝑛/2) exp{ −(1/2) 𝑡𝑟[ 𝜮⁻¹ ( Σⱼ₌₁ⁿ (𝐱𝑗 − 𝐱̄)(𝐱𝑗 − 𝐱̄)′ + 𝑛(𝐱̄ − 𝝁)(𝐱̄ − 𝝁)′ ) ] }
Let 𝑿1 , 𝑿2 , …, 𝑿𝑛 be a random sample from 𝑁𝑝 (𝝁, 𝜮), where 𝜮 = 𝑳𝑳′ + 𝚿 is the
covariance matrix for the 𝑚 common factor model 𝑿 = 𝝁 + 𝑳𝑭 + 𝜺. The
maximum likelihood estimators L̂, Ψ̂ and μ̂ = 𝐱̄ maximize the likelihood L(𝝁, 𝜮) subject
to L̂′Ψ̂⁻¹L̂ being diagonal.
• So the maximum likelihood estimates of the communalities are
ĥ𝑖² = l̂𝑖1² + l̂𝑖2² + ⋯ + l̂𝑖𝑚² , 𝑖 = 1, 2, …, 𝑝
and
Proportion of total sample variance due to the 𝑗th factor = (l̂1𝑗² + l̂2𝑗² + ⋯ + l̂𝑝𝑗²) / (𝑠11 + 𝑠22 + ⋯ + 𝑠𝑝𝑝 )
• If the variables 𝑿 are standardized so that 𝒁 = 𝑽^(−1/2)(𝑿 − 𝝁), the covariance matrix
will be a correlation matrix 𝝆, and
Proportion of total sample variance due to the 𝑗th factor = (l̂1𝑗² + l̂2𝑗² + ⋯ + l̂𝑝𝑗²) / 𝑝
Factor Rotation
• Goal is simple structure
• Make factors more easily interpretable
• While keeping the number of factors and communalities of Xs fixed!!!
• Rotation does NOT improve fit!
When the number of factors 𝑚 > 1, there is always some inherent ambiguity associated with the
factor model. To see this, let 𝑻 be any 𝑚 × 𝑚 orthogonal matrix, so that 𝑻𝑻′ = 𝑻′𝑻 = 𝑰.
Then the expression in 𝑿 − 𝝁 = 𝑳𝑭 + 𝜺 can be written
𝑿 − 𝝁 = 𝑳𝑻𝑻′ 𝑭 + 𝜺 = 𝑳∗ 𝑭∗ + 𝜺
Where 𝑳∗ = 𝑳𝑻 and 𝑭∗ = 𝑻′ 𝑭
𝐸 𝑭∗ = 𝐸 𝑻′ 𝑭 = 𝑻′ 𝐸 𝑭 = 𝟎
𝐶𝑜𝑣 𝑭∗ = 𝐶𝑜𝑣 𝑻′ 𝑭 = 𝑻′ 𝐶𝑜𝑣 𝑭 𝑻 = 𝑻′ 𝑰𝑻 = 𝑻′ 𝑻 = 𝑰
• That is, the factors 𝑭 and 𝑭∗ = 𝑻′𝑭 have the same statistical properties, and even though the
loadings 𝑳∗ are, in general, different from the loadings 𝑳, they generate the same covariance matrix 𝜮 = 𝑳𝑳′ + 𝚿 = 𝑳∗𝑳∗′ + 𝚿.
• So the loadings 𝑳 and 𝑳∗ give the same representation.
• The communalities are also unaffected by the choice of 𝑻
Types of Rotations:
• Orthogonal Rotation (The uncorrelated common factors are regarded as unit vectors along
perpendicular coordinate axes):
• Quartimax
• Varimax
The varimax procedure selects the orthogonal transformation 𝑻 that makes
V = (1/𝑝) Σⱼ₌₁ᵐ [ Σᵢ₌₁ᵖ l̃*𝑖𝑗⁴ − ( Σᵢ₌₁ᵖ l̃*𝑖𝑗² )² / 𝑝 ]
as large as possible, where l̃*𝑖𝑗 = l̂*𝑖𝑗 / ĥ𝑖 are the rotated loadings scaled by the square roots of the communalities.
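Base R implements this criterion in stats::varimax(). The sketch below, with a made-up loading matrix, also confirms the point above that rotation leaves the communalities unchanged:

```r
# Made-up 4 x 2 loading matrix (illustrative only)
L <- matrix(c(0.8, 0.7, 0.3, 0.2,
              0.2, 0.3, 0.7, 0.8), ncol = 2)

rot   <- varimax(L)          # stats::varimax; Kaiser normalization by default
Lstar <- L %*% rot$rotmat    # rotated loadings L* = L T, with T orthogonal

# communalities h_i^2 = rowSums(L^2) are invariant under orthogonal rotation
max(abs(rowSums(Lstar^2) - rowSums(L^2)))
```

The rotation matrix `rot$rotmat` plays the role of 𝑻 in the slide: 𝑻𝑻′ = 𝑰, so the fit and the communalities are untouched while the loading pattern becomes simpler.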
Factor Scores
• The estimated values of the common factors are called factor scores.
• Factor scores are not estimates of unknown parameters in the usual sense: the factors 𝑭 themselves are random variables.
• An object's score is a weighted combination of its scores on the input variables,
f̂ = 𝑾𝐱
• These weights are NOT the factor loadings!
• Different approaches exist for estimating 𝑾:
• Weighted least squares method
• Ordinary least squares method
• Regression method
• Factor scores are not unique
• Using factor scores instead of factor indicators can reduce measurement error, but does
NOT remove it.
• Therefore, using factor scores as predictors in conventional regressions leads to inconsistent
coefficient estimators!
The Weighted Least Squares Method
Suppose first that the mean vector 𝝁, the factor loadings 𝑳, and the specific variance
𝚿 are known for the factor model
𝑿 − 𝝁 = 𝑳𝑭 + 𝜺
Further, regard the specific factors 𝜺 = (𝜀1 , 𝜀2 , … , 𝜀𝑝 )′ as errors. Since the variances 𝑉𝑎𝑟(𝜀𝑖 ) = 𝜓𝑖 ,
𝑖 = 1, 2, … , 𝑝, need not be equal, Bartlett suggested that weighted least squares be used to
estimate the common factor values.
Bartlett proposed choosing the estimates f̂ of 𝐟 to minimize the sum of squares of the
errors, weighted by the reciprocals of their variances:
Σᵢ₌₁ᵖ 𝜀𝑖²/𝜓𝑖 = 𝜺′𝚿⁻¹𝜺 = (𝐱 − 𝝁 − 𝑳𝐟)′𝚿⁻¹(𝐱 − 𝝁 − 𝑳𝐟)
The solution is
f̂ = (𝑳′𝚿⁻¹𝑳)⁻¹𝑳′𝚿⁻¹(𝐱 − 𝝁)
By the maximum likelihood method we have the estimates L̂, Ψ̂, and μ̂ = 𝐱̄; taking these as the
true values, the factor scores for the 𝑗th case are
f̂𝑗 = (L̂′Ψ̂⁻¹L̂)⁻¹L̂′Ψ̂⁻¹(𝐱𝑗 − 𝐱̄), 𝑗 = 1, 2, …, 𝑛
If rotated loadings L̂∗ = L̂𝑻 are used, then f̂𝑗∗ = 𝑻′f̂𝑗 .
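Bartlett's formula can be transcribed directly into R. The sketch below assumes the loadings and specific variances have already been estimated (here they are simply made up), and the data are simulated:

```r
# Bartlett (weighted least squares) factor scores
# X: n x p data matrix; L: p x m loadings; psi: length-p specific variances
bartlett_scores <- function(X, L, psi) {
  Xc   <- scale(X, center = TRUE, scale = FALSE)   # rows are x_j - xbar
  Pinv <- diag(1 / psi)                            # Psi^{-1}
  W    <- solve(t(L) %*% Pinv %*% L) %*% t(L) %*% Pinv
  t(W %*% t(Xc))                                   # n x m matrix of scores
}

set.seed(42)
L   <- matrix(c(0.9, 0.8, 0.7, 0.6), ncol = 1)     # assumed one-factor loadings
psi <- 1 - rowSums(L^2)                            # assumed specific variances
X   <- matrix(rnorm(50 * 4), ncol = 4)             # toy data, 50 cases
f   <- bartlett_scores(X, L, psi)
```

Because the data are mean-centered, the estimated scores average to zero across cases, as the formula requires.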
The Unweighted (Ordinary) Least Squares Method
Suppose first that the mean vector 𝝁, the factor loadings 𝑳, and the specific variance
𝚿 are known for the factor model
𝑿 − 𝝁 = 𝑳𝑭 + 𝜺
Further, regard the specific factors 𝜺 = (𝜀1 , 𝜀2 , … , 𝜀𝑝 )′ as errors.
• If the factor loadings 𝑳 are estimated by the principal component method, it is customary to
generate factor scores using an unweighted (ordinary) least squares procedure.
• Implicitly, this amounts to assuming that the variances 𝑉𝑎𝑟(𝜀𝑖 ) = 𝜓𝑖 , 𝑖 = 1, 2, … , 𝑝, are equal or
nearly equal.
The factor scores are then
f̂𝑗 = (L̃′L̃)⁻¹L̃′(𝐱𝑗 − 𝐱̄), 𝑗 = 1, 2, …, 𝑛
so the f̂𝑗 are nothing more than the first 𝑚 (scaled) principal components, evaluated at 𝐱𝑗 .
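The same sketch with the unweighted criterion, dropping the 𝚿⁻¹ weights (loadings again made up; in practice L̃ would come from the principal component method):

```r
# OLS factor scores: fhat_j = (L'L)^{-1} L' (x_j - xbar)
ols_scores <- function(X, L) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  t(solve(t(L) %*% L, t(L) %*% t(Xc)))   # n x m matrix of scores
}

set.seed(7)
L <- matrix(c(0.9, 0.8, 0.7, 0.6), ncol = 1)
X <- matrix(rnorm(30 * 4), ncol = 4)
f <- ols_scores(X, L)
```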
The Regression Method
Suppose first that the mean vector 𝝁, the factor loadings 𝑳, and the specific variance
𝚿 are known for the factor model
𝑿 − 𝝁 = 𝑳𝑭 + 𝜺
Further, regard the specific factors 𝜺 = (𝜀1 , 𝜀2 , … , 𝜀𝑝 )′ as errors.
• When the common factors 𝑭 and the specific factors (or errors) 𝜺 are jointly normally
distributed, 𝑿 − 𝝁 = 𝑳𝑭 + 𝜺 and 𝑭 together have an 𝑁𝑚+𝑝 (𝟎, 𝜮∗) distribution, where the leading 𝑝 × 𝑝 block of 𝜮∗ is 𝜮 = 𝑳𝑳′ + 𝚿.
• Given any vector of observations 𝐱𝑗 , and taking the maximum likelihood estimates L̂, Ψ̂,
and μ̂ = 𝐱̄ as the true values, the 𝑗th factor score vector is given by
f̂𝑗 = L̂′Σ̂⁻¹(𝐱𝑗 − 𝐱̄) = L̂′𝑺⁻¹(𝐱𝑗 − 𝐱̄), 𝑗 = 1, 2, …, 𝑛
Or, if the correlation matrix is factored,
f̂𝑗 = L̂𝑧′𝑹⁻¹𝐳𝑗 , 𝑗 = 1, 2, …, 𝑛
where 𝐳𝑗 = 𝑫^(−1/2)(𝐱𝑗 − 𝐱̄), 𝑫^(1/2) is the diagonal matrix of sample standard deviations, and
ρ̂ = L̂𝑧 L̂𝑧′ + Ψ̂𝑧 .
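And a sketch of the regression method, using the sample covariance matrix 𝑺 in place of 𝜮̂ (simulated data, assumed loadings):

```r
# Regression-method factor scores: fhat_j = L' S^{-1} (x_j - xbar)
regression_scores <- function(X, L, S = cov(X)) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  t(t(L) %*% solve(S) %*% t(Xc))   # n x m matrix of scores
}

set.seed(11)
L <- matrix(c(0.9, 0.8, 0.7, 0.6), ncol = 1)
X <- matrix(rnorm(40 * 4), ncol = 4)
f <- regression_scores(X, L)
```

This is the method factanal() uses when scores = "regression" is requested.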
Major Steps in EFA
1. Data collection and preparation
2. Choose number of factors to extract
3. Model fitting (Extracting initial factors)
4. Rotation to a final solution
5. Model diagnosis/refinement
6. Derivation of factor scores to be used in further analysis
(e.g. SEM analysis)
Perspectives and A Strategy for Factor Analysis
• There are many decisions that must be made in any factor analytic study. Probably the most
important decision is the choice of 𝑚, the number of common factors.
• Although a large sample test of the adequacy of a model is available for a given 𝑚, it is
suitable only for data that are approximately normally distributed. Moreover, the test will
most assuredly reject the model for small 𝑚 if the number of variables and observations
is large.
• The final choice of 𝑚 is based on some combination of: (1) the proportion of the sample
variance explained, (2) subject-matter knowledge, and (3) the "reasonableness" of the
results.
• The choice of the solution method and type of rotation is a less crucial decision.
• At the present time, factor analysis still maintains the flavor of an art, and no single strategy has yet proved best in all cases.
Johnson and Wichern (2002) suggest and illustrate one reasonable option:
1. Perform a principal component factor analysis. This method is particularly appropriate for a
first pass through the data. (It is not required that 𝑹 or 𝑺 be nonsingular.)
a) Look for suspicious observations by plotting the factor scores. Also, calculate
standardized scores for each observation and squared distances
b) Try a varimax rotation.
2. Perform a maximum likelihood factor analysis, including a varimax rotation.
3. Compare the solutions obtained from the two factor analyses.
a) Do the loadings group in the same manner?
b) Plot factor scores obtained for principal components against scores from the maximum
likelihood analysis.
4. Repeat the first three steps for other numbers of common factors m. Do extra factors
necessarily contribute to the understanding and interpretation of the data?
5. For large data sets, split them in half and perform a factor analysis on each part.
6. Compare the two results with each other and with that obtained from the complete data set
to check the stability of the solution. (The data might be divided at random or by placing the
first half of the cases in one group and the second half of the cases in the other group.)
See Example 9.14 (Johnson and Wichern, 2002)
• If the loadings on a particular factor agree, the pairs of scores should cluster tightly about
the 45° line through the origin.
• Sets of loadings that do not agree will produce factor scores that deviate from this pattern.
• If the latter occurs, it is usually associated with the last factors and may suggest that the
number of factors is too large. That is, the last factors are not meaningful. This seems to be
the case with the third factor in the chicken-bone data, as indicated by Plot (c) in Figure 9.6.
Plots of pairs of factor scores using estimated loadings from two solution methods are also
good tools for detecting outliers.
• If the sets of loadings for a factor tend to agree, outliers will appear as points in the
neighborhood of the 45° line, but far from the origin and the cluster of the remaining
points.
• It is clear from Plot (b) in Figure 9.6 that one of the 276 observations is not consistent with
the others. It has an unusually large F2-score.
• When this point, [39.1, 39.3, 75.7, 115, 73.4, 69.1], was removed and the analysis repeated,
the loadings were not altered appreciably.
Relationship between factor analysis and principal
component analysis
• Both factor analysis and principal component analysis have the goal of reducing dimensionality.
• The differences are:
• In factor analysis, the variables are expressed as linear combinations of the factors,
whereas the principal components are linear functions of the variables,
• in principal component analysis, the emphasis is on explaining the total variance 𝑠11 + 𝑠22 + ⋯ + 𝑠𝑝𝑝 ,
as contrasted with the attempt to explain the covariances in factor analysis,
• principal component analysis requires essentially no assumptions, while factor analysis
makes several key assumptions,
• the principal components are unique (assuming distinct eigenvalues of 𝑺), whereas the
factors are not unique, being subject to an arbitrary rotation,
• if we change the number of factors, the (estimated) factors change. This does not happen
in principal components
• the calculation of factor scores is not as straightforward as the calculation of principal
component scores.
Application with R
Maximum Likelihood method:
factanal(x, factors = 1, data = NULL, covmat = NULL, n.obs = NA, subset, na.action,
         start = NULL, scores = c("none", "regression", "Bartlett"),
         rotation = "varimax", control = NULL, ...)
Arguments:
x : a formula (with no response variable) or a numeric matrix of the objects.
factors : the number of factors to be estimated.
data : the data frame to use when x is a formula.
covmat : a covariance matrix; a correlation matrix is also accepted (it is the covariance matrix of standardized variables).
n.obs : the number of observations; used when the 'covmat' option supplies a covariance matrix.
subset : a specification of the observations to use; applies when 'x' is given as a data matrix or formula.
na.action : the treatment of missing data; used when 'x' is a formula.
start : default NULL; otherwise a matrix of starting values, each column giving an initial set of uniquenesses.
scores : the method for computing factor scores: "regression" for Thompson's method, "Bartlett" for Bartlett's weighted least squares.
rotation : the type of rotation to apply; the default is "varimax".
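A small worked example on simulated data (the two-factor structure below is made up purely for illustration):

```r
set.seed(123)
n  <- 200
F1 <- rnorm(n); F2 <- rnorm(n)   # two independent common factors
X  <- cbind(0.9 * F1, 0.8 * F1, 0.7 * F1,
            0.8 * F2, 0.7 * F2, 0.6 * F2) +
      matrix(rnorm(6 * n, sd = 0.5), nrow = n)   # add specific errors

fit <- factanal(X, factors = 2, scores = "regression", rotation = "varimax")
print(fit$loadings)   # rotated maximum likelihood loadings
head(fit$scores)      # regression-method factor scores
```

With these loadings, the first three variables should load mainly on one rotated factor and the last three on the other.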
#normality test
library(MVN)
mvn(dataku, mvnTest = "mardia")
See Everitt & Hothorn (2011), p. 148, for an example of exploratory factor analysis: expectations of
life.
Other Package
library(psych)
fa(r, nfactors = 1, n.obs = NA, n.iter = 1, rotate = "oblimin", scores = "regression",
   residuals = FALSE, SMC = TRUE, covar = FALSE, missing = FALSE, impute = "median",
   min.err = 0.001, max.iter = 50, symmetric = TRUE, warnings = TRUE, fm = "minres",
   alpha = .1, p = .05, oblique.scores = FALSE, np.obs = NULL, use = "pairwise",
   cor = "cor", correct = .5, weight = NULL, ...)
r : a correlation or covariance matrix or a raw data matrix. If raw data, the correlation matrix
will be found using pairwise deletion. If covariances are supplied, they will be converted to
correlations unless the covar option is TRUE.
nfactors : number of factors to extract, default is 1
rotate :"none", "varimax", "quartimax", "bentlerT", "equamax", "varimin", "geominT" and "bifactor"
are orthogonal rotations. "Promax", "promax", "oblimin", "simplimax", "bentlerQ, "geominQ"
and "biquartimin" and "cluster" are possible oblique transformations of the solution. The
default is to do an oblimin transformation, although versions prior to 2009 defaulted to
varimax. SPSS seems to do a Kaiser normalization before doing Promax, this is done here by
the call to "promax" which does the normalization before calling Promax in GPArotation
scores the default="regression" finds factor scores using regression. Alternatives for
estimating factor scores include simple regression ("Thurstone"), correlation
preserving ("tenBerge") as well as "Anderson" and "Bartlett" using the appropriate
algorithms (see factor.scores). Although scores="tenBerge" is probably preferred for
most solutions, it will lead to problems with some improper correlation matrices.
fm Factoring method fm="minres" will do a minimum residual as will fm="uls". Both of
these use a first derivative. fm="ols" differs very slightly from "minres" in that it
minimizes the entire residual matrix using an OLS procedure but uses the empirical
first derivative. This will be slower. fm="wls" will do a weighted least squares (WLS)
solution, fm="gls" does a generalized weighted least squares (GLS), fm="pa" will do
the principal factor solution, fm="ml" will do a maximum likelihood factor analysis.
fm="minchi" will minimize the sample size weighted chi square when treating
pairwise correlations with different number of subjects per pair. fm ="minrank" will
do a minimum rank factor analysis. "old.min" will do minimal residual the way it was
done prior to April, 2017 (see discussion below). fm="alpha" will do alpha factor
analysis as described in Kaiser and Coffey (1965)
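For comparison with factanal(), here is a minimal fa() call on a correlation matrix, guarded in case the psych package is not installed; the variable selection from mtcars, and the choices rotate = "varimax" and fm = "ml", are illustrative assumptions, not a recommendation:

```r
if (requireNamespace("psych", quietly = TRUE)) {
  # correlation matrix of six mtcars variables
  R <- cor(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])
  fit <- psych::fa(R, nfactors = 2, n.obs = nrow(mtcars),
                   rotate = "varimax", fm = "ml")
  print(fit$loadings)   # rotated loadings; communalities are in fit$communality
}
```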