Factor Analysis: Meaning of Underlying Variables
A "factor" here is an underlying variable, and "analysis" refers to the examination of such variables. Factor analysis is thus a research technique that attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables.
Factor analysis is a statistical data reduction and analysis technique that strives to explain correlations among multiple outcomes as the result of one or more underlying explanations, or factors. The technique involves data reduction, as it attempts to represent a set of variables by a smaller number of factors.
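To make the idea concrete, here is a minimal sketch in Python, assuming the scikit-learn library and using randomly generated data as a stand-in for real observed variables (all names below are illustrative):

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Two latent factors generate six observed variables plus noise
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.5 * rng.normal(size=(200, 6))

# Represent the six observed variables by two underlying factors
fa = FactorAnalysis(n_components=2)
fa.fit(X)
print(fa.components_.shape)  # (2, 6): each factor's loading on each variable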
Assumption #1: You have multiple variables that should be measured at the continuous level
(although ordinal variables are very frequently used).
Assumption #2: There needs to be a linear relationship between all variables. The reason for this
assumption is that a PCA is based on Pearson correlation coefficients, and as such, there needs to
be a linear relationship between the variables. In practice, this assumption is somewhat relaxed
(even if it shouldn't be) with the use of ordinal data for variables.
Assumption #3: You should have sampling adequacy, which simply means that for PCA to produce a reliable result, large enough sample sizes are required. Many different rules of thumb have been proposed. These mainly differ depending on whether an absolute sample size is proposed or a multiple of the number of variables in your sample is used. Generally speaking, a minimum of 150 cases, or 5 to 10 cases per variable, has been recommended. There are a few methods to detect sampling adequacy: (1) the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy for the overall data set; (2) the KMO measure for each individual variable; and (3) Bartlett's test of sphericity (a code sketch of these checks follows the list of assumptions).
Assumption #4: Your data should be suitable for data reduction. Effectively, you need to have
adequate correlations between the variables in order for variables to be reduced to a smaller
number of components. The method used by SPSS to detect this is Bartlett's test of sphericity.
Assumption #5: There should be no significant outliers. Outliers are important because they can have a disproportionate influence on your results. SPSS recommends treating component scores greater than 3 standard deviations away from the mean as outliers.
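The checks described in Assumptions #2 to #4 can be run before any extraction. A minimal sketch, assuming the third-party factor_analyzer package and synthetic data (illustrative only, not any particular study):

import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity, calculate_kmo)

rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
df = pd.DataFrame(latent @ rng.normal(size=(2, 6))
                  + 0.5 * rng.normal(size=(300, 6)),
                  columns=[f"item{i}" for i in range(1, 7)])

print(df.corr())  # inspect pairwise Pearson correlations (linearity)

# Bartlett's test of sphericity: a significant result suggests the
# correlation matrix is not an identity matrix, i.e. the data are factorable
chi_square, p_value = calculate_bartlett_sphericity(df)

# KMO: per-variable and overall sampling adequacy (values above
# roughly 0.6 are commonly treated as acceptable)
kmo_per_variable, kmo_overall = calculate_kmo(df)
print(p_value, kmo_overall)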
Exploratory factor analysis
Exploratory factor analysis (EFA) is used to identify complex interrelationships among items and group items that are part of unified concepts. The researcher makes no "a priori" assumptions about relationships among factors.
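As an illustrative sketch of an exploratory analysis, again assuming the factor_analyzer package and synthetic data, a varimax-rotated solution might be obtained as follows:

import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(300, 6))

# No a priori structure is imposed; the analysis looks for one
efa = FactorAnalyzer(n_factors=2, rotation="varimax")
efa.fit(X)
print(efa.loadings_)              # item-factor loadings
print(efa.get_factor_variance())  # variance explained per factor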
Confirmatory factor analysis
Confirmatory factor analysis (CFA) is a more complex approach that tests the hypothesis that the items are associated with specific factors. Hypothesized models are tested against actual data, and the analysis demonstrates loadings of observed variables on the latent variables (factors), as well as the correlation between the latent variables.
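One way to fit such a hypothesized model in Python is with a structural equation modeling package such as semopy (a third-party library; the lavaan-style model description and all variable names below are hypothetical):

import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 2))
data = pd.DataFrame(
    np.hstack([latent[:, [0]] @ [[1.0, 0.8, 0.7]],
               latent[:, [1]] @ [[1.0, 0.9, 0.6]]])
    + 0.5 * rng.normal(size=(300, 6)),
    columns=["x1", "x2", "x3", "y1", "y2", "y3"])

# Hypothesis: x-items load on F1, y-items on F2; factors may correlate
desc = """
F1 =~ x1 + x2 + x3
F2 =~ y1 + y2 + y3
"""
model = semopy.Model(desc)
model.fit(data)
print(model.inspect())  # estimated loadings and the F1-F2 covariance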
Types of factoring
There are many methods of factor analysis, namely Principal Factor Analysis, Canonical Factor Analysis, Alpha Factoring, Image Factoring, etc. The most commonly used method is Principal Factor Analysis.
Principal components analysis (PCA, for short) is a variable-reduction technique that shares many similarities with exploratory factor analysis. Its aim is to reduce a larger set of variables into a smaller set of 'artificial' variables, called 'principal components', which account for most of the variance in the original variables.
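A minimal PCA sketch, assuming scikit-learn and synthetic data (the variables are standardized first, since PCA here is based on Pearson correlations):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 8))
X[:, 4:] = X[:, :4] + 0.3 * rng.normal(size=(200, 4))  # induce redundancy

# Standardize, then extract components
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=4).fit(X_std)

# Proportion of total variance captured by each principal component
print(pca.explained_variance_ratio_)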
Factor scores
Factor scores are the scores of each case (row) on each factor (column). Computing factor scores allows one to look for factor outliers. Factor scores may also be used as variables in subsequent modeling.
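Continuing the earlier hypothetical scikit-learn example, factor scores can be computed with transform() and screened against the 3-standard-deviation outlier rule from Assumption #5:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(5)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(200, 6))

fa = FactorAnalysis(n_components=2).fit(X)
scores = fa.transform(X)  # one row per case, one column per factor

# Flag cases whose score on any factor lies more than 3 SD from the mean
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
print(np.where(np.abs(z) > 3)[0])  # candidate factor outliers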
Discriminant Analysis (DA)
DA is used when:
The dependent variable is categorical, with the predictor IVs at interval level, such as age, income, attitudes, perceptions, and years of education, although dummy variables can be used as predictors, as in multiple regression. (Logistic regression IVs, by contrast, can be of any level of measurement.)
There are more than two DV categories, unlike logistic regression, which is limited to a dichotomous dependent variable.
The discriminant function takes the form
D = v1X1 + v2X2 + v3X3 + ... + vpXp + a
where
D = discriminant score
v = the discriminant coefficient (weight) for each predictor
X = the respondent's score on each predictor
a = a constant
This function is similar to a regression equation or function. The v's are unstandardized discriminant coefficients, analogous to the b's in the regression equation. These v's maximize the distance between the group means on the criterion (dependent) variable. Standardized discriminant coefficients can also be used, like beta weights in regression. Good predictors tend to have large weights. What you want this function to do is maximize the distance between the categories, i.e. come up with an equation that has strong discriminatory power between groups. After using an existing set of data to calculate the discriminant function and classify cases, any new cases can then be classified. The number of discriminant functions is one less than the number of groups; there is only one function for the basic two-group discriminant analysis.
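As a sketch of how such a function can be estimated and applied, scikit-learn's LinearDiscriminantAnalysis fits the discriminant function on an existing data set and then classifies new cases (the data below are synthetic and illustrative):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)
# Two groups, three interval-level predictors
X = np.vstack([rng.normal(0, 1, size=(100, 3)),
               rng.normal(1, 1, size=(100, 3))])
y = np.repeat([0, 1], 100)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(lda.coef_)               # unstandardized discriminant coefficients
print(lda.transform(X).shape)  # (200, 1): one function for two groups

new_cases = rng.normal(0.5, 1, size=(5, 3))
print(lda.predict(new_cases))  # classify new cases into the groups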
Purpose of DA
There are several purposes of DA:
To investigate differences between groups on the basis of the attributes of the cases,
indicating which attributes contribute most to group separation. The descriptive technique
successively identifies the linear combination of attributes known as canonical
discriminant functions (equations) which contribute maximally to group separation.
Predictive DA addresses the question of how to assign new cases to groups. The DA
function uses a person’s scores on the predictor variables to predict the category to which
the individual belongs.
To determine the most parsimonious way to distinguish between groups.
To classify cases into groups. Statistical significance tests using chi square enable you to see how well the function separates the groups (see the sketch after this list).
To test theory by determining whether cases are classified as predicted.
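A brief sketch of the classification purpose, cross-tabulating actual against predicted groups and applying a chi-square test to the resulting classification table (continuing the synthetic two-group example; assuming scipy and scikit-learn):

import numpy as np
from scipy.stats import chi2_contingency
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, size=(100, 3)),
               rng.normal(1, 1, size=(100, 3))])
y = np.repeat([0, 1], 100)

pred = LinearDiscriminantAnalysis().fit(X, y).predict(X)

# Classification (confusion) table: actual groups vs predicted groups
table = np.array([[np.sum((y == i) & (pred == j)) for j in (0, 1)]
                  for i in (0, 1)])
chi2, p, dof, _ = chi2_contingency(table)
print(table, chi2, p)  # a small p suggests the function separates the groups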