Unit - 4: Data Preparation and Analysis
Stages of data preparation:
• Editing
• Coding
• Data entry
• Validation of data
• Classification
• Tabulation
EDITING
Types of editing:
• Field editing
• Central editing
3. The column headings (captions) and the row headings (stubs) of the
table should be clear and brief.
• Qualitative Data Analysis Techniques
• Quantitative Data Analysis Techniques
Quantitative Research
a. Depth Interview
b. Delphi Techniques
c. Focus Group
d. Projective Techniques
Qualitative Data Analysis vs. Quantitative Data Analysis (compared on various bases of difference)
[Figure: map of vacation-preference segments: Relaxation, Adventure, Historical, Distant Vacation, Local Vacation, No Vacation]
Cluster Analysis
[Figure: cluster map of usage variables (Usage 1-10) and reason variables (Reason 1-15) grouped into Clusters 1-4; the two dimensions account for 53.8% and 25.3% of the variance; legend: correlation < 0.50; 2D fit = 79.1%]
Interpretation
• Correspondence analysis plots should be interpreted by looking at points relative to the origin.
  – Points that lie in similar directions are positively associated.
  – Points that lie on opposite sides of the origin are negatively associated.
  – Points that are far from the origin exhibit the strongest associations.
• The results reflect relative associations, not just which rows are highest or lowest overall.
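The geometry behind such plots can be reproduced outside SPSS. The sketch below is a minimal NumPy illustration, using an invented contingency table, of how row and column coordinates in a correspondence analysis map come from the singular value decomposition of the standardized residuals; it is a sketch of the technique, not SPSS's own procedure.

```python
import numpy as np

# Hypothetical brand-by-attribute contingency table (counts invented for illustration).
N = np.array([[20,  5, 10],
              [ 8, 15,  6],
              [ 4,  9, 18]], dtype=float)

P = N / N.sum()                       # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)   # row and column masses
Dr, Dc = np.diag(1 / np.sqrt(r)), np.diag(1 / np.sqrt(c))

# Standardized residuals; their SVD gives the principal axes of the map.
S = Dr @ (P - np.outer(r, c)) @ Dc
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

row_coords = (Dr @ U) * sv            # principal coordinates of the rows
col_coords = (Dc @ Vt.T) * sv         # principal coordinates of the columns
inertia = sv ** 2
print("Share of inertia per dimension:", inertia / inertia.sum())
```

Points (rows or columns) plotted far from the origin and in the same direction are the ones read as strongly and positively associated, exactly as described above.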
Application of SPSS: Factor Analysis
A marketing concern would like to predict the sales of cars from a set of variables. However, many of the variables are correlated, and this might adversely affect the prediction. The variables are vehicle type, price, engine size, fuel capacity, fuel efficiency, wheelbase, horsepower, width and length. Factor analysis with principal component extraction can be used to identify a manageable subset of predictors. The steps to be followed in performing factor analysis, and the interpretation of its output, are discussed below:
From the Data Editor Window
Click on “Analyze”
Click on “Data Reduction”
Click on “Factor...”
The following Factor Analysis dialog box will appear.
Select the variables you want to enter into the factor analysis by double-clicking on them, or use the Shift or Ctrl keys to select them and click the right arrow to move the selected variables to the Variables list on the right. Click Extraction.
Extracting factors and factor rotation:
There is no hard and fast rule to determine the number of factors. A commonly used convention is to retain factors with eigenvalues greater than 1; SPSS selects this number by default. The scree plot may also be used to determine the number of factors.
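As a rough illustration of the eigenvalue-greater-than-1 convention, here is a small Python sketch using scikit-learn; the data matrix is a random placeholder standing in for the standardized vehicle attributes, not the dataset discussed above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Rows = car models; columns = price, engine size, fuel capacity, fuel efficiency,
# wheelbase, horsepower, width, length (placeholder values for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))

Z = StandardScaler().fit_transform(X)       # standardize, like working from the correlation matrix
pca = PCA().fit(Z)

eigenvalues = pca.explained_variance_
n_factors = int((eigenvalues > 1).sum())    # Kaiser criterion: keep eigenvalues > 1
print("Eigenvalues:", np.round(eigenvalues, 2))
print("Factors retained by the eigenvalue > 1 rule:", n_factors)
```

Plotting the eigenvalues in descending order gives the scree plot mentioned above; the "elbow" in that plot is the other common cue for the number of factors.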
Application of SPSS: Cluster Analysis
A car manufacturing concern would like to ascertain the current market for its vehicles. For this it needs to group cars based on the information available regarding various models of vehicles. Information on vehicle type, price, engine size, fuel capacity, fuel efficiency, wheelbase, horsepower, width and length is available. The segmentation can be performed using the Hierarchical Cluster Analysis procedure. The steps are discussed below:
To perform cluster analysis from the menus choose:
Analyze
Classify
Hierarchical Cluster...
Click Plots.
Select Dendrogram.
Select None in the Icicle group.
Click Continue.
Click Method in the Hierarchical Cluster Analysis dialog box.
Select Nearest neighbor as the cluster method.
Select Z scores as the standardization in the Transform Values group.
Click Continue.
Click OK in the Hierarchical Cluster Analysis dialog box.
Interpretation of the output
The output of the cluster analysis is discussed below. The dendrogram is a graphical summary of the cluster solution. Cases are listed along the left vertical axis, and the horizontal axis shows the distance between clusters when they are joined. Parsing the classification tree to determine the number of clusters is a subjective process; generally, one looks for "gaps" between joinings along the horizontal axis. Starting from the right, there is a gap between 20 and 25, which splits the automobiles into two clusters. There is another gap from approximately 4 to 15, which suggests six clusters.
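The same kind of single-linkage (nearest-neighbour) clustering on z-scored attributes can be sketched in Python with SciPy; the vehicle data below is a random placeholder, so the dendrogram it produces is only illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.stats import zscore

# One row per car model; columns = price, engine size, fuel capacity, etc.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))                 # placeholder data for illustration only

Z = zscore(X, axis=0)                        # z-score standardization, as in the SPSS steps
link = linkage(Z, method='single')           # 'single' linkage = nearest-neighbour clustering

dendrogram(link)                             # graphical summary of the cluster solution
plt.xlabel('Cases')
plt.ylabel('Distance at which clusters are joined')
plt.show()

labels = fcluster(link, t=2, criterion='maxclust')   # cut the tree into two clusters
print(labels)
```

Cutting the tree at a large gap along the distance axis corresponds to choosing `t` in `fcluster`: a cut in the 20-25 gap gives two clusters, a cut in the 4-15 gap gives six.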
Application of SPSS: Discriminant Analysis
Using cluster analysis, a telephone company has categorized its customers into four groups, viz. basic service, e-service, plus service and total service. The concern wants to predict group membership so as to customize offers for individual prospective customers. The prediction is to be based on demographic data, viz. gender, age, marital status, income, education, number of years at the current address, years with the current employer, retirement status and number of people in the family. The Discriminant Analysis procedure can be used to classify customers.
The steps are discussed below:
To run the discriminant analysis, from the menus choose:
Analyze-Classify-Discriminant...
Select the grouping variable.
Click Define Range; enter the Minimum and the Maximum.
Click Continue, then click Classify in the Discriminant Analysis dialog box.
Select Summary table and Territorial map.
Click Continue.
Click OK in the Discriminant Analysis dialog box.
These selections produce a discriminant model using the stepwise method of
variable selection.
Interpretation of the Output
The discriminant model produced using the stepwise method of variable selection is discussed below.
Variables in the Analysis
This table displays statistics for the variables that are in the analysis at each step. Tolerance is the proportion of a variable's variance not accounted for by other independent variables in the equation; a variable with very low tolerance contributes little information to a model and can cause computational problems. F to Remove values are useful for describing what happens if a variable is removed from the current model (given that the other variables remain). F to Remove for the entering variable is the same as F to Enter at the previous step (shown in the Variables Not in the Analysis table).
From the Summary of Canonical Discriminant Functions (eigenvalues) table, it can be seen that nearly all of the variance explained by the model is due to the first two discriminant functions. Three functions are fitted automatically, but due to its minuscule eigenvalue, the third function can be ignored. Wilks' lambda confirms that only the first two functions are useful.
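A comparable (though non-stepwise) discriminant model can be sketched with scikit-learn. In the sketch below, the demographic predictors and the four service groups are random placeholders, and scikit-learn's LDA keeps all predictors rather than selecting them stepwise as SPSS does.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# X: demographic predictors (age, income, years at address, ...); y: service group 1-4.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 6))            # placeholder predictors
y = rng.integers(1, 5, size=400)         # placeholder group labels (four groups)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

# With four groups, at most three discriminant functions are fitted; the explained-variance
# ratios play the role of the eigenvalue table in the SPSS output.
print("Variance explained per function:", lda.explained_variance_ratio_)
print("Classification accuracy (summary-table analogue):", lda.score(X_test, y_test))
```

If most of the explained variance sits in the first one or two ratios, the remaining functions can be ignored, mirroring the eigenvalue and Wilks' lambda reading above.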
Application of SPSS: Multiple Regression and Correlation
An automobile concern wants to examine the sales of a variety of personal motor vehicles so as to identify over- and under-performing models. This necessitates establishing a relationship between vehicle sales and vehicle characteristics. Information concerning different makes and models of cars, such as vehicle type, price, engine size, fuel capacity, fuel efficiency, wheelbase, horsepower, width and length, is available. Linear regression can be performed in SPSS to identify models that are not selling well. The steps are discussed below:
To run a linear regression analysis, from the menus choose:
Analyze
Regression
Linear
Select the dependent variable, Select the Independent variables.
Select Stepwise as the entry method, Select the case labeling variable.
Click Statistics
Select Casewise diagnostics and type 2 in the text box.
Click Continue.
Click Plots in the Linear Regression dialog box.
Select the y variable and the x variable.
Select Histogram.
Click Continue.
Click Save in the Linear Regression dialog box.
Select Standardized in the Predicted Values group.
Select Cook's and Leverage values in the Distances group.
Click Continue.
Click OK in the Linear Regression dialog box
Interpretation of output
The collinearity among the variables needs to be verified from the collinearity diagnostics in the output. If the eigenvalues are close to 0, the predictors are highly intercorrelated, and small changes in the data values may lead to large changes in the estimates of the coefficients. Condition index values greater than 15 indicate a possible problem with collinearity; greater than 30, a serious problem.
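The eigenvalue and condition-index logic can be checked by hand. The sketch below uses hypothetical data with one deliberately collinear column, scales the predictor columns to unit length, and computes condition indices; it roughly mirrors what SPSS reports, although SPSS also includes the constant term in its diagnostics.

```python
import numpy as np

# Hypothetical predictor matrix (price, engine size, fuel capacity, ...).
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
X[:, 4] = 0.98 * X[:, 0] + rng.normal(scale=0.05, size=100)   # deliberately collinear column

# Scale each column to unit length, then take the eigenvalues of X'X.
Xs = X / np.linalg.norm(X, axis=0)
eigvals = np.linalg.eigvalsh(Xs.T @ Xs)
cond_index = np.sqrt(eigvals.max() / eigvals)

print("Eigenvalues:", np.round(np.sort(eigvals)[::-1], 4))
print("Condition indices:", np.round(np.sort(cond_index)[::-1], 1))
# Indices above 15 suggest a possible collinearity problem; above 30, a serious one.
```

The manufactured collinearity produces an eigenvalue close to 0 and a correspondingly large condition index, which is exactly the pattern the diagnostics are meant to flag.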
BIVARIATE CORRELATION ANALYSIS
Bivariate statistical techniques:
• Linear Correlation
• Simple Regression
• Two-way ANOVA
Pearson’s correlation coefficient ‘r’ measures the direction and the strength
of the linear association between two numerical paired variables in a
bivariate correlation analysis.
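For instance, r can be computed directly on a small set of hypothetical paired observations (the numbers below are invented for illustration):

```python
from scipy.stats import pearsonr

# Hypothetical paired observations, e.g. advertising spend (x) and sales (y).
x = [2, 4, 5, 7, 8, 10]
y = [30, 45, 52, 60, 68, 80]

r, p_value = pearsonr(x, y)
print(f"Pearson's r = {r:.3f}")   # close to +1: a strong positive linear association
```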
y = a + bx
Y = dependent variable
X = independent variable
where a and b are constants (the intercept and the slope) that completely determine the line.
SIMPLE REGRESSION
The dictionary meaning of the term 'regression' is the act of returning or going back. The term 'regression' was first used by Sir Francis Galton in 1877 while studying the relationship between the heights of fathers and sons.
Regression equation of Y on X (Y = a + bX):
∑Y = Na + b∑X
∑XY = a∑X + b∑X²
Regression equation of X on Y (X = a + bY):
∑X = Na + b∑Y
∑XY = a∑Y + b∑Y²
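Solving the two normal equations for a and b is a small linear system. Here is a minimal NumPy sketch for the Y-on-X case, reusing the hypothetical x, y pairs from the correlation sketch above:

```python
import numpy as np

x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([30, 45, 52, 60, 68, 80], dtype=float)
N = len(x)

# Normal equations for Y on X:  ΣY = Na + bΣX  and  ΣXY = aΣX + bΣX²
A = np.array([[N,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(A, rhs)
print(f"Fitted line: y = {a:.2f} + {b:.2f}x")
```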
ANOVA
For example, the sales of the Hyundai Verna car may be attributed to two factors: different salesmen and different states.
[Figure: decision tree for choosing a multivariate technique, leading to Multiple Regression, Multiple Discriminant Analysis, Multivariate Analysis of Variance, Canonical Analysis, Cluster Analysis, and Multidimensional Scaling]
Regression Analysis
We now take X1 as the dependent variable and try to find out how it moves in response to movements in both X2 and X3, which are the independent variables.
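A minimal Python sketch of this idea, with simulated values of X1, X2 and X3, fits X1 = b0 + b2·X2 + b3·X3 by ordinary least squares:

```python
import numpy as np

# Hypothetical observations of X1 (dependent) and X2, X3 (independent variables).
rng = np.random.default_rng(4)
X2 = rng.normal(size=50)
X3 = rng.normal(size=50)
X1 = 2.0 + 1.5 * X2 - 0.8 * X3 + rng.normal(scale=0.3, size=50)

# Design matrix with an intercept column; least squares recovers b0, b2, b3.
D = np.column_stack([np.ones_like(X2), X2, X3])
coef, *_ = np.linalg.lstsq(D, X1, rcond=None)
print("Intercept and partial regression coefficients:", np.round(coef, 2))
```

The fitted coefficients show the movement of X1 per unit movement in X2 and in X3, holding the other variable constant.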
For example, a factor can be interpreted in terms of the variables that load high on it.
● Principal Component Analysis (PCA)
● Common Factor Analysis (CFA)
Principal Component Analysis
Helps in assessing:
• the image of a company/enterprise,
• the attitudes of sales personnel and customers.
Factor Analysis - Example
Purpose: Customer feedback about a two-wheeler manufactured by a company.