Unit - 4: Data Preparation and Analysis
Stages of data preparation:
• Editing
• Coding
• Data entry
• Validation of data
• Classification
• Tabulation
EDITING
Types of editing:
• Field editing
• Central editing
3. The column headings (captions) and the row headings (stubs) of the
table should be clear and brief.
• Qualitative Data Analysis Techniques
• Quantitative Data Analysis Techniques
Quantitative Research
a. Depth Interview
b. Delphi Techniques
c. Focus Group
d. Projective Techniques
Qualitative Data Analysis vs. Quantitative Data Analysis (compared on various bases of difference)
[Figure: map of vacation-preference segments: Relaxation, Adventure, Historical, Distant Vacation, Local Vacation, No Vacation]
Cluster Analysis
[Figure: cluster map of usage variables (Usage 1-10) and reason variables (Reason 1-15) grouped into Clusters 1-4; the two dimensions account for 53.8% and 25.3% of the variance; legend: correlation < 0.50; 2D fit = 79.1%]
Interpretation
• Correspondence analysis plots should be interpreted by looking at points relative to the origin.
  – Points that lie in similar directions are positively associated.
  – Points that lie on opposite sides of the origin are negatively associated.
  – Points that are far from the origin exhibit the strongest associations.
• The results reflect relative associations, not just which rows are highest or lowest overall.
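The geometry behind such plots can be reproduced outside SPSS. The sketch below is a minimal NumPy illustration, using an invented contingency table, of how row and column coordinates in a correspondence analysis map come from the singular value decomposition of the standardized residuals; it is a sketch of the technique, not SPSS's own procedure.

```python
import numpy as np

# Hypothetical brand-by-attribute contingency table (counts invented for illustration).
N = np.array([[20,  5, 10],
              [ 8, 15,  6],
              [ 4,  9, 18]], dtype=float)

P = N / N.sum()                       # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)   # row and column masses
Dr, Dc = np.diag(1 / np.sqrt(r)), np.diag(1 / np.sqrt(c))

# Standardized residuals; their SVD gives the principal axes of the map.
S = Dr @ (P - np.outer(r, c)) @ Dc
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

row_coords = (Dr @ U) * sv            # principal coordinates of the rows
col_coords = (Dc @ Vt.T) * sv         # principal coordinates of the columns
inertia = sv ** 2
print("Share of inertia per dimension:", inertia / inertia.sum())
```

Points (rows or columns) plotted far from the origin and in the same direction are the ones read as strongly and positively associated, exactly as described above.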
Application of SPSS: Factor Analysis
A marketing concern would like to predict the sales of cars from a set of variables. However, many of the variables are correlated, and this might adversely affect the prediction. The variables are vehicle type, price, engine size, fuel capacity, fuel efficiency, wheelbase, horsepower, width and length. Factor analysis with principal component extraction can be used to identify a manageable subset of predictors. The steps to be followed in performing factor analysis, and the interpretation of its output, are discussed below:
From the Data Editor Window
Click on “Analyze”
Click on “Data Reduction”
Click on “Factor...”
The following Factor Analysis dialog box will appear.
Select the variables you want to enter into the factor analysis by double-clicking on them, or use the Shift or Ctrl keys to select them and click the right arrow to move the selected variables to the Variables list on the right. Click Extraction.
Extracting factors and factor rotation:
There is no hard and fast rule to determine the number of factors. A commonly used convention is to retain factors with eigenvalues greater than 1; SPSS selects this number by default. The scree plot may also be used to determine the number of factors.
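As a rough illustration of the eigenvalue-greater-than-1 convention, here is a small Python sketch using scikit-learn; the data matrix is a random placeholder standing in for the standardized vehicle attributes, not the dataset discussed above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Rows = car models; columns = price, engine size, fuel capacity, fuel efficiency,
# wheelbase, horsepower, width, length (placeholder values for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))

Z = StandardScaler().fit_transform(X)       # standardize, like working from the correlation matrix
pca = PCA().fit(Z)

eigenvalues = pca.explained_variance_
n_factors = int((eigenvalues > 1).sum())    # Kaiser criterion: keep eigenvalues > 1
print("Eigenvalues:", np.round(eigenvalues, 2))
print("Factors retained by the eigenvalue > 1 rule:", n_factors)
```

Plotting the eigenvalues in descending order gives the scree plot mentioned above; the "elbow" in that plot is the other common cue for the number of factors.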
Application of SPSS: Cluster Analysis
A car manufacturing concern would like to ascertain the current market for its vehicles. For this it needs to group cars based on the information available regarding various models of vehicles. Information on vehicle type, price, engine size, fuel capacity, fuel efficiency, wheelbase, horsepower, width and length is available. The segmentation can be performed using the Hierarchical Cluster Analysis procedure. The steps are discussed below:
To perform cluster analysis from the menus choose:
Analyze
Classify
Hierarchical Cluster...
Click Plots.
Select Dendrogram.
Select None in the Icicle group.
Click Continue.
Click Method in the Hierarchical Cluster Analysis dialog box.
Select Nearest neighbor as the cluster method.
Select Z scores as the standardization in the Transform Values group.
Click Continue.
Click OK in the Hierarchical Cluster Analysis dialog box.
Interpretation of the output
The output of the cluster analysis is discussed below. The dendrogram is a graphical summary of the cluster solution. Cases are listed along the left vertical axis, and the horizontal axis shows the distance between clusters when they are joined. Parsing the classification tree to determine the number of clusters is a subjective process; generally, one looks for "gaps" between joinings along the horizontal axis. Starting from the right, there is a gap between 20 and 25, which splits the automobiles into two clusters. There is another gap from approximately 4 to 15, which suggests six clusters.
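The same kind of single-linkage (nearest-neighbour) clustering on z-scored attributes can be sketched in Python with SciPy; the vehicle data below is a random placeholder, so the dendrogram it produces is only illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.stats import zscore

# One row per car model; columns = price, engine size, fuel capacity, etc.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))                 # placeholder data for illustration only

Z = zscore(X, axis=0)                        # z-score standardization, as in the SPSS steps
link = linkage(Z, method='single')           # 'single' linkage = nearest-neighbour clustering

dendrogram(link)                             # graphical summary of the cluster solution
plt.xlabel('Cases')
plt.ylabel('Distance at which clusters are joined')
plt.show()

labels = fcluster(link, t=2, criterion='maxclust')   # cut the tree into two clusters
print(labels)
```

Cutting the tree at a large gap along the distance axis corresponds to choosing `t` in `fcluster`: a cut in the 20-25 gap gives two clusters, a cut in the 4-15 gap gives six.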
Application of SPSS: Discriminant Analysis
Using cluster analysis, a telephone company has categorized its customers into four groups, viz. basic service, e-service, plus service and total service. The concern wants to predict group membership so as to customize offers for individual prospective customers. The prediction is to be based on demographic data, viz. gender, age, marital status, income, education, number of years at the current address, years with the current employer, retirement status and number of people in the family. The Discriminant Analysis procedure can be used to classify customers.
The steps are discussed below:
To run the discriminant analysis, from the menus choose:
Analyze-Classify-Discriminant...
Select the grouping variable.
Click Define Range; enter the Minimum and the Maximum.
Click Continue, then click Classify in the Discriminant Analysis dialog box.
Select Summary table and Territorial map.
Click Continue.
Click OK in the Discriminant Analysis dialog box.
These selections produce a discriminant model using the stepwise method of
variable selection.
Interpretation of the Output
The discriminant model produced using the stepwise method of variable selection is discussed below.
Variables in the Analysis
This table displays statistics for the variables that are in the analysis at each step. Tolerance is the proportion of a variable's variance not accounted for by other independent variables in the equation; a variable with very low tolerance contributes little information to a model and can cause computational problems. F to Remove values are useful for describing what happens if a variable is removed from the current model (given that the other variables remain). F to Remove for the entering variable is the same as F to Enter at the previous step (shown in the Variables Not in the Analysis table).
From the Summary of Canonical Discriminant Functions (eigenvalues) table, it can be seen that nearly all of the variance explained by the model is due to the first two discriminant functions. Three functions are fitted automatically, but due to its minuscule eigenvalue, the third function can be ignored. Wilks' lambda confirms that only the first two functions are useful.
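A comparable (though non-stepwise) discriminant model can be sketched with scikit-learn. In the sketch below, the demographic predictors and the four service groups are random placeholders, and scikit-learn's LDA keeps all predictors rather than selecting them stepwise as SPSS does.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# X: demographic predictors (age, income, years at address, ...); y: service group 1-4.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 6))            # placeholder predictors
y = rng.integers(1, 5, size=400)         # placeholder group labels (four groups)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

# With four groups, at most three discriminant functions are fitted; the explained-variance
# ratios play the role of the eigenvalue table in the SPSS output.
print("Variance explained per function:", lda.explained_variance_ratio_)
print("Classification accuracy (summary-table analogue):", lda.score(X_test, y_test))
```

If most of the explained variance sits in the first one or two ratios, the remaining functions can be ignored, mirroring the eigenvalue and Wilks' lambda reading above.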
Application of SPSS: Multiple Regression and Correlation
An automobile concern wants to examine the sales of a variety of personal motor vehicles so as to identify over- and under-performing models. This necessitates establishing a relationship between vehicle sales and vehicle characteristics. Information concerning different makes and models of cars, such as vehicle type, price, engine size, fuel capacity, fuel efficiency, wheelbase, horsepower, width and length, is available. Linear regression can be performed in SPSS to identify models that are not selling well. The steps are discussed below:
To run a linear regression analysis, from the menus choose:
Analyze
Regression
Linear
Select the dependent variable, Select the Independent variables.
Select Stepwise as the entry method, Select the case labeling variable.
Click Statistics
Select Casewise diagnostics and type 2 in the text box.
Click Continue.
Click Plots in the Linear Regression dialog box.
Select the y variable and the x variable.
Select Histogram.
Click Continue.
Click Save in the Linear Regression dialog box.
Select Standardized in the Predicted Values group.
Select Cook's and Leverage values in the Distances group.
Click Continue.
Click OK in the Linear Regression dialog box
Interpretation of output
The collinearity among the variables needs to be verified from the collinearity diagnostics in the output. If the eigenvalues are close to 0, the predictors are highly intercorrelated, and small changes in the data values may lead to large changes in the estimates of the coefficients. Condition index values greater than 15 indicate a possible problem with collinearity; greater than 30, a serious problem.
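The eigenvalue and condition-index logic can be checked by hand. The sketch below uses hypothetical data with one deliberately collinear column, scales the predictor columns to unit length, and computes condition indices; it roughly mirrors what SPSS reports, although SPSS also includes the constant term in its diagnostics.

```python
import numpy as np

# Hypothetical predictor matrix (price, engine size, fuel capacity, ...).
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
X[:, 4] = 0.98 * X[:, 0] + rng.normal(scale=0.05, size=100)   # deliberately collinear column

# Scale each column to unit length, then take the eigenvalues of X'X.
Xs = X / np.linalg.norm(X, axis=0)
eigvals = np.linalg.eigvalsh(Xs.T @ Xs)
cond_index = np.sqrt(eigvals.max() / eigvals)

print("Eigenvalues:", np.round(np.sort(eigvals)[::-1], 4))
print("Condition indices:", np.round(np.sort(cond_index)[::-1], 1))
# Indices above 15 suggest a possible collinearity problem; above 30, a serious one.
```

The manufactured collinearity produces an eigenvalue close to 0 and a correspondingly large condition index, which is exactly the pattern the diagnostics are meant to flag.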
BIVARIATE CORRELATION ANALYSIS
Bivariate statistical techniques:
• Linear Correlation
• Simple Regression
• Two-way ANOVA
Pearson’s correlation coefficient ‘r’ measures the direction and the strength
of the linear association between two numerical paired variables in a
bivariate correlation analysis.
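For instance, r can be computed directly on a small set of hypothetical paired observations (the numbers below are invented for illustration):

```python
from scipy.stats import pearsonr

# Hypothetical paired observations, e.g. advertising spend (x) and sales (y).
x = [2, 4, 5, 7, 8, 10]
y = [30, 45, 52, 60, 68, 80]

r, p_value = pearsonr(x, y)
print(f"Pearson's r = {r:.3f}")   # close to +1: a strong positive linear association
```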
y = a + bx
Y = dependent variable
X = independent variable
where a and b are constants (the intercept and the slope) that completely determine the line.
SIMPLE REGRESSION
The dictionary meaning of the term 'regression' is the act of returning or going back. The term 'regression' was first used by Sir Francis Galton in 1877 while studying the relationship between the heights of fathers and sons.
Regression equation of Y on X (Y = a + bX):
∑Y = Na + b∑X
∑XY = a∑X + b∑X²
Regression equation of X on Y (X = a + bY):
∑X = Na + b∑Y
∑XY = a∑Y + b∑Y²
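Solving the two normal equations for a and b is a small linear system. Here is a minimal NumPy sketch for the Y-on-X case, reusing the hypothetical x, y pairs from the correlation sketch above:

```python
import numpy as np

x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([30, 45, 52, 60, 68, 80], dtype=float)
N = len(x)

# Normal equations for Y on X:  ΣY = Na + bΣX  and  ΣXY = aΣX + bΣX²
A = np.array([[N,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(A, rhs)
print(f"Fitted line: y = {a:.2f} + {b:.2f}x")
```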
ANOVA
For example, the sales of the Hyundai Verna car may be attributed to two factors: different salesmen and different states.
[Figure: decision tree for choosing a multivariate technique, leading to Multiple Regression, Multiple Discriminant Analysis, Multivariate Analysis of Variance, Canonical Analysis, Cluster Analysis, and Multidimensional Scaling]
Regression Analysis
We now take X1 as the dependent variable and try to find out how it moves in response to movements in both X2 and X3, which are the independent variables.
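A minimal Python sketch of this idea, with simulated values of X1, X2 and X3, fits X1 = b0 + b2·X2 + b3·X3 by ordinary least squares:

```python
import numpy as np

# Hypothetical observations of X1 (dependent) and X2, X3 (independent variables).
rng = np.random.default_rng(4)
X2 = rng.normal(size=50)
X3 = rng.normal(size=50)
X1 = 2.0 + 1.5 * X2 - 0.8 * X3 + rng.normal(scale=0.3, size=50)

# Design matrix with an intercept column; least squares recovers b0, b2, b3.
D = np.column_stack([np.ones_like(X2), X2, X3])
coef, *_ = np.linalg.lstsq(D, X1, rcond=None)
print("Intercept and partial regression coefficients:", np.round(coef, 2))
```

The fitted coefficients show the movement of X1 per unit movement in X2 and in X3, holding the other variable constant.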
For example, a factor can be interpreted in terms of the variables that load high on it.
● Principal Component Analysis (PCA)
● Common Factor Analysis (CFA)
Principal Component Analysis
Helps in assessing:
• the image of a company/enterprise,
• the attitudes of sales personnel and customers.
Factor Analysis - Example
Purpose: Customer feedback about a two-wheeler manufactured by a company.