1.1 Introduction To Data Analysis
1. Introduction
In recent years, data analysis has received considerable attention in research. One
likely reason is that empirical evidence provides firm grounds on which to accept or reject
proposed hypotheses. The choice of statistical technique depends on the nature of the
research problem or question and on the nature of the data set.
The research questions posed to address a research gap or problem may concern the
degree of relationship among variables, the significance of group differences, or the
prediction of group membership or structure, or they may be time-related.
To identify associations between two or more variables, correlation and regression or
chi-square techniques may be adopted, depending on whether the data are parametric or
non-parametric. The available techniques include bivariate correlation and regression,
multiple correlation and regression, canonical correlation, Multiple Discriminant Analysis
(MDA), and logit regression. Bivariate correlation is a good starting point for identifying
the degree of relationship between two continuous variables, such as job satisfaction and
family satisfaction, where either may be treated as the dependent variable (DV) or the
independent variable (IV) as the research question requires. Bivariate regression, however,
requires one of them to be defined as the DV and the other as the IV. Although these are not
multivariate techniques, they form the basis of Multivariate Analysis (MVA).
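The difference between bivariate correlation and bivariate regression can be illustrated
with a minimal sketch. The satisfaction scores and all function names below are hypothetical
and are not taken from any study; only the Python standard library is assumed.

```python
from math import sqrt

# Hypothetical satisfaction scores (1-7 scale) for eight respondents.
job_sat = [3.0, 4.0, 5.0, 2.0, 6.0, 5.0, 4.0, 7.0]
family_sat = [2.5, 4.5, 5.0, 3.0, 5.5, 6.0, 3.5, 6.0]


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation: symmetric, so no DV/IV choice is needed."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_regression(iv, dv):
    """Least-squares slope and intercept for dv = intercept + slope * iv."""
    mx, my = mean(iv), mean(dv)
    sxy = sum((a - mx) * (b - my) for a, b in zip(iv, dv))
    sxx = sum((a - mx) ** 2 for a in iv)
    slope = sxy / sxx
    return slope, my - slope * mx


# Correlation gives the same value whichever variable comes first.
r_xy = pearson_r(job_sat, family_sat)
r_yx = pearson_r(family_sat, job_sat)

# Regression depends on which variable is declared the DV.
slope_fs_on_js, intercept_fs_on_js = simple_regression(job_sat, family_sat)
slope_js_on_fs, intercept_js_on_fs = simple_regression(family_sat, job_sat)

print(f"correlation: {r_xy:.3f}")
print(f"FS regressed on JS: slope {slope_fs_on_js:.3f}")
print(f"JS regressed on FS: slope {slope_js_on_fs:.3f}")
```

The correlation is identical in both directions, whereas the two regression slopes differ
because each treats a different variable as the DV.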
Figure 1 shows the hypothesized relationships: family-work conflict (FWC) will have a
negative effect on job satisfaction (JS), while family-work facilitation (FWF) will have a
positive effect on job satisfaction. Similarly, work-family facilitation (WFF) will have a
positive effect on family satisfaction (FS), while work-family conflict (WFC) will have a
negative effect on family satisfaction. Both job and family satisfaction will positively
influence feelings of work-family balance, which in turn will positively influence life
satisfaction (LS). All the above statements are hypotheses; they can be stated conclusively
only if empirical data support them. The use of appropriate statistical methods facilitates
the data analysis needed to arrive at well-grounded inferences and conclusions.
Univariate statistical tests involve one dependent variable. Examples include, but are
not limited to, t-tests of means, analysis of variance (ANOVA), analysis of covariance
(ANCOVA) and simple linear regression (with one dependent and one independent variable).
Having established the importance of data analysis, let us take a quick look at a few
multivariate techniques that we will study in detail during this course.
The next section leads us to the classification of MVA.
2. Classification of MVA
MVA can be classified as Dependence techniques and Interdependence techniques.
2.1 Dependence techniques (used when there are one or more dependent variables and
independent variables, e.g. multiple regression analysis)
Let us presume that previous research has established that cars with higher engine
capacity and higher unladen weight offer lower fuel efficiency (possibly validated using a
correlation analysis). If a researcher wants to predict fuel efficiency from engine capacity
and unladen weight, then fuel efficiency is treated as the dependent variable, while engine
capacity and unladen weight are treated as the independent variables. The researcher
collects data on the fuel efficiency, engine capacity and unladen weight of about 100 cars
or more (all running on the same type of fuel) and could use the multiple regression (MR)
method to predict fuel efficiency. To use the MR method, the dependent variable and the
independent variables (two or more) must be metric data.
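As a minimal sketch of the MR idea described above, the code below fits fuel efficiency on
engine capacity and unladen weight by ordinary least squares, solving the normal equations
with Gaussian elimination. The eight data points, the units (litres, tonnes, km per litre)
and all function names are assumptions made for illustration; a real study would use a
statistics package and about 100 cars or more.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides together.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_ols(predictors, response):
    """Return [intercept, b1, b2, ...] minimising the squared residuals."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, predictor_values):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], predictor_values))


# Hypothetical cars: (engine capacity in litres, unladen weight in tonnes).
cars = [(0.8, 0.70), (1.0, 0.90), (1.2, 0.85), (1.5, 1.10),
        (1.6, 1.00), (2.0, 1.40), (2.2, 1.30), (2.4, 1.60)]
# Hypothetical fuel efficiency in km per litre (the dependent variable).
fuel_eff = [22.8, 20.5, 20.2, 17.2, 17.7, 13.6, 13.3, 10.8]

coefs = fit_ols(cars, fuel_eff)
residuals = [y - predict(coefs, car) for car, y in zip(cars, fuel_eff)]

print("intercept, capacity slope, weight slope:", [round(c, 3) for c in coefs])
print("predicted efficiency for a 1.4 l, 1.05 t car:",
      round(predict(coefs, (1.4, 1.05)), 2))
```

With these made-up numbers both slopes come out negative, in line with the presumed
relationship; on real data the signs and sizes would have to be tested for significance.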
If the dependent variable is dichotomous (e.g. Yes/No, Men/Women), then Multiple
Discriminant Analysis (MDA) is an appropriate technique. The independent variables need to
be metric data. MDA helps to understand group differences and to predict the likelihood
that an observation or object belongs to a specific group. Returning to the MR example,
suppose we had data on the engine capacity and unladen weight of 100 or more cars (all
running on the same type of fuel) and wanted to classify them as Big and Small cars; MDA
would then be a relevant technique.
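A two-group discriminant function for the Big/Small example can be sketched as follows. The
sketch uses Fisher's linear discriminant with a pooled covariance matrix and a midpoint
cutoff, which assumes equal group sizes, equal prior probabilities and a set of cars whose
group is already known. The data, units (litres, tonnes) and function names are
hypothetical.

```python
def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows, centre):
    """Sums of squares and cross-products of two variables around the group centre."""
    sxx = sum((x - centre[0]) ** 2 for x, _ in rows)
    szz = sum((z - centre[1]) ** 2 for _, z in rows)
    sxz = sum((x - centre[0]) * (z - centre[1]) for x, z in rows)
    return sxx, szz, sxz


def fit_two_group_discriminant(group_a, group_b):
    """Return discriminant weights and the cutoff score separating the groups."""
    mean_a, mean_b = column_means(group_a), column_means(group_b)
    sa, sb = scatter(group_a, mean_a), scatter(group_b, mean_b)
    dof = len(group_a) + len(group_b) - 2
    # Pooled within-group covariance matrix [[sxx, sxz], [sxz, szz]].
    sxx, szz, sxz = ((a + b) / dof for a, b in zip(sa, sb))
    det = sxx * szz - sxz ** 2
    dx, dz = mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]
    # weights = inverse(pooled covariance) @ (mean_b - mean_a)
    weights = ((szz * dx - sxz * dz) / det, (sxx * dz - sxz * dx) / det)
    midpoint = [(a + b) / 2 for a, b in zip(mean_a, mean_b)]
    cutoff = weights[0] * midpoint[0] + weights[1] * midpoint[1]
    return weights, cutoff


# Hypothetical cars with known groups: (engine capacity in litres, weight in tonnes).
small_cars = [(0.8, 0.70), (1.0, 0.90), (1.2, 0.85), (1.0, 0.80)]
big_cars = [(2.0, 1.40), (2.2, 1.30), (2.4, 1.60), (1.8, 1.35)]

weights, cutoff = fit_two_group_discriminant(small_cars, big_cars)


def classify(car):
    """Assign a car to the group on whose side of the cutoff its score falls."""
    score = weights[0] * car[0] + weights[1] * car[1]
    return "Big" if score > cutoff else "Small"


print(classify((1.1, 0.80)), classify((2.3, 1.50)))
```

The weights show how strongly each independent variable separates the two groups, and the
cutoff turns the discriminant score into a predicted group membership.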
Continuing the car example, suppose we have data on the engine capacities of about 130
cars, ranging from 799 cc to 2399 cc, and we want to place these cars in three groups,
namely small, medium and large. Cluster analysis would be the recommended technique. The
cluster analysis algorithm places objects in homogeneous groups according to the
characteristics specified by the researcher. In our example, the cars would be grouped by
engine capacity, although clustering can also be based on multiple characteristics. Either
hierarchical or non-hierarchical clustering procedures may be adopted. Hierarchical methods
are either agglomerative or divisive. The algorithms commonly used in hierarchical
clustering are the single, complete and average linkage methods, along with the centroid
and Ward methods. Non-hierarchical clustering, in contrast, commonly follows the k-means
algorithm, which assigns objects to clusters once the number of clusters has been
specified. Whether to adopt the hierarchical or the non-hierarchical procedure depends on
the researcher's choice and on the problem defined.
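The non-hierarchical procedure can be sketched with a one-dimensional k-means run on
engine capacity, with the number of clusters fixed at three. The 18 capacities below are
hypothetical values within the 799 cc to 2399 cc range, and the choice of the minimum,
median and maximum as starting centroids is an assumption made so that the example is
deterministic.

```python
def k_means_1d(values, centroids, max_iter=100):
    """Lloyd's k-means on a single variable; returns final centroids and clusters."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical engine capacities in cc.
capacities = [799, 814, 998, 1086, 1197, 1199, 1248, 1396, 1461,
              1498, 1591, 1598, 1798, 1968, 1997, 2179, 2393, 2399]

ordered = sorted(capacities)
start = [ordered[0], ordered[len(ordered) // 2], ordered[-1]]
centroids, clusters = k_means_1d(capacities, start)

for label, centre, members in zip(["small", "medium", "large"], centroids, clusters):
    print(f"{label}: centroid {centre:.0f} cc, {len(members)} cars")
```

The researcher specifies only the number of clusters; the group boundaries emerge from the
data. Different starting centroids can lead to different groupings, which is one reason to
validate a cluster solution.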
2.2.3 Perceptual Mapping
If we consider two dimensions of a car, namely fuel efficiency and driving comfort, and
we want to know how the brands currently available in the market are positioned in the
minds of car enthusiasts, the right technique is Perceptual Mapping (PM), also known as
Multidimensional Scaling (MDS). MDS helps a researcher determine the perceived relative
image of the objects (cars, in this case) on the two dimensions. In MDS, unlike in factor
or cluster analysis, a solution can be obtained for each respondent and there is no
variate. The researcher chooses between similarity and preference data, between
disaggregate and aggregate analysis, and between compositional and decompositional
methods. Although earlier MDS programs were predominantly non-metric in output,
contemporary programs provide metric output.
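One way to see how a perceptual map arises from similarity data is classical (metric) MDS,
sketched below. This is only one of several MDS variants and is not necessarily the one
used by any particular program. The four brands and their dissimilarity ratings are
hypothetical, and power iteration is used purely so that the sketch needs nothing beyond
the standard library; it assumes the two largest eigenvalues are positive and distinct.

```python
from math import sqrt

brands = ["Brand A", "Brand B", "Brand C", "Brand D"]
# Hypothetical aggregate dissimilarity ratings (symmetric, zero diagonal).
dissimilarity = [
    [0.0, 3.0, 4.0, 5.0],
    [3.0, 0.0, 5.0, 4.0],
    [4.0, 5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
]


def double_center(dist):
    """Convert squared dissimilarities into the inner-product matrix of classical MDS."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals the column means by symmetry
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(p * p for p in product))
        vec = [p / norm for p in product]
    product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(v * p for v, p in zip(vec, product)), vec


def classical_mds(dist, dims=2):
    """Return one coordinate tuple per object in a dims-dimensional map."""
    b = double_center(dist)
    n = len(b)
    axes = []
    for _ in range(dims):
        value, vec = top_eigenpair(b)
        axes.append([sqrt(value) * v for v in vec])
        # Deflate: remove the axis just found so the next largest one surfaces.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return [tuple(axis[i] for axis in axes) for i in range(n)]


coords = classical_mds(dissimilarity)
for brand, point in zip(brands, coords):
    print(brand, tuple(round(p, 2) for p in point))
```

The axes of the resulting map are arbitrary up to rotation and reflection; interpreting
them as, say, fuel efficiency and driving comfort remains the researcher's task.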
If we have non-metric data, such as car colour and car size class (small, medium and
large), and we want to position the cars in a perceptual map, then the technique to adopt
is Correspondence Analysis (CA). It starts with a cross-tabulation of the two attributes,
namely colour and car size; it then carries out a non-metric to metric conversion,
followed by dimension reduction, and finally the perceptual map is prepared. CA is well
suited to a multivariate representation of interdependence for non-metric data.
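The first steps of CA described above can be sketched as follows: the cross-tabulation of
colour by size, and the chi-square distances between observed and expected counts that
provide the non-metric to metric conversion. The sketch stops at the total inertia
(chi-square divided by the sample size) and does not carry out the dimension reduction.
The 60 survey records and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical records: one (colour, size class) pair per car observed.
records = (
    [("white", "small")] * 12 + [("white", "medium")] * 8 + [("white", "large")] * 5
    + [("red", "small")] * 9 + [("red", "medium")] * 4 + [("red", "large")] * 2
    + [("black", "small")] * 3 + [("black", "medium")] * 7 + [("black", "large")] * 10
)


def cross_tabulate(pairs):
    """Build the contingency table of row attribute by column attribute."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in counts})
    col_labels = sorted({c for _, c in counts})
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


def total_inertia(table):
    """Chi-square statistic of the table divided by the grand total."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    return chi_square / n


colours, sizes, table = cross_tabulate(records)
inertia = total_inertia(table)

print("columns:", sizes)
for colour, row in zip(colours, table):
    print(colour, row)
print(f"total inertia: {inertia:.3f}")
```

A total inertia of zero would mean that colour and size are independent, leaving nothing
to map; the larger the inertia, the more association there is for the perceptual map to
display.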
3. Nature of Data
While performing MVA on a research problem, it helps if the researcher observes the
following tips:
1. Ensure that the results are both statistically and practically significant.
2. Use an adequate sample size, neither undersized nor oversized.
3. Understand the nature of the data clearly.
4. Use the minimum number of variables in the model needed to obtain the desired results.
5. Identify and eliminate errors.
6. Ensure a rigorous validation of the results.
I hope the above content gives you a fair idea of the multivariate techniques that we
will cover in our course and a snapshot of their applications. For further learning, may I
also suggest the open courseware by Rudin et al. (2011), titled “Statistical Thinking and
Data Analysis”.
Although I suggested reading the paper by Pattusamy and Jacob (2015) at the beginning
of this discussion, I have used examples relating to cars throughout. If you have
understood how the MVA techniques discussed apply to the variables in the car example, you
should be able to answer a few fundamental questions about data analysis with respect to
the variables in the paper. Here are your challenges.
Self-Assessment:
Suggest appropriate statistical tests to answer the following research questions. It
helps if you also justify your choice of technique.
1. Are men more satisfied with their jobs than women?
2. Does life satisfaction vary with age?
3. Will feelings of work-life balance influence the relationship between job
satisfaction and life satisfaction?
4. Would there be a difference in the strength of the relationship between family
satisfaction and life satisfaction between men and women?
5. Would it be possible to categorize men who are highly and moderately satisfied in
their lives?
References
1. Tabachnick B.G. and Fidell L.S., Using Multivariate Statistics, 6th Edition, Pearson
Education Inc., pp. 612-680.
2. Cynthia Rudin, Allison Chang, and Dimitrios Bisias. 15.075J Statistical Thinking and
Data Analysis. Fall 2011. Massachusetts Institute of Technology: MIT
OpenCourseWare, https://ocw.mit.edu. License: Creative Commons BY-NC-SA.
3. Hair J.F., Black W.C., Babin B.J. and Anderson R.E., Multivariate Data Analysis, 7th
Edition, Pearson Education (South Asia), pp. 89-149.
4. Murugan Pattusamy and Jayanth Jacob, A test of Greenhaus and Allen (2011) model on
Work Family Balance, Current Psychology, Springer, 2015.
5. Zumbo B.D. (2014) Univariate Tests. In: Michalos A.C. (ed.) Encyclopedia of Quality
of Life and Well-Being Research. Springer, Dordrecht.