Ch-5

Business Research Methodology
Chapter Five: Analysis and Interpretation of Data Using SPSS or Other Pertinent Software
Prepared by: Dr. Solomon Getachew
Chapter Five: Outline
• Basic data analysis for quantitative research
• Analyzing quantitative data
• Data preparation
• Data Transformation
• Data analysis using Descriptive Statistics
• Identifying and Dealing with Outliers
• Testing hypothesis in quantitative research
• Cross Tabulation Using Chi Square (χ2) Analysis
• T-test
• Analysis of Variance (ANOVA)
• Examining relationships using correlations and regressions
• Variable Relationships and Covariation
• Correlation analysis
• Regression analysis
• Multicollinearity and multiple Regressions
Processing and Analysis of Data
• Data processing is the process of inspecting, cleaning, transforming, and modelling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making.
• Processing implies editing, coding, classification and tabulation of collected data to make them amenable to analysis.
• It is the collection and manipulation of items of data to produce meaningful information.
• Data entry: Coded data can be entered into a spreadsheet, database, text
PROCESSING OPERATIONS
1. Editing (data cleaning): a process of examining the collected raw data to detect errors and omissions.
Stages of editing:
Field editing: the investigator reviews the reporting forms to complete what was recorded in abbreviated and/or illegible form. It should be done just after the interview.
Central editing: takes place after all forms or schedules have been completed and returned.
PROCESSING OPERATIONS…cont.
2. Coding: the process of converting data into numeric format, that is, assigning numerals to responses. Coding decisions should usually be taken at the questionnaire design stage. Example: Male = 1 and Female = 2.
3. Classification: the process of arranging data in groups or classes on the basis of common characteristics.
(a) Classification according to attributes
(b) Classification according to class-intervals
4. Tabulation: arranging a mass of data in a concise and logical order in columns and rows.
1. It saves space and facilitates comparison and statistical computations.
2. It facilitates the summation of items and the detection of errors and omissions.
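The coding and tabulation steps can be sketched in a few lines of Python. This is only an illustration: the `responses` list is hypothetical, and the codebook simply reuses the Male = 1, Female = 2 example.

```python
from collections import Counter

# Hypothetical raw questionnaire responses (not from the slides).
responses = ["Male", "Female", "Female", "Male", "Female"]

# Coding: assign a numeral to each category, as in the Male=1 / Female=2 example.
codebook = {"Male": 1, "Female": 2}
coded = [codebook[r] for r in responses]

# Tabulation: count how often each code occurs and print one row per category.
frequency = Counter(coded)
for label, code in codebook.items():
    print(f"{label:<8}{code:>3}{frequency[code]:>5}")
```

In SPSS the same idea corresponds to defining value labels for a variable and requesting a frequency table.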
Stage of Data Analysis
Identifying and Dealing with Outliers
• An outlier is an observation that lies an abnormal distance from other values in a random sample from a population.
• It is a value that "lies outside" (is much smaller or larger than) most of the other values in a set of data.
• For example, in the scores 25, 29, 3, 32, 85, 33, 27, 28, both 3 and 85 are "outliers".
Identifying and Dealing with Outliers
The two main types of outlier detection methods are:
• Using the distance and density of data points for outlier detection.
• Building a model to predict the distribution of data points and highlighting outliers that don't meet a user-defined threshold.
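As a hedged illustration of a simple distance-style rule, the sketch below applies the common 1.5 × IQR fence to the scores from the earlier example. The 1.5 multiplier is a widely used convention and an assumption here; the slides do not prescribe a specific threshold.

```python
import statistics

scores = [25, 29, 3, 32, 85, 33, 27, 28]  # scores from the slide example

# Quartiles; statistics.quantiles uses the "exclusive" method by default.
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Assumed rule of thumb: flag values more than 1.5 * IQR beyond the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [s for s in scores if s < lower or s > upper]
print(outliers)
```

With these scores the rule flags 3 and 85, which agrees with the example.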
Data Analysis
Analysis is estimating the values of unknown parameters of the population and testing hypotheses in order to draw inferences.
• Analysis refers to the computation of certain measures along with searching for patterns of relationship or difference; hypotheses about these are subjected to statistical tests of significance.
• It includes coding and tabulation, hypothesis testing, Chi-square test, t-test, and F-test.
Types of Data Analysis
• Descriptive analysis is largely the study of distributions of one or more variables.
• Inferential analysis: causal analysis (regression analysis) studies how one or more variables affect changes in another variable.
Descriptive Analysis/Statistics
• Descriptive statistics is the discipline of quantitatively describing the main
features of a collection of data, or the quantitative description itself.
• Unlike inferential statistics, descriptive statistics aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent.
• Examples – mean, median, standard deviation, etc.

Important statistical measures:
Measures of Central Tendency (mean, median and mode): statistical averages
a. Mean is a relatively stable measure of central tendency, but it is affected by extreme items.
b. Median is the value of the middle item of a series arranged in either ascending or descending order.
It is a positional average and is not frequently used in sampling statistics.
c. Mode is the most commonly or frequently occurring value in a series, i.e. the point of maximum concentration.
It is a positional average and is not affected by the values of extreme items.
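A short Python sketch can make the effect of extreme items concrete. It reuses the scores from the outlier example; the `ratings` series is hypothetical and is there only to show a mode.

```python
import statistics

scores = [25, 29, 3, 32, 85, 33, 27, 28]           # scores from the outlier example
trimmed = [s for s in scores if s not in (3, 85)]  # same scores without 3 and 85

# Compare mean and median with and without the two extreme items.
mean_all, median_all = statistics.mean(scores), statistics.median(scores)
mean_trimmed, median_trimmed = statistics.mean(trimmed), statistics.median(trimmed)
print(mean_all, median_all)
print(mean_trimmed, median_trimmed)

# Hypothetical series with a repeated value to illustrate the mode.
ratings = [2, 3, 3, 4, 3, 5]
mode_value = statistics.mode(ratings)
print(mode_value)
```

Dropping the two extreme items moves the mean from 32.75 to 29, while the median stays at 28.5.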

Important statistical measures:
• Measures of Dispersion (range, mean deviation, standard deviation) measure the scatter of the values.
(a) Range is not stable: it is affected by fluctuations and is used only as a rough measure of variability.
(b) Mean deviation is the average of the absolute differences of the values from some average of the series.
(c) Standard deviation is the square root of the average of the squared deviations from the mean.
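The three measures can be computed directly, as in the minimal sketch below. The `values` series is hypothetical, the mean deviation is taken about the arithmetic mean, and the standard deviation uses the population formula (divide by n), which matches the definition in (c); software such as SPSS typically reports the sample version (divide by n - 1).

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical series, not from the slides
mean = statistics.mean(values)

# Range: largest value minus smallest value.
value_range = max(values) - min(values)

# Mean deviation about the arithmetic mean, ignoring signs.
mean_deviation = sum(abs(v - mean) for v in values) / len(values)

# Square root of the average squared deviation (population formula).
std_dev = statistics.pstdev(values)

print(value_range, mean_deviation, std_dev)
```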
Inferential/Inductive Statistics
• Inferential statistics is concerned with making predictions or inferences about a population from observations and analyses of a sample.
• Inferential statistics uses the results of sample statistics to generalize to, or forecast for, the population from which the sample was drawn.
• Examples: regression analysis, correlation analysis, chi-square tests, F-test, etc.
Correlation analysis studies the joint variation of two or more variables.

Correlation
• Correlation is a measure of the linear relationship between two variables; it may be used to measure the degree of association between them.
• Correlation does not imply causality.
Types of correlation:
1. Pearson correlation (r): the relationship between two variables measured on normally distributed, scale (interval/ratio) data. r lies between -1 and +1. As a rule of thumb, if |r| is more than 0.6, there is said to be a strong correlation between the two variables.
2. Spearman rho (rs): the relationship between two variables measured on ordinal scales.
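The two coefficients can be sketched in plain Python as follows. The helper names (`pearson_r`, `ranks`, `spearman_rho`) and the data are hypothetical; Spearman's rho is computed here as Pearson's r applied to ranks, with tied values given the average of their ranks.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: joint variation scaled by both spreads."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 upward, giving tied values the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman rho is Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3), round(rho, 3))
```

Because y always increases with x, rho equals 1, while r is slightly below 1 because the relationship is not perfectly linear.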

Simple Regression
Linear regression describes the relationship between independent and dependent variables.
• The regression equation is the formula for the trend line, which enables us to predict the dependent variable for any given value of the independent variable.
• The intercept is the point where the regression line crosses the vertical axis. It generally does not provide useful information.
• Simple linear regression is expressed in the form of the linear equation Y = a + bX, where X and Y are the values of the independent and dependent variables, a is the y-intercept, and b is the slope.
• The coefficient of determination, R², indicates the percentage of variance in the dependent variable that is accounted for by variability in the independent variable.
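A minimal least-squares sketch of Y = a + bX and R² is given below. The paired observations are hypothetical, and ordinary least squares is assumed as the fitting method, which is what standard regression procedures use by default.

```python
# Hypothetical paired observations (not from the slides).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

mean_x, mean_y = sum(X) / len(X), sum(Y) / len(Y)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sxx = sum((x - mean_x) ** 2 for x in X)

b = sxy / sxx              # slope
a = mean_y - b * mean_x    # y-intercept

# R² = 1 - (unexplained variation / total variation)
predicted = [a + b * x for x in X]
ss_res = sum((y - p) ** 2 for y, p in zip(Y, predicted))
ss_tot = sum((y - mean_y) ** 2 for y in Y)
r_squared = 1 - ss_res / ss_tot

print(f"Y = {a:.1f} + {b:.1f}X, R² = {r_squared:.2f}")
```

For this data the fitted line is Y = 2.2 + 0.6X with R² = 0.60, so about 60% of the variance in Y is accounted for by X.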
Testing of hypothesis
• A hypothesis is a predictive statement, capable of being tested by scientific methods, that relates an independent variable to some dependent variable.
• Its main function is to suggest new experiments and observations.
• The statistical validity of a conclusion depends on how accurately you accept or reject your hypotheses.
Cross Tabulation Using Chi Square (χ2) Analysis
• Cross tabulation is a powerful technique that helps you to describe the relationships between categorical (nominal or ordinal) variables.
• The Pearson chi-square test essentially tells us whether the results of a crosstab are statistically significant. Its null hypothesis is that the two categorical variables are independent (unrelated) of one another.
Cross Tabulation Using Chi Square (χ2) Analysis
• So basically, the chi-square test plays the role of a correlation test for categorical variables.
• Both correlations and chi-square tests can test for relationships between two variables. However, a correlation is used when you have two quantitative variables, and a chi-square test of independence is used when you have two categorical variables.
• For a chi-square test, a p-value ≤ the significance level indicates there is sufficient evidence to conclude that the observed distribution is not the same as the expected distribution.
• In that case, it can be concluded that a relationship exists between the categorical variables.
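The chi-square statistic for a crosstab can be sketched as below. The 2 x 2 table of counts is hypothetical. The p-value line uses a closed form that holds only for 1 degree of freedom; for larger tables a chi-square table or statistical software would be needed.

```python
import math

# Hypothetical 2 x 2 crosstab of observed counts (rows x columns).
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For df = 1 only, the p-value has a closed form via the normal distribution.
p_value = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None
print(round(chi2, 3), df, p_value)
```

Here the p-value is far below 0.05, so the hypothesis that the two variables are independent would be rejected for this made-up table.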

P-Value and Hypothesis
What exactly is a p value?
• The p value, or probability value, tells you how likely it is that data at least as extreme as yours could have occurred under the null hypothesis. It does this by calculating the likelihood of your test statistic, which is the number calculated by a statistical test using your data.
• For most tests, the null hypothesis is that there is no relationship between your variables of interest or that there is no difference among groups.
• For example, in a two-tailed t test, the null hypothesis is that the difference between two groups is zero.
Examples
Question | Null Hypothesis
Are teens better at math than adults? | Age has no effect on mathematical ability.
Does taking aspirin every day reduce the chance of having a heart attack? | Taking aspirin daily does not affect heart attack risk.
Do teens use cell phones to access the internet more than adults? | Age has no effect on how cell phones are used for internet access.
Do cats care about the color of their food? | Cats express no food preference based on color.
Does chewing willow bark relieve pain? | There is no difference in pain relief after chewing willow bark versus taking a placebo.
T-test
• A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.
• The null hypothesis is that the difference in group means is zero, and the alternate hypothesis is that the difference in group means is different from zero.
• A t-test is appropriate when you've collected a small, random sample from some statistical "population" and want to compare the mean from your sample to another value. The value for comparison could be a fixed value (e.g., 10) or the mean of a second sample.
T-test
• For example, if a teacher wants to compare the heights of male and female students in class 5, she would use the independent two-sample t-test.
t-Test assumptions
• The data are continuous.
• The sample data have been randomly sampled from a population.
• There is homogeneity of variance (i.e., the variability of the data in each group is similar).
• The distribution is approximately normal.
For two-sample t-tests, we must have independent samples. If the samples are not independent, then a paired t-test may be appropriate.
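The independent two-sample t-test from the teacher example can be sketched as follows. The heights are hypothetical, and the pooled-variance (Student) form is used because it matches the homogeneity-of-variance assumption listed above.

```python
import math
import statistics

# Hypothetical heights in cm for two independent groups of students.
males = [150, 152, 148, 155, 151]
females = [145, 147, 149, 144, 146]

n1, n2 = len(males), len(females)
mean_diff = statistics.mean(males) - statistics.mean(females)

# Pooled variance, relying on the homogeneity-of-variance assumption.
pooled_var = ((n1 - 1) * statistics.variance(males)
              + (n2 - 1) * statistics.variance(females)) / (n1 + n2 - 2)
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = mean_diff / std_error
df = n1 + n2 - 2

# 2.306 is the two-tailed 5% critical value of t for df = 8 (from a t table).
print(round(t_stat, 3), df, abs(t_stat) > 2.306)
```

Since the computed t exceeds the critical value, the null hypothesis of equal mean heights would be rejected for this made-up data.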
Z and T-test
• A Z-test is a statistical hypothesis test used to determine whether two sample means are different when the standard deviation is available and the sample is large.
• In contrast, the t-test determines how the averages of different data sets differ when the standard deviation or the variance is unknown.
• A large t-score, or t-value, indicates that the groups are different, while a small t-score indicates that the groups are similar.
• A z-test is used if the population variance is known, or if the sample size is larger than 30 for an unknown population variance.
• If the sample size is less than 30 and the population variance is unknown, we must use a t-test.
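A z statistic and its two-tailed p value can be sketched with Python's standard library. For brevity this illustration uses a one-sample setting with a known population standard deviation rather than two samples, and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical one-sample setting with a known population standard deviation.
sample_mean, hypothesized_mean = 52.0, 50.0
population_sd, n = 10.0, 100

# z = (sample mean - hypothesized mean) / standard error of the mean.
z = (sample_mean - hypothesized_mean) / (population_sd / math.sqrt(n))

# Two-tailed p value: probability of a |z| at least this large under the null.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_value, 4))
```

With z = 2.0 the p value is about 0.0455, so the null hypothesis would be rejected at the 5% significance level.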
ANOVA
• What is Analysis of Variance (ANOVA)? Analysis of Variance (ANOVA) is a statistical technique that compares the means (averages) of different groups by analyzing variances. It is used in a range of scenarios to determine whether there is any difference between the means of different groups.
• ANOVA tells you if the dependent variable changes according to the level of the independent variable. For example: your independent variable is social media use, and you assign groups to low, medium, and high levels of social media use to find out if there is a difference in hours of sleep per night.
T test and ANOVA
• The Student's t test is used to compare the means between two groups, whereas ANOVA is used to compare the means among three or more groups. ANOVA first yields a single overall P value. A significant P value for the ANOVA test indicates that there is at least one pair of groups between which the mean difference is statistically significant.
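A one-way ANOVA F statistic can be sketched for the social media example as follows. The sleep figures are hypothetical; the code computes only F and its degrees of freedom, and the overall P value would come from an F table or statistical software.

```python
import statistics

# Hypothetical hours of sleep per night by level of social media use.
groups = {
    "low": [8, 7, 8, 9],
    "medium": [7, 6, 7, 8],
    "high": [5, 6, 5, 4],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = statistics.mean(all_values)

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups.values())
ss_within = sum((v - statistics.mean(g)) ** 2
                for g in groups.values() for v in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F = mean square between groups / mean square within groups.
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(f_stat, 2), df_between, df_within)
```

For this data F is 14.0 with 2 and 9 degrees of freedom, well above the 5% table value of about 4.26, so at least one pair of group means differs; a post hoc comparison would be needed to say which.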
End!
Thank you!
