Descriptive Statistics, Cross-Tabulation, and Hypothesis Testing

Variable

• Any concept or construct that varies or changes in value

• Main types of variables:
– Dependent variable
– Independent variable
– Moderating variable
– Mediating variable

15-2
(In)dependent Variables

• Dependent variable (DV)
– Is of primary interest to the researcher. The goal of the research project is to understand, predict, or explain the variability of this variable.

• Independent variable (IV)
– Influences the DV in either a positive or a negative way. The variance in the DV is accounted for by the IV.
Example

15-4
• Research studies indicate that successful new
product development has an influence on the stock
market price of the company. That is, the more
successful the new product turns out to be, the
higher will be the stock market price of that firm.

15-5
• Cross‐cultural research indicates that managerial
values govern the power distance between
superiors and subordinates.

15-6
Moderators

• Moderating variable
– A moderator is a qualitative (e.g., gender, race, class) or quantitative (e.g., level of reward) variable that affects the direction and/or strength of the relation between the independent and dependent variables.

• Example

15-7
Mediating Variable

• Mediating variable
– surfaces between the time the
independent variables start operating
to influence the dependent variable
and the time their impact is felt on it.

• Example

15-8
Causal relationship
• Increase in X → increase in Y
• Assumption: X is the cause of Y

• Ice-cream & job satisfaction


• Work environment & job satisfaction

15-9
Hypothesis

• A proposition that is empirically testable. It is an empirical statement concerned with the relationship among variables.

• Good hypothesis:
– Must be adequate for its purpose
– Must be testable

15-10
Exercise

Give the hypotheses for the following framework:

[Framework diagram with three boxes: Service quality, Customer switching, Switching cost]

15-11
Exercise

Give the hypotheses for the following framework:

[Framework diagram with three boxes: Service quality, Customer satisfaction, Customer switching]

15-12
Argumentation

• The expected relationships / hypotheses are an integration of:
– Exploratory research
– Common sense and logical reasoning

15-13
• A store manager observes that the morale of employees in her
supermarket is low. She thinks that if their working conditions
are improved, pay scales raised, and the vacation benefits
made more attractive, the morale will be boosted. She doubts,
however, if an increase in pay scales would raise the morale of
all employees. Her conjecture is that those who have
supplemental incomes will just not be “turned on” by higher
pay, and only those without side incomes will be happy with
increased pay, with a resultant boost in morale.
• List and label the variables in this situation.
• Develop a few hypotheses.

15-14
Levels of Measurement

Scale      Characteristics
Nominal    Classification
Ordinal    Classification, Order
Interval   Classification, Order, Distance
Ratio      Classification, Order, Distance, Natural Origin

15-15
Internet Usage Data
Columns: Respondent Number | Sex | Familiarity | Internet Usage | Attitude Toward Internet | Attitude Toward Technology | Usage of Internet: Shopping | Usage of Internet: Banking
1 1.00 7.00 14.00 7.00 6.00 1.00 1.00
2 2.00 2.00 2.00 3.00 3.00 2.00 2.00
3 2.00 3.00 3.00 4.00 3.00 1.00 2.00
4 2.00 3.00 3.00 7.00 5.00 1.00 2.00
5 1.00 7.00 13.00 7.00 7.00 1.00 1.00
6 2.00 4.00 6.00 5.00 4.00 1.00 2.00
7 2.00 2.00 2.00 4.00 5.00 2.00 2.00
8 2.00 3.00 6.00 5.00 4.00 2.00 2.00
9 2.00 3.00 6.00 6.00 4.00 1.00 2.00
10 1.00 9.00 15.00 7.00 6.00 1.00 2.00
11 2.00 4.00 3.00 4.00 3.00 2.00 2.00
12 2.00 5.00 4.00 6.00 4.00 2.00 2.00
13 1.00 6.00 9.00 6.00 5.00 2.00 1.00
14 1.00 6.00 8.00 3.00 2.00 2.00 2.00
15 1.00 6.00 5.00 5.00 4.00 1.00 2.00
16 2.00 4.00 3.00 4.00 3.00 2.00 2.00
17 1.00 6.00 9.00 5.00 3.00 1.00 1.00
18 1.00 4.00 4.00 5.00 4.00 1.00 2.00
19 1.00 7.00 14.00 6.00 6.00 1.00 1.00
20 2.00 6.00 6.00 6.00 4.00 2.00 2.00
21 1.00 6.00 9.00 4.00 2.00 2.00 2.00
22 1.00 5.00 5.00 5.00 4.00 2.00 1.00
23 2.00 3.00 2.00 4.00 2.00 2.00 2.00
24 1.00 7.00 15.00 6.00 6.00 1.00 1.00
25 2.00 6.00 6.00 5.00 3.00 1.00 2.00
26 1.00 6.00 13.00 6.00 6.00 1.00 1.00
27 2.00 5.00 4.00 5.00 5.00 1.00 1.00
28 2.00 4.00 2.00 3.00 2.00 2.00 2.00
29 1.00 4.00 4.00 5.00 3.00 1.00 2.00
30 1.00 3.00 3.00 7.00 5.00 1.00 2.00

15-16
Frequency Distribution of Familiarity
with the Internet

Value label       Value   Frequency (N)   Percentage   Valid percentage   Cumulative percentage
Not so familiar   1       0               0.0          0.0                0.0
                  2       2               6.7          6.9                6.9
                  3       6               20.0         20.7               27.6
                  4       6               20.0         20.7               48.3
                  5       3               10.0         10.3               58.6
                  6       8               26.7         27.6               86.2
Very familiar     7       4               13.3         13.8               100.0
Missing           9       1               3.3
TOTAL                     30              100.0        100.0
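
The frequency table can be reproduced from the Familiarity column of the Internet usage data. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the code treats the value 9 as the missing-value code, as the table does.

```python
from collections import Counter

# Familiarity ratings for the 30 respondents in the Internet usage data.
# The value 9 is the missing-value code.
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
MISSING = 9

counts = Counter(familiarity)
n_total = len(familiarity)               # all respondents
n_valid = n_total - counts[MISSING]      # respondents with a valid rating

rows = []
cumulative = 0.0
for value in range(1, 8):
    freq = counts[value]
    pct = 100 * freq / n_total           # percentage of all cases
    valid_pct = 100 * freq / n_valid     # percentage of valid cases only
    cumulative += valid_pct              # running total of valid percentages
    rows.append((value, freq, round(pct, 1), round(valid_pct, 1), round(cumulative, 1)))

for row in rows:
    print(row)
```

The valid percentage divides by the 29 valid cases rather than all 30, which is why it differs from the plain percentage column.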

15-17
Frequency Histogram

[Figure: frequency histogram of the familiarity ratings; x-axis: Familiarity (2 to 7), y-axis: Frequency (0 to 8)]
Statistics Associated with Frequency
Distribution Measures of Location
• The mean, or average value, is the most commonly used measure of central tendency. The mean, $\bar{X}$, is given by

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

where
$X_i$ = observed values of the variable X
n = number of observations (sample size)

• The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.

• The median of a sample is the middle value when the data are arranged in ascending or descending order. If the number of data points is even, the median is usually estimated as the midpoint between the two middle values, that is, by adding the two middle values and dividing their sum by 2. The median is the 50th percentile.
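
As an illustration, the three measures of location can be computed for the familiarity ratings with Python's `statistics` module. This is a sketch, not part of the original slides; it drops the missing-value code 9 before computing anything.

```python
from statistics import mean, median, mode

# Familiarity ratings from the Internet usage data (9 = missing).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
valid = [x for x in familiarity if x != 9]   # 29 valid observations

mean_value = mean(valid)      # sum of the values divided by n
median_value = median(valid)  # middle value of the sorted data
mode_value = mode(valid)      # most frequent value

print(round(mean_value, 3), median_value, mode_value)
```

With 29 valid observations the median is simply the 15th value of the sorted data, so no averaging of two middle values is needed here.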

15-20
Statistics Associated with Frequency
Distribution Measures of Variability

• The range measures the spread of the data. It is simply the difference between the largest and smallest values in the sample: Range = $X_{largest} - X_{smallest}$.
• The interquartile range is the difference between the 75th and 25th percentiles. For a set of data points arranged in order of magnitude, the pth percentile is the value that has p% of the data points below it and (100 − p)% above it.
Statistics Associated with Frequency Distribution
Measures of Variability
• The variance is the mean squared deviation from the mean. The variance can never be negative.
• The standard deviation is the square root of the variance.

$$s_x = \sqrt{\sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}}$$

• The coefficient of variation is the ratio of the standard deviation to the mean expressed as a percentage, and is a unitless measure of relative variability.

$$CV = s_x / \bar{X}$$
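
The measures of variability can be sketched for the familiarity ratings as follows. This is an illustrative computation with the standard library; note that `statistics.quantiles` uses one particular percentile convention (its default "exclusive" method), and other software may interpolate percentiles differently.

```python
from statistics import mean, quantiles, stdev, variance

# Familiarity ratings from the Internet usage data (9 = missing).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
valid = [x for x in familiarity if x != 9]

value_range = max(valid) - min(valid)     # largest minus smallest value

q1, q2, q3 = quantiles(valid, n=4)        # 25th, 50th, 75th percentiles
iqr = q3 - q1                             # interquartile range

sample_variance = variance(valid)         # divides by n - 1
std_dev = stdev(valid)                    # square root of the variance
cv_percent = 100 * std_dev / mean(valid)  # coefficient of variation in %

print(value_range, iqr, round(sample_variance, 3), round(std_dev, 3), round(cv_percent, 1))
```

The standard deviation of about 1.579 is the value used later in the one-sample t test.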
Statistics Associated with Frequency
Distribution Measures of Shape
• Skewness. The tendency of the deviations from the
mean to be larger in one direction than in the other. It
can be thought of as the tendency for one tail of the
distribution to be heavier than the other.
• Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution. Measured relative to the normal curve (excess kurtosis), the kurtosis of a normal distribution is zero. If the kurtosis is positive, then the distribution is more peaked than a normal distribution. A negative value means that the distribution is flatter than a normal distribution.
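
Skewness and kurtosis can be sketched from the central moments of the data. The code below uses the simple population-moment formulas (an assumption made for illustration); statistical packages usually apply small-sample adjustments, so their reported values differ slightly, but the signs agree.

```python
from statistics import mean

# Familiarity ratings from the Internet usage data (9 = missing).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
valid = [x for x in familiarity if x != 9]
n = len(valid)
x_bar = mean(valid)

def central_moment(k):
    """Average of the k-th power of the deviations from the mean."""
    return sum((x - x_bar) ** k for x in valid) / n

m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)

skewness = m3 / m2 ** 1.5          # 0 for a symmetric distribution
excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution

print(round(skewness, 3), round(excess_kurtosis, 3))
```

Both values come out negative for these ratings: the distribution has a slightly heavier left tail and is flatter than a normal curve.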
Skewness of a Distribution
[Figure: (a) symmetric distribution, in which the mean, median, and mode coincide; (b) skewed distribution, in which the mean, median, and mode fall at different points]
Steps Involved in Hypothesis Testing
Step 1: Formulate H0 and H1
Step 2: Select Appropriate Test
Step 3: Choose Level of Significance, α
Step 4: Collect Data and Calculate Test Statistic
Step 5: Determine Probability Associated with Test Statistic, or Determine Critical Value of Test Statistic (TSCR)
Step 6: Compare Probability with Level of Significance, α, or Determine if the Calculated Test Statistic (TSCAL) Falls into the (Non) Rejection Region
Step 7: Reject or Do Not Reject H0
Step 8: Draw Conclusion
A General Procedure for Hypothesis Testing
Step 1: Formulate the Hypothesis
• A null hypothesis is a statement of the status
quo, one of no difference or no effect. If the
null hypothesis is not rejected, no changes will
be made.
• An alternative hypothesis is one in which
some difference or effect is expected.
Accepting the alternative hypothesis will lead
to changes in opinions or actions.
• The null hypothesis refers to a specified value of the population parameter (e.g., μ, σ, π), not a sample statistic (e.g., $\bar{X}$).
• A null hypothesis may be rejected, but it can
never be accepted based on a single test. In
classical hypothesis testing, there is no way to
determine whether the null hypothesis is true.
• In marketing research, the null hypothesis is
formulated in such a way that its rejection leads
to the acceptance of the desired conclusion.
The alternative hypothesis represents the
conclusion for which evidence is sought.
H0: π ≤ 0.40
H1: π > 0.40

• The test of the null hypothesis is a one-tailed test, because the alternative hypothesis is expressed directionally. If that is not the case, then a two-tailed test would be required, and the hypotheses would be expressed as:
H0: π = 0.40
H1: π ≠ 0.40
A General Procedure for Hypothesis Testing
Step 2: Select an Appropriate Test
• The test statistic measures how close the sample
has come to the null hypothesis.
• The test statistic often follows a well-known
distribution, such as the normal, t, or chi-square
distribution.
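• In our example, the z statistic, which follows the standard normal distribution, would be appropriate:

$$z = \frac{p - \pi}{\sigma_p}, \qquad \text{where} \qquad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

Here p is the sample proportion and π is the population proportion specified by the null hypothesis.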
A General Procedure for Hypothesis Testing
Step 3: Choose a Level of Significance
Type I Error
• Type I error occurs when the sample results lead to
the rejection of the null hypothesis when it is in fact
true.
• The probability of type I error (α) is also called the level of significance.

Type II Error
• Type II error occurs when, based on the sample results, the null hypothesis is not rejected when it is in fact false.
• The probability of type II error is denoted by β.
• Unlike α, which is specified by the researcher, the magnitude of β depends on the actual value of the population parameter (proportion).
Probability of z with a One-Tailed Test

[Figure: standard normal curve centered at 0; the shaded area to the left of z = 1.88 is 0.9699 and the unshaded area to the right is 0.0301]
A General Procedure for Hypothesis Testing
Step 4: Collect Data and Calculate Test Statistic
• The required data are collected and the
value of the test statistic computed.
• In our example, the value of the sample proportion is p = 17/30 = 0.567.
• The value of $\sigma_p$ can be determined as follows:

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{(0.40)(0.60)}{30}} = 0.089$$
A General Procedure for Hypothesis Testing
Step 4: Collect Data and Calculate Test Statistic

The test statistic z can be calculated as follows:

$$z = \frac{p - \pi}{\sigma_p} = \frac{0.567 - 0.40}{0.089} = 1.88$$
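
The calculation can be checked with a short script. This is a sketch using `statistics.NormalDist` for the standard normal distribution; the variable names are illustrative. Because the script keeps full precision instead of the rounded intermediate values 0.567 and 0.089, it gives z of roughly 1.86 and a one-tailed probability of roughly 0.031, slightly different from the 1.88 and 0.0301 quoted in the slides. The decision at the 0.05 level is the same either way.

```python
from math import sqrt
from statistics import NormalDist

n = 30            # sample size
successes = 17    # respondents who shop via the Internet
pi_0 = 0.40       # population proportion under the null hypothesis

p_hat = successes / n                    # sample proportion
sigma_p = sqrt(pi_0 * (1 - pi_0) / n)    # standard error under H0
z = (p_hat - pi_0) / sigma_p             # test statistic

# One-tailed probability: area to the right of z under the standard normal curve.
p_value = 1 - NormalDist().cdf(z)

print(round(p_hat, 3), round(sigma_p, 3), round(z, 2), round(p_value, 4))
```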

15-33
A General Procedure for Hypothesis Testing
Step 5: Determine the Probability (Critical Value)
• Using standard normal tables, the probability of obtaining a z value of 1.88 can be calculated.
• The shaded area between −∞ and 1.88 is 0.9699. Therefore, the area to the right of z = 1.88 is 1.0000 − 0.9699 = 0.0301.
• Alternatively, the critical value of z, which will give an area to the right side of the critical value of 0.05, is between 1.64 and 1.65 and equals 1.645.
• Note that, in determining the critical value of the test statistic, the area to the right of the critical value is either α or α/2.
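
Both routes, the probability and the critical value, can be sketched with `statistics.NormalDist` instead of a printed table. The value z = 1.88 is taken from the example as reported; the other names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()   # mean 0, standard deviation 1

# Critical values: the area to the right is alpha (one-tailed) or alpha/2 (two-tailed).
z_crit_one_tailed = std_normal.inv_cdf(1 - alpha)
z_crit_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

z_calculated = 1.88                              # test statistic reported in the example
area_right = 1 - std_normal.cdf(z_calculated)    # probability of a larger z value

# The two decision rules compare in opposite directions but agree.
reject_by_probability = area_right < alpha
reject_by_critical_value = z_calculated > z_crit_one_tailed

print(round(z_crit_one_tailed, 3), round(z_crit_two_tailed, 3), round(area_right, 4))
print(reject_by_probability, reject_by_critical_value)
```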

15-34
A General Procedure for Hypothesis Testing
Steps 6 & 7: Compare the Probability (Critical Value) and Make the Decision
• If the probability associated with the calculated or observed value of the test statistic (TSCAL) is less than the level of significance (α), the null hypothesis is rejected.
• The probability associated with the calculated or observed value of the test statistic is 0.0301. This is the probability of getting a sample proportion of 0.567 or higher when π = 0.40. This is less than the level of significance of 0.05. Hence, the null hypothesis is rejected.
• Alternatively, if the calculated value of the test statistic is greater than the critical value of the test statistic (TSCR), the null hypothesis is rejected.
• The calculated value of the test statistic z = 1.88 lies in the rejection region, beyond the value of 1.645. Again, the same conclusion to reject the null hypothesis is reached.

• Note that the two ways of testing the null hypothesis are equivalent but mathematically opposite in the direction of comparison.

• If the probability of TSCAL < significance level (α), then reject H0; equivalently, if TSCAL > TSCR, then reject H0.
A General Procedure for Hypothesis Testing
Step 8: Conclusion

• The conclusion reached by hypothesis testing must be expressed in terms of the problem.

• In our example, we conclude that there is evidence that the proportion of Internet users who shop via the Internet is significantly greater than 0.40. Hence, the recommendation to the department store would be to introduce the new Internet shopping service.

15-37
A Broad Classification of
Hypothesis Tests
Hypothesis Tests
– Tests of Association
– Tests of Differences
  – Distributions
  – Means
  – Proportions
  – Medians/Rankings

15-38
Cross-Tabulation

• While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously.

• Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values, e.g., Table 15.3.

15-39
Gender and Internet Usage
                   Gender
Internet Usage     Male    Female    Row Total
Light (1)          5       10        15
Heavy (2)          10      5         15
Column Total       15      15
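
The cross-tabulation can be rebuilt from the Sex and Internet Usage columns of the raw data. The coding used below is an assumption, since the slides do not state it: sex 1 = male and 2 = female, and usage of 5 hours or less counts as light, more than 5 as heavy. Under that assumption the counts match the table above.

```python
from collections import Counter

# Sex and Internet usage (hours) for the 30 respondents, in respondent order.
sex = [1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
       2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1]
usage_hours = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
               3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]

table = Counter()
for s, hours in zip(sex, usage_hours):
    gender = "Male" if s == 1 else "Female"       # assumed coding
    usage = "Light" if hours <= 5 else "Heavy"    # assumed cut-off
    table[(usage, gender)] += 1

# Column percentages: each gender's usage split, summing to 100% per column.
column_totals = {g: table[("Light", g)] + table[("Heavy", g)] for g in ("Male", "Female")}
column_pct = {
    (u, g): round(100 * table[(u, g)] / column_totals[g], 1)
    for u in ("Light", "Heavy")
    for g in ("Male", "Female")
}

for usage in ("Light", "Heavy"):
    print(usage, table[(usage, "Male")], table[(usage, "Female")])
print(column_pct)
```

Percentages should be computed in the direction of the independent variable, here gender, which is why column totals are used as the base.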

15-40
Internet Usage by Gender

Gender

Internet Usage Male Female

Light 33.3% 66.7%

Heavy 66.7% 33.3%

Column total 100% 100%

15-41
Gender by Internet Usage
Internet Usage

Gender Light Heavy Total

Male 33.3% 66.7% 100.0%

Female 66.7% 33.3% 100.0%

15-42
Introduction of a Third Variable in
Cross-Tabulation
Original Two Variables
– Some association between the two variables
– No association between the two variables

Introduce a third variable. Possible outcomes:
– Refined association between the two variables
– No association between the two variables
– No change in the initial pattern
– Some association between the two variables

Purchase of Fashion Clothing by
Marital Status

Purchase of Fashion       Current Marital Status
Clothing                  Married    Unmarried
High                      31%        52%
Low                       69%        48%
Column totals             100%       100%
Number of respondents     700        300

15-44
                          Sex
Purchase of Fashion       Male                      Female
Clothing                  Married    Not Married    Married    Not Married
High                      35%        40%            25%        60%
Low                       65%        60%            75%        40%
Column totals             100%       100%           100%       100%
Number of cases           400        120            300        180

15-45
Ownership of Expensive Automobiles by Education Level

Own Expensive         Education
Automobile            College Degree    No College Degree
Yes                   32%               21%
No                    68%               79%
Column totals         100%              100%
Number of cases       250               750

15-46
                          Income
Own Expensive             Low Income                           High Income
Automobile                College Degree   No College Degree   College Degree   No College Degree
Yes                       20%              20%                 40%              40%
No                        80%              80%                 60%              60%
Column totals             100%             100%                100%             100%
Number of respondents     100              700                 150              50
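
The two automobile tables are arithmetically consistent: pooling the income groups of the three-variable table reproduces the 32% and 21% of the two-variable table. The sketch below shows this with the figures from the tables; the function and variable names are illustrative.

```python
# (number of respondents, share owning an expensive automobile) for each cell
# of the three-variable table.
cells = {
    ("Low income", "College degree"): (100, 0.20),
    ("Low income", "No college degree"): (700, 0.20),
    ("High income", "College degree"): (150, 0.40),
    ("High income", "No college degree"): (50, 0.40),
}

def pooled_rate(education):
    """Ownership rate for one education level with the income groups combined."""
    owners = sum(n * rate for (_, edu), (n, rate) in cells.items() if edu == education)
    total = sum(n for (_, edu), (n, _) in cells.items() if edu == education)
    return owners / total

college = pooled_rate("College degree")        # (20 + 60) / 250
no_college = pooled_rate("No college degree")  # (140 + 20) / 750

print(round(college, 2), round(no_college, 2))
```

Within each income group the ownership rate is identical for both education levels. The apparent effect of education arises only because college graduates are concentrated in the high-income group, so the initial association is spurious.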

15-47
Three Variables Cross-Tabulation
Reveal Suppressed Association

15-48
Desire to Travel Abroad by Age

Desire to Travel Abroad Age

Less than 45 45 or More

Yes 50% 50%

No 50% 50%

Column totals 100% 100%

Number of respondents 500 500

15-49
Desire to Travel Abroad by
Age and Gender
                      Sex
Desire to Travel      Male               Female
Abroad                Age                Age
                      < 45     >= 45     < 45     >= 45
Yes                   60%      40%       35%      65%
No                    40%      60%       65%      35%
Column totals         100%     100%      100%     100%
Number of cases       300      300       200      200
Three Variables Cross-Tabulations
No Change in Initial Relationship

15-51
Eating Frequently in
Fast-Food Restaurants by Family Size
Eat Frequently in Fast- Family Size
Food Restaurants
Small Large

Yes 65% 65%

No 35% 35%

Column totals 100% 100%

Number of cases 500 500

15-52
Eating Frequently in Fast-Food Restaurants
by Family Size and Income
                          Income
Eat Frequently in         Low                  High
Fast-Food Restaurants     Family size          Family size
                          Small     Large      Small     Large
Yes                       65%       65%        65%       65%
No                        35%       35%        35%       35%
Column totals             100%      100%       100%      100%
Number of respondents     250       250        250       250

15-53
Statistics Associated with
Cross-Tabulation Chi-Square
• To determine whether a systematic association exists, the probability of obtaining a value of chi-square as large or larger than the one calculated from the cross-tabulation is estimated.

• An important characteristic of the chi-square statistic is the number of degrees of freedom (df) associated with it: df = (r − 1) × (c − 1), where r is the number of rows and c is the number of columns.

• The null hypothesis (H0) of no association between the two variables will be rejected only when the calculated value of the test statistic is greater than the critical value of the chi-square distribution with the appropriate degrees of freedom.

15-54
Chi-square Distribution

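[Figure: chi-square distribution; on the χ² axis, the "Do Not Reject H0" region lies to the left of the critical value and the "Reject H0" region to the right]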
Statistics Associated with
Cross-Tabulation Chi-Square

• The chi-square statistic (χ²) is used to test the statistical significance of the observed association in a cross-tabulation.
• The expected frequency for each cell can be calculated by using a simple formula:

$$f_e = \frac{n_r n_c}{n}$$

where
$n_r$ = total number in the row
$n_c$ = total number in the column
n = total sample size
15-56
Statistics Associated with
Cross-Tabulation Chi-Square
For the data in Table 15.3, the expected frequencies for the cells, going from left to right and from top to bottom, are:

(15 × 15)/30 = 7.50    (15 × 15)/30 = 7.50
(15 × 15)/30 = 7.50    (15 × 15)/30 = 7.50

Then the value of χ² is calculated as follows:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ is the observed frequency and $f_e$ the expected frequency of a cell.
Statistics Associated with
Cross-Tabulation Chi-Square
For the Internet usage data, the value of χ² is calculated as:

$$\chi^2 = \frac{(5 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(5 - 7.5)^2}{7.5}$$

= 0.833 + 0.833 + 0.833 + 0.833

= 3.333
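
The same calculation can be sketched in code for any r × c table of observed counts. The example below applies it to the gender by Internet usage table; the variable names are illustrative.

```python
# Observed frequencies: rows are Light and Heavy usage, columns are Male and Female.
observed = [[5, 10],
            [10, 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency of a cell: (row total * column total) / sample size.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (f_o - f_e) ** 2 / f_e
    for obs_row, exp_row in zip(observed, expected)
    for f_o, f_e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi_square, 3), df)
```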
15-58
Statistics Associated with
Cross-Tabulation Chi-Square

• The chi-square distribution is a skewed distribution whose shape depends solely on the number of degrees of freedom. As the number of degrees of freedom increases, the chi-square distribution becomes more symmetrical.
• For the cross-tabulation given, there are (2 − 1) × (2 − 1) = 1 degree of freedom. The calculated chi-square statistic had a value of 3.333. Since this is less than the critical value of 3.841, the null hypothesis of no association cannot be rejected, indicating that the association is not statistically significant at the 0.05 level.
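
For the special case of 1 degree of freedom, the critical value and the probability can be obtained without a chi-square table, because a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The sketch below relies on that relationship and therefore applies only to df = 1; it is an illustration, not the table lookup the slides refer to.

```python
from math import erfc, sqrt
from statistics import NormalDist

alpha = 0.05
chi_square = 3.333   # calculated value for the gender by Internet usage table

# With df = 1, the upper-alpha critical value is the square of the
# two-tailed standard normal critical value.
critical_value = NormalDist().inv_cdf(1 - alpha / 2) ** 2

# Probability of a chi-square value this large or larger (df = 1 only).
p_value = erfc(sqrt(chi_square / 2))

reject_h0 = chi_square > critical_value

print(round(critical_value, 3), round(p_value, 3), reject_h0)
```

The probability is above 0.05 and the statistic is below the critical value, so both rules lead to the same decision not to reject H0.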

15-59
Hypothesis Testing Related to Differences
• Parametric tests assume that the variables of interest are measured
on at least an interval scale.
• Nonparametric tests assume that the variables are measured on a
nominal or ordinal scale.
• These tests can be further classified based on whether one or two
or more samples are involved.
• The samples are independent if they are drawn randomly from
different populations. For the purpose of analysis, data pertaining
to different groups of respondents, e.g., males and females, are
generally treated as independent samples.
• The samples are paired when the data for the two samples relate
to the same group of respondents.

15-60
A Classification of Hypothesis Testing
Procedures for Examining Differences
Hypothesis Tests
– Parametric Tests (Metric Tests)
  – One Sample
    * t test
    * Z test
  – Two or More Samples
    – Independent Samples
      * Two-Group t test
      * Z test
    – Paired Samples
      * Paired t test
– Non-parametric Tests (Nonmetric Tests)
  – One Sample
    * Chi-Square
    * K-S
    * Runs
    * Binomial
  – Two or More Samples
    – Independent Samples
      * Chi-Square
      * Mann-Whitney
      * Median
      * K-S
    – Paired Samples
      * Sign
      * Wilcoxon
      * McNemar
      * Chi-Square

Parametric Tests
• The t statistic assumes that the variable is normally distributed and the mean is known (or assumed to be known) and the population variance is estimated from the sample.
• Assume that the random variable X is normally distributed, with mean μ and unknown population variance σ² that is estimated by the sample variance s².
• Then, $t = (\bar{X} - \mu)/s_{\bar{X}}$ is t distributed with n − 1 degrees of freedom.
• The t distribution is similar to the normal distribution in
appearance. Both distributions are bell-shaped and
symmetric. As the number of degrees of freedom increases,
the t distribution approaches the normal distribution.

15-62
One Sample : t Test
For the Internet usage data, suppose we wanted to test the hypothesis that the mean familiarity rating exceeds 4.0, the neutral value on a 7-point scale. A significance level of α = 0.05 is selected. The hypotheses may be formulated as:
H0: μ ≤ 4.0
H1: μ > 4.0

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}, \qquad s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

$$s_{\bar{X}} = \frac{1.579}{\sqrt{29}} = \frac{1.579}{5.385} = 0.293$$

$$t = \frac{4.724 - 4.0}{0.293} = \frac{0.724}{0.293} = 2.471$$
One Sample : t Test

The degrees of freedom for the t statistic to test the hypothesis about one mean are n − 1. In this case, n − 1 = 29 − 1, or 28. From Table 4 in the Statistical Appendix, the probability of getting a more extreme value than 2.471 is less than 0.05. (Alternatively, the critical t value for 28 degrees of freedom and a significance level of 0.05 is 1.7011, which is less than the calculated value.) Hence, the null hypothesis is rejected. The familiarity level does exceed 4.0.
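
The one-sample t test can be sketched from the raw familiarity ratings. The Python standard library has no t distribution, so the critical value 1.7011 is simply taken from the text rather than computed. With unrounded intermediate values the statistic comes out at about 2.470 rather than 2.471; the decision is unchanged.

```python
from math import sqrt
from statistics import mean, stdev

# Familiarity ratings from the Internet usage data (9 = missing).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
valid = [x for x in familiarity if x != 9]

mu_0 = 4.0                  # value of the mean under the null hypothesis
n = len(valid)              # 29 valid observations
x_bar = mean(valid)         # sample mean
s = stdev(valid)            # sample standard deviation (n - 1 in the denominator)
std_error = s / sqrt(n)     # standard error of the mean
t = (x_bar - mu_0) / std_error
df = n - 1

T_CRITICAL = 1.7011         # one-tailed, 28 df, alpha = 0.05, as quoted in the text
reject_h0 = t > T_CRITICAL

print(round(x_bar, 3), round(std_error, 3), round(t, 3), df, reject_h0)
```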

15-64
Two Independent Samples Means
• In the case of means for two independent samples, the
hypotheses take the following form.
H0: µ1 = µ2
H1: µ1 ≠ µ2

15-65
Paired Samples
The difference in these cases is examined by a paired
samples t test. To compute t for paired samples, the
paired difference variable, denoted by D, is formed
and its mean and variance calculated. Then the t
statistic is computed. The degrees of freedom are n -
1, where n is the number of pairs. The relevant
formulas are:
H0 : m D = 0
H1: m D  0

D - mD
tn-1 = sD
n
continued… 15-66
Paired Samples
Where:
n
S Di
D = i=1n
n
S=1 (Di - D)2
sD = i
n-1

S
SD = n
D

In the Internet usage example, a paired t test could be used to determine if the respondents differed in their attitude toward the Internet and attitude toward technology.
15-67
Paired-Samples t Test
Variable               Number of Cases   Mean    Standard Deviation   Standard Error
Internet Attitude      30                5.167   1.234                0.225
Technology Attitude    30                4.100   1.398                0.255

Difference = Internet − Technology

Difference Mean         1.067
Standard deviation      0.828
Standard error          0.1511
Correlation             0.809
2-tail prob.            0.000
t value                 7.059
Degrees of freedom      29
2-tail probability      0.000
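
The paired-samples results can be reproduced from the two attitude columns of the Internet usage data. The following is a minimal sketch with the standard library; it computes the t statistic only, since the probability would require a t distribution that the standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev

# Attitude toward the Internet and toward technology, in respondent order.
internet_attitude = [7, 3, 4, 7, 7, 5, 4, 5, 6, 7, 4, 6, 6, 3, 5,
                     4, 5, 5, 6, 6, 4, 5, 4, 6, 5, 6, 5, 3, 5, 7]
technology_attitude = [6, 3, 3, 5, 7, 4, 5, 4, 4, 6, 3, 4, 5, 2, 4,
                       3, 3, 4, 6, 4, 2, 4, 2, 6, 3, 6, 5, 2, 3, 5]

# Paired difference variable D, one value per respondent.
differences = [i - t for i, t in zip(internet_attitude, technology_attitude)]

n = len(differences)          # number of pairs
d_bar = mean(differences)     # mean difference
s_d = stdev(differences)      # standard deviation of the differences
std_error = s_d / sqrt(n)     # standard error of the mean difference
t_stat = (d_bar - 0) / std_error   # H0: mean difference is 0
df = n - 1

print(round(d_bar, 3), round(s_d, 3), round(std_error, 4), round(t_stat, 3), df)
```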
