Ch-5
Methodology
Getachew
Chapter Five: Outline
• Basic data analysis for quantitative research
• Analyzing quantitative data
• Data preparation
• Data Transformation
• Data analysis using Descriptive Statistics
• Identifying and Dealing with Outliers
• Testing hypothesis in quantitative research
• Cross Tabulation Using Chi Square (χ2) Analysis
• T-test
• Analysis of Variance (ANOVA)
• Examining relationships using correlations and regressions
• Variable Relationships and Covariation
• Correlation analysis
• Regression analysis
• Multicollinearity and multiple Regressions
Processing and Analysis of Data
• Data processing is a process of inspecting, cleaning, transforming, and
modelling data with the goal of discovering useful information, suggesting
conclusions, and supporting decision making.
• Data entry: Coded data can be entered into a spreadsheet, database, text
PROCESSING OPERATIONS
3. Classification: Process of arranging data in groups or classes on the basis of common characteristics.
4. Tabulation: Arranging mass of data in concise and logical order in columns and rows.
1. Saves space and facilitates the process of comparison for statistical computations
2. It facilitates the summation of items and the detection of errors and omissions.
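As an illustration of classification and tabulation, the sketch below groups a few made-up ages into classes and counts the items in each class. The data, the class boundaries, and the function name `classify_age` are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical raw data: respondent ages (illustrative values only).
ages = [19, 23, 35, 41, 27, 52, 38, 22, 47, 31]


def classify_age(age):
    """Classification: assign an age to a class (assumed boundaries)."""
    if age < 30:
        return "under 30"
    if age < 45:
        return "30-44"
    return "45 and over"


# Tabulation: count how many items fall into each class.
table = Counter(classify_age(a) for a in ages)

# Print the frequency table in rows and columns.
print(f"{'Age class':<12} {'Frequency':>9}")
for label in ("under 30", "30-44", "45 and over"):
    print(f"{label:<12} {table[label]:>9}")
print(f"{'Total':<12} {sum(table.values()):>9}")
```

The total row is a quick check that no item was omitted during classification.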
Stage of Data Analysis
Identifying and Dealing with Outliers
It includes coding and tabulation, hypothesis testing, Chi square test, t-test, F-test.
Inferential analysis: Causal analysis (regression analysis) examines how one or more
variables affect changes in another variable.
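A minimal sketch of regression analysis with one independent variable, assuming an ordinary least-squares fit. The variable names and data values are hypothetical.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data: hours studied (independent) and test score (dependent).
hours_studied = [1, 2, 3, 4, 5]
test_score = [2, 4, 5, 4, 5]

slope, intercept = simple_linear_regression(hours_studied, test_score)
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
```

The slope estimates how much the dependent variable changes, on average, for a one-unit change in the independent variable.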
Descriptive Analysis/Statistics
• Descriptive statistics is the discipline of quantitatively describing the main
features of a collection of data, or the quantitative description itself.
a. Mean is a relatively stable measure of central tendency, but it is affected by extreme items.
b. Median is the value of the middle item of a series arranged in either ascending or
descending order.
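The effect of an extreme item on the mean, compared with the median, can be sketched with made-up values:

```python
from statistics import mean, median

# Hypothetical data set without and with one extreme item.
values = [10, 12, 14, 16, 18]
values_with_outlier = values + [200]

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(values_with_outlier), median(values_with_outlier)

# The mean shifts strongly toward the extreme item; the median barely moves.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```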
Types of correlation:
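1. Pearson correlation (r): measures the linear relationship between two variables. It is
used for scale variables that are approximately normally distributed. r lies between -1 and
+1. As a rule of thumb, if the absolute value of r is more than 0.6, it can be said that
there is a strong correlation between the two variables.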
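A minimal sketch of how r is computed from paired observations, using made-up data. The function name `pearson_r` is an assumption for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled to the range -1..1."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations of two scale variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

With these values r is about 0.77, which the 0.6 rule of thumb would label a strong positive correlation.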
Cross Tabulation Using Chi Square (χ2) Analysis
The Pearson chi-square test essentially tells us whether the results of a crosstab are
statistically significant. The null hypothesis is that the two categorical variables are
independent (unrelated) of one another.
Informally, the chi-square test of independence plays the role for categorical variables
that a correlation test plays for quantitative variables.
Both correlations and chi-square tests can test for relationships between two variables.
However, a correlation is used when you have two quantitative variables and a chi-square
test of independence is used when you have two categorical variables.
For a Chi-square test, p-value ≤ significance level indicates there is sufficient evidence to
conclude that the observed distribution is not the same as the expected distribution.
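The comparison of observed and expected counts can be sketched as follows. The crosstab values are made up, and the p-value helper is valid only for one degree of freedom (a 2 x 2 table), which is an assumption of this example.

```python
from math import erfc, sqrt


def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


# Hypothetical 2 x 2 crosstab: rows are two groups, columns are two answers.
observed = [[30, 10],
            [20, 40]]

chi2, df = chi_square_independence(observed)
p = p_value_df1(chi2)
alpha = 0.05  # assumed significance level
print(f"chi2 = {chi2:.3f}, df = {df}, reject null: {p <= alpha}")
```

For larger tables the statistic is computed the same way, but the p-value has to come from a chi-square table or a statistics package.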
The p value, or probability value, tells you how likely it is that data at least as extreme
as yours could have occurred if the null hypothesis were true. It is calculated from your
test statistic, which is the number calculated by a statistical test using your data.
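As a sketch of how a test statistic is turned into a p value, the example below assumes the statistic follows a standard normal distribution under the null hypothesis (a z statistic). The function name is hypothetical.

```python
from statistics import NormalDist


def two_tailed_p_from_z(z):
    """Probability of a statistic at least as extreme as z, in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_small = two_tailed_p_from_z(1.96)   # statistic far from zero
p_large = two_tailed_p_from_z(0.50)   # statistic close to zero
print(f"z = 1.96 -> p = {p_small:.3f}")
print(f"z = 0.50 -> p = {p_large:.3f}")
```

The further the statistic lies from the value expected under the null hypothesis, the smaller the p value.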
For most tests, the null hypothesis is that there is no relationship between your variables
of interest or that there is no difference among groups.
For example, in a two-tailed t test, the null hypothesis is that the difference between two
groups is zero.
Examples
• Question: Are teens better at math than adults?
  Null hypothesis: Age has no effect on mathematical ability.
• Question: Does taking aspirin every day reduce the chance of having a heart attack?
  Null hypothesis: Taking aspirin daily does not affect heart attack risk.
• Question: Do teens use cell phones to access the internet more than adults?
  Null hypothesis: Age has no effect on how cell phones are used for internet access.
• Question: Do cats care about the color of their food?
  Null hypothesis: Cats express no food preference based on color.
• Question: Does chewing willow bark relieve pain?
  Null hypothesis: There is no difference in pain relief after chewing willow bark versus
  taking a placebo.
T-test
A t-test is a statistical test that is used to compare the means of two groups. It is often
used in hypothesis testing to determine whether a process or treatment actually has an
effect on the population of interest, or whether two groups are different from one
another.
The null hypothesis is that the difference in group means is zero, and the alternative
hypothesis is that the difference in group means is different from zero.
A t-test is appropriate to use when you've collected a small, random sample from some
statistical “population” and want to compare the mean from your sample to another
value. The value for comparison could be a fixed value (e.g., 10) or the mean of a second
sample.
For example, if a teacher wants to compare the heights of male students and female
students in class 5, she would use the independent two-sample t-test.
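The teacher's comparison can be sketched as a pooled-variance two-sample t statistic. The heights are made up, and the critical value 2.306 is taken from a standard t table (two-tailed, significance level 0.05, 8 degrees of freedom).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Independent two-sample t statistic assuming equal group variances."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, n1 + n2 - 2


# Hypothetical heights in cm for two independent groups of students.
male_heights = [150, 152, 155, 158, 160]
female_heights = [145, 148, 150, 151, 156]

t, df = pooled_t_statistic(male_heights, female_heights)
T_CRITICAL = 2.306  # from a t table: two-tailed, alpha = 0.05, df = 8
significant = abs(t) > T_CRITICAL
print(f"t = {t:.3f}, df = {df}, significant at 0.05: {significant}")
```

With these made-up heights the t statistic is below the critical value, so the null hypothesis of equal mean heights would not be rejected.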
t-Test assumptions
The data are continuous.
The sample data have been randomly sampled from a population.
There is homogeneity of variance (i.e., the variability of the data in each group is similar).
The distribution is approximately normal.
For two-sample t-tests, we must have independent samples. If the samples are not
independent, then a paired t-test may be appropriate.
Z and T-test
A z-test is a statistical hypothesis test used to determine whether two sample means are
different when the population standard deviation is known and the sample is large.
In contrast, a t-test is used to determine whether the means of different data sets differ
when the population standard deviation or variance is unknown.
A large t-score, or t-value, in absolute terms is evidence that the groups are different,
while a t-score close to zero indicates that the groups are similar.
A z-test is used if the population variance is known, or if the sample size is larger than 30,
for an unknown population variance.
If the sample size is less than 30 and the population variance is unknown, we must use a
t-test.
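The rule of thumb above can be written as a small helper, followed by a one-sample z statistic for a case where the population standard deviation is known. The function names and data are hypothetical, and treating a sample of exactly 30 as a t-test case is an assumption, since the rule does not cover it.

```python
from math import sqrt
from statistics import mean


def choose_test(n, population_variance_known):
    """Pick a test using the rule of thumb: known variance or n > 30 means z-test."""
    if population_variance_known or n > 30:
        return "z-test"
    return "t-test"  # n exactly 30 lands here by assumption


def one_sample_z(sample, mu0, sigma):
    """z statistic comparing a sample mean with mu0, given known population sigma."""
    return (mean(sample) - mu0) / (sigma / sqrt(len(sample)))


# Hypothetical sample of 36 measurements with known population sigma = 3.
sample = [52] * 18 + [48] * 18
test_name = choose_test(len(sample), population_variance_known=True)
z = one_sample_z(sample, mu0=49, sigma=3)
print(test_name, f"z = {z:.2f}")
```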
ANOVA
What is Analysis of Variance (ANOVA)? Analysis of Variance (ANOVA) is a statistical
method that compares the means (averages) of different groups by analyzing the variance
between and within the groups. It is used in a range of scenarios to determine whether
there is any difference between the means of different groups.
ANOVA tells you if the dependent variable changes according to the level of the
independent variable. For example: Your independent variable is social media use, and
you assign groups to low, medium, and high levels of social media use to find out if there
is a difference in hours of sleep per night.
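The social media example can be sketched as a one-way ANOVA F statistic. The hours of sleep are made up, and the critical value of about 4.26 comes from a standard F table (significance level 0.05, 2 and 9 degrees of freedom).

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic: variance between group means relative to variance within groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical hours of sleep per night for three levels of social media use.
low_use = [8, 7, 8, 9]
medium_use = [7, 6, 7, 8]
high_use = [5, 6, 5, 4]

f_stat, df_between, df_within = one_way_anova_f([low_use, medium_use, high_use])
F_CRITICAL = 4.26  # from an F table: alpha = 0.05, df = (2, 9)
print(f"F = {f_stat:.2f}, df = ({df_between}, {df_within}), "
      f"significant at 0.05: {f_stat > F_CRITICAL}")
```

An F statistic well above the critical value suggests that mean hours of sleep differ across at least some of the groups in this made-up data.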
T test and ANOVA
The Student's t test is used to compare the means between two groups, whereas
ANOVA is used to compare the means among three or more groups. ANOVA first yields
a single, common P value. A significant P value of the ANOVA test indicates that, for at
least one pair of groups, the mean difference is statistically significant. It does not show
which pair differs.