Summarizing Data & Statistics: Reminder For Final Exam
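Multivariate Analysis
A Scatterplot (Multivariate)
[Figure: scatterplot of Grade against Hours Studying]
Other pairings: age categories and income; gender and classes attended
Distributions
[Figure: charts of marital status (Single, Married, Divorced, Widowed); one label reads "Widowed 6%"]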
Distributions
Histogram (used for interval / ratio data)
[Figure: histogram of responses on a Likert scale (1 = Strongly agree, 2 = Agree, 3 = Neither, 4 = Disagree, 5 = Strongly disagree); vertical axis runs from 0 to 60]
Stem and Leaf Chart
Mean
most common method of describing central tendency
Median
list all scores in numerical order, and then locate the score in the center of the sample
Mode
• if you have a “tie” for “most repeated score”, you will have more than one mode
Example #1 (Mean)
consider the following scores:
15, 20, 21, 20, 36, 15, 25, 15
Example # 2 Median
Median is a more accurate measure of central tendency than the mean when there are a few extreme scores that inflate the average
Example # 3 Mode
consider the following set of scores:
15, 20, 21, 20, 36, 15, 25, 15
again we first line up the scores
15, 15, 15, 20, 20, 21, 25, 36
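The three measures of central tendency can be checked on the example scores with a short sketch. It uses Python's standard `statistics` module; the variable names are illustrative and not part of the slides.

```python
from statistics import mean, median, multimode

# Scores from the examples above
scores = [15, 20, 21, 20, 36, 15, 25, 15]

# Line up the scores in numerical order (needed for the median by hand)
ordered = sorted(scores)

average = mean(scores)     # sum of scores divided by the number of scores
middle = median(scores)    # with 8 scores, the average of the 4th and 5th ordered scores
modes = multimode(scores)  # list of the most repeated score(s); ties give several modes

print(ordered)
print(average, middle, modes)
```

Here the single extreme score (36) pulls the mean above the median, which is the situation in which the median is the more accurate summary.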
Standard Deviation
a value that shows how far individual scores typically fall from the mean of the sample
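As a rough illustration, the standard deviation of the example scores used earlier can be computed step by step. The slides do not say which formula is intended, so this sketch assumes the sample formula (dividing by n - 1) and compares it with `statistics.stdev`.

```python
from math import sqrt
from statistics import mean, stdev

# Example scores from the central tendency slides
scores = [15, 20, 21, 20, 36, 15, 25, 15]

m = mean(scores)

# Distance of each individual score from the mean
deviations = [x - m for x in scores]

# Assumption: sample formula, so the squared deviations are divided by n - 1
sample_variance = sum(d * d for d in deviations) / (len(scores) - 1)
sample_sd = sqrt(sample_variance)

print(round(sample_sd, 2))
print(round(stdev(scores), 2))  # library result for comparison
```

Dividing by n instead of n - 1 (the population formula, `statistics.pstdev`) gives a slightly smaller value.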
Normal Distribution
[Figure: normal curve with the horizontal axis marked at -2 sd, -1 sd, 0, 1 sd, 2 sd; the central regions are labelled 68% and 95%]
Standard Deviation
assumptions may be made about the percentage of scores as they deviate from the mean
if scores are normally distributed, then one can assume that:
approximately 68% of the scores in the sample fall within one standard deviation of the mean
approximately 95% of the scores fall within two standard deviations of the mean
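The percentages for one and two standard deviations can be checked against the standard normal curve. This is a minimal sketch using `statistics.NormalDist`; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_one = share_within(1)
within_two = share_within(2)

print(round(within_one * 100, 1))
print(round(within_two * 100, 1))
```

The exact values are slightly above the rounded figures usually quoted for one and two standard deviations.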
STANDARD SCORES
it is often useful to describe data points on different measures, in terms of a common scale
e.g. if you have two different measures of depression, and want to compare an individual’s scores on the two measures
the easiest way to compare scores on a common scale is with the use of z-scores
The z-score
a z-score is a standard measure of the distance between a single point in the data (e.g. an individual’s score), and the overall mean for that variable
the z-distribution:
• ranges from negative infinity to positive infinity
• has a mean of 0
• has a standard deviation of 1
Percentiles
we can also express z-scores as percentiles
refers to the proportion scoring less than a particular value
Example
A standardized test (e.g., the GRE) has a population mean (µ) of 500 and a standard deviation (σ) of 100
Your score on the test (X) is 700
How well did you do compared to the average person?
Calculate a z-score (in standard deviation units)
z = (X − µ) / σ = (700 − 500) / 100 = 2
Therefore, you scored 2 sd’s above the average, which puts you at about the 97.5th percentile
Note: percentile refers to the proportion scoring less than a particular value
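The test-score example can be reproduced in a few lines. This sketch assumes scores are normally distributed; the function name `z_score` is illustrative.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Distance of a score from the mean, in standard deviation units."""
    return (x - mu) / sigma

# Values from the example: population mean 500, standard deviation 100, your score 700
z = z_score(700, 500, 100)

# Percentile: proportion of the normal distribution scoring below this z
percentile = NormalDist().cdf(z) * 100

print(z)
print(round(percentile, 1))
```

The exact normal curve gives a value a little above 97.5; the slide's figure comes from the rule of thumb that about 95% of scores fall within two standard deviations, which leaves about 2.5% in each tail.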
Interpreting Statistics
When you see statistics reported in the results
section of an article, they tell you:
Interpreting Statistics
Suppose that the results were reported as follows in a journal article:
The number of classes attended by students was found to correlate positively with their grades, r = .43, p < .05.
What does p < .05 mean?
If chance alone were at work (no true relationship), a correlation at least this strong would occur less than 5% of the time
The correlation is therefore statistically significant at the .05 level (unlikely to be just a byproduct of chance)
The .05 level (95% confidence) is usually the minimum standard accepted within the psychological community for demonstrating an effect
Suppose that the results were reported as follows in a journal article:
Participants who shot free throws in front of an audience (M = 6.50) were more successful than participants who took shots without an audience present (M = 3.50), t = 3.05, p < .01.
M refers to the sample group mean
What does p < .01 mean?
If chance alone (rather than the manipulation of the I.V.) were responsible, a difference between groups on the D.V. at least this large would occur less than 1% of the time
The difference is therefore statistically significant at the .01 level
Basis for Statistical Inference
Decision Rule
Could our findings be by chance alone?
If the probability of our result occurring is less than .05 we reject the null hypothesis.
Example
There is the perception that university students are more at risk for developing alcohol-related problems than are members of the general population. You decide to take a sample of 100 university students and assess the number of alcoholic drinks per week that they consume
You find a mean number of drinks/week of 4.2. You plan to compare this to the number of drinks per week within the general population, which is 3.
Do university students drink more than the general population?
[Figure: distributions centred on the assumed population mean (µ = 3), with a sample value marked at x = 4 and a "REJECT Ho" region beyond it]
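The decision rule can be sketched as a one-sample z-test on the drinking example. The slides give the sample size (100), the sample mean (4.2) and the population mean (3), but no standard deviation, so the value `sigma = 6.0` below is purely hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu = 3.0           # assumed population mean (from the example)
sample_mean = 4.2  # observed mean drinks/week (from the example)
n = 100            # sample size (from the example)
sigma = 6.0        # HYPOTHETICAL population standard deviation, not given in the slides
alpha = 0.05       # decision rule: reject Ho if the probability is below .05

# Standard error of the mean, then distance of the sample mean from mu in those units
standard_error = sigma / sqrt(n)
z = (sample_mean - mu) / standard_error

# One-tailed probability of a sample mean at least this high if Ho is true
p_value = 1 - NormalDist().cdf(z)

reject_null = p_value < alpha
print(round(z, 2), round(p_value, 4), reject_null)
```

With a different standard deviation the same sample mean could lead to the opposite decision, which is why the spread of scores matters as much as the difference in means.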
Decision Rule
This means that you will conclude that there is a significantly higher number of drinks consumed among university students if your sample mean is greater than 4.
[Figure: distribution centred on the assumed mean (µ = 3) with a "REJECT Ho" region beyond x = 4]
Directional Hypothesis
In a directional hypothesis, there is a specific result that one wants to test…change must occur in the correct “direction” away from the mean.
-You expect that students who skip class would do worse!
[Figure: distributions centred on µx with "REJECT Ho" regions beyond −zcrit and +zcrit]
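The critical values that mark the "REJECT Ho" regions can be looked up from the standard normal curve. This sketch assumes the .05 level from the decision rule and a z-test; the function names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# Directional test: the whole 5% sits in one tail
one_tailed_crit = std_normal.inv_cdf(1 - alpha)

# Non-directional test: 2.5% in each tail
two_tailed_crit = std_normal.inv_cdf(1 - alpha / 2)

def reject_one_tailed(z):
    """Reject Ho only if z falls in the predicted (upper) tail."""
    return z > one_tailed_crit

def reject_two_tailed(z):
    """Reject Ho if z falls in either tail."""
    return abs(z) > two_tailed_crit

print(round(one_tailed_crit, 3), round(two_tailed_crit, 3))
print(reject_one_tailed(1.8), reject_two_tailed(1.8))
```

A result such as z = 1.8 is significant under the directional test but not under the non-directional one, while a large result in the unexpected direction is caught only by the non-directional test.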
Directional and Non-Directional Hypothesis
One-Tailed Test
We reject the null hypothesis only when the score falls in the extreme 5%
5% at one end only
Easier to get significance
BUT what about unexpected results?
Two-Tailed Test
We reject the null hypothesis only when the score falls in the extreme 5%
BUT the extreme is taken from each end (2.5%)
More conservative!
Errors in rejecting the null
In an ideal world, the decision to reject the null (and conclude that the research hypothesis is correct) would be 100% accurate
However, there are instances when we reject the null hypothesis when, in reality, the null is true and should not have been rejected (false positive)
This is called a Type 1 error (its probability is alpha)
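A small simulation can make the Type 1 error concrete: when the null hypothesis really is true, a test at the .05 level should still reject it in roughly 5% of samples. The population values, sample size and number of trials below are arbitrary choices for illustration, not taken from the slides.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

alpha = 0.05
crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical z

# Hypothetical population in which Ho is TRUE (the real mean equals mu)
mu, sigma = 3.0, 1.0
n, trials = 25, 4000

false_positives = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = (sample_mean - mu) / (sigma / n ** 0.5)
    if z > crit:  # Ho rejected even though it is true
        false_positives += 1

type1_rate = false_positives / trials
print(type1_rate)
```

The observed rate should land close to alpha, which is why alpha is described as the probability of a Type 1 error.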