Summarizing Data & Statistics: Reminder For Final Exam


Summarizing Data & Statistics
December 1, 2010
Final Lecture
Next Week: Presentations (Split into half)

REMINDER FOR FINAL EXAM
 BRING A CALCULATOR
 PENCIL AND ERASER
 2 HOURS
 25% : 65 Questions
 Mark your booklet and scantron

Candidates are responsible for arriving at the examination room on time with adequate supplies (pens, pencils, erasers, calculators, current I.D. card) and may be admitted five minutes before the beginning of the examination. Upon entering the examination room, candidates will refrain from talking to or communicating with other candidates. Candidates will read any posted instructions concerning seating and other arrangements within the examination room. Candidates must place their I.D. card on the left corner of the desk.

So you have all your data – now what?
 e.g., 200 people respond to a survey on studying habits and grades.

Sample Data Set

Subject   Hours Spent   Number of          Exam
Number    Studying      Classes Attended   Grade
1         10            10                 72
2         5             10                 88
3         12            8                  91
4         5             6                  74
5         6             9                  65
6         4             4                  61
7         7             9                  55
8         7             7                  68
9         20            2                  78

Descriptive Statistics
 descriptive statistics are used to present quantitative data in a manageable form
 works by reducing data into a simpler summary
Examples:
 Range of hours spent studying (4-20)
 Average age in a set of research data
 How many females/males in the study
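As a minimal sketch of "reducing data into a simpler summary", the snippet below computes the range of hours studied and the average exam grade from the sample data set on the slide. The variable names are illustrative only, and the average grade is used here in place of the slide's "average age" example because age is not part of the sample data.

```python
from statistics import mean

# Sample data set from the slide:
# (subject number, hours spent studying, classes attended, exam grade)
rows = [
    (1, 10, 10, 72), (2, 5, 10, 88), (3, 12, 8, 91),
    (4, 5, 6, 74), (5, 6, 9, 65), (6, 4, 4, 61),
    (7, 7, 9, 55), (8, 7, 7, 68), (9, 20, 2, 78),
]

hours = [row[1] for row in rows]
grades = [row[3] for row in rows]

hours_range = (min(hours), max(hours))  # lowest and highest hours studied
mean_grade = mean(grades)               # average exam grade across subjects

print("Range of hours studied:", hours_range)
print("Mean exam grade:", round(mean_grade, 2))
```

Two numbers now stand in for nine rows of raw data, which is the point of a descriptive summary.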

Multivariate Analysis
 examination across cases of more than one variable at a time
 can graph more than one variable at a time
 age categories and income
 Gender and classes attended

Distributions
 A Scatterplot (Multivariate)
[Figure: scatterplot of Grade (vertical axis) against Hours Studying (horizontal axis)]

Univariate Analysis
 examination of one variable at a time
 frequency distribution: a display of the number of times each score or unit of observation occurs in a set of data
 two ways to display a univariate distribution

Distributions (cont’d)
 may also be displayed using percentages
 examples:
 % of people under the poverty level
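A frequency distribution can be sketched in a few lines. The example below counts how often each value of "number of classes attended" occurs in the lecture's sample data set and also expresses each count as a percentage. Using this particular column is a choice made for illustration.

```python
from collections import Counter

# "Number of classes attended" column from the sample data set
classes_attended = [10, 10, 8, 6, 9, 4, 9, 7, 2]

freq = Counter(classes_attended)  # value -> number of times it occurs
n = len(classes_attended)
percent = {value: 100 * count / n for value, count in freq.items()}

# Print the distribution as a small table: value, count, percentage
for value in sorted(freq):
    print(f"{value:>2}  {freq[value]}  {percent[value]:.1f}%")
```

The same counts could be drawn as a bar graph or histogram; the table and the graph are two views of one distribution.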

Distributions (cont’d)
A Frequency Distribution Table:

Marital Status
Category    Percent
Single      22%
Married     52%
Divorced    20%
Widowed     6%

Distributions (cont’d)
Bar graph (nominal data)
[Figure: bar graph of Percent (0 to 60) by marital status category: Single, Married, Divorced, Widowed]

Histogram
(used for interval / ratio data)
[Figure: histogram of Response on Likert scale (1= Strongly agree, 2= Agree, 3= Neither, 4= Disagree, 5= Strongly disagree); vertical axis from 0 to 60]

Distributions
Stem and Leaf Chart

The Normal Distribution

Measures of Central Tendency
 an estimate of the “centre” of a distribution
 three different types of estimates:
 Mean
 Median
 Mode
 if the distribution is normal (i.e., bell-shaped), the mean, median and mode are all equal

Mean
 most common method of describing central tendency

Median
 list all scores in numerical order, and then locate the score in the center of the sample

Mode
 if you have a “tie” for “most repeated score”, you will have more than one mode

Example #1 (Mean)
 consider the following scores: 15, 20, 21, 20, 36, 15, 25, 15
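The mean for the Example #1 scores can be worked through directly: add up all the scores and divide by how many there are. A minimal sketch, with illustrative variable names:

```python
# Scores from Example #1
scores = [15, 20, 21, 20, 36, 15, 25, 15]

total = sum(scores)               # add up every score
mean_score = total / len(scores)  # divide by the number of scores

print("Sum of scores:", total)
print("Mean:", mean_score)
```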

Example #2 (Median)
 Median is a more accurate measure of central tendency than the mean when there are a few extreme scores that inflate the average
 E.g., 2, 4, 5, 5, 5, 7, 8, 8, 10, 100

Example #3 (Mode)
 consider the following set of scores: 15, 20, 21, 20, 36, 15, 25, 15
 again we first line up the scores: 15, 15, 15, 20, 20, 21, 25, 36
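The sketch below works through the median and mode examples with Python's standard `statistics` module. It shows how the single extreme score of 100 pulls the mean well above the median, and it finds the most repeated score in the second set.

```python
from statistics import mean, median, multimode

# Median example: one extreme score (100) inflates the average
skewed = [2, 4, 5, 5, 5, 7, 8, 8, 10, 100]
skewed_mean = mean(skewed)      # pulled upward by the score of 100
skewed_median = median(skewed)  # middle of the ordered scores

# Mode example: line up the scores, then find the most repeated one
scores = sorted([15, 20, 21, 20, 36, 15, 25, 15])
score_median = median(scores)
score_modes = multimode(scores)  # a list, because ties give several modes

print("Mean with extreme score:", skewed_mean)
print("Median with extreme score:", skewed_median)
print("Ordered scores:", scores)
print("Median:", score_median, "Mode(s):", score_modes)
```

`multimode` returns a list because, as noted above, a tie for the most repeated score gives more than one mode.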

Measures of Dispersion
 measures of how spread out the scores are around the mean
 two estimates:
 range
 standard deviation
 standard deviation is more accurate/detailed

Range
 the range is used to identify the highest and lowest scores
 consider the set of scores: 15, 20, 21, 20, 36, 15, 25, 15
 the range would be 15-36
 21 points separate the highest and the lowest score

Standard Deviation
 a value that shows the relation that individual scores have to the mean of the sample
 tells you how tightly all the various data points are clustered around the mean of your data
 if scores are said to be standardized to a normal curve, then there are several statistical techniques that can be used to analyze the data set
[Figure: 1 Standard Deviation from the mean]
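Both measures of dispersion can be computed for the scores used in the range example. The lecture does not say whether the sample or the population formula is intended, so this sketch uses the sample standard deviation (dividing by n - 1) as an assumption and shows the population version alongside it.

```python
from statistics import pstdev, stdev

scores = [15, 20, 21, 20, 36, 15, 25, 15]

# Range: distance between the highest and the lowest score
score_range = max(scores) - min(scores)

# Standard deviation: typical distance of a score from the mean
sample_sd = stdev(scores)       # assumption: sample formula, divides by n - 1
population_sd = pstdev(scores)  # population formula, divides by n

print("Range:", score_range)
print("Sample standard deviation:", round(sample_sd, 2))
print("Population standard deviation:", round(population_sd, 2))
```

The range depends on only two scores, while the standard deviation uses every score, which is why it is described as the more detailed measure.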

Normal Distribution
[Figure: normal curve with 68% of scores between -1 sd and 1 sd, and 95% between -2 sd and 2 sd]

Standard Deviation
 assumptions may be made about the percentage of scores as they deviate from the mean
 if scores are normally distributed, then one can assume that:
 approximately ____ of the scores in the sample fall within one standard deviation of the mean
 approximately ____ of the scores fall within two standard deviations of the mean
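The proportions marked on the normal curve can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) to compute the share of a normal distribution lying within one and two standard deviations of the mean.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal curve

# Proportion of scores between -1 sd and +1 sd
within_1_sd = std_normal.cdf(1) - std_normal.cdf(-1)

# Proportion of scores between -2 sd and +2 sd
within_2_sd = std_normal.cdf(2) - std_normal.cdf(-2)

print(f"Within 1 sd: {within_1_sd:.1%}")
print(f"Within 2 sd: {within_2_sd:.1%}")
```

The results round to the 68% and 95% figures shown on the curve.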

Types of Dispersion
 Positively skewed
 Negatively skewed
 Outliers
 Why are these important?
 What should we do with them?

STANDARD SCORES….
 it is often useful to describe data points on different measures, in terms of a common scale
 e.g. if you have two different measures of depression, and want to compare an individual’s scores on the two measures
 the easiest way to compare scores on a common scale is with the use of z-scores

The z-score
 a z-score is a standard measure of the distance between a single point in the data (e.g. an individual’s score), and the overall mean for that variable
 the z-distribution:
 ranges from negative infinity to positive infinity
 has a mean of 0
 has a standard deviation of 1
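To illustrate comparing scores on a common scale, the sketch below converts one person's scores on two depression measures into z-scores. The two scales, their means and standard deviations, and the individual's scores are all hypothetical values invented for this example.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

# Hypothetical measures (all numbers are made up for illustration)
scale_a = {"mean": 20, "sd": 5}   # hypothetical "Scale A"
scale_b = {"mean": 50, "sd": 10}  # hypothetical "Scale B"

# One individual's raw scores on each measure
z_a = z_score(30, scale_a["mean"], scale_a["sd"])
z_b = z_score(60, scale_b["mean"], scale_b["sd"])

print("z on Scale A:", z_a)
print("z on Scale B:", z_b)
```

The raw scores (30 and 60) cannot be compared directly, but the z-scores can: this person sits further above the mean on Scale A than on Scale B.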

Percentiles
 we can also express z-scores as percentiles
 refers to the proportion scoring less than a particular value
 e.g. 75% of the population scores below the 75th percentile
 percentiles are obtained from a z-table

Example
 A standardized test (e.g., the GRE) has a population mean (µ) of 500 and a standard deviation (σ) of 100
 Your score on the test (X) is 700
 How well did you do compared to the average person?
 Calculate a z-score (in standard deviation units)

Z = (X - µ) / σ = (700 - 500) / 100 = 200 / 100 = 2

 Therefore, you scored 2 sd’s above the average, which puts you at about the 97.5th percentile
 Note: percentile refers to the proportion scoring less than a particular value
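The worked example can be reproduced in code. This sketch computes the z-score for a test score of 700 and then uses `statistics.NormalDist` in place of a printed z-table to look up the percentile, assuming the scores are normally distributed.

```python
from statistics import NormalDist

MU = 500     # population mean from the example
SIGMA = 100  # population standard deviation from the example
x = 700      # your score

z = (x - MU) / SIGMA  # distance from the mean in sd units

# Proportion of a normal population scoring below z, as a percentile
percentile = NormalDist().cdf(z) * 100

print("z-score:", z)
print(f"Percentile: {percentile:.1f}")
```

The value from the normal curve is slightly above 97.5. The 97.5 figure follows from the rule of thumb that 95% of scores fall within two standard deviations, leaving 2.5% above.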

Statistical Significance
Significant Results! – what does that mean?

Interpreting Statistics
 When you see statistics reported in the results section of an article, they tell you:
 a) what type of statistical test was used to measure differences between groups on the D.V. (or the extent to which the scores of 2 groups correlate with each other), and
 b) whether or not the difference was statistically significant

Interpreting Statistics
 Suppose that the results were reported as follows in a journal article:
 The number of classes attended by students was found to correlate positively with their grades, r = .43, p < .05.
 What does p < .05 mean?
 The probability that a correlation this strong would happen entirely by chance (if no real relationship existed) is less than 5%
 Chance alone is therefore an unlikely explanation, and the correlation is treated as statistically significant
 p < .05 (the 95% level) is usually the minimum standard accepted within the psychological community for demonstrating an effect

Interpreting Statistics
 Suppose that the results were reported as follows in a journal article:
 Participants who shot free throws in front of an audience (M = 6.50) were more successful than participants who took shots without an audience present (M = 3.50), t = 3.05, p < .01.
 M refers to the sample group mean
 What does p < .01 mean?
 The probability that a difference this large between groups on the D.V. would be caused by chance alone (rather than the manipulation of the I.V.) is less than 1%
 The difference is therefore treated as real at the 99% level
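One way to make "happened by chance" concrete is a permutation test: shuffle one variable many times so that any real relationship is destroyed, and count how often the shuffled data give a correlation at least as strong as the one observed. The sketch below applies this to classes attended and exam grade from the lecture's nine-subject sample data set. It is an illustration of the idea rather than the test used in the journal example, and its numbers will not match the r = .43 reported there.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_p(xs, ys, trials=5000, seed=0):
    """Share of random shufflings with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials

classes = [10, 10, 8, 6, 9, 4, 9, 7, 2]
grades = [72, 88, 91, 74, 65, 61, 55, 68, 78]

r = pearson_r(classes, grades)
p = permutation_p(classes, grades)
print(f"r = {r:.2f}, permutation p = {p:.3f}")
```

A small p means that shuffled (chance-only) data rarely produce a correlation as strong as the observed one.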

Interpreting Statistics
 Suppose that the results were reported as follows in a journal article:
 Participants who shot free throws in front of an audience (M = 6.50) were more successful than participants who took shots without an audience present (M = 3.50), t = 3.05, p < .01.
 M refers to the sample group mean
 What does p < .01 mean?
 The probability that a difference this large between groups on the D.V. would be caused by chance alone (rather than the manipulation of the I.V.) is less than ________
 The difference is therefore treated as real at the _________ level

Hypothesis Statements
 Null Hypothesis = nothing, zero, zilch
Always involves the assumption that:
 Nothing has happened; or
 No relationship exists; or
 No change has occurred
Example: There will be no grade difference in students who spend more than or less than 10 hours studying
 We always test alternative hypotheses against the null hypothesis!

Hypothesis Statements
 Since you are always testing your research hypothesis against the null hypothesis, your proposed effect is demonstrated when the null hypothesis is ‘rejected’
 If you ‘reject’ the null hypothesis, you may conclude that your research hypothesis (alternative hypothesis) is likely to be correct.
 How can you ‘reject’ the null hypothesis?

Basis for Statistical Inference
 inferential statistics used to decide whether two populations are in fact different

Basis for Statistical Inference
 Decision Rule
 Could our findings be by chance alone?
 If the probability of our result occurring by chance is less than .05 we reject the null hypothesis.
 There is a real difference!

Example
There is the perception that university students are more at risk for developing alcohol-related problems than are members of the general population. You decide to take a sample of 100 university students and assess the number of alcoholic drinks per week that they consume.
You find a mean number of drinks/week of 4.2. You plan to compare this to the number of drinks per week within the general population, which is 3.
Do university students drink more than the general population?

The Assumed Distribution
First, we begin with an assumption. This is contained within the null hypothesis - it represents our best guess about the population mean that we are looking at. In our example…the population mean is 3.
[Figure: ASSUMED distribution centred on µ = 3]

Decision Rule
We use the assumed distribution to make a decision rule about the size of the sample mean that we need to find before we will be willing to conclude that the mean is different.
[Figure: assumed distribution centred on µ = 3]

Decision Rule
Let’s assume that you identify a cutoff of 4. If the null hypothesis is true, the chance of finding a sample mean of 4 or more is less than 5%.
[Figure: ASSUMED distribution (µ = 3) with the SAMPLE mean x = 4 marked]

Decision Rule
You will reject your null hypothesis that the mean of university students is equal to 3 drinks per week, if your sample mean is > 4.
[Figure: ASSUMED distribution (µ = 3) with the REJECT Ho region beyond x = 4]
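The decision rule for the drinking example can be sketched as a simple comparison against the cutoff. The second function shows where such a cutoff could come from: the null mean plus a critical z-value times the standard error. The lecture does not give a population standard deviation, so the value of `sigma` passed in below is purely hypothetical, chosen only because it yields a cutoff close to 4.

```python
from math import sqrt
from statistics import NormalDist

NULL_MEAN = 3.0  # drinks/week in the general population (from the example)
CUTOFF = 4.0     # cutoff assumed on the slide
N = 100          # sample size from the example

def reject_null(sample_mean, cutoff=CUTOFF):
    """Decision rule: reject Ho only if the sample mean exceeds the cutoff."""
    return sample_mean > cutoff

def cutoff_from_sigma(sigma, n=N, alpha=0.05):
    """Upper-tail cutoff for a sample mean, given an assumed population sd."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = .05
    return NULL_MEAN + z_crit * sigma / sqrt(n)

decision = reject_null(4.2)  # sample mean found in the example
print("Reject Ho for a sample mean of 4.2?", decision)

# Hypothetical sigma, NOT from the lecture: gives a cutoff of about 4
print("Cutoff with sigma = 6.08:", round(cutoff_from_sigma(6.08), 2))
```

Because the sample mean of 4.2 is above the cutoff of 4, the rule leads to rejecting the null hypothesis that students average 3 drinks per week.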

Decision Rule
This means that you will conclude that there is a significantly higher number of drinks consumed among university students if your sample mean is greater than 4.
[Figure: ASSUMED distribution (µ = 3) with the REJECT Ho region beyond x = 4]

Directional Hypothesis
In a directional hypothesis, there is a specific result that one wants to test…change must occur in the correct “direction” away from the mean.
- If the test score or group mean exceeds a critical value, then we reject the null hypothesis
- You expect that students who skip class would do worse!

Directional Hypothesis
In this example, the question might be “have scores decreased on this variable (exam score)?” Since we are interested in only one outcome (a score decrease), it is a directional hypothesis. This is sometimes called a lower-tail test.
[Figure: distribution centred on µx with the REJECT Ho region below −z crit]

Directional Hypothesis
In this example, the question might be “have scores increased on this variable?” Since we are interested in only one outcome (a score increase), it is a directional hypothesis. This is sometimes called an upper-tail test.
[Figure: distribution centred on µx with the REJECT Ho region above +z crit]

Non-directional Hypothesis
In a non-directional hypothesis, one is comparing change that might occur in either tail of the distribution. This is sometimes called a “two-tailed” test.
[Figure: distribution centred on µx with REJECT Ho regions below −z crit and above +z crit]

Non-directional Hypothesis
Note that there are now two critical values, and therefore two rejection areas for the null hypothesis…
[Figure: distribution centred on µx with REJECT Ho regions below −z crit and above +z crit]
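The critical values for lower-tail, upper-tail and two-tailed tests can be computed from the standard normal curve. This sketch assumes a significance level of .05 and a test statistic expressed as a z-score; the function name and the `tail` labels are illustrative.

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist()  # z-distribution: mean 0, standard deviation 1

lower_crit = std_normal.inv_cdf(ALPHA)             # lower-tail: all 5% on the left
upper_crit = std_normal.inv_cdf(1 - ALPHA)         # upper-tail: all 5% on the right
two_tail_crit = std_normal.inv_cdf(1 - ALPHA / 2)  # two-tailed: 2.5% in each tail

def reject_null(z, tail):
    """Return True if z falls in the rejection area for the given test."""
    if tail == "lower":
        return z < lower_crit
    if tail == "upper":
        return z > upper_crit
    if tail == "two":
        return abs(z) > two_tail_crit
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

print("Upper-tail critical value:", round(upper_crit, 3))
print("Two-tailed critical values: +/-", round(two_tail_crit, 3))
print("z = 1.8, upper-tail test:", reject_null(1.8, "upper"))
print("z = 1.8, two-tailed test:", reject_null(1.8, "two"))
```

The same z of 1.8 is significant in an upper-tail test but not in a two-tailed test, because the two-tailed test splits the 5% between both ends and so needs a more extreme value.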

Directional and Non-Directional Hypothesis

One-Tailed Test
 We reject the null hypothesis only when the score falls in the extreme 5%
 5% at one end only
 Easier to get significance BUT what about unexpected results?

Two-Tailed Test
 We reject the null hypothesis only when the score falls in the extreme 5%
 BUT the extreme is taken from each end (2.5%)
 More conservative!

Errors in rejecting the null
 In an ideal world, the decision to reject the null (and conclude that the research hypothesis is correct) would be 100% accurate
 However, there are instances when we reject the null hypothesis when, in reality, the null is true and should not have been rejected (false positive)
 This is called a Type 1 error (alpha)
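A small simulation can show what a Type 1 error rate looks like in practice. In the sketch below, every sample is drawn from a population where the null hypothesis really is true (mean 0, standard deviation 1), so every rejection is a false positive. The sample size, number of trials and random seed are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def type1_rate(trials=2000, n=30, alpha=0.05, seed=1):
    """Share of samples that wrongly reject a null hypothesis that is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where Ho is true (mean 0, sd 1)
        sample = [rng.gauss(0, 1) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1 / sqrt(n))  # z-test with known sd of 1
        if abs(z) > z_crit:
            rejections += 1  # false positive: Ho rejected although it is true
    return rejections / trials

rate = type1_rate()
print(f"False positive rate: {rate:.3f}")
```

The rate comes out close to alpha: a .05 significance level means accepting a false positive in roughly 5% of cases where the null is true.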

Errors in rejecting the null
 There are also cases in which the statistical test tells us not to reject the null when, in reality, the null was false and should have been rejected
 The research hypothesis was true, but we failed to detect it
 This is called a Type 2 error (beta)

What statistical test will you use?
 T-Test
 Only two groups; compare the mean
 Do men and women differ in alcohol use?
 ANOVA
 Three or more groups; compare the mean
 Is there a difference in university year (1st, 2nd, 3rd, 4th) and alcohol use?
 Correlation
 As age increases (19-65) alcohol use decreases
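For the two-group case, the t statistic compares the difference between the group means with the variability inside the groups. The sketch below computes an independent-samples t with pooled variance, which is one common form of the t-test and an assumption here, since the lecture does not say which form is meant. The drinks-per-week figures for the two groups are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Independent-samples t statistic using pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighted by their degrees of freedom
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical drinks per week (made-up numbers for illustration only)
men = [5, 3, 6, 4, 7, 2]
women = [3, 2, 4, 1, 5, 3]

t = pooled_t(men, women)
print(f"t = {t:.2f} with {len(men) + len(women) - 2} degrees of freedom")
```

To decide significance, the t value would then be compared with a critical value from a t-table for the given degrees of freedom, which this sketch leaves out.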
