Psychological Statistics Midterm
Chapter 4
Descriptive Statistics
Ms. Cyrem F. Decena, RPm
Instructor 1
Specific Objectives
At the end of the lesson, the students should be able to:
- Understand the principles of descriptive and inferential statistics;
- Apply this knowledge to summarize a sample and its measures;
- Distinguish descriptive statistics from inferential statistics; and
- Apply this knowledge to present quantitative descriptions in a manageable form.
Duration
Example:
The following set of scores (N = 30) was obtained from a 10-point Psych Statistics test.
We will organize these scores by constructing a frequency distribution table.
Scores: 5 8 7 4 5 9 6 8 3 4 7
7 6 5 2 4 8 6 3 8 9 8
10 7 6 9 8 5 7 5
X    f
10   1
9    3
8    6
7    5
6    4
5    5
4    3
3    2
2    1
1. The highest score is X = 10 and the lowest score is X = 2. The first column of the table lists the categories or scores that make up the scale of measurement (X values) from 10 to 2.
2. Notice that all of the possible scores are listed in the table, and the frequency associated with each score is recorded in the second column.
3. As you can observe, only one student got a perfect score. The most frequent score is 8, with a frequency of 6, followed by 7 and 5, each with a frequency of 5.
4. The frequencies can also be used to find the total number of scores: adding the f column gives Σf = 30 = N.
Using SPSS:
1. Encode the scores in the first column (the first variable) and label it Scores.
2. Click Analyze, go to Descriptive Statistics then click Frequencies.
3. Click the Scores then click the arrow going right, then click OK.
SCORES
Frequency Percent Valid Percent Cumulative Percent
Valid 2 1 3.3 3.3 3.3
3 2 6.7 6.7 10.0
4 3 10.0 10.0 20.0
5 5 16.7 16.7 36.7
6 4 13.3 13.3 50.0
7 5 16.7 16.7 66.7
8 6 20.0 20.0 86.7
9 3 10.0 10.0 96.7
10 1 3.3 3.3 100.0
Total 30 100.0 100.0
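The same frequency table can be reproduced outside SPSS. The following is a minimal Python sketch, not part of the original lesson; the function name `frequency_table` is an illustrative assumption. It counts each score and reports the percent and cumulative percent columns, rounded to one decimal place as in the SPSS output.

```python
from collections import Counter

# The 30 scores from the example above
scores = [5, 8, 7, 4, 5, 9, 6, 8, 3, 4, 7,
          7, 6, 5, 2, 4, 8, 6, 3, 8, 9, 8,
          10, 7, 6, 9, 8, 5, 7, 5]

def frequency_table(values):
    """Return rows of (score, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for x in sorted(counts):
        cumulative += counts[x]
        rows.append((x, counts[x],
                     round(100 * counts[x] / n, 1),
                     round(100 * cumulative / n, 1)))
    return rows

for x, f, pct, cum in frequency_table(scores):
    print(f"{x:>5} {f:>4} {pct:>6} {cum:>6}")
```

The sum of the frequency column should equal N, which is a quick check that no score was dropped during encoding.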
$$\mu = \frac{\sum x}{N} \qquad \bar{x} = \frac{\sum x}{n}$$
Add all the scores in the set and divide the sum by the total number of scores (N for a population, n for a sample).
Example:
The weighted arithmetic mean is used when the values to be averaged have a corresponding weight or degree of importance.
For example, in the computation of the Grade Point Average (GPA) of students, the corresponding weight or degree of importance of each grade must be considered. In a Likert scale (5, 4, 3, 2, 1), the values elicited from the respondents per item in a given observation must be considered to determine the weighted arithmetic mean.
The formula is:
$$\bar{x} = \frac{X_1W_1 + X_2W_2 + X_3W_3 + \dots + X_nW_n}{W_1 + W_2 + W_3 + \dots + W_n} \quad \text{or} \quad \bar{x} = \frac{\sum X_iW_i}{\sum W_i}$$
Where: $X_i$ = frequency/observation
$W_i$ = weight
$\sum W_i$ = total weight
Example:
1. Compute the GPA using the weighted arithmetic mean of the grades obtained by a
student.
Solution:
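$$\bar{x} = \frac{X_1W_1 + X_2W_2 + X_3W_3 + \dots + X_nW_n}{W_1 + W_2 + W_3 + \dots + W_n}$$
$$\bar{x} = \frac{1.75(3) + 1.5(3) + 1.25(3) + 1.5(3) + 1.25(4) + 2.0(3) + 1.0(3) + 1.5(2)}{3 + 3 + 3 + 3 + 4 + 3 + 3 + 2}$$
$$\bar{x} = \frac{5.25 + 4.5 + 3.75 + 4.5 + 5 + 6 + 3 + 3}{24}$$
$$\bar{x} = \frac{35}{24}$$
$$\bar{x} = 1.46$$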
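As a cross-check of the GPA solution, here is a minimal Python sketch. The function name `weighted_mean` and the list names are illustrative assumptions; the grades and unit weights are taken from the computation above.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum of value*weight over sum of weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(x * w for x, w in zip(values, weights)) / total_weight

# Grades (X) and their unit weights (W) from the worked solution
grades = [1.75, 1.5, 1.25, 1.5, 1.25, 2.0, 1.0, 1.5]
units = [3, 3, 3, 3, 4, 3, 3, 2]

gpa = weighted_mean(grades, units)
print(round(gpa, 2))  # 1.46
```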
2. One hundred students were asked to rate their attitude toward statistics on a five-item questionnaire that used a 5-point (Likert) scale. Compute the weighted arithmetic mean. In each cell, the first number is the count of respondents and the number in parentheses is the count multiplied by the scale value.
                      SCALES
Items   5         4         3        2        1        Total      WAM
1       20(100)   40(160)   20(60)   10(20)   10(10)   350/100    3.50
2       10(50)    40(160)   30(90)   20(40)   0(0)     340/100    3.40
3       5(25)     10(40)    30(90)   45(90)   10(10)   255/100    2.55
4       40(200)   25(100)   15(45)   10(20)   10(10)   375/100    3.75
5       30(150)   30(120)   20(60)   15(30)   5(5)     365/100    3.65
OVERALL MEAN                                           1685/500   3.37
A. If n is even
$$Md = \frac{X_{n/2} + X_{(n+2)/2}}{2}$$
B. If n is odd
$$Md = X_{(n+1)/2}$$
Raw scores: 2, 3, 4, 5, 6, 7, 8 (n = 7)
$$Md = X_{(n+1)/2} = X_{(7+1)/2} = X_{8/2} = X_4$$
$$Md = 5$$
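Students      English (20/20)   Arranged in order
1. You        15                7
2. Joseph     18                8
3. Ryan       10                9
4. Anna       8                 10
5. Kiko       11                11
6. Teddy      9                 14
7. Mela       14                15
8. Joni       7                 18
Total Score   92
Mean          11.50
Median        10.50
The two middle scores in the ordered list are 10 and 11, so the median is (10 + 11)/2 = 10.50. Using the formula for even n (n = 8):
$$Md = \frac{X_{n/2} + X_{(n+2)/2}}{2}$$
$$Md = \frac{X_{8/2} + X_{(8+2)/2}}{2}$$
$$Md = \frac{X_4 + X_{10/2}}{2}$$
$$Md = \frac{X_4 + X_5}{2}$$
$$Md = \frac{10 + 11}{2}$$
$$Md = \frac{21}{2}$$
$$Md = 10.5$$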
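The two median rules (odd n and even n) can be sketched in Python as follows. This is an illustrative sketch; the function name `median` and the variable names are assumptions, and positions are converted from the 1-based subscripts of the formulas to Python's 0-based indexing.

```python
def median(values):
    """Median using the positional formulas for odd and even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: Md = X_{(n+1)/2}
        return ordered[(n + 1) // 2 - 1]
    # Even n: Md = (X_{n/2} + X_{(n+2)/2}) / 2
    return (ordered[n // 2 - 1] + ordered[(n + 2) // 2 - 1]) / 2

raw_scores = [2, 3, 4, 5, 6, 7, 8]               # n = 7 (odd)
english = [15, 18, 10, 8, 11, 9, 14, 7]          # n = 8 (even)

print(median(raw_scores))  # 5
print(median(english))     # 10.5
```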
MODE
The mode is the most frequently occurring value. Count the number of times each score occurs and pick the score with the most occurrences. The mode is the preferred measure when we use nominal scales and discrete variables. It is considered the least reliable measure of location.
When two values share the highest frequency, the distribution is called bimodal. When three or more modes are present, it is called polymodal.
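A minimal Python sketch of finding the mode or modes is shown below. The function name `modes` is an illustrative assumption, and the data are the 30 test scores from the frequency distribution example earlier in this chapter. Returning a list lets the same function report bimodal and polymodal cases.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(x for x, f in counts.items() if f == highest)

scores = [5, 8, 7, 4, 5, 9, 6, 8, 3, 4, 7,
          7, 6, 5, 2, 4, 8, 6, 3, 8, 9, 8,
          10, 7, 6, 9, 8, 5, 7, 5]

print(modes(scores))           # [8]  (one mode)
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
```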
How to Make Class Intervals in Statistics?
In statistics, data are groups of information in the form of qualitative or quantitative attributes of a set of variables. Data can be either grouped or ungrouped. Ungrouped data are raw data that have just been gathered, with no further processing performed on them.
If the data are organized into groups, which are called classes, they are referred to as grouped data. Each class has its own width, which is called the class interval. The correct selection of the class interval is very important. The widths of the class intervals may be equal or different depending on the situation and on how the data are grouped, but the size of the interval is always a whole number.
Example:
A total of 40 respondents answered a personality test administered by the Psychology
students. The following are the scores of the respondents. Get the measures of central tendency of
the grouped data.
21 53 23 61 46 37 18 3 45 45
5 33 26 37 85 52 2 23 44 61
33 11 10 35 44 56 68 55 44 67
95 67 90 80 5 34 88 3 55 44
$$Md = 33.5 + \frac{(20 - 13)\,16}{11}$$
$$Md = 33.5 + \frac{(7)\,16}{11}$$
$$Md = 33.5 + \frac{112}{11}$$
$$Md = 33.5 + 10.18$$
$$Md = 43.68 \text{ or } 44$$
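Solution:
$$Mo = L + \left( \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \right) i$$
$$Mo = 33.5 + \left( \frac{11 - 7}{(11 - 7) + (11 - 6)} \right) 16$$
$$Mo = 33.5 + \left( \frac{4}{4 + 5} \right) 16$$
$$Mo = 33.5 + \left( \frac{4}{9} \right) 16$$
$$Mo = 33.5 + 7.11$$
$$Mo = 40.61 \text{ or } 41$$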
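The grouped-data median and mode computations above can be sketched in Python. This is a hedged sketch: the function and parameter names are assumptions, the median function uses the standard grouped-median form that the substituted values imply (lower boundary plus the fraction of the class needed to reach n/2), and the inputs are the values that appear in the worked solutions (L = 33.5, i = 16, n = 40, cumulative frequency 13, modal class frequency 11 with neighbors 7 and 6).

```python
def grouped_median(lower_boundary, n, cum_freq_before, class_freq, width):
    """Md = L + ((n/2 - cf) / f) * i"""
    return lower_boundary + (n / 2 - cum_freq_before) / class_freq * width

def grouped_mode(lower_boundary, f_modal, f_before, f_after, width):
    """Mo = L + (d1 / (d1 + d2)) * i, with d1 = fm - f(m-1), d2 = fm - f(m+1)."""
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    return lower_boundary + d1 / (d1 + d2) * width

md = grouped_median(33.5, n=40, cum_freq_before=13, class_freq=11, width=16)
mo = grouped_mode(33.5, f_modal=11, f_before=7, f_after=6, width=16)

print(round(md, 2))  # 43.68
print(round(mo, 2))  # 40.61
```

Carrying the fraction 4/9 without rounding it first gives a mode of about 40.61; rounding 4/9 to 0.44 before multiplying gives a slightly smaller value. Both round to 41.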
SHAPE OF DISTRIBUTION
This chapter introduces central tendency and variability. Alongside them, let us take a look at the shape of a distribution, because the relationships among the mean, median and mode are determined in part by that shape.
SKEWNESS. This refers to the symmetry or asymmetry of a distribution of data. When data are normally distributed, the distribution is symmetrical. When the tail of the distribution extends to the right, with most scores piled up at the low end, the distribution is known as positively skewed. When the tail extends to the left, with most scores piled up at the high end, the distribution is known as negatively skewed. The illustration is shown below.
The skewness tells the relationship between the mean, median and the mode. The median is the middlemost value, the mode is the apex, and the mean tends to be located towards the tail of the distribution. This is because the mean reflects all the values in a distribution, including the extreme ones.
$$Sk = \frac{3(\bar{x} - Md)}{SD}$$
Where:
Sk = Skewness
3 = Constant
$\bar{x}$ = Mean
Md = Median
SD = Standard Deviation
Example:
$\bar{x}$ = 3.275
Md = 2.90
SD = 1.073
$$Sk = \frac{3(\bar{x} - Md)}{SD}$$
$$Sk = \frac{3(3.275 - 2.90)}{1.073}$$
$$Sk = \frac{3(0.375)}{1.073}$$
$$Sk = \frac{1.125}{1.073}$$
$$Sk = 1.05$$
Therefore, the distribution is positively skewed because Sk is positive. When Sk is negative, the distribution is called negatively skewed.
As stated by Reyes (1996), in a normal distribution, Sk = 0. If Sk has a negative value, the distribution is skewed to the left. If Sk has a positive value, the distribution is skewed to the right. The farther Sk departs from 0, the more skewed or asymmetrical the distribution is. The nearer the distribution is to normal, the closer the value of Sk comes to zero.
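The skewness computation and its interpretation can be sketched in Python as follows. The function names are illustrative assumptions; the inputs are the mean, median and standard deviation from the example above.

```python
def pearson_skewness(mean, median, sd):
    """Sk = 3 * (mean - median) / SD"""
    return 3 * (mean - median) / sd

def describe_skew(sk):
    """Label the direction of skew from the sign of Sk."""
    if sk > 0:
        return "positively skewed (skewed to the right)"
    if sk < 0:
        return "negatively skewed (skewed to the left)"
    return "symmetrical"

sk = pearson_skewness(mean=3.275, median=2.90, sd=1.073)
print(round(sk, 2), describe_skew(sk))
```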
Note:
a. When Ku = 3, the distribution is normal
b. When Ku < 3, the distribution is platykurtic
c. When Ku > 3, the distribution is leptokurtic
$$Ku = \frac{\sum (X - \bar{X})^4}{nS^4}$$
Example:
Given:
$\sum (X - \bar{X})^2 = 15$
n = 20
S = 7.95
Thus:
$$Ku = \frac{(15)^2}{20(7.95)^4} = \frac{225}{79891.12} = .003$$
Since Ku < 3, the distribution is platykurtic.
PERCENTILES
The most common definition of a percentile is a number where a certain percentage of
scores fall below that number. You might know that you scored 67 out of 90 on a test. But that
figure has no real meaning unless you know what percentile you fall into. If you know that your
score is in the 90th percentile, that means you scored better than 90% of people who took the test.
Percentile rank tells what percent of the cases got below the rank position. Percentile point
(Pn) is the score or value that corresponds to the given percentile rank.
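Example: Find the percentile rank of the score 85 using the data below.
Scores   50  65  75  44  78  90  85  65  74  90
Order    1   2   3   4   5   6   7   8   9   10
Score    90  90  85  78  75  74  65  65  50  44
$$PR = \frac{b}{n} \times 100$$
Here b is the number of scores below the given score and n is the total number of scores. Seven scores (78, 75, 74, 65, 65, 50 and 44) fall below 85, so:
$$PR = \frac{7}{10} \times 100$$
$$PR = 70$$
The score 85 is at the 70th percentile rank. It means that a person with this score did better than 70% of the people who took the test.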
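A minimal Python sketch of the percentile rank formula follows. The function name `percentile_rank` is an illustrative assumption; it counts the scores strictly below the given score, matching the value of 7 used in the example.

```python
def percentile_rank(values, score):
    """PR = (number of scores below the given score / n) * 100"""
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)

scores = [50, 65, 75, 44, 78, 90, 85, 65, 74, 90]
print(percentile_rank(scores, 85))  # 70.0
```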
Quartiles are values that divide your data into quarters. However, quartiles aren't shaped like pizza slices; instead, they divide your data into four segments according to where the numbers fall on the number line. The four quarters that divide a data set into quartiles are:
Deciles are similar to quartiles. But while quartiles sort data into four quarters, deciles sort data
into ten equal parts:
Decile Rank 1 2 3 4 5 6 7 8 9 10
Percentile 10th 20th 30th 40th 50th 60th 70th 80th 90th 100th
Example: 31, 33, 18, 12, 5, 39, 25, 30, 31, 22, 16
Find P46, Q3 and D9
How to compute P46
1. Put the scores in ascending order:
5, 12, 16, 18, 22, 25, 30, 31, 31, 33, 39
2. Use this formula to locate the 46th percentile:
$$L_p = (n + 1)\frac{P}{100}$$
$$L_{46} = (11 + 1)\frac{46}{100}$$
$$L_{46} = (12)\,0.46$$
$$L_{46} = 5.52$$
Interpretation: This tells us that the 46th percentile lies between the 5th and 6th observations (the scores 22 and 25), 52% of the distance between them. The distance is 25 - 22 = 3.
$$P_{46} = 22 + 0.52(3) = 22 + 1.56 = 23.56$$
Interpretation: 46% of the scores are less than 23.56.
Another Example: P36
$$L_{36} = (11 + 1)\frac{36}{100}$$
$$L_{36} = (12)\,0.36$$
$$L_{36} = 4.32$$
This tells us that the 36th percentile lies between the 4th and 5th observations (the scores 18 and 22), 32% of the distance between them. The distance is 22 - 18 = 4.
$$P_{36} = 18 + 0.32(4) = 18 + 1.28 = 19.28$$
Interpretation: 36% of the scores are less than 19.28.
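The location formula and the interpolation step can be combined in a short Python sketch. The function name `percentile_point` is an illustrative assumption, as is the handling of locations that fall outside the data (the sketch simply returns the lowest or highest score). Since the third quartile is the 75th percentile and the ninth decile is the 90th percentile, the same function can be reused for Q3 and D9.

```python
def percentile_point(values, p):
    """Percentile using L_p = (n + 1) * P / 100 with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100      # 1-based position, may be fractional
    whole = int(location)             # observation just below the location
    fraction = location - whole       # share of the distance to the next one
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)

data = [31, 33, 18, 12, 5, 39, 25, 30, 31, 22, 16]

print(round(percentile_point(data, 46), 2))  # 23.56
print(round(percentile_point(data, 36), 2))  # 19.28
q3 = percentile_point(data, 75)  # third quartile = 75th percentile
d9 = percentile_point(data, 90)  # ninth decile = 90th percentile
```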
RANGE
The range is the simplest measure of variability to compute: it is the difference between the largest and the smallest values in a set of numerical data. It is considered a poor measure of variability, or an unstable form of measurement, because it depends on only two values.
The range for ungrouped data is obtained by finding the difference between the highest and the lowest value. For grouped data, the range is determined by subtracting the lower boundary of the lowest class interval from the upper boundary of the highest class interval of the frequency distribution. This is so because the class boundaries are considered the true limits.
Example: Suppose you have 12 members in your group. What would be the range of their scores, given the following values?
Scores:
9 8 7 6 4 7
9 7 5 6 8 5
The range is the distance between the highest score and the lowest score. Range=
largest value- smallest value. In our example, the largest score is 9 and the smallest score is 4.
The range is 9-4=5.
The mean absolute deviation (MAD) is a method of obtaining the variation of all the values or scores from the mean. Although it has limitations in capturing the precise spread or variability, it is more reliable than the range.
The MAD measures how far, on average, each individual score or value in a distribution deviates from the mean of that distribution.
Formula:
$$MAD = \frac{\sum |x - \bar{x}|}{N}$$
Where:
MAD=Mean absolute deviation
X= individual score or value
𝑥̅ = mean
∑|𝑥 − 𝑥̅ | = sum of the absolute deviations from the mean
N =total number of population/sample
Example: Compute the mean deviation of the following set of data:
10 15 20 25 30 15 12 10 28 17
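A minimal Python sketch of the MAD formula applied to this data set is shown below. The function name `mean_absolute_deviation` is an illustrative assumption, and the printed values come from this sketch rather than from the original handout.

```python
def mean_absolute_deviation(values):
    """MAD = sum of |x - mean| divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

data = [10, 15, 20, 25, 30, 15, 12, 10, 28, 17]

mean = sum(data) / len(data)
mad = mean_absolute_deviation(data)
print(round(mean, 2))  # 18.2
print(round(mad, 2))   # 6.04
```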
Class Interval (X)   f   x   fx   $(x - \bar{x})$   $|x - \bar{x}|$   $f|x - \bar{x}|$
Totals: n = 40, $\sum fx = 2490$, $\sum f|x - \bar{x}| = 562.75$
The sample variance formula is commonly used in research because in most cases we are dealing with sample values that estimate those of the population. When determining the population variance, the formula is:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Where:
𝜎 2 = population variance
N = total population
X = values of observation
𝜇 = population mean
For example, calculate the sample variance for the given sample:
Scores (X)   Deviation from the mean $(x - \bar{x})$   Squared deviation from the mean $(x - \bar{x})^2$
15           15 - 9.67 = 5.33                          28.41
13           13 - 9.67 = 3.33                          11.09
10           10 - 9.67 = 0.33                          0.11
7            7 - 9.67 = -2.67                          7.13
8            8 - 9.67 = -1.67                          2.79
5            5 - 9.67 = -4.67                          21.81
$\sum x = 58$; $\bar{x} = 58/6 = 9.67$; $\sum (x - \bar{x})^2 = 71.34$
Computation:
$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
$$S^2 = \frac{71.34}{6 - 1}$$
$$S^2 = 14.27$$
Applying the population variance formula to the same data gives:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
$$\sigma^2 = \frac{71.34}{6}$$
$$\sigma^2 = 11.89$$
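Both variance results can be checked with Python's standard `statistics` module, where `variance` divides by n - 1 (sample) and `pvariance` divides by N (population). This is a sketch for verification only; the variable names are assumptions.

```python
import statistics

scores = [15, 13, 10, 7, 8, 5]

sample_variance = statistics.variance(scores)       # divides by n - 1
population_variance = statistics.pvariance(scores)  # divides by N

print(round(sample_variance, 2))      # 14.27
print(round(population_variance, 2))  # 11.89
```

The sample variance is always the larger of the two for the same data because its denominator is smaller.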
When the data are large and arranged in a frequency distribution (grouped data), the variance is computed using the formula:
$$S^2 = \frac{\sum f\,(|x - \bar{x}|)^2}{\sum f - 1}$$
Where:
$\sum f$ = summation of frequency
$|x - \bar{x}|$ = absolute deviation from the mean
Class Interval (X)   f   x   fx   $(x - \bar{x})$   $|x - \bar{x}|$   $(|x - \bar{x}|)^2$   $f(|x - \bar{x}|)^2$
Totals: n = 40, $\sum fx = 2490$, $\sum f(|x - \bar{x}|)^2 = 11264.31$
STANDARD DEVIATION
The standard deviation is the square root of the variance. It is a special form of average deviation from the mean and an important measure of heterogeneity or homogeneity in a set of observations. It is commonly used in parametric statistics.
To interpret the standard deviation: the larger the value, the greater the dispersion, denoting heterogeneous data. A smaller value means that the scores are more homogeneous.
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
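Example:
X    $x - \bar{x}$    $(x - \bar{x})^2$
9    9 - 6 = 3        9
8    8 - 6 = 2        4
7    7 - 6 = 1        1
5    5 - 6 = -1       1
1    1 - 6 = -5       25
$\sum x = 30$; $\bar{x} = 6$; $\sum (x - \bar{x})^2 = 40$
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
$$SD = \sqrt{\frac{40}{5 - 1}}$$
$$SD = \sqrt{10}$$
$$SD = 3.16$$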
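The worked example can be verified with a short Python sketch that follows the table column by column and then compares the result with `statistics.stdev` from the standard library. The function name `sample_sd` is an illustrative assumption.

```python
import math
import statistics

def sample_sd(values):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

x = [9, 8, 7, 5, 1]

sd = sample_sd(x)
print(round(sd, 2))                    # 3.16
print(round(statistics.stdev(x), 2))   # 3.16, same result from the library
```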
When the data are arranged in a frequency distribution (large data), the formulas are as follows:
1. Population standard deviation for grouped data. Using the coded method, the formula is:
$$\sigma = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{N}}$$
2. Sample standard deviation for grouped data.
$$S = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{n - 1}}$$
Example:
Calculate the sample standard deviation for grouped data.
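Class Interval (X)   f   x   fx   $(x - \bar{x})$   $|x - \bar{x}|$   $(|x - \bar{x}|)^2$   $f(|x - \bar{x}|)^2$
Totals: n = 40, $\sum fx = 2490$, $\sum f(|x - \bar{x}|)^2 = 11264.31$
Computation:
Dividing by N (population formula):
$$SD = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{N}}$$
$$SD = \sqrt{\frac{11264.31}{40}}$$
$$SD = \sqrt{281.61}$$
$$SD = 16.78 \text{ or } 17$$
Dividing by n - 1 (sample formula):
$$SD = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{n - 1}}$$
$$SD = \sqrt{\frac{11264.31}{40 - 1}}$$
$$SD = \sqrt{\frac{11264.31}{39}}$$
$$SD = \sqrt{288.83}$$
$$SD = 16.99 \text{ or } 17$$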
NORMAL DISTRIBUTION
The bell-shaped normal curve is widely known as the normal distribution. Since many frequency distributions are very close to the normal curve, we often assume that they are normally distributed. The normal curve is important not because scores are assumed to be normally distributed but because the sampling distributions of various statistics are known or assumed to be normal.
Generally, the bell-shaped graph is symmetric with respect to a vertical line drawn at the center from the horizontal axis to the modal peak of the curve. The values of the mean, median and mode coincide at the center of the distribution; that is, they are numerically equal. The curve is asymptotic with respect to the horizontal axis: its two tails approach the axis and extend indefinitely in opposite directions but never touch it.
The standard deviation is the standard measure of variability and is measured along the horizontal axis. The total area under the curve is 1 or 100%, which represents all the cases in the distribution. Hence the half to the right of the vertical line (above the mean) represents 0.5 or 50% of the cases, and the half to the left (below the mean) represents the other 50%.
The area under the normal curve may be subdivided into three standard-score units on each side of the vertical axis. Scores in normally distributed populations rarely fall beyond plus or minus three standard deviations, inasmuch as the area under the curve becomes negligible at 4 and 5 standard deviations away from the mean.
Empirical Rule
The horizontal axis under the normal curve is subdivided into equal intervals, at least three units to the left and three to the right of the vertical axis at the center of the curve, as mentioned earlier. The empirical rule is based on the equation introduced by de Moivre. However, because that equation is laborious to use directly, statistical tables were made for easier computation. The rule gives the probabilities under the normal curve as follows:
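The areas usually quoted for the empirical rule (about 68%, 95% and 99.7% of cases within 1, 2 and 3 standard deviations of the mean) can be computed directly instead of being read from a table. The sketch below is not from the original handout; it relies on the standard identity that the area of a normal curve within k standard deviations of the mean equals erf(k / sqrt(2)), and the function name `area_within` is an assumption.

```python
import math

def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * area_within(k):.2f}%")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```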
A real score or raw score can be transformed into a standard score called the z score. The z scores form a distribution with a mean of 0 and a standard deviation of 1 (σ = 1 when the population is used, s = 1 when a sample is used). A z score indicates how far a score deviates from the mean of its distribution, in standard deviation units. If a particular raw score is above the mean (to the right of the mean), its equivalent z score is positive; it is negative if the raw score is below (to the left of) the mean. The z score can be directly transformed to a percentile.
$$z = \frac{x - \mu}{\sigma} \text{ for a population, and } z = \frac{x - \bar{x}}{s} \text{ for a sample}$$
Where:
𝑧 = standard score
𝜇 = population mean
𝜎 = population standard deviation
𝑥 = real score
𝑥̅ = sample mean
𝑠 = sample standard deviation
Examples:
The following are Glen's final exam results in his three subjects. In which subject did he perform best? In which did he perform worst?
Solution:
$$\text{English 101: } z = \frac{x - \bar{x}}{s} = \frac{84 - 81}{4.5} = 0.67$$
$$\text{Math 101: } z = \frac{x - \bar{x}}{s} = \frac{83 - 75}{6} = 1.33$$
$$\text{P.E. 1: } z = \frac{x - \bar{x}}{s} = \frac{90 - 92}{6.4} = -0.31$$
Interpretation:
The z scores indicate that Glen performed best in Math 101. He did not perform well in P.E. 1.
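The comparison across subjects can be sketched in Python. The function name `z_score` and the dictionary layout are illustrative assumptions; each entry holds Glen's score, the class mean and the standard deviation, as they appear in the solution.

```python
def z_score(x, mean, sd):
    """z = (x - mean) / sd"""
    return (x - mean) / sd

# subject: (Glen's score, class mean, standard deviation)
subjects = {
    "English 101": (84, 81, 4.5),
    "Math 101": (83, 75, 6),
    "P.E. 1": (90, 92, 6.4),
}

z = {name: round(z_score(*values), 2) for name, values in subjects.items()}
best = max(z, key=z.get)
worst = min(z, key=z.get)

print(z)      # {'English 101': 0.67, 'Math 101': 1.33, 'P.E. 1': -0.31}
print(best)   # Math 101
print(worst)  # P.E. 1
```

Note that Glen's highest raw score (90 in P.E. 1) has the lowest z score, which is why standard scores rather than raw scores are compared across subjects.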
T SCORES
A T score is a transformed score that always has a mean of 50 and a standard deviation of 10. It is computed by multiplying the z score by 10 and then adding 50. A T score in psychometric or psychological testing is a specialized term that is not the same thing as the t-score that you get from a t-test. t-scores from a t-test can be positive or negative; T scores in psychometric testing are practically always positive, with a mean of 50.
A T score is similar to a z score: it expresses the distance from the mean in standard deviation units. Many prefer T scores because the lack of negative numbers makes them easier to work with and because the larger range means decimals are almost eliminated. The table shows z scores and their equivalent T scores.
$$T = (z \times 10) + 50$$
Student   Score   $x - \bar{x}$   z score   T score
A         14      -1              -0.37     46.32
B         15      0               0.00      50.00
C         11      -4              -1.47     35.29
D         17      2               0.74      57.35
E         14      -1              -0.37     46.32
F         16      1               0.37      53.68
G         17      2               0.74      57.35
H         12      -3              -1.10     38.97
I         12      -3              -1.10     38.97
J         17      2               0.74      57.35
K         20      5               1.84      68.38
$\sum x$ = 165
Mean = 15
SD = 2.72
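The table can be regenerated with a short Python sketch. The function name `t_scores` is an illustrative assumption. The sketch assumes the sample standard deviation (dividing by n - 1), which reproduces the SD of 2.72 shown above, and it computes each T score from the unrounded z score before rounding to two decimals.

```python
import math

scores = {"A": 14, "B": 15, "C": 11, "D": 17, "E": 14, "F": 16,
          "G": 17, "H": 12, "I": 12, "J": 17, "K": 20}

def t_scores(data):
    """Return the mean, sample SD, and a (z, T) pair for each student."""
    values = list(data.values())
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    table = {}
    for name, x in data.items():
        z = (x - mean) / sd
        table[name] = (round(z, 2), round(z * 10 + 50, 2))  # T = (z x 10) + 50
    return mean, sd, table

mean, sd, table = t_scores(scores)
print(mean, round(sd, 2))
for name, (z, t) in table.items():
    print(name, z, t)
```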
Reference:
Lambojon, Jr, Francisco, et.al. Psychological Statistics. Mindshapers C., Inc. 2017