Psychological Statistics Midterm


This is a property of

PRESIDENT RAMON MAGSAYSAY STATE UNIVERSITY


NOT FOR SALE
Psychological Statistics

Chapter 4

Descriptive Statistics
Ms. Cyrem F. Decena, RPm
Instructor 1
Chapter 4

Descriptive and Inferential Statistics


Introduction
This chapter provides an overview of some of the statistical approaches researchers take to
understanding the results that are obtained in research.

Specific Objectives
At the end of the lesson, the students should be able to:
- Understand the principles of descriptive and inferential statistics;
- Apply this knowledge to summarize a sample and its measures;
- Distinguish descriptive statistics from inferential statistics; and
- Apply this knowledge to present quantitative descriptions in a manageable form.

Duration

Chapter 4: Descriptive Statistics and Inferential Statistics = 5 hours (3 hours discussion; 2 hours assessment)

• Frequency Distributions
• Central Tendency
   - Mean
   - Median
   - Mode
• Percentiles
• Measures of Variations
• Range
• Variance
• Standard Deviation
• Standard Scores
Lesson Proper
FREQUENCY DISTRIBUTIONS
A frequency distribution is an organized tabulation of the number of individuals located in each category on the scale of measurement. It groups the scores together in order, which allows the researcher to see the entire set of scores at a glance. It can be presented as either a table or a graph that records the frequency, or number of individuals, in each category.
The simplest frequency distribution table presents the measurement scale by listing the measurement categories in a column (usually labeled X) from the highest to the lowest value. The frequency of each category is then placed beside it in a second column (usually labeled f).

Example:

The following set of scores (N=30) was obtained from a 10-point Psych Statistics test.
We will organize these scores by constructing a frequency distribution table.

Scores; 5 8 7 4 5 9 6 8 3 4 7
7 6 5 2 4 8 6 3 8 9 8
10 7 6 9 8 5 7 5

1. The highest score is X=10 and the lowest score is X=2. The first column of the table lists the categories or scores that make up the scale of measurement (X values) from 10 to 2.
2. Notice that all of the possible scores are listed in the table, and the frequency associated with each score is recorded in the second column.
3. As you can observe, only one student got a perfect score, and the most frequent score is 8 with a frequency of 6, followed by 7 and 5, each with a frequency of 5.
4. The frequencies can also be used to find the total number of scores: adding them gives ∑f = N = 30.

X     f
10    1
9     3
8     6
7     5
6     4
5     5
4     3
3     2
2     1

Using SPSS:
1. Encode the scores in the first column (the first variable) and label it as Scores.
2. Click Analyze, go to Descriptive Statistics then click Frequencies.
3. Click the Scores then click the arrow going right, then click OK.
SCORES
Frequency Percent Valid Percent Cumulative Percent
Valid 2 1 3.3 3.3 3.3
3 2 6.7 6.7 10.0
4 3 10.0 10.0 20.0
5 5 16.7 16.7 36.7
6 4 13.3 13.3 50.0
7 5 16.7 16.7 66.7
8 6 20.0 20.0 86.7
9 3 10.0 10.0 96.7
10 1 3.3 3.3 100.0
Total 30 100.0 100.0

The SPSS output will look like this, matching the frequency table we constructed manually.
You can also make a graph by clicking Charts, choosing a specific chart type, and then clicking OK.
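The same frequency table can also be tabulated programmatically. The following is a minimal sketch in Python, not part of the SPSS procedure; it assumes only the 30 scores from the example and uses the standard-library `collections.Counter` (the variable names are illustrative).

```python
from collections import Counter

# The 30 scores from the 10-point Psych Statistics test example
scores = [5, 8, 7, 4, 5, 9, 6, 8, 3, 4, 7,
          7, 6, 5, 2, 4, 8, 6, 3, 8, 9, 8,
          10, 7, 6, 9, 8, 5, 7, 5]

freq = Counter(scores)      # maps each score X to its frequency f
n = sum(freq.values())      # total number of scores (N)

cumulative = 0
print("X    f    percent  cum. percent")
for x in sorted(freq):      # lowest to highest, as in the SPSS output
    cumulative += freq[x]
    percent = 100 * freq[x] / n
    cum_percent = 100 * cumulative / n
    print(f"{x:<4} {freq[x]:<4} {percent:7.1f}  {cum_percent:7.1f}")
```

Sorting with `sorted(freq, reverse=True)` instead would list the scores from highest to lowest, as in the manual table.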
CENTRAL TENDENCY
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central tendency are
sometimes called measures of central location. They are also classed as summary statistics. The
mean (often called the average) is most likely the measure of central tendency that you are most
familiar with, but there are others, such as the median and the mode.
The mean, median and mode are all valid measures of central tendency, but under different
conditions, some measures of central tendency become more appropriate to use than others.

THE MEAN (Ungrouped Data)


The arithmetic mean, or arithmetic average, is defined as the sum of the values of the variable divided by the number of observations. The definition is the same for both the sample and the population, although we use different symbols to refer to each kind.
The mean is the most common measure of central tendency. It is simply the sum of the scores divided by the number of scores. The symbol “𝜇” is used for the population mean. The symbol “𝑥̅” is used for the sample mean.

The formula for the mean is shown below:

Population Mean: \[ \mu = \frac{\sum x}{N} \]
Sample Mean: \[ \bar{x} = \frac{\sum x}{n} \]

Add all the scores in the set and divide the sum by the total number of scores.
Example:

Students      Math 10/10
1. You        9
2. Joseph     8
3. Ryan       7
4. Anna       8
5. Kiko       9
6. Teddy      5
7. Mela       7
8. Joni       6
MEAN          7.38

\[ \bar{x} = \frac{\sum x}{n} = \frac{59}{8} = 7.375 \approx 7.38 \]
The Weighted Arithmetic Mean

It is used when the values to be averaged have a corresponding weight or degree of importance. In the computation of the Grade Point Average (GPA) of students, for example, the corresponding weight or degree of importance of each grade must be considered. In a Likert Scale (5, 4, 3, 2, 1), the values chosen by the respondents per item in a given observation must be considered to determine the weighted arithmetic mean.
The formula is:

\[ \bar{x} = \frac{X_1 W_1 + X_2 W_2 + X_3 W_3 + \dots + X_n W_n}{W_1 + W_2 + W_3 + \dots + W_n} \quad \text{or} \quad \bar{x} = \frac{\sum X_i W_i}{\sum W_i} \]
Where: X_i = frequency/observation
W_i = weight
\sum W_i = total weight
Example:
1. Compute the GPA using the weighted arithmetic mean of the grades obtained by a
student.

Grade (X) Units (w)


Fil 1 1.75 3
Math 1 1.5 3
Eng 1 1.25 3
Social Studies 1 1.5 3
Psych 1 1.25 4
FilPsych 2.0 3
Devt Psych 1.0 3
Envi Science 1.5 2

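Solution:
\[ \bar{x} = \frac{X_1 W_1 + X_2 W_2 + X_3 W_3 + \dots + X_n W_n}{W_1 + W_2 + W_3 + \dots + W_n} \]
\[ \bar{x} = \frac{1.75(3) + 1.5(3) + 1.25(3) + 1.5(3) + 1.25(4) + 2.0(3) + 1.0(3) + 1.5(2)}{3+3+3+3+4+3+3+2} \]
\[ \bar{x} = \frac{5.25 + 4.5 + 3.75 + 4.5 + 5 + 6 + 3 + 3}{24} \]
\[ \bar{x} = \frac{35}{24} \]
\[ \bar{x} = 1.46 \]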
2. One hundred students were asked to rate their attitude toward statistics using a five-item questionnaire with a 5-point (Likert) scale. Compute the weighted arithmetic mean. Each cell shows the number of respondents followed, in parentheses, by that number multiplied by the scale value.

                         SCALES
Items   5         4         3        2        1        Total      WAM
1       20(100)   40(160)   20(60)   10(20)   10(10)   350/100    3.50
2       10(50)    40(160)   30(90)   20(40)   0(0)     340/100    3.40
3       5(25)     10(40)    30(90)   45(90)   10(10)   255/100    2.55
4       40(200)   25(100)   15(45)   10(20)   10(10)   375/100    3.75
5       30(150)   30(120)   20(60)   15(30)   5(5)     365/100    3.65
OVERALL MEAN                                           1685/500   3.37
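The weighted arithmetic mean in both examples can be checked with a few lines of code. This is a minimal sketch in Python; the function name `weighted_mean` is illustrative and not taken from the module.

```python
def weighted_mean(values, weights):
    """Return sum(X_i * W_i) / sum(W_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Example 1: grades (X) weighted by units (W)
grades = [1.75, 1.5, 1.25, 1.5, 1.25, 2.0, 1.0, 1.5]
units = [3, 3, 3, 3, 4, 3, 3, 2]
gpa = weighted_mean(grades, units)
print(round(gpa, 2))   # 35 / 24, rounded to two decimals

# Example 2, item 1: scale values weighted by the number of respondents
item1 = weighted_mean([5, 4, 3, 2, 1], [20, 40, 20, 10, 10])
print(item1)           # 350 / 100
```

In the Likert example the scale values play the role of X and the respondent counts play the role of the weights.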

MEDIAN (Ungrouped Data)


The median is also a frequently used measure of central tendency. The median is the midpoint of the distribution: the same number of scores fall above it as below it. Simply put, it is the middle score. The median is denoted by Md.
Computation of Median

A. If n is even
\[ Md = \frac{X_{n/2} + X_{(n+2)/2}}{2} \]

B. If n is odd
\[ Md = X_{(n+1)/2} \]

Legend: X = the observation at the indicated position (the middle observation) once the scores are arranged in order
n = number of observations
Example if n is an odd number

Raw Score: 2, 3, 4, 5, 6, 7, 8 (n = 7)

\[ Md = X_{(n+1)/2} = X_{(7+1)/2} = X_{8/2} = X_4 = 5 \]

Example if n is an even number

Students       English 20/20   Arranged in Order
1. You         15              7
2. Joseph      18              8
3. Ryan        10              9
4. Anna        8               10
5. Kiko        11              11
6. Teddy       9               14
7. Mela        14              15
8. Joni        7               18
Total Score    92
Mean           11.50
Median         10.50

\[ Md = \frac{X_{n/2} + X_{(n+2)/2}}{2} = \frac{X_{8/2} + X_{(8+2)/2}}{2} = \frac{X_4 + X_5}{2} = \frac{10 + 11}{2} = \frac{21}{2} = 10.5 \]

MODE
The mode is the most frequently occurring value. Count the number of times each score occurs and pick the score with the most occurrences. The mode is the preferred measure when we use nominal scales and discrete variables. It is considered the least reliable measure of location.

Students      Math 10/10   Arranged in Order
1. You        9            5
2. Joseph     8            6
3. Ryan       7            7
4. Anna       8            7
5. Kiko       9            8
6. Teddy      5            8
7. Mela       7            9
8. Joni       6            9
MEAN          7.38
MEDIAN        7.5
MODE          7, 8, 9

There are 3 modes (7, 8 and 9), each occurring twice, so this distribution is polymodal.

When two values share the highest frequency, the distribution is called bimodal. When three or more modes are present, it is called polymodal.
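The three measures of central tendency for ungrouped data can be verified with Python's standard-library `statistics` module. The sketch below assumes Python 3.8 or later (for `statistics.multimode`) and reuses the Math and English scores from the examples; the variable names are illustrative.

```python
import statistics

math_scores = [9, 8, 7, 8, 9, 5, 7, 6]          # Math 10/10 example
english_scores = [15, 18, 10, 8, 11, 9, 14, 7]  # English 20/20 example

mean = statistics.mean(math_scores)                  # sum of scores / n
median = statistics.median(math_scores)              # average of the two middle scores (n is even)
modes = sorted(statistics.multimode(math_scores))    # every value tied for the highest frequency

english_median = statistics.median(english_scores)

print(mean, median, modes, english_median)
```

`statistics.multimode` returns all tied modes, which is why it is used here instead of `statistics.mode`.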
How to Make Class Intervals in Statistics?
In statistics, data are groups of information in the form of qualitative or quantitative attributes of a set of variables. Data can be either grouped or ungrouped. Ungrouped data are raw data that have just been gathered, with no further processing performed on them.

If the data are organized in groups, which are called classes, the data are referred to as grouped data. Each class has its own width, which is called the class interval. The correct selection of the class interval is very important. The width of each class interval could be equal or different depending on the situation and on the way the data are grouped, but the size of the interval is always a whole number.

Step-by-step procedure for making class intervals

1. Range = HS − LS (highest score minus lowest score)
2. Number of class intervals = √n
3. Class width/size = Range / √n (with √n rounded to the number of class intervals)

Example:
A total of 40 respondents answered a personality test administered by the Psychology
students. The following are the scores of the respondents. Get the measures of central tendency of
the grouped data.
21 53 23 61 46 37 18 3 45 45
5 33 26 37 85 52 2 23 44 61
33 11 10 35 44 56 68 55 44 67
95 67 90 80 5 34 88 3 55 44

Make the Class Interval for the Frequency Distribution Table

1. Range = HS − LS
         = 95 − 2
         = 93
2. Number of class intervals = √n
         = √40
         = 6.32 or 6 class intervals
3. Class width/size = Range / √n
         = 93 / 6
         = 15.5 or 16 class width
Class interval Tally f x fx <cf
82-97 IIII 4
66-81 IIIII 5
50-65 IIIII-II 7
34-49 IIIII-IIIII-I 11
18-33 IIIII-I 6
2-17 IIIII-II 7
i=16 ∑ 𝑓=40
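The three steps for setting up the class intervals can be sketched in code. This is an illustrative Python sketch, not a fixed rule: rounding √n to the nearest whole number and rounding the class width up are assumptions chosen to match the worked example.

```python
import math

# The 40 personality-test scores from the example
scores = [21, 53, 23, 61, 46, 37, 18, 3, 45, 45,
          5, 33, 26, 37, 85, 52, 2, 23, 44, 61,
          33, 11, 10, 35, 44, 56, 68, 55, 44, 67,
          95, 67, 90, 80, 5, 34, 88, 3, 55, 44]

data_range = max(scores) - min(scores)   # Step 1: HS - LS
k = round(math.sqrt(len(scores)))        # Step 2: sqrt(40) is about 6.32, rounded to 6
width = math.ceil(data_range / k)        # Step 3: 93 / 6 = 15.5, rounded up to 16

# Build the class limits starting from the lowest score
lowest = min(scores)
classes = [(lowest + j * width, lowest + (j + 1) * width - 1) for j in range(k)]
for lower, upper in reversed(classes):   # highest class first, as in the table
    print(f"{lower}-{upper}")
```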


MEAN (Grouped Data)


In a large data set, we can compute the mean conveniently when we group the scores into class intervals to form a frequency distribution. In setting the class interval, first determine the range. The range is then divided by a standard number of classes, ranging from 10 to 20. The quotient is the desired class interval, and the lowest value must be a multiple of the class interval.

The formula of the sample mean for grouped data is:

\[ \bar{X} = \frac{\sum f x}{n} \]
Where: \bar{X} = sample mean value
f = frequency of the class
x = midpoint of the class
n = number of cases or samples
Example:
1. Compute the mean for the following grouped data using the Midpoint method.

Class interval   Tally           f     x      fx      Cumulative Frequency (<cf)
82-97            IIII            4     89.5   358
66-81            IIIII           5     73.5   367.5
50-65            IIIII-II        7     57.5   402.5
34-49            IIIII-IIIII-I   11    41.5   456.5
18-33            IIIII-I         6     25.5   153
2-17             IIIII-II        7     9.5    66.5
                                 n=40         ∑fx=1804
Note: The midpoint is the middle value in each class interval, i.e., (2+17)/2 = 9.5
Solution:
\[ \bar{x} = \frac{\sum f x}{n} = \frac{1804}{40} = 45.10 \]
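The midpoint method lends itself to a short computation. The Python sketch below takes the class limits and frequencies exactly as tabulated above; the tuple layout and variable names are illustrative assumptions.

```python
# (lower limit, upper limit, frequency) for each class interval in the table
classes = [(82, 97, 4), (66, 81, 5), (50, 65, 7),
           (34, 49, 11), (18, 33, 6), (2, 17, 7)]

n = sum(f for _, _, f in classes)                          # number of cases
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)   # frequency times class midpoint
grouped_mean = sum_fx / n

print(n, sum_fx, grouped_mean)
```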

MEDIAN (Grouped Data)


The median for grouped data is used when the data set is large and arranged in a frequency distribution. The formula is:
\[ Md = L + \frac{\left(\frac{n}{2} - F_b\right) i}{f_m} \]
Where: Md = median
L = exact lower limit of the median interval (33.5)
n = number of cases/samples (40)
Fb = cumulative frequency, or the sum of the frequencies before the median interval (13)
fm = frequency of the median class (11)
i = class width (16)
Class interval   Tally           f         x      fx      Cumulative Frequency (<cf)
82-97            IIII            4         89.5   358     40
66-81            IIIII           5         73.5   367.5   36
50-65            IIIII-II        7         57.5   402.5   31
34-49            IIIII-IIIII-I   11 (fm)   41.5   456.5   24
18-33            IIIII-I         6         25.5   153     13 (Fb)
2-17             IIIII-II        7         9.5    66.5    7
i=16                             n=40             ∑fx=1804
L = 34 − 0.5 = 33.5 (exact lower limit of the median class 34-49)
Note: The median class is located by computing n/2 = 20 and then finding the first class whose cumulative frequency reaches or exceeds that value (here 34-49, with <cf = 24).
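Solution:
\[ Md = L + \frac{\left(\frac{n}{2} - F_b\right) i}{f_m} \]
\[ Md = 33.5 + \frac{\left(\frac{40}{2} - 13\right) 16}{11} \]
\[ Md = 33.5 + \frac{(20 - 13)16}{11} \]
\[ Md = 33.5 + \frac{(7)16}{11} \]
\[ Md = 33.5 + \frac{112}{11} \]
\[ Md = 33.5 + 10.18 \]
Md = 43.68 or 44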
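The search for the median class and the interpolation can be sketched as one function. This Python sketch assumes whole-number class limits (so the exact lower limit is the stated limit minus 0.5) and classes listed in ascending order; `grouped_median` is an illustrative name.

```python
def grouped_median(classes, width):
    """classes: list of (lower_limit, frequency) pairs in ascending order."""
    n = sum(f for _, f in classes)
    cumulative = 0                      # Fb: frequencies before the current class
    for lower, f in classes:
        if cumulative + f >= n / 2:     # first class whose <cf reaches n/2
            exact_lower = lower - 0.5   # L, assuming whole-number class limits
            return exact_lower + (n / 2 - cumulative) * width / f
        cumulative += f
    raise ValueError("classes must contain at least one case")

# Frequencies from the table, lowest class first
table = [(2, 7), (18, 6), (34, 11), (50, 7), (66, 5), (82, 4)]
md = grouped_median(table, width=16)
print(round(md, 2))
```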

MODE (Grouped Data)


The formula for the mode in grouped data is:
\[ Mo = L + \left( \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \right) i \]
Where: Mo = mode
L = exact lower limit of the modal class
i = class width
f_m = frequency of the modal class
f_{m-1} = frequency of the class before the modal class (the next lower class interval)
f_{m+1} = frequency of the class after the modal class (the next higher class interval)

Class interval   Tally           f          x      fx      Cumulative Frequency (<cf)
82-97            IIII            4          89.5   358     40
66-81            IIIII           5          73.5   367.5   36
50-65            IIIII-II        7 (fm+1)   57.5   402.5   31
34-49            IIIII-IIIII-I   11 (fm)    41.5   456.5   24
18-33            IIIII-I         6 (fm-1)   25.5   153     13
2-17             IIIII-II        7          9.5    66.5    7
i=16                             n=40              ∑fx=1804
L = 34 − 0.5 = 33.5 (exact lower limit of the modal class 34-49)

Solution:
\[ Mo = L + \left( \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \right) i \]
\[ Mo = 33.5 + \left( \frac{11 - 6}{(11 - 6) + (11 - 7)} \right) 16 \]
\[ Mo = 33.5 + \left( \frac{5}{5 + 4} \right) 16 \]
\[ Mo = 33.5 + \left( \frac{5}{9} \right) 16 \]
\[ Mo = 33.5 + 8.89 \]
Mo = 42.39 or 42
SHAPE OF DISTRIBUTION
In this chapter we are introduced to central tendency and variability. Before we discuss variability, let us take a look at the shape of a distribution. The relationships among the mean, median and mode are determined in part by the shape of the distribution.

SKEWNESS. This refers to the symmetry or asymmetry of the distribution of data. When data are normally distributed, the distribution is symmetrical. When most of the scores pile up on the left side of the curve and the tail extends to the right, the distribution is positively skewed. When most of the scores pile up on the right side and the tail extends to the left, the distribution is negatively skewed. The illustration is shown below.

The skewness tells the relationship between the mean, median and the mode. The median is the middlemost value, the mode is the apex, and the mean tends to be located toward the tail of the distribution. This is because the mean is computed from all the values in the distribution, so extreme values pull it toward the tail.

A positive or negative value of skewness indicates that the distribution is not normal. Nonparametric tests, which can be used with both nominal and ordinal data, are used when the distribution is not normal.

\[ Sk = \frac{3(\bar{x} - Md)}{SD} \]
Where:
Sk = skewness
3 = constant
x̄ = mean
Md = median
SD = standard deviation

Example:
x̄ = 3.275
Md = 2.90
SD = 1.073
\[ Sk = \frac{3(\bar{x} - Md)}{SD} \]
\[ Sk = \frac{3(3.275 - 2.90)}{1.073} \]
\[ Sk = \frac{3(0.375)}{1.073} \]
\[ Sk = \frac{1.125}{1.073} \]
\[ Sk = 1.05 \]
Therefore, the distribution is positively skewed because the Sk is positive.
When the Sk is negative, the distribution is called negatively skewed.
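The skewness coefficient above is a one-line computation. The Python sketch below reproduces the worked example; the function name `pearson_skewness` and the wording of the interpretation labels are illustrative choices.

```python
def pearson_skewness(mean, median, sd):
    """Sk = 3 * (mean - median) / SD."""
    return 3 * (mean - median) / sd

def describe(sk):
    """Interpret the sign of Sk as in the text."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"

sk = pearson_skewness(mean=3.275, median=2.90, sd=1.073)
print(round(sk, 2), describe(sk))
```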

As stated by Reyes (1996), in a normal distribution, Sk=0. If the Sk has a negative value, the distribution is skewed to the left. If the Sk has a positive value, the distribution is skewed to the right. The farther the Sk departs from 0, the more skewed or asymmetrical the distribution is. The nearer the distribution is to normal, the closer the value of Sk comes to zero.

KURTOSIS. It refers to the measure of the magnitude of peakedness or flatness of a distribution. Some symmetrical curves may look just like the normal bell-shaped curve, but some are either excessively steep or flat compared to the normal bell curve. A steep curve is called leptokurtic, a normal curve is called mesokurtic, and a flat curve is called platykurtic. Compared with the normal value of 3, a kurtosis above 3 indicates a peaked curve and a kurtosis below 3 indicates a flat curve.

Note:
a. When ku= 3, the distribution is normal
b. When ku< 3, the distribution is platykurtic
c. When ku > 3, the distribution is leptokurtic

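The formula for kurtosis is:

\[ Ku = \frac{\sum (X - \bar{X})^4}{n S^4} \]
Example:
Given:
\sum (X - \bar{X})^4 = 225
n = 20
S = 7.95
Thus:
\[ Ku = \frac{\sum (X - \bar{X})^4}{n S^4} = \frac{225}{20(7.95)^4} = \frac{225}{79891.12} = 0.003 \]
Since Ku < 3, the distribution is platykurtic.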

PERCENTILES
The most common definition of a percentile is a number where a certain percentage of
scores fall below that number. You might know that you scored 67 out of 90 on a test. But that
figure has no real meaning unless you know what percentile you fall into. If you know that your
score is in the 90th percentile, that means you scored better than 90% of people who took the test.
Percentile rank tells what percent of the cases got below the rank position. Percentile point
(Pn) is the score or value that corresponds to the given percentile rank.

How to get Percentile Rank:


1. Arrange the raw scores in descending order. Find the number of individuals who scored below the individual’s score and the number of individuals who got exactly the same score.
2. Take the number of individuals who scored below the specific raw score and divide it by the total number of people who took the test.
3. Multiply this number by 100 to convert the proportion into a percentage.

Use this formula:

\[ PR = \frac{b}{n} \times 100 \]
Where: PR = percentile rank
b = number of scores below your score
n = total number of scores
100 = constant

Example: Find the percentile rank of the score 85 using the data below.
Scores: 50 65 75 44 78 90 85 65 74 90

Order   1    2    3    4    5    6    7    8    9    10
Score   90   90   85   78   75   74   65   65   50   44

Seven scores (78, 75, 74, 65, 65, 50 and 44) are below 85, so b = 7 and n = 10.

\[ PR = \frac{b}{n} \times 100 = \frac{7}{10} \times 100 = 70 \]
The score 85 is at the 70th percentile rank. It means that you scored better than 70% of the people who took the test.
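The percentile rank formula can be sketched in Python as follows. The function name `percentile_rank` is illustrative; it counts only the scores strictly below the given score, as the formula does.

```python
def percentile_rank(scores, score):
    """PR = (b / n) * 100, where b is the number of scores below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

scores = [50, 65, 75, 44, 78, 90, 85, 65, 74, 90]
pr = percentile_rank(scores, 85)
print(pr)
```

Sorting is not needed for the computation itself; arranging the scores in order only makes the manual count easier.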

Quartiles are values that divide your data into quarters. However, quartiles aren’t shaped like pizza slices; instead, they divide your data into four segments according to where the numbers fall on the number line. The quartiles that divide a data set into four quarters are:

Q1 (25th percentile), Q2 (50th percentile), Q3 (75th percentile) and Q4 (100th percentile)

Deciles are similar to quartiles. But while quartiles sort data into four quarters, deciles sort data
into ten equal parts:
Decile Rank 1 2 3 4 5 6 7 8 9 10
Percentile 10th 20th 30th 40th 50th 60th 70th 80th 90th 100th
Example: 31, 33, 18, 12, 5, 39, 25, 30, 32, 22, 16
Find P46, Q3 and D9

• How to compute for P46

1. Put the scores in ascending order:
5, 12, 16, 18, 22, 25, 30, 31, 32, 33, 39
2. Use this formula to locate the 46th percentile:
\[ L_p = \frac{P}{100}(n + 1) \]
\[ L_{46} = \frac{46}{100}(11 + 1) = (12)(0.46) = 5.52 \]
Interpretation: This tells us that the 46th percentile is between the 5th and 6th observations (the scores 22 and 25), 52% of the distance between them. The distance is 25 − 22 = 3.
\[ P_{46} = 22 + 0.52(3) = 22 + 1.56 = 23.56 \]
Interpretation: 46% of the scores are less than 23.56.

Another Example: P36
\[ L_{36} = \frac{36}{100}(11 + 1) = (12)(0.36) = 4.32 \]
This tells us that the 36th percentile is between the 4th and 5th observations (the scores 18 and 22), 32% of the distance between them. The distance is 22 − 18 = 4.
\[ P_{36} = 18 + 0.32(4) = 18 + 1.28 = 19.28 \]
Interpretation: 36% of the scores are less than 19.28.

• How to compute for Q3

Using the formula:
\[ L_p = \frac{P}{100}(n + 1) \]
\[ L_{75} = \frac{75}{100}(11 + 1) = (12)(0.75) = 9 \]
Interpretation: Q3 (P75) is the 9th value, which is the score 32.

• How to compute for D9

Using the formula:
\[ L_p = \frac{P}{100}(n + 1) \]
\[ L_{90} = \frac{90}{100}(11 + 1) = (12)(0.90) = 10.8 \]
Interpretation: This tells us that the 9th decile is between the 10th and 11th observations (the scores 33 and 39), 80% of the distance between them. The distance is 39 − 33 = 6.
\[ P_{90} = 33 + 0.8(6) = 33 + 4.8 = 37.8 \]
Interpretation: 90% of the scores are less than 37.8.
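The location formula Lp = (P/100)(n + 1) and the interpolation between neighboring observations can be sketched as a single Python function. The name `percentile_point` and the handling of locations that fall outside the data (returning the first or last score) are assumptions made for this illustration. The data are the ordered scores from the worked example.

```python
import math

def percentile_point(scores, p):
    """Locate the p-th percentile with L = (p / 100) * (n + 1) and interpolate."""
    ordered = sorted(scores)
    n = len(ordered)
    location = p / 100 * (n + 1)
    k = math.floor(location)        # position of the lower observation (1-based)
    fraction = location - k         # share of the distance to the next observation
    if k < 1:                       # assumption: clamp to the lowest score
        return ordered[0]
    if k >= n:                      # assumption: clamp to the highest score
        return ordered[-1]
    lower, upper = ordered[k - 1], ordered[k]
    return lower + fraction * (upper - lower)

ordered_scores = [5, 12, 16, 18, 22, 25, 30, 31, 32, 33, 39]
p46 = percentile_point(ordered_scores, 46)
p36 = percentile_point(ordered_scores, 36)
q3 = percentile_point(ordered_scores, 75)   # Q3 is the 75th percentile
d9 = percentile_point(ordered_scores, 90)   # D9 is the 90th percentile
print(round(p46, 2), round(p36, 2), q3, round(d9, 2))
```

Statistical packages use several different percentile conventions, so results from other software may differ slightly from this (n + 1) method.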
MEASURES OF VARIATIONS
This concept is essential in statistics: it is a way to show how data is dispersed, or spread
out. Several measures of variation are used in statistics.

RANGE
The range is the simplest measure of variation to compute: it is the difference between the largest and the lowest values in a set of numerical data. It is considered a poor measure of variability, or an unstable form of measurement, because it depends on only two scores.

The range for ungrouped data is obtained by finding the difference between the highest and the lowest value. For grouped data, the range is determined by subtracting the lower boundary of the lowest class interval from the upper boundary of the highest class interval of the frequency distribution. This is so because the class boundaries are considered the true limits.

Range (R) = Highest value (H) − Lowest value (L), or R = H − L

Grouped Data: Range (R) = UBHCI − LBLCI
UBHCI = upper boundary of the highest class interval
LBLCI = lower boundary of the lowest class interval

Example: Suppose you have 12 members in your group. What would be the range of their scores, considering the following values?

Scores:
9 8 7 6 4 7
9 7 5 6 8 5

The range is the distance between the highest score and the lowest score. Range=
largest value- smallest value. In our example, the largest score is 9 and the smallest score is 4.
The range is 9-4=5.

Mean Absolute Deviation (MAD)

The mean absolute deviation is a method of obtaining the variation of all the values or scores from the mean. Although it has limitations in capturing the precise spread or variability, it is more reliable than the range.

The MAD measures how far, on the average, each individual score or value in a distribution deviates from the mean of that distribution.

Formula:
\[ MAD = \frac{\sum |x - \bar{x}|}{N} \]
Where:
MAD = mean absolute deviation
x = individual score or value
x̄ = mean
∑|x − x̄| = sum of the absolute deviations from the mean
N = total number of scores in the population/sample
Example: Compute the mean deviation of the following set of data:
10 15 20 25 30 15 12 10 28 17

The mean is computed as:

\[ \bar{x} = \frac{10+15+20+25+30+15+12+10+28+17}{10} = \frac{182}{10} = 18.2 \]
The MAD is computed as:

Raw Data (X)   (x − x̄)               |x − x̄|
10             10 − 18.2 = −8.2      8.2
15             15 − 18.2 = −3.2      3.2
20             20 − 18.2 = 1.8       1.8
25             25 − 18.2 = 6.8       6.8
30             30 − 18.2 = 11.8      11.8
15             15 − 18.2 = −3.2      3.2
12             12 − 18.2 = −6.2      6.2
10             10 − 18.2 = −8.2      8.2
28             28 − 18.2 = 9.8       9.8
17             17 − 18.2 = −1.2      1.2
                                     ∑|x − x̄| = 60.4

\[ MAD = \frac{\sum |x - \bar{x}|}{N} = \frac{60.4}{10} = 6.04 \]
The mean absolute deviation of 6.04 denotes that, on the average, the scores in the given set of data differ by 6.04 from the mean of 18.2.
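The ungrouped MAD computation can be checked with a short Python sketch. The variable names are illustrative; the data are the ten values from the example.

```python
data = [10, 15, 20, 25, 30, 15, 12, 10, 28, 17]

mean = sum(data) / len(data)                      # 182 / 10
abs_deviations = [abs(x - mean) for x in data]    # |x - mean| for each score
mad = sum(abs_deviations) / len(data)             # mean of the absolute deviations

print(mean, round(sum(abs_deviations), 1), round(mad, 2))
```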
The MAD in grouped data is computed as:

Class Interval   f      x      fx       (x − x̄)                   |x − x̄|   f|x − x̄|
90-99            3      94.5   283.5    94.5 − 62.25 = 32.25      32.25     96.75
80-89            3      84.5   253.5    84.5 − 62.25 = 22.25      22.25     66.75
70-79            8      74.5   596      74.5 − 62.25 = 12.25      12.25     98
60-69            9      64.5   580.5    64.5 − 62.25 = 2.25       2.25      20.25
50-59            5      54.5   272.5    54.5 − 62.25 = −7.75      7.75      38.75
40-49            9      44.5   400.5    44.5 − 62.25 = −17.75     17.75     159.75
30-39            3      34.5   103.5    34.5 − 62.25 = −27.75     27.75     83.25
                 n=40          ∑fx=2490                                     ∑f|x − x̄|=563.5

\[ \bar{x} = \frac{\sum f x}{n} = \frac{2490}{40} = 62.25 \]
\[ MAD = \frac{\sum f|x - \bar{x}|}{n} = \frac{563.5}{40} = 14.09 \]
The mean absolute deviation of 14.09 denotes that, on the average, the scores in the given set of data differ by 14.09 from the mean of 62.25.
VARIANCE (𝑺𝟐 )
The variance is defined as the average of the squared deviations from the mean. The square root of the variance is known as the standard deviation. Squaring the deviations is an alternative method for converting both positive and negative deviations into all positive numbers. The formula is:
\[ S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
Where:
S² = sample variance
n = sample size
x = sample observation
x̄ = sample mean

This formula is commonly used in research because in most cases we are dealing with sample values that estimate those of the population. When determining the population variance, the formula is:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
Where:
σ² = population variance
N = total population
x = values of observation
𝜇 = population mean

For example, calculate the sample variance for the given sample:

Scores (X)   Deviation from the mean (x − x̄)   Squared deviation from the mean (x − x̄)²
15           15 − 9.67 = 5.33                  28.41
13           13 − 9.67 = 3.33                  11.09
10           10 − 9.67 = 0.33                  0.11
7            7 − 9.67 = −2.67                  7.13
8            8 − 9.67 = −1.67                  2.79
5            5 − 9.67 = −4.67                  21.81
∑x = 58                                        ∑(x − x̄)² = 71.34
x̄ = 58/6 = 9.67

Computation:
\[ S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{71.34}{6 - 1} = 14.27 \]

Using the population variance formula with the same data, it is calculated as:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{71.34}{6} = 11.89 \]
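The two variance formulas differ only in the divisor. The Python sketch below uses the standard-library `statistics` module, where `variance` divides by n − 1 (sample) and `pvariance` divides by N (population); the variable names are illustrative.

```python
import statistics

scores = [15, 13, 10, 7, 8, 5]

sample_variance = statistics.variance(scores)        # divides by n - 1
population_variance = statistics.pvariance(scores)   # divides by N

print(round(sample_variance, 2), round(population_variance, 2))
```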
When the data are large and arranged in a frequency distribution (grouped data), the variance is computed using the formula:

\[ S^2 = \frac{\sum f (|x - \bar{x}|)^2}{\sum f - 1} \]
Where:
∑f = summation of frequency
|x − x̄| = absolute deviation from the mean

The variance in grouped data is computed as:

Class Interval   f      x      fx       (x − x̄)                   |x − x̄|   (|x − x̄|)²   f(|x − x̄|)²
90-99            3      94.5   283.5    94.5 − 62.25 = 32.25      32.25     1040.06      3120.19
80-89            3      84.5   253.5    84.5 − 62.25 = 22.25      22.25     495.06       1485.19
70-79            8      74.5   596      74.5 − 62.25 = 12.25      12.25     150.06       1200.5
60-69            9      64.5   580.5    2.25                      2.25      5.06         45.56
50-59            5      54.5   272.5    −7.75                     7.75      60.06        300.31
40-49            9      44.5   400.5    −17.75                    17.75     315.06       2835.56
30-39            3      34.5   103.5    −27.75                    27.75     770.06       2310.19
                 n=40          ∑fx=2490                                                  ∑f(|x − x̄|)²=11297.5

Population variance:
\[ \sigma^2 = \frac{\sum f (|x - \bar{x}|)^2}{\sum f} = \frac{11297.5}{40} = 282.44 \]
Sample variance:
\[ S^2 = \frac{\sum f (|x - \bar{x}|)^2}{\sum f - 1} = \frac{11297.5}{40 - 1} = 289.68 \]

STANDARD DEVIATION
The standard deviation is the square root of the variance. It is a special form of average deviation from the mean, and it is an important measure of heterogeneity or homogeneity in a set of observations. It is commonly used in parametric statistics.
To interpret the standard deviation: the larger the value, the greater the dispersion, denoting heterogeneous data. A smaller value means that the scores are homogeneous.

The formula for population standard deviation (ungrouped data) is:

\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]

The formula for sample standard deviation (ungrouped data) is:

\[ S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

Example:

X          x − x̄          (x − x̄)²
9          9 − 6 = 3      9
8          8 − 6 = 2      4
7          7 − 6 = 1      1
5          5 − 6 = −1     1
1          1 − 6 = −5     25
∑ = 30                    ∑(x − x̄)² = 40
x̄ = 6

\[ SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{40}{5 - 1}} = \sqrt{10} = 3.16 \]

When the data are arranged in a frequency distribution (large data), the formulas are as follows:

1. Population standard deviation for grouped data:
\[ \sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}} \]
2. Sample standard deviation for grouped data:
\[ S = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}} \]
Example:
Calculate the standard deviation for grouped data.

Class Interval   f      x      fx       (x − x̄)   |x − x̄|   (|x − x̄|)²   f(|x − x̄|)²
90-99            3      94.5   283.5    32.25     32.25     1040.06      3(32.25)² = 3120.19
80-89            3      84.5   253.5    22.25     22.25     495.06       3(22.25)² = 1485.19
70-79            8      74.5   596      12.25     12.25     150.06       8(12.25)² = 1200.5
60-69            9      64.5   580.5    2.25      2.25      5.06         9(2.25)² = 45.56
50-59            5      54.5   272.5    −7.75     7.75      60.06        5(7.75)² = 300.31
40-49            9      44.5   400.5    −17.75    17.75     315.06       9(17.75)² = 2835.56
30-39            3      34.5   103.5    −27.75    27.75     770.06       3(27.75)² = 2310.19
                 n=40          ∑fx=2490                                  ∑f(|x − x̄|)² = 11297.5

Computation:
Population standard deviation:
\[ SD = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}} = \sqrt{\frac{11297.5}{40}} = \sqrt{282.44} = 16.81 \text{ or } 17 \]
Sample standard deviation:
\[ SD = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{11297.5}{40 - 1}} = \sqrt{\frac{11297.5}{39}} = \sqrt{289.68} = 17.02 \text{ or } 17 \]
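The grouped computation can be sketched in Python directly from the class midpoints and frequencies. The pair layout and variable names are illustrative assumptions, and only the standard-library `math` module is used.

```python
import math

# (midpoint x, frequency f) for each class interval in the table
table = [(94.5, 3), (84.5, 3), (74.5, 8), (64.5, 9),
         (54.5, 5), (44.5, 9), (34.5, 3)]

n = sum(f for _, f in table)                      # total frequency
mean = sum(f * x for x, f in table) / n           # grouped mean, sum(fx) / n
sum_sq = sum(f * (x - mean) ** 2 for x, f in table)   # sum of f * (x - mean)^2

population_sd = math.sqrt(sum_sq / n)             # divides by N
sample_sd = math.sqrt(sum_sq / (n - 1))           # divides by n - 1

print(mean, sum_sq, round(population_sd, 2), round(sample_sd, 2))
```

Squaring the two results gives the population and sample variances for the same table.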
NORMAL DISTRIBUTION

The bell-shaped normal curve is widely known as the normal distribution. Since many frequency distributions are very close to the normal curve, we often assume that they have normal distributions. The normal curve is important not because scores are assumed to be normally distributed but because the sampling distributions of various statistics are known or assumed to be normal.

The Concept of the Normal Curve


The normal curve is sometimes called the Gaussian curve in honor of Carl Friedrich Gauss (1777-1855), the first mathematician to make use of its properties. However, it was found out later that Abraham de Moivre (1667-1754) was the one who introduced the equation of the “curve of error or normal probability.”

Generally, the bell-shaped graph is symmetric with respect to a vertical line drawn at the center from the horizontal axis to the modal peak of the curve. The values of the mean, median and mode coincide at the center of the distribution. This implies that the mean, median and mode are numerically equal. The curve is asymptotic with respect to the horizontal axis; that is, the two tails of the curve never intersect the horizontal axis, although they approach it and extend indefinitely in opposite directions.

The standard deviation is the standard measure of variability, which is measured along the horizontal axis. The total area under the curve is 1 or 100%, which represents the total number of cases in the distribution. Hence, the half to the right of the vertical line, or above the mean, represents 0.5 or 50% of the cases, and the other half to the left, or below the mean, represents the other 50%.
The area of the normal curve may be sub-divided into three standard scores each to the left and right of the vertical axis. Normally distributed populations usually do not exceed plus or minus three standard deviations, inasmuch as the area under the curve becomes negligible at 4 and 5 standard deviations away from the mean.

Empirical Rule
The horizontal line under the normal curve is sub-divided into equal sub-intervals of at least three units, as mentioned earlier, to the left and right of the vertical axis at the center of the curve. The empirical rule is based on the equation introduced by de Moivre. However, because using the equation directly is laborious, statistical tables were made for easier computation. The rules illustrate the probabilities under the normal curve as follows:

1. About or roughly 68% of the area of the distribution is between 𝜇 ± 1𝜎 (z from −1 to +1).
2. About or roughly 95% of the area of the distribution is between 𝜇 ± 2𝜎 (z from −2 to +2).
3. About or roughly 99.7% of the area of the distribution is between 𝜇 ± 3𝜎 (z from −3 to +3).
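The areas in the empirical rule can be computed rather than read from a table. The Python sketch below relies on a standard identity that is not stated in the module: for a normal distribution, the area within k standard deviations of the mean equals erf(k/√2). The function name `area_within` is illustrative.

```python
import math

def area_within(k):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * area_within(k):.2f}%")
```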
STANDARD SCORES
Standard scores are universally understood units in testing that allow users to evaluate a person’s performance in reference to other persons who took the same or a similar test.

A real score or raw score can be transformed into a standard score called the z score. The distribution of z scores has a mean of 0 and a standard deviation of 1 (𝜇 = 0 and 𝜎 = 1 in case the population is used, or x̄ = 0 and s = 1 in case a sample is used). The z score indicates the deviation of the score from the mean in each distribution, in standard deviation units. If a particular raw score is above the mean, or to the right of the mean, its equivalent z score is positive; it is negative if it is below, or to the left of, the mean. The z score can be directly transformed to a percentile.

The transformation can be determined using the following formulas:

\[ z = \frac{x - \mu}{\sigma} \text{ for population, and } z = \frac{x - \bar{x}}{s} \text{ for sample} \]

Where:
𝑧 = standard score
𝜇 = population mean
𝜎 = population standard deviation
𝑥 = real score
𝑥̅ = sample mean
𝑠 = sample standard deviation
Examples:
The following are the final exam results showing Glen’s performance in his three subjects. In which subject did he perform best? Worst?

Subject       Grade   Mean (x̄)   Standard Deviation (SD)
English 101   84      81         4.5
Math 101      83      75         6
P.E. 1        90      92         6.4

Solution:

English 101: \[ z = \frac{x - \bar{x}}{s} = \frac{84 - 81}{4.5} = 0.67 \]
Math 101: \[ z = \frac{x - \bar{x}}{s} = \frac{83 - 75}{6} = 1.33 \]
P.E. 1: \[ z = \frac{x - \bar{x}}{s} = \frac{90 - 92}{6.4} = -0.31 \]

Interpretation:
The z scores indicate that Glen performed best in Math 101. He did not perform well in P.E. 1.
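The comparison across subjects can be sketched in Python. The dictionary layout and names are illustrative, and the Math 101 grade is taken as 83, the value used in the solution.

```python
def z_score(x, mean, sd):
    """z = (x - mean) / sd."""
    return (x - mean) / sd

# subject: (grade, class mean, standard deviation); Math 101 uses the grade from the solution
results = {
    "English 101": (84, 81, 4.5),
    "Math 101": (83, 75, 6),
    "P.E. 1": (90, 92, 6.4),
}

z_scores = {subject: z_score(*values) for subject, values in results.items()}
best = max(z_scores, key=z_scores.get)    # highest z: best relative performance
worst = min(z_scores, key=z_scores.get)   # lowest z: weakest relative performance

for subject, z in z_scores.items():
    print(f"{subject}: z = {z:.2f}")
print("best:", best, "| worst:", worst)
```

Comparing z scores rather than raw grades is what allows subjects with different means and standard deviations to be ranked against each other.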
T SCORES
A T score is a transformed score that always has a mean of 50 and a standard deviation of 10. It is computed by multiplying the z score by 10 and then adding 50. A T score in psychometric or psychological testing is a specialized term that is not the same thing as the t score that you get from a t-test. The t scores in a t-test can be positive or negative; T scores in psychometric testing are always positive, with a mean of 50.
A T score is similar to a z score. It represents the number of standard deviations from the mean. Many prefer T scores because the lack of negative numbers means they are easier to work with, and there is a larger range, so decimals are almost eliminated. The table shows z scores and their equivalent T scores.

𝑇 = (𝑧 𝑥 10) + 50
Student   Score   x − x̄   z score   T score
A         14      −1      −0.37     46.32
B         15      0       0.00      50.00
C         11      −4      −1.47     35.29
D         17      2       0.74      57.35
E         14      −1      −0.37     46.32
F         16      1       0.37      53.68
G         17      2       0.74      57.35
H         12      −3      −1.10     38.97
I         12      −3      −1.10     38.97
J         17      2       0.74      57.35
K         20      5       1.84      68.38
Σ𝑥        165
mean      15
sd        2.72
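The conversion T = (z × 10) + 50 can be applied to the whole table at once. The Python sketch below uses the standard-library `statistics` module; rounding the standard deviation to 2.72 before dividing is an assumption made so that the results follow the rounded value shown in the table.

```python
import statistics

students = "ABCDEFGHIJK"
scores = [14, 15, 11, 17, 14, 16, 17, 12, 12, 17, 20]

mean = statistics.mean(scores)               # 165 / 11
sd = round(statistics.stdev(scores), 2)      # sample standard deviation, rounded as in the table

z_scores = [(x - mean) / sd for x in scores]
t_scores = [z * 10 + 50 for z in z_scores]   # T = (z x 10) + 50

for name, x, z, t in zip(students, scores, z_scores, t_scores):
    print(f"{name}  {x:>2}  {z:6.2f}  {t:6.2f}")
```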
Reference:

Lambojon, Jr, Francisco, et.al. Psychological Statistics. Mindshapers C., Inc. 2017
