Statistics

Statistics EASSESS1
One does not need to be a statistical wizard to grasp the basic mathematical
concepts needed to understand major measurement issues.
Statistics
Plural sense: a set of numerical data
Singular sense: the branch of science that deals with the collection, presentation, analysis, and interpretation of data
Population vs. Sample
Population: a collection of all the elements under consideration in any statistical study
Sample: a part (or subset) of the population from which information is collected
Parameter vs. Statistic
Parameter: a numerical characteristic of the population
Statistic: a numerical characteristic of the sample
Areas of Statistics
Descriptive Statistics: the methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions or inferences about a larger group
Inferential Statistics: the methods concerned with analyzing sample data to make predictions or inferences about the population
Data
refers to the information collected, organized, analyzed, and interpreted by researchers
Classification of Data
Qualitative: data that have labels or names assigned to their respective categories
Examples:
Color - red, blue, yellow, green
Sex - male, female
Quantitative: any attribute that we measure in numbers
Examples:
weight - 160 lbs, 25 kg, 77 mg, etc.
height - 34 in., 5 cm, 5 ft. 6 in., etc.
Raw Data: data in its original form
Array: data arranged either from highest to lowest or from lowest to highest
Measurement
Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors.
Scales of Measurement
Nominal Scales:
• a qualitative system for categorizing objects or people
• numbers or symbols are used simply to classify an object, person, or characteristic into categories
• the categories must be distinct, non-overlapping, and exhaustive
• weakest level of measurement
• Examples: Gender - Female = 1, Male = 2; Eye Color - Brown = 1, Blue = 2, Green = 3.
Ordinal Scales:
• allows you to rank people or objects according to the quantity of a characteristic.
• contains the properties of the nominal level, but the numbers assigned to the categories of any variable may be ranked or ordered in some low-to-high manner
Example: Graduation Class Rank - 1 = Valedictorian, 2 = Salutatorian, 3 = 3rd Rank, etc.
Interval Scales:
• contains the properties of the ordinal level, but the distances between any two numbers on the scale are of known sizes
• characterized by a common and constant unit of measurement
• units of measurement are arbitrary
• the number zero does not imply the absence of the characteristic under consideration (thus, the zero point is arbitrary)
Examples: IQs, GRE scores

Ratio Scales:
• contains the properties of the interval level, but it has a true zero point; that is, the number zero indicates the absence of the characteristic under consideration
• strongest level of measurement
Examples:
- height in meters, feet, etc.
- weight in kilograms, pounds, etc.
Distributions
• Distribution: a set of scores.
• Raw Score Distributions
• Frequency Distributions
  - Ungrouped Frequency Distribution
  - Grouped Frequency Distribution
• Frequency Graphs
Frequency Distribution Table
refers to the tabular arrangement of data by classes or categories together with the number of observations falling within each class
1. Single-value grouping
• a form of frequency distribution where the distinct values are used as classes
Example:
The following data represent the number of school-age children from a sample of
30 families in a certain residential area.
0 0 3 2 0 2
0 1 4 4 1 1
0 0 3 3 0 0
2 1 1 0 2 0
2 0 0 2 1 2
Single-Value Frequency Distribution for the School-Age Data
Frequency Distribution for Number of School-Age Children
Number of School-Age Children   Number of Families   Relative Frequency
0                               12                   0.40
1                               6                    0.20
2                               7                    0.23
3                               3                    0.10
4                               2                    0.07
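The single-value grouping above can be reproduced with a short Python sketch. It uses only the 30 values listed in the example; `single_value_table` is a hypothetical helper name, and the relative frequencies are rounded to two decimals only when printed.

```python
from collections import Counter

# Number of school-age children per family, as listed in the example (30 families)
children = [
    0, 0, 3, 2, 0, 2,
    0, 1, 4, 4, 1, 1,
    0, 0, 3, 3, 0, 0,
    2, 1, 1, 0, 2, 0,
    2, 0, 0, 2, 1, 2,
]

def single_value_table(data):
    """Return (value, frequency, relative frequency) rows, one per distinct value."""
    counts = Counter(data)
    n = len(data)
    return [(value, counts[value], counts[value] / n) for value in sorted(counts)]

table = single_value_table(children)
for value, f, rf in table:
    print(f"{value:>3} {f:>4} {rf:>6.2f}")
```

The printed rows match the table: frequencies 12, 6, 7, 3, 2 and relative frequencies 0.40, 0.20, 0.23, 0.10, 0.07.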
2. Grouping by class intervals
Definitions:
Class interval: the numbers defining a class
Class limits: the smallest and the largest values that can fall in a given class
Class boundaries: numbers that are halfway between the upper limit of a class and the lower limit of the next class
Class size: length of the class interval; computed by taking the difference between two successive lower (or two successive upper) class boundaries or class limits
Class mark: midpoint of an interval; computed by taking the average of the lower and upper class limits of a given class interval
Relative frequency: obtained by dividing the class frequency by the total number of observations
Relative percentage: obtained by multiplying the relative frequency by 100%
Steps in Constructing a Frequency Distribution Table
1. Determine an adequate number of classes (K).
• not too many, not too few
• usually between 6 and 16
• classes should be non-overlapping
2. Determine the range (R).
R = highest – lowest
3. Compute the ratio of R and K (C*).
C* = R/K
4. Determine the class size (C) by rounding-off C* to a number that is easy to work
with.
5. List the class intervals.
6. Tally the frequency for each class.
7. Sum the frequency column and check against the total number of observations.
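Steps 2 to 5 can be sketched in Python as follows. The steps leave the rounding rule and the starting point open, so this sketch assumes that C* is rounded up to the next whole number and that the first lower limit is the largest multiple of C not exceeding the lowest value; it also assumes whole-number data. `class_intervals` is a hypothetical name. The call uses the lowest (11) and highest (97) scores of the example that follows, with K = 9, the number of classes in its table.

```python
import math

def class_intervals(lowest, highest, k):
    """Steps 2-5 for whole-number data.

    Assumptions: C* is rounded *up* to a whole number, and the first lower
    class limit is the largest multiple of C that does not exceed the lowest value.
    """
    r = highest - lowest                 # Step 2: R = highest - lowest
    c_star = r / k                       # Step 3: C* = R / K
    c = max(1, math.ceil(c_star))        # Step 4: round C* to an easy number (at least 1)
    lower = (lowest // c) * c            # a convenient first lower limit
    intervals = []
    while lower <= highest:              # Step 5: list intervals until the highest value is covered
        intervals.append((lower, lower + c - 1))
        lower += c
    return r, c_star, c, intervals

r, c_star, c, intervals = class_intervals(lowest=11, highest=97, k=9)
print(r, round(c_star, 2), c)            # 86 9.56 10
print(intervals)                         # from (10, 19) up to (90, 99)
```

With these assumptions, R = 86 and C* ≈ 9.56 round to a class size of 10, which gives the nine intervals 10–19 through 90–99.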
Example:
The following are the scores of 4th-year high school students on an achievement test in Mathematics:
11 14 14 14 16 17 20 24
25 25 28 30 30 31 31 33
34 34 35 35 37 37 37 38
39 41 41 42 44 44 44 45
47 47 47 47 51 53 53 54
54 55 55 56 56 56 57 57
58 58 58 58 59 60 60 60
61 62 62 62 65 66 66 66
66 67 68 68 74 75 76 76
81 87 92 92 97
Frequency distribution table for the achievement test scores of 4th-year high school students
Scores    Class Boundaries   Class Mark   Number of Students   Percentage (%)
10 – 19   9.5 – 19.5         14.5         6                    7.79
20 – 29   19.5 – 29.5        24.5         5                    6.49
30 – 39   29.5 – 39.5        34.5         14                   18.18
40 – 49   39.5 – 49.5        44.5         11                   14.29
50 – 59   49.5 – 59.5        54.5         17                   22.08
60 – 69   59.5 – 69.5        64.5         15                   19.48
70 – 79   69.5 – 79.5        74.5         4                    5.19
80 – 89   79.5 – 89.5        84.5         2                    2.60
90 – 99   89.5 – 99.5        94.5         3                    3.90
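A minimal Python sketch of steps 6 and 7 for this example: it tallies the 77 scores into the class limits 10–19, ..., 90–99 and derives the boundaries, class marks, and percentages. The ±0.5 boundaries assume whole-number scores, and `grouped_table` is a hypothetical helper.

```python
# Achievement test scores from the example (77 students)
scores = [
    11, 14, 14, 14, 16, 17, 20, 24,
    25, 25, 28, 30, 30, 31, 31, 33,
    34, 34, 35, 35, 37, 37, 37, 38,
    39, 41, 41, 42, 44, 44, 44, 45,
    47, 47, 47, 47, 51, 53, 53, 54,
    54, 55, 55, 56, 56, 56, 57, 57,
    58, 58, 58, 58, 59, 60, 60, 60,
    61, 62, 62, 62, 65, 66, 66, 66,
    66, 67, 68, 68, 74, 75, 76, 76,
    81, 87, 92, 92, 97,
]

# Class limits 10-19, 20-29, ..., 90-99 (class size 10)
limits = [(lo, lo + 9) for lo in range(10, 100, 10)]

def grouped_table(data, limits):
    n = len(data)
    rows = []
    for lower, upper in limits:
        f = sum(lower <= x <= upper for x in data)   # Step 6: tally the class
        boundaries = (lower - 0.5, upper + 0.5)      # halfway to the neighboring limits
        mark = (lower + upper) / 2                   # class mark
        rows.append((lower, upper, boundaries, mark, f, 100 * f / n))
    # Step 7: the frequencies must add up to the number of observations
    if sum(row[4] for row in rows) != n:
        raise ValueError("some observations fall outside the class limits")
    return rows

rows = grouped_table(scores, limits)
for lower, upper, (lb, ub), mark, f, pct in rows:
    print(f"{lower}-{upper}  {lb}-{ub}  {mark}  {f:>2}  {pct:.2f}")
```

The printed frequencies and percentages match the table above.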
DESCRIPTIVE STATISTICS
MEASURES OF CENTRAL TENDENCY
A measure of central tendency is a single figure that represents the general level of the magnitudes or values of the items in a set of data.
The three most commonly used measures of central tendency are the mean, the median, and the mode.
Mean ($\bar{X}$). It is the average of a set of data, that is, the sum of the scores divided by the number of scores.
Properties of Mean
1. The mean is sensitive to the exact values of all the scores in the distribution
2. The sum of the deviations about the mean is zero.
3. The mean is very sensitive to the extreme scores when the scores are not
balanced at both ends of the distribution
4. The sum of the squared deviations of all the scores about the mean is a
minimum.
5. Among the measures of central tendency, the mean is least subject to sampling
variation.
When to Use
1. It is appropriate for interval (and ratio) data
2. When further statistical computation is expected
3. When the data are normally distributed
Mean for Ungrouped Data
$\bar{X} = \dfrac{\sum_{i=1}^{N} X_i}{N} = \dfrac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$
Where: X - scores
N - number of cases
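The ungrouped-mean formula translates directly into code. This Python sketch (the helper name `mean` is our own) applies it to the school-age children data from the single-value grouping example, where the 30 values sum to 37. Python's standard library also provides `statistics.mean`, which gives the same result.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores (N)."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)

# School-age children data from the frequency-distribution example (N = 30)
children = [0, 0, 3, 2, 0, 2, 0, 1, 4, 4, 1, 1, 0, 0, 3,
            3, 0, 0, 2, 1, 1, 0, 2, 0, 2, 0, 0, 2, 1, 2]
print(round(mean(children), 2))  # 37 / 30, prints 1.23
```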
Weighted Mean ($\bar{X}_w$)
Sometimes the values in a distribution are not equally important. To give each value its proper importance, it is necessary to assign weights and then calculate the weighted mean.
$\bar{X}_w = \dfrac{w_1 X_1 + w_2 X_2 + \cdots + w_N X_N}{w_1 + w_2 + \cdots + w_N} = \dfrac{\sum_{i=1}^{N} w_i X_i}{\sum_{i=1}^{N} w_i}$
Where: w - weight assigned to each value X
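Mean for Grouped Data
1. Using Class Mark
$\bar{X} = \dfrac{\sum fX}{N}$
Where: X - class mark
f - frequency
N - total number of cases
2. Using the Code Method
$\bar{X} = X_0 + \left(\dfrac{\sum fu}{N}\right) i$
Where: X0 - class mark with the code 0
u - code (0 for the class containing X0; −1, −2, ... for the lower classes; 1, 2, ... for the higher classes)
f - frequency
i - class width
N - total number of cases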
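Both grouped-data formulas can be checked against each other on the achievement-test table. In this Python sketch (helper names are hypothetical), the code method places u = 0 on the 50–59 class (X0 = 54.5); that choice is an assumption, and any class gives the same mean. Here Σf·X = 3826.5 and Σfu = −37, so both methods yield about 49.69.

```python
# Frequency table from the achievement-test example: class marks and frequencies
marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [6, 5, 14, 11, 17, 15, 4, 2, 3]

def grouped_mean_class_mark(marks, freqs):
    """Mean = sum(f * X) / N, where X is the class mark."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(marks, freqs)) / n

def grouped_mean_coded(marks, freqs, zero_index, width):
    """Mean = X0 + (sum(f * u) / N) * i, with code u = 0 at class `zero_index`."""
    n = sum(freqs)
    x0 = marks[zero_index]
    codes = [k - zero_index for k in range(len(marks))]   # ..., -1, 0, 1, ...
    return x0 + (sum(f * u for u, f in zip(codes, freqs)) / n) * width

print(round(grouped_mean_class_mark(marks, freqs), 2))    # 3826.5 / 77, prints 49.69
print(round(grouped_mean_coded(marks, freqs, 4, 10), 2))  # 54.5 + 10 * (-37 / 77), prints 49.69
```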

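Median ($\tilde{X}$)
The median is the middle value when the data are arranged in ascending or descending order.
To determine the position of the median, use $\dfrac{N+1}{2}$. When N is even, this position falls halfway between the two middle values, and the median is their average.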
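A short Python sketch of the median rule, including the even-N case. The helper name and the sample lists are hypothetical; `statistics.median` in Python's standard library follows the same convention of averaging the two middle values.

```python
def median(scores):
    """Middle value of the ordered scores, at position (N + 1) / 2.

    When N is even the position falls between two scores, so their average is used.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one score")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # 1-based position (N + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of positions N/2 and N/2 + 1

print(median([7, 2, 9, 4, 5]))   # 5
print(median([7, 2, 9, 4]))      # 5.5
```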
Properties of the median
1. The median is less sensitive to extreme scores than the mean.
2. Under usual circumstances, the median is more subject to sampling variability than the mean but less subject to sampling variability than the mode.
When to Use
1. It is appropriate for ordinal data
2. If the middle score in the distribution is desired
3. If we want to avoid the influence of extreme values
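Median for Grouped Data
$\tilde{X} = LB + \left(\dfrac{\frac{N}{2} - {<}F}{f}\right) i$
Where: LB - lower boundary of the median class (the first class whose less-than cumulative frequency reaches N/2)
<F - less-than cumulative frequency of the classes below the median class
f - frequency of the median class
N - number of cases
i - class width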
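The grouped-median formula, sketched in Python for the achievement-test table. The median class is located as the first class whose cumulative frequency reaches N/2 = 38.5 (the 50–59 class, with <F = 36 and f = 17). The function name is hypothetical.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median = LB + ((N/2 - <F) / f) * i, using the first class whose
    cumulative frequency reaches N/2 (the median class)."""
    n = sum(freqs)
    if n == 0 or len(lower_boundaries) != len(freqs):
        raise ValueError("need matching, non-empty frequency data")
    half = n / 2
    cum_below = 0                              # <F: frequency of all classes below this one
    for lb, f in zip(lower_boundaries, freqs):
        if cum_below + f >= half:              # this is the median class
            return lb + ((half - cum_below) / f) * width
        cum_below += f
    raise ValueError("no median class found")  # not reached for valid input

lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [6, 5, 14, 11, 17, 15, 4, 2, 3]
print(round(grouped_median(lower_boundaries, freqs, 10), 2))  # 49.5 + (2.5 / 17) * 10, prints 50.97
```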
Mode
The mode is the value in a set that has the highest frequency.
When to Use
1. It is appropriate for nominal data
2. If a quick approximation of the central tendency is desired
3. If the most frequently occurring score is needed
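Because a set can have more than one most frequent value, this Python sketch returns all of them. It works for nominal labels as well as numbers; the helper name and the sample data are hypothetical.

```python
from collections import Counter

def modes(values):
    """All values that occur with the highest frequency (a set may have several modes)."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode requires at least one value")
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(modes([0, 2, 0, 1, 3, 0, 2]))                         # [0]
print(modes(["brown", "blue", "brown", "green", "blue"]))   # ['blue', 'brown']
```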
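Mode for Grouped Data
$\hat{X} = LB + \left(\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$
Where: LB - lower boundary of the modal class (the class with the highest frequency)
Δ1 - difference between the highest frequency and the frequency of the class just before it (the preceding, lower class)
Δ2 - difference between the highest frequency and the frequency of the class just after it (the next, higher class)
i - class width
True Mode
$\hat{X} = 3\,\text{Median} - 2\,\text{Mean} = 3\tilde{X} - 2\bar{X}$ (an empirical approximation)
Crude Mode
Midpoint (class mark) of the class interval with the highest frequency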
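A Python sketch of the grouped mode, the true mode, and the crude mode for the achievement-test table. Two assumptions: a missing neighboring class (when the modal class is first or last) is treated as having frequency 0, and the true-mode call uses the grouped median (about 50.97) and grouped mean (about 49.69) of this table. Function names are hypothetical.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode = LB + (d1 / (d1 + d2)) * i, using the class with the highest frequency."""
    m = freqs.index(max(freqs))                         # modal class (first one if tied)
    before = freqs[m - 1] if m > 0 else 0               # class just before (lower scores)
    after = freqs[m + 1] if m + 1 < len(freqs) else 0   # class just after (higher scores)
    d1, d2 = freqs[m] - before, freqs[m] - after
    if d1 + d2 == 0:
        raise ValueError("formula undefined: both neighbors tie with the modal class")
    return lower_boundaries[m] + (d1 / (d1 + d2)) * width

def true_mode(median, mean):
    """Empirical relation: mode is approximately 3 * median - 2 * mean."""
    return 3 * median - 2 * mean

# Achievement-test table: lower class boundaries and frequencies
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [6, 5, 14, 11, 17, 15, 4, 2, 3]
width = 10

modal = freqs.index(max(freqs))
crude_mode = lower_boundaries[modal] + width / 2      # class mark of the modal class

print(grouped_mode(lower_boundaries, freqs, width))   # 49.5 + (6 / 8) * 10 = 57.0
print(round(true_mode(50.97, 49.69), 2))              # about 53.53
print(crude_mode)                                     # 54.5
```

The three values differ because each is a different approximation of the most typical score in grouped data.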
Measures of Variability
A measure of variability is a value that describes how far apart the scores are spread.
A deviation score tells us how far a raw score departs from the mean.
1. Range. It is the difference between the highest score and the lowest score in the distribution.
R = highest score − lowest score = HS − LS
2. Mean Deviation. It is the average distance between the mean and the scores in the distribution.
$MD = \dfrac{\sum |X - \bar{X}|}{N}$
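3. Variance. It is computed by squaring each deviation from the mean, adding the squares, and dividing by the number of cases (by n − 1 for a sample).
Sample Variance
$s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$
Population Variance
$\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$
Where: $\bar{X}$ - sample mean; μ - population mean; n - number of cases in the sample; N - number of cases in the population
4. Standard Deviation. It is the square root of the variance.
Ungrouped Data
Sample SD
$s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$
Population SD
$\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$
Alternate Formulas
Sample SD
$s = \sqrt{\dfrac{\sum X^2 - \dfrac{(\sum X)^2}{n}}{n - 1}}$
Population SD
$\sigma = \dfrac{1}{N}\sqrt{N \sum X^2 - \left(\sum X\right)^2}$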
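The four measures of variability, and the agreement between the definitional and alternate SD formulas, can be checked with a small Python sketch. The eight scores are hypothetical, picked so the results are easy to verify by hand (mean 5, Σ(X − X̄)² = 32, ΣX = 40, ΣX² = 232). Helper names are our own; Python's `statistics` module offers `variance`/`stdev` (sample) and `pvariance`/`pstdev` (population) for the same quantities.

```python
import math

def variability(scores):
    """Range, mean deviation, variances, and standard deviations of ungrouped scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores for the sample variance")
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]
    ss = sum(d * d for d in deviations)                         # sum of squared deviations
    return {
        "range": max(scores) - min(scores),                     # HS - LS
        "mean_deviation": sum(abs(d) for d in deviations) / n,  # sum|X - mean| / N
        "sample_variance": ss / (n - 1),                        # s^2
        "population_variance": ss / n,                          # sigma^2
        "sample_sd": math.sqrt(ss / (n - 1)),                   # s
        "population_sd": math.sqrt(ss / n),                     # sigma
    }

def sample_sd_raw(scores):
    """Alternate formula: sqrt((sum X^2 - (sum X)^2 / n) / (n - 1))."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    sx, sxx = sum(scores), sum(x * x for x in scores)
    return math.sqrt((sxx - sx ** 2 / n) / (n - 1))

def population_sd_raw(scores):
    """Alternate formula: (1 / N) * sqrt(N * sum X^2 - (sum X)^2)."""
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one score")
    sx, sxx = sum(scores), sum(x * x for x in scores)
    return math.sqrt(n * sxx - sx ** 2) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, chosen for easy hand-checking
stats = variability(data)
print(stats["range"], stats["mean_deviation"], stats["population_variance"], stats["population_sd"])
# 7 1.5 4.0 2.0
print(round(stats["sample_variance"], 3))                         # 32 / 7, prints 4.571
print(population_sd_raw(data), round(sample_sd_raw(data), 3))     # 2.0 2.138
```

The alternate formulas avoid computing each deviation, which is why they were preferred for hand calculation; both routes give the same result.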
MEASURES OF LOCATION
numbers below which a specified amount or percentage of the data must lie; they are often used to find the position of a specific piece of data in relation to the entire data set
Percentiles
• values that divide an ordered set of data into 100 equal parts
• the ith percentile (i = 1, 2, ..., 99), denoted by Pi, is a value below which i% of the data must lie
To determine Pi for a set of n observations, use the following steps:
i. Arrange the data from lowest to highest.
ii. If ni/100 is a whole number, Pi is the mean of the (ni/100)th and (ni/100 + 1)th ordered values.
iii. If ni/100 is not a whole number, Pi is the kth ordered value, where k is the smallest whole number greater than ni/100.
Deciles
• values that divide an ordered set of data into 10 equal parts
• the ith decile (i = 1, 2, ..., 9), denoted by Di, is a value below which 10i% of the data must lie
Quartiles
• values that divide an ordered set of data into 4 equal parts
• the ith quartile (i = 1, 2, 3), denoted by Qi, is a value below which 25i% of the data must lie
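The percentile rule above, with deciles and quartiles expressed as Di = P10i and Qi = P25i, sketched in Python. The helper names are hypothetical, i is assumed to be a whole number, and the data are the achievement-test scores from the frequency-distribution example (n = 77). This textbook rule may differ from the interpolating methods used by spreadsheet or library percentile functions.

```python
def percentile(data, i):
    """P_i: if n*i/100 is a whole number, average the (n*i/100)th and (n*i/100 + 1)th
    ordered values; otherwise take the k-th ordered value, where k is the smallest
    whole number greater than n*i/100."""
    if not (isinstance(i, int) and 1 <= i <= 99):
        raise ValueError("i must be a whole number from 1 to 99")
    ordered = sorted(data)                 # step i: lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile requires data")
    if (n * i) % 100 == 0:                 # step ii: n*i/100 is a whole number
        pos = n * i // 100                 # 1-based position
        return (ordered[pos - 1] + ordered[pos]) / 2
    k = n * i // 100 + 1                   # step iii: next whole number above n*i/100
    return ordered[k - 1]

def decile(data, i):
    """D_i = P_(10 i), i = 1..9."""
    return percentile(data, 10 * i)

def quartile(data, i):
    """Q_i = P_(25 i), i = 1, 2, 3."""
    return percentile(data, 25 * i)

scores = [
    11, 14, 14, 14, 16, 17, 20, 24, 25, 25, 28, 30, 30, 31, 31, 33,
    34, 34, 35, 35, 37, 37, 37, 38, 39, 41, 41, 42, 44, 44, 44, 45,
    47, 47, 47, 47, 51, 53, 53, 54, 54, 55, 55, 56, 56, 56, 57, 57,
    58, 58, 58, 58, 59, 60, 60, 60, 61, 62, 62, 62, 65, 66, 66, 66,
    66, 67, 68, 68, 74, 75, 76, 76, 81, 87, 92, 92, 97,
]
print([quartile(scores, q) for q in (1, 2, 3)], decile(scores, 9))  # [35, 53, 62] 75
```

For n = 77, ni/100 is never a whole number, so every result here comes from step iii; a set of 20 values would exercise step ii for P25.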
MEASURES OF SKEWNESS
• refer to the degree of asymmetry, or departure from symmetry, of a distribution; they indicate not only the amount of skewness but also its direction
(Figure: Examples of symmetric distributions)
Two Types of Skewness
1. Positive Skewness or Skewness to the Right
• distribution tapers more to the right than to the left
• longer tail to the right
(Figure: Frequency distribution of a positively skewed data set, with the mode (Mo) and median (Md) marked)

2. Negative Skewness or Skewness to the Left
• distribution tapers more to the left than to the right
• longer tail to the left
(Figure: Frequency distribution of a negatively skewed data set)
Correlation (r)
• Correlation coefficient ranges from -1.0 to +1.0
• Correlations differ in two respects: size and sign.
• Sign - can be positive or negative; it indicates the direction (pattern) of the relationship.
• Size - a correlation of 0.0 indicates the absence of a relationship; the closer the correlation gets to +1.0 or −1.0, the stronger the relationship; a correlation of +1.0 or −1.0 indicates a perfect relationship.
Scatterplots
• Scatterplot: a graph depicting the relationship between two variables (X and Y). Each mark in the scatterplot represents two scores: an individual's scores on the X and the Y variables.
Qualitative Interpretation of Correlations
General Guidelines:
• |r| < 0.30: Weak
• 0.30 ≤ |r| ≤ 0.70: Moderate
• |r| > 0.70: Strong
Mr. Valid established the reliability of his test using the test-retest method. He administered the same test to the same students twice, with a one-month interval. The following are the scores of the students:
Student   1st Administration   2nd Administration
A         23                   23
B         24                   26
C         15                   14
D         24                   25
E         17                   18
F         18                   17
G         23                   23
H         24                   25
I         25                   27
J         34                   35
K         23                   23
L         22                   24
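The handout does not state which coefficient to use. Assuming Pearson's product-moment r (the usual choice for test-retest scores on an interval scale), a Python sketch of the computation for Mr. Valid's data follows. The helper names are hypothetical, `interpret` assumes the guideline bands apply to |r| with 0.70 counted as moderate, and `pearson_r` assumes neither list is constant (otherwise the denominator is zero). For these 12 students r ≈ 0.98, a strong relationship.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, raw-score form:
    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def interpret(r):
    """Qualitative label from the general guidelines, applied to |r|."""
    size = abs(r)
    if size < 0.30:
        return "weak"
    if size <= 0.70:
        return "moderate"
    return "strong"

first = [23, 24, 15, 24, 17, 18, 23, 24, 25, 34, 23, 22]    # students A-L, 1st administration
second = [23, 26, 14, 25, 18, 17, 23, 25, 27, 35, 23, 24]   # students A-L, 2nd administration
r = pearson_r(first, second)
print(round(r, 2), interpret(r))   # 0.98 strong
```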
