Statistics


Statistics EASSESS1

Statistics

One does not need to be a statistical wizard to grasp the basic mathematical
concepts needed to understand major measurement issues.

Statistics

plural sense a set of numerical data

singular sense branch of science which deals with the collection, presentation,
analysis, and interpretation of data

Population vs. Sample

Population a collection of all the elements under consideration in any statistical
study

Sample a part (or subset) of the population from which information is
collected

Parameter vs. Statistic

Parameter a numerical characteristic of the population

Statistic a numerical characteristic of the sample

Areas of Statistics

Descriptive Statistics comprise those methods concerned with collecting, describing,
and analyzing a set of data without drawing conclusions or inferences about a larger
group

Inferential Statistics comprise those methods concerned with the analysis of sample
data leading to predictions or inferences about the population

Data

refers to the information collected, organized, analyzed, and interpreted by
researchers

Classification of Data

Qualitative have labels or names assigned to their respective categories

Examples:
Color - red, blue, yellow, green
Sex - male, female

Quantitative any attribute that we measure in numbers

Examples:
weight - 160 lbs, 25 kg, 77 mg, etc.
height - 34 in., 5 cm, 5ft. 6 in., etc

Raw Data data in its original form

Array data arranged either from highest to lowest or from lowest to highest

Measurement

Measurement is defined as a set of rules for assigning numbers to represent objects,
traits, attributes, or behaviors.

Scales of Measurement

Nominal Scales:

• qualitative system for categorizing objects or people.

• numbers or symbols are used simply to classify an object, person, or
characteristic into categories

• the categories must be distinct, non-overlapping, and exhaustive

• weakest level of measurement

• Examples: Gender - Female =1, Male = 2; Eye Color - Brown =1, Blue =2,
Green = 3.

Ordinal Scales:

• allows you to rank people or objects according to the quantity of a characteristic.

• contains the properties of the nominal level but the numbers assigned to
categories of any variable may be ranked or ordered in some low - to - high
manner

Example: Graduation Class Rank - 1 = Valedictorian, 2 = Salutatorian, 3 = 3rd
Rank, etc.

Interval level

• contains the properties of the ordinal level but the distances between any two
numbers on the scale are of known sizes

• characterized by a common and constant unit of measurement

• units of measurement are arbitrary

• the number zero does not imply the absence of the characteristic under
consideration (thus, the zero point is arbitrary)

Examples: IQs, GRE scores



Ratio level

• contains the properties of the interval level but it has a true zero point, that is, the
number zero indicates the absence of the characteristic under consideration

• strongest level of measurement

Examples:

- height in meters, feet, etc.


- weight in kilograms, pounds, etc.

Distributions

• Distribution: a set of scores.

• Raw Score Distributions

• Frequency Distributions

Ungrouped Frequency Distribution


Grouped Frequency Distribution
• Frequency Graphs

Frequency Distribution Table

refers to the tabular arrangement of data by classes or categories together with the
number of observations falling within each class

1. Single-value grouping

• a form of frequency distribution where the distinct values are used as classes

Example:

The following data represent the number of school-age children from a sample of
30 families in a certain residential area.

0 0 3 2 0 2

0 1 4 4 1 1

0 0 3 3 0 0

2 1 1 0 2 0

2 0 0 2 1 2

Single-Value Frequency Distribution For The School-Age Data

Frequency Distribution For Number of School-Age Children

Number of School-Age Children    Number of Families    Relative Frequency

              0                          12                   0.40

              1                           6                   0.20

              2                           7                   0.23

              3                           3                   0.10

              4                           2                   0.07
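
The single-value grouping above can be reproduced with a short script. This is a minimal sketch in Python, assuming the 30 values are typed in exactly as listed; the variable names (`children`, `table`) are illustrative and not part of the original notes.

```python
from collections import Counter

# Number of school-age children per family, copied from the example (30 families).
children = [
    0, 0, 3, 2, 0, 2,
    0, 1, 4, 4, 1, 1,
    0, 0, 3, 3, 0, 0,
    2, 1, 1, 0, 2, 0,
    2, 0, 0, 2, 1, 2,
]

counts = Counter(children)   # frequency of each distinct value
n = len(children)            # total number of observations

# One row per distinct value: (value, frequency, relative frequency).
table = [(value, counts[value], counts[value] / n) for value in sorted(counts)]

for value, freq, rel in table:
    print(f"{value:>8} {freq:>10} {rel:>10.2f}")
```

Each distinct value serves as its own class, so no class limits or boundaries are needed.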

Grouping by class intervals

Definitions:

Class interval the numbers defining a class

Class limits the smallest and the largest values that can fall in a given class

Class boundaries numbers that are halfway between the upper limit of a class and the
lower limit of the next class

Class size length of the class interval; computed by taking the difference
between two successive upper or lower class boundaries or class
limits

Class mark midpoint of an interval; computed by taking the average of the lower
and upper class limits of a given class interval

Relative frequency obtained by dividing the class frequency by the total number of
observations

Relative percentage obtained by multiplying the relative frequency by 100%

Steps in Constructing a Frequency Distribution Table

1. Determine an adequate number of classes (K).

• not too many, not too few

• usually between 6 and 16

• classes should be non-overlapping

2. Determine the range (R).

R = highest – lowest

3. Compute the ratio of R and K (C*).

C* = R/K

4. Determine the class size (C) by rounding-off C* to a number that is easy to work
with.

5. List the class intervals.

6. Tally the frequency for each class.

7. Sum the frequency column and check against the total number of observations.

Example :

The following are the scores of 4th year high school students in a certain
achievement test in Mathematics

11 14 14 14 16 17 20 24

25 25 28 30 30 31 31 33

34 34 35 35 37 37 37 38

39 41 41 42 44 44 44 45

47 47 47 47 51 53 53 54

54 55 55 56 56 56 57 57

58 58 58 58 59 60 60 60

61 62 62 62 65 66 66 66

66 67 68 68 74 75 76 76

81 87 92 92 97

Frequency distribution table for the achievement test scores of 4th year high school
students

Scores     Class Boundaries    Class Mark    Number of Students    Percentage

10 – 19      9.5 − 19.5           14.5               6                 7.79

20 – 29     19.5 − 29.5           24.5               5                 6.49

30 – 39     29.5 − 39.5           34.5              14                18.18

40 – 49     39.5 − 49.5           44.5              11                14.29

50 – 59     49.5 − 59.5           54.5              17                22.08

60 – 69     59.5 − 69.5           64.5              15                19.48

70 – 79     69.5 − 79.5           74.5               4                 5.19

80 – 89     79.5 − 89.5           84.5               2                 2.60

90 – 99     89.5 − 99.5           94.5               3                 3.90
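
The construction steps can be traced on these scores. Here R = 97 − 11 = 86, and with K = 9 classes C* = 86/9 ≈ 9.56, which rounds to the convenient class size C = 10. The sketch below, in Python, assumes the first class starts at 10 (as in the table); the names `class_size`, `lowest_limit`, and `rows` are illustrative only.

```python
# Achievement test scores, copied from the example (77 students).
scores = [
    11, 14, 14, 14, 16, 17, 20, 24,
    25, 25, 28, 30, 30, 31, 31, 33,
    34, 34, 35, 35, 37, 37, 37, 38,
    39, 41, 41, 42, 44, 44, 44, 45,
    47, 47, 47, 47, 51, 53, 53, 54,
    54, 55, 55, 56, 56, 56, 57, 57,
    58, 58, 58, 58, 59, 60, 60, 60,
    61, 62, 62, 62, 65, 66, 66, 66,
    66, 67, 68, 68, 74, 75, 76, 76,
    81, 87, 92, 92, 97,
]

class_size = 10     # C, the rounded value of C* = R / K
lowest_limit = 10   # assumed lower limit of the first class

n = len(scores)
rows = []
for lower in range(lowest_limit, 100, class_size):
    upper = lower + class_size - 1
    freq = sum(lower <= s <= upper for s in scores)   # tally for this class
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - 0.5, upper + 0.5),
        "mark": (lower + upper) / 2,
        "freq": freq,
        "percent": round(100 * freq / n, 2),
    })

# Step 7: the frequency column must sum to the number of observations.
total = sum(row["freq"] for row in rows)

for row in rows:
    print(row["limits"], row["boundaries"], row["mark"], row["freq"], row["percent"])
```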

DESCRIPTIVE STATISTICS

MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a single figure that is representative of the general
level of magnitudes or values of the items in the set of data.

The three most commonly used measures of central tendency are the mean, the median,
and the mode.

Mean ($\bar{X}$). It is the average of the set of data.

It is the sum of the scores divided by the number of scores.

Properties of Mean

1. The mean is sensitive to the exact values of all the scores in the distribution
2. The sum of the deviations about the mean is zero.
3. The mean is very sensitive to the extreme scores when the scores are not
balanced at both ends of the distribution
4. The sum of the squared deviations of all the scores about the mean is a
minimum.
5. Among the measures of central tendency, the mean is least subject to sampling
variation.

When to Use

1. It is appropriate for interval data
2. When further statistical computation is expected
3. When the set is normally distributed

Mean for Ungrouped Data

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$

Where: X - scores

N - number of cases
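
As a quick illustration of the ungrouped formula, the sketch below computes the mean of the school-age children data from the earlier example and checks the property that the deviations about the mean sum to zero. It is written in Python; the variable names are illustrative.

```python
# Number of school-age children per family, from the earlier example (N = 30).
children = [
    0, 0, 3, 2, 0, 2,
    0, 1, 4, 4, 1, 1,
    0, 0, 3, 3, 0, 0,
    2, 1, 1, 0, 2, 0,
    2, 0, 0, 2, 1, 2,
]

n = len(children)
x_bar = sum(children) / n                     # sum of the scores divided by N

# Property 2: deviations about the mean add up to zero (up to rounding error).
deviations = [x - x_bar for x in children]
sum_of_deviations = sum(deviations)

print(f"mean = {x_bar:.4f}, sum of deviations = {sum_of_deviations:.10f}")
```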

Weighted Mean ($\bar{X}_w$)

Sometimes the values in a distribution are not equally important. To give each value
its proper importance, it is necessary to assign weights and then calculate the
weighted mean.

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_N X_N}{w_1 + w_2 + \dots + w_N} = \frac{\sum_{i=1}^{N} w_i X_i}{\sum_{i=1}^{N} w_i}$$

Mean for Grouped Data

1. Using Classmark

$$\bar{X} = \frac{\sum fX}{N}$$

where: X - classmark

2. Using the Code Method

$$\bar{X} = X_0 + \left( \frac{\sum fu}{N} \right) i$$

Where: $X_0$ classmark with the code 0

u code

f frequency

i class width

N total number of cases
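
Both grouped-data formulas can be checked against the achievement test table. The sketch below is in Python and assumes the class marks and frequencies from that table; assigning code 0 to the 50 – 59 class is an arbitrary choice made here for illustration, since any class may carry the code 0.

```python
# Class marks and frequencies from the achievement test table.
class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [6, 5, 14, 11, 17, 15, 4, 2, 3]
i = 10                      # class width
n = sum(freqs)              # total number of cases

# 1. Classmark method: sum of f * X divided by N.
mean_classmark = sum(f * x for f, x in zip(freqs, class_marks)) / n

# 2. Code method: choose X0 (assumed here: 54.5), code each class by its
#    distance from X0 in class widths, then apply X0 + (sum of f * u / N) * i.
x0 = 54.5
codes = [round((x - x0) / i) for x in class_marks]   # -4, -3, ..., 0, ..., 4
sum_fu = sum(f * u for f, u in zip(freqs, codes))
mean_coded = x0 + (sum_fu / n) * i

print(f"classmark method: {mean_classmark:.2f}")
print(f"code method:      {mean_coded:.2f}")
```

The two methods agree because the coding is only a linear change of scale applied to the class marks.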



Median ($\tilde{X}$)

The median is the middle value when the data are arranged in ascending or
descending order.

To determine the position of the median, use $\frac{N + 1}{2}$.

Properties of the median

1. The median is less sensitive than the mean to extreme scores
2. Under usual circumstances, the median is more subject to sampling variability
than the mean but less subject to sampling variability than the mode.

When to Use

1. It is appropriate for ordinal data
2. If the middle score in the distribution is desired
3. If the influence of the extreme values is to be avoided

Median for Grouped Data

$$\tilde{X} = LB + \left( \frac{\frac{N}{2} - {<}F}{f} \right) i$$

Where: LB lower boundary of the median class

<F less-than cumulative frequency of the class below the median class

f frequency of the median class

N number of cases

i class width
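
Applied to the achievement test table, the grouped median formula works out as follows. This Python sketch assumes the median class is the first class whose cumulative frequency reaches N/2; the variable names are illustrative.

```python
# Lower class boundaries and frequencies from the achievement test table.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [6, 5, 14, 11, 17, 15, 4, 2, 3]
i = 10                  # class width
n = sum(freqs)          # number of cases

cum_below = 0           # <F: cumulative frequency below the current class
median = None
for lb, f in zip(lower_boundaries, freqs):
    if cum_below + f >= n / 2:
        # This is the median class: apply LB + ((N/2 - <F) / f) * i.
        median_lb = lb
        median = lb + ((n / 2 - cum_below) / f) * i
        break
    cum_below += f

print(f"median class starts at {median_lb}, <F = {cum_below}, median = {median:.2f}")
```

Here N/2 = 38.5, the median class is 50 – 59 (LB = 49.5, f = 17), and <F = 36, so the median is 49.5 + (2.5/17)(10), or about 50.97.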

Mode

The value in a set that has the highest frequency.

When to use

1. It is appropriate for nominal data


2. If a quick approximation of the central tendency is desired
3. If the most frequently occurring score is needed

Mode for Grouped Data

$$\hat{X} = LB + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i$$

Where: LB lower boundary of the modal class

$\Delta_1$ difference between the highest frequency and the frequency
just above it

$\Delta_2$ difference between the highest frequency and the frequency
just below it

i class width
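
The grouped mode formula can be tried on the achievement test table. In this Python sketch, "just above" is read as the class listed in the row above the modal class (the preceding class) and "just below" as the row after it; that reading is an assumption. The sketch also assumes the modal class is neither the first nor the last class.

```python
# Lower class boundaries, class marks, and frequencies from the achievement test table.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [6, 5, 14, 11, 17, 15, 4, 2, 3]
i = 10                                   # class width

k = freqs.index(max(freqs))              # position of the modal class
delta1 = freqs[k] - freqs[k - 1]         # vs. the class in the row above (assumed reading)
delta2 = freqs[k] - freqs[k + 1]         # vs. the class in the row below (assumed reading)

mode = lower_boundaries[k] + (delta1 / (delta1 + delta2)) * i
crude_mode = class_marks[k]              # midpoint of the modal class

print(f"mode = {mode}, crude mode = {crude_mode}")
```

With the modal class 50 – 59 (f = 17), the differences are 17 − 11 = 6 and 17 − 15 = 2, giving 49.5 + (6/8)(10) = 57.0.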

True Mode

$$\hat{X} = 3\,\text{Median} - 2\,\text{Mean} = 3\tilde{X} - 2\bar{X}$$

Crude Mode

Midpoint of the class interval with the highest frequency

Measures of Variability

A measure of variability is a value that describes how far scores are spread apart.

A deviation score or value tells us how far away the raw score or value departs from the
mean.

1. Range. It is the difference between the highest score and the lowest score in the
distribution.
R= highest score – lowest score

= HS-LS

2. Mean Deviation. It is the average distance between the mean and the scores in the
distribution.

$$MD = \frac{\sum \left| X - \bar{X} \right|}{N}$$

3. Variance. It is computed by squaring each deviation from the mean, adding them up,
and dividing by the number of cases.

Sample Variance:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Population Variance:

$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$$

4. Standard Deviation. It is the square root of the variance

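Ungrouped Data

Sample SD

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Population SD

$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

Alternate Formulas

Sample SD

$$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$$

Population SD

$$\sigma = \frac{1}{N} \sqrt{N \sum X^2 - \left( \sum X \right)^2}$$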
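
The measures of variability can be compared side by side on a small data set. The five scores below are hypothetical and chosen only to keep the arithmetic easy to follow; the Python sketch computes the range, mean deviation, variance, and standard deviation, and checks that the alternate formulas give the same results as the definitional ones.

```python
import math

scores = [4, 8, 6, 5, 7]      # hypothetical scores, not taken from the notes
n = len(scores)
mean = sum(scores) / n

value_range = max(scores) - min(scores)                  # R = HS - LS
mean_deviation = sum(abs(x - mean) for x in scores) / n  # average absolute deviation

sum_sq_dev = sum((x - mean) ** 2 for x in scores)        # sum of squared deviations
sample_var = sum_sq_dev / (n - 1)
population_var = sum_sq_dev / n
sample_sd = math.sqrt(sample_var)
population_sd = math.sqrt(population_var)

# Alternate (computational) formulas based on sum of X and sum of X squared.
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
sample_sd_alt = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
population_sd_alt = math.sqrt(n * sum_x2 - sum_x ** 2) / n

print(f"range = {value_range}, MD = {mean_deviation}")
print(f"s^2 = {sample_var}, sigma^2 = {population_var}")
print(f"s = {sample_sd:.4f}, sigma = {population_sd:.4f}")
```

Dividing by n − 1 instead of N makes the sample variance (2.5) slightly larger than the population variance (2.0) for the same scores.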

MEASURES OF LOCATION

numbers below which a specified amount or percentage of data must lie and are
oftentimes used to find the position of a specific piece of data in relation to the entire set
of data

Percentiles

◼ values that divide an ordered set of data into 100 equal parts

◼ the ith percentile (i=1,2,...,99) , denoted by Pi, is a value below which i% of the data
must lie

To determine Pi, we have the following steps:

i. Arrange the data from lowest to highest.

ii. If ni/100 is a whole number, Pi is the mean of the (ni/100)th and (ni/100
+ 1)th ordered values.

iii. If ni/100 is not a whole number, Pi is the kth ordered value where k is the closest
whole number greater than ni/100.

Deciles

• values that divide an ordered set of data into 10 equal parts

• the ith decile (i=1,2,...,9) , denoted by Di, is a value below which 10i% of the data
must lie

Quartiles

• values that divide an ordered set of data into 4 equal parts

• the ith quartile (i=1,2,3) , denoted by Qi, is a value below which 25i% of the data
must lie
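
The three-step percentile rule, together with the relations Di = P(10i) and Qi = P(25i) implied by the definitions of deciles and quartiles, can be sketched as below. This is a minimal Python sketch applied to the school-age children data; the function names `percentile`, `decile`, and `quartile` are illustrative, and other textbooks and software use different interpolation rules that may give slightly different values.

```python
def percentile(data, i):
    """Return P_i (i = 1, ..., 99) using the rule in the steps above."""
    ordered = sorted(data)               # step i: arrange from lowest to highest
    n = len(ordered)
    q, r = divmod(n * i, 100)            # n*i/100 is whole exactly when r == 0
    if r == 0:
        # step ii: mean of the (ni/100)th and (ni/100 + 1)th ordered values
        return (ordered[q - 1] + ordered[q]) / 2
    # step iii: the kth ordered value, k = next whole number above ni/100
    return ordered[q]                    # index q is the (q + 1)th value


def decile(data, i):
    return percentile(data, 10 * i)      # D_i is the (10i)th percentile


def quartile(data, i):
    return percentile(data, 25 * i)      # Q_i is the (25i)th percentile


children = [
    0, 0, 3, 2, 0, 2,
    0, 1, 4, 4, 1, 1,
    0, 0, 3, 3, 0, 0,
    2, 1, 1, 0, 2, 0,
    2, 0, 0, 2, 1, 2,
]

print(quartile(children, 1), percentile(children, 50), quartile(children, 3))
print(decile(children, 9))
```

With n = 30, P50 uses step ii (30 × 50/100 = 15 is whole), while Q1 and Q3 use step iii (7.5 and 22.5 are not).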

MEASURES OF SKEWNESS

• refer to the degree of asymmetry, or departure from symmetry, of a distribution;
they indicate not only the amount of skewness but also the direction

Examples of Symmetric Distributions

Two Types of Skewness

1. Positive Skewness or Skewness to the Right

• distribution tapers more to the right than to the left

• longer tail to the right

[Figure: Frequency distribution of a positively skewed data set, with the mode (Mo),
median (Md), and mean marked from left to right]



2. Negative Skewness or Skewness to the Left

• distribution tapers more to the left than right

• longer tail to the left

Frequency distribution of a negatively skewed data set

Correlation (r)

• Correlation coefficient ranges from -1.0 to +1.0

• Correlations differ on two parameters: size and sign.

• Sign - can be positive or negative. Indicates the pattern of the relationship.

• Size - a correlation of 0.0 indicates the absence of a relationship; the closer the
correlation gets to 1.0 in absolute value, the stronger the relationship; a correlation
of +1.0 or -1.0 indicates a perfect relationship.

Scatterplots

• Scatterplots: graph depicting the relationship between two variables (X & Y). Each
mark in the scatterplot actually represents two scores, an individual’s scores on
the X and the Y variable.

Qualitative Interpretation of Correlations

General Guidelines:

• < 0.30 Weak

• 0.30 - 0.70 Moderate

• > 0.70 Strong

Mr. Valid established the validity of his test using the test-retest method. He administered
the same test to the same students at a one-month interval. The following are the scores
of the students:

Student    1st administration    2nd administration

A                  23                    23
B                  24                    26
C                  15                    14
D                  24                    25
E                  17                    18
F                  18                    17
G                  23                    23
H                  24                    25
I                  25                    27
J                  34                    35
K                  23                    23
L                  22                    24
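
One way to summarize how closely the two administrations agree is a correlation coefficient. The notes do not give a formula for r, so the sketch below assumes the Pearson product-moment correlation, computed in Python from deviations about the means; the function name `pearson_r` is illustrative.

```python
import math

# Scores of students A to L on the two administrations, from the table above.
first = [23, 24, 15, 24, 17, 18, 23, 24, 25, 34, 23, 22]
second = [23, 26, 14, 25, 18, 17, 23, 25, 27, 35, 23, 24]


def pearson_r(x, y):
    """Pearson correlation (assumed here; the notes only call it r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviation scores.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(first, second)
print(f"r = {r:.3f}")
```

The resulting r can then be read against the qualitative guidelines above (weak, moderate, strong), with the sign indicating the direction of the relationship.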
