pr2-c4-l5

Lesson 5.
Planning Data Analyses Using Statistics
Purpose of Data Analysis Plan
The purpose of a data analysis plan is to gather useful information to find solutions to the research questions
of interest. It may be used to:
● DESCRIBE DATA SETS;
● DETERMINE THE DEGREE OF RELATIONSHIP OF VARIABLES;
● DETERMINE DIFFERENCES BETWEEN VARIABLES;
● PREDICT OUTCOMES; AND
● COMPARE VARIABLES
All of the above can be accomplished by using any one or a combination of the following data analysis
strategies:
Exploratory Data Analysis
- This type of data analysis is used when it is not clear what to expect from the data. This strategy uses
numerical and visual presentations such as graphs to examine the distribution of the data. Since the
research of interest is new, it is possible to find some inconsistencies, such as missing values, unusually
small or large values, or invalid data.
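The checks described above can be sketched in a few lines of Python. This is only an illustration: the `hours` data, the use of `None` for a missing answer, and the valid range of 0 to 24 are all assumptions made for the example, not part of the lesson.

```python
# Hypothetical survey responses: hours spent on the computer per day.
# None marks a missing answer; 40 and -1 are deliberately suspicious entries.
hours = [2, 3, None, 1, 4, 40, 2, -1, 3, None]

VALID_MIN, VALID_MAX = 0, 24  # assumed plausible range for hours in a day

missing = sum(1 for h in hours if h is None)
present = [h for h in hours if h is not None]
invalid = [h for h in present if not (VALID_MIN <= h <= VALID_MAX)]
valid = [h for h in present if VALID_MIN <= h <= VALID_MAX]

print("missing values:", missing)
print("invalid values:", invalid)
print("smallest and largest valid values:", min(valid), max(valid))

# A crude text histogram gives a quick look at the distribution.
for value in sorted(set(valid)):
    print(f"{value:>2} | {'*' * valid.count(value)}")
```

A look like this usually comes before any formal statistic is computed, so that missing or invalid entries do not distort the results.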
Descriptive Data Analysis
- This type of data analysis is used to describe, show or summarize data in a meaningful way, leading to a
simple interpretation of data. The commonly used descriptive statistics are those that analyze the
distribution of data such as frequency, percentage, measures of central tendency and measures of
dispersion.
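As a small illustration of frequency and percentage, the sketch below tallies a set of Yes/No answers. The `responses` list is hypothetical data invented for the example.

```python
from collections import Counter

# Hypothetical answers to a Yes/No survey item.
responses = ["Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes"]

frequency = Counter(responses)          # how many times each answer occurs
total = len(responses)
percentage = {answer: 100 * count / total for answer, count in frequency.items()}

for answer, count in frequency.most_common():
    print(f"{answer}: f = {count}, {percentage[answer]:.1f}%")
```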
Inferential Data Analysis
- This type of data analysis uses statistical techniques to extrapolate information from a smaller sample
in order to make predictions and draw conclusions about a larger population. It uses probability theory
and statistical models to estimate population parameters and to test hypotheses about the population
based on sample data.
Quantitative Analysis in Evaluation
Determining the level of measurement of the quantitative data is important before proceeding with
analysis of data. The choice of statistical measure/s to use is dependent on the level of measurement of
the data. The following are the levels of measurement scales:
Nominal Scale -- A nominal scale of measurement is used for labelling variables. It is sometimes called
categorical data. Basketball players wear sports shirts with numbers, but that is just a way to identify the
players. Likewise, if you want to categorize respondents based on gender, you could use 1 for male, and
2 for female. No order or distance is observed. The Yes or No scale is an example of nominal data. The
numbers assigned to the variables have no quantitative value. Some examples of variables measured on
a nominal scale are gender, religious affiliation, race or ethnic group.
Ordinal Scale -- An ordinal scale of measurement assigns an order to items according to the characteristic
being measured. It involves the ranking of individuals, attitudes and characteristics. The order in the honor
roll (first honor, second honor, third honor), order of agreement (strongly agree, agree, strongly disagree)
and economic status (low, average, high) are some examples. Numerical scores such as first, second, third
and so on are assigned, but the numbers have no value except their ability to establish a ranking among a
set of data. You can talk about ordering, but the size of the difference between ranks is not specified.
Interval Scale -- The interval scale has equal units of measurement, thereby making it possible to
interpret both the order of the scale scores and the distance between them. However, interval scales do
not have a "true zero." With interval data, addition and subtraction are meaningful, but multiplication and
division are not. For example, 20°C is not "twice as hot" as 10°C, because 0°C does not mean the absence
of temperature.
Ratio Scale -- The ratio scale is considered the highest level of measurement. It has the characteristics of
an interval scale, but it also has a true zero point. Because of this property, all statistical operations can be
performed on ratio scales. All descriptive and inferential statistics may be applied. Values can be added,
subtracted, multiplied, and divided.
A. Descriptive Data Analysis
1. Measures of Central Tendency

Suppose senior high school students were asked how many hours they spent on the computer, and for
what subject they most often used it. Results of the survey could indicate that, on the average, the senior
high school students spent two (2) or more hours, with a range of one (1) to four (4) hours. A typical senior
high school student spent more than two hours studying his/her research subject using the computer.
In the above example, the findings are presented as averages. The use of the phrase “on the average”
and the word “typical” denotes that one is interested in determining the center or middle of a set of data.
The common measures of central tendency, sometimes called measures of location or center, include
the mean, median, and mode.
1.1 Mean
Often called the arithmetic average of a set of data, the mean is the sum of the observed values in the
distribution divided by the number of observations. It is frequently used for interval or ratio data. The
symbol x̄ (x bar) is used to denote the arithmetic mean.
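The mean is calculated by summing up the observations (items, heights, scores or responses) and dividing
by the number of observations.
A. For Ungrouped Data
$$\bar{x} = \frac{\sum x}{n}$$
where x is an observed value and n is the number of observations.
B. For Grouped Data
When the observations are grouped into classes, the formula for grouped data is as follows:
$$\bar{x} = \frac{\sum f x_m}{n}$$
where f is the frequency of a class, x_m is the midpoint of that class, and n = \sum f is the total number of observations.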
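Both computations can be sketched in Python. The ungrouped scores are the ten contest scores used in the section on the range; the grouped `classes` are hypothetical and were made up for the example. The grouped mean here uses the common textbook form, the sum of each class frequency times its class midpoint, divided by the total frequency, which is assumed to be the intended formula.

```python
# Ungrouped data: the ten contest scores from the section on the range.
scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
mean_ungrouped = sum(scores) / len(scores)

# Grouped data: hypothetical classes given as (lower limit, upper limit, frequency).
classes = [(1, 10, 2), (11, 20, 5), (21, 30, 3)]

total_frequency = sum(f for _, _, f in classes)
# Each class is represented by its midpoint, (lower + upper) / 2.
sum_f_times_midpoint = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
mean_grouped = sum_f_times_midpoint / total_frequency

print("ungrouped mean:", mean_ungrouped)
print("grouped mean:", mean_grouped)
```

The grouped mean is an approximation, since every observation in a class is treated as if it were equal to the class midpoint.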
1.2 The Weighted Mean
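The weighted average or weighted mean is necessary in some situations. Suppose that you are given the
means of two or more groups of measurements and wish to find the mean of all the measures combined
into one group. The formula is:
$$\bar{x}_w = \frac{\sum n_i \bar{x}_i}{\sum n_i}$$
where \bar{x}_i is the mean of group i and n_i is the number of observations in that group.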
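A minimal sketch of the idea, assuming each group mean is weighted by its group size. The two section means and sizes are hypothetical values chosen to show why the simple average of the means would be misleading.

```python
# Hypothetical means of two class sections and the number of students in each.
section_means = [80.0, 90.0]
section_sizes = [30, 10]

# Weight each mean by its group size, then divide by the combined size.
weighted_mean = (
    sum(n * m for n, m in zip(section_sizes, section_means)) / sum(section_sizes)
)
simple_mean = sum(section_means) / len(section_means)

print("weighted mean:", weighted_mean)
print("simple mean of the two means:", simple_mean)
```

The weighted mean sits closer to 80 because the larger section contributes three times as many scores.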
1.3 Median
The median is the midpoint of the distribution. It represents the point in the data where 50% of the
values fall below that point and 50% fall above it. When the distribution has an even number of
observations, the median is the average of the two middle scores. The median is the most appropriate
measure of central tendency for ordinal data.
The median may be calculated from ungrouped data by doing the following steps:
Arrange the items (scores, responses, observations) from lowest to highest.
Count to the middle value. For an odd number of values arranged from lowest to highest, the median
corresponds to the middle value, the one with rank (n + 1)/2. If the array contains an even number of
observations, the median is the average of the two middle values.
Example 1:
Consider this set with an odd number of values:
7, 8, 8, 9, 10, 12, 23
By inspection, the median is 9 because three of the values (7, 8, 8) are below 9 and three (10, 12, 23) are
above 9. Since n = 7 is odd, the median has rank (n + 1)/2 = (7 + 1)/2 = 4; that is, it is the fourth value
in the ordered list.
Median = 9
Example 2:
Consider this set with an even number of values:
12, 15, 18, 22, 30, 32
The two middle values are 18 and 22. Taking the average of the two middle values, (18 + 22) / 2 = 40 / 2,
gives a median of 20.
Median = 20
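The two steps (sort, then take the middle value or the average of the two middle values) can be sketched as a short Python function. The function name `median` is chosen for the illustration; the result is also compared with the standard library's `statistics.median`.

```python
import statistics

def median(values):
    """Median of ungrouped data: sort, then pick or average the middle."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the value with rank (n + 1) / 2, which is index n // 2.
        return ordered[middle]
    # Even n: average of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

example_1 = [7, 8, 8, 9, 10, 12, 23]
example_2 = [12, 15, 18, 22, 30, 32]

print(median(example_1), statistics.median(example_1))
print(median(example_2), statistics.median(example_2))
```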
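B. For Grouped Data
When the observations are grouped into classes, the median is commonly estimated by interpolating within
the median class, the class that contains the (n/2)th observation:
$$\text{Median} = L + \left(\frac{\frac{n}{2} - cf}{f}\right) i$$
where L is the lower boundary of the median class, n is the total number of observations, cf is the
cumulative frequency of the classes before the median class, f is the frequency of the median class, and i
is the class width.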
1.4 Mode
The mode is the most frequently occurring value in a set of observations. When two values share the
highest frequency, the distribution is bimodal; when more than two values share it, the distribution is
multimodal. When every value occurs the same number of times, there is no mode. The mode is
appropriate for nominal data.
Example 1:
The ages of ten (10) persons assembled in a room are as follows:
16, 18, 18, 25, 25, 25, 30, 34, 36, and 38.
Solution:
An age of 25 is the mode because it has been recorded three times in the sample, more than any other
age.
Answer: Mode = 25
Example 2:
The number of hours spent by 10 students in an internet café was as follows:
2, 2, 2, 3, 3, 4, 4, 4, 5, 5
Solution:
Both 2 and 4 have a frequency of 3. The data is therefore bimodal.

Answer: Mode = 2 and 4
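Examples 1 and 2 can be checked with a short Python sketch. The helper name `modes` is hypothetical; it returns every value that shares the highest frequency, and an empty list when all values occur equally often.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency, or [] if there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every value occurs equally often: no mode
    return sorted(v for v, c in counts.items() if c == highest)

ages = [16, 18, 18, 25, 25, 25, 30, 34, 36, 38]
hours = [2, 2, 2, 3, 3, 4, 4, 4, 5, 5]

print(modes(ages))       # the ages listed in Example 1
print(modes(hours))      # the hours listed in Example 2
print(modes([1, 2, 3]))  # no value repeats more than the others
```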
Example 3:
Referring to the data on the distribution of the ages of 100 people interviewed for a survey on a topic of
national interest, the modal class is 31-40. The mode, which corresponds to the class midpoint, would be:
(31 + 40) / 2 = 35.5
2. Measures of Dispersion
Suppose you ask a group of senior high school students to rate the quality of food at the school canteen
and you find out that the average rating is 3.5 using the following scale: 5 (Excellent); 4 (Very
Satisfactory); 3 (Satisfactory); 2 (Fair); 1 (Poor). How close are the ratings given by the students? Do their
ratings cluster around the middle point of 3, or are their ratings spread or dispersed, with some students
giving ratings of 1 and the rest giving ratings of 5?
The extent of the spread, or the dispersion of the data, is described by a group of measures called
measures of dispersion, also called measures of variability. The measures to be considered are the
range, average or mean deviation, standard deviation, and the variance.
2.1 The Range
The range is the difference between the largest and the smallest values in a set of data.
Consider the following scores obtained by ten (10) students participating in a mathematics contest: 6,
10, 12, 15, 18, 18, 20, 23, 25, 28
Range = 28 − 6 = 22
Thus, the range is 22. The scores range from 6 to 28.
2.2 Average (Mean) Deviation
This measure of spread is defined as the sum of the absolute differences (deviations) between the values
in a set of data and the mean, divided by the total number of values in the set of data.
In mathematics, the term "absolute," represented by the sign "∣ ∣", simply means taking the value of a
number without regard to its positive or negative sign.

The formula based on the definition is:
$$MD = \frac{\sum |x - \bar{x}|}{n}$$
where x is an observed value, \bar{x} is the mean, and n is the number of values.
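As an illustration, the sketch below computes the range and the mean deviation for the ten contest scores given in 2.1. It assumes the usual definition of the mean deviation: the sum of the absolute deviations from the mean, divided by the number of values.

```python
# The ten contest scores from 2.1.
scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]

# Range: largest value minus smallest value.
data_range = max(scores) - min(scores)

# Mean deviation: average of the absolute distances from the mean.
mean = sum(scores) / len(scores)
absolute_deviations = [abs(x - mean) for x in scores]
mean_deviation = sum(absolute_deviations) / len(scores)

print("range:", data_range)
print("mean:", mean)
print("mean deviation:", mean_deviation)
```

Without the absolute value, the positive and negative deviations would cancel and always sum to zero.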
2.3 Standard Deviation
The standard deviation (SD) is a measure of the spread or variation of data about the mean.
Roughly speaking, the SD describes the typical distance of a value from the mean.
The formula for calculating the standard deviation for ungrouped data is given by:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Example 1:
Let us consider the same data used in the illustration for using the range. The values are 6, 10, 12, 15,
18, 18, 20, 23, 25, 28.
Solution:
1. Compute the mean: 175 / 10 = 17.5
2. Subtract the mean from each score: −11.5, −7.5, −5.5, −2.5, 0.5, 0.5, 2.5, 5.5, 7.5, 10.5
3. Square each difference from Step 2: 132.25, 56.25, 30.25, 6.25, 0.25, 0.25, 6.25, 30.25, 56.25, 110.25
4. Sum all the squares from Step 3: 428.5
5. Divide the number in Step 4 by n−1: 428.5 / 9 ≈ 47.61. This result is the variance. The number of items
or scores is denoted by n. The quantity n−1 is called the degrees of freedom; dividing by n−1 instead of n
gives a less biased estimate of the population variance when the data are a sample.
6. Compute the standard deviation by taking the square root of the result in Step 5: SD = √47.61 ≈ 6.90
Interpretation of the Standard Deviation
The standard deviation allows you to reach conclusions about scores in the distribution. The following
conclusions can be reached if the distribution of scores is normal:
1. Approximately 68% of the scores in the sample fall within one standard deviation of the mean.
2. Approximately 95% of the scores in the sample fall within two standard deviations of the mean.
3. Approximately 99.7% of the scores in the sample fall within three standard deviations of the mean.
4. In our example, with a mean of 17.5 and a standard deviation of 6.90, we can say that approximately
68% of the scores will fall in the range (17.5−6.90) to (17.5+6.90) = 10.60 to 24.40.
5. Likewise, approximately 95% of the scores will fall in the range 17.5−(2×6.90) to 17.5+(2×6.90) =
(17.5−13.80) to (17.5+13.80) = 3.70 to 31.30.
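The six steps and the interpretation ranges can be reproduced with a short Python sketch. It follows the n−1 divisor from Step 5 and compares the result with `statistics.stdev`, which also uses n−1. Computed this way, the ten scores give a standard deviation of about 6.90.

```python
import math
import statistics

scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
n = len(scores)

mean = sum(scores) / n                           # Step 1
deviations = [x - mean for x in scores]          # Step 2
squared_deviations = [d ** 2 for d in deviations]  # Step 3
sum_of_squares = sum(squared_deviations)         # Step 4
variance = sum_of_squares / (n - 1)              # Step 5
sd = math.sqrt(variance)                         # Step 6

print(f"mean = {mean}, variance = {variance:.2f}, SD = {sd:.2f}")
print(f"statistics.stdev gives {statistics.stdev(scores):.2f}")

# Ranges for one, two and three standard deviations around the mean.
for k in (1, 2, 3):
    print(f"within {k} SD: {mean - k * sd:.2f} to {mean + k * sd:.2f}")
```

The percentages attached to these ranges hold only when the scores are approximately normally distributed, and ten scores are too few to check that reliably.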
B. Inferential Data Analysis
Inferential statistics refers to statistical measures and techniques that allow us to use samples to make
generalizations about the population from which the samples were drawn.
Below is a list of common statistical measures to measure significant differences and relationships
between variables.
1. Test of Significance of Difference (T-test)
T-test Between Means - This test is used for independent samples (i.e., when the respondents consist of
two different groups, such as boys and girls, working mothers and non-working mothers, healthy and
malnourished children, and the like).
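A minimal sketch of a t-test for two independent samples follows. It assumes the pooled-variance (equal variances) form of the test, which may differ from the formula intended in the lesson; the function name `independent_t` and the two small samples are hypothetical.

```python
import math
import statistics

def independent_t(sample_1, sample_2):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)

    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean_1 - mean_2) / standard_error, df

# Hypothetical scores for two independent groups.
boys = [5, 7, 9]
girls = [1, 3, 5]

t, df = independent_t(boys, girls)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The computed t is then compared with the critical value from a t table at the chosen level of significance and the stated degrees of freedom.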
Analysis of Variance (ANOVA) - ANOVA is used when the significance of the difference among the means
of two or more groups is to be determined at one time.
- One-Way Analysis of Variance
A typical ANOVA Table
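ANOVA relies on the F-ratio, the ratio of the between-groups variance estimate to the within-groups
variance estimate, to test the hypothesis that the two variance estimates are equal; that is, that the
subgroups are from the same population. "Between groups" refers to the variation between each group
mean and the grand or overall mean. "Within groups" refers to the variation of the individual scores about
their own group mean.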
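The entries of a one-way ANOVA table can be sketched as follows. The layout (sum of squares, degrees of freedom, mean square, F) is the usual textbook one and is assumed here; the function name `one_way_anova` and the three small groups are hypothetical.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of groups."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Between groups: each group mean against the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within groups: each score against its own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }

# Hypothetical scores for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
table = one_way_anova(groups)

print(f"{'Source':<16}{'SS':>8}{'df':>4}{'MS':>8}{'F':>8}")
print(f"{'Between groups':<16}{table['ss_between']:>8.2f}{table['df_between']:>4}"
      f"{table['ms_between']:>8.2f}{table['f_ratio']:>8.2f}")
print(f"{'Within groups':<16}{table['ss_within']:>8.2f}{table['df_within']:>4}"
      f"{table['ms_within']:>8.2f}")
```

A large F-ratio means the group means differ by more than the spread inside the groups would suggest; the computed F is compared with the critical value from an F table.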
2. Tests of Relationship
Spearman Rank-Order Correlation or Spearman rho. This is used when data available are expressed in
terms of ranks (ordinal variable).
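A minimal sketch of Spearman rho, assuming the common formula rho = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of each item. This form applies when there are no tied ranks. The rankings given by two judges are hypothetical.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank-order correlation for two sets of untied ranks."""
    n = len(ranks_x)
    # d is the difference between the two ranks given to the same item.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks given by two judges to the same five contestants.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

rho = spearman_rho(judge_a, judge_b)
print(f"rho = {rho:.2f}")
```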
Chi-Square Test for Independence. This is used when data are expressed in terms of frequencies or
percentages (nominal variables).
Case 1: Multinomial
Case 2: Contingency Table
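Both cases can be sketched with the usual chi-square statistic, the sum of (observed − expected)² / expected over all cells. This formula is assumed here. In Case 1 the expected frequencies are supplied by the hypothesis; in Case 2 each expected frequency is (row total × column total) / grand total. The function names and all frequencies below are hypothetical.

```python
def chi_square_multinomial(observed, expected):
    """Case 1: compare observed category counts with expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_contingency(table):
    """Case 2: test of independence for a rows-by-columns frequency table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df

# Case 1: hypothetical counts for three choices, expected to be equally popular.
case_1 = chi_square_multinomial([30, 20, 10], [20, 20, 20])

# Case 2: hypothetical 2 x 2 table (rows: group, columns: Yes/No).
case_2, df_2 = chi_square_contingency([[20, 30], [30, 20]])

print(f"Case 1: chi-square = {case_1:.2f}, df = 2")
print(f"Case 2: chi-square = {case_2:.2f}, df = {df_2}")
```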
Pearson Product-Moment Coefficient of Correlation (Pearson r). This is used when data are expressed in
terms of scores such as weights and heights or scores in a test (ratio or interval).
Case 1: When deviations from the mean are used
Case 2: When raw scores on the original observations are used
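The two cases give the same value of r, as the sketch below shows. The formulas used are the standard deviation-score and raw-score forms of Pearson r, assumed here to be the ones intended; the function names and the paired scores are hypothetical.

```python
import math

def pearson_from_deviations(x, y):
    """Case 1: r from deviations of each score from its mean."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def pearson_from_raw_scores(x, y):
    """Case 2: r from sums of the raw scores, their squares and products."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired scores for five students.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_deviations = pearson_from_deviations(x, y)
r_raw = pearson_from_raw_scores(x, y)
print(f"Case 1: r = {r_deviations:.4f}")
print(f"Case 2: r = {r_raw:.4f}")
```

The raw-score form avoids computing the means first, which is convenient for hand calculation.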
T-test to Test the Significance of Pearson r. The T-test is used to determine if the value of the computed
coefficient of correlation is significant. That is, does it represent a real correlation, or is the obtained
coefficient of correlation merely brought about by chance? The test statistic, with n − 2 degrees of
freedom, is:
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
where:
r = correlation coefficient
n = number of paired observations in the sample
The coefficient of determination (r²) can also be used to indicate what proportion of the total variation
in the dependent variable is explained by the linear relationship with the independent variable. You can
multiply by 100 to convert the coefficient of determination to percent.
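As a small illustration, the sketch below computes the t statistic for a correlation coefficient and the coefficient of determination as a percent. It assumes the standard form t = r√((n − 2) / (1 − r²)) with n − 2 degrees of freedom; the values r = 0.6 and n = 18 are hypothetical.

```python
import math

def t_for_r(r, n):
    """t statistic for testing whether a Pearson r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical result: r = 0.6 computed from 18 paired observations.
r, n = 0.6, 18

t = t_for_r(r, n)
degrees_of_freedom = n - 2
percent_explained = 100 * r ** 2  # coefficient of determination, in percent

print(f"t = {t:.2f} with {degrees_of_freedom} degrees of freedom")
print(f"r squared = {r ** 2:.2f}, or {percent_explained:.0f}% of the variation explained")
```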
