pr2-c4-l5
The purpose of a Data Analysis Plan is to gather useful information to find solutions to the research
questions of interest. It may be used to:
● COMPARE VARIABLES
All of the above can be carried out using any one or a combination of the following data analysis
strategies:
- Exploratory Data Analysis. This type of data analysis is used when it is not clear what to expect from
the data. This strategy uses numerical and visual presentations such as graphs. Since the research of
interest is new, the data may show inconsistencies such as missing values, unusually small or large
values, invalid entries, or an unexpected distribution.
- Descriptive Data Analysis. This type of data analysis is used to describe, show, or summarize data in a
meaningful way, leading to a simple interpretation of the data. The commonly used descriptive statistics
are those that analyze the distribution of data, such as frequency, percentage, measures of central
tendency, and measures of dispersion.
- Inferential Data Analysis. This type of data analysis uses statistical techniques to extrapolate
information from a smaller sample in order to make predictions and draw conclusions about a larger
population. It uses probability theory and statistical models to estimate population parameters and to
test hypotheses about the population based on sample data.
Quantitative Analysis in Evaluation
Determining the level of measurement of quantitative data is important before proceeding with the
analysis. The choice of statistical measure(s) depends on the level of measurement of the data. The
following are the levels of measurement:
Nominal Scale -- A nominal scale of measurement is used for labelling variables. Data measured on this
scale are sometimes called categorical data. For example, basketball players wear sports shirts with
numbers, but that is just a way to identify the players. Likewise, if you want to categorize respondents
based on gender, you could use 1 for male and 2 for female. No order or distance is observed. The Yes or
No scale is an example of nominal data. The numbers assigned to the categories have no quantitative
value. Some examples of variables measured on a nominal scale are gender, religious affiliation, and race
or ethnic group.
Ordinal Scale -- An ordinal scale of measurement assigns an order to items based on the characteristic
being measured. It involves the ranking of individuals, attitudes, and characteristics. The order in the
honor roll (first honor, second honor, third honor), the order of agreement (strongly agree, agree,
strongly disagree), and economic status (low, average, high) are some examples. Numerical scores such as
first, second, third, and so on are assigned, but the numbers have no meaning except to establish the
ranking among a set of data. You can talk about ordering, but the size of the differences between the
ranks is not specified.
Interval Scale -- The interval scale has equal units of measurement, thereby making it possible to
interpret both the order of the scale scores and the distance between them. However, interval scales do
not have a "true zero." With interval data, addition and subtraction are possible, but multiplying or
dividing values (forming ratios) is not meaningful.
Ratio Scale -- The ratio scale is considered the highest level of measurement. It has the characteristics
of an interval scale, but it also has a true zero point. Because of this property, all statistical
operations can be performed on ratio scales. All descriptive and inferential statistics may be applied.
Values can be added, subtracted, multiplied, and divided.
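The difference between an interval scale and a ratio scale can be illustrated with a short sketch. Temperature is not one of the examples in this lesson; it is used here only as an assumed illustration, because degrees Celsius has no true zero while the Kelvin scale does.

```python
# Temperatures in degrees Celsius form an interval scale:
# 0 degrees Celsius does not mean "no temperature".
celsius_a, celsius_b = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = celsius_b - celsius_a

# A ratio computed on an interval scale depends on the arbitrary zero point,
# so "20 is twice as hot as 10" is not a valid statement.
ratio_celsius = celsius_b / celsius_a

# Kelvin has a true zero, so it is a ratio scale and ratios are meaningful.
kelvin_a, kelvin_b = celsius_a + 273.15, celsius_b + 273.15
ratio_kelvin = kelvin_b / kelvin_a

print("difference:", difference)
print("ratio on the Celsius scale:", ratio_celsius)
print("ratio on the Kelvin scale:", round(ratio_kelvin, 3))
```

The two ratios disagree even though they describe the same pair of temperatures, which is why multiplication and division are reserved for ratio data.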
1. Measures of Central Tendency
Research findings are often presented as averages. The use of the phrase “on the average” and the word
“typical” denotes that one is interested in determining the center or middle of a set of data. The common
measures of central tendency, sometimes called measures of location or center, include the mean, median,
and mode.
1.1 Mean
Often called the arithmetic average of a set of data, the mean is the sum of the observed values in the
distribution divided by the number of observations. It is frequently used for interval or ratio data. The
symbol x̄ (x bar) is used to denote the arithmetic mean.
The mean is calculated by summing up the observations (items, heights, scores, or responses) and dividing
by the number of observations:
x̄ = Σx / n
where Σx is the sum of the observations and n is the number of observations.
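The calculation of the mean can be sketched in a few lines of Python. The function name and the quiz scores below are hypothetical and are used only for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean needs at least one observation")
    return sum(values) / len(values)


# Hypothetical quiz scores (not taken from this lesson).
quiz_scores = [80, 85, 90, 95]
x_bar = mean(quiz_scores)
print("mean:", x_bar)
```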
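When the observations are grouped into classes, the formula for grouped data is as follows:
x̄ = Σ(f·x) / n
where f is the frequency of each class, x is the midpoint of each class, and n = Σf is the total number
of observations.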
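1.2 Weighted Mean
The weighted average or weighted mean is necessary in some situations. Suppose that you are given the
means of two or more groups of measurements and wish to find the mean of all the measurements combined
into one group. The formula is:
x̄ = (n₁x̄₁ + n₂x̄₂ + ... + nₖx̄ₖ) / (n₁ + n₂ + ... + nₖ)
where x̄ᵢ is the mean of group i and nᵢ is the number of observations in group i.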
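As a minimal sketch, the grouped mean multiplies each class midpoint by its frequency before dividing by the total frequency, and the weighted (combined) mean weights each group mean by the size of its group. The function names, the class midpoints, the frequencies, and the group sizes below are all hypothetical.

```python
def grouped_mean(midpoints, frequencies):
    """Mean for grouped data: sum of (frequency x midpoint) over total frequency."""
    total = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / total


def weighted_mean(means, weights):
    """Combined mean of several groups, weighting each mean by its group size."""
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


# Hypothetical classes 1-10, 11-20, 21-30 with their midpoints and frequencies.
class_midpoints = [5.5, 15.5, 25.5]
class_frequencies = [4, 10, 6]
print("grouped mean:", grouped_mean(class_midpoints, class_frequencies))

# Hypothetical section means (80 and 90) for sections of 30 and 20 students.
print("weighted mean:", weighted_mean([80.0, 90.0], [30, 20]))
```

Note that the weighted mean is pulled toward the mean of the larger group; simply averaging 80 and 90 would ignore the difference in group sizes.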
1.3 Median
The median is the midpoint of the distribution. It represents the point in the data where 50% of the
values fall below that point and 50% fall above it. When the distribution has an even number of
observations, the median is the average of the two middle scores. The median is the most appropriate
measure of central tendency for ordinal data.
The median may be calculated from ungrouped data by doing the following steps:
1. Arrange the values from lowest to highest.
2. Count to the middle value. For an odd number of values, the median corresponds to the value with
rank (n + 1)/2. If the array contains an even number of observations, the median is the average of the
two middle values.
Example 1:
7, 8, 8, 9, 10, 12, 23
By inspection, the median is 9 because half of the remaining values (7, 8, 8) are below 9 and half
(10, 12, 23) are above 9. Since n = 7 is odd, the median has rank (n + 1)/2 = (7 + 1)/2 = 4, and the 4th
value in the ordered list is 9.
Median = 9
Example 2:
The two middle values are 18 and 22. Taking the average of the two middle values, (18 + 22)/2 = 40/2,
the median is 20.
Median = 20
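The procedure for ungrouped data can be sketched as follows. The first list is the data of Example 1; the second list is hypothetical and was chosen only so that its two middle values are 18 and 22.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median needs at least one observation")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the value with rank (n + 1) / 2 (index n // 2 when counting from 0).
        return ordered[middle]
    # Even n: the average of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


example_1 = [7, 8, 8, 9, 10, 12, 23]
hypothetical_even = [12, 15, 18, 22, 27, 30]  # assumed data, not from the lesson

print("odd n:", median(example_1))
print("even n:", median(hypothetical_even))
```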
A. For Grouped Data
1.4 Mode
The mode is the most frequently occurring value in a set of observations. When two or more values share
the highest frequency, the distribution is bimodal (two modes) or multimodal (more than two modes). When
every value occurs the same number of times, there is no mode. The mode is appropriate for nominal data.
Example 1:
16, 18, 18, 25, 25, 25, 30, 34, 36, and 38.
Solution:
An age of 25 is the mode because it has been recorded three times in the sample, more than any other
age.
Answer: Mode = 25
Example 2:
2, 2, 2, 3, 3, 4, 4, 4, 5, 5
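Solution:
The values 2 and 4 each occur three times, more than any other value, so the distribution is bimodal.
Answer: Modes = 2 and 4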
Example 3:
Referring to the data on the distribution of the ages of 100 people interviewed for a survey on a topic
of national interest, the modal class is 31-40. The mode, which corresponds to the class midpoint, would
be:
(31 + 40) / 2 = 35.5
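The three situations described for the mode (one mode, several modes, no mode) can be checked with a small sketch. The function name is hypothetical, and the third list is assumed data added to show the no-mode case.

```python
from collections import Counter


def modes(values):
    """Return the list of modes; an empty list means there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


example_1 = [16, 18, 18, 25, 25, 25, 30, 34, 36, 38]
example_2 = [2, 2, 2, 3, 3, 4, 4, 4, 5, 5]
no_mode = [1, 2, 3, 4]  # assumed data: every value occurs once

print("Example 1:", modes(example_1))
print("Example 2:", modes(example_2))
print("No mode:", modes(no_mode))

# For grouped data, the mode is taken as the midpoint of the modal class.
modal_class_midpoint = (31 + 40) / 2
print("Midpoint of class 31-40:", modal_class_midpoint)
```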
2. Measures of Dispersion
Suppose you ask a group of senior high school students to rate the quality of food at the school canteen
and you find out that the average rating is 3.5 using the following scale: 5 (Excellent); 4 (Very
Satisfactory); 3 (Satisfactory); 2 (Fair); 1 (Poor). How close are the ratings given by the students? Do
their ratings cluster around the middle point of 3, or are their ratings spread out or dispersed, with
some students giving ratings of 1 and the rest giving ratings of 5?
The extent of the spread, or the dispersion of the data, is described by a group of measures called
measures of dispersion, also called measures of variability. The measures to be considered are the
range, average or mean deviation, standard deviation, and the variance.
The range is the difference between the largest and the smallest values in a set of data.
Consider the following scores obtained by ten (10) students participating in a mathematics contest: 6,
10, 12, 15, 18, 18, 20, 23, 25, 28
The largest value is 28 and the smallest value is 6, so the range is 28 − 6 = 22.
The average or mean deviation is defined as the sum of the absolute differences (deviations) between the
values in a set of data and the mean, divided by the total number of values in the set of data:
MD = Σ∣x − x̄∣ / n
In mathematics, the term "absolute," represented by the sign "∣ ∣", simply means taking the value of a
number without regard to its positive or negative sign.
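Both the range and the mean deviation can be computed directly from the ten contest scores. This is a minimal sketch; the variable names are chosen here for illustration.

```python
scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]

# Range: largest value minus smallest value.
data_range = max(scores) - min(scores)

# Mean deviation: average of the absolute deviations from the mean.
mean_score = sum(scores) / len(scores)
absolute_deviations = [abs(x - mean_score) for x in scores]
mean_deviation = sum(absolute_deviations) / len(scores)

print("range:", data_range)
print("mean:", mean_score)
print("mean deviation:", mean_deviation)
```

The range uses only the two extreme scores, while the mean deviation takes every score into account.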
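The standard deviation (SD) is a measure of the spread or variation of data about the mean.
Roughly speaking, the SD is the typical distance of the values from the mean. It is computed as the
square root of the average squared deviation from the mean.
The formula for calculating the standard deviation for ungrouped data is given by:
s = √[ Σ(x − x̄)² / (n − 1) ]
where x is each value, x̄ is the mean, and n is the number of values.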
Example 1:
Let us consider the same data used in the illustration for using the range. The values are 6, 10, 12, 15,
18, 18, 20, 23, 25, 28.
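Solution:
1. Compute the mean of the values.
2. Subtract the mean from each value to get the deviations.
3. Square each deviation.
4. Add up the squared deviations.
5. Divide the number in Step 4 by n−1. The number of items or scores is denoted by n. The quantity n−1
is called the degrees of freedom; dividing by n−1 rather than n gives a better estimate of the variance
of the population from a sample. The result is the variance.
6. Take the square root of the number in Step 5 to get the standard deviation.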
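The computation of the sample variance and standard deviation for the ten scores can be sketched step by step: find the mean, take each deviation from the mean, square the deviations, add them up, divide by n−1, and take the square root. The variable names are assumptions made for this illustration, and the result is cross-checked against Python's built-in statistics.stdev, which also divides by n−1.

```python
import math
import statistics

scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
n = len(scores)

mean_score = sum(scores) / n                          # the mean
deviations = [x - mean_score for x in scores]         # value minus mean
squared_deviations = [d ** 2 for d in deviations]     # squared deviations
sum_of_squares = sum(squared_deviations)              # sum of the squares
variance = sum_of_squares / (n - 1)                   # divide by n - 1
standard_deviation = math.sqrt(variance)              # square root

print("mean:", mean_score)
print("sum of squared deviations:", sum_of_squares)
print("variance:", round(variance, 2))
print("standard deviation:", round(standard_deviation, 2))

# Cross-check with the standard library (sample SD, n - 1 in the denominator).
print("statistics.stdev:", round(statistics.stdev(scores), 2))
```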
The standard deviation allows you to reach conclusions about scores in the distribution. The following
conclusions can be reached if that distribution of scores is normal:
1. Approximately 68% of the scores in the sample fall within one standard deviation of the mean.
2. Approximately 95% of the scores in the sample fall within two standard deviations of the mean.
3. Approximately 99.7% of the scores in the sample fall within three standard deviations of the mean.
4. In our example, the mean is 17.5 and the standard deviation computed from the ten scores is about
6.90, so about 68% of the scores are expected to fall in the range (17.5−6.90) to (17.5+6.90) = 10.60 to
24.40.
5. Likewise, about 95% of the scores are expected to fall in the range 17.5−(2×6.90) to 17.5+(2×6.90) =
(17.5−13.80) to (17.5+13.80) = 3.70 to 31.30.
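The ranges for one, two, and three standard deviations can be generated from the data rather than by hand. The sketch below uses the ten contest scores and a hypothetical helper, sd_range; it also counts how many scores actually fall inside each range. With only ten scores that are not necessarily normally distributed, the observed shares will only roughly match the percentages stated in the rule.

```python
import statistics

scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
mean_score = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample SD (divides by n - 1)


def sd_range(mean, sd, k):
    """Range from k standard deviations below the mean to k above it."""
    return (mean - k * sd, mean + k * sd)


for k in (1, 2, 3):
    low, high = sd_range(mean_score, sd, k)
    inside = sum(1 for x in scores if low <= x <= high)
    print(f"within {k} SD: {low:.2f} to {high:.2f} "
          f"({inside} of {len(scores)} scores)")
```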
Inferential statistics refers to statistical measures and techniques that allow us to use samples to make
generalizations about the population from which the samples were drawn.
Below is a list of common statistical tests used to determine significant differences and relationships
between variables.
Between Means - For independent samples (i.e., when the respondents consist of two different groups
such as boys and girls, working mothers and non-working mothers, healthy and malnourished children
and the like)
Analysis of Variance (ANOVA) - ANOVA is used when the significance of the difference between the means
of two or more groups is to be determined at one time.
ANOVA relies on the F-ratio, the ratio of the between-groups variance to the within-groups variance, to
test the hypothesis that these two variances are equal; that is, that the subgroups are from the same
population. "Between groups" refers to the variation between each group mean and the grand or overall
mean. "Within groups" refers to the variation of the individual scores around their own group mean.
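The F-ratio can be sketched for a one-way ANOVA as the between-groups mean square divided by the within-groups mean square. The function name and the three groups of scores below are hypothetical. Deciding whether the resulting F is significant still requires comparing it with a critical value from an F table, which this sketch does not do.

```python
def one_way_anova_f(groups):
    """Return the F-ratio for a one-way ANOVA on a list of groups of scores."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total number of scores
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between groups: variation of each group mean around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within groups: variation of each score around its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)                 # df between = k - 1
    ms_within = ss_within / (n_total - k)             # df within = N - k
    return ms_between / ms_within


# Hypothetical scores for three groups (assumed data, not from the lesson).
group_a = [4, 5, 6]
group_b = [6, 7, 8]
group_c = [8, 9, 10]
f_ratio = one_way_anova_f([group_a, group_b, group_c])
print("F-ratio:", f_ratio)
```

An F-ratio near 1 suggests that the group means differ no more than would be expected from the variation within groups; a large F-ratio suggests that at least one group mean differs.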
2. Tests of Relationship
Spearman Rank-Order Correlation or Spearman rho. This is used when data available are expressed in
terms of ranks (ordinal variable).
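As a rough illustration, Spearman rho can be computed from the differences between paired ranks using the common formula rho = 1 − 6Σd² / (n(n² − 1)), which assumes there are no tied ranks. This formula is not stated in the lesson, and the function name and the two sets of ranks below are hypothetical.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rho from two lists of ranks (assumes no tied ranks)."""
    if len(rank_x) != len(rank_y):
        raise ValueError("both lists must contain the same number of ranks")
    n = len(rank_x)
    if n < 2:
        raise ValueError("at least two pairs of ranks are needed")
    # d is the difference between the two ranks of each individual.
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical ranks given to five contestants by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_1, judge_2)
print("Spearman rho:", round(rho, 2))
```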
Chi-Square Test for Independence. This is used when data are expressed in terms of frequencies or
percentages (nominal variables).
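A minimal sketch of the chi-square statistic for a table of observed frequencies is shown below. It uses the usual computation, in which each expected frequency is (row total × column total) / grand total and the statistic is the sum of (observed − expected)² / expected. The lesson does not state this formula, and the function name and the 2×2 table are hypothetical. The statistic must still be compared with a critical value from a chi-square table at the computed degrees of freedom.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a frequency table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, observed_count in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            chi_square += (observed_count - expected) ** 2 / expected

    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return chi_square, degrees_of_freedom


# Hypothetical frequencies: rows are two groups of respondents,
# columns are the number answering Yes and No.
table = [[30, 20],
         [20, 30]]
chi_square, df = chi_square_independence(table)
print("chi-square:", chi_square, "degrees of freedom:", df)
```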
Case 1: Multinomial
Pearson Product-Moment Coefficient of Correlation or Pearson r. This is used when data are expressed in
terms of scores such as weights and heights or scores in a test (ratio or interval variables).
T-test to Test the Significance of Pearson r. The t-test is used to determine if the value of the
computed coefficient of correlation is significant. That is, does it represent a real correlation, or is
the obtained coefficient of correlation merely brought about by chance? The test statistic is:
t = r √[(n − 2) / (1 − r²)]
where:
r = correlation coefficient
n = number of paired observations in the sample
The coefficient of determination (r²) can also be used to indicate what proportion of the total variation
in the dependent variable is explained by the linear relationship with the independent variable. You can
multiply it by 100 to convert the coefficient of determination to a percentage.
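The three quantities discussed above (Pearson r, the t value used to test its significance, and the coefficient of determination) can be sketched together. The t value is computed here in its usual form, t = r√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, and it is not defined when r is exactly 1 or −1. The function names and the paired scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for paired interval or ratio data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_for_r(r, n):
    """t value for testing r, to be compared at n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical paired scores (assumed data, not from the lesson).
hours_studied = [1, 2, 3, 4, 5]
quiz_score = [2, 4, 5, 4, 5]

r = pearson_r(hours_studied, quiz_score)
t = t_for_r(r, len(hours_studied))
r_squared = r ** 2

print("r:", round(r, 4))
print("t:", round(t, 4))
print("r squared as a percentage:", round(r_squared * 100, 1))
```

The computed t is then compared with the critical value from a t table at n − 2 degrees of freedom to decide whether the correlation is significant.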