Chapter Five_Processing and analysis
Introduction
Data analysis refers to the computation of certain measures along
with searching for patterns of relationship among data groups.
Data processing
1. Editing
Editing of data is the examination of the collected raw
data to detect errors and omissions and to correct
these when possible.
Editing involves a careful examination of the
completed questionnaires and/or schedules.
Editing is done to ensure that the data are accurate,
consistent with other facts gathered, uniformly
entered, as complete as possible, and well arranged
to facilitate coding and tabulation.
Data processing
1. Editing
Editing can be done in the field or at the office:
Field editing: reviewing the reporting forms during data
collection (translating, rewriting, completing abbreviations, etc.).
Central editing: takes place when all forms or schedules
have been completed and returned to the office. The editor(s)
may correct obvious errors, for example:
Move an entry made in the wrong place,
Identify missing values,
Strike out an answer if it is inappropriate.
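The central-editing checks listed above can be sketched in a few lines of Python. This is only an illustration: the records, the field names, the valid answer set and the plausible age range are all assumptions, not part of the original material.

```python
# Hypothetical survey records; None marks an omitted answer.
records = [
    {"id": 1, "age": 34, "sex": "F", "income": 5200},
    {"id": 2, "age": None, "sex": "M", "income": 4100},
    {"id": 3, "age": 230, "sex": "M", "income": 3900},   # implausible age
    {"id": 4, "age": 41, "sex": "X?", "income": None},   # unknown code, missing income
]

VALID_SEX = {"F", "M"}   # assumed answer categories
AGE_RANGE = (0, 120)     # assumed plausible range


def edit_record(record):
    """Return a list of problems found in one record."""
    problems = []
    # Omissions: any field left blank.
    for field, value in record.items():
        if value is None:
            problems.append(f"missing value: {field}")
    # Errors: values outside the plausible range.
    age = record.get("age")
    if age is not None and not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
        problems.append("age out of range")
    # Inappropriate answers: codes that are not among the allowed choices.
    sex = record.get("sex")
    if sex is not None and sex not in VALID_SEX:
        problems.append("inappropriate answer: sex")
    return problems


def edit_all(rows):
    """Map record id -> problems, keeping only records that need attention."""
    report = {}
    for row in rows:
        found = edit_record(row)
        if found:
            report[row["id"]] = found
    return report


report = edit_all(records)
for record_id, problems in report.items():
    print(record_id, problems)
```

The sketch only flags problems. Whether a flagged entry is corrected, struck out, or sent back to the field remains the editor's decision.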
Data processing
2. Coding
Coding refers to the process of assigning numerical
or other symbols to answers so that responses can
be put into a limited number of categories or
classes.
Such classes should be appropriate to the research
problem under consideration.
Coding decisions should usually be taken at the
designing stage of the questionnaire.
This makes it possible to pre-code the questionnaire
choices, which in turn is helpful for computer
tabulation.
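As a minimal sketch of pre-coding, the snippet below maps answer choices to numeric codes. The codebook, the code used for missing answers, and the sample responses are hypothetical and chosen only for illustration.

```python
# Hypothetical pre-coded answer choices for one questionnaire item.
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING_CODE = 9  # assumed code for blank or unrecognised answers


def code_response(answer):
    """Translate one raw answer into its numeric code."""
    if answer is None:
        return MISSING_CODE
    # Normalise spacing and case so that "Agree" and " agree " share a code.
    return CODEBOOK.get(answer.strip().lower(), MISSING_CODE)


raw_answers = ["Agree", " neutral ", None, "strongly agree", "maybe"]
coded = [code_response(a) for a in raw_answers]
print(coded)
```

Fixing the codebook before data collection keeps every response within a limited number of classes, which is what makes later tabulation straightforward.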
Data processing
3. Classification
Most research studies result in a large volume of raw
data, which must be reduced so that the entire data set is
divided into a number of groups or classes.
Classification can be categorized as:
Classification according to attribute
Classification according to class interval
Data processing
A. Classification according to attributes
Data are classified based on common characteristics
which can either be descriptive (such as literacy, sex,
honesty, etc.) or numerical (such as weight, height,
income, etc.).
Descriptive characteristics refer to qualitative
phenomenon which cannot be measured quantitatively;
only their presence or absence in an individual item can
be noticed.
Data obtained this way on the basis of certain attributes
are known as statistics of attributes and their classification
is said to be classification according to attributes.
Data processing
A. Classification according to attributes …
Such classification can be simple or manifold
classification.
Simple Classification: we consider only one attribute and
divide the universe into two classes (one class consisting
of items possessing the given attribute and the other class
consisting of items which do not possess the given
attribute).
Manifold Classification: we consider two or more
attributes simultaneously and divide the data into a
number of classes (total number of classes = 2^n, where n
is the number of attributes).
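The class count for manifold classification can be checked by enumerating every presence/absence combination. The sketch below assumes three illustrative attributes; the attribute names are hypothetical.

```python
from itertools import product


def manifold_classes(attributes):
    """List every presence/absence combination of the given attributes."""
    combos = product([True, False], repeat=len(attributes))
    return [dict(zip(attributes, combo)) for combo in combos]


# Hypothetical attributes chosen only for illustration.
classes = manifold_classes(["literate", "employed", "urban"])
print(len(classes))  # three attributes give 2 ** 3 classes
for c in classes:
    print(c)
```

With a single attribute the same function returns the two classes of simple classification.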
Data processing
B. Classification According to Class-Interval
Numerical characteristics refer to quantitative
phenomena which can be measured in some statistical
units.
Such data are known as statistics of variables and are classified
on the basis of class intervals.
The entire data set may be divided into a number of groups,
classes, or class intervals.
Each group or class interval has an upper limit as well
as a lower limit, which are known as class limits.
The difference between the two class limits is known as the class
magnitude.
Data processing
B. Classification According to Class-Interval
We may have classes with equal class magnitudes or
with unequal class magnitudes.
The number of items that fall in a given class is known as
the frequency of that class.
Classification according to class-interval usually
involves the following three main problems:
1. How many classes should be there? What should be
their magnitudes?
2. How to choose class limits?
3. How to determine the frequency of each class?
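One possible way to work through the three questions above is sketched here: fix a class magnitude, build the class limits, then count the items in each class. The weights, the range 20 to 80, and the magnitude of 20 are hypothetical, and the convention that the lower limit is included while the upper limit is excluded is an assumption of this sketch.

```python
def make_classes(lower, upper, width):
    """Build equal-magnitude classes [start, start + width) covering lower..upper."""
    classes = []
    start = lower
    while start < upper:
        classes.append((start, start + width))
        start += width
    return classes


def class_frequencies(values, classes):
    """Count the values in each class (lower limit included, upper limit excluded)."""
    counts = {c: 0 for c in classes}
    for v in values:
        for low, high in classes:
            if low <= v < high:
                counts[(low, high)] += 1
                break
    return counts


# Hypothetical weights in kilograms.
weights = [28, 35, 47, 52, 58, 61, 66, 70, 73, 49]
classes = make_classes(20, 80, 20)
frequencies = class_frequencies(weights, classes)
for (low, high), freq in frequencies.items():
    print(f"{low} to under {high}: {freq}")
```

Excluding the upper limit means a value such as 40 belongs to exactly one class, which avoids ambiguity at the class limits.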
Data processing
Weight (Binned)
Label        Frequency   Percent   Cumulative Percent
Below 30           429       4.3                  4.3
31 – 50           3545      35.3                 39.6
51 – 70           5700      56.8                 96.4
Above 70           357       3.6                100.0
Total            10031     100.0
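The percent and cumulative percent columns follow directly from the frequencies. The sketch below recomputes them from the weight frequencies in the table above; the function name and the rounding to one decimal place are choices made for this illustration.

```python
def frequency_table(counts):
    """Return (label, frequency, percent, cumulative percent) rows."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, freq in counts.items():
        running += freq
        rows.append((
            label,
            freq,
            round(100 * freq / total, 1),      # share of this class
            round(100 * running / total, 1),   # share of this and all earlier classes
        ))
    return rows


# Frequencies copied from the weight table above.
weight_counts = {"Below 30": 429, "31 - 50": 3545, "51 - 70": 5700, "Above 70": 357}
table = frequency_table(weight_counts)
for row in table:
    print(row)
```

The cumulative percent is computed from the running frequency rather than by adding rounded percentages, so rounding errors do not accumulate.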
Data processing
Age (Binned)   Frequency   Percent   Cumulative Percent
1 – 14               331       3.3                  3.3
15 – 24             1563      15.6                 18.9
25 – 49             6478      64.5                 83.5
50 – 64             1193      11.9                 95.4
Above 64             466       4.6                100.0
Total              10031     100.0
Data processing
4. Tabulation
When a mass of data has been assembled, it becomes
necessary for the researcher to arrange it in
some kind of concise and logical order. This procedure
is referred to as tabulation.
Tabulation is the process of summarizing raw data and
displaying it in compact form for further
analysis.
Tabulation is an orderly arrangement of data in
columns and rows.
Data processing
4. Tabulation
Tabulation is essential because of the following
reasons:
A. It conserves space and reduces explanatory and
descriptive statement to a minimum.
B. It facilitates the process of comparison.
C. It facilitates the summation of items and the detection
of errors and omissions.
D. It provides a basis for various statistical computations.
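A small two-way table illustrates these points. The sketch below arranges hypothetical coded responses in rows and columns with marginal totals; the data and the function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical coded responses: (sex, literacy) for each respondent.
responses = [
    ("F", "literate"), ("M", "literate"), ("F", "illiterate"),
    ("F", "literate"), ("M", "illiterate"), ("M", "literate"),
]


def cross_tabulate(pairs):
    """Count each (row, column) pair and return cell counts plus marginal totals."""
    cells = Counter(pairs)
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    return cells, row_totals, col_totals


def render(cells, row_totals, col_totals):
    """Format the counts as a plain-text table of rows and columns."""
    cols = sorted(col_totals)
    lines = ["".join(f"{h:>12}" for h in [""] + cols + ["Total"])]
    for r in sorted(row_totals):
        values = [cells[(r, c)] for c in cols] + [row_totals[r]]
        lines.append(f"{r:>12}" + "".join(f"{v:>12}" for v in values))
    totals = [col_totals[c] for c in cols] + [sum(col_totals.values())]
    lines.append(f"{'Total':>12}" + "".join(f"{v:>12}" for v in totals))
    return "\n".join(lines)


cells, row_totals, col_totals = cross_tabulate(responses)
print(render(cells, row_totals, col_totals))
```

The marginal totals serve as a quick check for errors and omissions: the row totals and the column totals must both add up to the number of respondents.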
Analysis of Data
Analysis is computation of certain indices or measures
along with searching for patterns of relationship that
exist among the data groups.
Analysis, particularly in case of survey or experimental
data, involves estimating the values of unknown
parameters of the population and testing of hypotheses
for drawing inferences.
Analysis can be categorized as:
Descriptive analysis and
Inferential analysis
Analysis of Data
Descriptive Analysis
Descriptive analysis provides us with profiles of companies,
work groups, persons and other subjects on any of a
multitude of characteristics, such as size.
This sort of analysis may be in respect of one, two, or
more than two variables.
Such studies work out various measures that show the
size and shape of a distribution, along with measures
of the relationships between variables.
For relationships, we use correlation analysis and causal analysis.
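For a single variable, the "size and shape" of a distribution is commonly summarized by measures of central tendency, spread, and skewness. The sketch below uses hypothetical company sizes; the choice of the population standard deviation and a moment-based skewness is an assumption of this example, not something prescribed by the text.

```python
from statistics import mean, median, pstdev


def describe(values):
    """Central tendency, spread, and a moment-based skewness for one variable."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation
    # Third standardized moment: positive when the right tail is longer.
    skew = sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)
    return {
        "n": len(values),
        "mean": m,
        "median": median(values),
        "std_dev": sd,
        "skewness": skew,
    }


# Hypothetical company sizes (number of employees).
sizes = [10, 12, 15, 18, 20, 25, 110]
summary = describe(sizes)
for name, value in summary.items():
    print(name, value)
```

Here one very large company pulls the mean well above the median, and the positive skewness reflects the same long right tail.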
Analysis of Data
Descriptive Analysis
Correlation analysis: studies the joint variation of
two or more variables to determine the amount of
correlation between them.
Causal analysis: is concerned with the study of how
one or more variables affect changes in another
variable. It is a study of functional relationships
existing between two or more variables. This analysis
can be termed regression analysis.
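Both ideas can be illustrated with two variables. The sketch below computes the Pearson correlation coefficient and fits a least-squares line; the data (years of experience and a performance score) are hypothetical, and simple linear regression is only one of several possible regression models.

```python
from math import sqrt


def _sums(x, y):
    """Sums of squares and cross-products around the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(x, y):
    """Correlation: strength of the joint linear variation of x and y."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Regression: least-squares line y = intercept + slope * x."""
    mean_x, mean_y, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: years of experience and a performance score.
experience = [1, 2, 3, 4, 5]
performance = [52, 55, 61, 64, 68]

r = pearson_r(experience, performance)
slope, intercept = fit_line(experience, performance)
print(f"r = {r:.3f}")
print(f"performance = {intercept:.1f} + {slope:.1f} * experience")
```

The correlation coefficient summarizes how closely the two variables move together, while the fitted slope estimates how much the score changes per additional year of experience in this sample.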
Analysis of Data
Inferential Analysis
It is concerned with the various tests of significance for
testing hypotheses in order to determine with what
validity data can be said to indicate some
conclusion(s).
It is also concerned with the estimation of population
values.
It is mainly on the basis of inferential analysis that the
task of interpretation is performed.
Interpretation is the task of drawing inferences and
conclusions.
What is Research Hypothesis?
Definitions:
1. A hypothesis is a tentative, intelligent guess postulated for
the purpose of directing the researcher towards the
solution of a problem.
2. It is a statement which predicts the relationship between
two or more variables.
3. It is a necessary link between theory and investigation,
usually stated after an extensive survey of the literature.
4. Scientific research starts from a problem, but the solution
starts from a hypothesis.
Hypothesis Contd…
Formulation of Research Hypothesis:
Hypothesis Testing
Hypothesis testing is the use of statistics to assess how
strongly sample data contradict a given hypothesis.
Is also called significance testing
Tests a claim about a parameter using evidence (data in a
sample)
Goal is to make statement(s) regarding unknown
population parameter values based on sample data
Hypothesis Testing
Elements of a hypothesis test:
Null hypothesis - Statement regarding the value(s) of
unknown parameter(s). Typically will imply no association
between explanatory and response variables in our
applications (will always contain an equality)
Alternative hypothesis - Statement contradictory to the
null hypothesis (will always contain an inequality)
Test statistic - Quantity based on sample data and null
hypothesis used to test between null and alternative
hypotheses
Rejection region - Values of the test statistic for which we
reject the null in favor of the alternative hypothesis
Hypothesis Testing Steps
A. Null and alternative hypotheses
B. Test statistic
C. P-value and interpretation
D. Significance level (optional)
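Steps A to D can be walked through with a small exact permutation test for a difference in means. This is a sketch under stated assumptions: the two groups of performance scores are hypothetical, the permutation test is just one possible choice of test, and the 0.05 significance level is assumed for illustration.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b):
    """Two-sided exact permutation test for a difference in means.

    A. Null hypothesis: group membership is unrelated to the scores.
       Alternative hypothesis: the group means differ.
    B. Test statistic: absolute difference between the two group means.
    C. P-value: share of all regroupings whose statistic is at least
       as large as the observed one.
    """
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding in ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return observed, extreme / total


# Hypothetical performance scores for two groups of employees.
young = [72, 75, 78, 80, 85]
experienced = [60, 62, 65, 70, 71]
ALPHA = 0.05  # D. assumed significance level

statistic, p_value = permutation_test(young, experienced)
reject_null = p_value < ALPHA
print(f"difference in means = {statistic:.1f}, p-value = {p_value:.4f}")
print("reject null hypothesis" if reject_null else "do not reject null hypothesis")
```

A small p-value means that a difference this large would rarely arise if the null hypothesis were true. It is not the probability that the null hypothesis itself is true.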
Hypothesis Testing
A hypothesis can appear in your report in any of the following forms:
1. Deductive form: makes a positive statement about the outcome of the study.
It can be directional or non-directional.
Directional: stipulates (specifies) the direction of the expected results.
Ex 1: The performance of young employees is significantly higher than
that of employees who are old and experienced.
Ex 2: The security level of the latest version of the OS is significantly
better than that of the older, cheaper version of the OS.
Non-directional: does not specify the direction of the expected difference
or relationship.
Ex 1: There is a difference in performance between employees who are young
and those who are old and experienced.
Ex 2: There is a significant difference in security level between the
latest version of the OS and the older, cheaper version of the OS.
Hypothesis Testing
A hypothesis can appear in your report in any of the following forms:
2. Null form (of hypothesis): makes a statement that there is no
relationship.
Ex 1: There is no significant difference in performance
between young employees and old employees.
Ex 2: There is no significant difference in security level between
the latest version of the OS and the older, cheaper version
of the OS.
3. Question form (of hypothesis): puts the hypothesis in question form.
Ex 1: Does a change in the experience of employees affect their
performance?
Ex 2: Does a change in the version of the OS affect the security
level of the OS?
A hypothesis can appear in your report in any of the following forms:
4. Alternative hypothesis: a statement written opposite to the
null form. When the final decision is made at a given
significance level and the null hypothesis is rejected, the
alternative hypothesis is accepted. The alternative hypothesis
stands on an equal footing with the null hypothesis in
providing direction to the research.
Ex 1: There is a significant difference in performance
between young employees and old employees.
Ex 2: There is a significant difference in security level
between the latest version of the OS and the older version.