
LESSON 1

INTRODUCTION TO QUANTITATIVE METHODS

TOPICS
1.1. Data Collection
1.2. Primary and Secondary Data, Qualitative and Quantitative Data
1.3. Assessment of Quantitative Data
1.4. Data Processing

LEARNING OUTCOMES
At the end of the lesson, the students should be able to:
1. identify the different methods of collecting data and their levels of
measurement;
2. differentiate the use of quantitative and qualitative data; and
3. apply the different processes in assessing and presenting collected
data.

TOPIC 1.1: DATA COLLECTION

Quantitative data is defined as the value of data in the form of counts or numbers, where
each data set has a unique numerical value associated with it.

This data is any quantifiable information that can be used for mathematical calculations
and statistical analysis, such that real-life decisions can be made based on these
mathematical derivations. Quantitative data is used to answer questions such as "How
many?", "How often?", and "How much?". This data can be verified and can also be
conveniently evaluated using mathematical techniques.

For example, "How much did that laptop cost?" is a question which will collect quantitative data. There
are values associated with most measuring parameters, such as pounds or kilograms for weight, dollars
for cost, etc.

Quantitative data makes measuring various parameters controllable due to the ease of mathematical
derivations they come with.

Quantitative data is usually collected for statistical analysis using surveys, polls, or questionnaires sent
to a specific section of a population. The retrieved results can then be generalized across that population.

COLLECTION METHOD
As quantitative data is in the form of numbers, mathematical and statistical analysis of these numbers
can lead to establishing some conclusive results.

There are two main Quantitative Data Collection Methods:

1. Surveys.

Traditionally, surveys were conducted using paper-based methods and have gradually
evolved into online mediums. Closed-ended questions form a major part of these surveys
as they are more effective in collecting quantitative data.

Survey makers include answer options which they think are the most appropriate for a
particular question. Surveys are integral in collecting feedback from an audience which is
larger than the conventional size.

A critical factor about surveys is that the responses collected should be such that they can be
generalized to the entire population without significant discrepancies.

PRINCIPLES IN MANAGING SURVEY TO COLLECT QUANTITATIVE DATA

 Fundamental Levels of Measurement. There are four measurement scales which are
fundamental to creating a multiple-choice question in a survey for collecting quantitative
data: the nominal, ordinal, interval, and ratio measurement scales. Without these
fundamentals, no multiple-choice question can be created.

LEVELS OF MEASUREMENT

Level 01. Nominal Scale: 1st Level Measurement

The nominal scale, also called the categorical variable scale, is defined as a scale used for labelling
variables into distinct classifications; it doesn't involve a quantitative value or order.

This scale is the simplest of the four variable measurement scales. Calculations done on these variables
will be futile as there is no numerical value of the options. There are cases where this scale is used for
the purpose of classification – the numbers associated with variables of this scale are only tags for
categorization or division.

Example question for Nominal Scale:

Where do you live?

1 – Suburbs

2 – City

3 – Town

Two primary ways of collecting Nominal Scale Data

1. By asking an open-ended question, the answers to which can be coded to a
respective number or label decided by the researcher.

2. By including a multiple-choice question in which the answers will be labeled.
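As a minimal illustrative sketch (not from the source), coding open-ended nominal answers into numeric labels might look like this in Python; the categories and codes are hypothetical:

```python
# Hypothetical coding scheme for an open-ended "Where do you live?" question.
CODES = {"suburbs": 1, "city": 2, "town": 3}

responses = ["City", "suburbs", "Town", "city"]

# Map each free-text answer to its numeric label; None flags uncodable answers.
coded = [CODES.get(r.strip().lower()) for r in responses]
print(coded)  # [2, 1, 3, 2]
```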

Nominal Scales Examples

 Gender

 Political preferences

 Place of residence

What is your gender?
 Male
 Female

What is your political preference?
 1 – Independent
 2 – Democrat
 3 – Republican

Where do you live?
 1 – Suburbs
 2 – City
 3 – Town

Level 02. Ordinal Scale: 2nd Level Measurement

The ordinal scale is defined as a variable measurement scale used to simply depict the order of variables
and not the difference between each of the variables. These scales are generally used to depict
non-mathematical ideas such as frequency, satisfaction, happiness, a degree of pain, etc. It is quite
straightforward to remember the implementation of this scale, as 'Ordinal' sounds similar to 'Order',
which is exactly the purpose of this scale.

The ordinal scale maintains descriptive qualities along with an intrinsic order but is void of an origin of
scale, and thus the distance between variables can't be calculated. Descriptive qualities indicate
tagging properties similar to the nominal scale, in addition to which the ordinal scale also has a relative
position of variables. The origin of this scale is absent, due to which there is no fixed start or "true zero".

Ordinal Scale Examples

Status at workplace, tournament team rankings, order of product quality, and order of agreement or
satisfaction are some of the most common examples of the ordinal Scale. These scales are generally used
in market research to gather and evaluate relative feedback about product satisfaction, changing
perceptions with product upgrades, etc.

For example, a semantic differential scale question such as:

How satisfied are you with our services?

 Very Unsatisfied – 1

 Unsatisfied – 2

 Neutral – 3

 Satisfied – 4

 Very Satisfied – 5

1. Here, the order of variables is of prime importance and so is the labeling. Very
unsatisfied will always be worse than unsatisfied and satisfied will be worse than very
satisfied.

2. This is where ordinal scale is a step above nominal scale – the order is relevant to the
results and so is their naming.

3. Analyzing results based on the order along with the name becomes a convenient
process for the researcher.

4. If they intend to obtain more information than what they would collect using a nominal
scale, they can use the ordinal scale.

This scale not only assigns values to the variables but also measures the rank or order of the variables,
such as:

 Grades

 Satisfaction

 Happiness

Ordinal Data and Analysis

Ordinal scale data can be presented in tabular or graphical formats for a researcher to conduct a
convenient analysis of the collected data. Methods such as the Mann–Whitney U test and the
Kruskal–Wallis H test can also be used to analyze ordinal data. These methods are generally
implemented to compare two or more ordinal groups.

With the Mann–Whitney U test, researchers can conclude which variable of one group is bigger or
smaller than another variable of a randomly selected group, while with the Kruskal–Wallis H test,
researchers can analyze whether two or more ordinal groups have the same median or not.
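As a minimal sketch (not part of the original lesson), both tests are available in Python's scipy library; the 1–5 satisfaction ratings below are hypothetical:

```python
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical 1-5 satisfaction ratings from three respondent groups.
group_a = [4, 5, 3, 4, 5, 4]
group_b = [2, 3, 2, 1, 3, 2]
group_c = [3, 4, 3, 3, 2, 4]

# Mann-Whitney U: compares two ordinal groups.
u_stat, u_p = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat}, p = {u_p:.4f}")

# Kruskal-Wallis H: tests whether two or more groups share the same median.
h_stat, h_p = kruskal(group_a, group_b, group_c)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {h_p:.4f}")
```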

[Figures: an example graphical format, plus Mann–Whitney U test and Kruskal–Wallis H test output in SPSS Statistics.]


Level 03. Interval Scale: 3rd Level Measurement

The interval scale is defined as a numerical scale where the order of the variables is known as well as the
difference between these variables. Variables that have familiar, constant, and computable differences
are classified using the interval scale. It is easy to remember the primary role of this scale too: 'Interval'
indicates 'distance between two entities', which is what the interval scale helps in achieving.

Interval scale contains all the properties of the ordinal scale, in addition to which, it offers a calculation
of the difference between variables. The main characteristic of this scale is the equidistant difference
between objects.

For instance, consider a Celsius/Fahrenheit temperature scale:

 80 degrees is always higher than 50 degrees, and the difference between these two
temperatures is the same as the difference between 70 degrees and 40 degrees.

 Also, the value of 0 is arbitrary because negative values of temperature do exist –
which makes the Celsius/Fahrenheit temperature scale a classic example of an
interval scale.

 The interval scale is often chosen in research cases where the difference between
variables is a mandate – which can't be achieved using a nominal or ordinal scale.
The interval scale quantifies the difference between two variables, whereas the
other two scales are solely capable of associating qualitative values with variables.

 The mean and median values in an interval scale can be evaluated, unlike with the
previous two scales.

 In statistics, the interval scale is frequently used, as a numerical value can not only
be assigned to variables but calculations on the basis of those values can also be
carried out.

Although interval scales are useful, they do not capture the "true zero" value, which is why the next
scale comes into the picture.

Interval Data and Analysis

All the techniques applicable to nominal and ordinal data analysis are applicable to interval data as well.
Apart from those techniques, there are a few analysis methods such as descriptive statistics and
correlation and regression analysis, which are used extensively for analyzing interval data.

Descriptive statistics is the term given to the analysis of numerical data which helps to describe, depict,
or summarize data in a meaningful manner, and it helps in the calculation of the mean, median, and mode.
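A minimal sketch of these descriptive statistics in Python (the temperature readings are hypothetical interval-scale data):

```python
import statistics

# Hypothetical interval-scale data: daily noon temperatures in degrees Celsius.
temps = [21.0, 23.5, 22.0, 23.5, 25.0, 24.5, 23.5]

print("mean:  ", statistics.mean(temps))    # arithmetic average
print("median:", statistics.median(temps))  # middle value
print("mode:  ", statistics.mode(temps))    # most frequent value
```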

Interval Scale Examples

 There are situations where attitude scales are considered to be interval scales.

 Apart from the temperature scale, time is also a very common example of an
interval scale, as the values are already established, constant, and measurable.

 Calendar years and time also fall under this category of measurement scales.

 Likert scale, Net Promoter Score, Semantic Differential Scale, Bipolar Matrix
Table, etc. are the most-used interval scale examples.

The following questions fall under the Interval Scale category:

 What is your family income?

 What is the temperature in your city?

Level 04. Ratio Scale: 4th Level Measurement

Ratio Scale is defined as a variable measurement scale which not only produces the order of variables
but also makes the difference between variables known along with information on the value of true
zero. It is calculated by assuming that the variables have an option for zero, the difference between the
two variables is the same and there is a specific order between the options.

With the option of true zero, varied inferential, and descriptive analysis techniques can be applied to the
variables. The best examples of ratio scales are weight and height. In market research, a ratio scale is
used to calculate market share, annual sales, the price of an upcoming product, the number of
consumers, etc.

 The ratio scale provides the most detailed information, as researchers and
statisticians can calculate the central tendency using statistical techniques such as
mean, median, and mode, and methods such as the geometric mean, the coefficient
of variation, or the harmonic mean can also be used on this scale (see the sketch
after this list).

 The ratio scale accommodates the characteristics of the three other variable
measurement scales, i.e. labeling the variables, the significance of the order of
variables, and a calculable difference between variables (which are usually
equidistant).

 Because of the existence of a true zero value, the ratio scale doesn't have negative values.

 To decide when to use a ratio scale, the researcher must observe whether the variables have all
the characteristics of an interval scale along with the presence of the absolute zero value.

 Mean, mode and median can be calculated using the ratio scale.
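A minimal sketch of those central-tendency measures in Python (the weights are hypothetical ratio-scale data):

```python
import statistics

# Hypothetical ratio-scale data: body weights in kilograms (true zero exists).
weights = [55.0, 62.5, 70.0, 48.0, 81.5, 66.0]

mean = statistics.mean(weights)
print("mean:          ", mean)
print("median:        ", statistics.median(weights))
print("geometric mean:", statistics.geometric_mean(weights))
print("harmonic mean: ", statistics.harmonic_mean(weights))

# Coefficient of variation: standard deviation relative to the mean.
print("coeff. of variation:", round(statistics.stdev(weights) / mean, 3))
```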

Ratio Scale Examples

The following questions fall under the Ratio Scale category:

 What is your daughter’s current height?

o Less than 5 feet
o 5 feet 1 inch – 5 feet 5 inches
o 5 feet 6 inches – 6 feet
o More than 6 feet

 What is your weight in kilograms?

o Less than 50 kilograms
o 51 – 70 kilograms
o 71 – 90 kilograms
o 91 – 110 kilograms
o More than 110 kilograms

 Use of Different Question Types. To collect quantitative data, closed-ended
questions have to be used in a survey. They can be a mix of multiple question
types, including multiple-choice questions like semantic differential scale questions,
rating scale questions, etc., that can help collect data that can be analyzed and
made sense of.

 Survey Distribution and Survey Data Collection. Above, we have seen the process
of building a survey along with the survey design to collect quantitative data. Survey
distribution to collect data is the other important aspect of the survey process. There
are different ways of survey distribution.

Some of the most commonly used methods are:

 Email. Sending a survey via email is one of the most commonly used and most effective
methods of survey distribution.

 Buy Respondents. Another effective way to distribute a survey and collect
quantitative data is to use a purchased respondent sample. Since the respondents are
knowledgeable and open to participating in research studies, the response rates are
much higher.

 Embed Survey in a Website. Embedding a survey in a website yields a high
number of responses, as the respondent is already in close proximity to the brand
when the survey pops up.

 Social Distribution. Using social media to distribute the survey aids in collecting a
higher number of responses from people who are aware of the brand.

 QR Codes. QuestionPro QR codes store the URL for the survey. You
can print/publish this code in magazines, on signs, business cards, or on just about
any object/medium.

 SMS Survey. A quick and time-effective way of conducting a survey and collecting a
high number of responses is the SMS survey.

 QuestionPro App. The QuestionPro app allows you to quickly circulate surveys, and the
responses can be collected both online and offline.

 API Integration. You can use the API integration of the QuestionPro platform for
potential respondents to take your survey.

2. One-on-One Interviews.

This quantitative data collection method was also traditionally conducted face-to-face but
has shifted to telephonic and online platforms. Interviews offer a marketer the opportunity
to gather extensive data from the participants. Quantitative interviews are immensely
structured and play a key role in collecting information.

There are three major types of these interviews:

 Face-to-Face Interviews. An interviewer can prepare a list of important interview
questions in addition to the already-asked survey questions. This way, interviewees
provide exhaustive details about the topic under discussion. An interviewer can
manage to bond with the interviewee on a personal level, which helps him/her
collect more details about the topic, due to which the responses also improve.
Interviewers can also ask the interviewees for an explanation of unclear answers.

 Online/Telephonic Interviews. Telephone-based interviews are no longer a novelty,
but these quantitative interviews have also moved to online mediums such as
Skype or Zoom. Irrespective of the distance between the interviewer and the
interviewee and their corresponding time zones, communication becomes one click
away with online interviews. In the case of telephone interviews, the interview is
merely a phone call away.

 Computer-Assisted Personal Interview. This is a one-on-one interview technique
where the interviewer enters all the collected data directly into a laptop or any other
similar device. The processing time is reduced, and the interviewers don't have to
carry physical questionnaires; they merely enter the answers into the laptop.

All of the above quantitative data collection methods can be achieved by using surveys,
questionnaires and polls.

Name: _______________________________________________ Date: ________________
Course, Year and Section: ________________________________ Score:________________

TASK 1

DIRECTION: Read and analyze each statement carefully. Encircle the letter of the best answer from the
given choices.

1. The instructor of the BSIT 3rd year students records the hair color of each student. What level
of measurement is being used in the given scenario?

a. Ordinal c. Ratio

b. Nominal d. Interval

2. Posting a Google Form survey on Facebook is an example of what data collection
method?

a. Email distribution methods

b. QR Codes

c. Embed survey in a website

d. Social distribution

3. A guard on duty compiles a list of temperatures in degrees Celsius of each employee in OMSC
Lubang for the month of January. What level of measurement is being used in the given
scenario?

a. Nominal c. Ratio

b. Ordinal d. Interval

4. Which of the following is a data collection method that was traditionally conducted face-to-face?

a. One-on-one interview c. Embed survey in a website

b. Face-to-face interview d. API integration

5. Mr. Dela Cruz, an instructor of the BSIT 3rd year students in OMSC Lubang, records
the height of each student. What level of measurement is being used in the given
scenario?

a. Nominal c. Ratio

b. Ordinal d. Interval

6. Which of the following statements is an example of ordinal measurement?

a. Male and female choices indicating biological sex

b. A scale from 1 to 4 measuring life satisfaction

c. A list of six different religious affiliations

d. A whole number indicating a person's age in years

7. The instructor of class BSIT 3A records the letter grade for Quantitative Methods (incl.
Modelling and Simulation) for each student. What level of measurement is being used in
the given scenario?

a. Nominal c. Ratio

b. Ordinal d. Interval

8. Which of the following statements is correct for a variable with ratio measurement?

a. Do not have a true zero in their measurement scale

b. Have a true zero in their measurement scale

c. Have categories with meaningful order

d. Do not have categories with meaningful order

9. Ms. Cruz critiques the list of the top 10 most-viewed YouTube videos in 2020. What level of
measurement is being used in the given scenario?

a. Nominal

b. Ordinal

c. Ratio

d. Interval

10. Ms. Gomez classified the exam as Easy, Difficult or Impossible. What level of
measurement is being used in the given scenario?

a. Nominal c. Ratio

b. Ordinal d. Interval

TOPIC 1.2: PRIMARY AND SECONDARY DATA, QUALITATIVE AND
QUANTITATIVE DATA

In a time when data is becoming easily accessible to researchers all over the world, the practicality of
utilizing secondary data for research is becoming more prevalent, and so are questions about its
authenticity when compared with primary data.

Primary data and secondary data both have their advantages and disadvantages. Therefore, when
carrying out research, it is left for the researcher to weigh these factors and choose the better one.

It is therefore important for one to study the similarities and differences between these data types so as
to make proper decisions when choosing a better data type for research work.

What is Primary Data?

Primary data is the kind of data that is collected directly from the data source without going through any
existing sources. It is mostly collected specially for a research project and may be shared publicly to be
used for other research.

Primary data is often reliable, authentic, and objective in as much as it was collected with the purpose of
addressing a particular research problem. It is noteworthy that primary data is not commonly collected
because of the high cost of implementation.

A common example of primary data is the data collected by organizations during market research,
product research, and competitive analysis. This data is collected directly from its original source which in
most cases are the existing and potential customers. Most of the people who collect primary data are
government authorized agencies, investigators, research-based private institutions, etc.

What is Secondary Data?

Secondary data is data that has been collected in the past by someone else but made available for
others to use. It is usually primary data at first, but becomes secondary when used by a third party.

Secondary data are usually easily accessible to researchers and individuals because they are mostly
shared publicly. This, however, means that the data are usually general and not tailored specifically to
meet the researcher's needs as primary data does.

Some common sources of secondary data include trade publications, government statistics, journals,
etc. In most cases, these sources cannot be trusted as authentic.

Sources of Secondary Data

 Books
 Published Sources
 Unpublished Personal Sources
 Journals
 Newspapers
 Websites
 Blogs
 Diaries
 Government Records
 Podcasts

Qualitative Data and Quantitative Data

Data analysis is broad, exploratory, and downright complex. But when we take a step back and attempt
to simplify data analysis, we can quickly see it boils down to two things: qualitative and quantitative
data. These two types of data are quite different, yet, they make up all of the data that will ever be
analyzed.

Before diving into data analytics, it’s important to understand the key differences between qualitative
and quantitative data.

One type of data is objective, to-the-point, and conclusive. The other type of data is subjective,
interpretive, and exploratory. So, which is which?

What is Qualitative Data?

Qualitative data is non-statistical and is typically unstructured or semi-structured in
nature. This data isn't necessarily measured using the hard numbers used to develop
graphs and charts. Instead, it is categorized based on properties, attributes, labels, and
other identifiers.

Qualitative data can be used to ask the question "why." It is investigative and is often
open-ended until further research is conducted. Data generated from qualitative research
is used for theorizations, interpretations, developing hypotheses, and initial
understandings.

Qualitative data can be generated through:

 Texts and documents

 Audio and video recordings

 Images and symbols

 Interview transcripts and focus groups

 Observations and notes

Surprisingly enough, identification numbers like an SSN or driver’s license are also considered qualitative
data because they are categorical and unique to one person.

What is Quantitative Data?

Contrary to qualitative data, quantitative data is statistical and is typically structured in nature –
meaning it is more rigid and defined. This type of data is measured using numbers and values, which
makes it a more suitable candidate for data analysis.

Whereas qualitative data is open for exploration, quantitative data is much more concise and
close-ended. It can be used to ask the questions "how much" or "how many," followed by conclusive
information.

Quantitative data can be generated through:

 Tests

 Experiments

 Surveys

 Market reports

 Metrics

Quantitative data can actually be broken into further sub-categories. These categories are called
discrete and continuous data.

Types of Quantitative Data with Examples

 Counter. A count equated with entities. For example, the number of people who download a
particular application from the App Store.

 Measurement of physical objects. Calculating the measurement of any physical thing. For
example, the HR executive carefully measures the size of each cubicle assigned to newly
joined employees.

 Sensory calculation. A mechanism to naturally "sense" the measured parameters and create a
constant source of information. For example, a digital camera converts electromagnetic
information into a string of numerical data.

 Projection of data. Future projection of data can be done using algorithms and other
mathematical analysis tools. For example, a marketer will predict an increase in sales after
launching a new product, backed by thorough analysis.

 Quantification of qualitative entities. Assigning numbers to qualitative information. For example,
asking respondents of an online survey to share the likelihood of recommendation on a scale of
0–10.

Quantitative data can be counted, measured, and expressed using numbers. Qualitative data is
descriptive and conceptual. Qualitative data can be categorized based on traits and characteristics.

Name: _______________________________________________ Date: ________________
Course, Year and Section: ________________________________ Score:________________

TASK 2

DIRECTION: Read and analyze each statement carefully. Put a check mark in the appropriate column to
indicate whether the given data is quantitative or qualitative.

DATA                                              QUANTITATIVE DATA    QUALITATIVE DATA

1. He ran 50 kilometers.

2. It tastes sweet.

3. It is 50 degrees Fahrenheit.

4. My fingernail is 7 cm long.

5. The color of the laptop is black.

6. The tree is green.

7. One leaf is 5 cm long.

8. The meat is 25 kilograms.

9. He is a male.

10. The wind speed of the storm is 93 kph.

11. The bird is flying.

12. The flower is red.

13. The height of the table.

14. Your salary is 20,000.

15. My age is 23.

TOPIC 1.3: ASSESSMENT OF QUANTITATIVE DATA

Quantitative Assessment Methods

This topic will focus on guiding you through the process of planning for gathering and analyzing
quantitative data. Quantitative data is data that can be analyzed as numbers, as opposed to qualitative
data. In addition, this topic will briefly cover how to make decisions about how such data is gathered,
analyzed, and used to make decisions and arguments. Specific attention will be focused on how to build
the structures that make gathering such data easier.

Quantitative data helps us to look below the surface and see what is going on in a more definable way. It
also provides data that, for some, is more convincing.

You need to think through ahead of time "What story will I need to tell?" and "What data is needed to
tell the story and make it convincing?"

Examples of Quantitative assessment tools:

 Benchmarking

Benchmarking involves cross-comparing organizations or programs relative to specific aspects of best
practices. It compares performance results in terms of key performance indicators (formulas or ratios) in
areas such as production, marketing, sales, market share, and overall financials.

In quantitative tests, procedures on problems of known size are executed. Analysis of the results then
establishes equations which can be used to predict performance on a planned workload.

 Cost Benefits Analysis

Cost-benefit analysis is a technique developed by economists for judging the net social benefit or cost of
a project or policy. It involves assessing the cost-effectiveness of implementing or maintaining programs
or services.

A cost-benefit analysis is a process businesses use to analyze decisions. The business or analyst sums the
benefits of a situation or action and then subtracts the costs associated with taking that action.

For example, the analysis of a decision to construct a facility in a particular city could include
quantitative factors, such as the amount of tax breaks that can be obtained, as well as qualitative
factors, such as the rating of the schools in that city to which workers would send their children.

The Cost-Benefit Analysis Process

A cost-benefit analysis (CBA) should begin with compiling a comprehensive list of all the costs and
benefits associated with the project or decision.

The costs involved in a CBA might include the following:

 Direct costs, such as direct labor involved in manufacturing, inventory, raw
materials, and manufacturing expenses.

 Indirect costs, which might include electricity, overhead costs from management,
rent, and utilities.

 Intangible costs of a decision, such as the impact on customers, employees, or
delivery times.

 Opportunity costs, such as alternative investments, or buying a plant versus
building one.

 Costs of potential risks, such as regulatory risks, competition, and environmental
impacts.

Benefits might include the following:

 Revenue and sales increases from increased production or a new product.

 Intangible benefits, such as improved employee safety and morale, as well as
customer satisfaction due to enhanced product offerings or faster delivery.

 Competitive advantage or market share gained as a result of the decision.

An analyst or project manager should apply a monetary measurement to all of the items on the cost-
benefit list, taking special care not to underestimate costs or overestimate benefits.
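A minimal sketch of the underlying arithmetic, with hypothetical monetized cost and benefit items:

```python
# Hypothetical monetized items for a cost-benefit analysis (in dollars).
costs = {
    "direct labor": 120_000,
    "raw materials": 45_000,
    "overhead": 30_000,
    "risk allowance": 15_000,
}
benefits = {
    "added revenue": 210_000,
    "market share gain": 40_000,
}

total_costs = sum(costs.values())
total_benefits = sum(benefits.values())

# Net benefit and the benefit-cost ratio are the usual summary figures.
print("net benefit:       ", total_benefits - total_costs)            # 40000
print("benefit-cost ratio:", round(total_benefits / total_costs, 2))  # 1.19
```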

Limitations of Cost-Benefit Analysis

For projects that involve small to mid-level capital expenditures and are short to intermediate in terms
of time to completion, an in-depth cost-benefit analysis may be sufficient to make a well-informed,
rational decision.

 Existing Data Analysis

Using existing data from large databases to understand a phenomenon and how it affects a population,
for example. Types of existing data include institutional data (PULSE survey, Senior Survey, etc.),
national data from research projects, etc.

 Mixed-Methods

A research approach that uses two or more methods, with at least one being quantitative and one being
qualitative in nature. A mixed-method evaluation systematically integrates two or more evaluation
methods, potentially at every stage of the evaluation process, usually drawing on both quantitative and
qualitative data. Mixed-method evaluations may use multiple designs, for example incorporating both
randomized control trial experiments and case studies.

Uses of Mixed Methods Research Designs

 Validate findings using quantitative and qualitative data sources

Evaluators can use a convergent design to compare findings from qualitative and quantitative data
sources. It involves collecting both types of data at roughly the same time; assessing information using
parallel constructs for both types of data; separately analyzing both types of data; and comparing results
through procedures such as a side-by-side comparison in a discussion, transforming the qualitative data
set into quantitative scores, or jointly displaying both forms of data.

For example, the investigator can gather qualitative data to assess the personal experiences of patients
while also gathering data from survey instruments measuring the quality of care. The two types of data
can provide validation for each other and also create a solid foundation for drawing conclusions about
the intervention.

 Use qualitative data to explore quantitative findings

This explanatory sequential design typically involves two phases: (1) an initial quantitative instrument
phase, followed by (2) a qualitative data collection phase, in which the qualitative phase builds directly
on the results from the quantitative phase. In this way, the quantitative results are explained in more
detail through the qualitative data.

For example, findings from instrument data about costs can be explored further with qualitative focus
groups to better understand how the personal experiences of individuals match up to the instrument
results. This kind of study illustrates the use of mixed methods to explain qualitatively how the
quantitative mechanisms might work.

 Develop survey instruments

Yet another mixed methods study design could support the development of appropriate quantitative
instruments that provide accurate measures within a context. This exploratory sequential
design involves first collecting qualitative exploratory data, analyzing the information, and using the
findings to develop a psychometric instrument well adapted to the sample under study. This instrument
is then, in turn, administered to a sample of a population.

For example, a PCMH study could begin with a qualitative exploration through interviews with primary
care providers to assess what constructs should be measured to best understand improved quality of
care.

 Use qualitative data to augment a quantitative outcomes study

An outcomes study, for example a randomized, controlled trial, with qualitative data collection and
analysis added, is called an embedded design. Within this type of an outcomes study, the researcher
collects and analyzes both quantitative and qualitative data. The qualitative data can be incorporated
into the study at the outset (for example, to help design the intervention); during the intervention, and
after the intervention (for example, to help explain the results). In this way, the qualitative data
augment the outcomes study, which is a popular approach within implementation and dissemination
research.

 Involve community-based stakeholders

A community-based participatory approach is an example of a multiphase design. This advanced mixed
methods approach involves community participants in many quantitative and qualitative phases of
research to bring about change.

The multiple phases all address a common objective of assessing and refining the model. This design
would involve primary care providers and staff, patients, and other providers and individuals in the
community in the research process. Key stakeholders participate as co-researchers in a project,
providing input about their needs, ways to address them, and ways to implement changes.

Advantages of Mixed-Methods

 Compares quantitative and qualitative data. Mixed methods are especially
useful in understanding contradictions between quantitative results and
qualitative findings.

 Reflects participants' point of view. Mixed methods give a voice to study
participants and ensure that study findings are grounded in participants'
experiences.

 Fosters scholarly interaction. Such studies add breadth to multidisciplinary
team research by encouraging the interaction of quantitative, qualitative,
and mixed methods scholars.

 Provides methodological flexibility. Mixed methods have great flexibility
and are adaptable to many study designs, such as observational studies and
randomized trials, to elucidate more information than can be obtained in
only quantitative research.

 Collects rich, comprehensive data. Mixed methods also mirror the way
individuals naturally collect information: by integrating quantitative and
qualitative data. For example, sports stories frequently integrate
quantitative data (scores or number of errors) with qualitative data
(descriptions and images of highlights) to provide a more complete story
than either method would alone.

 Rubric

A scoring guide for evaluating performance, ability, or effectiveness in a specific domain, made up of
definitions of quality work, well-defined criteria for measuring quality work, and a scoring method
(using numbers) to indicate the level of performance.

 Satisfaction

Satisfaction research helps the company to determine their customer’s satisfaction towards their
products and services. In order for the research to be trustworthy and practical, it has to have validity,
reliability, objectivity and has to be economically profitable.

There are many risks in conducting customer satisfaction research.

 Having a wrong target group: the research does not cover the whole sample, or
there is not a valid register, and it is focused on certain types of respondents.

 Imperfect questionnaires, negligence of the interviewers, and errors in
interpretation. As a result, the research gives false results and lacks validity and
reliability.

TOPIC 1.4: DATA PROCESSING

Data Processing

Data processing occurs when data is collected and translated into usable information. Usually
performed by a data scientist or team of data scientists, data processing must be done correctly
so as not to negatively affect the end product, or data output.

Data processing starts with data in its raw form and converts it into a more readable format (graphs,
documents, etc.), giving it the form and context necessary to be interpreted by computers and utilized
by employees throughout an organization. The following activities are involved in processing data.

1. Questionnaire Checking

A questionnaire is a research instrument consisting of a series of questions for the purpose of gathering
information from respondents. Questionnaires can be thought of as a kind of written interview. They
can be carried out face to face, by telephone, computer or post.

The initial step in questionnaire checking involves reviewing all questionnaires for completeness and for
interviewing or completion quality.

Questionnaire checking involves eliminating unacceptable questionnaires. These questionnaires may be
incomplete, may not follow instructions, may show little variance, may have missing pages, may be past
the cutoff date, or may come from respondents who are not qualified.

2. Editing

Editing is the review of the questionnaires with the objective of increasing accuracy and precision. It
consists of screening questionnaires to identify illegible, incomplete, inconsistent, or ambiguous
responses, and it looks to correct such answers.

Treatment of Unsatisfactory Results:

 Returning to the Field. The questionnaires with unsatisfactory responses may be
returned to the field, where the interviewers re-contact the respondents.

 Assigning Missing Values. If returning the questionnaires to the field is not feasible,
the editor may assign missing values to unsatisfactory responses.

 Discarding Unsatisfactory Respondents. In this approach, the respondents with
unsatisfactory responses are simply discarded.

3. Coding

Coding typically assigns alphabetic or numeric codes to answers that do not already have them so that
statistical techniques can be applied. Coding is the process of assigning a code, usually a number, to
each possible response to each question. The code includes an indication of the column position (field)
and the data record it will occupy.
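A minimal sketch of coding in practice; the codebook below is hypothetical:

```python
# Hypothetical codebook: each possible response is assigned a numeric code.
CODEBOOK = {
    "yes": 1,
    "no": 2,
    "don't know": 8,
    "": 9,  # no answer / missing
}

answers = ["yes", "no", "", "don't know", "yes"]
coded = [CODEBOOK[a] for a in answers]
print(coded)  # [1, 2, 9, 8, 1]
```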

Guidelines for coding unstructured questions:

 Category codes should be mutually exclusive and collectively exhaustive.

 Only a few (10% or less) of the responses should fall into the "other" category.

 Category codes should be assigned for critical issues even if no one has mentioned
them.

 Data should be coded to retain as much detail as possible.

4. Data Classification

The method of arranging data into homogeneous classes according to some common features
present in the data is called classification.

Raw data cannot be easily understood, and it is not fit for further analysis and interpretation. This
arrangement of data helps users in comparison and analysis.

A planned data analysis system makes fundamental data easy to find and recover. This can be of
particular interest for legal discovery, risk management, and compliance. Written methods and sets
of guidelines for data classification should determine what levels and measures the company will
use to organize data, and define the roles of employees within the business regarding input
stewardship. Once a data-classification scheme has been designed, security standards that stipulate
proper access practices for each division, and storage criteria that determine the data's lifecycle
demands, should be discussed.

Quantitative classification is a type of classification made on the basis of measurable
characteristics like height, weight, age, income, marks of students, etc.

A variable refers to a quantity or attribute whose value varies from one investigation to another. The
term is derived from the word "vary", which means to differ or change. A variable is a characteristic
which varies, differs, or changes from person to person, time to time, place to place, etc.

Kinds of Variable

 Discrete Variable

Variables which are capable of taking only exact values, and not any fractional value, are termed
discrete variables.

For example, the number of workers or the number of students in a class is a discrete variable, as they
cannot be in fractions. Similarly, the number of children in a family can be 1, 2, and so on, but cannot be
1.5 or 2.75.

 Continuous Variable

Those variables which can take all the possible values (integral as well as fractional) in a given
specified range are termed as continuous variables.

For example, Temperature, Height, Weight, Marks etc.

Objectives of Data Classification

 To consolidate the volume of data in such a way that similarities and differences
can be quickly understood. Figures can consequently be ordered into a few sections
holding common traits.

 To aid comparison.

 To point out the important characteristics of the data at a glance.

 To give importance to the prominent data collected while separating the optional
elements.

 To allow a statistical treatment of the material gathered.

Class       Boys   Girls

Grade I      82     34
Grade II     74     43
Grade III    92     27
Grade IV     87     30
Grade V      90     25
Grade VI     75     22

Example of classification in a table: gender-wise and class-wise information about students in a school.

5. Tabulation

Tabulation is a systematic & logical presentation of numeric data in rows and columns to facilitate
comparison and statistical analysis. It facilitates comparison by bringing related information close to
each other and helps in further statistical analysis and interpretation.

In other words, the method of placing organized data into a tabular form is called tabulation. It
may be complex, double, or simple depending upon the nature of categorization.

5 major objectives of Tabulation

1. To simplify complex data. It reduces bulk information, i.e. raw data, into a
simplified and meaningful form so that it can be easily understood by a common
reader in less time.

2. To bring out essential features of the data. It brings out the chief/main
characteristics of data, and it presents facts clearly and precisely without textual
explanation.

3. To facilitate comparison. Presentation of data in rows and columns is helpful for
simultaneous detailed comparison on the basis of several parameters.

4. To facilitate statistical analysis. Tables serve as the best source of organized data
for further statistical analysis. The task of computing average, dispersion,
correlation, etc. becomes easier if data is presented in the form of a table.

5. Saving of Space. A table presents facts in a better way than the textual form.

Types of Tabulation

 Simple Tabulation or One-way Tabulation. One-way tables are those that present
data for a single categorical variable. Categorical variables refer to variables
described by labels or names, such as hat color, shoe style, or dog breed.

For example, the one-way table below shows the hat color choices of 10 men surveyed.

Hat Color Choices   Red   Blue   Yellow
                     5     3      2

 Double Tabulation or Two-way Tabulation. Anyone familiar with crosstab software
is already familiar with two-way tables. Also known as contingency tables or
cross-tabulations, two-way tables are ideal for analyzing relationships between
categorical variables. Like one-way tables, crosstab tables can double as frequency
counts or relative frequencies.

For example, the two-way table below shows data on the preferred leisure activity of 50 adults,
with preferences broken down by gender.

Leisure Activity   Dance   Sports   TV   Total
Men                  2       10      8    20
Women               16        6      8    30
Total               18       16     16    50
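A minimal sketch of building such a cross-tabulation with pandas; the six survey rows below are illustrative, not the actual 50 responses:

```python
import pandas as pd

# Illustrative raw survey rows: one record per respondent.
df = pd.DataFrame({
    "gender":   ["Men", "Women", "Women", "Men", "Women", "Men"],
    "activity": ["Sports", "Dance", "TV", "Sports", "Dance", "TV"],
})

# Two-way (contingency) table with row and column totals.
table = pd.crosstab(df["gender"], df["activity"],
                    margins=True, margins_name="Total")
print(table)
```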

 Complex Tabulation. When the data are tabulated according to many characteristics, it
is said to be a complex tabulation.
6. Graphical Representation

Graphical representation refers to the use of intuitive charts to clearly visualize and simplify data sets.
Data is ingested into graphical representation software and then represented by a variety of
symbols, such as lines on a line chart, bars on a bar chart, or slices on a pie chart, from which users can
gain greater insight than by numerical analysis alone.

Representational graphics can quickly illustrate general behavior and highlight phenomena, anomalies,
and relationships between data points that may otherwise be overlooked, and may contribute to
predictions and better, data-driven decisions. The types of representational graphics used will depend
on the type of data being explored.

Types of Graphical Representation

Data charts are available in a wide variety of maps, diagrams, and graphs that typically include textual
titles and legends to denote the purpose, measurement units, and variables of the chart. Choosing the
most appropriate chart depends on a variety of factors: the nature of the data, the purpose of the
chart, and whether a graphical representation of qualitative data or a graphical representation of
quantitative data is being depicted.

1. Bar Graph

A bar graph contains a vertical axis and a horizontal axis and displays data as rectangular bars with
lengths proportional to the values that they represent; a useful visual aid for marketing purposes.

2. Choropleth

A choropleth is a thematic map in which an aggregate summary of a geographic characteristic within an
area is represented by patterns of shading proportionate to a statistical variable. Choropleth comes from
the Greek choros (area) and pleth (multitude). Immerse colors the map regions based on the measure
you choose.

Use a choropleth to compare aggregate values across regions. Choropleths are useful for spotting
outliers, but are not intended to provide detail on the values within a region.

[Figure: Voter turnout at the latest elections in Europe. Countries with a higher turnout (green) and a
lower turnout (red) than the EU28 average (68%) at their last national elections. In comparison, the
turnout at the 2016 US election was 61.4%.]

3. Heat map

A colored, two-dimensional matrix of cells in which each cell represents a grouping of data and each
cell’s color indicates its relative value. Cell color indicates the relative value of the cells, from one end of
the spectrum to the other.

Heat maps are ideal for spotting outliers, which show up vividly on the color spectrum. They work best
when the number of groupings is not too large, since large numbers of groupings cause the heat map to
exceed the viewport, making comparison harder.

A geographical heat map or geo heat map represents areas of high and low density of a certain
parameter (for instance, population density, network density, etc.) by displaying data points on a real
map in a visually interactive manner. Industries like real estate, travel, food, and so on can greatly
benefit from the usage of geographical heat maps.

4. Histogram

A histogram is a frequency distribution and graphical representation that uses adjacent vertical bars
erected over discrete intervals to represent the data frequency within a given interval; a useful visual aid
for meteorology and environmental purposes.

A histogram is a graphical display of data using bars of different heights. In a histogram, each bar groups
numbers into ranges. Taller bars show that more data falls in that range. A histogram displays the shape
and spread of continuous sample data.

Example of a Histogram

Jeff is the branch manager at a local bank. Recently, Jeff has been receiving customer feedback saying
that the wait times for a client to be served by a customer service representative are too long. Jeff
decides to observe and write down the time spent by each customer on waiting. Here are his findings
from observing and writing down the wait times of 20 customers:

[Figure: histogram of the 20 observed wait times.]
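A minimal sketch of drawing such a histogram with matplotlib; the 20 wait times below are hypothetical stand-ins, since Jeff's actual figures are not reproduced here:

```python
import matplotlib.pyplot as plt

# Hypothetical wait times (in minutes) for 20 observed customers.
wait_times = [2, 3, 3, 4, 5, 5, 6, 6, 6, 7,
              7, 8, 8, 9, 10, 11, 12, 14, 15, 18]

# Group the times into 5-minute ranges; bar height = customers per range.
plt.hist(wait_times, bins=[0, 5, 10, 15, 20], edgecolor="black")
plt.xlabel("Wait time (minutes)")
plt.ylabel("Number of customers")
plt.title("Customer wait times")
plt.show()
```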

5. Line Graph

A line graph displays continuous data; it is ideal for predicting future events over time and is a useful
visual aid for marketing purposes. A line graph is usually used to show the change of information over a
period of time. This means that the horizontal axis is usually a time scale, for example minutes, hours,
days, months, or years.

Example: The table shows the daily earnings of a store for five days.

Day        Mon   Tues   Wed   Thurs   Fri
Earnings   300   450    200   400     650

Example: The table shows the daily sales in RM of different categories of items for five days.

Day      Mon   Tues   Wed   Thurs   Fri
Drinks   300   450    150   400     650
Food     400   500    350   300     500
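A minimal sketch of plotting the first table (the daily earnings) with matplotlib:

```python
import matplotlib.pyplot as plt

# Daily earnings of the store, taken from the table above.
days = ["Mon", "Tues", "Wed", "Thurs", "Fri"]
earnings = [300, 450, 200, 400, 650]

plt.plot(days, earnings, marker="o")  # time scale on the horizontal axis
plt.xlabel("Day")
plt.ylabel("Earnings")
plt.title("Daily earnings for five days")
plt.show()
```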

6. Pie Chart

A pie chart is a type of graph that displays data in a circular graph. The pieces of the graph are
proportional to the fraction of the whole in each category. In other words, each slice of the pie is
relative to the size of that category in the group as a whole. The entire "pie" represents 100 percent of a
whole, while the pie "slices" represent portions of the whole. It shows percentage values as slices of a
pie; a useful visual aid for marketing purposes.

The following chart shows water usage (image courtesy of the EPA). You can see that toilet water usage
is greater than shower water usage because its piece of the "pie" is greater.

[Figure: pie chart of household water usage.]
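A minimal sketch with matplotlib; the water-usage shares below are hypothetical, since the EPA image is not reproduced here:

```python
import matplotlib.pyplot as plt

# Hypothetical household water-usage shares (percent of the whole).
labels = ["Toilet", "Shower", "Faucet", "Clothes washer", "Other"]
shares = [24, 20, 19, 17, 20]

# Each slice is drawn proportional to its share of the 100% total.
plt.pie(shares, labels=labels, autopct="%1.0f%%")
plt.title("Household water usage (illustrative)")
plt.show()
```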

7. Scatter plot

The scatter plot displays unaggregated, row-level data as points, plotting the points along an x and y
axis. Each axis represents a quantitative measure. You can use additional measures to change the size or
color of the points, making the scatter plot capable of representing up to four measures for each group
(x, y, size, and color).

A diagram that shows the relationship between two sets of data, where each dot represents individual
pieces of data and each axis represents a quantitative measure. Scatter plots resemble Bubble charts,
but are used to view unaggregated data, while Bubble charts aggregate data.

Use a scatter plot chart to study the correlation between two measures, or to spot outliers or clusters in
the distribution of data. You can use a Scatter Plot to visualize any dataset, but they are most useful for
exploring large amounts of data.

For example, the local ice cream shop keeps track of how much ice cream it sells versus the noon
temperature on that day. Here are the figures for the last 12 days:

[Figure: scatter plot of ice cream sales ($0–$700) against noon temperature (0–30 °C).]

Temperature (°C)   Ice Cream Sales
14.2°              $215
16.4°              $325
11.9°              $185
15.2°              $332
18.5°              $406
22.1°              $522
19.4°              $412
25.1°              $614
23.4°              $544
18.1°              $421
22.6°              $445
17.2°              $408
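A minimal sketch of the same plot with matplotlib, using the figures from the table:

```python
import matplotlib.pyplot as plt

# Noon temperature (deg C) and ice cream sales ($) for the last 12 days.
temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1,
         19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522,
         412, 614, 544, 421, 445, 408]

plt.scatter(temps, sales)  # one point per day; a rising trend suggests correlation
plt.xlabel("Noon temperature (°C)")
plt.ylabel("Ice cream sales ($)")
plt.title("Ice Cream Sales vs. Temperature")
plt.show()
```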

8. Stacked Bar Graph

A stacked bar graph (or stacked bar chart) is a chart that uses bars to show comparisons between
categories of data, but with the ability to break down and compare parts of a whole. Each bar in the chart
represents a whole, and segments in the bar represent different parts or categories of that whole.

Stacked bars do a good job of featuring the total and also providing a hint as to how the total for each
category value is divided into parts. A stacked bar graph can have one category axis and up to two
numerical axes. The category axis describes the types of categories being compared, and the numerical
axes represent the values of the data.

Stacked Bar graph can be used to represent: Ranking, Nominal Comparisons, Part-to-whole, Deviation,
or Distribution.

[Figures: stacked bar graph examples of a nominal comparison and a ranking.]

9. Timeline Chart

A timeline chart is a long bar, labelled with dates paralleling it, that displays a list of events in
chronological order; a useful visual aid for history-charting purposes. A timeline chart is an effective way
to visualize a process using chronological order. Since details are displayed graphically, important points
in time can be easily seen and understood. Often used for managing a project's schedule, timeline charts
function as a sort of calendar of events within a specific period of time.

Types of Timeline Charts

 Standard Timeline Charts

Standard timeline charts illustrate events accompanied by explanatory text or images. They’re used for a
variety of purposes, one of which is to narrate historical events.

Example. This standard timeline graph shows the launch year of each of the popular social media
networks. Image source: Venngage.com

 Gantt Chart

A Gantt chart uses bars of varying sizes spread across a timeline to represent a task’s start date,
duration, and finish date.

10. Tree Diagram

A tree diagram is a new management planning tool that depicts the hierarchy of tasks and subtasks
needed to complete an objective. The tree diagram starts with one item that branches into two or
more items, each of which branches into two or more, and so on. The finished diagram bears a
resemblance to a tree, with a trunk and multiple branches.

It is used to break down broad categories into finer and finer levels of detail. Developing the tree
diagram helps you move your thinking step by step from generalities to specifics.

Examples of Tree Diagram

11. Venn Diagram

A Venn diagram is an illustration that uses circles to show the relationships among things or finite
groups of things. Circles that overlap have a commonality while circles that do not overlap do not share
those traits.

Venn diagrams help to visually represent the similarities and differences between two concepts.

Example 1. Below, we can see that there are orange fruits (circle B), such as persimmons and tangerines,
while apples and cherries (circle A) come in red colors. Peppers and tomatoes come in both red and
orange colors, as represented by the overlapping area of the two circles.

Example 2. Below, we see that Car A is a sedan that’s powered by gasoline and gets 20 miles per gallon,
while Car B is a hybrid, gets 40 miles-per-gallon for mileage, and is a hatchback.

7. Data Cleaning

Data cleaning is the process of preparing data for analysis by removing or modifying data that is
incorrect, incomplete, irrelevant, duplicated, or improperly formatted.

This data is usually not necessary or helpful when it comes to analyzing data because it may hinder the
process or provide inaccurate results. There are several methods for cleaning data depending on how it
is stored along with the answers being sought.

Data cleaning is not simply about erasing information to make space for new data, but rather finding a
way to maximize a data set’s accuracy without necessarily deleting information.

For one, data cleaning includes more actions than removing data, such as fixing spelling and syntax
errors, standardizing data sets, correcting mistakes such as empty fields and missing codes, and
identifying duplicate data points. Data cleaning is considered a foundational element of data science
basics, as it plays an important role in the analytical process and in uncovering reliable answers.

Most importantly, the goal of data cleaning is to create data sets that are standardized and uniform to
allow business intelligence and data analytics tools to easily access and find the right data for each
query.

What are the benefits of Data Cleaning?

There are many benefits to having clean data:

1. It removes major errors and inconsistencies that are inevitable when multiple sources of data
are being pulled into one dataset.

2. Using tools to clean up data will make everyone on your team more efficient as you’ll be able to
quickly get what you need from the data available to you.

3. Fewer errors means happier customers and fewer frustrated employees.

4. It allows you to map different data functions, and better understand what your data is intended
to do, and learn where it is coming from.

Data Cleaning in Six Steps

Here are some best practices when it comes to creating a data cleaning process:

1. Monitor errors

Keep a record of trends where most of your errors are coming from. This will make it a lot easier to
identify and fix incorrect or corrupt data. Records are especially important if you are integrating other
solutions with your fleet management software, so that your errors don’t clog up the work of other
departments.

2. Standardize your process

Standardize the point of entry to help reduce the risk of duplication.

3. Validate data accuracy

Once you have cleaned your existing database, validate the accuracy of your data. Research and invest
in data tools that allow you to clean your data in real-time. Some tools even use AI or machine
learning to better test for accuracy.

4. Scrub for duplicate data

Identify duplicates to help save time when analyzing data. Repeated data can be avoided by researching
and investing in different data cleaning tools that can analyze raw data in bulk and automate the process
for you.
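A minimal sketch of duplicate scrubbing (plus a couple of the other fixes above) with pandas; the records and column names are hypothetical:

```python
import pandas as pd

# Hypothetical customer records with a duplicate row and an empty field.
df = pd.DataFrame({
    "name":  ["Ana", "Ben ", "Ana", None],
    "email": ["ana@x.com", "ben@x.com", "ana@x.com", "cara@x.com"],
})

df["name"] = df["name"].str.strip()        # standardize: trim stray spaces
df["name"] = df["name"].fillna("unknown")  # correct empty fields
df = df.drop_duplicates()                  # scrub exact duplicate rows

print(df)
```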

5. Analyze your data

After your data has been standardized, validated and scrubbed for duplicates, use third-party sources to
append it. Reliable third-party sources can capture information directly from first-party sites, then clean
and compile the data to provide more complete information for business intelligence and analytics.

6. Communicate with your team

Share the new standardized cleaning process with your team to promote adoption of the new protocol.
Now that you’ve scrubbed down your data, it’s important to keep it clean. Keeping your team in the
loop will help you develop and strengthen customer segmentation and send more targeted information
to customers and prospects.

Finally, monitor and review data regularly to catch inconsistencies.

8. Data Adjusting

Your strategic and tactical quantitative research work (designing, programming, and fielding an online
questionnaire) results in raw data files containing all the respondents' answers to your survey. Typically,
some form of data preparation must be completed before your analysis begins. Neglecting to carefully
prepare your raw data may jeopardize the statistical results and bias your interpretations and
subsequent findings.

Sometimes, your data must be statistically adjusted to become representative of your target population.
While this is not always necessary, it can enhance the quality of your data.

There are three techniques at your disposal: weighting, variable re-specification, and scale
transformations.

1. Weighting Data

Weighting is a statistical adjustment made by assigning a weight to each respondent in the database to
reflect that respondent’s importance relative to the other respondents. The purpose of weighting is to
increase or decrease the number of respondents in the sample that have certain characteristics so that
the sample data is more representative of the target population.

There are three primary statistical methods for weighting survey data: raking, matching, and
propensity weighting.

 Raking

For public opinion surveys, the most prevalent method for weighting is iterative proportional fitting,
more commonly referred to as raking. With raking, a researcher chooses a set of variables where the
population distribution is known, and the procedure iteratively adjusts the weight for each case until the
sample distribution aligns with the population for those variables.
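A minimal sketch of the underlying idea for a single variable (one-variable post-stratification, the simplest special case of raking; the population and sample shares are hypothetical):

```python
# Hypothetical known population shares vs. an unbalanced sample.
population_share = {"men": 0.49, "women": 0.51}
sample_counts = {"men": 70, "women": 30}  # the sample over-represents men

n = sum(sample_counts.values())

# Each respondent's weight = population share / sample share for their group.
weights = {
    group: population_share[group] / (count / n)
    for group, count in sample_counts.items()
}
print(weights)  # men get a weight < 1, women a weight > 1
# Full raking repeats this kind of adjustment iteratively across several variables.
```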

 Matching

Matching is another technique that has been proposed as a means of adjusting online opt-in samples. It
involves starting with a sample of cases (i.e., survey interviews) that is representative of the population
and contains all of the variables to be used in the adjustment. This "target" sample serves as a template
for what a survey sample would look like if it were randomly selected from the population.

 Propensity weighting

A key concept in probability-based sampling is that if survey respondents have different probabilities of
selection, weighting each case by the inverse of its probability of selection removes any bias that might
result from having different kinds of people represented in the wrong proportion.

2. Variable Respecification

Variable re-specification involves the transformation of data to create new variables or modify existing
variables. It is the process of modifying existing variables in order to create new variables that better
answer the research questions.

For example, you ask respondents about purchase intent on a 7-point scale, so you have 7 different
response categories in your survey that you collapse into 3 or 4 total categories in the dataset (e.g.,
collapsing into "most likely to buy" for those who select 7, 6, or 5 on the scale; "neutral" for those who
select 4; and "least likely to buy" for those who select 3, 2, or 1).
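A minimal sketch of that collapse in Python; the 7-point responses are hypothetical:

```python
# Hypothetical 7-point purchase-intent responses.
responses = [7, 4, 2, 6, 5, 1, 4, 3]

def collapse(score: int) -> str:
    """Re-specify a 7-point scale into three broader categories."""
    if score >= 5:
        return "most likely to buy"
    if score == 4:
        return "neutral"
    return "least likely to buy"

print([collapse(s) for s in responses])
```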

Alternatively, you could create new variables that are the combination of several other variables. You
can also create new variables by taking a ratio among two existing variables.

The use of dummy variables is another type of respecification technique that uses variables that take
only two values, typically 0 or 1, to respecify categorical values. Dummy variables, also called binary,
dichotomous, instrumental, or qualitative variables, are helpful when the category coding is not
meaningful for statistical analysis. Instead, you can represent the categories with dummy variables.

For example, if you have heavy, light, and non-users coded as 3, 2, and 1 respectively, you can represent
these with the dummy variables X3, X2, and X1. Heavy users (X3) would = 1 in the data sheet, and the
others would = 0. Light users (X2) would = 1 in the data sheet, and all others = 0. And non-users (X1)
would = 1, with all others = 0.

Product Usage Category   Original Variable Code   X1   X2   X3
Heavy Users              3                        0    0    1
Light Users              2                        0    1    0
Non-users                1                        1    0    0
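A minimal sketch of generating equivalent dummy variables with pandas; the usage labels mirror the table above:

```python
import pandas as pd

# Usage categories for a few hypothetical respondents.
usage = pd.Series(["Heavy Users", "Light Users", "Non-users", "Heavy Users"])

# One 0/1 column per category, analogous to the X1-X3 dummy columns above.
dummies = pd.get_dummies(usage).astype(int)
print(dummies)
```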

3. Scale Transformation

Scale transformation involves a manipulation of scale values to ensure comparability with other scales
or otherwise make the data suitable for analysis. Frequently different scales are employed for measuring
different variables.

For example, image variables may be measured on a 7-point semantic differential scale, attitude
variables on a continuous rating scale, and lifestyle variables on a 5-point Likert scale.

Therefore, it would not be meaningful to make comparisons across the measurement scales for any
respondent. To compare attitudinal scores with lifestyle or image scores, it would be necessary to
transform the various scales. Even if the same scale is employed for all the variables, different
respondents may use the scale differently. For example, some respondents consistently use the upper
end of a rating scale whereas others consistently use the lower end. These differences can be corrected
by appropriately transforming the data.
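One common correction is standardization (z-scores), which puts scores from different scales on a comparable footing; a minimal sketch with hypothetical ratings:

```python
import statistics

def standardize(scores):
    """Transform raw scores into z-scores: (x - mean) / standard deviation."""
    m = statistics.mean(scores)
    s = statistics.stdev(scores)
    return [(x - m) / s for x in scores]

# Hypothetical ratings of the same respondents on two different scales.
likert_5 = [4, 5, 3, 4, 2]    # 5-point Likert scale
semantic_7 = [6, 7, 4, 5, 3]  # 7-point semantic differential scale

# After standardization, the two sets of scores are directly comparable.
print(standardize(likert_5))
print(standardize(semantic_7))
```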
