Lesson: Introduction To Quantitative Methods
TOPICS
1.1. Data Collection
1.2. Primary and Secondary Data, Qualitative and Quantitative Data
1.3. Assessment of Quantitative Data
1.4. Data Processing
LEARNING OUTCOMES
At the end of the lesson, the students should be able to:
1. identify the different methods of collecting data and their levels of
measurement
2. differentiate the use of quantitative and qualitative data
3. apply the different processes in assessing and presenting collected
data
Quantitative data consists of quantities corresponding to various parameters. For example, “How much
did that laptop cost?” is a question which will collect quantitative data. There are values associated with
most measuring parameters, such as pounds or kilograms for weight, dollars for cost, etc.
Quantitative data makes measuring various parameters controllable due to the ease of the
mathematical derivations it comes with.
Quantitative data is usually collected for statistical analysis using surveys, polls, or questionnaires sent
across to a specific section of a population. The retrieved results can be generalized across the population.
COLLECTION METHOD
As quantitative data is in the form of numbers, mathematical and statistical analysis of these numbers
can lead to establishing some conclusive results.
1. Surveys.
A critical factor about surveys is that the responses collected should be such that they can be
generalized to the entire population without significant discrepancies.
Fundamental Levels of Measurement. There are four measurement scales which are
fundamental to creating multiple-choice questions in a survey for collecting quantitative
data. They are the nominal, ordinal, interval, and ratio measurement scales; without these
fundamentals, no multiple-choice question can be created.
LEVELS OF MEASUREMENT
The Nominal Scale, also called the categorical variable scale, is defined as a scale used for labelling
variables into distinct classifications; it doesn’t involve a quantitative value or order.
This scale is the simplest of the four variable measurement scales. Calculations done on these variables
will be futile as there is no numerical value of the options. There are cases where this scale is used for
the purpose of classification – the numbers associated with variables of this scale are only tags for
categorization or division.
1 – Suburbs
2 – City
3 – Town
The other alternative to collect nominal data is to include a multiple-choice question
in which the answers will be labeled, as in the examples below.
Examples of nominal variables include gender, political preference, and place of residence:

What is your Gender?
Male
Female

What is your Political preference?
1 – Independent
2 – Democrat
3 – Republican

Where do you live?
1 – Suburbs
2 – City
3 – Town
Ordinal scale is defined as a variable measurement scale used to simply depict the order of variables and
not the difference between each of the variables. These scales are generally used to depict non-
mathematical ideas such as frequency, satisfaction, happiness, a degree of pain, etc. It is quite
straightforward to remember the implementation of this scale as ‘Ordinal’ sounds similar to ‘Order’,
which is exactly the purpose of this scale.
The Ordinal Scale maintains descriptive qualities along with an intrinsic order, but is void of an origin of
scale; thus, the distance between variables can’t be calculated. Descriptive qualities indicate
tagging properties similar to the nominal scale, in addition to which the ordinal scale also has a relative
position of variables. The origin of this scale is absent, due to which there is no fixed start or “true zero”.
Status at workplace, tournament team rankings, order of product quality, and order of agreement or
satisfaction are some of the most common examples of the ordinal Scale. These scales are generally used
in market research to gather and evaluate relative feedback about product satisfaction, changing
perceptions with product upgrades, etc.
Very Unsatisfied – 1
Unsatisfied – 2
Neutral – 3
Satisfied – 4
Very Satisfied – 5
1. Here, the order of variables is of prime importance and so is the labeling. Very
unsatisfied will always be worse than unsatisfied and satisfied will be worse than very
satisfied.
2. This is where ordinal scale is a step above nominal scale – the order is relevant to the
results and so is their naming.
3. Analyzing results based on the order along with the name becomes a convenient
process for the researcher.
4. If they intend to obtain more information than what they would collect using a nominal
scale, they can use the ordinal scale.
This scale not only assigns values to the variables but also measures the rank or order of the variables,
such as:
Grades
Satisfaction
Happiness
Ordinal scale data can be presented in tabular or graphical formats for a researcher to conduct a
convenient analysis of collected data. Methods such as the Mann-Whitney U test and Kruskal–Wallis H
test can also be used to analyze ordinal data. These methods are generally implemented to compare
two or more ordinal groups.
In the Mann-Whitney U test, researchers can conclude which variable of one group is bigger or smaller
than another variable of a randomly selected group. While in the Kruskal–Wallis H test, researchers can
analyze whether two or more ordinal groups have the same median or not.
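For illustration, here is a minimal sketch in Python (assuming SciPy is available; the satisfaction ratings below are made-up values, not data from this lesson) of how the two tests can be run on ordinal groups:

```python
# Comparing ordinal (1-5 satisfaction) ratings from two groups.
# Illustrative sketch with made-up data; requires scipy.
from scipy import stats

group_a = [1, 2, 2, 3, 3, 4, 4, 5]   # e.g., ratings before a product upgrade
group_b = [2, 3, 3, 4, 4, 4, 5, 5]   # e.g., ratings after the upgrade

# Mann-Whitney U: do values in one group tend to be larger than in the other?
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat}, p = {u_p:.3f}")

# Kruskal-Wallis H: do two or more groups share the same median?
h_stat, h_p = stats.kruskal(group_a, group_b)
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {h_p:.3f}")
```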
(Figure: example of ordinal scale data presented in graphical format.)
Interval scale is defined as a numerical scale where the order of the variables is known as well as the
difference between these variables. Variables that have familiar, constant, and computable differences
are classified using the Interval scale. It is easy to remember the primary role of this scale too, ‘Interval’
indicates ‘distance between two entities’, which is what Interval scale helps in achieving.
Interval scale contains all the properties of the ordinal scale, in addition to which, it offers a calculation
of the difference between variables. The main characteristic of this scale is the equidistant difference
between objects.
Unlike with the previous two scales, the mean and median values of interval scale data can be evaluated.
Even though interval scales are useful, they have no “true zero” value, which is why the next
scale comes into the picture.
All the techniques applicable to nominal and ordinal data analysis are applicable to interval data as well.
Apart from those techniques, there are a few analysis methods, such as descriptive statistics and
correlation and regression analysis, which are used extensively for analyzing interval data.
Descriptive statistics is the term given to the analysis of numerical data which helps to describe, depict,
or summarize data in a meaningful manner and it helps in calculation of mean, median, and mode.
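As a quick illustration, the mean, median, and mode of interval data can be computed with Python's standard library; the temperature readings below are made-up values:

```python
# Descriptive statistics for interval data - a minimal sketch using the
# standard library; temperatures (deg C) are made-up illustrative values.
import statistics

temperatures = [36.5, 36.8, 37.0, 36.8, 36.6, 37.2, 36.8]

print("mean:  ", statistics.mean(temperatures))    # arithmetic average
print("median:", statistics.median(temperatures))  # middle value
print("mode:  ", statistics.mode(temperatures))    # most frequent value
```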
Apart from the temperature scale, time is also a very common example of an interval scale;
calendar years and time fall under this category of measurement.
Likert scales, the Net Promoter Score, semantic differential scales, bipolar matrix
tables, etc. are the most-used interval scale examples.
Ratio Scale is defined as a variable measurement scale which not only produces the order of variables
but also makes the difference between variables known along with information on the value of true
zero. It is calculated by assuming that the variables have an option for zero, the difference between the
two variables is the same and there is a specific order between the options.
With the option of true zero, varied inferential, and descriptive analysis techniques can be applied to the
variables. The best examples of ratio scales are weight and height. In market research, a ratio scale is
used to calculate market share, annual sales, the price of an upcoming product, the number of
consumers, etc.
Because of the existence of true zero value, the ratio scale doesn’t have negative values.
To decide when to use a ratio scale, the researcher must observe whether the variables have all
the characteristics of an interval scale along with the presence of the absolute zero value.
Mean, mode and median can be calculated using the ratio scale.
For example, a ratio-scale survey question about weight might offer options such as:
o 51–70 kilograms
o 71–90 kilograms
o 91–110 kilograms
Survey Distribution and Survey Data Collection: In the above, we have seen the process
of building a survey along with the survey design to collect quantitative data. Survey
distribution to collect data is the other important aspect of the survey process. There
are different ways of survey distribution.
Some of the most commonly used methods are:
Email. Sending a survey via email is one of the most commonly used and most effective
methods of survey distribution.
Social distribution. Using social media to distribute the survey aids in collecting a
higher number of responses from people who are aware of the brand.
QR Codes. QuestionPro QR codes store the URL for the survey. You
can print/publish this code in magazines, on signs, business cards, or on just about
any object/medium.
SMS survey: A quick and time-effective way of conducting a survey to collect a high
number of responses is the SMS survey.
QuestionPro app: The QuestionPro App allows you to quickly circulate surveys, and the
responses can be collected both online and offline.
API integration: You can use the API integration of the QuestionPro platform for
potential respondents to take your survey.
2. One-on-One Interviews.
Face-to-Face Interviews. An interviewer can prepare a list of important interview
questions in addition to the survey questions already asked. This way, interviewees provide
exhaustive details about the topic under discussion. An interviewer can manage
to bond with the interviewee on a personal level, which will help him/her to
collect more details about the topic, so the responses also improve.
Interviewers can also ask the interviewees to explain unclear answers.
Computer-Assisted Personal Interviews. In this method, the interviewer enters the
collected responses directly into a laptop or any other similar device. The processing
time is reduced, and the interviewers don’t have to carry physical questionnaires; they
merely enter the answers on the laptop.
Name: _______________________________________________ Date: ________________
Course, Year and Section: ________________________________ Score:________________
TASK 1
DIRECTION: Read and analyse each statement carefully. Encircle the letter of the best answer from the
given choices.
1. The instructor of BSIT 3rd year students record the hair color of each student. What level
of measurement is being used in the given scenario?
a. Ordinal c. Ratio
b. Nominal d. Interval
2. In data collection, posting a Google Form survey on Facebook is an example of what
method?
b. QR Codes
d. Social distribution
3. A guard on duty compiles a list of temperatures in degrees Celsius of each employee in OMSC
Lubang for the month of January. What level of measurement is being used in the given
scenario?
a. Nominal c. Ratio
b. Ordinal d. Interval
5. Mr. Dela Cruz is an instructor of the BSIT 3rd year students in OMSC Lubang; he records
the height of each student. What level of measurement is being used in the given
scenario?
a. Nominal c. Ratio
b. Ordinal d. Interval
6. Which of the following statements is an example of ordinal measurement?
7. The instructor of class BSIT 3A records the letter grade in Quantitative Methods (incl.
Modelling and Simulation) for each student. What level of measurement is being used in
the given scenario?
a. Nominal c. Ratio
b. Ordinal d. Interval
8. Which of the following statements is correct for a variable with ratio measurement?
9. Ms. Cruz critiques the list of the top 10 most-viewed YouTube videos in 2020. What level of
measurement is being used in the given scenario?
a. Nominal
b. Ordinal
c. Ratio
d. Interval
10. Ms. Gomez classified the exam as Easy, Difficult or Impossible. What level of
measurement is being used in the given scenario?
a. Nominal c. Ratio
b. Ordinal d. Interval
TOPIC 1.2: PRIMARY AND SECONDARY DATA, QUALITATIVE AND
QUANTITATIVE DATA
In a time when data is becoming easily accessible to researchers all over the world, the use of
secondary data for research is becoming more prevalent, and so are questions about its authenticity
when compared with primary data.
Primary data and secondary data both have their advantages and disadvantages. Therefore, when
carrying out research, it is left for the researcher to weigh these factors and choose the more suitable one.
It is therefore important to study the similarities and differences between these data types so as
to make proper decisions when choosing the better data type for research work.
Primary data is the kind of data that is collected directly from the data source without going through any
existing sources. It is mostly collected specially for a research project and may be shared publicly to be
used for other research.
Primary data is often reliable, authentic, and objective, inasmuch as it was collected with the purpose of
addressing a particular research problem. It is noteworthy that primary data is not commonly collected
because of the high cost of implementation.
A common example of primary data is the data collected by organizations during market research,
product research, and competitive analysis. This data is collected directly from its original source which in
most cases are the existing and potential customers. Most of the people who collect primary data are
government authorized agencies, investigators, research-based private institutions, etc.
Secondary data is the data that has been collected in the past by someone else but made available for
others to use. Such data was usually once primary data and becomes secondary when used by a third party.
Secondary data are usually easily accessible to researchers and individuals because they are mostly
shared publicly. This, however, means that the data are usually general and not tailored specifically to
meet the researcher's needs as primary data does.
Some common sources of secondary data include trade publications, government statistics, journals,
etc. In most cases, these sources cannot be trusted as authentic.
Sources of Secondary Data
Books
Websites
Data analysis is broad, exploratory, and downright complex. But when we take a step back and attempt
to simplify data analysis, we can quickly see it boils down to two things: qualitative and quantitative
data. These two types of data are quite different, yet, they make up all of the data that will ever be
analyzed.
Before diving into data analytics, it’s important to understand the key differences between qualitative
and quantitative data.
One type of data is objective, to-the-point, and conclusive. The other type of data is subjective,
interpretive, and exploratory. So, which is which?
Qualitative data is the subjective, descriptive kind, categorized based on traits and characteristics.
Surprisingly enough, identification numbers like an SSN or driver’s license number are also considered
qualitative data because they are categorical and unique to one person.
Contrary to qualitative data, quantitative data is statistical and is typically structured in nature –
meaning it is more rigid and defined. This type of data is measured using numbers and values, which
makes it a more suitable candidate for data analysis.
Whereas qualitative data is open for exploration, quantitative data is much more concise and closed-ended.
It can be used to ask the questions “how much” or “how many,” followed by conclusive information.
Quantitative data can be generated through:
Tests
Experiments
Surveys
Market reports
Metrics
Quantitative data can actually be broken into further sub-categories. These categories are called
discrete and continuous data.
Counter: a count equated with entities. For example, the number of people who download a
particular application from the App Store.
Measurement of physical objects: calculating the measurement of any physical thing. For example,
an HR executive carefully measures the size of each cubicle assigned to the newly joined
employees.
Projection of data: future projections of data can be made using algorithms and other
mathematical analysis tools. For example, a marketer predicts an increase in sales after
launching a new product, based on thorough analysis.
Quantitative data can be counted, measured, and expressed using numbers. Qualitative data is
descriptive and conceptual. Qualitative data can be categorized based on traits and characteristics.
Name: _______________________________________________ Date: ________________
Course, Year and Section: ________________________________ Score:________________
TASK 2
DIRECTION: Read and analyze each statement carefully. Put a check (✓) in the appropriate column to
indicate whether the given data is quantitative or qualitative.
DATA                                QUANTITATIVE DATA    QUALITATIVE DATA
1. He ran 50 kilometers.
2. It tastes sweet.
3. It is 50 degrees Fahrenheit.
4. My fingernails are 7 cm long.
9. He is a male.
TOPIC 1.3: ASSESSMENT OF QUANTITATIVE DATA
This topic will focus on guiding you through the process of planning for gathering and analyzing
quantitative data. Quantitative data is data that can be analyzed as numbers, as opposed to qualitative
data. In addition, this topic will briefly cover how to make decisions about how such data is
gathered, analyzed, and used to make decisions and arguments. Specific attention will be focused on
how to build the structures that make gathering such data easier.
Quantitative data helps us to look below the surface and see what is going on in a more definable way. It also
provides data that, for some, is more convincing.
You need to think through ahead of time “What story will I need to tell?” and “What data is needed to tell the
story convincingly?”
Benchmarking
This involves cross-comparing organizations or programs relative to specific aspects of best practices. It
compares performance results in terms of key performance indicators (formulas or ratios) in areas such
as production, marketing, sales, market share, and overall financials.
In quantitative tests, procedures on problems of known size are executed. Analysis of the results then
establishes equations which can be used to predict performance on a planned workload.
Cost-Benefit Analysis
This is a technique developed by economists for judging the net social benefit or cost of a project or policy;
it involves assessing the cost effectiveness of implementing or maintaining programs or services.
A cost-benefit analysis is a process businesses use to analyze decisions. The business or analyst sums the
benefits of a situation or action and then subtracts the costs associated with taking that action.
For example, the analysis of a decision to construct a facility in a particular city could include
quantitative factors, such as the amount of tax breaks that can be obtained, as well as qualitative
factors, such as the rating of the schools in that city to which workers would send their children.
A cost-benefit analysis (CBA) should begin with compiling a comprehensive list of all the costs and
benefits associated with the project or decision.
Costs to consider include opportunity costs, such as alternative investments or buying a plant
versus building one.
An analyst or project manager should apply a monetary measurement to all of the items on the cost-
benefit list, taking special care not to underestimate costs or overestimate benefits.
For projects that involve small to mid-level capital expenditures and are short to intermediate in terms
of time to completion, an in-depth cost-benefit analysis may be sufficient to make a well-
informed, rational decision.
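As a rough sketch of the sum-then-subtract logic described above (all figures and item names below are hypothetical, not taken from this lesson):

```python
# A minimal cost-benefit sketch: sum the benefits, subtract the costs.
# All monetary values and item names are hypothetical estimates.
benefits = {
    "tax breaks": 120_000,
    "increased sales": 450_000,
}
costs = {
    "construction": 300_000,
    "staffing": 180_000,
    "opportunity cost of capital": 50_000,
}

net_benefit = sum(benefits.values()) - sum(costs.values())
print(f"Total benefits: {sum(benefits.values()):,}")
print(f"Total costs:    {sum(costs.values()):,}")
print(f"Net benefit:    {net_benefit:,}")  # positive favors the project
```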
Existing Data
This involves using existing data from large databases to understand a phenomenon and how it affects a
population. Types of existing data include institutional data (PULSE survey, Senior Survey, etc.),
national data from research projects, etc.
Mixed-Methods
A research approach that uses two or more methods, with at least one being quantitative and one being
qualitative in nature. A mixed-method evaluation systematically integrates two or more evaluation
methods, potentially at every stage of the evaluation process, usually drawing on both quantitative and
qualitative data. Mixed-method evaluations may use multiple designs, for example incorporating both
randomized control trial experiments and case studies.
Evaluators can use a convergent design to compare findings from qualitative and quantitative data
sources. It involves collecting both types of data at roughly the same time; assessing information using
parallel constructs for both types of data; separately analyzing both types of data; and comparing results
through procedures such as a side-by-side comparison in a discussion, transforming the qualitative data
set into quantitative scores, or jointly displaying both forms of data.
For example, the investigator can gather qualitative data to assess the personal experiences of patients
while also gathering data from survey instruments measuring the quality of care. The two types of data
can provide validation for each other and also create a solid foundation for drawing conclusions about
the intervention.
Use qualitative data to explore quantitative findings
This explanatory sequential design typically involves two phases: (1) an initial quantitative instrument
phase, followed by (2) a qualitative data collection phase, in which the qualitative phase builds directly
on the results from the quantitative phase. In this way, the quantitative results are explained in more
detail through the qualitative data.
For example, findings from instrument data about costs can be explored further with qualitative focus
groups to better understand how the personal experiences of individuals match up to the instrument
results. This kind of study illustrates the use of mixed methods to explain qualitatively how the
quantitative mechanisms might work.
Yet another mixed methods study design could support the development of appropriate quantitative
instruments that provide accurate measures within a context. This exploratory sequential
design involves first collecting qualitative exploratory data, analyzing the information, and using the
findings to develop a psychometric instrument well adapted to the sample under study. This instrument
is then, in turn, administered to a sample of a population.
For example, a PCMH study could begin with a qualitative exploration through interviews with primary
care providers to assess what constructs should be measured to best understand improved quality of
care.
An outcomes study, for example a randomized, controlled trial, with qualitative data collection and
analysis added, is called an embedded design. Within this type of an outcomes study, the researcher
collects and analyzes both quantitative and qualitative data. The qualitative data can be incorporated
into the study at the outset (for example, to help design the intervention); during the intervention, and
after the intervention (for example, to help explain the results). In this way, the qualitative data
augment the outcomes study, which is a popular approach within implementation and dissemination
research.
A multiphase design involves multiple phases that all address a common objective of assessing and
refining a model. This design would involve primary care providers and staff, patients, and other
providers and individuals in the community in the research process. Key stakeholders participate as
co-researchers in a project, providing input about their needs, ways to address them, and ways to
implement changes.
Advantages of Mixed-Methods
Adds qualitative depth to quantitative designs, such as randomized trials, to elucidate more
information than can be obtained in only quantitative research.
Collects rich, comprehensive data. Mixed methods also mirror the way
individuals naturally collect information—by integrating quantitative and
qualitative data. For example, sports stories frequently integrate
quantitative data (scores or number of errors) with qualitative data
(descriptions and images of highlights) to provide a more complete story
than either method would alone.
Rubric
A scoring guide for evaluating performance, ability, or effectiveness in a specific domain, made up of
definitions of quality work, well-defined criteria for measuring quality work, and a scoring method (using
numbers) to indicate the level of performance.
Satisfaction
Satisfaction research helps the company to determine its customers’ satisfaction with its
products and services. For the research to be trustworthy and practical, it has to have validity,
reliability, and objectivity, and it has to be economically profitable.
Validity suffers when the research has the wrong target group, does not cover the whole
sample, lacks a valid register, or is focused only on certain types of
respondents.
Data Processing
Data processing occurs when data is collected and translated into usable information. Usually
performed by a data scientist or team of data scientists, it is important for data processing to be done
correctly so as not to negatively affect the end product, or data output.
Data processing starts with data in its raw form and converts it into a more readable format (graphs,
documents, etc.), giving it the form and context necessary to be interpreted by computers and utilized
by employees throughout an organization. The activities involved in processing data are:
1. Questionnaire Checking
A questionnaire is a research instrument consisting of a series of questions for the purpose of gathering
information from respondents. Questionnaires can be thought of as a kind of written interview. They
can be carried out face to face, by telephone, computer or post.
The initial step in questionnaire checking involves reviewing all questionnaires for completeness and
for interviewing or completion quality.
2. Editing
Editing is the review of the questionnaires with the objective of increasing accuracy and precision. It
consists of screening questionnaires to identify illegible, incomplete, inconsistent, or ambiguous responses.
Editing looks to correct illegible, incomplete, inconsistent, and ambiguous answers.
Assigning Missing Values. If returning the questionnaires to the field is not feasible,
the editor may assign missing values to unsatisfactory responses.
3. Coding
Coding typically assigns alpha or numeric codes to answers that do not already have them so that
statistical techniques can be applied. Coding is the process of assigning a code, usually a number, to
each possible response to each question. The code includes an indication of the column position (field)
and the data record it will occupy.
Only a few (10% or less) of the responses should fall into the “other” category
Category codes should be assigned for critical issues even if no one has mentioned
them.
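A minimal sketch of coding in Python, reusing the satisfaction scale from earlier in this lesson (the raw responses and the numeric code chosen for “other” are illustrative assumptions):

```python
# Coding: assigning numeric codes to each possible response so that
# statistical techniques can be applied. Responses are hypothetical.
satisfaction_codes = {
    "Very Unsatisfied": 1,
    "Unsatisfied": 2,
    "Neutral": 3,
    "Satisfied": 4,
    "Very Satisfied": 5,
    "Other": 9,  # catch-all; should hold 10% or less of responses
}

raw_responses = ["Satisfied", "Neutral", "Very Satisfied", "Other"]
coded = [satisfaction_codes[r] for r in raw_responses]
print(coded)  # [4, 3, 5, 9]
```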
4. Data Classification
The method of arranging data into homogeneous classes according to some common features
present in the data is called classification.
Raw data cannot be easily understood, and it is not fit for further analysis and interpretation. This
arrangement of data helps users in comparison and analysis.
A planned data analysis system makes fundamental data easy to find and recover. This can be of
particular interest for legal discovery, risk management, and compliance. Written methods and sets
of guidelines for data classification should determine what levels and measures the company will
use to organize data, and define the roles of employees within the business regarding input
stewardship. Once a data-classification scheme has been designed, security standards that stipulate
proper handling practices for each division, and storage criteria that determine the data’s
lifecycle demands, should be discussed.
A variable refers to a quantity or attribute whose value varies from one investigation to another.
The word is derived from “vary”, which means to differ or change. A variable is a characteristic which
varies, differs, or changes from person to person, time to time, place to place, etc.
Kinds of Variable
Discrete Variable
Variables which are capable of taking only exact values, and not any fractional value, are termed
discrete variables.
For example, the number of workers or the number of students in a class is a discrete variable, as they
cannot be in fractions. Similarly, the number of children in a family can be 1, 2, and so on, but cannot be
1.5 or 2.75.
Continuous Variable
Those variables which can take all the possible values (integral as well as fractional) in a given
specified range are termed as continuous variables.
To aid comparison, classified data can be presented in a table. The example below shows gender-wise
and class-wise information about students in a school:

Grade       Boys    Girls
Grade I      82      34
Grade II     74      43
Grade III    92      27
Grade IV     87      30
Grade V      90      25
Grade VI     75      22
5. Tabulation
Tabulation is a systematic & logical presentation of numeric data in rows and columns to facilitate
comparison and statistical analysis. It facilitates comparison by bringing related information close to
each other and helps in further statistical analysis and interpretation.
In other words, the method of placing organized data into a tabular form is called tabulation. It
may be complex, double, or simple depending upon the nature of categorization.
1. To simplify the complex Data. It reduces the bulk of information, i.e. raw data, into a
simplified and meaningful form so that it can be easily understood by a common man in less
time.
2. To bring out essential features of the Data. It brings out the chief/main
characteristics of data, and it presents fact clearly and precisely without textual
explanation.
4. To Facilitate Statistical Analysis. Tables serve as the best source of organized data
for further statistical analysis. The task of computing average, dispersion,
correlation, etc. becomes easier if data is presented in the form of table.
5. Saving of Space. A table presents facts in a better way than the textual form.
Types of Tabulation
For Example: The one-way table below showcases data on the three hat color choices of 10 men
surveyed.

Hat color        Color 1    Color 2    Color 3
Number of men       5          3          2
For Example: The two-way table below shows data on the preferred leisure activity of 50 adults,
with preferences broken down by gender.
Leisure Activity    Dance    Sports    TV    Total
Men                   2        10       8      20
Women                16         6       8      30
Total                18        16      16      50
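A table of this shape can be produced with pandas, as in the following sketch (the handful of respondent records here are hypothetical and do not reproduce the counts above):

```python
# Building a two-way (cross-tabulation) table with pandas - a sketch
# using a few hypothetical respondent records.
import pandas as pd

df = pd.DataFrame({
    "gender":   ["Men", "Men", "Women", "Women", "Women", "Men"],
    "activity": ["Sports", "TV", "Dance", "Dance", "TV", "Sports"],
})

# Rows = gender, columns = leisure activity, cells = respondent counts.
table = pd.crosstab(df["gender"], df["activity"], margins=True, margins_name="Total")
print(table)
```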
6. Graphical Representation
Graphical representation refers to the use of intuitive charts to clearly visualize and simplify data sets.
Data is ingested into graphical representation software and then represented by a variety of
symbols, such as lines on a line chart, bars on a bar chart, or slices on a pie chart, from which users can
gain greater insight than by numerical analysis alone.
Representational graphics can quickly illustrate general behavior and highlight phenomena, anomalies,
and relationships between data points that may otherwise be overlooked, and may contribute to
predictions and better, data-driven decisions. The types of representational graphics used will depend
on the type of data being explored.
Data charts are available in a wide variety of maps, diagrams, and graphs that typically include textual
titles and legends to denote the purpose, measurement units, and variables of the chart. Choosing the
most appropriate chart depends on a variety of factors: the nature of the data, the purpose
of the chart, and whether a graphical representation of qualitative data or a graphical representation of
quantitative data is being depicted.
1. Bar Graph
Contains a vertical axis and horizontal axis and displays data as rectangular bars with lengths
proportional to the values that they represent; a useful visual aid for marketing purposes
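A minimal bar graph sketch using matplotlib (the categories and counts below are made-up values):

```python
# A minimal bar graph sketch with matplotlib; categories and values
# are made-up illustrative figures.
import matplotlib.pyplot as plt

categories = ["Suburbs", "City", "Town"]
respondents = [12, 25, 8]

plt.bar(categories, respondents)
plt.xlabel("Place of residence")
plt.ylabel("Number of respondents")
plt.title("Respondents by place of residence")
plt.show()
```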
2. Choropleth
Use a Choropleth to compare aggregate values across regions. Choropleths are useful for spotting
outliers, but are not intended to provide detail on the values within a region.
Example: countries with a higher turnout (green) and a lower turnout (red) than the EU28 average (68%)
at their last national elections. In comparison, the turnout at the US election in 2016 was 61.4%.
3. Heat map
A colored, two-dimensional matrix of cells in which each cell represents a grouping of data and each
cell’s color indicates its relative value, from one end of the spectrum to the other.
Heat maps are ideal for spotting outliers, which show up vividly on the color spectrum. They work best
when the number of groupings is not too large, since large numbers of groupings cause the heat map to
exceed the viewport, making comparison harder.
A geographical heat map or geo heat map represents areas of high and low density of a certain
parameter (for instance, population density, network density, etc.) by displaying data points on a real
map in a visually interactive manner. Industries like real estate, travel, food, and so on can greatly
benefit from the usage of geographical heat maps.
4. Histogram
A graphical representation of a frequency distribution that uses adjacent vertical bars erected over
discrete intervals to represent the data frequency within a given interval; a useful visual aid for
meteorology and environmental purposes.
A histogram is a graphical display of data using bars of different heights. In a histogram, each bar groups
numbers into ranges. Taller bars show that more data falls in that range. A histogram displays the shape
and spread of continuous sample data.
Example of a Histogram
Jeff is the branch manager at a local bank. Recently, Jeff’s been receiving customer feedback saying that
the wait times for a client to be served by a customer service representative are too long. Jeff decides to
observe and write down the time spent by each customer on waiting. Here are his findings from
observing and writing down the wait times spent by 20 customers:
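Since the observed figures are not reproduced here, the following matplotlib sketch uses 20 hypothetical wait times to show how such a histogram is drawn:

```python
# Histogram sketch for the bank wait-time example; the 20 wait times
# (in minutes) below are hypothetical stand-ins for Jeff's observations.
import matplotlib.pyplot as plt

wait_times = [2, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 10, 11, 12, 14, 15, 18]

plt.hist(wait_times, bins=5)  # group wait times into 5 ranges
plt.xlabel("Wait time (minutes)")
plt.ylabel("Number of customers")
plt.title("Distribution of customer wait times")
plt.show()
```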
5. Line Graph
Line Graph displays continuous data; ideal for predicting future events over time; a useful visual aid for
marketing purposes. A line graph is usually used to show the change of information over a period of
time. This means that the horizontal axis is usually a time scale, for example minutes, hours, days,
months or years.
Example: The table shows the daily earnings of a store for five days.
Example: The table shows the daily sales in RM of different categories of items for five days.
6. Pie Chart
A Pie Chart is a type of graph that displays data in a circular graph. The pieces of the graph are
proportional to the fraction of the whole in each category. In other words, each slice of the pie is
relative to the size of that category in the group as a whole. The entire “pie” represents 100 percent of a
whole, while the pie “slices” represent portions of the whole. It shows percentage values as slices of a
pie; a useful visual aid for marketing purposes.
7. Scatter plot
The Scatter Plot displays unaggregated, row-level data as points, plotting the points along an x and y
axis. Each axis represents a quantitative measure. You can use additional measures to change the size or
color points, making the scatter plot capable of representing up to four measures for each group (x, y,
size, and color).
A diagram that shows the relationship between two sets of data, where each dot represents individual
pieces of data and each axis represents a quantitative measure. Scatter plots resemble Bubble charts,
but are used to view unaggregated data, while Bubble charts aggregate data.
Use a scatter plot chart to study the correlation between two measures, or to spot outliers or clusters in
the distribution of data. You can use a Scatter Plot to visualize any dataset, but they are most useful for
exploring large amounts of data.
Example, the local ice cream shop keeps track of how much ice cream they sell versus the noon
temperature on that day. Here are their figures for the last 12 days:
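As the shop's figures are not reproduced here, the following matplotlib sketch uses 12 hypothetical (temperature, sales) pairs to show how such a scatter plot is drawn:

```python
# Scatter plot sketch for the ice cream example; the 12 (temperature,
# sales) pairs are hypothetical stand-ins for the shop's figures.
import matplotlib.pyplot as plt

noon_temp_c = [14, 16, 15, 18, 20, 21, 22, 24, 25, 26, 27, 29]
sales = [215, 325, 185, 332, 406, 412, 522, 445, 544, 614, 625, 685]

plt.scatter(noon_temp_c, sales)
plt.xlabel("Noon temperature (deg C)")
plt.ylabel("Ice cream sales")
plt.title("Ice cream sales vs. noon temperature")
plt.show()
```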
8. Stacked Bar Graph
A stacked bar graph (or stacked bar chart) is a chart that uses bars to show comparisons between
categories of data, but with ability to break down and compare parts of a whole. Each bar in the chart
represents a whole, and segments in the bar represent different parts or categories of that whole.
Stacked bars do a good job of featuring the total and also providing a hint as to how the total for each
category value is divided into parts. Stacked Bar graph can have one category axis and up to two
numerical axes. Category axis describes the types of categories being compared, and the numerical axes
represent the values of the data.
Stacked Bar graph can be used to represent: Ranking, Nominal Comparisons, Part-to-whole, Deviation,
or Distribution.
Example of Ranking
9. Timeline Chart
A long bar, labelled with dates paralleling it, that displays a list of events in chronological order; a useful
visual aid for history charting purposes. A timeline chart is an effective way to visualize a process using
chronological order. Since details are displayed graphically, important points in time can be easily seen
and understood. Often used for managing a project’s schedule, timeline charts function as a sort of
calendar of events within a specific period of time.
Standard timeline charts illustrate events accompanied by explanatory text or images. They’re used for a
variety of purposes, one of which is to narrate historical events.
Example. This standard timeline graph shows the launch year of each of the popular social media
networks. Image source: Venngage.com
Gantt Chart
A Gantt chart uses bars of varying sizes spread across a timeline to represent a task’s start date,
duration, and finish date.
10. Tree Diagram
A tree diagram is a management planning tool that depicts the hierarchy of tasks and subtasks
needed to complete an objective. The tree diagram starts with one item that branches into two or
more, each of which branch into two or more, and so on. The finished diagram bears a resemblance to a
tree, with a trunk and multiple branches.
It is used to break down broad categories into finer and finer levels of detail. Developing the tree
diagram helps you move your thinking step by step from generalities to specifics.
11. Venn Diagram
A Venn diagram is an illustration that uses circles to show the relationships among things or finite
groups of things. Circles that overlap have a commonality while circles that do not overlap do not share
those traits.
Venn diagrams help to visually represent the similarities and differences between two concepts.
Example 1. Below, we can see that there are orange fruits (circle B) such as persimmons and tangerines,
while apples and cherries (circle A) come in red colors. Peppers and tomatoes come in both red and
orange colors, as represented by the overlapping area of the two circles.
Example 2. Below, we see that Car A is a sedan that’s powered by gasoline and gets 20 miles per gallon,
while Car B is a hybrid, gets 40 miles-per-gallon for mileage, and is a hatchback.
7. Data Cleaning
Data cleaning is the process of preparing data for analysis by removing or modifying data that is
incorrect, incomplete, irrelevant, duplicated, or improperly formatted.
This data is usually not necessary or helpful when it comes to analyzing data because it may hinder the
process or provide inaccurate results. There are several methods for cleaning data depending on how it
is stored along with the answers being sought.
Data cleaning is not simply about erasing information to make space for new data, but rather finding a
way to maximize a data set’s accuracy without necessarily deleting information.
For one, data cleaning includes more actions than removing data, such as fixing spelling and syntax
errors, standardizing data sets, and correcting mistakes such as empty fields, missing codes, and
identifying duplicate data points. Data cleaning is considered a foundational element of the data science
basics, as it plays an important role in the analytical process and uncovering reliable answers.
Most importantly, the goal of data cleaning is to create data sets that are standardized and uniform to
allow business intelligence and data analytics tools to easily access and find the right data for each
query.
1. It removes major errors and inconsistencies that are inevitable when multiple sources of data
are being pulled into one dataset.
2. Using tools to clean up data will make everyone on your team more efficient as you’ll be able to
quickly get what you need from the data available to you.
4. It allows you to map different data functions, and better understand what your data is intended
to do, and learn where it is coming from.
Data Cleaning in Six Steps
Here are some best practices when it comes to creating a data cleaning process:
1. Monitor errors
Keep a record of trends where most of your errors are coming from. This will make it a lot easier to
identify and fix incorrect or corrupt data. Records are especially important if you are integrating other
solutions with your fleet management software, so that your errors don’t clog up the work of other
departments.
Once you have cleaned your existing database, validate the accuracy of your data. Research and invest
in data tools that allow you to clean your data in real-time. Some tools even use AI or machine
learning to better test for accuracy.
Identify duplicates to help save time when analyzing data. Repeated data can be avoided by researching
and investing in different data cleaning tools that can analyze raw data in bulk and automate the process
for you.
After your data has been standardized, validated and scrubbed for duplicates, use third-party sources to
append it. Reliable third-party sources can capture information directly from first-party sites, then clean
and compile the data to provide more complete information for business intelligence and analytics.
Share the new standardized cleaning process with your team to promote adoption of the new protocol.
Now that you’ve scrubbed down your data, it’s important to keep it clean. Keeping your team in the
loop will help you develop and strengthen customer segmentation and send more targeted information
to customers and prospects.
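A minimal pandas sketch of several of the cleaning actions described above (standardizing values, assigning missing values, and scrubbing duplicates); all records are hypothetical:

```python
# A minimal data-cleaning sketch with pandas: fix formatting, handle
# empty fields, and drop duplicates. The records are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "ana ", "Ben", None, "Cara"],
    "city": ["City", "city", "Town", "Suburbs", None],
    "age":  [21, 21, 34, 29, None],
})

df["name"] = df["name"].str.strip().str.title()   # standardize spelling/case
df["city"] = df["city"].str.strip().str.title()
df = df.dropna(subset=["name"])                   # remove rows missing a key field
df["age"] = df["age"].fillna(df["age"].median())  # assign a value to missing ages
df = df.drop_duplicates()                         # scrub duplicate records

print(df)
```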
8. Data Adjusting
Your strategic and tactical quantitative research work – designing, programming, and fielding an online
questionnaire – results in raw data files containing all the respondents’ answers to your survey. Typically,
some form of data preparation must be completed before your analysis begins. Neglecting to carefully
prepare your raw data may jeopardize the statistical results and bias your interpretations and
subsequent findings.
Sometimes, your data must be statistically adjusted to become representative of your target population.
While this is not always necessary, it can enhance the quality of your data.
There are three techniques at your disposal: weighting, variable re-specification, and scale
transformations.
1. Weighting Data
Weighting is a statistical adjustment made by assigning a weight to each respondent in the database to
reflect that respondent’s importance relative to the other respondents. The purpose of weighting is to
increase or decrease the number of respondents in the sample that have certain characteristics so that
the sample data is more representative of the target population.
The analysis compares three primary statistical methods for weighting survey data: raking, matching and
propensity weighting.
Raking
Matching
Propensity weighting
A key concept in probability-based sampling is that if survey respondents have different probabilities of
selection, weighting each case by the inverse of its probability of selection removes any bias that might
result from having different kinds of people represented in the wrong proportion.
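A minimal sketch of this inverse-probability idea (the selection probabilities and responses below are hypothetical):

```python
# Inverse-probability weighting sketch: each respondent is weighted by
# the inverse of their (hypothetical) probability of selection.
selection_prob = [0.10, 0.25, 0.50, 0.25]   # probability each case was sampled
responses      = [4, 5, 3, 2]               # e.g., 1-5 satisfaction scores

weights = [1.0 / p for p in selection_prob]

weighted_mean = sum(w * r for w, r in zip(weights, responses)) / sum(weights)
print(f"Unweighted mean: {sum(responses) / len(responses):.2f}")
print(f"Weighted mean:   {weighted_mean:.2f}")
```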
2. Variable Respecification
Variable re-specification involves the transformation of data to create new variables or modify existing
ones. It is the process of modifying existing variables in order to create new variables that better
answer the research questions.
For example, you ask respondents about purchase intent on a 7-point scale, so you have 7 different
response categories in your survey that you collapse into 3 or 4 total categories in the dataset (e.g.,
collapsing into “most likely to buy” – those who select 7, 6, or 5 on the scale – “neutral” –
those who select 4 – and “least likely to buy” – those who select 3, 2, or 1).
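A minimal Python sketch of this collapsing step (the responses are hypothetical):

```python
# Collapsing a 7-point purchase-intent scale into three categories -
# a sketch with hypothetical responses.
def collapse(score: int) -> str:
    if score >= 5:
        return "most likely to buy"    # 5, 6, 7
    if score == 4:
        return "neutral"
    return "least likely to buy"       # 1, 2, 3

responses = [7, 4, 2, 6, 1, 5, 3]
print([collapse(s) for s in responses])
```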
Alternatively, you could create new variables that are the combination of several other variables. You
can also create new variables by taking a ratio among two existing variables.
The use of dummy variables is another type of respecification technique that uses variables that take
only two values, typically 0 or 1, to respecify categorical values. Dummy variables, also called binary,
dichotomous, instrumental, or qualitative variables, are helpful when the category coding is not
meaningful for statistical analysis. Instead, you can represent the categories with dummy variables.
For example, if you have heavy, light, and non-users coded as 3, 2, 1 respectively, you can represent
these with the dummy variables X3, X2, and X1. Heavy users (X3) would = 1 in the data sheet, and the others
would = 0. Light users (X2) would = 1 in the datasheet, and all others = 0. And non-users (X1) would = 1,
with all others = 0.
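A sketch of generating such dummy variables with pandas (the usage records are hypothetical):

```python
# Dummy (0/1) variables for a categorical usage variable - a sketch
# using pandas.get_dummies on hypothetical data.
import pandas as pd

df = pd.DataFrame({"usage": ["heavy", "light", "non-user", "heavy"]})
dummies = pd.get_dummies(df["usage"], prefix="X", dtype=int)
print(dummies)
# Each row has a 1 in the column for its category and 0 elsewhere.
```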
3. Scale Transformation
Scale transformation involves a manipulation of scale values to ensure comparability with other scales
or otherwise make the data suitable for analysis. Frequently different scales are employed for measuring
different variables.
For example, image variables may be measured on a 7-point semantic differential scale, attitude
variables on a continuous rating scale, and lifestyle variables on a 5-point Likert scale.
Therefore, it would not be meaningful to make comparisons across the measurement scales for any
respondent. To compare attitudinal scores with lifestyle or image scores, it would be necessary to
transform the various scales. Even if the same scale is employed for all the variables, different
respondents may use the scale differently. For example, some respondents consistently use the upper
end of a rating scale whereas others consistently use the lower end. These differences can be corrected
by appropriately transforming the data.
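One common transformation of this kind is standardization (z-scores), which puts variables measured on different scales onto a common, unitless scale; a minimal sketch with hypothetical responses:

```python
# Scale transformation sketch: standardizing scores (z-scores) so that
# variables measured on different scales become comparable.
import statistics

def standardize(values):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

likert_5 = [1, 3, 4, 5, 2]          # 5-point Likert responses
semantic_7 = [2, 5, 6, 7, 3]        # 7-point semantic differential responses

print(standardize(likert_5))
print(standardize(semantic_7))      # now on a common, unitless scale
```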