Unit 1

Introduction Data visualization

The ways we structure and visualize information are changing rapidly and getting more complex with each
passing day. Thanks to the rise of social media, the ubiquity of mobile devices, and service digitalization,
data is available on any human activity that uses technology. The generated information is
hugely valuable and makes it possible to analyze trends and patterns, and to use big data to draw connections
between events. Thus, data visualization can be an effective mechanism for presenting the end user with
understandable information in real time.

Every company uses data, whether to communicate with clients and senior managers or to help manage
the organization itself. It is only through research and interpretation that this data can acquire meaning and
be transformed into knowledge.

What is data visualization?
Data visualization is the process of acquiring, interpreting and comparing data in order to clearly communicate complex ideas, thereby facilitating the identification and analysis
of meaningful patterns.

Data visualization can be essential to strategic communication: it helps us interpret available data; detect patterns, trends, and anomalies; make
decisions; and analyze inherent processes.
All told, it can have a powerful impact on the business world.

The data visualization process

Several different fields are involved in the data visualization process, with the aim of simplifying
or revealing existing relationships, or discovering something new within a data set.

Visualization process [1]

Filtering & processing. Refining and cleaning data to convert it into information through analysis,
interpretation, contextualization, comparison, and [...]

Translation & visual representation. Shaping the visual representation by defining graphic
resources, language, context, and the tone of the representation, all of which are adapted for the [...]

Perception & interpretation. Finally, the visualization becomes effective when it has a
perceptive impact on the construction of [...]
Why is data visualization so important in reports and statements?

We live in the era of visual information, and visual content plays an important role in every moment of
our lives. A study by SH!FT Disruptive Learning reported that we typically process images 60,000
times faster than a table or a text, and that our brains typically do a better job remembering them in
the long term. That same research found that after three days, analyzed subjects retained between
10% and 20% of written or spoken information, compared with 65% of visual information.

The rationale behind the power of visuals:
• The human mind can see an image for just 13 milliseconds and store the information,
  provided that it is associated with a concept. Our eyes can take in 36,000 visual messages
  per hour.
• 40% of nerve fibers are connected to the [...]

All of this indicates that human beings are better at processing visual information, which is lodged
in our long-term memory.

Consequently, for reports and statements, a visual representation that uses images is a much more
effective way to communicate information than text or a table; it also takes up much less space.
This means that data visuals are more attractive, simpler to take in, and easier to remember.

Try it for yourself. Take a look at this table:

Month  Jan  Feb  Mar  Apr  May  Jun
Sales   45   56   36   58   75   62

Identifying the evolution of sales over the course of the year isn’t easy. However, when we present the
same information in a visual, the results are much clearer (see the graph below).

[Graph: sales by month, January to June]

The graph takes what the numbers cannot communicate on their own and conveys it in a visible,
memorable way. This is the real strength of data visualization.

“Graphical excellence is that which gives to the viewer the greatest number of
ideas in the shortest time with the least ink in the smallest space.”
- Edward Tufte (2001)
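As a rough illustration of the table-versus-graph comparison, the sketch below draws the six monthly sales figures as a plain-text bar chart. The function name text_bar_chart and the scale of one character per five units are assumptions made for this example, not part of the original report; any charting tool would serve equally well.

```python
# Sales figures copied from the Month/Sales table in this section.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
SALES = [45, 56, 36, 58, 75, 62]


def text_bar_chart(labels, values, scale=5):
    """Return one line per label, drawing one '#' per `scale` units (rounded)."""
    lines = []
    for label, value in zip(labels, values):
        bar = "#" * round(value / scale)
        lines.append(f"{label} {bar} {value}")
    return lines


if __name__ == "__main__":
    print("\n".join(text_bar_chart(MONTHS, SALES)))
```

Even in this crude form, the dip in March and the peak in May stand out far sooner than they do in the row of numbers.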
Data types, relationships, and visualization formats

There are a number of methods and approaches to creating visuals based on the nature and
complexity of the data and the information. Different kinds of graphics are used in data
visualizations, including representations of statistics, maps, and diagrams. These schematic,
visual representations of content vary in their degree of abstraction.

In order to communicate effectively, it is important to understand different kinds of data and to
establish visual relationships through the proper use of graphics. Enrique Rodríguez (2012), a data
analyst at DataNauta, once explained in an interview that...

“A good graphic is one that synthesizes and contextualizes all of the information that’s necessary to
understand a situation and decide how to move forward.”

2 kinds of data

Before we talk about visuals themselves, we must first understand the different kinds of data that
can be visualized and how they relate to one another.

The most common kinds of data are [4]:

1) Quantitative (numeric)

Data that can be quantified and measured. This kind of data explains a trend or the results of
research through numeric values. This category of data can be further subdivided into:

• Discrete: Data that consists of whole numbers (0, 1, 2, 3...). For example, the number of children in a
  family.
• Continuous: Data that can take any value within an interval. For example, people’s height
  (between 60 and 70 inches) or weight (between 90 and 110 pounds).

2) Qualitative (categorical)

This kind of data is divided into categories based on non-numeric characteristics. It may or may not
have a logical order, and it measures qualities and generates categorical answers. It can be:

• Ordinal: Meaning it follows an order or sequence. That might be the alphabet or the months of the
  year.
• Categorical (often called nominal): Meaning it follows no fixed order. For example, varieties of
  products sold.
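The four kinds of data described here can be told apart with a few simple checks. The sketch below is only a heuristic under stated assumptions: the function classify_values and its ordered_levels argument are hypothetical names introduced for illustration, and a continuous measurement that happens to contain only whole numbers would be labeled discrete.

```python
def classify_values(values, ordered_levels=None):
    """Return 'discrete', 'continuous', 'ordinal', or 'categorical'.

    `ordered_levels` is a hypothetical argument: pass the known ordering
    (for example, the months of the year) to mark qualitative data as ordinal.
    """
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric:
        # Whole numbers only -> discrete; anything fractional -> continuous.
        whole = all(float(v).is_integer() for v in values)
        return "discrete" if whole else "continuous"
    if ordered_levels is not None and all(v in ordered_levels for v in values):
        return "ordinal"
    return "categorical"


MONTH_ORDER = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

if __name__ == "__main__":
    print(classify_values([0, 1, 2, 3]))                 # children per family
    print(classify_values([62.5, 68.1, 66.0]))           # heights in inches
    print(classify_values(["Jan", "Mar"], MONTH_ORDER))  # months of the year
    print(classify_values(["apples", "pears"]))          # product varieties
```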
Data relationships
Data relationships can be simple, like the progress of a single metric over time (such as visits to a blog over the course of 30 days or the number of users on a
social network), or they can be complex, precisely comparing relationships, revealing structure, and extracting patterns from data. There are seven data
relationships to consider:

Ranking: A visualization that relates two or more values with respect to a relative magnitude. For
example: a company’s most sold products.

Nominal comparisons: Visualizations that compare quantitative values from different subcategories.
For example: product prices in various supermarkets.

Series over time: Here we can trace the changes in the values of a constant metric over the course of
time. For example: monthly sales of a product over the course of two years.

Correlation: Data with two or more variables that can demonstrate a positive or negative correlation with one
another. For example: salaries based on level of [...]

Deviation: Examines how each data point relates to the others and, particularly, to what point its value
differs from the average. For example: the line of deviation for tickets to an amusement park sold on a
rainy versus a normal day.

Partial and total relationships: Show a subset of data as compared with a larger total. For example: the
percentage of clients that buy specific products.

Distribution: Visualization that shows the distribution of data spatially, often around a central value.
For example: the heights of players on a basketball team.
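Three of these relationships (ranking, deviation, and partial and total relationships) reduce to short calculations. The sketch below applies them to the monthly sales figures from the table earlier in this report; the function names are illustrative assumptions and do not come from the original text.

```python
from statistics import mean

# Monthly sales figures from the table earlier in this report.
SALES = {"Jan": 45, "Feb": 56, "Mar": 36, "Apr": 58, "May": 75, "Jun": 62}


def ranking(data):
    """Labels sorted from the largest value to the smallest."""
    return sorted(data, key=data.get, reverse=True)


def deviation_from_mean(data):
    """How far each value sits above (+) or below (-) the average."""
    average = mean(data.values())
    return {label: value - average for label, value in data.items()}


def share_of_total(data):
    """Each value as a percentage of the overall total."""
    total = sum(data.values())
    return {label: 100 * value / total for label, value in data.items()}


if __name__ == "__main__":
    print(ranking(SALES))
    print(deviation_from_mean(SALES))
    print(share_of_total(SALES))
```

Each result maps naturally onto a chart type: the ranking onto sorted bars, the deviations onto bars around a zero line, and the shares onto a pie or stacked chart.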

11 formats

There are two types of visualizations: static and interactive. Their use depends on the search and
analysis dimension level. Static visuals can analyze data in one dimension [...]

As with any other form of communication, familiarity with the code and resources that are
available to us is essential if we’re going to use them successfully [...] our goal. In this p[...]
is listed in order of popularity in the “Visualization Universe” project by Google News Lab and Adioma,
as of the publication of this report.

1. Bar chart

Bar charts are one of the most popular ways of visualizing data because they present a data set
in a quickly understood format that enables viewers to identify highs and lows at a glance.
They are very versatile, and they are used to compare discrete categories, to analyze
changes over time, or to compare parts of a whole.
The three variations on the bar chart are:

• Vertical column: Used for chronological data, and it should be in left-to-right format.
• Horizontal column: Used to visualize categories.
• Full stacked column: Used to visualize categories that collectively add up to 100%.

[Figures: vertical column, horizontal column, and full stacked column charts]
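The full stacked column is the one variation that requires a calculation before drawing: each column's raw values must be rescaled so that they add up to 100%. A minimal sketch follows, assuming invented spending figures and a hypothetical helper named to_full_stacked.

```python
def to_full_stacked(rows):
    """Convert each row of raw values into percentages that sum to 100."""
    result = {}
    for label, parts in rows.items():
        total = sum(parts.values())
        if total == 0:
            raise ValueError(f"row {label!r} has no data to normalize")
        result[label] = {
            name: 100 * value / total for name, value in parts.items()
        }
    return result


# Hypothetical figures, invented for illustration only.
SPENDING = {
    "Jan": {"Education": 30, "Entertainment": 10},
    "Feb": {"Education": 20, "Entertainment": 20},
}

if __name__ == "__main__":
    for month, parts in to_full_stacked(SPENDING).items():
        print(month, parts)
```

Note that this view hides the fact that January's total is the same as February's or different from it; only the split within each column remains visible.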

1. Histograms

Histograms represent a variable in the form of bars, where the area of each bar is
proportional to the frequency of the values represented. They offer an overview of the
distribution of a population or sample with respect to a given characteristic.

The two variations on the histogram are:

• Vertical columns
• Horizontal columns

[Figures: vertical histogram and horizontal-column histogram]
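Behind every histogram is a counting step that sorts values into intervals. The sketch below shows that step with equal-width intervals, in which case bar height and bar area are both proportional to frequency. The ages are invented sample data and the function signature is an assumption made for this example.

```python
def histogram(values, start, width, bins):
    """Count values in `bins` equal-width intervals.

    Interval i covers [start + i * width, start + (i + 1) * width).
    Values outside the covered range are ignored.
    """
    counts = [0] * bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < bins:
            counts[index] += 1
    return counts


# Hypothetical ages, invented for illustration only.
AGES = [25, 27, 31, 34, 35, 38, 42, 44, 47, 58]

if __name__ == "__main__":
    for i, count in enumerate(histogram(AGES, start=25, width=10, bins=4)):
        low = 25 + 10 * i
        print(f"{low}-{low + 9}: {'#' * count}")
```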


2. Pie charts

Pie charts consist of a circle divided into sectors, each of which represents a portion of the total.
They should be subdivided into no more than five data groups. They can be useful for comparing discrete or
continuous data. The two variations on the pie chart are:

• Standard: Used to exhibit relationship between [...]
• Donut: A stylistic variation that facilitates the inclusion of a total value or a design element in
  the center.

[Figures: standard pie chart and donut pie chart]
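One way to respect the five-group limit is to merge the smallest slices before computing the sector angles. The sketch below takes that approach; merging into an "Other" slice is an assumption of this example rather than something the report prescribes, and the product figures are invented.

```python
def pie_sectors(data, max_groups=5):
    """Return (label, angle_in_degrees) pairs for a pie chart.

    If there are more than `max_groups` categories, the smallest ones
    are merged into a single 'Other' slice (an assumption of this sketch).
    """
    items = sorted(data.items(), key=lambda pair: pair[1], reverse=True)
    if len(items) > max_groups:
        kept = items[: max_groups - 1]
        other = sum(value for _, value in items[max_groups - 1:])
        items = kept + [("Other", other)]
    total = sum(value for _, value in items)
    return [(label, 360 * value / total) for label, value in items]


# Hypothetical sales by product, invented for illustration only.
PRODUCTS = {"A": 40, "B": 25, "C": 15, "D": 10, "E": 6, "F": 4}

if __name__ == "__main__":
    for label, angle in pie_sectors(PRODUCTS):
        print(f"{label}: {angle:.0f} degrees")
```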

3. Scatter plots
Scatter plots use the spread of points over a Cartesian coordinate plane to show the
relationship between two variables. They also help us determine whether or not different groups
of data are correlated.

[Figures: scatter plot and scatter plot with grid]
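What the eye judges in a scatter plot can also be quantified. The sketch below computes the Pearson correlation coefficient, a common measure that runs from -1 (perfect negative correlation) to +1 (perfect positive correlation); choosing Pearson is an assumption here, since the report does not name a specific measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


if __name__ == "__main__":
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # points rise together
    print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # one falls as the other rises
```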

4. Heat maps

Heat maps represent individual values from a data set on a matrix using variations in color or color
intensity. They often use color to help viewers compare and distinguish between data in two different
categories at a glance. They are useful for visualizing webpages, where the areas that users interact with most
are represented with “hot” colors, and the areas that receive the fewest clicks are presented in “cold” colors.

The two variations on the heat map are:
• Mosaic diagram
• Color map

[Figures: mosaic diagram and color map]
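The core of a heat map is a rule that turns each value into a color step. The sketch below maps a matrix of values onto a small number of intensity levels, from coldest to hottest; the click counts are invented and the linear, equal-width scale is an assumption of this example.

```python
def heat_levels(matrix, levels=4):
    """Map each value to an integer level from 0 (coldest) to levels - 1 (hottest)."""
    flat = [value for row in matrix for value in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1  # avoid dividing by zero when all values are equal
    return [
        [min(int((value - low) / span * levels), levels - 1) for value in row]
        for row in matrix
    ]


# Hypothetical click counts per page region, invented for illustration only.
CLICKS = [
    [0, 5, 10],
    [20, 30, 40],
]

if __name__ == "__main__":
    for row in heat_levels(CLICKS):
        print(row)
```

Each level would then be assigned a color, for example from blue for level 0 to red for the top level.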

5. Line charts

These are used to display changes or trends in data over a period of time. They are especially
useful for showcasing relationships, acceleration, deceleration, and volatility in a data set.

[Figure: line chart]

6. Bubble charts

These graphics display three-dimensional data and accentuate data in scatter plots and
maps. Their purpose is to highlight nominal comparisons and ranking relationships. The
size and color of the bubbles represent a dimension that, along with the data, is very useful for visually
stressing specific values. The two variations on the bubble chart are:

• The bubble plot: used to show a variable in three dimensions, position coordinates (x, y)
  and size.
• Bubble map: used to visualize three-dimensional values for geographic regions.

7. Radar charts

These are a form of representation built around a regular polygon that is contained within a
circle, where the radii that guide the vertices are the axes over which the values are
represented. They are equivalent to graphics with parallel coordinates on polar coordinates. Typically,
they are used to represent the behavior of a metric over the course of a set time cycle, such as the hours
of the day, months of the year, or days of the week.
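The radar chart's geometry can be made concrete with a little trigonometry: each value is placed along its own axis, and the axes are spaced evenly around the circle. In the sketch below, starting at the top and moving clockwise is an assumed convention, and the function name radar_vertices is hypothetical.

```python
from math import cos, pi, sin


def radar_vertices(values, max_value):
    """Return the (x, y) point for each value on a unit-radius radar chart.

    Axis 0 points straight up; later axes follow clockwise at equal angles.
    A value equal to `max_value` lands on the enclosing circle.
    """
    n = len(values)
    points = []
    for i, value in enumerate(values):
        angle = pi / 2 - 2 * pi * i / n
        radius = value / max_value
        points.append((radius * cos(angle), radius * sin(angle)))
    return points


if __name__ == "__main__":
    # Hypothetical metric measured at four points in a cycle.
    for x, y in radar_vertices([10, 5, 10, 5], max_value=10):
        print(f"({x:+.2f}, {y:+.2f})")
```

Joining the returned points in order, and closing the shape back to the first point, gives the familiar radar polygon.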

8. Waterfall charts

These help us understand the cumulative effect of positive and negative values on
variables in a sequential fashion.

[Figure: waterfall chart running from Start through steps A to K to End, with falls and rises marked]
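The cumulative effect that a waterfall chart shows is a running total: each bar starts where the previous one ended. The sketch below computes where every floating bar begins and ends; the starting value and the changes are invented, and the tuple layout returned by waterfall is an assumption of this example.

```python
from itertools import accumulate


def waterfall(start, changes):
    """Return the floating bars and the final total of a waterfall chart.

    Each bar is (label, bottom, top, kind), where kind is 'rise' or 'fall'.
    """
    totals = list(accumulate((value for _, value in changes), initial=start))
    bars = []
    for (label, value), before, after in zip(changes, totals, totals[1:]):
        kind = "rise" if value >= 0 else "fall"
        bars.append((label, min(before, after), max(before, after), kind))
    return bars, totals[-1]


if __name__ == "__main__":
    # Hypothetical changes, invented for illustration only.
    bars, end = waterfall(100, [("A", 50), ("B", -30), ("C", 20)])
    for bar in bars:
        print(bar)
    print("End:", end)
```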

9. Tree maps

Tree maps display hierarchical data (in a tree structure) as a set of nested rectangles that
occupy surface areas proportional to the value of the variable they represent. Each tree
branch is given a rectangle, which is then tiled with smaller rectangles that represent
secondary branches. The finished product is an intuitive, dynamic visual of a plane divided into
areas that are proportional to hierarchical data, which has been sorted by size and given a color key.

[Figure: tree with a root of 200, branches B (80) and C (120), and leaves D (30), E (50), F (20), G (40), H (60)]
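The nesting of rectangles can be sketched with the simple "slice-and-dice" layout, which splits each rectangle among its children and alternates the split direction at every level. The report does not name a layout algorithm, so this choice is an assumption. The values come from the tree figure; assigning D and E to branch B and F, G, and H to branch C is inferred from the sums (30 + 50 = 80 and 20 + 40 + 60 = 120).

```python
def slice_layout(node, x, y, width, height, horizontal=True, out=None):
    """Give each leaf a rectangle (x, y, width, height) with area proportional to its value.

    `node` is a (name, value, children) tuple. Children divide the parent
    rectangle along one axis, and the axis alternates at each tree level.
    """
    if out is None:
        out = {}
    name, value, children = node
    if not children:
        out[name] = (x, y, width, height)
        return out
    offset = 0.0
    for child in children:
        fraction = child[1] / value
        if horizontal:
            slice_layout(child, x + offset, y, width * fraction, height, False, out)
            offset += width * fraction
        else:
            slice_layout(child, x, y + offset, width, height * fraction, True, out)
            offset += height * fraction
    return out


# Values from the tree figure; the parent-child grouping is inferred.
TREE = ("root", 200, [
    ("B", 80, [("D", 30, []), ("E", 50, [])]),
    ("C", 120, [("F", 20, []), ("G", 40, []), ("H", 60, [])]),
])

if __name__ == "__main__":
    for name, rect in slice_layout(TREE, 0, 0, 100, 100).items():
        print(name, rect)
```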
10. Area charts

These represent the relationship of a series over time, but unlike line charts, they can
represent volume. The three variations on the area chart are:

• Standard area: used to display or compare a progression over time.
• Stacked area: used to visualize relationships as part of the whole, thus demonstrating the
  contribution of each category to the cumulative total.
• 100% stacked area: used to communicate the distribution of categories as part of a whole,
  where the cumulative total does not matter.

[Figures: standard area, stacked area, and 100% stacked area charts]

Selecting the right graphic to effectively communicate through [...]

Choose a graphic that will capture the viewer’s attention for [...]
Represent the information in a simple, clear, and precise way [...]
Make it easy to compare data; highlight trends and differences [...]
Establish an order for the elements based on the quantity t[...]
Give the viewer a clear way to explore the graphic and [...]
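The difference between the stacked and 100% stacked area variations comes down to two small calculations: stacking each series on top of the previous one, and rescaling every time step so that the categories sum to 100. A minimal sketch follows, using invented series and hypothetical function names.

```python
def stack_series(series):
    """For each series, return the (lower, upper) edge of its band at every time step."""
    length = len(next(iter(series.values())))
    baseline = [0] * length
    bands = {}
    for name, values in series.items():
        upper = [base + value for base, value in zip(baseline, values)]
        bands[name] = list(zip(baseline, upper))
        baseline = upper  # the next series sits on top of this one
    return bands


def normalize_columns(series):
    """Rescale every time step so that the series sum to 100 (100% stacked)."""
    totals = [sum(column) for column in zip(*series.values())]
    return {
        name: [100 * value / total for value, total in zip(values, totals)]
        for name, values in series.items()
    }


# Hypothetical series over three time steps, invented for illustration only.
SERIES = {"Product A": [1, 2, 3], "Product B": [2, 2, 1]}

if __name__ == "__main__":
    print(stack_series(SERIES))
    print(normalize_columns(SERIES))
```

In the stacked result the top edge of the last band is the cumulative total, which is exactly the information the 100% version discards.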

