DSS Chapter SEVEN

Chapter 7 of HCS 410 discusses the importance of data visualization in Big Data analytics, emphasizing its role in simplifying complex data for better understanding and decision-making. It outlines various visualization techniques such as line plots, bar charts, and scatter plots, while also addressing the challenges and benefits associated with visualizing large datasets. The chapter highlights the need for appropriate visualization methods based on data type and the information being conveyed.

Uploaded by

nomore chikosi
HCS 410 DECISION SUPPORT SYSTEMS

CHAPTER 7: DATA VISUALISATION

7.1. INTRODUCTION
Big Data analytics plays a key role in Big Data applications by reducing the
size and complexity of the data. Visualisation is an important way of helping
users get a complete view of Big Data and discover the value in it. Big Data
analytics and visualisation should be integrated seamlessly so that they work
best in Big Data applications. Data visualisation communicates information
clearly and efficiently to users through information graphics such as tables
and charts. It helps users analyse a large amount of data in a simpler way, and
it makes complex data more accessible, understandable, and usable.

In addition, data visualisation represents data in a systematic form that
includes the attributes and variables of each unit of information.
Visualisation-based data discovery methods allow business users to mash up
disparate data sources to create custom analytical views. Advanced analytics
can be integrated into these methods to support the creation of interactive and
animated graphics on desktops, laptops, or mobile devices such as tablets and
smartphones.

7.2. VISUALISING BIG DATA


Today, organisations generate and collect data every minute. The huge amount of
generated data, known as Big Data, brings new challenges to visualisation
because of the speed, size and diversity of information that must be taken into
account. The volume, variety and velocity of such data require an organisation
to leave its technological comfort zone in order to derive intelligence for
effective decisions. New and more sophisticated visualisation techniques, based
on the core fundamentals of data analysis, take into account not only the
cardinality but also the structure and the origin of such data.

Data visualisation is applied in practically every field of knowledge.
Scientists in various disciplines use computer techniques to model complex
events and visualise phenomena that cannot be observed directly, such as
weather patterns, medical conditions or mathematical relationships. Data
visualisation provides an important suite of tools and techniques for gaining a
qualitative understanding of data.

7.3. TECHNIQUES FOR DATA VISUALISATION


The basic techniques are the following plots:

Line Plot
The simplest technique, a line plot, is used to plot the relationship or
dependence of one variable on another. In most plotting libraries, plotting the
relationship between the two variables needs only a single call to a plot
function.

Bar Chart
Bar charts are used for comparing the quantities of different categories or
groups. The value of each category is represented by a bar. Bars can be drawn
vertically or horizontally, with the length or height of each bar representing
the value.
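
As a rough illustration of the idea that bar length encodes value, the sketch
below draws a horizontal bar chart as plain text. The function name
text_bar_chart and the sales figures are hypothetical and are not taken from
the chapter; a real plotting library would normally be used instead.

```python
def text_bar_chart(values, width=20, mark="#"):
    """Return one text line per category; bar length is proportional to value."""
    if not values:
        return []
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        # Scale each bar against the largest value so the longest bar is `width`.
        bar_length = round(value / largest * width) if largest else 0
        lines.append(f"{label:<{label_width}} | {mark * bar_length} {value}")
    return lines


# Made-up sales figures per region, used only to exercise the function.
sales = {"North": 40, "South": 30, "East": 10}
chart = text_bar_chart(sales)
for line in chart:
    print(line)
```

Because every bar is scaled against the largest category, the chart compares
categories with each other rather than showing absolute magnitudes.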

Pie and Donut Charts
There is much debate around the value of pie and donut charts. As a rule, they are
used to compare the parts of a whole and are most effective when there are
limited components and when text and percentages are included to describe the
content. However, they can be difficult to interpret because the human eye has a
hard time estimating areas and comparing visual angles.

Histogram Plot
A histogram represents the distribution of a continuous variable over a given
interval or period of time, and is one of the most frequently used data
visualisation techniques in machine learning. It plots the data by chunking it
into intervals called 'bins' and showing how many values fall into each bin. It
is used to inspect the underlying frequency distribution, outliers, skewness,
and so on.
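
The binning step can be sketched as follows. This is a minimal illustration
that assumes equal-width bins; the function name histogram_counts and the
sample ages are invented for the example.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    low, high = min(values), max(values)
    bin_width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in values:
        if bin_width == 0:
            index = 0  # all values are identical, so use a single bin
        else:
            # The maximum value is placed in the last bin, not a new one.
            index = min(int((value - low) / bin_width), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up ages; the bins are [21, 31), [31, 41), [41, 51) and [51, 61].
ages = [21, 22, 23, 25, 31, 34, 35, 48, 50, 61]
counts = histogram_counts(ages, n_bins=4)
print(counts)
```

The number of bins is a choice: too few bins hide the shape of the
distribution, while too many make it look noisy.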

Scatter Plot
Another common visualisation technique is the scatter plot, a two-dimensional
plot representing the joint variation of two data items. Each marker (a symbol
such as a dot, square or plus sign) represents an observation, and the position
of the marker indicates the values for that observation. When more than two
measures are assigned, a scatter plot matrix is produced: a series of scatter
plots displaying every possible pairing of the measures assigned to the
visualisation. Scatter plots are used for examining the relationship, or
correlation, between X and Y variables.

Kernel Density Estimation for Non-Parametric Data
If we have no knowledge about the population or the underlying distribution of
the data, a non-parametric approach is appropriate. Such data is best
visualised with a kernel density estimate, which approximates the probability
density function of a random variable. It is used when a parametric
distribution does not make much sense for the data and you want to avoid making
assumptions about it.
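
A minimal sketch of kernel density estimation is given below. It assumes a
Gaussian kernel and a bandwidth chosen by hand; neither choice comes from the
chapter, and the function name gaussian_kde and the sample values are
hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each sample contributes a small Gaussian bump centred on itself.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up measurements: a cluster near 2 and one isolated value at 5.
samples = [1.0, 1.2, 1.9, 2.1, 2.4, 5.0]
density = gaussian_kde(samples, bandwidth=0.5)
print(density(2.0), density(4.0))
```

The bandwidth plays the same role as the bin width of a histogram: a small
bandwidth gives a spiky curve, and a large one smooths away detail.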

Box and Whisker Plot for Large Data
A binned box plot with whiskers shows the distribution of large data and makes
outliers easy to see. In essence, it is a graphical display of five statistics
(the minimum, lower quartile, median, upper quartile and maximum) that
summarise the distribution of a set of data. The lower quartile (25th
percentile) is represented by the lower edge of the box, and the upper quartile
(75th percentile) is represented by the upper edge of the box. The median (50th
percentile) is represented by a central line that divides the box into two
sections. Extreme values are represented by whiskers that extend out from the
edges of the box. Box plots are often used to understand the outliers in the
data.
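
The five statistics behind a box plot can be computed with Python's standard
statistics module, as sketched below. The 1.5 x IQR rule used to flag outliers
is a common convention and an assumption here, since the chapter does not
define one; the function names and the sample data are also invented.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # Quartile conventions differ; "inclusive" is one of the two methods
    # offered by statistics.quantiles.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def iqr_outliers(values):
    """Flag values more than 1.5 * IQR beyond the box (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    fence = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - fence or v > q3 + fence]


# Made-up waiting times in minutes, with one unusually long wait.
waiting_times = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = five_number_summary(waiting_times)
outliers = iqr_outliers(waiting_times)
print(summary)
print(outliers)
```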

Word Clouds and Network Diagrams for Unstructured Data


The variety of big data brings challenges because semi-structured and
unstructured data require new visualisation techniques. A word cloud visual
represents the frequency of a word within a body of text with its relative size in
the cloud. This technique is used on unstructured data as a way to display high- or
low-frequency words.
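
The mapping from word frequency to relative size can be sketched as follows.
This is only an illustration: the function name word_sizes, the font size range
and the linear scaling are assumptions, and the sample text is made up.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size that grows with its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in counts.items()
    }


# Made-up text: "data" appears three times, "big" twice, "chart" once.
sizes = word_sizes("Data data DATA big big chart")
print(sizes)
```

In practice, very common words such as "the" and "and" are usually removed
first, otherwise they dominate the cloud.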

Another visualisation technique that can be used for semi-structured or
unstructured data is the network diagram. Network diagrams represent
relationships as nodes (individual actors within the network) and ties
(relationships between the individuals). They are used in many applications, for
example for analysis of social networks or mapping product sales across
geographic areas.
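
The data behind a network diagram is simply a list of ties between nodes. The
sketch below stores made-up ties in an adjacency structure and counts the ties
of each node (its degree), which a diagram might show as node size. The names
build_network, ties and degree are hypothetical.

```python
from collections import defaultdict


def build_network(ties):
    """Build an undirected network: each node maps to the set of its neighbours."""
    neighbours = defaultdict(set)
    for a, b in ties:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


# Made-up social ties between four people.
ties = [("Ann", "Bob"), ("Ann", "Cara"), ("Ann", "Dev"), ("Cara", "Dev")]
network = build_network(ties)

# The degree of a node is the number of ties it has.
degree = {node: len(adjacent) for node, adjacent in network.items()}
print(degree)
```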

Correlation Matrices
A correlation matrix allows quick identification of relationships between
variables, even in big data, because it can be read at a glance. Basically, a
correlation matrix is a table showing correlation coefficients between
variables: each cell in the table represents the relationship between two
variables. Correlation matrices are used to summarise data, as an input into a
more advanced analysis, and as a diagnostic for advanced analyses.
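
A correlation matrix can be sketched in a few lines. The example assumes the
Pearson correlation coefficient, which the chapter does not name explicitly,
and uses invented measurements; the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of a with b."""
    names = list(columns)
    return {
        a: {b: pearson(columns[a], columns[b]) for b in names} for a in names
    }


# Made-up measurements for four patients.
columns = {
    "weight": [60, 70, 80, 90],
    "blood_pressure": [110, 120, 130, 140],
    "daily_steps": [9, 7, 5, 3],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```

The matrix is symmetric and its diagonal is always 1, because every variable is
perfectly correlated with itself.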

Data visualisation can be a valuable addition to any presentation and the
quickest path to understanding your data. The process of visualising data can
also be both enjoyable and challenging. However, with so many techniques
available, it is easy to present the information using the wrong tool. To
choose the most appropriate visualisation technique, you need to understand the
data, its type and composition, what information you are trying to convey to
your audience, and how viewers process visual information. Sometimes a simple
line plot does the job and saves the time and effort of plotting the data with
advanced Big Data techniques. Understand your data, and it will reveal its
hidden value to you.

7.4. REPRESENTING DATA VISUALLY


• Number all diagrams
• Label all diagrams
• Ensure that units of measurement on axes are clearly labelled
• Place any explanatory information in footnotes below the visual
• Check layouts to ensure maximum clarity

7.5. BENEFITS OF DATA VISUALISATION


The table below shows the benefits of data visualisation.

7.6. PROBLEMS WITH DATA VISUALISATION


Big Data visualisation also faces the following problems:
• Visual noise: Most of the objects in a dataset are too close or too similar
to one another, so users cannot distinguish them as separate objects on the
screen.
• Information loss: Reducing the visible data set helps, but it leads to
information loss.
• Large image perception: Data visualisation methods are limited not only by
the aspect ratio and resolution of the device, but also by the limits of human
perception.
• High rate of image change: Users observing the data cannot react to the
number or intensity of the changes on the display.
• High performance requirements: These are hardly noticeable in static
visualisation, where speed requirements are lower, but they become significant
when the display must update quickly.
• It can misrepresent information if an incorrect visual representation is
made.
• It can be distracting if the visual data is distorted or used excessively.

7.7. USES OF DATA VISUALISATION


Data visualisation has many uses, and each type of data visualisation can be
used in different ways. Some of the most common uses are described below.

Changes over time


This is perhaps the most basic and common use of data visualisation, but that
does not mean it is not valuable. It is the most common because most data has
an element of time involved. Therefore, the first step in many data analyses is
to see how the data trends over time.

Determining frequency
Frequency is also a fairly basic use of data visualisation because it also applies to
data that involves time. If time is involved, it is logical that you should determine
how often the relevant events happen over time.

Determining relationships (correlations)


Identifying correlations is an extremely valuable use of data visualisation. It is
extremely difficult to determine the relationship between two variables without a
visualisation, yet it is important to be aware of relationships in data. This is a
great example of the value of data visualisation in data analysis.

Examining a network
An example of examining a network with data visualisation can be seen in market
research. Marketing professionals need to know which audiences to target with
their message, so they analyse the entire market to identify audience clusters,
bridges between the clusters, influencers within clusters, and outliers.

Scheduling
When planning out a schedule or timeline for a complex project, things can get
confusing. A Gantt chart solves that issue by clearly illustrating each task within
the project and how long it will take to complete.
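
The structure of a Gantt chart can be sketched as plain text: each task is a
bar that starts at its start day and is as long as its duration. The function
name text_gantt and the project tasks below are hypothetical.

```python
def text_gantt(tasks):
    """Draw one line per task; tasks are (name, start_day, duration_days)."""
    if not tasks:
        return []
    label_width = max(len(name) for name, _, _ in tasks)
    lines = []
    for name, start, duration in tasks:
        # Leading spaces place the bar at its start day on the time axis.
        lines.append(f"{name:<{label_width}} |{' ' * start}{'=' * duration}")
    return lines


# Made-up project plan: testing overlaps with the end of the build.
tasks = [("Design", 0, 3), ("Build", 3, 5), ("Test", 6, 4)]
gantt = text_gantt(tasks)
for line in gantt:
    print(line)
```

Overlapping bars show at a glance which tasks run in parallel.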

Analysing value and risk


Determining complex metrics such as value and risk requires many different
variables to be factored in, making it almost impossible to see accurately with
a plain spreadsheet. Data visualisation can be as simple as colour-coding a
formula to show which opportunities are valuable and which are risky.

7.8. DIMENSIONALITY
Dimensionality in statistics refers to how many attributes a dataset has. For
example, healthcare data is notorious for having a vast number of variables
(e.g. blood pressure, weight, cholesterol level). In an ideal world, this data
could be represented in a spreadsheet, with one column representing each
dimension. In practice, this is difficult to do, in part because many variables
are inter-related (like weight and blood pressure).

7.8.1. High Dimensional Data


High dimensional means that the number of dimensions is staggeringly high, so
high that calculations become extremely difficult. With high dimensional data,
the number of features can exceed the number of observations. For example,
microarrays, which measure gene expression, can contain tens or hundreds of
samples, while each sample can contain tens of thousands of genes.

Task: Read more on the main challenges of High Dimensional Data.
