UNIT 5 BDT.pptx


CSP31A Big Data Analytics
School of Computer Engineering and Technology
Big Data Analytics (PE-1)
(Computer Engineering and Technology)
(TY B.Tech)
UNIT V
Big Data Visualization Techniques
Big Data Visualization Techniques:
Introduction to data visualization, Data visualization factors, Challenges in Data Visualization, Analytics Techniques: Basic charts, Scatter plots, Histogram; Advanced visualization Techniques: Tree Map, Circle packing, Sunburst, Circular Network Diagram, Parallel Coordinates, Streamgraph; Plots, Graphs, Networks, Hierarchies, Reports; Introduction to D3.js
Case study: Google Analytics / Twitter Analytics
Power BI
Introduction to Big Data Visualization
• Big data visualization is a crucial aspect of working with large and
complex datasets.
• It involves representing data in visual forms like charts, graphs, maps,
and other visual elements to help users understand and extract insights
from the data more effectively.
• As the volume, variety, and velocity of data continue to increase,
visualization techniques become essential for making sense of the
information contained within big data sets.
• Big data often contains vast amounts of information that can be challenging to comprehend through traditional methods.
• Visualization helps in identifying patterns, trends, outliers, and relationships within the data.
• It simplifies complex data for decision-makers, making it easier to draw meaningful conclusions.
Why is big data visualization important?
The short answer is that humans don’t have the capability to quickly make sense of large volumes of raw
statistical information. Our eyes are not drawn to numbers, but to colors and patterns, so if we see a chart, we
can quickly identify trends and patterns, and understand the meanings behind them.

The longer answer is that it enables us to:

1. Review large amounts of data
- The graphical form enables us to make sense of large amounts of data much faster than going over raw numbers.
2. Spot trends
- Spotting trends in raw data is difficult, but big data visualization techniques make it much easier and faster. This matters because a trend that is spotted early is an opportunity that can be acted upon.
3. Identify correlations
- Big data visualization enables us to explore entire data sets to gather insights; identifying patterns and relationships in data can give businesses a significant competitive advantage.
4. Present the data to others
- Visualization is also an effective way to communicate insights to others, because it conveys meaning quickly and in a way that is easy to understand.
Data visualization factors

• Clarity: Ensure the dataset is complete and relevant. This enables the data scientist to apply the new patterns obtained from the data in the relevant places.
• Accuracy: Ensure you use an appropriate graphical representation to convey the intended message.
• Efficiency: Use efficient visualization techniques that highlight all the relevant data points.
There are some basic factors that one needs to be aware of before visualizing the data:
• The visual effect: the use of appropriate shapes, colors, and sizes to represent the analyzed data
• The coordinate system: organizes the data points within the provided coordinates
• The data types and scale: determine the type of data, for example numeric or categorical, and how it is scaled
• The informative interpretation: makes the visuals effective and easily interpretable by using labels, titles, legends, and pointers
• Diversity and heterogeneity in big data create a big problem when visualizing that data
• Analysis speed is the most challenging factor in big data analysis
• To handle big data scalability, cloud computing and advanced GUIs are combined with big data platforms
• Data is usually unstructured; to visualize it, tables, text, trees, graphs, and other metadata are used
• Providing massive parallelization is a challenge in big data visualization
• High complexity and high dimensionality arise during the discovery process due to the huge amount of data
• It is difficult to design new big data visualization tools that are efficient
• Due to the large size and dimensions of big data, visualization becomes more challenging
Challenges to Big Data visualization
• Data Volume:
• Big data sets can be massive, containing billions or even trillions of data points. Visualizing such large volumes of
data can strain the computational resources and slow down rendering times.
• Data Variety:
• Big data often includes a wide variety of data types, including structured and unstructured data, text, images,
sensor data, and more. Visualizing this diverse data requires flexibility in visualization techniques.
• Data Velocity:
• Real-time or near-real-time data streams are common in big data applications. Visualizations need to handle
constant updates and present information as it arrives, which can be challenging for traditional visualization tools.
• Data Quality:
• Big data can suffer from data quality issues such as missing values, outliers, noise, and inconsistencies.
Visualizations need to address and possibly filter out these issues while providing accurate insights.
• Scalability:
• Traditional data visualization tools may not scale well to accommodate large and growing data sets.
Scalable visualization solutions are required to handle big data effectively.
• Interactivity:
• Users often expect interactive features in big data visualizations to explore data from different
perspectives. Building interactive features that perform well with large data sets can be complex.
• Comprehension and Cognitive Load:
• With large and complex data sets, there's a risk of overwhelming users with too much information. Effective big
data visualizations must strike a balance between showing detailed insights and avoiding cognitive overload.
• Performance and Rendering:
• Rendering large data sets in real-time can strain both hardware and software resources. Achieving acceptable
performance while visualizing big data is a constant challenge.
• Data Integration:
• Big data often comes from diverse sources and platforms. Integrating these data sources for meaningful
visualization can be complex and may require data preprocessing and transformation.
• Security and Privacy:
• Big data may contain sensitive or confidential information. Protecting data security and privacy while visualizing
data is crucial, especially in regulated industries.
• Dimensionality Reduction:
• Visualizing high-dimensional data can be challenging. Techniques for reducing the dimensionality while
preserving important information are necessary for effective big data visualization.
• Tool Selection:
• Choosing the right visualization tools and software for big data can be a challenge. Not all tools are well-suited
for handling large and complex data sets, so making the right selection is crucial.
• User Expertise:
• Users may lack the expertise to interpret complex visualizations, especially in domains with specialized
knowledge requirements. Designing visualizations that are understandable to the intended audience is essential.
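Several of the challenges above (data volume, performance and rendering, cognitive load) are commonly eased by aggregating points before drawing them. The following is a minimal sketch in plain JavaScript, assuming the data is an array of numeric [x, y] pairs inside a known rectangular extent; binPoints and the 10 x 10 grid are illustrative choices, not part of any library. The renderer then draws at most cols * rows cells, however many points arrive.

```javascript
// Aggregate many (x, y) points into a fixed grid of counts so that a renderer
// draws at most cols * rows cells instead of one mark per point.
// binPoints is a hypothetical helper written for this example.
function binPoints(points, { xMin, xMax, yMin, yMax, cols, rows }) {
  // grid[row][col] holds the number of points that fall in that cell.
  const grid = Array.from({ length: rows }, () => new Array(cols).fill(0));
  const xStep = (xMax - xMin) / cols;
  const yStep = (yMax - yMin) / rows;
  for (const [x, y] of points) {
    // Ignore points outside the chosen extent.
    if (x < xMin || x > xMax || y < yMin || y > yMax) continue;
    // Clamp so that a value equal to the maximum lands in the last cell.
    const col = Math.min(Math.floor((x - xMin) / xStep), cols - 1);
    const row = Math.min(Math.floor((y - yMin) / yStep), rows - 1);
    grid[row][col] += 1;
  }
  return grid;
}

// Usage: 100,000 random points in the unit square reduced to a 10 x 10 grid.
const randomPoints = Array.from({ length: 100000 }, () => [Math.random(), Math.random()]);
const cells = binPoints(randomPoints, { xMin: 0, xMax: 1, yMin: 0, yMax: 1, cols: 10, rows: 10 });
// Every point falls inside the extent, so the counts sum to 100000.
console.log(cells.flat().reduce((a, b) => a + b, 0));
```

The cell counts can then be drawn as a heatmap, so the cost of rendering depends on the grid size rather than on the number of records.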
Analytical techniques used in Big Data visualization
• Analytical techniques play a crucial role in extracting meaningful insights from big data visualizations.
These techniques help analysts and data scientists uncover patterns, relationships, and trends within large
and complex datasets.
• Descriptive Analytics:
• Descriptive analytics involves summarizing and aggregating data to provide an overview of its characteristics. This includes basic statistics such as measures of central tendency (mean, median, mode) and measures of spread such as the range.
• Exploratory Data Analysis (EDA):
• EDA techniques help in uncovering patterns and trends within the data. Common EDA methods
include data profiling, scatter plots, box plots, and histograms to gain initial insights into the data
distribution.
• Correlation Analysis:
• Correlation analysis helps identify relationships between variables. Techniques like Pearson's
correlation coefficient and Spearman's rank correlation are used to measure the strength and direction
of relationships.
• Regression Analysis:
• Regression analysis is used to model and understand the relationship between a dependent variable and one or more independent variables. It is valuable for predicting outcomes and quantifying how variables are associated; on its own, it does not establish causality.
• Cluster Analysis:
• Cluster analysis is used to group data points into clusters based on their similarities. Techniques like k-means
clustering and hierarchical clustering can be applied to find hidden patterns in the data.
• Principal Component Analysis (PCA):
• PCA is a dimensionality reduction technique that helps reduce the complexity of high-dimensional data while
retaining as much relevant information as possible.
• Time Series Analysis:
• Time series analysis is used for data that varies with time. Techniques like moving averages, exponential
smoothing, and autoregressive integrated moving average (ARIMA) are employed to model and forecast
time-series data.
• Text Analysis and Natural Language Processing (NLP):
• When dealing with textual data, NLP techniques are used to extract insights. These techniques include sentiment
analysis, topic modeling, and text classification.
• Machine Learning and Predictive Analytics:
• Machine learning algorithms are employed for predictive analytics, classification, regression, and anomaly
detection. Techniques such as decision trees, random forests, support vector machines, and neural networks are
used.
• Graph Analysis:
• Graph analysis techniques are applied to data with complex relationships, such as social networks or network
data. Graph algorithms like centrality, community detection, and network connectivity analysis are used.
• Geospatial Analysis:
• Geospatial analytics is crucial when working with location-based data. It includes techniques for spatial
interpolation, hotspot analysis, and spatial clustering.
• Time Series Forecasting:
• Time series forecasting techniques like exponential smoothing, ARIMA, and Prophet are
used to make predictions based on historical time-series data.
• Anomaly Detection:
• Anomaly detection techniques identify outliers or irregularities in the data, which can be
indicative of errors, fraud, or other significant events.
• Simulation and Monte Carlo Analysis:
• Monte Carlo simulations are used to model complex systems, analyze risks, and estimate
probabilities by generating multiple random samples of data.
• Data Mining:
• Data mining techniques, including association rule mining and frequent pattern mining,
help discover hidden patterns and relationships in large datasets.
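Correlation analysis is easy to make concrete. Below is a minimal sketch of Pearson's correlation coefficient, r = cov(X, Y) / (σX σY), in plain JavaScript; the function name and the sample scores are hypothetical. Values near +1 or -1 indicate a strong linear relationship, and values near 0 a weak one. Spearman's rank correlation is the same calculation applied to the ranks of the values instead of the raw values.

```javascript
// Pearson's correlation coefficient between two equal-length numeric arrays.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("pearson() needs two arrays of the same length (at least 2)");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sumXY = 0;
  let sumXX = 0;
  let sumYY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sumXY += dx * dy; // joint variation
    sumXX += dx * dx; // variation of x
    sumYY += dy * dy; // variation of y
  }
  // The 1/(n - 1) factors in covariance and variance cancel out.
  // If either array is constant the result is NaN (correlation is undefined).
  return sumXY / Math.sqrt(sumXX * sumYY);
}

// Usage with hypothetical math and science scores for five students.
const mathScores = [55, 62, 70, 81, 90];
const scienceScores = [52, 60, 75, 78, 88];
console.log(pearson(mathScores, scienceScores).toFixed(3));
```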
Big Data Visualization Methods
Big data visualization methods include:
▪ Bar Charts and Histograms:
▪ Bar charts are used to represent categorical data, while histograms display the distribution of continuous data
by dividing it into bins or intervals. They are effective for summarizing data and identifying patterns.
▪ Line Charts:
▪ Line charts are ideal for visualizing trends over time. They connect data points with lines, making it easy to
see how values change continuously.
▪ Scatter Plots:
▪ Scatter plots display individual data points as dots on a two-dimensional graph. They are useful for
identifying relationships, correlations, and outliers in the data.
▪ Heatmaps:
▪ Heatmaps use color-coding to represent data density and relationships. They are commonly used in fields
like biology and finance to visualize large datasets.
▪ Tree Maps:
▪ Tree maps are hierarchical visualizations that display data in nested rectangles. They are effective for
showing the structure and distribution of data within categories and subcategories.
▪ Choropleth Maps:
▪ Choropleth maps use color shading to represent data values on geographic maps. They are frequently used
in applications involving regional or spatial data.
▪ Network Graphs:
▪ Network graphs visualize relationships between data points in a network or graph structure. Nodes
represent entities, and edges represent connections or interactions between them.
▪ Parallel Coordinates Plots:
▪ Parallel coordinates plots are used for visualizing high-dimensional data. Each axis represents a different
variable, and lines connecting points across the axes reveal patterns and relationships.
▪ Sankey Diagrams:
▪ Sankey diagrams show the flow of data or resources from one category to another. They are often used to
depict energy flows, financial transactions, or process analysis.
▪ Word Clouds:
▪ Word clouds display text data, with word size and color indicating word frequency. They are commonly
used to summarize and highlight significant terms in textual data.
▪ Dendrogram:
▪ Dendrograms are hierarchical tree-like visualizations used in clustering and taxonomy analysis. They show
the grouping and relationships between data points.
• Streamgraphs:
• Streamgraphs are used to display data over time, showing the evolution of multiple variables as stacked, flowing
streams. They provide insights into the changing composition of data.
• 3D Visualizations:
• 3D visualizations add an extra dimension to data representation, allowing for more complex and spatial data
exploration. They are used for various applications, such as geospatial data analysis.
• Virtual Reality (VR) and Augmented Reality (AR):
• Emerging technologies like VR and AR enable immersive data exploration, providing new ways to interact with
and visualize big data in three dimensions.
• Dashboard and Data Storytelling:
• Dashboards combine multiple visualizations to provide an integrated view of key metrics and data. Data
storytelling involves creating a narrative around visualizations to convey insights effectively.
• Interactive Visualizations:
• Interactive visualizations allow users to explore data by interacting with charts and graphs, changing parameters,
and filtering data dynamically.
• Real-time Visualization:
• Real-time visualizations continuously update data as it streams in, providing immediate insights and allowing for
monitoring and decision-making in real-time.
Bar chart
• Bar charts are similar to column charts, but with reversed axes (the bars run horizontally), and the number of bars can be much larger.
Line chart
• Line charts show how values change relative to a continuous variable, in most cases either time or money.
Pie chart
• Pie charts are used to compare the parts of a whole with the
angle and the arc being proportional to the value represented —
they are most effective when combined with text and
percentages to describe the content.
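The rule that each slice's angle is proportional to its value can be written directly. This is a minimal sketch, assuming non-negative values; pieSlices is a hypothetical helper, although D3's d3-shape module offers d3.pie(), which computes similar start and end angles. The percent field supports the text labels that make pie charts effective.

```javascript
// Convert values into pie slices whose angles are proportional to their share of the total.
function pieSlices(values) {
  const total = values.reduce((a, b) => a + b, 0);
  let start = 0;
  return values.map((value) => {
    const angle = (value / total) * 2 * Math.PI; // arc angle in radians
    const slice = {
      value,
      startAngle: start,
      endAngle: start + angle,
      percent: (100 * value) / total, // for a text label next to the slice
    };
    start += angle;
    return slice;
  });
}

// Usage: three hypothetical categories.
for (const s of pieSlices([50, 30, 20])) {
  console.log(`${s.value}: ${s.percent.toFixed(1)}% of the circle`);
}
```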
Scatter plot
• Scatter plots show two variables in the form of points on a coordinate system; by observing the distribution of the data points, we can deduce the correlation between the variables.
Box plot

• Box plots display a distribution of data across groups based on a five-number summary: minimum, first quartile, median, third quartile and maximum.
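The five-number summary behind a box plot can be computed in a few lines. This minimal sketch sorts the values and reads the quartiles with linear interpolation between neighbouring sorted values; that is one of several common quartile conventions (it matches the one d3.quantile uses), so other tools may report slightly different quartiles for small samples. The helper names are hypothetical.

```javascript
// p-quantile of an already sorted array, interpolating between neighbours.
function quantileSorted(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Minimum, first quartile, median, third quartile and maximum.
function fiveNumberSummary(values) {
  // Numeric sort; the default Array.prototype.sort compares values as strings.
  const sorted = [...values].sort((a, b) => a - b);
  return {
    min: sorted[0],
    q1: quantileSorted(sorted, 0.25),
    median: quantileSorted(sorted, 0.5),
    q3: quantileSorted(sorted, 0.75),
    max: sorted[sorted.length - 1],
  };
}

console.log(fiveNumberSummary([7, 1, 3, 9, 5]));
// { min: 1, q1: 3, median: 5, q3: 7, max: 9 }
```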
Heatmap

• Heatmaps show the relationship between two variables and provide a rating, displayed through various colors or color saturation; choropleths apply the same color encoding to the regions of a map.
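A heatmap's color or saturation encoding is a mapping from a numeric range to colors. The following minimal sketch interpolates linearly in RGB between two endpoint colors; the function name and the white-to-red endpoints are arbitrary choices for illustration. D3 offers ready-made versions of this idea through d3-scale and d3-scale-chromatic.

```javascript
// Linear interpolation between a and b for t in [0, 1].
function lerp(a, b, t) {
  return a + (b - a) * t;
}

// Map a value in [min, max] to an RGB color between `from` and `to`.
function heatColor(value, min, max, from = [255, 255, 255], to = [178, 24, 43]) {
  // Normalise to [0, 1], clamping values outside the domain.
  const t = Math.max(0, Math.min(1, (value - min) / (max - min)));
  const [r, g, b] = from.map((c, i) => Math.round(lerp(c, to[i], t)));
  return `rgb(${r}, ${g}, ${b})`;
}

// Usage: color each cell of a small hypothetical 2 x 3 grid of ratings (0 to 10).
const ratings = [
  [1, 5, 9],
  [3, 7, 10],
];
console.log(ratings.map((row) => row.map((v) => heatColor(v, 0, 10))));
```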
Histogram

• Histograms represent the distribution of a continuous variable by counting how many values fall into each of a series of intervals (bins); they show where the values are concentrated, what the extremes are and whether there are any gaps or unusual values.
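Building a histogram means choosing bins and counting the values that fall into each. This minimal sketch uses equal-width bins over the data's range and finds the extent with a plain loop (spreading a very large array into Math.min can exceed the call-stack limit); histogram is a hypothetical helper, and d3-array's d3.bin() offers a library version.

```javascript
// Equal-width histogram: returns bins with [x0, x1) bounds and a count.
function histogram(values, binCount) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  // If every value is identical, fall back to a width of 1 to avoid dividing by zero.
  const width = (max - min) / binCount || 1;
  const bins = Array.from({ length: binCount }, (_, i) => ({
    x0: min + i * width,
    x1: min + (i + 1) * width,
    count: 0,
  }));
  for (const v of values) {
    // The maximum value is placed in the last bin, which is closed on the right.
    const i = Math.min(Math.floor((v - min) / width), binCount - 1);
    bins[i].count += 1;
  }
  return bins;
}

// Usage: hypothetical page-load times in seconds, drawn as a text histogram.
const loadTimes = [0.8, 1.1, 1.2, 1.9, 2.0, 2.2, 2.4, 3.9, 4.1, 7.5];
for (const b of histogram(loadTimes, 4)) {
  console.log(`${b.x0.toFixed(2)}-${b.x1.toFixed(2)}: ${"#".repeat(b.count)}`);
}
```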
Advanced visualization Techniques:
Tree Map
• A Tree Map is a hierarchical visualization that represents data as
nested rectangles, with each level of the hierarchy corresponding to a
different level of nesting.
• The size and color of the rectangles can be used to convey additional
information about the data within each hierarchy.
• TreeMap is represented by a root rectangle, divided into groups, also
represented by the smaller rectangles which correspond to data
objects from a set.
• This method is used to visualize hierarchical data in two dimensions.
• The treemap method can be applied to large data volumes by iteratively representing the data layers for each level of the hierarchy.
• This method satisfies the large data volume criterion. However, it can show only two data dimensions, encoded by the size and color of the shapes, and it represents the data at a single moment in time, so the data variety and dynamicity criteria are not met.
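The recursive subdivision described above can be sketched with the simple "slice-and-dice" layout, which splits a rectangle among its children in proportion to their values and alternates the split direction at each level. This is one classic treemap algorithm, not the only one (D3's d3-hierarchy defaults to a squarified layout, which produces less elongated rectangles); the node format and function names are assumptions made for this example.

```javascript
// Node format assumed here: leaves are { name, value }, internal nodes are { name, children }.
function total(node) {
  return node.children ? node.children.reduce((s, c) => s + total(c), 0) : node.value;
}

// Slice-and-dice layout: returns one rectangle { name, depth, x, y, w, h } per node.
function sliceAndDice(node, x, y, w, h, depth = 0, out = []) {
  out.push({ name: node.name, depth, x, y, w, h });
  if (!node.children) return out;
  const sum = total(node);
  let offset = 0; // fraction of the parent rectangle already used
  for (const child of node.children) {
    // Shares only make sense for non-negative values; negative sizes are not handled.
    const share = total(child) / sum;
    if (depth % 2 === 0) {
      // Even depth: place children side by side (split the width).
      sliceAndDice(child, x + offset * w, y, share * w, h, depth + 1, out);
    } else {
      // Odd depth: stack children (split the height).
      sliceAndDice(child, x, y + offset * h, w, share * h, depth + 1, out);
    }
    offset += share;
  }
  return out;
}

// Usage with a small hypothetical hierarchy laid out in a 100 x 50 area.
const tree = {
  name: "root",
  children: [
    { name: "A", value: 30 },
    { name: "B", children: [{ name: "B1", value: 50 }, { name: "B2", value: 20 }] },
  ],
};
console.log(sliceAndDice(tree, 0, 0, 100, 50));
```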
Tree Map:
• Advantages:
(i) Hierarchical grouping clearly shows data relations;
(ii) Extreme outliers are immediately visible when highlighted with a special color.
• Disadvantages:
(i) Data must be hierarchical; moreover, Tree Maps are better for analyzing data sets where there is at least one important quantitative dimension with wide variations;
(ii) Not suitable for examining historical trends and time patterns;
(iii) The factor used for size calculation cannot have negative values.
Circle Packing
• Circle packing is a visualization technique that represents
hierarchical data as a series of nested circles, where each
circle's size and position indicate the hierarchy and
relationships among data elements.
• The sizes of circles are proportional to the values of the data
points they represent.
• It is an alternative to the Treemap that uses circles instead of rectangles.
• The primitive shape is a circle, which can contain nested circles, as presented in the figure.
• The main advantage of this method is that a lot of objects with many levels of hierarchy can be placed and perceived.
• The area of each circle represents an attribute such as quantity. Color may be used to represent a second fact.
• This method looks more beautiful, but it is not as space-efficient as a Treemap, as there is a lot of empty space within the circles.
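Because circle packing encodes a value as the circle's area rather than its radius, the radius has to grow with the square root of the value; otherwise large values look far bigger than they are. The sketch below covers only this sizing step (arranging the circles without overlap is the harder part, which D3 handles with d3.pack() in d3-hierarchy); radiusFor and its parameters are hypothetical.

```javascript
// Radius for a circle whose area is proportional to `value`.
// maxRadius is the radius given to the largest value, maxValue.
function radiusFor(value, maxValue, maxRadius) {
  if (value < 0) {
    // An area cannot be negative, so negative values cannot be encoded.
    throw new RangeError("circle sizes cannot encode negative values");
  }
  return maxRadius * Math.sqrt(value / maxValue);
}

// Usage: a value one quarter of the maximum gets half the radius,
// and therefore one quarter of the area.
console.log(radiusFor(25, 100, 50)); // 25
console.log(radiusFor(100, 100, 50)); // 50
```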
Circle Packing:
• Advantages:
(i) Many objects with many levels of hierarchy can be placed and perceived at once
• Disadvantages:
(i) Data must be hierarchical; moreover, like tree maps, it works best for data sets where there is at least one important quantitative dimension with wide variations;
(ii) Not suitable for examining historical trends and time patterns;
(iii) The factor used for size calculation cannot have negative values
Sunburst:
• A Sunburst diagram is a radial
representation of hierarchical data.
• It's useful for displaying data with
multiple levels of categorization.
• Each ring represents a level in the
hierarchy, and segments within the
rings show data distribution.
• This method is a derivative of the treemap, converted to a polar coordinate system.
• It is more flexible: the whole diagram can be redrawn by changing the radius and arc length.
• It allows understanding large
amounts of data using efficient and
intuitive graphic.
Sunburst:
• Advantages:
(i) Easily perceptible by most people
• Disadvantages:
(i) Data must be hierarchical; moreover, like Tree Maps, it works best for analyzing data sets where there is at least one important quantitative dimension with wide variations;
(ii) Not suitable for examining historical trends and time patterns;
(iii) The factor used for size calculation cannot have negative values
Let’s visualize the tabular data using Sunburst Charts in ChartExpo.
Category  Months  Country          Sales Orders
Sales     June    USA              75
Sales     June    United Kingdom   67
Sales     June    Germany          90
Sales     July    USA              75
Sales     July    United Kingdom   67
Sales     July    Germany          90
Sales     April   USA              67
Sales     April   United Kingdom   90
Sales     April   Germany          75
Sales     May     USA              67
Sales     May     United Kingdom   90
Sales     May     Germany          75
•To visualize the data above with a Sunburst chart (one of the hierarchical data visualization charts), copy the table into Google Sheets.
•Click the Add-on button>ChartExpo — Best Data Visualization Tool button>Open.
•Click the Create New Chart button to access your fully stocked library of charts.
•Click the Search Box and type “Sunburst Charts.” It should appear together with other charts.
•Select the sheet holding your data and select the Metrics option. Fill in the numerical values (in our scenario, Sales Orders).
•Select the Dimensions button and fill in the dimensional data (in our example, Category, Months, and Country).
•Visualizing your data with hierarchy charts does not have to be complex. ChartExpo makes the whole process seamless and easy.
•Finish the simple process by clicking the Create Chart button.
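What the Metrics and Dimensions settings do can also be sketched in code: group the flat table by Category, then Months, then Country, sum Sales Orders at every level, and give each node an arc proportional to its sum, one ring per level. The function names are hypothetical; in D3 the same steps are usually done with d3.hierarchy() and d3.partition() from d3-hierarchy. With this table, each month totals 232 of the 928 orders, so each month gets a quarter of its ring.

```javascript
// Rows of the sales table above: [Category, Months, Country, Sales Orders].
const rows = [
  ["Sales", "June", "USA", 75], ["Sales", "June", "United Kingdom", 67], ["Sales", "June", "Germany", 90],
  ["Sales", "July", "USA", 75], ["Sales", "July", "United Kingdom", 67], ["Sales", "July", "Germany", 90],
  ["Sales", "April", "USA", 67], ["Sales", "April", "United Kingdom", 90], ["Sales", "April", "Germany", 75],
  ["Sales", "May", "USA", 67], ["Sales", "May", "United Kingdom", 90], ["Sales", "May", "Germany", 75],
];

// Group rows into a tree (one level per dimension) and sum the metric at every node.
function nest(rows) {
  const root = { name: "root", value: 0, children: [] };
  for (const row of rows) {
    const path = row.slice(0, -1); // dimensions
    const value = row[row.length - 1]; // metric
    let node = root;
    node.value += value;
    for (const key of path) {
      let child = node.children.find((c) => c.name === key);
      if (!child) {
        child = { name: key, value: 0, children: [] };
        node.children.push(child);
      }
      child.value += value;
      node = child;
    }
  }
  return root;
}

// Give each node an arc proportional to its value; depth = ring number (0 = centre).
function partition(node, start = 0, end = 2 * Math.PI, depth = 0, out = []) {
  out.push({ name: node.name, depth, value: node.value, startAngle: start, endAngle: end });
  let angle = start;
  for (const child of node.children) {
    const span = ((end - start) * child.value) / node.value;
    partition(child, angle, angle + span, depth + 1, out);
    angle += span;
  }
  return out;
}

const arcs = partition(nest(rows));
for (const arc of arcs.filter((x) => x.depth === 2)) {
  const degrees = ((arc.endAngle - arc.startAngle) * 180) / Math.PI;
  console.log(`${arc.name}: ${degrees.toFixed(1)} degrees`);
}
```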
Circular Network Diagram:
• This chart visualizes the inter-relationships
between entities.
• Data objects are placed around a circle and linked by curves according to how strongly they are related.
• Color can be used to group the data into different categories, which aids in making comparisons and distinguishing groups.
• So, this method directly links several objects and shows the degree of their relationship.
• It is an elegant and compact way to show
networks of relations between items such
as products, individuals or groups.
Migration flows among regions of the world for four five-year periods between 1990 and 2010. Each flow takes the color of the region it originates from and ends at the region migrated to.
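Placing entities around a circle and linking them with curves comes down to a little trigonometry. This minimal sketch spaces n nodes evenly by angle and draws each link as an SVG quadratic Bézier curve pulled toward the centre; the region names and helper names are hypothetical, and a full chord diagram (such as the migration example) would also set each link's width from the size of the flow. D3's d3-chord module provides a complete chord layout.

```javascript
// Place entities evenly around a circle of radius r centred at (cx, cy).
function circularLayout(names, cx, cy, r) {
  return names.map((name, i) => {
    // Start at the top (12 o'clock) and move clockwise in SVG coordinates.
    const angle = (2 * Math.PI * i) / names.length - Math.PI / 2;
    return { name, x: cx + r * Math.cos(angle), y: cy + r * Math.sin(angle) };
  });
}

// SVG path data for a link drawn as a quadratic curve pulled toward the centre.
function linkPath(a, b, cx, cy) {
  return `M${a.x.toFixed(1)},${a.y.toFixed(1)} Q${cx},${cy} ${b.x.toFixed(1)},${b.y.toFixed(1)}`;
}

// Usage with hypothetical regions and one link between the first two.
const nodes = circularLayout(["Africa", "Asia", "Europe", "Americas"], 200, 200, 150);
console.log(linkPath(nodes[0], nodes[1], 200, 200)); // M200.0,50.0 Q200,200 350.0,200.0
```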
Circular Network Diagram:
• Advantages:
(i) Allows relative data representation, which can be easily perceived
(ii) Within the circle, the resolution varies linearly, increasing with radial position. This makes the center of the circle ideal for compactly displaying summary statistics or indicating points of interest.
• Disadvantages:
(i) The method may end in an imperceptible representation and may need the data objects to be regrouped on the screen
(ii) Objects with the smallest parameter weight can be suppressed by larger ones, ending up in a total mess on the diagram
Parallel Coordinates:
• “It is a widely used visualization technique for multivariate data and high-dimensional geometry.”
• Parallel Coordinates is a technique for visualizing high-dimensional data by drawing a series of parallel axes, each representing a different data dimension.
• Data points are connected by lines
to reveal relationships or patterns
in the data.
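The mapping behind a parallel coordinates plot can be sketched as follows: each dimension gets its own vertical axis, scaled independently to the data's range, and each record becomes a polyline through one point per axis. The record fields and the helper name are hypothetical. Many crossing lines between two adjacent axes suggest a negative relationship between those two dimensions.

```javascript
// Convert records into polylines for a parallel coordinates plot.
// dims lists the numeric fields to show, one vertical axis each.
function parallelCoordinates(records, dims, width, height) {
  // Each axis is scaled to its own [min, max] range.
  const extents = dims.map((d) => {
    const vals = records.map((r) => r[d]);
    return [Math.min(...vals), Math.max(...vals)];
  });
  const xStep = dims.length > 1 ? width / (dims.length - 1) : 0;
  return records.map((r) =>
    dims.map((d, i) => {
      const [lo, hi] = extents[i];
      // Constant columns are drawn in the middle of their axis.
      const t = hi === lo ? 0.5 : (r[d] - lo) / (hi - lo);
      // SVG y grows downward, so larger values are drawn higher up.
      return [i * xStep, height - t * height];
    })
  );
}

// Usage with hypothetical car records (three dimensions).
const cars = [
  { mpg: 30, hp: 90, weight: 2200 },
  { mpg: 18, hp: 200, weight: 3800 },
  { mpg: 24, hp: 130, weight: 3000 },
];
for (const line of parallelCoordinates(cars, ["mpg", "hp", "weight"], 400, 200)) {
  console.log(line.map(([x, y]) => `${x},${y.toFixed(0)}`).join(" "));
}
```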
Parallel Coordinates:
• Advantages:
(i) The ordering of the factors (axes) does not change which data is shown, although it does affect which relationships are easiest to see between adjacent axes
(ii) The method allows us to analyze both the whole data set at once and individual data objects
• Disadvantages:
(i) The method limits the number of factors that can be shown at once
(ii) Visualizing dynamic data requires changing the whole data representation
Streamgraph:
• A Streamgraph is a stacked area chart that visualizes
the change in composition of data over time.
• It shows how different categories contribute to a
whole over a period, with smooth flowing areas.
• It is a “type of a stacked area graph, which is
displaced around a central axis, resulting in flowing
and organic shape.”
• Series of similar events are displayed in the timeline.
• Unstructured text is supported by this method.
• This method supports one data dimension, but it can
be applied to large data.
This beautiful streamgraph, created
by data journalist Talia Bronshtein,
plots the nationality of different
immigrants to the United States over
200 years (1820 to 2015). And its
findings jump right out. For instance,
we can immediately see that during
the wartime period (1939-1945)
immigration to the US almost
stopped.

We can also see that while most immigrants before WW2 came from countries like Austria-Hungary, Italy, and Russia, by the late 2000s the bulk of immigration was coming from Asian and South American countries.
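The idea of a stacked area graph "displaced around a central axis" can be shown with the simplest streamgraph baseline, often called the silhouette offset: at each time step the whole stack is shifted down by half its total, so it is centred on y = 0. This is a minimal sketch with hypothetical data; D3's d3-shape provides the same idea as d3.stackOffsetSilhouette, and d3.stackOffsetWiggle for the smoother baseline typical of streamgraphs.

```javascript
// layers[k][t] is the value of category k at time step t.
// Returns bands[k][t] = [y0, y1]: the lower and upper edge of each stream.
function streamLayers(layers) {
  const steps = layers[0].length;
  const bands = layers.map(() => []);
  for (let t = 0; t < steps; t++) {
    const total = layers.reduce((sum, layer) => sum + layer[t], 0);
    // Silhouette baseline: start half the total below the central axis.
    let y = -total / 2;
    layers.forEach((layer, k) => {
      bands[k].push([y, y + layer[t]]);
      y += layer[t];
    });
  }
  return bands;
}

// Usage: three hypothetical categories over four time steps.
const bands = streamLayers([
  [5, 8, 6, 2],
  [3, 4, 9, 7],
  [1, 2, 3, 8],
]);
console.log(bands);
```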
Comparison of Big Data Visualization Methods based on data variety, large data volumes, and handling of changes in time-dependent data (+ indicates that the method satisfies the criterion)

Plots:
Plots are graphical representations of data, often used to visualize trends,
relationships, and distributions. Common types of plots include scatter plots, line
plots, bar charts, and histograms.

•Example: A scatter plot can be used to visualize the relationship between two
variables, such as plotting student scores on a math test (x-axis) against scores on
a science test (y-axis). Each point on the plot represents a student, and the pattern
of points can reveal the correlation between the two subjects.
Graphs:
Graphs are visual representations of data that emphasize the relationships
and connections between data points. They are used to model and analyze
networks, dependencies, and interactions.

•Example: A social network graph illustrates how people are connected on social media. Each individual is a node, and connections (friendships or interactions) between individuals are represented as edges or links. Graphs can help analyze network structures and identify influential nodes.
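One simple way to identify influential nodes is degree centrality: count how many connections each person has. The sketch below assumes an undirected friendship list with hypothetical names; richer measures such as betweenness centrality need more involved algorithms.

```javascript
// Degree centrality for an undirected graph given as a list of [a, b] edges.
function degreeCentrality(edges) {
  const degree = new Map();
  for (const [a, b] of edges) {
    degree.set(a, (degree.get(a) || 0) + 1);
    degree.set(b, (degree.get(b) || 0) + 1);
  }
  // Most connected first.
  return [...degree.entries()].sort((x, y) => y[1] - x[1]);
}

// Usage with a small hypothetical friendship network.
const friendships = [
  ["Asha", "Ben"],
  ["Asha", "Chen"],
  ["Asha", "Dev"],
  ["Ben", "Chen"],
];
console.log(degreeCentrality(friendships));
```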
Networks:
• Networks are specialized graphs used to model complex relationships,
such as social networks, transportation systems, and the World Wide
Web. They can also represent biological, economic, or communication
networks.
• Example: A transportation network graph represents a city's roads,
highways, and public transportation routes. Nodes can represent
intersections, bus stops, or train stations, while edges denote the
connections between them. Network analysis helps optimize
transportation routes and infrastructure.
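Route optimisation on a transportation network is usually a shortest-path problem. Below is a compact version of Dijkstra's algorithm, assuming non-negative travel times and a graph stored as nested objects; the stop names and times are hypothetical. The linear scan for the next node keeps the code short at the cost of speed on large networks (a priority queue is the usual improvement). For the sample network, the fastest route from Station to Harbour takes 8 time units via Park and Market.

```javascript
// graph[node][neighbour] = travel time; list undirected links in both directions.
function shortestPath(graph, source, target) {
  const dist = new Map([[source, 0]]); // best known time from source
  const prev = new Map(); // previous stop on the best route
  const visited = new Set();
  for (;;) {
    // Choose the unvisited node with the smallest known time.
    let current = null;
    for (const [node, d] of dist) {
      if (!visited.has(node) && (current === null || d < dist.get(current))) current = node;
    }
    if (current === null) return null; // target cannot be reached
    if (current === target) break;
    visited.add(current);
    for (const [next, time] of Object.entries(graph[current] || {})) {
      const candidate = dist.get(current) + time;
      if (!dist.has(next) || candidate < dist.get(next)) {
        dist.set(next, candidate);
        prev.set(next, current);
      }
    }
  }
  // Rebuild the route by walking back from the target.
  const path = [target];
  while (path[0] !== source) path.unshift(prev.get(path[0]));
  return { time: dist.get(target), path };
}

// Usage with a hypothetical four-stop network.
const network = {
  Station: { Market: 4, Park: 1 },
  Market: { Station: 4, Park: 2, Harbour: 5 },
  Park: { Station: 1, Market: 2, Harbour: 8 },
  Harbour: { Market: 5, Park: 8 },
};
console.log(shortestPath(network, "Station", "Harbour"));
```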
Reports:
• Reports are textual or visual summaries of data that provide detailed
insights, analysis, and recommendations. Reports often include tables,
charts, and visualizations to convey information effectively.
• Example: A financial report for a company may include tables
showing revenue, expenses, and profit figures, along with line charts
depicting revenue growth over time. Reports help stakeholders make
data-driven decisions by presenting information in a structured and
informative manner.
• Each of these components plays a unique role in data visualization,
and the choice of which to use depends on the nature of the data and
the objectives of the analysis.

• They can be combined and integrated to create comprehensive and insightful data presentations, aiding in data exploration, analysis, and decision-making.
Introduction to D3.js
• D3.js, or Data-Driven Documents, is a JavaScript library commonly
used for creating dynamic and interactive data visualizations in web
browsers.
• D3.js provides a powerful set of tools for binding data to DOM
(Document Object Model) elements and creating visualizations that
can be easily updated as the data changes.
• It is a JavaScript library for manipulating documents based on data, which:
• Draws charts
• Visualizes data
• Can be used to develop real-time dashboards
• Does not provide pre-defined charts
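Binding data to DOM elements is easiest to understand as a comparison between what is already drawn and the new data. The sketch below is a conceptual model in plain JavaScript, not D3's implementation: given the keys of existing elements and a key function, it works out which data items need new elements (enter), which existing elements should be updated, and which should be removed (exit). In D3 itself, selection.data(data, key) together with .enter() and .exit(), or selection.join() in newer versions, does this with real elements.

```javascript
// Conceptual keyed data join (not D3's internal code).
// currentKeys: keys of the elements already on the page.
// newData: the data that should be displayed; key(d) identifies each item.
function dataJoin(currentKeys, newData, key) {
  const current = new Set(currentKeys);
  const incoming = new Set(newData.map(key));
  return {
    enter: newData.filter((d) => !current.has(key(d))), // need new elements
    update: newData.filter((d) => current.has(key(d))), // existing elements to refresh
    exit: currentKeys.filter((k) => !incoming.has(k)), // elements to remove
  };
}

// Usage: bars for USA and Germany are on screen; the new data is USA and United Kingdom.
const result = dataJoin(
  ["USA", "Germany"],
  [{ country: "USA", orders: 75 }, { country: "United Kingdom", orders: 67 }],
  (d) => d.country
);
console.log(result.enter.map((d) => d.country)); // [ 'United Kingdom' ]
console.log(result.update.map((d) => d.country)); // [ 'USA' ]
console.log(result.exit); // [ 'Germany' ]
```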
Features of D3.js
• Extremely flexible
• You can draw any data driven shape for visualization
• Very fast and easy to use
• It makes use of existing web technologies like HTML, SVG and CSS
• Works with large datasets
• Declarative programming
• Supports data driven transformations to the document
• e.g. you can use it to generate an HTML table from an array of numbers, or use the same data to generate bar charts by applying different transformations
• Promotes code reusability
• Supports a wide variety of curve-generating functions
• Allows manipulation of the Document Object Model (DOM) with easy-to-use APIs
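The "HTML table from an array of numbers" example amounts to a data-to-markup transformation. In D3 this would be done by binding the array to table rows (for example d3.select("table").selectAll("tr").data(numbers).enter().append("tr")); the browser-free sketch below builds the equivalent markup as a string so it can run anywhere. toHtmlTable is a hypothetical helper.

```javascript
// Turn an array of numbers into HTML table markup: a header row plus one row per value.
function toHtmlTable(numbers) {
  const rows = numbers
    .map((n, i) => `  <tr><td>${i}</td><td>${n}</td></tr>`)
    .join("\n");
  return `<table>\n  <tr><th>Index</th><th>Value</th></tr>\n${rows}\n</table>`;
}

console.log(toHtmlTable([4, 8, 15]));
```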
Benefits of D3.js
• Data-Driven Approach:
• D3.js is built around a data-driven approach. This means visual elements are
directly tied to data, making it easy to bind data to DOM elements and create
dynamic visualizations. When the data changes, the visualization can be
updated seamlessly.
• Flexibility and Customization:
• D3.js provides a high level of flexibility and customization. Developers have
full control over the appearance and behavior of visual elements, allowing for
the creation of unique and tailored visualizations. This flexibility is crucial for
handling diverse datasets and accommodating specific design requirements.
• Versatility of Visualizations:
• D3.js supports a wide range of visualization types, from basic charts like bar
charts and line charts to more complex visualizations like hierarchical
visualizations, choropleth maps, and force-directed graphs. This versatility
makes D3.js suitable for various data representation needs.
• Active Community and Resources:
• D3.js has a large and active community of developers. This community
contributes to a wealth of resources, tutorials, examples, and plugins that are
available online. The active community also means that developers can find
support and solutions to common challenges.
• Open Source and Extensible:
• D3.js is an open-source library, which means it is freely available and can be
modified to suit specific project requirements. The open nature of D3.js
encourages collaboration and allows developers to contribute to its
development.
• Data Transformation and Manipulation:
• D3.js includes a range of functions for transforming and manipulating data.
These functions can be used to scale data, create axes, calculate layouts,
and perform other data-related tasks, providing a comprehensive toolkit for
data manipulation.
• Compatibility with Web Standards:
• D3.js is designed to work seamlessly with web standards, making it
compatible with modern web browsers. It can be integrated into web
applications and used alongside other web technologies.
• Interactive Capabilities:
• D3.js makes it easy to add interactivity to visualizations. Developers can respond to user interactions, such as mouse clicks or hovers, to provide additional information or enable exploration of the data. This interactivity enhances user engagement and understanding of the data.
• Scalable Vector Graphics (SVG):
• D3.js leverages SVG for drawing visual elements. SVG is a
web-standard format for vector graphics, providing a flexible and
scalable way to create shapes, lines, and other graphical elements
directly in the browser. SVG graphics are resolution-independent and
can be easily styled with CSS.
• Transition Animations:
• D3.js facilitates smooth transitions between different states in visualizations. Transitions can be used to create animated effects when data changes, enhancing the user experience and making visualizations more engaging and informative.
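A transition is essentially a sequence of interpolated values between an old and a new state, timed by an easing function. The sketch below computes those intermediate values for a single number using a cubic ease-in-out curve; tweenFrames is hypothetical, and in D3 the same effect comes from selection.transition().duration(ms) combined with easing functions from d3-ease such as d3.easeCubicInOut.

```javascript
// Cubic ease-in-out: slow start, faster middle, slow finish (t in [0, 1]).
function easeCubicInOut(t) {
  return t < 0.5 ? 4 * t * t * t : 1 - Math.pow(-2 * t + 2, 3) / 2;
}

// Values for animating a number (for example a bar height) from `from` to `to`.
function tweenFrames(from, to, frames, ease = easeCubicInOut) {
  return Array.from({ length: frames + 1 }, (_, i) => from + (to - from) * ease(i / frames));
}

console.log(tweenFrames(0, 100, 4)); // [ 0, 6.25, 50, 93.75, 100 ]
```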
Modules of D3.js
1.Selections (d3-selection):
•The d3-selection module is fundamental to D3.js, enabling the selection of DOM elements and
their manipulation. It includes methods for selecting, modifying, appending, and removing
elements in the DOM.

2.Scale (d3-scale):
•The d3-scale module provides functions for creating scales, which are used to map data values to
visual representation attributes, such as position and color. It includes linear, logarithmic, and
ordinal scales.

3.Axes (d3-axis):
•The d3-axis module facilitates the creation of axes for visualizations. It includes functions to
generate axes based on scale configurations and supports various orientations (top, bottom, left,
right).

4.Shapes (d3-shape):
•The d3-shape module assists in creating common shapes for visualizations, such as lines, areas,
curves, and symbols. It includes generators for these shapes, providing an easy way to represent
data graphically.

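5.Transition (d3-transition):
•The d3-transition module handles transitions between different states in visualizations. It provides an interface for animating changes to DOM attributes and styles by interpolating them over a specified duration, with optional easing.
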
6.Hierarchy (d3-hierarchy):
•The d3-hierarchy module is useful for working with hierarchical data structures. It
includes functions for creating tree layouts, partition layouts, and other hierarchical
visualizations.

7.Force Simulation (d3-force):
•The d3-force module is used for creating force-directed graphs. It provides a physics simulation engine that models forces such as gravity, charge, and links to arrange nodes in a visually pleasing way.

8.Color (d3-color and d3-scale-chromatic):
•The d3-color module provides utilities for working with colors, including color interpolation and conversion functions. The d3-scale-chromatic module includes pre-built color scales for creating visually appealing color mappings.

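9.Time (d3-time):
•The d3-time module includes functions for working with time-based data. It provides
time intervals (such as days, weeks and months) for computing and iterating over calendar
dates; date formatting is handled by d3-time-format and time scales by d3-scale.

10.Geo (d3-geo and d3-geo-projection):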
•The d3-geo module supports the creation of geographic visualizations, including map
projections, path generators, and utilities for working with GeoJSON data. The
d3-geo-projection module extends this functionality with additional map projections.

11.Voronoi Diagram (d3-voronoi):
•The d3-voronoi module helps in creating Voronoi diagrams, which partition a plane into
regions based on proximity to a set of points. This is useful for spatial data analysis.
(Newer D3 versions replace d3-voronoi with d3-delaunay.)

12.Brush (d3-brush):
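•The d3-brush module provides support for creating brushable areas, allowing users to
interactively select a one- or two-dimensional region within a visualization by clicking and
dragging. It is commonly used for filtering data and for focus-and-context charts; zooming
and panning themselves are provided by the separate d3-zoom module.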
Basic Web Concepts

• HTML
• Used to structure the content of the web page
• Document Object Model (DOM)
• After the browser parses the HTML, it converts it into a hierarchical tree of objects that scripts can read and modify
• Cascading Style Sheets (CSS)
• CSS styles make web pages pleasant to read with colors, sizes, fonts, etc.
• A style sheet language that describes the presentation of an HTML/XML document
• Scalable Vector Graphics (SVG)
• A way to render images on the web
• It is not a bitmap image but a way to create images from a text description
• Images created with SVG do not distort or lose quality when the browser is resized
• e.g. <rect x="100" y="50" width="300" height="200" fill="red"></rect>
• JavaScript
• Loosely typed, client-side scripting language that executes in the user's browser
• Provides interactivity to the web user interface
• Implements the ECMAScript standard
Basic Web Concepts
D3.js
• D3.js Selections
• We can select elements and apply various transformations on them.
• Data Binding
• We can bind data to DOM elements and then create, update or remove elements as the data changes.
• Creating SVG Elements
• Scalable Vector Graphics (SVG) is a way to render graphical elements and
images in the DOM.
• As SVG is vector-based, it is both lightweight and scalable.
• D3 visuals are most commonly built with SVG, so SVG is a core building
block when working with the library.
• Event Handling
• D3 supports built-in and custom events; a listener can be bound to any
DOM element with selection.on().

UNIT IV- Big Data Visualization Techniques
D3.js selections
• It is based on CSS selectors
• Allows us to select DOM elements (e.g. paragraph, div, head, body) within the web page by tag name,
attribute, class, ID, etc.
• select()
• Selects the first DOM element that matches the CSS selector
• e.g. d3.select("body") => Selects the “body” element from the DOM
• Different manipulations can be done to the selected DOM element
• e.g. d3.select("body").style("background-color", "black"); => Sets the page background to black
• d3.select("div.myclass").append("span"); => Appends a span to the first div with class “myclass”
• selectAll()
• Selects all elements that match the CSS selector
• e.g. d3.selectAll("p") => Selects all paragraphs
• Different manipulations can be done on the selected DOM elements
• e.g. d3.selectAll("p").style("color", "blue"); => Applies the style to all the selected
paragraphs
• Elements may be selected using a variety of predicates, including
containment, attribute values, class and ID
D3.js joins
• It is a way of joining DOM elements to data
• e.g. if we want one circle for each data element
• Instead of telling D3 to create circles, we tell D3 that the selection of circles
should correspond to the data; this concept is called a data join
• Data points joined to existing elements produce the “update”
selection (the inner, overlapping section in the figure)
• Leftover unbound data produce the “enter” selection (elements to be created)
• Any remaining unbound elements produce the “exit” selection (elements to be
removed)

D3.js joins

D3.js enter & Exit
• Below code recomputes the data join and maintains the desired correspondence between elements
and data.
• If the new dataset is smaller than the old one, the surplus elements end up in the exit selection and
get removed.
• If the new dataset is larger, the surplus data ends up in the enter selection and new nodes are added.
• If the new dataset is exactly the same size, then all the elements are simply updated with new
positions, and no elements are added or removed.
• If a given enter, update or exit selection happens to be empty, the corresponding code is a no-op
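
The enter/update/exit rules above can be reproduced with ordinary arrays. The sketch below mimics how D3 pairs elements with data by index when no key function is given. `partitionJoin` is a hypothetical helper that works on plain values, not on DOM nodes.

```javascript
// Hypothetical sketch of how a data join partitions elements and data
// when matching by index (D3's default when no key function is given).
function partitionJoin(elements, data) {
  const n = Math.min(elements.length, data.length);
  return {
    // Existing elements paired with a datum: these get updated.
    update: data.slice(0, n).map((datum, i) => ({ element: elements[i], datum })),
    // Surplus data with no element: new elements must be created.
    enter: data.slice(n),
    // Surplus elements with no datum: these should be removed.
    exit: elements.slice(n),
  };
}

// Three circles already on the page, but the new dataset has five values.
const grown = partitionJoin(["c0", "c1", "c2"], [10, 20, 30, 40, 50]);
console.log(grown.update.length, grown.enter, grown.exit); // 3 [ 40, 50 ] []

// The dataset shrinks to two values: one circle ends up in exit.
const shrunk = partitionJoin(["c0", "c1", "c2"], [10, 20]);
console.log(shrunk.update.length, shrunk.enter, shrunk.exit); // 2 [] [ 'c2' ]
```

In D3 itself (v5.8 and later) the whole pattern is usually written with `selection.join()`, e.g. `svg.selectAll("circle").data(values).join("circle")`. This call appends entering elements and removes exiting ones.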

Introduction to SVG
• SVG stands for Scalable Vector Graphics
• XML-based vector graphics format which provides ways to draw shapes like lines, circles, ellipses
etc.
• Features of SVG
• Vector-based image format that is text-based
• Similar in structure to HTML
• SVG properties can be specified as attributes
• Supported natively by modern web browsers

Introduction to SVG
• The code with the output is as follows

SVG Transformations

• SVG supports transformations through the “transform” attribute
• The following transformations are supported
• Translate: takes 2 values, tx and ty, which refer to the translation along the x and y axes respectively,
e.g. translate(20 20)
• Rotate: takes an angle and, optionally, cx and cy. The angle is the rotation in degrees, and cx and cy
specify the center of rotation in the x and y directions, e.g. rotate(60) rotates the figure by 60
degrees about the origin (cx = cy = 0)
• Scale: takes sx and sy, the scaling factors along the x and y axes respectively, e.g. scale(2 0.5);
if sy is omitted it equals sx
• Skew (skewX and skewY): each takes a single option, the skew angle; skewX skews along the x-axis
and skewY along the y-axis, e.g. skewX(20)
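
The effect of each transformation on a single point can be worked out by hand. This sketch applies translate, rotate (about the origin), scale and skewX to one [x, y] point. It follows the formulas in the SVG specification. The helper names are hypothetical and nothing is rendered. Because the SVG y-axis points down, a positive rotation angle appears clockwise on screen.

```javascript
// Hypothetical helpers: each returns the new [x, y] of a point
// after one SVG transformation (angles are in degrees, as in SVG).
const toRad = (deg) => (deg * Math.PI) / 180;

// translate(tx [ty]): ty defaults to 0.
const translate = ([x, y], tx, ty = 0) => [x + tx, y + ty];

// rotate(angle) about the origin (cx = cy = 0).
const rotate = ([x, y], angle) => {
  const a = toRad(angle);
  return [x * Math.cos(a) - y * Math.sin(a), x * Math.sin(a) + y * Math.cos(a)];
};

// scale(sx [sy]): sy defaults to sx.
const scale = ([x, y], sx, sy = sx) => [x * sx, y * sy];

// skewX(angle) shifts x in proportion to y; y is unchanged.
const skewX = ([x, y], angle) => [x + y * Math.tan(toRad(angle)), y];

console.log(translate([10, 10], 20, 20)); // [ 30, 30 ]
console.log(scale([10, 10], 2, 0.5));     // [ 20, 5 ]

// rotate(90): the point (10, 0) moves to approximately (0, 10),
// which is below the origin on screen, i.e. a clockwise turn.
console.log(rotate([10, 0], 90).map((v) => Math.round(v)));
```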

Example SVG transform

Example SVG transform

Transition

• A transition changes a selection smoothly from one state to another state over time
• transition() supports most of the selection methods, like attr(), style(), etc.
• It does not support append() or data(), so these must be called on the selection before
transition()

Transition Example

[Figure: the element before and after the transition]

Animation

• Transitions are a limited form of key-frame animation with only 2 key frames:
start and end
• duration() method
• duration(ms) sets how long the transition takes, in milliseconds (the default is 250 ms)
• e.g. Animation.html
• The values between start and end are interpolated using D3's built-in interpolators
• D3 supports the following interpolate methods
• interpolateNumber for numbers
• interpolateRgb for colors (RGB)
• interpolateString for strings (numbers embedded in the strings are interpolated)

• delay() method
• delay(ms) delays the start of the transition by the given number of milliseconds
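
During a transition, D3 calls an interpolator with a parameter t that runs from 0 to 1 over the duration. The `lerpNumber` and `lerpRgb` functions below are hypothetical, simplified stand-ins for interpolateNumber and interpolateRgb. They take colours as [r, g, b] arrays, whereas D3's real interpolators parse colour strings and handle more cases. The loop also samples t linearly. D3's default easing is cubic (d3.easeCubic), so real intermediate values differ.

```javascript
// Simplified, hypothetical stand-ins for interpolateNumber and interpolateRgb.
// Each returns a function of t (0 = start of the transition, 1 = end).
function lerpNumber(a, b) {
  return (t) => a * (1 - t) + b * t;
}

// Colours are given as [r, g, b] arrays here to keep the sketch short.
function lerpRgb([r0, g0, b0], [r1, g1, b1]) {
  return (t) => {
    const channel = (c0, c1) => Math.round(lerpNumber(c0, c1)(t));
    return `rgb(${channel(r0, r1)}, ${channel(g0, g1)}, ${channel(b0, b1)})`;
  };
}

// Simulate a 1000 ms transition with linear easing, sampled every 250 ms.
const duration = 1000;
const width = lerpNumber(50, 250);
const fill = lerpRgb([255, 0, 0], [0, 0, 255]); // red to blue
for (let elapsed = 0; elapsed <= duration; elapsed += 250) {
  const t = elapsed / duration;
  console.log(elapsed, width(t), fill(t));
}
```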

D3 Charts

• D3.js can be used to draw the following charts, among others
• Bar Chart
• Circle Chart
• Pie Chart
• Donut Chart
• Line Chart
• Bubble Chart etc.

D3 Charts

• Bar Chart
• A bar chart is used to compare values, frequencies or measures of something
across different discrete categories/groups
• e.g. Quarterly sales figures of an organization
• Bar charts can be drawn horizontally or vertically

D3 Charts

• Pie Chart
• Important functions
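• d3.pie() – computes the start and end angles of each slice from the data (it does not draw anything itself)
• d3.arc() – generates the SVG path for each arc (slice) of the pie chart
• d3.csv(url) – loads CSV file data; in D3 v5 and later it returns a Promise (d3.csv(url).then(callback)),
while earlier versions took a callback argument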
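
The pie layout step only turns values into angles, which d3.arc() then converts into SVG paths. `pieAngles` below is a hypothetical, simplified sketch of that first step. Angles are in radians, start at 0 and add up to 2π. The real d3.pie() also lays slices out in descending order of value by default and supports padAngle.

```javascript
// Hypothetical, simplified version of the pie layout step:
// each slice's angular width is proportional to its share of the total.
function pieAngles(values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  let angle = 0;
  return values.map((value) => {
    const startAngle = angle;
    angle += (value / total) * 2 * Math.PI;
    return { value, startAngle, endAngle: angle };
  });
}

// Hypothetical quarterly sales figures.
const slices = pieAngles([10, 20, 30, 40]);
for (const s of slices) {
  const degrees = ((s.endAngle - s.startAngle) * 180) / Math.PI;
  console.log(s.value, degrees.toFixed(1)); // e.g. 10 36.0
}
```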

D3 Charts

• Line Chart – important functions
• d3.scaleTime() – used for scaling time (especially on the x axis)
• d3.scaleLinear() – used for linear scaling (typically on the y axis)
• range() – specifies the output range, i.e. the pixel extent of the canvas available for drawing
• d3.line() – creates a line generator for the line chart
• Uses the .x() and .y() accessors to read the x and y value of each data point
• d3.csv(url) – used to load CSV file data
• x.domain() – specifies the range of data values on the x axis
• y.domain() – specifies the range of data values on the y axis
• An SVG path element actually draws the line on the canvas: the data is bound to a <path> and its
“d” attribute is set with the line generator, e.g. svg.append("path").datum(data).attr("d", line)
• d3.axisBottom(x) – adds the x axis at the bottom
• d3.axisLeft(y) – adds the y axis on the left
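
The functions above fit together as follows. Scales map data to pixel coordinates, and the line generator turns the scaled points into the "d" string of an SVG path. The sketch below rebuilds that pipeline in plain JavaScript so it runs without a browser. `scaleLinear` and `linePath` are hypothetical stand-ins, and the data values are made up. With straight segments, the string has the same "M x,y L x,y ..." form that d3.line() produces with its default linear curve.

```javascript
// Minimal linear scale (hypothetical stand-in): maps [d0, d1] onto [r0, r1].
const scaleLinear = ([d0, d1], [r0, r1]) => (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);

// Hypothetical stand-in for d3.line() with straight (linear) segments:
// "M" moves to the first point, each "L" draws a segment to the next.
function linePath(data, x, y) {
  return data.map((d, i) => `${i === 0 ? "M" : "L"}${x(d)},${y(d)}`).join("");
}

// Made-up monthly values; month index on x, value on y.
const data = [
  { month: 0, value: 0 },
  { month: 1, value: 50 },
  { month: 2, value: 100 },
];
const width = 400;
const height = 200;
const xScale = scaleLinear([0, 2], [0, width]);
const yScale = scaleLinear([0, 100], [height, 0]); // SVG y grows downward

const d = linePath(data, (p) => xScale(p.month), (p) => yScale(p.value));
console.log(d); // M0,200L200,100L400,0
```

In a D3 page, the equivalent is roughly `const line = d3.line().x(d => x(d.month)).y(d => y(d.value));` followed by `svg.append("path").datum(data).attr("d", line);`.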

Case study: Google Analytics

• There are numerous web analytics tools available today. Most of these web analytics
tools are part of hosted web analytics services offered by a handful of companies, such
as Google, IBM, and Mint.
• While most features of these web analytics tools are offered as free services, some
features are offered only with a premium subscription.

Case study: Google Analytics

• Features of Google Analytics: Google Analytics offers varied features that help businesses
get a hold of user behavior on their website, which helps them formulate their web strategies.
• Advertising and Campaign Performance
• Analysis and Testing
• Audience Characteristics and Behavior
• Cross-device and cross-platform measurement
• Product Integrations
• Sales and Conversions
• Site and App Performance

Case study: Google Analytics

• Google Analytics has various products under its umbrella, such as:
• Google Analytics
• Google Analytics 360
• Google Tag Manager
• Google BigQuery etc.
• These products have helped many big brands achieve their business milestones
with a new and innovative approach.

Case study: Google Analytics

• Example: Domino's and Google Analytics
• Realized an immediate 6% increase in monthly revenue
• Saved 80% year over year (YOY) in ad serving and operations costs
• Increased agility with streamlined tag management
• Obtained easy access to powerful reporting and customized
dashboards
• Achieved this by using Google Analytics Premium, Google Tag Manager,
and BigQuery to integrate digital data sources and CRM data

Case study: Google Analytics

• Attribution models are used to assign credit to touchpoints in the customer journey.
• Attribution aims to help marketers get a better picture of when and how various marketing
channels contribute to conversion events. That information can then be used to inform
future budget allocations.

Case study: Google Analytics

Case study: Google Analytics
• Last-click attribution.
• With this model, all the credit goes to the customer’s last touchpoint before converting.
• This model doesn’t take into consideration any other engagements the user may have had with the
company’s marketing efforts leading up to that last engagement.
• First-click attribution.
• It gives 100 percent of the credit to the first action the customer took on their conversion
journey.
• It ignores any subsequent engagements the customer may have had with other marketing
efforts before converting.
• Linear attribution.
• This multi-touch attribution model gives equal credit to each touchpoint along the user’s
path.

Case study: Google Analytics
• Time decay attribution.
• This model gives the touchpoints that occurred closer to the time of the
conversion more credit than touchpoints further back in time.
• The closer in time to the event, the more credit a touchpoint receives.
• U-shaped attribution.
• The first and last engagement get the most credit and the rest is assigned equally to the
touchpoints that occurred in between.
• In Google Analytics, the first and last engagements are each given 40 percent of the
credit and the other 20 percent is distributed equally across the middle interactions.
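
All five models split the credit for one conversion across the touchpoints of a path; they differ only in the weights. The sketch below computes those weights for each model. The function name, channel names and path are hypothetical. The U-shaped split follows the 40/20/40 rule above. The time-decay weights assume that credit halves for every 7 days between a touchpoint and the conversion. That half-life is chosen for illustration only.

```javascript
// Hypothetical sketch: split one conversion's credit (1.0) across an ordered
// list of touchpoints, earliest first. daysBefore[i] is how many days before
// the conversion touchpoint i occurred (used only by time decay).
function attribute(touchpoints, model, daysBefore = []) {
  const n = touchpoints.length;
  let weights;
  switch (model) {
    case "lastClick":
      weights = touchpoints.map((_, i) => (i === n - 1 ? 1 : 0));
      break;
    case "firstClick":
      weights = touchpoints.map((_, i) => (i === 0 ? 1 : 0));
      break;
    case "linear":
      weights = touchpoints.map(() => 1 / n);
      break;
    case "timeDecay": {
      // Assumed half-life of 7 days: credit halves for every 7 days of age.
      const raw = daysBefore.map((d) => Math.pow(0.5, d / 7));
      const sum = raw.reduce((a, b) => a + b, 0);
      weights = raw.map((w) => w / sum);
      break;
    }
    case "uShaped":
      if (n <= 2) {
        weights = touchpoints.map(() => 1 / n); // too short for a middle section
      } else {
        // 40% first, 40% last, remaining 20% shared by the middle touchpoints.
        weights = touchpoints.map((_, i) => (i === 0 || i === n - 1 ? 0.4 : 0.2 / (n - 2)));
      }
      break;
    default:
      throw new Error(`unknown model: ${model}`);
  }
  return touchpoints.map((channel, i) => ({ channel, credit: weights[i] }));
}

const path = ["Display ad", "Organic search", "Email", "Paid search"];
console.log(attribute(path, "uShaped"));
// Display ad 0.4, Organic search 0.1, Email 0.1, Paid search 0.4
console.log(attribute(path, "timeDecay", [14, 7, 2, 0]));
```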

Case study: Google Analytics

• Funnel Reporting
Example: Trump Excel sales funnel report (funnel leakage)

Example: Chandoo, using the two major elements of every sales funnel report
(i.e. the value or count of opportunities, and the phases)

Case study: Google Analytics

• Google Analytics offers five different visualizations for analyzing the data in every report by default.
• Tabular Reports
• Pie Charts
• Performance
• Comparison
• Pivot tables

Case study: Google Analytics

• Tabular Reports
• The Core Reporting API gives you access to most of the report
data in Google Analytics.
• With the Core Reporting API you can:
• Build custom dashboards to display Google Analytics data.
• Save time by automating complex reporting tasks.
• Integrate your Google Analytics data with other business applications.
• There are 3 fundamental concepts underlying the Core
Reporting API:
• How reports relate to users and views (profiles).
• The structure of a report and how to build queries
• Working with the API response

Case study: Google Analytics

• Tabular Reports

Case study: Google Analytics

• Tabular Reports
Tables are the default way Google Analytics displays data, and they are easy to understand at first
glance.
The first column displays a dimension and the remaining columns show the metrics.

Case study: Google Analytics

• Tabular Reports
Adding a secondary dimension to the report.
Example: add ‘source’ as a secondary dimension to see the traffic source, as shown below:

Case study: Google Analytics

• Tabular Reports
Viewing a specific page of the site (filtering)
Example: type the keyword ‘basket’ from the URL into the search box, as shown below:

Case study: Google Analytics

• Tabular Reports
Using a regular expression (regex) to filter multiple pages, and applying an advanced filter to include or
exclude multiple pages using a regex.
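
The include/exclude logic of such a filter can be tried out with ordinary JavaScript regular expressions. The page paths below are made up, and the example is only an illustration. Google Analytics evaluates patterns with its own regex engine, so a pattern should still be checked in the report itself.

```javascript
// Made-up page paths, as they might appear in a "Page" dimension.
const pages = [
  "/home",
  "/basket",
  "/basket/checkout",
  "/checkout/confirm",
  "/blog/regex-tips",
];

// Include: pages whose path contains "basket" OR "checkout".
const include = /basket|checkout/;
const included = pages.filter((p) => include.test(p));
console.log(included); // [ '/basket', '/basket/checkout', '/checkout/confirm' ]

// Exclude: drop anything under /blog/.
const exclude = /^\/blog\//;
const kept = pages.filter((p) => !exclude.test(p));
console.log(kept.length); // 4
```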

Case study: Google Analytics

• Pie Charts

Case study: Google Analytics

• Pie Charts
The pie chart view displays only a single metric, which makes it easier to consume and digest.
For example, to see the bounce rate for each page on the website, select that metric from the drop-down in the pie
chart view; the resulting pie chart shows the bounce rate of all pages in a single display.

Case study: Google Analytics

• Pie Charts
Example: to see which pages resulted in a high exit rate

Case study: Google Analytics

• Performance
The performance view displays each URL's share of the total pageviews. The first column shows the number of
pageviews and the second column shows the percentage contribution of each page to the total.
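
The percentage column is simply each page's pageviews divided by the total, times 100. Here is a small sketch with invented counts:

```javascript
// Hypothetical pageview counts per URL.
const pageviews = { "/home": 500, "/products": 300, "/basket": 150, "/contact": 50 };

const total = Object.values(pageviews).reduce((sum, n) => sum + n, 0);

// One row per URL: the count and its percentage contribution to the total,
// sorted from largest to smallest contribution.
const rows = Object.entries(pageviews)
  .map(([url, views]) => ({ url, views, percent: (100 * views) / total }))
  .sort((a, b) => b.views - a.views);

for (const r of rows) {
  console.log(r.url.padEnd(10), String(r.views).padStart(5), `${r.percent.toFixed(1)}%`);
}
```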

Case study: Google Analytics

• Comparison
To see how the bounce rate differs for each specific page, use the comparison report. The image below shows how
the bounce rate varies across the pages with the most pageviews.

Case study: Google Analytics

• Pivot Table

Case study: Google Analytics

• Pivot Table: e.g. to see the performance of your online campaigns in each country.

Case study: Google Analytics

• In marketing, we have the concept of a purchase funnel. There are different stages within
the funnel that describe customer interactions. A basic purchase funnel includes
the following steps:
• Acquisition involves building awareness and acquiring user interest
• Behavior is when users engage with your business
• Conversion is when a user becomes a customer and transacts with
your business

• We can track what online behavior led to purchases and use that data to
make informed decisions about how to reach new and existing
customers.
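
Stage-to-stage conversion and leakage in such a funnel are simple ratios and differences. The counts below are invented for illustration:

```javascript
// Invented user counts for each stage of a basic purchase funnel.
const funnel = [
  { stage: "Acquisition", users: 10000 },
  { stage: "Behavior", users: 4000 },
  { stage: "Conversion", users: 500 },
];

// For each step, the share of users who continued and the number who dropped out.
const steps = funnel.slice(1).map((curr, i) => {
  const prev = funnel[i]; // slice(1) shifts indexes, so funnel[i] is the previous stage
  return {
    from: prev.stage,
    to: curr.stage,
    conversionRate: curr.users / prev.users,
    leakage: prev.users - curr.users,
  };
});

for (const s of steps) {
  console.log(`${s.from} -> ${s.to}: ${(100 * s.conversionRate).toFixed(1)}% continued, ${s.leakage} left`);
}

// Overall conversion from the first stage to the last.
const overall = funnel[funnel.length - 1].users / funnel[0].users;
console.log(`Overall: ${(100 * overall).toFixed(1)}%`); // Overall: 5.0%
```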

Case study: Google Analytics

• Different kinds of businesses can benefit from digital analytics:

✔ Publishers can use it to create a loyal, highly-engaged audience and to
better align on-site advertising with user interests.
✔ Ecommerce businesses can use digital analytics to understand
customers’ online purchasing behavior and better market their
products and services.
✔ Lead generation sites can collect user information for sales teams to
connect with potential leads.

Case study: Google Analytics

• How does Google Analytics work?

• Tracking a Website
▪ To track a website, you first have to create a Google Analytics account. Then you
need to add a small piece of JavaScript tracking code to each page on your site.
▪ Every time a user visits a web page, the tracking code collects anonymous
information about how that user interacted with the page.
▪ For example, the tracking code can show how many users visited a page, or how many
users bought an item by tracking whether they made it to the purchase confirmation
page.

Case study: Google Analytics

• JavaScript Tracking Code
• Tracking a Website
▪ The tracking code also collects information from the browser, such as:
▪ Language: the language the browser is set to
▪ Type of browser: e.g. Chrome, Internet Explorer, etc.
▪ Device
▪ Operating system
▪ Traffic source: what brought users to the site in the first place. This might be a
search engine, an advertisement they clicked on, or an email marketing campaign.

Case study: Google Analytics

• JavaScript Tracking Code
• Create an account (user ID and password) at google.com/analytics
• To find the tracking ID and code snippet:
• Sign in to your Analytics account.
• Click Admin.
• Select an account from the menu in the ACCOUNT column.
• Select a property from the menu in the PROPERTY column.
• Under PROPERTY, click Tracking Info > Tracking Code.

• Once you have successfully installed the Analytics tracking code, it
can take up to 24 hours for data such as traffic-referral information,
user characteristics, and browsing information to appear in your
reports.

Case study: Google Analytics

• JavaScript Tracking Code

Case study: Google Analytics

• Processing and Reporting
• When the tracking code collects data, it packages that information up and sends
it to Google Analytics to be processed into reports.
• When Analytics processes data, it aggregates and organizes the data based on
particular criteria like whether a user’s device is mobile or desktop, or which
browser they’re using.
• Once Analytics processes the data, it’s stored in a database where it can’t be
changed.

Case study: Google Analytics

• Google Analytics Setup

Case study: Google Analytics

• Google Analytics Interface

Case study: Google Analytics

• Goals
• At the View level, you can set Google Analytics “Goals.”
• Goals are a simple way to track conversions (or business objectives) from your
website.
• For example: A goal could be how many users signed up for an email
newsletter, or how many users purchased a product.

Case study: Google Analytics

• Permissions in GA

Case study: Twitter Analytics

• Twitter was described as the third most popular social network, behind
Facebook and MySpace (these are older figures, from around 2009 to 2011).
• At that time, Twitter had over 100 million active users, and about 50 million of
them logged in every day.
• In some people's opinion, Twitter is not so much a social network as
a place for marketers.

Case study: Twitter Analytics

• Twitter Analytics Tools:
• Twitter analytics tools are one of the best ways of measuring your online
presence on Twitter.
• Twitter analytics tools help website owners understand how much traffic
they receive from Twitter and how effective the Twitter integration on
their sites is.
• Twitter tools are designed to add value by presenting different ways to
visualize or analyze tweets, the people in the network, and the tweets from
the people in users’ networks.

Case study: Twitter Analytics

Twitter Analytics shows highlights for each month, which include
your top tweet, top mention, top follower, top media
tweet and top card tweet, as well as
summaries.

Case study: Twitter Analytics

Twitter displays analytics for your tweets
over the last 28 days,
but you can select a different time period
by clicking the Last 28 Days button on
the top right side of the page.

Case study: Twitter Analytics

In the graph, you can click through a
menu of your tweets arranged by four
categories:
• Tweets – your tweets in
reverse-chronological order
• Top tweets – the tweets that got the
most impressions
• Tweets and replies – your tweets
and replies by other Twitter users
• Promoted – any promoted tweets
that you published

Case study: Twitter Analytics

• Twitter Analytics Metrics:
• Impressions – the number of times a user is served a Tweet in timeline or
search results
• Engagements – the total number of times a user has interacted with a
tweet, includes clicks, retweets, replies, follows, likes, links, Twitter cards,
hashtags, embedded media, Twitter username, profile photo or expanding
the tweet
• Engagement rate – the number of engagements divided by the number of
impressions
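
Because engagement rate divides by impressions, a tweet with fewer impressions can still have the higher rate. Below is a sketch with made-up numbers. Returning 0 for a tweet with no impressions is a choice made here to avoid dividing by zero.

```javascript
// Hypothetical per-tweet numbers from a Twitter Analytics export.
const tweets = [
  { id: "t1", impressions: 2000, engagements: 60 },
  { id: "t2", impressions: 500, engagements: 40 },
  { id: "t3", impressions: 0, engagements: 0 },
];

// Engagement rate = engagements / impressions (0 when a tweet had no impressions).
const engagementRate = ({ impressions, engagements }) =>
  impressions === 0 ? 0 : engagements / impressions;

for (const t of tweets) {
  console.log(t.id, `${(100 * engagementRate(t)).toFixed(1)}%`);
}
// t1 3.0%, t2 8.0%, t3 0.0%
```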

Case study: Twitter Analytics

• 7 insights you can get from Twitter Analytics:
• Tweet Impressions
• What did you do differently in a month with higher impressions?
• Did you Tweet more frequently?
• Use the cumulative overview to compare monthly activity
• Tweet engagements and engagement rate
• Whether your tweets are getting more or fewer engagements
• Engagement rate = engagements / impressions
• Top Tweets
• Aggregate the learnings and see what your top tweets have in common.
• Are they all adopting the same brand voice?
• Do they all have an emoji in them?

Case study: Twitter Analytics

• 7 insights you can get from Twitter Analytics:
• Follower Growth
• Are your followers increasing or decreasing?
• How many new followers did you receive?
• On which days did you gain the most followers?
• Profile visits
• Number of visits to your profile
• The graph includes a comparison over 28 days
• Mentions
• View your @mentions
• Video content performance
• How people are responding to your videos

Case study: Twitter Analytics

• A few Twitter tools are listed here:
• TweepsMap
• Audiense
• Keyhole
• Twitter Counter
• Twenty Feet

Case study: Twitter Analytics

• TweepsMap
• It is a Twitter tool for analyzing and visualizing your Twitter network
• Useful in showing how your followers are distributed on a map, in terms
of percentages.

Case study: Twitter Analytics

• TweepsMap

Case study: Twitter Analytics

• This data can be viewed at state level or even city level

Case study: Twitter Analytics

• Audiense has features such as:
• Explore Your Twitter Community
• Determine the Best Time to Tweet
• Identify Influencers
• Target Your Audience with Precision
• Discover and Easily Follow Targeted Twitter Users

Case study: Twitter Analytics

• Audiense:

Case study: Twitter Analytics

• Audiense:

Case study: Twitter Analytics

• Keyhole
• Keyhole is a Twitter Analytics tool that enables you to tap into
Instagram data as well.
• With Keyhole you can track hashtags, influencers, high
impact data, and more.

Case study: Twitter Analytics

Case study: Twitter Analytics

• Twitonomy
• It is a powerful Twitter analytics platform
• This free service is actually very robust

Case study: Twitter Analytics

• Twitonomy

Case study: Twitter Analytics

• Twitter Counter
• It is a way to visualize and track the growth of your own
followers, and even compare your growth to the growth of
other users

Case study: Twitter Analytics

Graph showing follower
growth of the user during the
last month.

Case study: Twitter Analytics
Profile on Twitonomy->

Case study: Twitter Analytics

• Twenty Feet
• Twenty Feet is a powerful analytics platform that tracks and
graphs stats like Twitter mentions, followers, retweets, and
more.
• Twenty Feet also integrates with other services like Facebook,
bitly, Google Analytics, YouTube, and more.

Case study: Twitter Analytics

References
URLs:
1. https://www.scnsoft.com/blog/big-data-visualization-techniques
2. https://www.klipfolio.com/resources/articles/what-is-data-visualization
3. https://www.import.io/post/9-ways-make-big-data-visual/
4. https://chezvoila.com/blog/parallel/
5. https://datavizcatalogue.com/methods/parallel_coordinates.html
6. https://www.data-to-viz.com/graph/streamgraph.html
7. https://marketlytics.com/blog/google-analytics-data-visualizations/
8. https://twittertoolsbook.com/10-awesome-twitter-analytics-visualization-tools/
9. https://mumbaiunivercity.academia.edu/MCTA
10. https://business.twitter.com/en/blog/7-useful-insights-Twitter-analytics.html

• Technical Papers:
1. Analytical Review of Data Visualization Methods in Application to Big Data, Hindawi Publishing Corporation, Journal
of Electrical and Computer Engineering, Volume 2013, Article ID 969458, http://dx.doi.org/10.1155/2013/969458
2. Big Data and Visualization: Methods, Challenges and Technology Progress, Digital Technologies, 2015, Vol. 1, No. 1,
33-38, DOI:10.12691/dt-1-1-7
3. Google Analytics - Case study by Suraj Chande
