UNIT V
Data visualization
Basics of data visualization, Key points supported with data, Evolution of a graph, Common
representation methods, How to clean up a graphic, additional considerations.
Data Visualization
Importance of Data Visualization
As data volume and complexity increase, clear and impactful visuals are essential
for:
o Presentations
o Applications
o Business Intelligence (BI)
Goal: Communicate key ideas and rich data simply and effectively
Popular Data Visualization Tools
Open Source Tools:
R (Base, ggplot2, lattice) – Widely used for professional data visualization
GGobi / Rggobi – For visualizing complex data; command-line and R integration
Gnuplot – Command-line tool, popular in scientific computing
Processing – Java-like environment for customizable, high-quality graphics
D3.js – JavaScript library for interactive, web-based visualizations using HTML,
SVG, CSS
Modest Maps / OpenLayers – For creating and embedding interactive maps
Inkscape – Open source tool for enhancing visualizations; similar to Illustrator
Weave – Web-based data visualization platform
Commercial Tools:
Tableau – Easy-to-use BI and visualization tool
Spotfire (TIBCO) – Combines BI and interactive visual analytics
QlikView – User-friendly BI platform with strong visualization capabilities
Adobe Illustrator – Not a data visualization tool by itself, but used to enhance and
polish visualizations created in other tools
Key points supported with data
Importance of Visualization Over Tables
Charts are more effective than tables for emphasizing key insights.
Tables make it harder to quickly digest and interpret data.
To underscore this point, Gene Zelazny notes that the best way to highlight
data is to create a visual representation, such as a chart, graph, or other data
visualization.
On the flip side, using tables can downplay data, drawing less attention to it.
Visual Design Matters
The color scheme, labels, and sequence in a visual:
o Directly influence what viewers perceive as the main message.
Poorly designed visuals can mislead or confuse the audience.
Forty-five years of store opening data
Table with 45 years of store data – cluttered and hard to read
Thirty-five years of store opening data
Table with 35 years of store data – still challenging to read even after removing 10 years of
data.
Better Visualization:
A map of the U.S. showing store locations:
o More effective for showing growth, spread, and store types.
o Uses color and shading to differentiate between store types.
o Ideal for sponsor audiences – quickly communicates the big picture
Forty-five years of store opening data, shown as map
Evolution of a graph:
Visualization conveys data in a clear, intuitive, and compelling way and is more
effective than tables. It also enables data exploration and interaction, helping analysts
better understand patterns and performance.
Step-by-Step Example: Pricing Model Analysis
1. Initial Distribution View
Frequency distribution of user scores
Raw pricing data shows a right-skewed distribution.
Hard to interpret the spread of user scores (0–5).
2. Log Transformation
Frequency distribution with log of user score
Applying a log transformation gives a less skewed view.
Frequency distribution of new user scores
The rescaled version centers scores around a median of 2.0.
This creates a new user index for price sensitivity.
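The log transformation and rescaling in this step can be sketched in a few lines of Python. This is only an illustration: the scores are made up, the function name log_rescale is hypothetical, log(1 + x) is assumed so that a score of 0 stays defined, and "centering" is assumed to mean shifting the values so their median lands on 2.0. The notes do not say which rescaling was actually used.

```python
import math
from statistics import mean, median

def log_rescale(scores, target_median=2.0):
    """Log-transform non-negative scores, then shift them so the median equals target_median."""
    # Assumption: log(1 + s) is used so that a raw score of 0 remains defined.
    logged = [math.log1p(s) for s in scores]
    # Assumption: "centering" means a simple shift of the median.
    shift = target_median - median(logged)
    return [x + shift for x in logged]

# Made-up, right-skewed user scores on the 0-5 range.
raw_scores = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.8, 1.5, 3.0, 5.0]
new_scores = log_rescale(raw_scores)

# The gap between mean and median is a rough indicator of skew.
print("raw gap:", round(mean(raw_scores) - median(raw_scores), 3))
print("new gap:", round(mean(new_scores) - median(new_scores), 3))
print("new median:", round(median(new_scores), 3))
```

The mean-minus-median gap shrinks after the transformation, which is one simple way to see that the distribution has become less skewed.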
3. Stability Over Time
Graph of stability analysis for pricing
Analyzed how pricing scores change over time.
Found that scores are stable, mostly staying between 2 and 3, showing no
significant time-based variation.
4. Customer Loyalty and Pricing
Graph comparing the price in U.S. dollars with a customer loyalty score
Scatter plot shows positive correlation between loyalty score and price.
More loyal customers are less price sensitive and pay higher prices.
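The positive correlation seen in the scatter plot can also be checked numerically. The sketch below computes a Pearson correlation coefficient by hand; the loyalty and price values are invented for illustration and are not taken from the figure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up data: loyalty scores and prices paid in U.S. dollars.
loyalty = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
price = [10.0, 11.0, 11.5, 13.0, 13.5, 15.0, 16.0]

r = pearson(loyalty, price)
print(f"r = {r:.3f}")  # a value close to +1 indicates a strong positive correlation
```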
5. Rug Plot Enhancement
Graph comparing the price in U.S. dollars with a customer loyalty score (with rug
representation)
Figure adds a rug plot showing customer density by loyalty.
Most customers have loyalty scores between 1 and 3 and pay consistently high
prices.
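A rug plot draws one small tick per observation along an axis, so dense regions show up as clusters of ticks. The text-only sketch below approximates that idea by marking each column of a fixed-width line that contains at least one value. The function name text_rug and the loyalty scores are hypothetical; a plotting library would normally draw the rug directly under the scatter plot.

```python
def text_rug(values, low, high, width=40):
    """Return a one-line text rug: '|' marks each column holding at least one value."""
    cells = [" "] * width
    for v in values:
        if low <= v <= high:
            # Map the value to a column; the maximum value goes in the last column.
            col = min(int((v - low) / (high - low) * width), width - 1)
            cells[col] = "|"
    return "".join(cells)

# Made-up loyalty scores, mostly clustered between 1 and 3.
loyalty = [1.2, 1.5, 1.8, 2.0, 2.1, 2.4, 2.7, 2.9, 4.5]
rug = text_rug(loyalty, low=0, high=5, width=50)
print(rug)
print("0" + " " * 48 + "5")  # crude axis labels for the two ends of the line
```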
6. Proposed Dynamic Pricing Model
New proposed pricing model compared to prices in U.S. dollars with rug
Figure shows a curved pricing model:
→ Higher prices for loyal customers, lower for less loyal ones.
Rationale:
o Loyal customers tolerate higher prices.
o Less loyal customers need price incentives to stay.
Goal: Maximize revenue, reduce churn.
Tailoring Visuals to Audiences
For Technical Audiences:
Evolution of a graph, analyst example with supporting points
Shows detailed pricing distribution with log scale.
Annotations on:
o Model robustness
o Precision
o Production performance
For Sponsor Audiences:
Uses simple bar charts to show pricing by customer loyalty segment.
Focuses on:
o Business impact
o Cost savings
o Actionable insights
Common Representation Methods in Data Visualization
Purpose of Choosing the Right Chart
Different chart types are better suited for different types of data and messages.
Misused visuals can confuse the audience rather than inform them.
Key: Support the message, not distract from it.
Chart Types and Their Best Use Cases
Chart Type: Tips & Best Practices
Pie Charts
Best for showing simple part-to-whole relationships.
Use sparingly: max of 2–3 categories.
Intended for high-level sponsor audiences.
Most commonly misused chart type.
Bar Charts
Excellent for comparing categories or quantities.
Vertical bars: great for small labels (e.g., years).
Horizontal bars: better when labels are long.
Line Charts
Ideal for tracking trends over time.
Great for time series data.
Histograms
Show distribution and spread of data.
Helpful for initial exploration in modeling.
Scatterplots
Useful to visualize correlations between two variables.
Helps identify patterns and relationships.
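To make the histogram entry above concrete, the sketch below bins a handful of values into equal-width bins and prints a text bar for each bin. The scores are made up, and the helper name histogram is hypothetical; it is meant to show the binning idea, not a particular tool's behavior.

```python
def histogram(values, low, high, bins):
    """Count how many values fall into each of `bins` equal-width bins on [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # Put the maximum value into the last bin rather than one past the end.
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts

# Made-up user scores on the 0-5 range, skewed toward low values.
scores = [0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 1.3, 1.8, 2.6, 4.7]
counts = histogram(scores, low=0, high=5, bins=5)

for i, c in enumerate(counts):
    print(f"{i}-{i + 1}: {'#' * c}")
```

Even in text form, the long first bar and the short tail make the right skew visible at a glance, which is why histograms are useful for initial exploration.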
Audience Matters
Tailor visualizations to audience sophistication:
o Sponsors → high-level, simple visuals (e.g., pie, bar).
o Analysts/Data Scientists → detailed, technical visuals (e.g., histograms,
scatterplots).
Choosing the right type of chart is just as important as the data itself—it ensures clarity, supports
your narrative, and improves decision-making.
How to Clean Up a Graphic
Simplify visuals to highlight key insights.
Reduce "chart junk" — unnecessary elements that clutter the message without adding
value.
Common Types of Chart Junk (from Figure 12-28)
How to clean up a graphic, example 1 (before)
1. Horizontal Grid Lines
o Do not provide meaningful context in many cases.
o Can distract from data if overused.
2. Chunky Data Points
o Large shapes (e.g., square blocks) draw unnecessary attention.
o Offer no added value beyond showing the data point.
3. Excessive Emphasis Colors & Borders
o Thick borders and overly bold lines steal focus from the data.
o Trend lines should be easy to read but not overpowering.
4. No Context or Labels
o Missing legends or line labels leave the viewer guessing.
o Lack of a clear title or annotation weakens understanding.
5. Crowded Axis Labels
o Too many labels = visual clutter.
o Labeling every 2 or 5 units is overkill for general trends.
Best Practices to Clean a Graphic
How to clean up a graphic, example 1 (after)
Use of Emphasis & Color
Emphasis colors (e.g., bright green) should highlight key trends.
Use neutral or lighter tones for background data.
o Ex: SuperBox stores in green (highlighted), BigBox in gray (background).
Add Clear Labels & Titles
Add:
o Title that communicates the chart’s message.
o Legend to identify trends/lines.
o Axis labels that are readable and not overdone.
Minimize Visual Noise
Remove:
o Unnecessary borders
o Extra grid lines
o Non-informative markers
Increase White Space
White space gives the graphic room to breathe.
Helps the eye focus on what's important.
Alternate Visualization
How to clean up a graphic, example 1 (alternate “after” view)
Instead of showing two separate lines, graph the difference between SuperBox and
BigBox stores.
Focused, cleaner visual for comparing growth rates.
Choose the version based on the core message being communicated.
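The alternate view amounts to collapsing two series into one. A minimal sketch, assuming yearly store counts for each store type (the numbers below are invented, not read from the figure):

```python
# Made-up yearly store counts for the two store types.
years = [2008, 2009, 2010, 2011, 2012]
superbox = [10, 14, 19, 25, 32]
bigbox = [20, 22, 23, 25, 26]

# One series to plot instead of two: SuperBox minus BigBox for each year.
difference = [s - b for s, b in zip(superbox, bigbox)]

for year, diff in zip(years, difference):
    print(f"{year}: {diff:+d}")
```

Plotting only the difference puts the crossover point (where the value changes sign) at the center of attention, at the cost of hiding the absolute store counts.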
Additional Considerations in Data Visualization
Core Principle: Simplicity
Keep charts clean, simple, and free of distractions.
Always aim to support the key message with minimal visual clutter.
Minimize Chart Junk
Avoid unnecessary elements like:
o Heavy borders
o Overuse of color
o Excess labels or gridlines
Data-Ink Ratio
Definition: Proportion of a graphic's total ink that is used to show actual data rather than
decorative or structural elements.
Formula:
Data-Ink Ratio = (Data Ink) / (Total Ink Used)
Goal: Maximize the data-ink ratio to create data-rich, distraction-free visuals.
Use only essential ink that contributes directly to communicating the data.
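As a worked example of the formula, the sketch below compares a cluttered chart with a cleaned-up version. The "ink" amounts are hypothetical units (for instance, pixel counts); in practice the ratio is a guiding idea rather than something measured precisely.

```python
def data_ink_ratio(data_ink, total_ink):
    """Data-Ink Ratio = data ink / total ink used, for non-negative ink amounts."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must be between 0 and total_ink")
    return data_ink / total_ink

# Hypothetical ink amounts: the data marks stay the same, the clutter is removed.
before = data_ink_ratio(data_ink=1200, total_ink=4800)  # heavy borders, grid lines
after = data_ink_ratio(data_ink=1200, total_ink=1600)   # junk removed

print(f"before cleanup: {before:.2f}")
print(f"after cleanup:  {after:.2f}")
```

The data ink is unchanged in both cases; the ratio rises only because non-data ink was removed.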
Avoid 3D Charts
3D effects (depth, angles, shading) often make graphics:
o Harder to read
o Visually misleading
o Inaccurate in scale interpretation
Example Comparison:
Simple bar chart, with two dimensions
Simple 2D vertical bar chart
o Clear, readable, high-contrast bars
o Emphasis color (dark blue for SuperBox stores) aligns with the message
Misleading bar chart, with three dimensions
Same chart, shown in 3D
o Angled perspective distorts scale
o Makes it difficult to judge bar heights accurately
o Focus shifts from data to visual effects