Data Visualization With Matplotlib


UNIT – 4

Data Visualization with Matplotlib


Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way to see
and understand trends, outliers, and patterns in data.

Through data visualization, complex data can be communicated simply, helping people
understand and analyze information quickly.

Importance of Data Visualization

1. Enhanced Understanding: Visuals make it easier to understand data, especially large
and complex datasets, by converting numbers and statistics into visual formats that
highlight trends, patterns, and insights.
2. Better Decision-Making: Visualized data is often used in business, science, and
technology to make informed decisions based on clear, visual insights, facilitating
quicker response times and data-driven strategies.
3. Improved Data Accessibility: Data visualization makes data accessible to a broader
audience, allowing non-technical individuals to grasp complex data without requiring
specialized analytical skills.
4. Identifying Patterns and Trends: Through line charts, histograms, scatter plots, and
more, it becomes easier to see trends over time, identify seasonal patterns, or spot
correlations between variables.
5. Highlighting Outliers and Anomalies: Visualizations make it easy to identify anomalies
that may indicate errors, areas of interest, or opportunities for further analysis.

Types of Data Visualization

Data visualizations can be broadly divided into four major categories based on their purpose
and the type of insight they provide. These categories are:

1. Comparison Visualizations

These visualizations are used to compare values across different categories or groups,
highlighting the similarities, differences, and relative rankings of data points.

▪ Bar Charts and Column Charts: Display categorical data with bars of varying lengths.
▪ Line Charts: Used primarily for time series data, showing trends over time.
▪ Dot Plots: Show individual data points, good for smaller data sets and comparisons.
2. Composition Visualizations

Composition visualizations help illustrate the parts that make up a whole. They show how
individual data points or categories contribute to the total, often in percentage terms.

▪ Pie Charts: Display proportional data, commonly in percentage form.
▪ Stacked Bar/Column Charts: Show how sub-categories contribute to the whole, often
over different categories or time.
▪ Area Charts: Used to show cumulative values over time or categories.
▪ Tree Maps: Use nested rectangles to show hierarchical data in a proportionate manner.

3. Distribution Visualizations

Distribution visualizations reveal the spread of data points across a range, allowing you to see
patterns, concentrations, and outliers within a data set.

▪ Histograms: Display frequency distribution of data, ideal for continuous data.
▪ Box Plots: Show data spread with quartiles, medians, and outliers, making it easy to see
variability and anomalies.
▪ Density Plots: A smooth, continuous representation of data distribution, especially useful
for probability distribution analysis.
▪ Violin Plots: Combine density and box plots for more detailed distribution insights, often
for multiple categories.

4. Relationship Visualizations

Relationship visualizations examine the connections or correlations between two or more
variables, helping to identify trends, correlations, or clusters within data.

▪ Scatter Plots: Show relationships between two variables, useful for identifying
correlations and outliers.
▪ Bubble Charts: A variation of scatter plots where bubble size represents an additional
variable.
▪ Heat Maps: Use color to represent values in a matrix or grid layout, often used in
correlation matrices.
▪ Network Graphs: Display connections between nodes in a network, good for visualizing
relationships in social or communication networks.
Matplotlib

Matplotlib is a powerful and versatile Python library for creating static, animated, and interactive
visualizations.

It's a fundamental tool in data analysis and scientific research, allowing you to transform data
into insights by generating plots, charts, and other visual representations.

Matplotlib's flexibility and extensive customization options make it one of the most widely used
plotting libraries in Python.

Key Features of Matplotlib

1. Wide Variety of Plot Types: Matplotlib offers a range of plotting options, including line
charts, bar charts, histograms, scatter plots, pie charts, and more. This variety enables
users to represent data in multiple ways suited to different types of analysis.
2. Highly Customizable: Each aspect of a plot can be customized, from colors, line styles,
and markers to titles, labels, and legends. You can adjust every part of a figure to make it
more informative and visually appealing.
3. Integration with Other Libraries: Matplotlib works seamlessly with other Python
libraries like NumPy, Pandas, and SciPy, which allows for efficient handling and
visualization of large datasets.
4. Subplotting and Multi-Axis Plotting: It supports the creation of subplots (multiple plots
in a single figure), enabling users to compare multiple data visualizations in one view.
5. Interactive Capabilities: With matplotlib.pyplot, it can be used interactively in
Jupyter Notebooks or within Python scripts, and it supports user interaction for zooming,
panning, and real-time updates to plots.
6. High-Quality Output: Matplotlib can produce publication-quality visualizations and
export plots in various formats, including PNG, PDF, and SVG.

Basic Components of Matplotlib

• Figure: The entire canvas or window where one or more plots are drawn.
• Axes: The actual plotting area within a figure, where data is displayed against an x-axis
and a y-axis. A figure can contain multiple axes.
• Pyplot Interface: matplotlib.pyplot is a module within Matplotlib that provides
functions for creating and customizing plots. It's often imported as plt.
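
The relationship between these three components can be sketched as follows. This is a minimal illustration rather than part of the original notes; the variable names fig and ax and the sample data are assumptions chosen for the example.

```python
import matplotlib.pyplot as plt

# pyplot creates the Figure (the canvas) and one Axes (the plotting area)
fig, ax = plt.subplots()

# Data is drawn on the Axes, not on the Figure directly
ax.plot([1, 2, 3], [2, 4, 6])
ax.set_xlabel("x")
ax.set_ylabel("y")

# The Figure keeps track of every Axes it contains
print(len(fig.axes))  # 1
```

Calling plt.show() afterwards displays the figure, just as in the pyplot-only examples that follow.
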
Getting Started with Matplotlib

To get started with Matplotlib, install it via pip if it’s not already installed. The leading ! runs the command from a Jupyter notebook cell; omit it in a regular terminal:

! pip install matplotlib

Basic Example of a Line Plot

Here’s a simple example that uses Matplotlib to create a line chart:

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5] # X-axis values
y = [1, 4, 9, 16, 25] # Y-axis values (squares of x values)

# Create the plot
plt.plot(x, y)

# Display the plot
plt.show()

• plt.plot(x, y) plots a line connecting each (x, y) pair.
• plt.show() displays the plot.

A Simple Interactive Chart


An interactive chart in Matplotlib lets users work with the plot directly, for example by
zooming in, panning, or reading off the values of individual data points. This is often used in
exploratory data analysis to better understand data patterns and trends.

Benefits of Interactive Charts

• Exploratory Analysis: Interactive charts help users investigate data in real time.
• Flexibility: Users can adjust parameters and immediately see changes in data
representation.
• Enhanced Insights: Allows deeper analysis by focusing on specific data points, ranges,
or patterns dynamically.
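
One possible way to add interaction beyond the built-in zoom and pan tools is an event callback. The sketch below is an assumption-laden illustration, not the notes' own example: the names on_click and clicked_points are hypothetical, and the callback only fires when the figure is shown with an interactive backend (a desktop window or a notebook widget).

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")
ax.set_title("Click inside the plot")

clicked_points = []  # hypothetical store for the clicked coordinates


def on_click(event):
    """Record and print the data coordinates of a mouse click."""
    # event.inaxes is None when the click lands outside the plotting area
    if event.inaxes is not ax:
        return
    clicked_points.append((event.xdata, event.ydata))
    print(f"Clicked at x={event.xdata:.2f}, y={event.ydata:.2f}")


# Register the callback; Matplotlib returns a connection id
connection_id = fig.canvas.mpl_connect("button_press_event", on_click)
```

After plt.show() opens the window, each click inside the axes prints its data coordinates, while the toolbar continues to provide zooming and panning.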

Set the Properties of the Plot


In Matplotlib, you can set and customize various properties of the plot, including the title, labels,
axis properties, line styles, colors, and more. These customizations make your plots more
informative and visually appealing.

1. Set Title and Axis Labels


You can add a title to your plot and labels for both the x-axis and y-axis using plt.title(),
plt.xlabel(), and plt.ylabel().

2. Set Gridlines

You can display gridlines on the plot with plt.grid(), which makes it easier to read values
from the plot.

3. Customize Line Style, Marker, and Color

You can customize the appearance of the lines (like style, width, and color) and markers using
the plot() function's arguments.

4. Set the Legend

Use plt.legend() to add a legend to the plot to describe different data series or lines.

5. Adjust Axis Limits

Use plt.xlim() and plt.ylim() to manually set the range of the x and y axes.

6. Font Properties

You can modify the font size, style, and family for titles, labels, and legends.

Example: Customizing the Plot's Properties

import matplotlib.pyplot as plt # Importing the required library

# Data to plot
x = [0, 1, 2, 3, 4] # x-axis values
y = [0, 1, 4, 9, 16] # y-axis values

# Create a plot
plt.plot(x, y) # Plotting x vs y

# Setting the properties of the plot


plt.title("Simple Plot") # Setting the title of the plot
plt.xlabel("X-axis Label") # Label for the x-axis
plt.ylabel("Y-axis Label") # Label for the y-axis
plt.grid(True) # Enabling gridlines for better visualization
plt.xlim(-1, 5) # Setting the x-axis limits (from -1 to 5)
plt.ylim(-1, 20) # Setting the y-axis limits (from -1 to 20)
# Display the plot
plt.show() # Show the plot to the user

A second version of the same plot adds custom line, marker, font, and tick styling:

import matplotlib.pyplot as plt # Importing the required library

# Data to plot
x = [0, 1, 2, 3, 4] # x-axis values
y = [0, 1, 4, 9, 16] # y-axis values

# Create a plot with a custom line style and markers
plt.plot(x, y, color='purple', linestyle='--', marker='o',
         markerfacecolor='red', markeredgewidth=2, markersize=8)

# Setting the properties of the plot
# Custom title with font size, color, and weight
plt.title("Customized Plot", fontsize=16, color='green',
          fontweight='bold')
plt.xlabel("X-axis Label", fontsize=12, color='blue') # Custom label for x-axis
plt.ylabel("Y-axis Label", fontsize=12, color='blue') # Custom label for y-axis
# Customized gridlines; which='both' also covers minor ticks once they are
# enabled (for example with plt.minorticks_on())
plt.grid(True, which='both', color='gray', linestyle='-', linewidth=0.5)
plt.xlim(-1, 5) # Setting the x-axis limits (from -1 to 5)
plt.ylim(-1, 20) # Setting the y-axis limits (from -1 to 20)

# Customize ticks on x and y axes
# Custom x-tick labels with rotation and color
plt.xticks([0, 1, 2, 3, 4], rotation=45, fontsize=10, color='darkred')
plt.yticks(fontsize=10, color='darkred') # Custom y-tick labels

# Display the plot
plt.show() # Show the plot to the user

Matplotlib with NumPy

The matplotlib library works seamlessly with numpy, a library for numerical operations, which
makes data handling and visualization efficient and flexible.

How matplotlib and numpy Work Together:

• numpy helps in generating, manipulating, and performing operations on data (e.g.,
creating arrays, applying mathematical functions).
• matplotlib is then used to visualize this data. When combined, they provide a smooth
workflow to handle large datasets and plot them effectively.

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100) # Creates 100 points between 0 and 10
y = np.sin(x) # Apply a sine function to generate y values
plt.plot(x, y) # Plot y vs x (sin curve)
plt.xlabel("X-axis") # Label for the x-axis
plt.ylabel("Y-axis") # Label for the y-axis
plt.title("Sine Wave") # Title of the plot
plt.grid(True) # Display gridlines
plt.show() # Display the plot

Benefits of Using numpy with matplotlib:

1. Efficient Data Handling: numpy arrays are faster and more memory-efficient than
Python lists, especially for large datasets.
2. Mathematical Operations: numpy provides a wide range of mathematical functions
(e.g., sine, cosine, logarithms) to transform your data before plotting.
3. Flexibility: You can easily modify data, apply transformations, and visualize the results
using matplotlib.
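
As a rough sketch of the second and third benefits, the snippet below transforms an array with numpy functions and plots the results together. The damped sine curve and the variable names damped and envelope are assumptions chosen for illustration; they do not come from the notes.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# Vectorized operations act on the whole array at once, with no Python loop
envelope = np.exp(-x / 5)
damped = envelope * np.sin(2 * x)

fig, ax = plt.subplots()
ax.plot(x, damped, label="damped sine")
ax.plot(x, envelope, linestyle="--", label="envelope")
ax.plot(x, -envelope, linestyle="--")
ax.set_title("Transforming Data with numpy")
ax.legend()
```

Changing the transformation (for example the decay rate 5) and re-running the cell is all it takes to see the new curve.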

Working with Multiple Figures and Axes


In matplotlib, a Figure is essentially a container for all the elements of a plot, such as axes, titles,
labels, and legends.
An Axes represents a single plot or graph in the figure, and you can have multiple Axes within a
single Figure to create subplots or different charts.

When working with multiple plots or figures, you may want to create multiple figures and
multiple axes within each figure to visualize different datasets or to compare multiple plots side
by side.

Creating Multiple Figures:

• You can create multiple figures using plt.figure(). Each figure is independent and can
have its own axes and properties.

import matplotlib.pyplot as plt

# Create the first figure
plt.figure(1) # Figure 1
plt.plot([1, 2, 3], [4, 5, 6])
plt.title('First Figure')

# Create the second figure
plt.figure(2) # Figure 2
plt.plot([1, 2, 3], [6, 5, 4])
plt.title('Second Figure')

# Show the figures
plt.show()

Working with Multiple Axes within a Single Figure:

• You can place multiple axes (plots) within a single figure using plt.subplot() or
plt.subplots().

import matplotlib.pyplot as plt

# Create a figure with 2 rows and 2 columns of subplots
plt.subplot(2, 2, 1) # First subplot in a 2x2 grid
plt.plot([1, 2, 3], [4, 5, 6])
plt.title('Plot 1')

plt.subplot(2, 2, 2) # Second subplot in a 2x2 grid
plt.plot([1, 2, 3], [6, 5, 4])
plt.title('Plot 2')

plt.subplot(2, 2, 3) # Third subplot in a 2x2 grid
plt.plot([1, 2, 3], [1, 4, 9])
plt.title('Plot 3')

plt.subplot(2, 2, 4) # Fourth subplot in a 2x2 grid
plt.plot([1, 2, 3], [9, 16, 25])
plt.title('Plot 4')

# Adjust spacing between subplots, then show the plots
plt.tight_layout()
plt.show()
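
The same 2x2 grid can also be built with plt.subplots(), which returns the figure and all axes in one call. The following is a minimal sketch under that assumption; the names fig, axes, and series are chosen here for illustration.

```python
import matplotlib.pyplot as plt

# One call creates the Figure and a 2x2 array of Axes objects
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

x = [1, 2, 3]
series = [[4, 5, 6], [6, 5, 4], [1, 4, 9], [9, 16, 25]]

# axes.flat walks the 2x2 grid row by row
for number, (ax, y) in enumerate(zip(axes.flat, series), start=1):
    ax.plot(x, y)
    ax.set_title(f"Plot {number}")

fig.tight_layout()  # Adjust spacing between subplots
```

Because each Axes is an explicit object, this style tends to scale better than repeated plt.subplot() calls when a figure has many panels.
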
Adding Text
In Matplotlib, you can add text annotations to your plots using the text() method or annotate() for
advanced annotations. Adding text can help highlight specific points or provide extra
information directly on the graph.

Key Parameters:

• x, y: Coordinates where the text is placed.
• s: The text string.
• Additional styling such as fontsize, color, rotation.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, label='Linear Growth')
plt.text(3, 6, "Mid Point", fontsize=12, color='red')
plt.title("Adding Text Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()
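
For the annotate() variant mentioned above, a minimal sketch might look like this. The label position (1.5, 8) and the arrow style are assumptions picked for the example, not values from the notes.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

fig, ax = plt.subplots()
ax.plot(x, y, label="Linear Growth")

# xy is the point being annotated; xytext is where the label is drawn
note = ax.annotate(
    "Mid Point",
    xy=(3, 6),
    xytext=(1.5, 8),
    arrowprops=dict(arrowstyle="->", color="red"),
    fontsize=12,
    color="red",
)
ax.set_title("Annotate Example")
ax.legend()
```

Unlike text(), annotate() separates the annotated point from the label position and draws an arrow between them.
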
Adding a Grid
Grids are added using the grid() method. Grids help improve the readability of plots by aligning
the data points with reference lines.

Key Parameters:

• visible: Whether the grid is visible.
• color: The color of grid lines.
• linestyle: Style of grid lines ('--', ':', etc.).
• linewidth: Width of the grid lines.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, label='Linear Growth')
plt.grid(visible=True, color='gray', linestyle='--', linewidth=0.5)
plt.title("Adding Grid Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()
Adding a Legend
Legends describe the elements of a plot, especially when multiple datasets are visualized. They
are added using the legend() method.

Key Parameters:

• loc: Position of the legend (e.g., 'upper right', 'lower left').
• fontsize: Font size of legend text.
• title: Optional title for the legend box.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]

plt.plot(x, y1, label='Line 1', color='blue')
plt.plot(x, y2, label='Line 2', color='orange')
plt.legend(loc='upper left', fontsize=10, title='Legend')
plt.title("Adding Legend Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
Saving the Charts
The savefig() method is used to save plots as image files (e.g., PNG, JPG, PDF). You can specify
the filename and optional parameters such as resolution (dpi) and background transparency.

Key Parameters:

• fname: Filename or path to save the file.
• dpi: Resolution of the saved image.
• bbox_inches: Can be set to 'tight' to ensure the plot fits well.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, label='Linear Growth')
plt.title("Saving Chart Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.savefig("chart_example.png", dpi=300, bbox_inches='tight')
plt.show()
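
The background transparency option and a second output format can be sketched as follows. The temporary output folder and the file names are hypothetical; any writable directory would do.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
ax.set_title("Saving Chart Example")

# Hypothetical output folder; replace with any directory you can write to
output_dir = tempfile.mkdtemp()
png_path = os.path.join(output_dir, "chart_transparent.png")
pdf_path = os.path.join(output_dir, "chart_example.pdf")

# transparent=True makes the figure background see-through in the PNG
fig.savefig(png_path, dpi=150, transparent=True, bbox_inches="tight")

# The file extension selects the output format, here a vector PDF
fig.savefig(pdf_path, bbox_inches="tight")

print(os.path.exists(png_path), os.path.exists(pdf_path))
```

Saving through the figure object (fig.savefig) writes the same output as plt.savefig() for the current figure.
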
Line Chart
A line chart is one of the simplest and most commonly used types of data visualization. It
represents data points (markers) connected by straight lines, which help visualize trends,
changes, or comparisons over a continuous variable, often time.

Key Features of Line Charts

1. X-axis and Y-axis:
o X-axis: Represents the independent variable (e.g., time, categories, or labels).
o Y-axis: Represents the dependent variable (e.g., values or measurements).
2. Use Cases:
o Visualizing trends over time (e.g., stock prices, temperature changes).
o Comparing multiple datasets on the same chart.
3. Customization Options:
o Line color, style, and width.
o Adding markers to emphasize data points.
o Legends to differentiate multiple lines.

import matplotlib.pyplot as plt

# Data for the chart
x = [1, 2, 3, 4, 5] # X-axis values
y = [10, 20, 15, 25, 30] # Y-axis values

# Plot the line chart
plt.plot(x, y, color='blue', marker='o', linestyle='-', linewidth=2,
         label='Sales')

# Add title and labels
plt.title("Monthly Sales Over 5 Months", fontsize=14)
plt.xlabel("Month (1-5)", fontsize=12)
plt.ylabel("Sales (in $1000s)", fontsize=12)

# Add a legend
plt.legend()

# Show the chart
plt.show()

Bar Chart
A bar chart is a graphical representation of data where rectangular bars are used to display
values.

The length (or height) of each bar corresponds to the value it represents.

Bar charts are commonly used for comparing different categories or showing the distribution of
categorical data.

Key Features of Bar Charts

1. Orientation:
o Vertical Bar Chart: Bars are vertical, with the X-axis showing categories and the
Y-axis showing values.
o Horizontal Bar Chart: Bars are horizontal, with the X-axis showing values and
the Y-axis showing categories.
2. Use Cases:
o Comparing sales across products.
o Displaying population sizes for different regions.
o Visualizing categorical data (e.g., survey results).
3. Customization Options:
o Bar color, width, and edge styling.
o Adding labels to bars for clarity.
o Grouped or stacked bars for multi-category comparison.

import matplotlib.pyplot as plt

# Data for the chart
categories = ['A', 'B', 'C', 'D', 'E'] # X-axis categories
values = [15, 30, 45, 10, 25] # Y-axis values

# Plot the bar chart
plt.bar(categories, values, color='skyblue', edgecolor='black', width=0.6)

# Add title and labels
plt.title("Category-wise Values", fontsize=14)
plt.xlabel("Categories", fontsize=12)
plt.ylabel("Values", fontsize=12)

# Show the chart
plt.show()
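
The horizontal orientation and stacked bars listed under Key Features can be sketched in the same way. The second data series, extra, is a made-up assumption added only so that there is something to stack.

```python
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D', 'E']
values = [15, 30, 45, 10, 25]
extra = [5, 10, 5, 20, 10]  # hypothetical second series for stacking

fig, (ax_h, ax_s) = plt.subplots(1, 2, figsize=(10, 4))

# Horizontal bars: categories on the Y-axis, values on the X-axis
ax_h.barh(categories, values, color='skyblue', edgecolor='black')
ax_h.set_title("Horizontal Bar Chart")

# Stacked bars: the second series starts where the first one ends
ax_s.bar(categories, values, label='Series 1')
ax_s.bar(categories, extra, bottom=values, label='Series 2')
ax_s.set_title("Stacked Bar Chart")
ax_s.legend()

fig.tight_layout()
```

The bottom argument is what turns two ordinary bar() calls into a stacked chart.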

Pie Chart
A pie chart is a circular graph divided into slices, where each slice represents a proportion of the
whole. It is commonly used to display part-to-whole relationships, making it easy to visualize
percentages or relative sizes.

Key Features of Pie Charts

1. Proportions:
o Each slice of the pie corresponds to a percentage or fraction of the total.
2. Customization Options:
o Slice colors, labels, and percentages.
o Exploding slices to emphasize certain segments.
o Adding a shadow for a 3D effect.
3. Use Cases:
o Market share analysis.
o Budget distribution.
o Survey results (e.g., favorite activities, product preferences).

import matplotlib.pyplot as plt

# Data for the chart
labels = ['A', 'B', 'C', 'D'] # Categories
values = [25, 35, 20, 20] # Values corresponding to categories

# Plot the pie chart
plt.pie(values, labels=labels, autopct='%1.1f%%', startangle=90,
        colors=['gold', 'lightblue', 'lightgreen', 'pink'])

# Add a title
plt.title("Basic Pie Chart Example", fontsize=14)

# Show the chart
plt.show()
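
The exploding-slice and shadow options from the Key Features list can be sketched as below. The choice of which slice to pull out, and by how much, is an assumption made for illustration.

```python
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D']
values = [25, 35, 20, 20]
explode = [0, 0.1, 0, 0]  # pull slice B outward by 10% of the radius

fig, ax = plt.subplots()
# With autopct set, pie() returns the wedges, the labels, and the percentage texts
wedges, texts, autotexts = ax.pie(
    values, labels=labels, explode=explode, shadow=True,
    autopct='%1.1f%%', startangle=90)
ax.set_title("Exploded Pie Chart")
```

Each entry in explode corresponds to one slice, so the list must have the same length as values.
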
Histogram
A histogram is a type of bar chart used to visualize the distribution of numerical data by
dividing the data into intervals, called bins.

It represents the frequency of data points within each bin, making it a powerful tool for
understanding the shape and spread of a dataset.

Key Features of Histograms

1. Bins:
o The X-axis is divided into intervals (bins), each representing a range of data.
o The Y-axis shows the frequency or count of data points in each bin.
2. Continuous Data:
o Histograms are ideal for continuous numerical data (e.g., height, weight,
temperature).
3. Shape Analysis:
o Used to identify patterns such as skewness, modality (uni/bi/multi), and outliers.

import matplotlib.pyplot as plt

# Data for the histogram
data = [15, 18, 20, 25, 25, 30, 35, 40, 40, 45, 50, 55, 55, 60]

# Plot the histogram
plt.hist(data, bins=5, color='skyblue', edgecolor='black')

# Add title and labels
plt.title("Simple Histogram Example", fontsize=14)
plt.xlabel("Data Intervals (Bins)", fontsize=12)
plt.ylabel("Frequency", fontsize=12)

# Show the histogram
plt.show()
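
To see exactly how the data falls into bins, the values returned by hist() can be inspected. This is a small sketch using the same data as above; the variable names counts, bin_edges, and bars are chosen here for illustration.

```python
import matplotlib.pyplot as plt

data = [15, 18, 20, 25, 25, 30, 35, 40, 40, 45, 50, 55, 55, 60]

fig, ax = plt.subplots()
# hist() returns the bin counts, the bin edges, and the drawn bars
counts, bin_edges, bars = ax.hist(data, bins=5, color='skyblue', edgecolor='black')

# Print the range and frequency of each bin
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {int(count)} values")
```

With bins=5, the range 15 to 60 is split into five intervals of width 9, and the printed counts are 3, 3, 3, 2, and 3.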
