03 EDA - Matplotlib

The document provides an overview of Matplotlib, a Python library for data visualization, detailing its features, types of plots, and customization options. It includes examples of creating line plots, scatter plots, histograms, and 3D plots, as well as using error bars and annotations. Additionally, it introduces Basemap for geographic data visualization and Seaborn for enhanced statistical graphics.

Uploaded by

freakingepic69
Copyright
© © All Rights Reserved
24CAI509 Data Analysis and Visualization
EDA – matplotlib

Introduction to matplotlib
• Matplotlib is a powerful plotting library for Python that allows users to
create static, animated, and interactive visualizations. It is widely used in
data science, machine learning, and scientific computing to visualize data
effectively.
• Key Features
• Supports various types of plots: line plots, bar charts, scatter plots,
histograms, etc.
• Highly customizable with labels, titles, legends, and annotations.
• Works well with NumPy, Pandas, and other scientific computing libraries.
• Provides an object-oriented API for embedding plots into applications.
• Matplotlib provides a module called pyplot that simplifies the plotting
process.
• import matplotlib.pyplot as plt
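The object-oriented API mentioned in the feature list can be sketched as follows. This is a minimal illustration rather than part of the original slides: the data values and label strings are made up, and the Agg backend is selected only so the snippet runs without a display (omit that line in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in Jupyter
import matplotlib.pyplot as plt

# Create a Figure and a single Axes explicitly instead of relying on
# pyplot's implicit "current axes"
fig, ax = plt.subplots(figsize=(6, 4))

# Illustrative data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Methods on the Axes object mirror the pyplot functions:
# plt.xlabel -> ax.set_xlabel, plt.title -> ax.set_title, and so on
ax.plot(x, y, marker="o", label="y = 2x")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Object-oriented API sketch")
ax.legend()
```

Holding explicit `fig` and `ax` references is what makes it practical to embed a plot in a larger application, because nothing depends on global pyplot state.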
Simple Line Plot
• A line plot is a type of chart used to visualize the relationship
between two numerical variables. It is commonly used to
show trends over time.
Key Components
• X-axis: Represents the independent variable (e.g., time,
categories).
• Y-axis: Represents the dependent variable (e.g., values,
observations).
• Line: Connects data points to show trends or patterns.
Creating a Simple Line Plot in Python
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y1 = [10, 15, 7, 20, 25]
y2 = [8, 13, 5, 18, 23]
# Creating the line plot
plt.plot(x, y1, marker='o', linestyle='-', color='b', label="Sample Line 1")
plt.plot(x, y2, marker='s', linestyle=':', color='r', label="Sample Line 2")
# Adding labels and title
plt.xlabel("X-axis (Time)")
plt.ylabel("Y-axis (Value)")
plt.title("Simple Line Plot")
# Displaying the legend
plt.legend()
# Show the plot
plt.show()

Simple Scatter Plot
• A scatter plot is a type of graph used to visualize the
relationship between two numerical variables. Each point
represents an observation, showing how one variable
changes with another.
Key Components
• X-axis: Represents the independent variable (e.g., hours
studied).
• Y-axis: Represents the dependent variable (e.g., exam score).
• Data Points: Each dot represents one observation.
Creating a Simple Scatter Plot in Python
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [10, 12, 25, 30, 35, 45, 50, 55, 60, 70]
# Creating the scatter plot
plt.scatter(x, y, color='blue', marker='^', label="Data Points",s=50)
# Adding labels and title
plt.xlabel("X-axis (Independent Variable)")
plt.ylabel("Y-axis (Dependent Variable)")
plt.title("Simple Scatter Plot")
# Displaying the legend
plt.legend()
# Show the plot
plt.show()

Visualizing errors
• Visualizing errors is crucial in data analysis and machine learning
to understand uncertainty, model accuracy, and variability in
predictions. It helps in detecting patterns of error distribution and
improving models.
Types of Errors
• Residual Errors: Difference between observed and predicted
values.
• Standard Deviation: Measures the spread of data points around the
mean.
• Confidence Intervals: Range expected to contain the true value at a
stated confidence level (e.g., 95%).
• Error Bars: Used in plots to show variability or uncertainty in data
points.
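The quantities listed above can be computed with NumPy before they are drawn as error bars. The sketch below is illustrative only: the observed and predicted values are invented, and the 95% interval uses the normal approximation (mean ± 1.96 × standard error), which is crude for a sample this small.

```python
import numpy as np

# Hypothetical observed values and model predictions (not from the slides)
observed = np.array([10.0, 20.0, 15.0, 25.0, 30.0])
predicted = np.array([12.0, 18.0, 16.0, 24.0, 29.0])

# Residual errors: observed minus predicted
residuals = observed - predicted

# Sample standard deviation (ddof=1) and standard error of the mean
spread = observed.std(ddof=1)
sem = spread / np.sqrt(observed.size)

# Approximate 95% confidence interval for the mean (normal approximation)
mean = observed.mean()
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem

print("residuals:", residuals)
print("std:", spread, "sem:", sem)
print("95% CI for the mean:", (ci_low, ci_high))
```

Either `spread` or `sem` could then be passed as `yerr` to `plt.errorbar`, depending on whether the bars should show variability in the data or uncertainty in the mean.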
Error bars in Line Plot
import numpy as np
import matplotlib.pyplot as plt
# Sample data
x = np.arange(1, 6)
y = np.array([10, 20, 15, 25, 30])
error = np.array([2, 3, 1, 4, 2]) # Error values
# Line plot with error bars
plt.errorbar(x, y, yerr=error, fmt='-o', color='b',
capsize=5, label="Data with Error")
# Labels and title
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Line Plot with Error Bars")
plt.legend()
plt.show()

Error bars in Scatter Plot
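import numpy as np
import matplotlib.pyplot as plt
# Same sample data as in the line-plot example
x = np.arange(1, 6)
y = np.array([10, 20, 15, 25, 30])
error = np.array([2, 3, 1, 4, 2])
# Scatter points, then error bars only (fmt='none' draws no extra
# markers, so the red points stay visible)
plt.scatter(x, y, color='r', label="Data Points")
plt.errorbar(x, y, yerr=error, fmt='none',
             color='black', capsize=3)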

plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Error Bars")
plt.legend()
plt.show()

Histogram
• A Histogram is a graphical representation of the distribution
of numerical data. It divides data into intervals (bins) and
counts the number of observations in each bin, helping to
understand the shape, spread, and skewness of the data.
• Key Features
• Visualizes the frequency distribution of a dataset.
• Helps detect skewness, outliers, and central tendency.
• Useful in Exploratory Data Analysis (EDA) to understand data
distribution.
Basic Histogram example
import matplotlib.pyplot as plt
import numpy as np
# Generate random data
# 1000 random numbers from a standard normal distribution
data = np.random.randn(1000)
# Create histogram
plt.hist(data, bins=30, color='blue',
edgecolor='black', alpha=0.7)
# Labels and title
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram Example")
# Show plot
plt.show()

Histogram: Customization Options
• Customizing Histograms
• We can modify histograms using parameters such as:
• bins → Number of intervals (default is 10).
• color → Changes the bar color.
• edgecolor → Defines the border color of bars.
• alpha → Controls transparency (1 = opaque, 0 = transparent).
• density=True → Normalizes the histogram (useful for
probability density).
• plt.hist(data, bins=20, color='green', edgecolor='black',
alpha=0.5, density=True)
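To make the effect of density=True concrete, the sketch below checks that the bar areas sum to 1. It is an illustration under stated assumptions: the seeded random data is made up, and the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is reproducible
data = rng.normal(size=1000)

fig, ax = plt.subplots()

# hist returns the bar heights, the bin edges, and the drawn bars
heights, edges, bars = ax.hist(data, bins=20, color="green",
                               edgecolor="black", alpha=0.5, density=True)

# With density=True, each bar's area is height * bin width,
# and the areas add up to 1
total_area = np.sum(heights * np.diff(edges))
print("total area:", total_area)
```

Because the heights are densities rather than counts, a y-axis label such as "Density" fits better than "Frequency" when this option is on.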
Subplots
• Subplots in Matplotlib allow you to create multiple plots
within a single figure. This is useful when comparing multiple
visualizations side by side.
• Key Features
• The subplot() function creates multiple plots within a single
figure.
• plt.subplot(nrows, ncols, index)
• nrows: Number of rows of subplots
• ncols: Number of columns of subplots
• index: Position of the subplot (starts from 1)
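As an alternative to repeated plt.subplot calls, the whole grid can be created at once with plt.subplots. The sketch below is not from the slides; it assumes a 2 × 2 grid and shows how the 1-based `index` of plt.subplot maps onto the returned array of Axes (panels are counted row by row).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# One call creates the figure and a 2 x 2 array of Axes
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 5))

# plt.subplot(2, 2, index) counts panels row by row starting at 1,
# so the same panel is axes.flat[index - 1]
index = 3
ax = axes.flat[index - 1]  # second row, first column
ax.plot(x, np.sin(x), color="b")
ax.set_title("Panel 3")

fig.tight_layout()  # adjust spacing to prevent overlap
```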
Basic Subplots example
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
plt.figure(figsize=(10, 5))
# First subplot (Line Plot)
plt.subplot(2, 2, 1)
plt.plot(x, np.sin(x), color='b')
plt.title("Sine Wave")
# Second subplot (Cosine Plot)
plt.subplot(2, 2, 2)
plt.plot(x, np.cos(x), color='r')
plt.title("Cosine Wave")
# Third subplot (Scatter Plot)
plt.subplot(2, 2, 3)
plt.scatter(x, np.sin(x), color='g')
plt.title("Sine Scatter")
# Fourth subplot (Histogram)
plt.subplot(2, 2, 4)
plt.hist(np.random.randn(100), bins=10, color='purple')
plt.title("Random Histogram")
plt.tight_layout() # Adjusts spacing to prevent overlap
plt.show()
Text and Annotation in Matplotlib
• Text and annotations allow adding labels, titles, and
descriptions to plots.
• Methods:
• plt.text(x, y, 'text'): Places text at specified coordinates.
• plt.title('Title'): Adds a title to the plot.
• plt.xlabel('X-axis Label'), plt.ylabel('Y-axis Label'): Label the
axes.
• plt.annotate('Text', xy=(x, y), xytext=(x_offset, y_offset),
arrowprops={'arrowstyle': '->'}): Adds an annotation with an
arrow.
Text and Annotation example
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
plt.plot(x, y, marker='o', linestyle='--', color='r')
# Adding title and labels
plt.title("Simple Line Plot with Text and Annotation")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
# Adding text
plt.text(2.5, 12.5, "Midpoint", fontsize=12,
color='blue')
# Adding an annotation with an arrow
plt.annotate("Highest Value", xy=(5, 25), xytext=(3.5, 20),
             arrowprops=dict(facecolor='black', arrowstyle='->'))
plt.show()
Customization in Matplotlib
• Customization helps in improving the visual appeal and
readability of plots.
• Key Customization Options:
• Colors and Styles: Use color, linestyle, marker.
• Figure Size and DPI: plt.figure(figsize=(width, height),
dpi=resolution).
• Legend Customization: plt.legend(loc='best', fontsize=size).
• Grid and Ticks: plt.grid(True), plt.xticks() & plt.yticks().
• Subplots Customization: plt.subplot(rows, cols, index),
plt.subplots_adjust().
• Themes: Use plt.style.use('style_name') for predefined styles.
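Two of the options above, predefined styles and custom ticks, can be sketched together. This is an illustration rather than the slides' own example: the 'ggplot' style and the tick positions are arbitrary choices, and plt.style.context is used instead of plt.style.use so the style applies only inside the `with` block.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Apply a predefined style temporarily; plt.style.use("ggplot") would
# change it for the rest of the session instead
with plt.style.context("ggplot"):
    fig, ax = plt.subplots(figsize=(8, 5), dpi=100)
    ax.plot(x, np.cos(x), linestyle="--", linewidth=2, label="cos(x)")
    ax.set_xticks([0, 2, 4, 6, 8, 10])  # explicit tick positions
    ax.grid(True)
    ax.legend(loc="best", fontsize=10)

# The names accepted by plt.style.use are listed in plt.style.available
print(sorted(plt.style.available)[:5])
```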
Customization example
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Setting figure size and resolution
plt.figure(figsize=(8, 5), dpi=100)
plt.plot(x, y, color='green', linestyle='-', linewidth=2,
         label="Sine Wave")
plt.title("Customized Sine Wave", fontsize=14,
          fontweight='bold')
plt.xlabel("X-axis", fontsize=12)
plt.ylabel("Y-axis", fontsize=12)
plt.legend(loc='upper right', fontsize=10)  # Custom legend
plt.grid(True, linestyle='--', alpha=0.5)  # Custom grid
plt.show()
3D Plotting in Matplotlib
• Uses the mpl_toolkits.mplot3d module.
• Common 3D Plots
• Line Plot: ax.plot3D(x, y, z).
• Scatter Plot: ax.scatter3D(x, y, z).
• Surface Plot: ax.plot_surface(X, Y, Z, cmap='viridis').
• Wireframe Plot: ax.plot_wireframe(X, Y, Z).
• Contour Plot: ax.contour3D(X, Y, Z, levels).
• Steps to Create a 3D Plot
• Import Axes3D from mpl_toolkits.mplot3d.
• Create a figure and add a 3D subplot: fig.add_subplot(111,
projection='3d').
• Use plotting functions (plot3D, scatter3D, plot_surface, etc.).
• Customize using labels, colors, and views.
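The steps above can be sketched with a 3D line and scatter plot. This is a minimal illustration, not the slides' own example: the helix data and the viewing angles are arbitrary, and the Agg backend is used only so the code runs without a display. On recent Matplotlib versions the Axes3D import is not strictly required for projection='3d', but it is kept here to follow the listed steps.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3D projection on older versions)

# Create a figure and add a 3D subplot
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")

# Illustrative data: a helix parameterised by t
t = np.linspace(0, 4 * np.pi, 200)
x, y, z = np.cos(t), np.sin(t), t

# Use the plotting functions: a 3D line plus every 20th point as a marker
ax.plot3D(x, y, z, color="b", label="Helix")
ax.scatter3D(x[::20], y[::20], z[::20], color="r")

# Customize labels and the viewing angle
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_zlabel("Z-axis")
ax.view_init(elev=30, azim=45)
ax.legend()
```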
3D Plotting example
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
# Generating data
X = np.linspace(-5, 5, 50)
Y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(X, Y)
Z = np.sin(np.sqrt(X**2 + Y**2))
# Creating a surface plot
ax.plot_surface(X, Y, Z, cmap='viridis')
# Adding labels
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_zlabel('Z-axis')
ax.set_title('3D Surface Plot')
plt.show()
Geographic Data Visualization with Basemap
• Basemap (from mpl_toolkits.basemap) is used to plot maps
and geospatial data.
• Key features
• Projection types ('merc', 'ortho', 'moll', etc.).
• Plotting coastlines, countries, states, and rivers.
• Adding scatter points, contour maps, and labels.
• Common functions
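• m = Basemap(projection='merc', llcrnrlat=lat_min, urcrnrlat=lat_max,
llcrnrlon=lon_min, urcrnrlon=lon_max, resolution='c'): Defines the map.
The corner arguments are passed as keywords; lat_min, lat_max, lon_min,
and lon_max are placeholders for the bounding-box limits.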
• m.drawcoastlines(), m.drawcountries(), m.drawstates(): Adds
map details.
• m.scatter(lons, lats, latlon=True, marker='o', color='r'): Plots
points on the map.
Basemap example
!pip install basemap basemap-data
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 7))
# Creating a world map with Mercator projection
m = Basemap(projection='merc', llcrnrlat=-60,
urcrnrlat=80,
llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines() # Drawing coastlines
m.drawcountries() # Drawing country borders
m.drawstates() # Drawing states
# Plotting a point (Example: New York)
lon, lat = -74.006, 40.7128
x, y = m(lon, lat)
m.scatter(x, y, marker='o', color='red', s=100, label="New York")

plt.legend()
plt.title("World Map with a Marked Location")
plt.show()
Visualization with seaborn
• Seaborn is a high-level statistical data visualization library
built on top of Matplotlib.
• Advantages
• Built-in themes and aesthetics.
• Simplifies the creation of complex visualizations.
• Works seamlessly with Pandas DataFrames.
• Common seaborn plots
• Relational Plots: sns.scatterplot(), sns.lineplot().
• Categorical Plots: sns.barplot(), sns.boxplot(), sns.violinplot().
• Distribution Plots: sns.histplot(), sns.kdeplot(), sns.displot() (the older sns.distplot() is deprecated).
• Heatmaps: sns.heatmap(data, annot=True,
cmap='coolwarm').
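The heatmap call listed above is typically applied to a correlation matrix. The sketch below is illustrative only: the DataFrame columns are random data with made-up names, the vmin/vmax limits are an added choice to keep the colour scale symmetric, and the Agg backend is used so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical sample data (seeded so the run is reproducible)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "A": rng.normal(50, 15, 100),
    "B": rng.normal(30, 10, 100),
    "C": rng.normal(70, 20, 100),
})

# Pairwise Pearson correlations between the numeric columns
corr = df.corr()

# Draw the matrix as a heatmap, writing each value into its cell
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation heatmap")
```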
Seaborn example
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.title() and plt.show() below
import pandas as pd
import numpy as np

# Generating sample data
data = pd.DataFrame({
"A": np.random.normal(50, 15, 100),
"B": np.random.normal(30, 10, 100),
"C": np.random.normal(70, 20, 100),
"D": np.random.choice(["Group1", "Group2"], 100)
})

# Pairplot for relationships
sns.pairplot(data, hue="D", palette="coolwarm")
plt.show()

# Histogram with Kernel Density Estimation (KDE)
sns.histplot(data["A"], kde=True, color='blue', bins=20)
plt.title("Distribution of A")
plt.show()
Thank You
