Revise scatter_masked.py #24409

Draft: wants to merge 1 commit into base: main
77 changes: 62 additions & 15 deletions examples/lines_bars_and_markers/scatter_masked.py
@@ -3,30 +3,77 @@
Scatter Masked
==============

Mask some data points and add a line demarking
masked regions.
A NumPy masked array (see `numpy.ma`) can be passed to `.Axes.scatter` or
`.pyplot.scatter` as the value of the *s* parameter in order to exclude certain
data points from the plot.

This example uses this technique so that data points within a particular radius
are not included in the plot.

"""
import matplotlib.pyplot as plt
import numpy as np

# Fixing random state for reproducibility
# Fix random state for reproducibility
np.random.seed(19680801)


# Create N random (x, y) points
N = 100
r0 = 0.6
x = 0.9 * np.random.rand(N)
y = 0.9 * np.random.rand(N)
area = (20 * np.random.rand(N))**2 # 0 to 10 point radii
c = np.sqrt(area)
r = np.sqrt(x ** 2 + y ** 2)
area1 = np.ma.masked_where(r < r0, area)
area2 = np.ma.masked_where(r >= r0, area)
plt.scatter(x, y, s=area1, marker='^', c=c)
plt.scatter(x, y, s=area2, marker='o', c=c)
# Show the boundary between the regions:
theta = np.arange(0, np.pi / 2, 0.01)
plt.plot(r0 * np.cos(theta), r0 * np.sin(theta))

# Create masked size array based on calculation of x and y values
size = np.full(N, 36)
radius = 0.6
masked_size = np.ma.masked_where(radius > np.sqrt(x ** 2 + y ** 2), size)

# Plot data points using masked array
subplot_kw = {
    'aspect': 'equal',
    'xlim': (0, max(x)),
    'ylim': (0, max(y))
}
Comment on lines +31 to +35
Member

I didn't know we could do this 😄 and I would maybe not put the comment here 'cause I think it makes it seem like this line has something to do w/ the masking specifically

Author

It's a nice clean approach if you want to create two different plots with the same axes properties, rather than calling methods to set them 🙂

Fair point, what if I moved the comment like this:

subplot_kw = {
    'aspect': 'equal',
    'xlim': (0, max(x)),
    'ylim': (0, max(y))
}
fig, ax = plt.subplots(subplot_kw=subplot_kw)

# Plot data points using masked array
ax.scatter(x, y, s=masked_size, marker='^', c="mediumseagreen")
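
The `subplot_kw` approach discussed in this thread can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: the fixed limits `(0, 1)` and the figure names are assumptions made for the sketch, and the `Agg` backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Axes properties shared by every figure; the limits are illustrative only.
subplot_kw = {
    "aspect": "equal",
    "xlim": (0, 1),
    "ylim": (0, 1),
}

# Each call forwards subplot_kw to the Axes it creates.
fig1, ax1 = plt.subplots(subplot_kw=subplot_kw)
fig2, ax2 = plt.subplots(subplot_kw=subplot_kw)

# Setter-based alternative for a single Axes, shown for comparison.
fig3, ax3 = plt.subplots()
ax3.set(aspect="equal", xlim=(0, 1), ylim=(0, 1))

print(ax1.get_xlim(), ax2.get_xlim(), ax3.get_xlim())
```

Reusing one dictionary keeps the two figures consistent without repeating setter calls, which appears to be the point the author is making.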

fig, ax = plt.subplots(subplot_kw=subplot_kw)
ax.scatter(x, y, s=masked_size, marker='^', c="mediumseagreen")

# Show the boundary between the regions
theta = np.arange(0, np.pi * 2, 0.01)
circle_x = radius * np.cos(theta)
circle_y = radius * np.sin(theta)
ax.plot(circle_x, circle_y, c="black")

plt.show()
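
For readers unfamiliar with `numpy.ma`, the masking step used above can be examined on its own. The sketch below uses five hypothetical points (not the example's random data) and `np.hypot` in place of the explicit square root; both choices are assumptions made for brevity.

```python
import numpy as np

# Hypothetical toy data: five points and a constant marker size.
x = np.array([0.1, 0.5, 0.2, 0.8, 0.4])
y = np.array([0.1, 0.5, 0.7, 0.1, 0.3])
size = np.full(x.shape, 36)
radius = 0.6

# Distance of each point from the origin.
distance = np.hypot(x, y)

# Entries where the condition is True are masked; the data stay underneath.
masked_size = np.ma.masked_where(distance < radius, size)

print(masked_size)               # masked entries are shown as "--"
print(masked_size.mask)          # True where a point lies inside the radius
print(masked_size.compressed())  # only the unmasked sizes remain
```

Only the first and last points fall inside the radius, so those two sizes are masked and the remaining three are kept.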

###############################################################################
# This technique can also be used to plot a decision boundary, rather than
# masking certain data points so that they don't appear at all. This example
# uses the same data points and boundary as the example above, this time in the
# style of a decision boundary.

Comment on lines +46 to +52
Member

I would tend to just be specific about what you are doing, but use general language. If you want the technical term you think people will search on, by all means include it. Maybe something like:

Suggested change
###############################################################################
# This technique can also be used to plot a decision boundary, rather than
# masking certain data points so that they don't appear at all. This example
# uses the same data points and boundary as the example above, this time in the
# style of a decision boundary.
###############################################################################
# This technique can be used to plot two groups of data separated by a boundary (sometimes
# termed a "decision boundary"). This example uses two sets of masks to color the data on either
# side of the boundary differently, and adds a shape to define the boundary.

Author

Ok gotcha, you're saying to use more general language to describe what the plotting code is doing. Not to explain the ML concept in more general language. That makes sense!
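
The wording suggested in this thread (two sets of masks plus a shape for the boundary) might look like the sketch below. It is not the PR's code: the seeded `default_rng` generator, the point count, the variable names, and the quarter-circle boundary are assumptions chosen to keep the illustration short, and the `Agg` backend is used only so that it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(19680801)  # assumption: the PR uses np.random.seed
n_points = 50
x = 0.9 * rng.random(n_points)
y = 0.9 * rng.random(n_points)
size = np.full(n_points, 36)
radius = 0.6
distance = np.hypot(x, y)

# Complementary conditions: every point is masked in exactly one array.
size_outside = np.ma.masked_where(distance < radius, size)   # hides inner points
size_inside = np.ma.masked_where(distance >= radius, size)   # hides outer points

fig, ax = plt.subplots(subplot_kw={"aspect": "equal"})
ax.scatter(x, y, s=size_outside, marker="^", c="mediumseagreen")
ax.scatter(x, y, s=size_inside, marker="o", c="gold")

# Quarter circle marking the boundary in the first quadrant.
theta = np.linspace(0, np.pi / 2, 100)
ax.plot(radius * np.cos(theta), radius * np.sin(theta), c="black")
```

Because the two conditions are exact complements, each point is drawn by exactly one of the two `scatter` calls.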

# Create a masked array for values within the radius
masked_size_2 = np.ma.masked_where(radius <= np.sqrt(x ** 2 + y ** 2), size)

# Plot solution regions
fig, ax = plt.subplots(subplot_kw=subplot_kw)
ax.patch.set_facecolor('#D8EFE2') # equivalent of 'mediumseagreen', alpha=0.2
ax.fill(circle_x, circle_y, color='#FFF7CC') # equivalent of 'gold', alpha=0.2

# Plot data points using two different masked arrays
ax.scatter(x, y, s=masked_size, marker='^', c='mediumseagreen')
ax.scatter(x, y, s=masked_size_2, marker='o', c='gold')

# Plot boundary
ax.plot(circle_x, circle_y, c='black')

plt.show()

###############################################################################
#
# .. admonition:: References
#
#    The use of the following functions, methods, classes and modules is shown
#    in this example:
#
#    - `matplotlib.axes.Axes.scatter` / `matplotlib.pyplot.scatter`
#    - `matplotlib.axes.Axes.plot` / `matplotlib.pyplot.plot`
#    - `matplotlib.axes.Axes.fill` / `matplotlib.pyplot.fill`
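
The inline comments in the second example describe `'#D8EFE2'` and `'#FFF7CC'` as equivalents of `'mediumseagreen'` and `'gold'` at `alpha=0.2`. A small sketch of that arithmetic follows, assuming the colors are composited over an opaque white background and that fractional channel values are truncated. The helper names are hypothetical, and the RGB triples are the CSS definitions of the two named colors.

```python
def blend_over_white(rgb, alpha_percent):
    """Composite `rgb` at the given alpha (in percent) over opaque white."""
    return tuple(
        # Integer floor division truncates any fractional channel value.
        (alpha_percent * channel + (100 - alpha_percent) * 255) // 100
        for channel in rgb
    )


def to_hex(rgb):
    """Format an 8-bit RGB triple as an uppercase hex color string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


# 8-bit RGB values of the CSS named colors used in the example.
MEDIUMSEAGREEN = (60, 179, 113)  # #3CB371
GOLD = (255, 215, 0)             # #FFD700

print(to_hex(blend_over_white(MEDIUMSEAGREEN, 20)))  # -> #D8EFE2
print(to_hex(blend_over_white(GOLD, 20)))            # -> #FFF7CC
```

Under these assumptions the results match the hex values in the diff. Rounding instead of truncating would give `#D8F0E3` for the green, so the match depends on that choice.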