Description
Problem
Plotting a 2-dim histogram using the hist2d function can leave a lot of excess space in the plot when using the argument cmin to exclude bins with insufficient numbers of data-points. (There's presumably a similar problem when using the argument cmax.)
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# Generate random x/y values.
rng = np.random.default_rng(1234)
size = 100000
x = rng.normal(scale=10, size=size)
y = rng.normal(size=size)
# This plot has properly adjusted axis-limits.
plt.hist2d(x, y, bins=100, cmin=1);
# This plot has a lot of excess space.
plt.hist2d(x, y, bins=100, cmin=20);
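To make the excess space concrete, here is a rough sketch that redoes the counting step with np.histogram2d (which hist2d uses for its binning) and compares the full range of the bin edges with the range covered by bins that still hold at least cmin points. The variable names (counts, used, full_x, kept_x, ...) are made up for this illustration, and it assumes the plotted axis limits follow the full edge range.

```python
import numpy as np

# Same random data as in the example above.
rng = np.random.default_rng(1234)
size = 100000
x = rng.normal(scale=10, size=size)
y = rng.normal(size=size)

# Count points per bin without plotting anything.
counts, xedges, yedges = np.histogram2d(x, y, bins=100)

# Bins with fewer than cmin points are the ones hist2d would blank out.
cmin = 20
used = counts >= cmin
used_x = used.any(axis=1)  # counts has shape (nx, ny), so axis=1 collapses y
used_y = used.any(axis=0)

# Range spanned by all bin edges vs. range spanned by the surviving bins.
full_x = (xedges[0], xedges[-1])
kept_x = (xedges[:-1][used_x].min(), xedges[1:][used_x].max())
full_y = (yedges[0], yedges[-1])
kept_y = (yedges[:-1][used_y].min(), yedges[1:][used_y].max())

print("x: full range", full_x, "used range", kept_x)
print("y: full range", full_y, "used range", kept_y)
```

The gap between the full and the used range on each axis is the empty margin seen in the second plot.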
Proposed solution
The following is a quick solution that seems to work, but a better one may be possible when integrating it into the hist2d function. It should be an optional argument, such as hist2d(adapt_lim=True, ...) to adapt both the x- and y-axis, or hist2d(adapt_lim='x', ...) to adapt only a given axis.
I don't really have time to add this to Matplotlib's code-base myself and make sure everything meets your standards, so hopefully someone else can do it if you like the feature. Thanks!
# Plot 2-dim histogram and get bins and edges.
h, xedges, yedges, _ = plt.hist2d(x, y, bins=100, cmin=20);
# Boolean mask whether a bin is actually used in the 2-dim grid.
mask = ~np.isnan(h)
# Flattened boolean masks whether a bin is actually used for each axis.
mask_flat_x = np.any(mask, axis=1)
mask_flat_y = np.any(mask, axis=0)
# Get x-axis min/max for bins that are actually used.
x_min = xedges[:-1][mask_flat_x].min()
x_max = xedges[1:][mask_flat_x].max()
# Get y-axis min/max for bins that are actually used.
y_min = yedges[:-1][mask_flat_y].min()
y_max = yedges[1:][mask_flat_y].max()
# Adjust x-axis limits to have a little padding.
x_pad = 0.01 * (x_max - x_min)
plt.xlim(x_min - x_pad, x_max + x_pad)
# Adjust y-axis limits to have a little padding.
y_pad = 0.01 * (y_max - y_min)
plt.ylim(y_min - y_pad, y_max + y_pad)
plt.show();
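For anyone who wants to reuse this before it exists in Matplotlib, the steps above could be wrapped roughly as follows. This is only a sketch: used_bin_limits and hist2d_adapt_lim are hypothetical helper names, and adapt_lim and pad are the proposed (not existing) options. The wrapper relies only on Axes.hist2d, Axes.set_xlim and Axes.set_ylim, and falls back to the full edge range when no bin survives.

```python
import numpy as np


def used_bin_limits(h, xedges, yedges, pad=0.01):
    """Return ((x_lo, x_hi), (y_lo, y_hi)) covering the non-NaN bins of h.

    h is the 2-D array returned by hist2d, with shape (nx, ny) and NaN in
    bins that were excluded by cmin/cmax. pad is a fraction of the used range.
    """
    mask = ~np.isnan(h)
    if not mask.any():
        # No bin survived, so keep the full range of the edges.
        return (xedges[0], xedges[-1]), (yedges[0], yedges[-1])

    used_x = mask.any(axis=1)
    used_y = mask.any(axis=0)

    x_min = xedges[:-1][used_x].min()
    x_max = xedges[1:][used_x].max()
    y_min = yedges[:-1][used_y].min()
    y_max = yedges[1:][used_y].max()

    x_pad = pad * (x_max - x_min)
    y_pad = pad * (y_max - y_min)
    return (x_min - x_pad, x_max + x_pad), (y_min - y_pad, y_max + y_pad)


def hist2d_adapt_lim(ax, x, y, adapt_lim=True, pad=0.01, **kwargs):
    """Hypothetical wrapper: call ax.hist2d, then tighten the axis limits.

    adapt_lim=True adapts both axes; 'x' or 'y' adapts only that axis;
    False leaves the limits untouched.
    """
    h, xedges, yedges, image = ax.hist2d(x, y, **kwargs)
    xlim, ylim = used_bin_limits(h, xedges, yedges, pad=pad)
    if adapt_lim in (True, "x"):
        ax.set_xlim(*xlim)
    if adapt_lim in (True, "y"):
        ax.set_ylim(*ylim)
    return h, xedges, yedges, image
```

With fig, ax = plt.subplots(), a call such as hist2d_adapt_lim(ax, x, y, bins=100, cmin=20) would then replace the plt.hist2d call and the manual limit adjustments above.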