-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Improve tick subsampling in LogLocator. #29698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lib/matplotlib/ticker.py
Outdated
def tick_values(self, vmin, vmax): | ||
if self.numticks == 'auto': | ||
if self.axis is not None: | ||
numticks = np.clip(self.axis.get_tick_space(), 2, 9) | ||
nt = np.clip(self.axis.get_tick_space(), 2, 9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep the long names numticks
and numdecs
? The abbreviations are IMHO harder to read and rember.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the names numticks and numdec are actually not that nice either, and I suspect they were actually a bit a source of confusion. For example, I think it is easier to reason in terms of "the number of decade ticks without striding" (i.e. emax - emin + 1: there's two ticks for an axis from 1 to 10), but numdecs looks more like "the number of decades" to me (i.e. emax - emin: there's one decade between 1 at 10). Likewise there's different "number of ticks" lying around (desired number of ticks, total number of decade ticks, maximum actually realizable number of ticks).
Overall I found it more confusing to write the logic using names that "suggest" a wobbly semantic and easier to do so using more symbolic names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the actually relevant quantity are intervals (i.e. the numbers -1), because that's what you try to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have 9 decades / 8 intervals (nd=9) and a stride of s=3, then no matter the offset you always end up with 3 subsampled ticks. If you have 10 decades then you can have either 3 or 4 subsampled ticks. In fact the number of subsampled ticks is either floor(nd/s) or ceil(nd/s) (which are equal if s divides nd), which makes me think that nd and s are indeed the "nicer" variables to work with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to underestand the logic below, I kept having to flick back and forth to where nd
and nt
were defined to remind myself what they were. So I'm very pro giving them more descriptive names. Perhaps nt
could become nticks_requested
, and nd
become nd
become nticks_max
?
This also fixes #25896 |
Looking at this again, I suspect that a possible improvement to the algorithm would be to put ticks on multiples of the stride, if it is possible to do so without decreasing the number of ticks: for example, if the choice is between ticks at 10^-10, 10^-7, 10^-4, 10^-1 or 10^-9, 10^-6, 10^-3, 10^0, then the latter probably looks better (on average...) |
Agreed, multiples of strides is a good secondary criterion, because the exponents are part of a non-offsetted sequence (4, 8, 12) feels more natural than 5, 9, 13. |
@rcomer: With the last changes, #25896 is again no longer fixed; the underlying issues are actually slightly different. The patch (once this PR goes in) is, first, something like diff --git i/lib/matplotlib/ticker.py w/lib/matplotlib/ticker.py
index 5a229fd8e7..1420e4536a 100644
--- i/lib/matplotlib/ticker.py
+++ w/lib/matplotlib/ticker.py
@@ -2519,10 +2519,12 @@ class LogLocator(Locator):
if (len(subs) > 1
and stride == 1
- and ((vmin <= ticklocs) & (ticklocs <= vmax)).sum() <= 1):
+ and (len(decades) - 2 # major
+ + ((vmin <= ticklocs) & (ticklocs <= vmax)).sum()) # minor
+ <= 1):
# If we're a minor locator *that expects at least two ticks per
# decade* and the major locator stride is 1 and there's no more
- # than one minor tick, switch to AutoLocator.
+ # than one major or minor tick, switch to AutoLocator.
return AutoLocator().tick_values(vmin, vmax)
else:
return self.raise_if_exceeds(ticklocs) (this PR is needed because it makes it consistent to always have exactly one extra major tick on both sides (hence the Then there is also something wrong with Overall I'd rather keep that issue separate for clarity. |
@anntzer this looks good to me. Since it's still in draft mode, are you still planning changes? |
There's just a few remaining test failures (due to the actual changes in the algorithm) for which I still need to decide on the best path forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a side-note: Not strictly related to the changes here, but can we run into issues with the assumption that we always have one out-of-limit tick on each side? Say we have a setup such that LogLocator(numticks=2).tick_values(1e300, 1e305)
. I would then need an extra tick at 1e315, which is outside the float64 range.
I don't like the "out-ticking" behavior and would like to get rid of it, but (as detailed in the commit message) contour() currently relies on it. It can indeed lead to overflow, but this was already present before this PR, e.g. with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice improvement. It took me quite a while to understand all the logic though, which doesn't bode well for maintainability. I think better variable names would help a lot - here's what I ended up with when I edited the code locally so I could understand it.
I also think it would be nice to extract the new logic into a function, 1) for readability and 2) because I can imagine this being used in the future in other non-log contexts where one wants nice ticks at only integer locations. But that could always be done after this PR.
diff --git a/lib/matplotlib/ticker.py b/lib/matplotlib/ticker.py
index 5a229fd8e7..f732d07e09 100644
--- a/lib/matplotlib/ticker.py
+++ b/lib/matplotlib/ticker.py
@@ -2414,11 +2414,11 @@ class LogLocator(Locator):
def tick_values(self, vmin, vmax):
if self.numticks == 'auto':
if self.axis is not None:
- nt = np.clip(self.axis.get_tick_space(), 2, 9)
+ nticks_request = np.clip(self.axis.get_tick_space(), 2, 9)
else:
- nt = 9
+ nticks_request = 9
else:
- nt = self.numticks
+ nticks_request = self.numticks
b = self._base
if vmin <= 0.0:
@@ -2436,10 +2436,10 @@ class LogLocator(Locator):
efmin, efmax = self._log_b([vmin, vmax])
emin = math.ceil(efmin)
emax = math.floor(efmax)
- nd = emax - emin + 1 # Total number of decade ticks available.
+ nticks_avail = emax - emin + 1 # Total number of decade ticks available.
if isinstance(self._subs, str):
- if nd >= 10 or b < 3:
+ if nticks_avail >= 10 or b < 3:
if self._subs == 'auto':
return np.array([]) # no minor or major ticks
else:
@@ -2453,49 +2453,55 @@ class LogLocator(Locator):
# Get decades between major ticks. Include an extra tick outside the
# lower and the upper limit: QuadContourSet._autolev relies on this.
if mpl.rcParams["_internal.classic_mode"]: # keep historic formulas
- stride = max(math.ceil((nd - 1) / (nt - 1)), 1)
+ stride = max(math.ceil((nticks_avail - 1) / (nticks_request - 1)), 1)
decades = np.arange(emin - stride, emax + stride + 1, stride)
else:
# Find the largest number of ticks, no more than the requested
- # number, that can actually be drawn (e.g., with 9 decades ticks,
- # no stride yields 7 ticks). For a given value of the stride *s*,
- # there are either floor(nd/s) or ceil(nd/s) ticks depending on the
- # offset. Pick the smallest stride such that floor(nd/s) < nt,
- # i.e. nd/s < nt+1, then re-set nt to ceil(...) if acceptable, else
- # to floor(...) (which must then equal the original nt, i.e. nt is
- # kept unchanged).
- stride = nd // (nt + 1) + 1
- nt1 = math.ceil(nd / stride)
- if nt1 <= nt:
- nt = nt1
+ # number, that can actually be drawn. For a given value of the stride *s*,
+ # there are either floor(nticks_avail/s) or ceil(nticks_avail/s) ticks
+ # within the limits, depending on the offset.
+ #
+ # Pick the smallest stride such that the total ticks [floor(nticks_avail/s)]
+ # is less than the requested ticks [nticks_request],
+ # i.e. nticks_avail/s < nticks_request+1, then re-set nticks_avail
+ # to ceil(...) if acceptable, else to floor(...) (which must then equal
+ # to the original nticks_request, i.e. nticks_request is kept unchanged).
+ stride = 1 + nticks_avail // (nticks_request + 1)
+ nt1 = math.ceil(nticks_avail / stride)
+ if nt1 <= nticks_request:
+ nticks_request = nt1
else:
- assert nt1 == nt + 1
- if nt == 0:
+ assert nticks_request == nt1 - 1
+
+ if nticks_request == 0:
+ # Return two ticks outside limits
decades = [emin - 1, emax + 1]
stride = decades[1] - decades[0]
- elif nt == 1: # Draw single tick close to center.
+ elif nticks_request == 1:
+ # Draw a single tick close to center.
mid = round((efmin + efmax) / 2)
stride = max(mid - (emin - 1), (emax + 1) - mid)
decades = [mid - stride, mid, mid + stride]
else:
- # Pick the largest s that yields this actual nt (e.g., with 15
+ # Calculate the stride
+ #
+ # Pick the largest s that yields this reqested nticks (e.g., with 15
# decades, strides of 5, 6, or 7 *can* yield 3 ticks; picking a
- # larger stride minimizes unticked space at the ends). First
- # try for
- # ceil(nd/s) == nt, i.e. nd/nt <= s < nd/(nt-1),
- # else fallback to
- # floor(nd/s) == nt, i.e. nd/(nt+1) < s <= nd/nt.
+ # larger stride minimizes unticked space at the ends).
+ #
# One of these cases must have an integer solution (given the
# choice of nt above).
- stride = (nd - 1) // (nt - 1)
- if stride < nd / nt: # fallback to second case
- stride = nd // nt
- # For a given s, once including an offset off (0 <= off < s),
- # the actual number of ticks is ceil((nd - off) / s), which
- # must be equal to nt. This leads to olo <= off < ohi, with
- # the values defined below.
- olo = max(nd - stride * nt, 0)
- ohi = min(nd - stride * (nt - 1), stride)
+ stride = (nticks_avail - 1) // (nticks_request - 1)
+ if stride < nticks_avail / nticks_request: # fallback to second case
+ stride = nticks_avail // nticks_request
+ # Calculate a good offset
+ #
+ # For a given stride, once including an offset off (0 <= off < stride),
+ # the actual number of ticks is ceil((nticks_avail - off) / s), which
+ # must be equal to nticks_request. This leads to olo <= off < ohi,
+ # with the values defined below.
+ olo = max(nticks_avail - stride * nticks_request, 0)
+ ohi = min(nticks_avail - stride * (nticks_request - 1), stride)
# Try to see if we can pick an offset so that ticks are at
# integer multiples of the stride while satisfying the bounds
# above; if not, fallback to the smallest acceptable offset.
@@ -2508,7 +2514,7 @@ class LogLocator(Locator):
# anything other than 1.
is_minor = len(subs) > 1 or (len(subs) == 1 and subs[0] != 1.0)
if is_minor:
- if stride == 1 or nd <= 1:
+ if stride == 1 or nticks_avail <= 1:
# Minor ticks start in the decade preceding the first major tick.
ticklocs = np.concatenate([
subs * b**decade for decade in range(emin - 1, emax + 1)])
|
||
|
||
@mpl.style.context('default') | ||
def test_loglocator_properties(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not super obvious to me reading the code what this is testing - would be good to get a comment explaining at the top here.
lib/matplotlib/tests/test_ticker.py
Outdated
if not (decades % stride == 0).all(): | ||
for offset in range(0, stride): | ||
alt_decades = range(lo + offset, hi + 1, stride) | ||
assert len(alt_decades) < len(decades) or len(alt_decades) > numticks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I worked out what everything above was testing, but I'm stumped by this - what behaviour is this testing?
lib/matplotlib/ticker.py
Outdated
def tick_values(self, vmin, vmax): | ||
if self.numticks == 'auto': | ||
if self.axis is not None: | ||
numticks = np.clip(self.axis.get_tick_space(), 2, 9) | ||
nt = np.clip(self.axis.get_tick_space(), 2, 9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to underestand the logic below, I kept having to flick back and forth to where nd
and nt
were defined to remind myself what they were. So I'm very pro giving them more descriptive names. Perhaps nt
could become nticks_requested
, and nd
become nd
become nticks_max
?
Renamed to n_request and n_avail (which seems like a decent compromise vs. the longer "nticks_foo"), and extended the comments in the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating the variable names 👍
This could get a short what's new note. While not being configurable, we may well advertize improvements. We also prevent possible confusion through "matplotlib behaves differently, but I don't see any relevant changes in the release notes". |
Rewrite the logic for subsampling log-scale ticks, so that 1) there's as many ticks as possible without going over numticks (don't draw a single tick when two would fit); 2) the ticks are spaced as much as possible (when two ticks are drawn, put them on the lowest and highest decade, not somewhere midway); 3) if possible, draw ticks on integer multiples of the stride (0, 3, 9 is better than 1, 4, 7). See comments in implementation for details. For the backcompat-oriented "classic-mode", remove the "stride >= numdec" check (which is a much later addition). Test changes: - test_small_range_loglocator was likely previously wrong (the assert did not cover the values of ticks, and all iterations of the loop ended up testing the same condition). Rewrite it fully (in particular to cover the new implementation). - changes because tick_values() now returns exactly one major ticks outside of the (vmin, vmax) range on both sides: TestLogLocator, test_axes::test_auto_numticks_log, test_colorbar::test_colorbar_renorm. - test_colorbar::test_colorbar_autotickslog's tested tick values didn't match what actually got drawn, because rendering the colorbar (here simulated via draw_without_rendering) would make the (first) colorbar slightly smaller (via _ColorbarAxesLocator) and change the available tick space. Furthermore, the second colorbar was wrongly drawn with 3 ticks at [-12, 0, +12] when there's only space for 2; this arose because np.log(1e±12)/np.log(10) = ±11.999... which led to a wrong stride calculation; the new code explicitly uses log10, which is more accurate, when possible, and correctly yields ticks at [-12, 12]. - test_contour::test_contourf_log_extension: Log-normed contour extension remains a bit iffy, because it is implemented by asking a LogLocator for ticks within data range, and then including an extra tick below and above (see also the comment regarding _autolev in LogLocator.tick_values()). This is not really optimal, e.g. that extra tick can be quite far below or above the actual data limits (depending on the stride). Perhaps a better solution would be to just extend the range to the nearest decade, or even just accept that colorbar ticks (like axes ticks) don't necessarily reach the very end of the data range but can stop slightly below. In the meantime, I chose to simply force the contour levels so that the test results matches the old output.
Sure, added whatsnew. |
Rewrite the logic for subsampling log-scale ticks, so that 1) there's as many ticks as possible without going over numticks (don't draw a single tick when two would fit), and 2) the ticks are spaced as much as possible (when two ticks are drawn, put them on the lowest and highest decade, not somewhere in the middle). See comments in implementation for details. For the backcompat-oriented "classic-mode", remove the "stride >= numdec" check (which is a much later addition).
Test changes:
Closes #29694.
Edit: there's still some more test failures that need to be investigated (I suspect most of them are indeed changes "for the better", but I'll go through them one at a time).
PR summary
PR checklist