Skip to content

Improve tick subsampling in LogLocator. #29698

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 19, 2025
Merged

Conversation

anntzer
Copy link
Contributor

@anntzer anntzer commented Mar 2, 2025

Rewrite the logic for subsampling log-scale ticks, so that 1) there's as many ticks as possible without going over numticks (don't draw a single tick when two would fit), and 2) the ticks are spaced as much as possible (when two ticks are drawn, put them on the lowest and highest decade, not somewhere in the middle). See comments in implementation for details. For the backcompat-oriented "classic-mode", remove the "stride >= numdec" check (which is a much later addition).

Test changes:

  • TestLogLocator changes because tick_values() no longer returns major ticks outside of the (vmin, vmax) range.
  • test_small_range_loglocator was likely previously wrong (the assert did not cover the values of ticks, and all iterations of the loop ended up testing the same condition). Rewrite it fully (in particular to cover the new implementation).

Closes #29694.

Edit: there's still some more test failures that need to be investigated (I suspect most of them are indeed changes "for the better", but I'll go through them one at a time).

PR summary

PR checklist

def tick_values(self, vmin, vmax):
if self.numticks == 'auto':
if self.axis is not None:
numticks = np.clip(self.axis.get_tick_space(), 2, 9)
nt = np.clip(self.axis.get_tick_space(), 2, 9)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep the long names numticks and numdecs? The abbreviations are IMHO harder to read and rember.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the names numticks and numdec are actually not that nice either, and I suspect they were actually a bit a source of confusion. For example, I think it is easier to reason in terms of "the number of decade ticks without striding" (i.e. emax - emin + 1: there's two ticks for an axis from 1 to 10), but numdecs looks more like "the number of decades" to me (i.e. emax - emin: there's one decade between 1 at 10). Likewise there's different "number of ticks" lying around (desired number of ticks, total number of decade ticks, maximum actually realizable number of ticks).
Overall I found it more confusing to write the logic using names that "suggest" a wobbly semantic and easier to do so using more symbolic names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the actually relevant quantity are intervals (i.e. the numbers -1), because that's what you try to match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have 9 decades / 8 intervals (nd=9) and a stride of s=3, then no matter the offset you always end up with 3 subsampled ticks. If you have 10 decades then you can have either 3 or 4 subsampled ticks. In fact the number of subsampled ticks is either floor(nd/s) or ceil(nd/s) (which are equal if s divides nd), which makes me think that nd and s are indeed the "nicer" variables to work with.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to underestand the logic below, I kept having to flick back and forth to where nd and nt were defined to remind myself what they were. So I'm very pro giving them more descriptive names. Perhaps nt could become nticks_requested, and nd become nd become nticks_max?

@anntzer anntzer marked this pull request as draft March 3, 2025 12:21
@rcomer
Copy link
Member

rcomer commented Mar 3, 2025

This also fixes #25896

image

@anntzer
Copy link
Contributor Author

anntzer commented Mar 4, 2025

Looking at this again, I suspect that a possible improvement to the algorithm would be to put ticks on multiples of the stride, if it is possible to do so without decreasing the number of ticks: for example, if the choice is between ticks at 10^-10, 10^-7, 10^-4, 10^-1 or 10^-9, 10^-6, 10^-3, 10^0, then the latter probably looks better (on average...)

@timhoffm
Copy link
Member

timhoffm commented Mar 4, 2025

Agreed, multiples of strides is a good secondary criterion, because the exponents are part of a non-offsetted sequence (4, 8, 12) feels more natural than 5, 9, 13.

@anntzer
Copy link
Contributor Author

anntzer commented Mar 6, 2025

@rcomer: With the last changes, #25896 is again no longer fixed; the underlying issues are actually slightly different. The patch (once this PR goes in) is, first, something like

diff --git i/lib/matplotlib/ticker.py w/lib/matplotlib/ticker.py
index 5a229fd8e7..1420e4536a 100644
--- i/lib/matplotlib/ticker.py
+++ w/lib/matplotlib/ticker.py
@@ -2519,10 +2519,12 @@ class LogLocator(Locator):
 
         if (len(subs) > 1
                 and stride == 1
-                and ((vmin <= ticklocs) & (ticklocs <= vmax)).sum() <= 1):
+                and (len(decades) - 2  # major
+                     + ((vmin <= ticklocs) & (ticklocs <= vmax)).sum())  # minor
+                     <= 1):
             # If we're a minor locator *that expects at least two ticks per
             # decade* and the major locator stride is 1 and there's no more
-            # than one minor tick, switch to AutoLocator.
+            # than one major or minor tick, switch to AutoLocator.
             return AutoLocator().tick_values(vmin, vmax)
         else:
             return self.raise_if_exceeds(ticklocs)

(this PR is needed because it makes it consistent to always have exactly one extra major tick on both sides (hence the -2 in the tick count); note that I guess there could still be inconsistencies if the major and minor locators have been configured to have different numticks but I don't think I can do anything about that...)

Then there is also something wrong with coeff = round(b ** (fx - exponent)) in LogFormatter.__call__, as that misinterprets "12" (coming from AutoLocator) as "10" (due to the rounding of the mantissa); this also needs to be fixed (although the first fix above will simply eliminate the tick at 12 so the second issue will become hidden).

Overall I'd rather keep that issue separate for clarity.

@timhoffm
Copy link
Member

timhoffm commented Mar 7, 2025

@anntzer this looks good to me. Since it's still in draft mode, are you still planning changes?

@anntzer
Copy link
Contributor Author

anntzer commented Mar 7, 2025

There's just a few remaining test failures (due to the actual changes in the algorithm) for which I still need to decide on the best path forward.

@anntzer anntzer marked this pull request as ready for review March 10, 2025 10:39
@anntzer
Copy link
Contributor Author

anntzer commented Mar 10, 2025

@timhoffm This is ready for review now. See commit message (8a80e1b) for discussion about changes to test suite. The remaining test failures are spurious.

Copy link
Member

@timhoffm timhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a side-note: Not strictly related to the changes here, but can we run into issues with the assumption that we always have one out-of-limit tick on each side? Say we have a setup such that LogLocator(numticks=2).tick_values(1e300, 1e305). I would then need an extra tick at 1e315, which is outside the float64 range.

@anntzer
Copy link
Contributor Author

anntzer commented Mar 10, 2025

I don't like the "out-ticking" behavior and would like to get rid of it, but (as detailed in the commit message) contour() currently relies on it. It can indeed lead to overflow, but this was already present before this PR, e.g. with LogLocator(numticks=2).tick_values(1e300, 1e308).

Copy link
Member

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement. It took me quite a while to understand all the logic though, which doesn't bode well for maintainability. I think better variable names would help a lot - here's what I ended up with when I edited the code locally so I could understand it.

I also think it would be nice to extract the new logic into a function, 1) for readability and 2) because I can imagine this being used in the future in other non-log contexts where one wants nice ticks at only integer locations. But that could always be done after this PR.

diff --git a/lib/matplotlib/ticker.py b/lib/matplotlib/ticker.py
index 5a229fd8e7..f732d07e09 100644
--- a/lib/matplotlib/ticker.py
+++ b/lib/matplotlib/ticker.py
@@ -2414,11 +2414,11 @@ class LogLocator(Locator):
     def tick_values(self, vmin, vmax):
         if self.numticks == 'auto':
             if self.axis is not None:
-                nt = np.clip(self.axis.get_tick_space(), 2, 9)
+                nticks_request = np.clip(self.axis.get_tick_space(), 2, 9)
             else:
-                nt = 9
+                nticks_request = 9
         else:
-            nt = self.numticks
+            nticks_request = self.numticks
 
         b = self._base
         if vmin <= 0.0:
@@ -2436,10 +2436,10 @@ class LogLocator(Locator):
         efmin, efmax = self._log_b([vmin, vmax])
         emin = math.ceil(efmin)
         emax = math.floor(efmax)
-        nd = emax - emin + 1  # Total number of decade ticks available.
+        nticks_avail = emax - emin + 1  # Total number of decade ticks available.
 
         if isinstance(self._subs, str):
-            if nd >= 10 or b < 3:
+            if nticks_avail >= 10 or b < 3:
                 if self._subs == 'auto':
                     return np.array([])  # no minor or major ticks
                 else:
@@ -2453,49 +2453,55 @@ class LogLocator(Locator):
         # Get decades between major ticks.  Include an extra tick outside the
         # lower and the upper limit: QuadContourSet._autolev relies on this.
         if mpl.rcParams["_internal.classic_mode"]:  # keep historic formulas
-            stride = max(math.ceil((nd - 1) / (nt - 1)), 1)
+            stride = max(math.ceil((nticks_avail - 1) / (nticks_request - 1)), 1)
             decades = np.arange(emin - stride, emax + stride + 1, stride)
         else:
             # Find the largest number of ticks, no more than the requested
-            # number, that can actually be drawn (e.g., with 9 decades ticks,
-            # no stride yields 7 ticks).  For a given value of the stride *s*,
-            # there are either floor(nd/s) or ceil(nd/s) ticks depending on the
-            # offset.  Pick the smallest stride such that floor(nd/s) < nt,
-            # i.e. nd/s < nt+1, then re-set nt to ceil(...) if acceptable, else
-            # to floor(...) (which must then equal the original nt, i.e. nt is
-            # kept unchanged).
-            stride = nd // (nt + 1) + 1
-            nt1 = math.ceil(nd / stride)
-            if nt1 <= nt:
-                nt = nt1
+            # number, that can actually be drawn. For a given value of the stride *s*,
+            # there are either floor(nticks_avail/s) or ceil(nticks_avail/s) ticks
+            # within the limits, depending on the offset.
+            #
+            # Pick the smallest stride such that the total ticks [floor(nticks_avail/s)]
+            # is less than the requested ticks [nticks_request],
+            # i.e. nticks_avail/s < nticks_request+1, then re-set nticks_avail
+            # to ceil(...) if acceptable, else to floor(...) (which must then equal
+            # to the original nticks_request, i.e. nticks_request is kept unchanged).
+            stride = 1 + nticks_avail // (nticks_request + 1)
+            nt1 = math.ceil(nticks_avail / stride)
+            if nt1 <= nticks_request:
+                nticks_request = nt1
             else:
-                assert nt1 == nt + 1
-            if nt == 0:
+                assert nticks_request == nt1 - 1
+
+            if nticks_request == 0:
+                # Return two ticks outside limits
                 decades = [emin - 1, emax + 1]
                 stride = decades[1] - decades[0]
-            elif nt == 1:  # Draw single tick close to center.
+            elif nticks_request == 1:
+                # Draw a single tick close to center.
                 mid = round((efmin + efmax) / 2)
                 stride = max(mid - (emin - 1), (emax + 1) - mid)
                 decades = [mid - stride, mid, mid + stride]
             else:
-                # Pick the largest s that yields this actual nt (e.g., with 15
+                # Calculate the stride
+                #
+                # Pick the largest s that yields this reqested nticks (e.g., with 15
                 # decades, strides of 5, 6, or 7 *can* yield 3 ticks; picking a
-                # larger stride minimizes unticked space at the ends).  First
-                # try for
-                #     ceil(nd/s) == nt, i.e. nd/nt <= s < nd/(nt-1),
-                # else fallback to
-                #     floor(nd/s) == nt, i.e. nd/(nt+1) < s <= nd/nt.
+                # larger stride minimizes unticked space at the ends).
+                #
                 # One of these cases must have an integer solution (given the
                 # choice of nt above).
-                stride = (nd - 1) // (nt - 1)
-                if stride < nd / nt:  # fallback to second case
-                    stride = nd // nt
-                # For a given s, once including an offset off (0 <= off < s),
-                # the actual number of ticks is ceil((nd - off) / s), which
-                # must be equal to nt.  This leads to olo <= off < ohi, with
-                # the values defined below.
-                olo = max(nd - stride * nt, 0)
-                ohi = min(nd - stride * (nt - 1), stride)
+                stride = (nticks_avail - 1) // (nticks_request - 1)
+                if stride < nticks_avail / nticks_request:  # fallback to second case
+                    stride = nticks_avail // nticks_request
+                # Calculate a good offset
+                #
+                # For a given stride, once including an offset off (0 <= off < stride),
+                # the actual number of ticks is ceil((nticks_avail - off) / s), which
+                # must be equal to nticks_request. This leads to olo <= off < ohi,
+                # with the values defined below.
+                olo = max(nticks_avail - stride * nticks_request, 0)
+                ohi = min(nticks_avail - stride * (nticks_request - 1), stride)
                 # Try to see if we can pick an offset so that ticks are at
                 # integer multiples of the stride while satisfying the bounds
                 # above; if not, fallback to the smallest acceptable offset.
@@ -2508,7 +2514,7 @@ class LogLocator(Locator):
         # anything other than 1.
         is_minor = len(subs) > 1 or (len(subs) == 1 and subs[0] != 1.0)
         if is_minor:
-            if stride == 1 or nd <= 1:
+            if stride == 1 or nticks_avail <= 1:
                 # Minor ticks start in the decade preceding the first major tick.
                 ticklocs = np.concatenate([
                     subs * b**decade for decade in range(emin - 1, emax + 1)])



@mpl.style.context('default')
def test_loglocator_properties():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not super obvious to me reading the code what this is testing - would be good to get a comment explaining at the top here.

Comment on lines 1950 to 1960
if not (decades % stride == 0).all():
for offset in range(0, stride):
alt_decades = range(lo + offset, hi + 1, stride)
assert len(alt_decades) < len(decades) or len(alt_decades) > numticks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I worked out what everything above was testing, but I'm stumped by this - what behaviour is this testing?

def tick_values(self, vmin, vmax):
if self.numticks == 'auto':
if self.axis is not None:
numticks = np.clip(self.axis.get_tick_space(), 2, 9)
nt = np.clip(self.axis.get_tick_space(), 2, 9)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to underestand the logic below, I kept having to flick back and forth to where nd and nt were defined to remind myself what they were. So I'm very pro giving them more descriptive names. Perhaps nt could become nticks_requested, and nd become nd become nticks_max?

@anntzer
Copy link
Contributor Author

anntzer commented Mar 15, 2025

Renamed to n_request and n_avail (which seems like a decent compromise vs. the longer "nticks_foo"), and extended the comments in the tests.

Copy link
Member

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the variable names 👍

@timhoffm
Copy link
Member

This could get a short what's new note. While not being configurable, we may well advertize improvements. We also prevent possible confusion through "matplotlib behaves differently, but I don't see any relevant changes in the release notes".

Rewrite the logic for subsampling log-scale ticks, so that

1) there's as many ticks as possible without going over numticks (don't
   draw a single tick when two would fit);

2) the ticks are spaced as much as possible (when two ticks are drawn,
   put them on the lowest and highest decade, not somewhere midway);

3) if possible, draw ticks on integer multiples of the stride (0, 3, 9
   is better than 1, 4, 7).

See comments in implementation for details.

For the backcompat-oriented "classic-mode", remove the "stride >=
numdec" check (which is a much later addition).

Test changes:

- test_small_range_loglocator was likely previously wrong (the assert
  did not cover the values of ticks, and all iterations of the loop
  ended up testing the same condition).  Rewrite it fully (in particular
  to cover the new implementation).

- changes because tick_values() now returns exactly one major ticks
  outside of the (vmin, vmax) range on both sides: TestLogLocator,
  test_axes::test_auto_numticks_log, test_colorbar::test_colorbar_renorm.

- test_colorbar::test_colorbar_autotickslog's tested tick values didn't
  match what actually got drawn, because rendering the colorbar (here
  simulated via draw_without_rendering) would make the (first) colorbar
  slightly smaller (via _ColorbarAxesLocator) and change the available
  tick space.  Furthermore, the second colorbar was wrongly drawn with
  3 ticks at [-12, 0, +12] when there's only space for 2; this arose
  because np.log(1e±12)/np.log(10) = ±11.999... which led to a wrong
  stride calculation; the new code explicitly uses log10, which is more
  accurate, when possible, and correctly yields ticks at [-12, 12].

- test_contour::test_contourf_log_extension: Log-normed contour
  extension remains a bit iffy, because it is implemented by asking a
  LogLocator for ticks within data range, and then including an extra
  tick below and above (see also the comment regarding _autolev in
  LogLocator.tick_values()).  This is not really optimal, e.g. that
  extra tick can be quite far below or above the actual data limits
  (depending on the stride).  Perhaps a better solution would be to
  just extend the range to the nearest decade, or even just accept that
  colorbar ticks (like axes ticks) don't necessarily reach the very
  end of the data range but can stop slightly below.  In the meantime,
  I chose to simply force the contour levels so that the test results
  matches the old output.
@anntzer
Copy link
Contributor Author

anntzer commented Mar 18, 2025

Sure, added whatsnew.

@timhoffm timhoffm merged commit c887ecb into matplotlib:main Mar 19, 2025
38 of 41 checks passed
@QuLogic QuLogic added this to the v3.11.0 milestone Mar 19, 2025
@anntzer anntzer deleted the logticks branch March 19, 2025 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: LogLocator sometimes draws fewer ticks than it can
5 participants