Skip to content

ENH ticks sublabel display is now an option #7438

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Dec 1, 2016
Merged

ENH ticks sublabel display is now an option #7438

merged 15 commits into from
Dec 1, 2016

Conversation

NelleV
Copy link
Member

@NelleV NelleV commented Nov 10, 2016

This is work in progress.

fixes #7202 and supersedes #7211

Here how the defaults behave:

symlog

Using the formatter's directly yields the expected behaviour, even on colorbar:
log_colorbar_default

Here is the code to reproduce those plots:

import numpy as np
import matplotlib.pyplot as plt

X_1 = np.random.randint(12000, 90000, size=(1000,))
X_1.sort()
X_2 = np.random.randint(1e5, 1e12, size=(1000,))
X_2.sort()
X_3 = np.random.randint(1e5, 1e16, size=(1000,))
X_3.sort()

X = [X_1, X_2, X_3]

fig, ax = plt.subplots(ncols=len(X), figsize=(12, 6))
for i, x in enumerate(X):
    ax[i].plot(x)
    ax[i].set_yscale("log")
import numpy as np

import matplotlib.pyplot as plt
from matplotlib import ticker, colors

X_1 = -np.random.randint(10000, 90000, size=(100, 100))
X_2 = np.random.randint(10000, 1000000, size=(100, 100))

formatter = ticker.LogFormatter(
    10)
sciformatter = ticker.LogFormatterSciNotation(
    10)

X = [X_1, X_2]

fig, axes = plt.subplots(ncols=2, nrows=2, figsize=(10, 10))
for i, x in enumerate(X):
    ax = axes[0, i]
    m = ax.imshow(x, norm=colors.SymLogNorm(1))
    ax.set_title("Log Formatter")

    cb = fig.colorbar(m, format=formatter, ax=ax, shrink=0.9, )
    ax = axes[1, i]
    m = ax.imshow(x, norm=colors.SymLogNorm(1))
    ax.set_title("Log Formatter Sci")
    cb = fig.colorbar(m, format=sciformatter, ax=ax, shrink=0.9, )

@NelleV NelleV added this to the 2.0 (style change major release) milestone Nov 10, 2016
@NelleV NelleV added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Nov 10, 2016
@NelleV
Copy link
Member Author

NelleV commented Nov 10, 2016

attn @efiring
Here is the patch to fix the ticker issues.

We need to find a better name for the option. sublabel_filtering doesn't really reflect what the parameter does…

@NelleV NelleV changed the title [WIP] ENH ticks sublabel display is now an option [MRG] ENH ticks sublabel display is now an option Nov 10, 2016
@NelleV
Copy link
Member Author

NelleV commented Nov 10, 2016

I can't reproduce the documentation build failure.


Parameter
---------
base : float, optional, default: 10.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a slightly better error out of running it locally (only shows up on python2)

Exception occurred:
  File "/home/tcaswell/source/p/matplotlib/lib/matplotlib/contour.py", line 18, in <module>
    import matplotlib.ticker as ticker
  File "/home/tcaswell/source/p/matplotlib/lib/matplotlib/ticker.py", line 811
SyntaxError: Non-ASCII character '\xc2' in file /home/tcaswell/source/p/matplotlib/lib/matplotlib/ticker.py on line 812, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

There is a non-breaking space in here:

             position: 25496 of 73929 (34%), column: 14
            character:   (displayed as  ) (codepoint 160, #o240, #xa0)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0xA0
               script: latin
               syntax: .    which means: punctuation
             category: .:Base, b:Arabic, j:Japanese, l:Latin
             to input: type "C-x 8 RET a0" or "C-x 8 RET NO-BREAK SPACE"
          buffer code: #xC2 #xA0
            file code: #xC2 #xA0 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-simp-Hack-normal-normal-normal-*-15-*-*-*-m-0-iso10646-1 (#x2A9)
       hardcoded face: nobreak-space

Character code properties: customize what to show
  name: NO-BREAK SPACE
  old-name: NON-BREAKING SPACE
  general-category: Zs (Separator, Space)
  decomposition: (noBreak 32) (noBreak ' ')

There are text properties here:
  face                 font-lock-doc-face
  fontified            t

@tacaswell
Copy link
Member

Now lets see if I can successfully push to @NelleV 's branch... So far the score on this is 0 for 2 in favor of gh.

@tacaswell
Copy link
Member

Great, I am now 1/3 😄

@NelleV
Copy link
Member Author

NelleV commented Nov 11, 2016

I had an unbreakable space in the code?!!

Copy link
Contributor

@afvincent afvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments and a more major one (about the is_decade variable).

I apologize, but I am not very used to the LogFormatter part of the basecode so I won't do much more than commenting. Nevertheless, except my is_decade worry, it looks OK to me.

"""
`base` is used to locate the decade tick, which will be the only
one to be labeled if `labelOnlyBase` is ``True``.

Parameter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameter_s_?

base of the logarithm.

labelOnlyBase : bool, optional, default: False
whether to only label decades ticks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The descriptions starts by a 5-space indent, doesn't it?

Sidenote: the last parameter has a description starting with a capital letter while the first two parameters haven't.

isDecade = is_close_to_int(fx)
exponent = np.round(fx) if is_decade else np.floor(fx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it normal to use is_decade here rather than the isDecade define just before? Besides, there is also a function called is_decade at the module level (see l. 1781).

@@ -1016,33 +1039,29 @@ def __call__(self, x, pos=None):
else:
base = '%s' % b

if coeff in self.sublabel:
if not is_decade and self.labelOnlyBase:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is_decade and the following ones are fine (see l.1030 for defining the variable this time) but again, I am not sure it is a good thing to borrow the same name as a module level function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a really bad idea indead…

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as @tacaswell found, I tripped over it without noticing...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been fixed. I renamed is_decade and isDecade into is_x_decade
In general I find the variable names of this module very hard to understand. I'll plan on doing some clean up after 2.0

return ''

if not is_decade:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an interest in not "factorizing" this if statement with the previous one (l. 1044)? Something like

if not is_decade:
    if self.labelOnlyBase:
        return ''
    else:
        return self._non_decade_format(sign_string, base, fx, usetex)
else:
    ...

@afvincent
Copy link
Contributor

Looking at the class LogFormatter, I have a general comment that may be out of the scope of this PR. The word decade is used everywhere but mathematically, what it refers to is more the logarithmic "base", doesn't it? Without going up to changing all the names of the variables in the code, maybe it should be emphasized in the docstrings that the term "decade" is in fact a convenient name given to the logarithmic base due to the default value of base.

@NelleV
Copy link
Member Author

NelleV commented Nov 11, 2016

The naming of this module is completely off, but I'd rather wait until after 2.0 to refactor it.

@afvincent
Copy link
Contributor

@NelleV About a new name for sublabel_filtering, what about label_pruning or something close? If True, the option does prune the labels to only a few of them, doesn't it?

@NelleV
Copy link
Member Author

NelleV commented Nov 11, 2016

@afvincent That's already much better than sublabel_filtering

if usetex:
return (r'$%s%s^{%d}$') % (sign_string,
base,
nearest_long(fx))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misaligned.

else:
return ('$%s$' % _mathdefault(
'%s%s^{%d}' %
(sign_string, base, nearest_long(fx))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a case where using .format is definitely clearer.

@NelleV
Copy link
Member Author

NelleV commented Nov 14, 2016

@efiring It could be great if you could have a look at this PR before wednesday

@efiring
Copy link
Member

efiring commented Nov 15, 2016

@NelleV, what I now realize is that I don't like the present implementation of subsetting in the LogFormatter. I have a simple variation that I think works much better, in that it applies subsetting only if more than a given fraction (0.4 works well) of a decade is shown. Right now I have that threshold hardwired, but of course it could also be a kwarg, replacing label_pruning. Setting it to 1 would completely disable the subsetting. Fundamentally, there are actually 2 thresholds, so both could be exposed as kwargs.

I will try to make a PR against your repo to illustrate this.

@efiring
Copy link
Member

efiring commented Nov 15, 2016

@NelleV, I botched the attempt to open a PR against your repo, but I think that failure nevertheless leaves you with access to the one commit that I would like to see added.

@@ -801,7 +801,7 @@ class LogFormatter(Formatter):
Format values for log axis.
"""
def __init__(self, base=10.0, labelOnlyBase=False,
label_pruning=False):
minor_thresholds=(1, 0.4)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the name minor_thresholds as the LogFormatter are minor/major agnostic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to do all of this in a single class may be the original sin here.


label_pruning : bool, optional, default: False
when set to True, label on a subset of ticks.
minor_thresholds : (subset, all), optional, default: (1, 0.4)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this option works from the documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> "subset_thresholds"? I actually had that first, then changed it to "minor" because I thought that in practice it would be closer to the actual meaning.

@NelleV
Copy link
Member Author

NelleV commented Nov 15, 2016

I am going to do an attempt at improving the documentation at some point today (unless you beat me to it)

@efiring
Copy link
Member

efiring commented Nov 15, 2016

I need to work on other things now, so it's all yours.

if self._sublabels is not None and coeff not in self._sublabels:
return ''

if not is_decade:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use isDecade above, is one of them a typo?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks for catching that. It looks like every instance of is_decade as a scalar, as opposed to a function, should be replaced by isDecade.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a type. isDecade is undefined in this function. I'll fix that.

@NelleV
Copy link
Member Author

NelleV commented Nov 16, 2016

I fixed the non defined variable problem, but haven't found time to work on the documentation.
I'm not super happy with the current API but I am not convinced we can do better. If we agree about the API, I think this can be merged and I'll fix the documentation later.

@efiring
Copy link
Member

efiring commented Nov 16, 2016

@NelleV, I think you made the change in the wrong direction. We really need to fix the whole module to remove the shadowing of the function by the scalar, hence the need to switch to isDecade everywhere that the scalar is meant.

@NelleV
Copy link
Member Author

NelleV commented Nov 16, 2016

@efiring I just fixed the issue of using an undefined variable so that the code runs. I haven't changed the naming.

@NelleV
Copy link
Member Author

NelleV commented Nov 16, 2016

There are other places where the variable name is wrong (and by wrong I mean the code cannot work).
I'll fix all the naming of this module at once.

'%s%s^{%d}' %
(sign_string, base, nearest_long(fx))))
else:
if self.labelOnlyBase and not is_decade:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem here as well.


def __call__(self, x, pos=None):
"""
Return the format for tick val `x` at position `pos`.
Return the format for tick val `x`.
"""
b = self._base
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiring Has this been done for performance reasons?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I thought it was misleading. The pos value is never used in this Formatter, and doesn't have to be specified.

@efiring
Copy link
Member

efiring commented Nov 16, 2016

It looks like you and Tom are both unhappy with "minor_thresholds"; do you want to change to "subset_thresholds", or "pruning_breakpoints", or "intermediate_thresholds", or "nonBaseLabelThresholds" (a 3-hump camel, consistent with "labelOnlyBase") or...?

@NelleV
Copy link
Member Author

NelleV commented Nov 16, 2016

@efiring I am more confused about the tuple that we provide to the thresholding tuning, but I possibly just have to get use to it. It just seems to me that it's hard to understand what the first and second values correspond to and easy to do a mistake and provide only one value or too many values (in which case the code will crash with a probably unmeaningful error message).

I am also not coming up with something better at the moment… (I am very sleep deprived).

@efiring
Copy link
Member

efiring commented Nov 16, 2016

@NelleV, get some sleep; you can read this later.

Yes, the use of a tuple here implies it is an "expert option", with the expectation it would rarely be set to other than the default. Most users don't instantiate their own Locators and Formatters. I also left out validation with the idea that it could be added once we are sure we want this API at all, and when we get around to it; we are far from validating every argument to every function, so lack of validation here is not egregious.

The present algorithm takes two parameters that are both thresholds applied to the same variable, so there is some natural appeal to packaging them up in a tuple; it saves having to come up with two names. But it would be easy to switch to using two variables, e.g., "label_all_non_base" and "label_half_non_base". They could be described as upper thresholds applied to the axis range expressed in powers of the base.

@NelleV
Copy link
Member Author

NelleV commented Nov 24, 2016

@dopplershift @tacaswell That patch is ready.

@dopplershift
Copy link
Contributor

LGTM 👍

@NelleV NelleV changed the title [MRG] ENH ticks sublabel display is now an option [MRG+1] ENH ticks sublabel display is now an option Nov 25, 2016
Copy link
Member

@QuLogic QuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about how we want the sublabels to behave, but a couple of clarifications below.

This is normally True for major ticks and False for
minor ticks.

minor_thresholds : (subset, all), optional, default: (1, 0.4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all is the lower bound, would it make sense to swap the order?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could go either way; I chose this because it corresponds to the order in which the thresholds are checked: first see if any minors are labeled, and if so, see if all of them are. The most common case is expected to be that none are labeled.

``labelOnlyBase=True`` to turn off minor ticks.
Parameters
----------
labelOnlyBase : bool, optional, default: True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

if subs is None: # consistency with previous bad API
self._subs = 'auto'
elif cbook.is_string_like(subs):
# TODO: validation ('all', 'auto')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fixable now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

self._linthresh = linthresh
else:
raise ValueError("Either transform or linthresh "
"and base must be provided.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a small clarification because it's not in the docstring and it can be interpreted as either "(transform or linthresh) and base" or "transform or (linthresh and base)".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Fixed.

@NelleV
Copy link
Member Author

NelleV commented Nov 29, 2016

Can someone review this?

@NelleV
Copy link
Member Author

NelleV commented Dec 1, 2016

\o/

@tacaswell
Copy link
Member

Just staring at it a bit longer....

@tacaswell tacaswell merged commit fb4b061 into matplotlib:v2.x Dec 1, 2016
@QuLogic QuLogic changed the title [MRG+1] ENH ticks sublabel display is now an option ENH ticks sublabel display is now an option Dec 1, 2016
@tacaswell
Copy link
Member

I suspect that we will still find some weird corner-cases, but that is because it turns out log labeling and ticking is hard.

Thanks to everyone who worked on this PR!

@efiring efiring deleted the 7202_symlog_colorbar_ branch December 1, 2016 05:32
# Add labels between bases at log-spaced coefficients
c = np.logspace(0, 1, (4 - int(numdec)) + 1, base=b)
self.sublabel = set(np.round(c))
self._sublabels = set(np.linspace(1, b, b))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this third argument supposed to be b? It's a float, which is starting to cause warnings with new NumPy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Yes, it is supposed to be b, but it should be an integer value, and it can be cast to an int. If it is not an integer then we shouldn't be trying to put in sublabels at all. I will fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #7583 for a fix. Contrary to my comment above, I decided it is OK to allow sublabels when b is not an integer, though I expect this will be extremely rare in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants