Skip to content

Begin warning on modifying global state of colormaps #16991

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 27, 2020

Conversation

greglucas
Copy link
Contributor

@greglucas greglucas commented Apr 1, 2020

PR Summary

This is the start of deprecating global state access for built-in colormaps. It doesn't implement any solutions yet, just begins to warn users about what will happen.

Returning a copy of colormaps is implemented in #16943

Per the dev call, the tentative plan forward is:

3.3

  • still return global
  • warn on set_*
  • make cmap_d private and warn on read/write

3.4:

  • return a copy
  • don’t warn on set_*
  • warn on re-accessing a color map that would have been changed

3.5:

  • either return a copy or make colormaps immutable

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

_cmap_d = _gen_cmap_d()
locals().update(_cmap_d)
# This is no longer considered public API
cmap_d = _cmap_d
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure how to warn on read/write here. Do I need to create a new Dictionary subclass that overwrites the getters/setters with the warn_deprecated call?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess something like #10735 (simpler if Py3.7+) would be needed to fully warn on access (and then you don't need to warn on getitem/setitem I guess)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a nice helper! I think that may be very helpful for deprecating plt.cm usage. I decided to go the quick route and just subclass dict and add in warning on get/set only.

@greglucas
Copy link
Contributor Author

@timhoffm, just wanted to add you here to get your feedback since I can't explicitly request you as a reviewer. Happy to hear your thoughts on this.

@greglucas
Copy link
Contributor Author

Actually, I started searching around a bit to see where/how others are using cmap_d and I think deprecating cmap_d needs a little more careful consideration for a good alternative/replacement. There really is no way to access a list of what all is in the cmap_d dictionary explicitly without either using cmap_d or to use the pyplot interface. We should think about adding a get_registered_cmaps() function or similar to list what is available to the get_cmap() call.

We explicitly use it in this example:
https://matplotlib.org/3.1.0/gallery/images_contours_and_fields/demo_bboximage.html

There is this SO post with the first answer stating to use sorted(cmap_d):
https://stackoverflow.com/questions/47302343/what-names-can-be-used-in-plt-cm-get-cmap

There is also mention of accessing plt.cm.datad! So, the plt.cm registering all public methods is going to be tricky.

@QuLogic
Copy link
Member

QuLogic commented Apr 2, 2020

Maybe removal should be scheduled for 4.0 or whatever sounds like "a long time from now".

@greglucas
Copy link
Contributor Author

@timhoffm, Did you have any thoughts on this? I'd like to try to get some of the warnings in for the 3.3 release to get the deprecation cycle started. No API code is being changed yet.

  • The first commit is warning on set/get from the global cmap registry and I think everyone was in agreement about that.
  • The second commit is warning on accessing cmap_d directly, which was a little harder to warn against using (Discussion above about potentially waiting until 3.7 only for a dependency).

@timhoffm
Copy link
Member

👍 for pushing the deprecations into 3.3.

@greglucas
Copy link
Contributor Author

Thanks, Tim! Great suggestions. I just pushed up a commit with all of those suggestions in it.

I also decided to take your advice for more descriptive names and apply it to the private _cmap_d that I added in. It is now called _cmap_registry. I can change that back if desired, but figure this will help to describe what the dictionary actually is.

Copy link
Member

@timhoffm timhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to use _cmap_registry instead of _cmap_d. Sometimes you overlook the obvious.

@@ -104,7 +146,8 @@ def register_cmap(name=None, cmap=None, data=None, lut=None):
raise ValueError("Arguments must include a name or a "
"Colormap") from err
if isinstance(cmap, colors.Colormap):
cmap_d[name] = cmap
cmap._global = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document in the docstring that if passing a Colormap instance, that instance will become immutable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added in a Notes section to both the get_cmap and register_cmap functions now. Let me know if you want any different wording there. It may be odd to talk about future behavior there considering you can't do anything to get that future behavior yet? But I can't think of any great wording off the top of my head.

@greglucas greglucas force-pushed the get_cmap_warn branch 2 times, most recently from d5a8b91 to 17bb516 Compare April 24, 2020 17:44
"a copy of the requested colormap before modifying it. ",
alternative="To modify a colormap without overwriting the "
"global state, you can make a copy of the colormap "
f"first. cmap = copy.copy(mpl.cm.get_cmap({cmap.name})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty verbose alternative for something that I expect to catch a fair number of users. I am in favor of adding the copy=True kwarg to get_cmap in the same move as we add this warning so we can provide a less verbose (if less back-compatible) option to users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to just change the wording here for now, and put .copy() instead of copy.copy(). I think the wording was just a little long, so I tidied it up. The alternative is actually less verbose than copy=True. I also don't really know if we should add a keyword for one version that will just be removed a version later when it is unnecessary. Let me know what you think...

@tacaswell tacaswell added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Apr 24, 2020
@tacaswell
Copy link
Member

I labeled as release critical as I think this is very close and do not want it to fall through the cracks as we do the final push to 3.3.

@greglucas
Copy link
Contributor Author

👍
If anyone has ideas on why the docs are failing, let me know... I tried to look for what the Error was and couldn't figure out how to parse the Sphinx output for it. It must be something with docstring formatting?

Good comments on the rest of it. I'll try and push up changes to address them tomorrow.

@tacaswell
Copy link
Member

Can you try rebasing on current master?

We are very cranky about making sure all of the internal references actually work (with a list of allowed errors). Sphinx made some internal changes that affected how references to the nested classes work so we re-generated the allowed list to the new version. Unlike travis which runs against the shadow merge commit GH makes to determine if your branch is mergable, circle runs against your actually commit so if you have not rebased on master from when we fixed that circle will be broken.

@@ -38,7 +38,7 @@
a = np.vstack((a, a))

# List of all colormaps; skip reversed colormaps.
maps = sorted(m for m in plt.cm.cmap_d if not m.endswith("_r"))
maps = sorted(m for m in plt.colormaps() if not m.endswith("_r"))
Copy link
Member

@timhoffm timhoffm Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a remark (not necessarily to be acted upon in this PR):

By making the cmap registry private, we don't have an official way to get the registered colormaps from the cm module, where I would expect such functionality. This is something we should take care of in followup PRs.

I'm not even sure plt.colormaps() is needed. First, It's essentially a lengthy description of colormaps, which should be somewhere in the docs but not in a pyplot docstring. Second, I don't see providing the colormaps as a core functionality of pyplot, but since there's a lot of historic colormap stuff in pyplot, that's maybe ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thought: #16991 (comment)

I'm not sure what the best step is to expose the access, but would agree that I don't want to have to go to pyplot if not necessary. Maybe rather than a cmap_d, we could do something like your MutableMapping suggestion and then override the get/set methods to ensure copy/immutable return.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if this is a bit unstructured. I just stated typing and ideas unfolded along the way.

There would be two ways:

  1. The cm module currently uses simple functions get_cmap() and register_cmap(). In that spirit, you could add a function get_cmap_names() (naming t.b.d.) but we have to make a distinction between get_cmap returning an instance and that function (probably?) returning strings.

  2. The alternative would be to expose a ColormapRegistry object, which provides all necessary functionality. That would likely be subclassing MutableMapping to look dict-like but still perform all relevant checks (maybe it would alternatively only implement Mapping and provide an explicit register method if we feel the mutation should be more special (i.e. we don't want del registry['jet'] or registry['jet'] = Colormap(...)).
    This would also replace the module level functions; at least their implementation. Whether they will additionally stay for backward compatibility is a separate topic.

In the end 2 is the more complex approach. However, I have a slight preference for it because it properly encapsulates the colormap handling. On the downside it changes the API from calling simple cm functions to an object. But then again, we could move the actual instance to a variable matplotlib.colormaps, which may make more sense and users would usually not need to import matplotlib.cm anymore; and pyplot.colormaps could later even point to the same instance (with some API compatibility because the keys of that mapping would then colormap names, so for name in plt.colormaps() would still work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with pretty much everything you're saying. I also don't think 2 would be that hard to implement. An issue with naming will be found with: #11413 which is already proposing to rename cm to colormaps. This discussion is really more a part of the MEP: https://matplotlib.org/3.2.1/devel/MEP/MEP21.html
Maybe we should do some larger work on an overarching vision there and then decide on how best to get there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we provide colormap registry access from matplotlib directly, cm.get_cmap and cm.register_cmap become obsolete. There won't be a need for the end user to import cm, which in turn reduces the need for renaming (#11413). But we could still find other names for that.

It would be reasonable to detail the overall vision.

@@ -70,6 +70,73 @@ def test_register_cmap():
cm.register_cmap()


@pytest.mark.skipif(matplotlib.__version__ < "3.5",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant adding tests for future intended behavior. We don't know for sure if this will be implemented for the version and with the behavior we anticipate now.

Additionally, the test will be ignored during the complete development phase (master version targeting 3.3 is currently evaluated to 3.2.1.post2103+gb94812c60). The check would only hit very late during the release.

You may keep this, but it needs explanation. If we are hit by the failing test in a year or so, we need a clear way of finding out why this is here, why the tested functionality is desired and what we have to do to implement it. Therefore, please add a comment describing the intention and referring to the modification roadmap discussed on github.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is unfortunate that this wouldn't actually hit until near the release. Is there a way to test against something else? My thought is that leaving these there will make sure this doesn't get lost along the way and it isn't hard to implement these since I've already got the code in another linked PR. I put a link to the gh issue number in as a comment now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other tests could be earlier but would still hit at some other time and break CI, when we maybe don't want to implement this right now.

Either leave it like this or remove it here and put the test just into your PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to put the test in the PR where the change is actually made.

There are better ways to document a future test than a skipped test (I don't think this would actually get run until we tagged the rc.

As a side note, In general comparing versions via text is not the best way as it sorts pre-releases as being later than the final releases, distuitls.version.LooseVersion is usually the better option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that suggestion of comparison with distutils. I just removed all of these future tests. I'll hope it doesn't drop off my radar.

self._warn_deprecated()
return self._cmap_registry.__getitem__(key)

def __iter__(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not warn for list(mpl.cm.cmap_d) but does warn for list(mlp.cm.cmap_d.items()). Was this an oversight that __iter__ does not warn or a decision that because it is only giving a list of keys that could then go to get_cmap it should not wark?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oversight, the iter still warns because of the getitem, but I didn't think of people just listing the colormaps and using it manually themselves elsewhere. I just added it here as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for __len__ below? Sorry for the thousand paper cuts at the end...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, easy enough to fix. Should have just done it myself ;)

Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the right direction to go in terms of removing a foot-cannon, but I am still a bit worried about the amount of users this is going to affect. Hopefully I am over estimating it and in the users we do hit are mostly people we are saving from confusing bugs later.

👍 to merge modulo CI green.

@mperrin
Copy link

mperrin commented Sep 14, 2020

Just wanted to say, bravo for the super clear deprecation message including specific code suggestions for how to update downstream code. I can see that took extra effort to put together and it's much appreciated. 👏

@jklymak
Copy link
Member

jklymak commented Oct 21, 2020

I wish I'd followed this better. I did

cmap = cm.RdBu_r
cmap.set_over('yellow')

and was told I had to do:

MatplotlibDeprecationWarning: You are modifying the state of a globally registered colormap. In future versions, you will not be able to modify a registered colormap in-place. To remove this warning, you can make a copy of the colormap first. cmap = copy.copy(mpl.cm.get_cmap("RdBu_r"))

Thats pretty user hostile. When will this get fixed?

@greglucas
Copy link
Contributor Author

I guess it depends on what you mean by fixed.
#18503 will return a copy of the colormap, which should remove a lot of the Deprecation dance. That will hopefully be in 3.4...

We could update the message pretty easily too if you have any suggestions there.

@jklymak
Copy link
Member

jklymak commented Oct 21, 2020

I guess I mean an alternative that doesn't require an import and copy call. If that is #18503 we should get that in ASAP.

I don't understand why a deprecation was deemed necessary for that PR to go in though. I'd actually vote to revert this for V3.3.3, but maybe I'm not understanding some subtlety.

@greglucas
Copy link
Contributor Author

This was to enable people who may have been (ab)using the ability to modify the global colormap to change before we remove the capability entirely.

In your example, if you keep running the program, or go to some other Jupyter notebook cell and want to get the RdBu colormap again, it would still have the yellow over color, which is probably not desired.

cmap = cm.RdBu_r
cmap.set_over('yellow')

# Later on you get the built-in colormap again
cmap2 = cm.get_cmap('RdBu_r')
cmap2(np.inf)  # yellow

I agree it is somewhat annoying to import copy for now. We had discussed bringing in a copy=True keyword argument, but then we'd just have to deprecate that after a single release once we always return a copy/immutable version of the colormap.

@jklymak
Copy link
Member

jklymak commented Oct 21, 2020

Sure, I agree with the over all strategy, just not the tactics. As of 3.3 there is no other way to get the colormap except to do copy.copy. So now our users who need this, change to copy.copy. And then in 3.4 we offer a way to return the colormap, so now those users have to change again. Why didn't we just introduce the change?

@greglucas
Copy link
Contributor Author

As of 3.3 there is no other way to get the colormap except to do copy.copy

Maybe this is just a poorly worded message then. The only reason to use copy here is if you need to re-get the original RdBu colormap at some later time. You can certainly keep using the cm.RdBu_r throughout your code as you normally would, and the warnings should only happen once when you are setting the over/under/bad on a built-in colormap.

My vote would be to leave this as is for now and make sure 3.4 is returning a copy. I really don't think this is that egregious, more of a minor annoyance along the deprecation path.

@anntzer
Copy link
Contributor

anntzer commented Oct 21, 2020

fwiw I guess #14645 may help too?

@jklymak
Copy link
Member

jklymak commented Oct 21, 2020

The message would be useful if people had to change their code. But so far as I can tell we are going to allow cmap=cm.RdBu_r will in the future return a copy, so there is no code we need to warn the user to change.

The only users who would have to change behaviour are those who were for some reason depending on being able to access the colormap from the global state.

cmap = cm.RdBu
cmap.set_over_color([0, 1., 1., 1.])

.....
cmap=cm.RdBu
assert cmap.get_over_color() == [0, 1., 1., .1.]

which I imagine is exceedingly rare. The rest get a misleading Deprecation message that makes it sound like they won't be able to call set_over_color on the object returns by cm.RdBu any more.

@tacaswell
Copy link
Member

Did we decide to go down the route of module properties (which is py37 so we have it on main)? I do not remember any discussion of it and am not finding it in the text of this issue.

If we do not, then we can't make cmap = cm.RdBu to return a copy as it is Python assignment and we don't have a hook to make the copy.

@greglucas
Copy link
Contributor Author

Good question. I think that was brought up as a possibility and an alternative would be to make the cmap object's immutable (Tim's proposal). I think we could get away with something like defining our own getattr at the module level for the "return a copy" case:

def __getattr__(name):
    return ColormapRegistry[name]

All of this discussion leads me to think we should really have a clear end-goal in mind and lay that out. Then, we can start shimming the deprecations in. There is already MEP21, which looks to be stagnant. Mind if I revive that MEP with some new text and then we can start linking issues/fixes to a more central location?

@jklymak
Copy link
Member

jklymak commented Oct 22, 2020

... or use the Projects feature...

@anntzer
Copy link
Contributor

anntzer commented Oct 22, 2020

Making attribute access return new (mutable) copies seems rather contrary to standard python semantics. In particular this means that something like

from matplotlib import cm
cm.jet.set_bad(...)
plt.imshow(..., cmap=cm.jet)

would silently "fail", whereas

from matplotlib.cm import jet
jet.set_bad(...)
plt.imshow(..., cmap=jet)

would succeed... but I doubt that having both snippets behave differently is a good idea.
(Unless I've completely misunderstood the idea here.)

@tacaswell
Copy link
Member

I agree with @anntzer , while cute there are too many weird cases. Just because you can hijack and change something in Python does not mean you should.

We need to decide if we only want to deprecate mutating the global state when getting the colormap through get_cmap and let people continue to mutate global state with e.g. cm.viridis (which then means we will have to decide if get_cmap('foo') returns a fresh copy of the default state (which would make cm.viridis and `cm.get_cmap('viridis') decoupled) or a copy of the currently accessible one at the module level (which would defeat the purpose of trying to protect the user from spooky action-at-a-distance)) or make the attribute access colormaps actually immutable (which will break a bunch of people).

I think when we wrote the warning we were not thinking about the attribute access case at all. Do we document / recommend that usage anywhere?

@jklymak
Copy link
Member

jklymak commented Oct 22, 2020

I find it useful in these cases to think about what the ideal situation would be and only then figure out how we would get there from where we are (if possible). So, how would this look if we were to start over?

@anntzer
Copy link
Contributor

anntzer commented Oct 22, 2020

My 50c would be that we should just make all colormaps immutable (after merging the with_extremes PR (or similar functionality) and possibly with a very long deprecation period, given that we are basically breaking everyone who way setting over/under/bad before with essentially no way to write code that compatible both with old and new Matplotlib versions, as with_extremes did not exist before).

Especially if we are going to push in the direction of encouraging the use of the global cmap registry (via register_cmap, which I didn't particularly like but I guess that discussion is closed), any colormap can potentially become global state, e.g. via

# in foo.py
foo_cmap = LinearSegmentedColormap(...)
register_cmap("foo", foo_cmap)
...
# in module2.py
imshow(..., cmap="foo")
...
# in module3.py
from foo import foo_cmap
foo_cmap.set_over(...)  # oops, this will affect module2.py if run before it.

@jklymak
Copy link
Member

jklymak commented Oct 22, 2020

It seems to me that the original problem was defining globals like cm.jet in the first place. If we had just stuck with jet=cm.get_cmap('jet') then there would not be a problem.

Given the state we are in, it seems sensible to me that if you modify a global like cm.jet (i.e. cm.jet.set_over) you get what you asked for, and cm.jet is changed everywhere. However, if you call jet=cm.get_cmap('jet') you should get a local copy that does not affect the global state.

It would be nice if jet = cm.jet also returned a copy. Is it even possible to make that return a copy?

TLDR: it would be nice if we could return copies whenever possible, because I don't think it is at all clear to users that cm.jet is a global.

@anntzer
Copy link
Contributor

anntzer commented Oct 22, 2020

Actually, now that I look at it again, I guess a solution for the "sneaky global" problem I mentioned just above would be for register_cmap to be in charge of registering a copy of the colormap, so that later changes to the colormap passed as parameter to register_cmap don't affect the global registry. I think this is similar to how when you do y = [...]; plt.plot(y), later changes to y don't affect the plotted line.

@timhoffm
Copy link
Member

@anntzer it's rarely the problem that one registers a colormap instance and then modifies the instance, thus modifying the registered colormap. While that's a theoretical issue I'm not seriously worried about that. Few people register colormaps themselves, and even fewer will hold on to instances some times and use the registry other times.

The far more common problem is obtaining a reference to the registered colormap and modifying that, without knowing that it affects the registered state.

@anntzer
Copy link
Contributor

anntzer commented Oct 23, 2020

I'd guess making a copy at registration time is simple enough to be worth considering, but won't insist on that either.

@tacaswell
Copy link
Member

The module attribute access has been with us from at least 2004 (e34a333). I think clawing those back is off the table.

So if we were to make the attribute access return a copy:

Pro: same behavior as get_cmap
Con: very surprising, leads to cases where the code looks right, but does not do what it expects

I think the case of

cm.viridis.set_under('w')  # expect to mutate in place
ax.imshow(..., cmap=cm.viridis)  # does not work!!

is enough to take that option out of consideration (even if we could make it work)

If we back off making the attribute access immutable:

Pro: leave the foot-cannon that people are using in place
Con: but the foot-cannon will only half work as things like

cm.jet.set_under('k')
ax.imshow(..., 'jet')

which is a variation on the issue with making cm.jet return a copy. The things that make me angriest when using APIs are when it runs, does not do what I want, but the most "obvious" (and yes I know "obvious" is super subjective) mental model of how it works inside says it should do what I want.

It also will make cm.<name> and cm.get_cmap(name) give possibly different results which seems too confusing to do

If we make the attribute access immutable (which is the status quo)

Pro: avoids all of the cons above
Con: will require a wider net of changes than we discussed going into this.

Backing off of making the global versions of the cmap objects immutable is also an option, but I think that the action-at-a-distance and general global state issues are worth cleaning up for all of the reasons we had when we started this.


On net, I think we should go with making the attribute access immutable and not try to return a copy. It may not be ideal, but I think it is the least-bad of the options we have.


I agree about the copy in cmap_register (I checked, we do not currently do that but we do flag them as global).

@timhoffm
Copy link
Member

It!'s reasonable to do. I'm just clarifying that it does not solve our main issue.

@jklymak
Copy link
Member

jklymak commented Oct 23, 2020

At some point @efiring wanted to change the name of this module anyhow. Is yet another possibility just writing a new way to do this from scratch and leaving cm the way it is, and soft-deprecating it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. topic: color/color & colormaps
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants