Skip to content

Proposal: Implement RcParams using ChainMap and remove dict inheritance #25617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

chahak13
Copy link
Contributor

@chahak13 chahak13 commented Apr 4, 2023

PR Summary

The current implementation of RcParams inherits both MutableMapping and dict. Inheriting from dict has a few problems IMO,

  • Any changes in dict get propagated to RcParams directly. (Can't import matplotlib #12601)
  • Subclassing dict means RcParams can be modified using dict.__*__ methods. As a configuration class for matplotlib, I think all updates to RcParams should be possible only using methods defined by the class. (Data access API for rcParams #24730, RcParams should not inherit from dict #12577 )
  • As it is basically a dictionary, keys can be deleted from RcParams which would affect the working of the library.
  • It should not be possible to use RcParams as a store for keys other than the configuration keys with validators.

Proposed Implementation:

  • Remove the dict inheritance and just inherit MutableMapping to main that interface.
  • Introduce namespaces by restructuring the parameters ingested as a dictionary of ChainMaps.
    For example:
namespace_maps = {
    "backends": ChainMap({}, default_backends_values),
    "lines": ChainMap({}, default_lines_values),
    "fonts": ChainMap({}, default_fonts_values),
}

Where default_<namespace>_values are the default values of the RcParams under that namespace. For example.

default_backends_values = {"webagg.port": 8080, "webagg.address": "127.0.0.1", ...}

Advantages:

  • Allows concrete grouping of keys, not just visually by key as is the case right now (dotted keys).
  • Contextually related keys can be accessed easily using namespaces.
  • No need to maintain an extra RcParamsDefault object as these defaults will be present in the base default values of the ChainMaps.
  • Contexts can be implemented more elegantly using new_child and parents in ChainMaps as compared to the the current method of creating and deleting RcParams objects.
with rc_context(rc={"backends.key1": 123}):
    print("\nwithin context")
    print(rcParams['backends'])

print("\nafter context")
print(rcParams['backends'])

would output

within context
ChainMap({'key1': 123}, {}, {'key1': 'abc', 'key2': 'xyz'})

after context
ChainMap({}, {'key1': 'abc', 'key2': 'xyz'})

Disadvantages:

  • popitem() method for MutableMapping is slightly unruly to implement. (Though, I would argue that it should not work on RcParams in any case). Also, relevant discussion here (RcParams should not inherit from dict #12577 (comment))
  • Tinkering with a long-existing part of the code?

Doing this will also allow us to move some extra functions that exist in the matplotlib namespace to the RcParams class instead, clearing up the namespace a little bit. (#2052)

Performance Impacts

Introducing this structure instead of the entire rcParams being a dictionary makes it a little slower. The timings for a get operation on both the methods were measured using %timeit for 1,000,000 loops each.

Current Implementation (dict):

In [3]: %timeit -n1000000 mpl.rcParams.get("animation.convert_args", [])
216 ns ± 16.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Proposed Implementation (ChainMaps):

In [12]: %timeit -n1000000 mpl.rcParams.get("animation.convert_args", [])
918 ns ± 1.03 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

For completion, while disabled right now, the timing for mpl.rcParams["animation"]["convert_args"] is also measured

In [15]: %timeit -n1000000 mpl.rcParams["animation"]["convert_args"]
467 ns ± 15 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Older description

I removed the dict inheritance from the RcParams class and reimplemented its internals using ChainMaps. The interface is still compatible with MutableMapping interface. Using ChainMaps allows namespace access too, for example rcParams['text'] will return all the properties under text.*. I also like the way it allows context management using new_child() and parents. The main goal of the PR is just to see what a different implementation might look like per #24585 and have some more discussion.

This implementation can be made efficient. I also have changed a few things like backend to default.backend which is something we might not want in the actual implementation but I have changed here because the focus was on having an implementation and not worrying way too much about changes. It's mainly adding default. to the settings that did not fall under any "category".

I did not implement the find_all method initially because from a quick github search, it isn't used anywhere, but it can now be implemented fairly easily by searching on the keys. The __repr__ and __str__ differ from the current implementation as they have a very simple implementation right now.

PR Checklist

Documentation and Tests

  • Has pytest style unit tests (and pytest passes)
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).
  • New plotting related features are documented with examples.

Release Notes

  • New features are marked with a .. versionadded:: directive in the docstring and documented in doc/users/next_whats_new/
  • API changes are marked with a .. versionchanged:: directive in the docstring and documented in doc/api/next_api_changes/
  • Release notes conform with instructions in next_whats_new/README.rst or next_api_changes/README.rst

@chahak13 chahak13 changed the title Chainmap rcparams Implement RcParams using ChainMap and remove dict inheritance Apr 4, 2023
@tacaswell
Copy link
Member

Interesting. I see the benefit for doing the context (and it might be a better way to keep the default defaults aronud!)

However I'm not sure how the ChainMap helps with the namespacing? That seems to mostly be hnadled by the custom get/set methods?

@chahak13
Copy link
Contributor Author

chahak13 commented Apr 5, 2023

Yeah. It started as updating the ChainMap getters/setters but I was checking the code and the final ChainMap for self._mapping might not be needed as the class itself is acting as a "ChainMap".

Though, I'm not entirely sure how to implement popitem(). The current implementation is definitely wrong but if we're adding namespaces then we'll have to specifically keep track of the last updated key to remove it since popitem() pops the last key.

@chahak13 chahak13 marked this pull request as draft April 6, 2023 22:26
@chahak13 chahak13 force-pushed the chainmap_rcparams branch from ff13a4c to 8f3c3b2 Compare April 7, 2023 21:08
@chahak13 chahak13 changed the title Implement RcParams using ChainMap and remove dict inheritance Proposal: Implement RcParams using ChainMap and remove dict inheritance Apr 13, 2023
@chahak13 chahak13 force-pushed the chainmap_rcparams branch from 7ebe855 to 88b068d Compare April 17, 2023 21:51
@chahak13 chahak13 marked this pull request as ready for review April 17, 2023 22:25
@chahak13
Copy link
Contributor Author

@ksunden had to change a few things in the stubs file too but not very sure of the changes. Would appreciate it if you can take a look. Thanks!

Comment on lines +75 to +71
namespaces: tuple
single_key_set: set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these be more specific about the type of the values?

e.g. tuple[str, ...] (this is how you say homogeneous tuple of str, indeterminate length)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks! I wasn't sure how to explain indeterminate length so kept it as just a tuple. I'll update it.

Copy link
Member

@ksunden ksunden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While my more concrete comments are about the type hinting side of things, I'm not super sold on changing the canonical names we use for the keys in the "default" namespace, even with some level of backwards compatibility...

def __setitem__(self, key: str, val: Any) -> None: ...
def __getitem__(self, key: str) -> Any: ...
def __delitem__(self, key: str) -> None: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually inherited from MutableMapping, but doesn't hurt to put it here more explicitly...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these mainly because we overwrite these functions. So, just to be consistent with having stubs for the functions implemented..

@chahak13
Copy link
Contributor Author

I'm not super sold on changing the canonical names we use for the keys in the "default" namespace, even with some level of backwards compatibility...

I'm not very fond of the name default.* either but am not sure what to actually name it. Maybe something like internal.* would make sense but we also have another _internal.*? I'm open to suggestions for this.

Or do you mean changing the names in matplotlibrc? If that is the case, I'm trying to work on having it such that naming it as <some_namespace>.backend etc is not required for any of those without an easy namespace.

@ksunden
Copy link
Member

ksunden commented Apr 26, 2023

Yeah, mainly I'm thinking that any and all usage should likely stay with the unnamespaced name (in rc files, but also in code) and whatever designation is used should be purely internal to the class.

@chahak13 chahak13 force-pushed the chainmap_rcparams branch 3 times, most recently from d678c93 to 7361150 Compare April 26, 2023 22:36
@chahak13 chahak13 force-pushed the chainmap_rcparams branch from 7361150 to dcfaf01 Compare May 9, 2023 23:58
elif depth == 2:
return self._namespace_maps[keys[0]].maps[-1].get(keys[1])

def getdefault(self, key):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should follow the naming convention here. While some stdlib API does not use underscores for historic reasons (e.g. dict.setdefault) we should follow the naming convention and not some special historic cases.

Suggested change
def getdefault(self, key):
def get_default(self, key):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind changing this but 1. Do we change setdefault to set_default too? and as you mentioned 2. Do we still keep setdefault?


return self._get_default(key)

def getdefaults(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def getdefaults(self):
def get_defaults(self):

@@ -890,6 +1057,7 @@ def _rc_params_in_file(fname, transform=lambda x: x, fail_on_error=False):
raise

config = RcParams()
config._parents()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain what this does and why we need it (at least in its docstring). The need to call a private method feels like a design flaw to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so this is a little ugly and I'm not very convinced either but the idea is - Since we're initializing RcParams with an empty dictionary before populating it with the default values, this leads to RcParams being initialized as basically ChainMap({}, {}). So, all the updates are then done in the first dictionary and the last dictionary just remains as an empty dictionary. So, to have the defaults in the last layer and have something like ChainMap({}, defaults) instead of ChainMap({}, defaults, {}), I remove the added layer and once all defaults have been set, I add a new layer.

Ideally, I would want to initialize it as RcParams(defaults) and not RcParams() and update then.

@@ -918,6 +1086,7 @@ def _rc_params_in_file(fname, transform=lambda x: x, fail_on_error=False):
or from the matplotlib source distribution""",
dict(key=key, fname=fname, line_no=line_no,
line=line.rstrip('\n'), version=version))
config._new_child()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above: Please explain what this does and why we need it (at least in its docstring). The need to call a private method feels like a design flaw to me.

@timhoffm
Copy link
Member

I'm not convinced by the proposed namespacing concept: It seems to hard-code one level space level, e.g. for ytick.right you have ytick as the namespace and right as the key. But in fact we currently have one to three levels (e.g. timezone, ytick.minor.width). The one level needs special code paths and the three levels are not working correctly for the advertized "return group" features:

In [1]: from matplotlib import rcParams

In [2]: rcParams['ytick']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[2], line 1
----> 1 rcParams['ytick']

File ~/git/matplotlib/lib/matplotlib/__init__.py:819, in RcParams.__getitem__(self, key)
    816         from matplotlib import pyplot as plt
    817         plt.switch_backend(rcsetup._auto_backend_sentinel)
--> 819 return self._get(key)

File ~/git/matplotlib/lib/matplotlib/__init__.py:760, in RcParams._get(self, key)
    755         return self._namespace_maps["default"].get(key)
    756     # Uncomment the following line and remove the raise statement
    757     # to enable getting namespace parameters.
    758     # return self._namespace_maps[key]
    759     else:
--> 760         raise KeyError(
    761             f"{key} is not a valid rc parameter (see rcParams.keys() for "
    762             f"a list of valid parameters)")
    763 elif depth == 2:
    764     return self._namespace_maps[keys[0]].get(keys[1])

KeyError: 'ytick is not a valid rc parameter (see rcParams.keys() for a list of valid parameters)'

In [3]: rcParams['ytick.minor']

In [4]: rcParams['ytick.minor.width']
Out[4]: 0.6

To keep things simple (at least for a start), I suggest not to mix namespacing and chainmap. Let's keep the full dotted names as keys in the internal structure and use a simple two-element ChainMap (untested code):

def __init__(defaults):
    self._default_settings = defaults
    self._user_settings = {}
    self._effective_settings = ChainMap([self._user_settings, self._default_settings])

This should have fairly straight forward implementations for the MutableMapping interface.

You can still build grouping in later dynamically (untested code):

def _keys_with_prefix(self, prefix):
    prefix = prefix + '.'
    return [k for k in self._effective_settings.keys() if k startswith prefix]

def __getitem__(key):
    try:
        return self._effective_settings[key]
    except KeyError:
        pass
     rc_group = {k: self._effective_settings[k] for k in self_keys_with_prefix()}
     if not rc_group:
         raise KeyError(f'{key} is not a valid key')
     return rc_group

That's a bit more expensive but likey bearable. (One could also cache the prefix -> full keys mappings.)

@chahak13
Copy link
Contributor Author

Thanks, @timhoffm. Just a small note, the one level namespace is not working because I intentionally removed it in the last commit based on the discussion in a dev call. In any case, I see your point. Let me go back and think about this again. I plan to get back to this again sometime this week.

@chahak13 chahak13 force-pushed the chainmap_rcparams branch from af93c0b to 56fc4d6 Compare May 28, 2023 21:15
@chahak13
Copy link
Contributor Author

@timhoffm, I narrowed the scope as you suggested and just focused on removing the dictionary inheritance and the rcParamsDefaults object. Will take a look at namespacing again. I'm not sure what suits better, dynamic namespacing for any depth or having a fixed depth but that's a bigger conversation. Let me know if you have more thoughts on the current changes though, thanks!

Comment on lines +846 to +845
def setdefault(self, key, default=None):
"""Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.
"""
if key in self:
return self[key]
self[key] = default
return default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current setdefault behavior (side-effect of __setitem__ not accepting new keys): Return the value if key is valid, otherwise raise. We should keep this behavior for compatibility.

Note that this behavoir is identical to rcParams[key]. Optional: Deprecate setdefault and recommend rcParams[key] instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just return the value (maybe catching the KeyError on misses to match the current error message exactly) and explicitly handle the one case we use this internally .

This is to make the interface such that the code other than RcParams
doesn't have to deal with the `deafult.*` namespace anywhere. This also
changes the keys returned by `.keys()` to not have the `default.*` in
the key name.
As per @timhoffm's suggestion, decreasing scope of this PR to first just
remove the dict inheritance and add a ChainMap to maintain settings.
This is a fairly straightforward change and doesn't change the
interface. Furthermore, the keys are still dotted and don't support
namespacing as defining namespaces might take a little more discussion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants