Skip to content

VectorMappable with data type objects #28428

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

trygvrad
Copy link
Contributor

@trygvrad trygvrad commented Jun 20, 2024

Creation of the VectorMappable class

**NOTE: This PR builds upon the BivarColormap and MultivarColormap classes introduced in ** #28454

PR summary

This PR has been in development during 2024 GSOC as a responese to #14168 Feature request: Bivariate colormapping.
This requires the introduction of multiple new classes, and changes to different parts of the codebase. It has therefore been suggested to introduce this functionality over multiple PRs, so that the changes are easier to review this is the 1st of 3(?) PRs that will resolve #14168.

The end goal of this is to allow for the following funcitonality:

fig, axes = plt.subplots(1, 2, figsize = (12,4))
pm = axes[0].pcolormesh((A,B,C), cmap = '3VarSubA', vmax = (0.4, 0.6, 0.5))
cb0, cb1, cb2 = fig.colorbars(pm, shape = (-1,2))
axes[0].set_title('Subtractive multivariate colormap')
pm = axes[1].pcolormesh((A,B,C), cmap = '3VarAddA', vmax = (0.4, 0.6, 0.5))
cb0, cb1, cb2 = fig.colorbars(pm, shape = (-1,2))
axes[1].set_title('Additive multivariate colormap')

image

fig, axes = plt.subplots(1, 2, figsize = (12,4))

pm = axes[0].pcolormesh((im0,im1), cmap = 'BiOrangeBlue', vmin = -1, vmax = 1)
cax = fig.colorbar_2D(pm, shape = (-1,2))
axes[0].set_title('Square 2d colormap')

pm = axes[1].pcolormesh((im0,im1), cmap = 'BiCone', vmin = -1, vmax = 1)
cax = fig.colorbar_2D(pm, shape = (-1,2))
axes[1].set_title('Circular 2d colormap')

image

The current class hierarchy can be visualized as follows:
rect400

This PR proposes to replace ScalarMappable with a new class VectorMappable, so that multivariate functionality can be inherited by all the relevant plotting methods, as shows below. This includes support for multiple norms (the private variable ._norm always stores a list), support for multiple arguments to .set_vmin(), .set_vmax(), changes to .to_rgba() to support bivariate and multivariate colormaps, as well as support functions to parse multivariate data (top level functions in cm.py).
rect484

This PR does not contain the changes needed in the plotting functions and docs. These will follow a separate PR, as outlined in the figure above. (a (slightly outdated) full implementation is available in the branch here)

PR checklist

¹This is the 2nd out of a series of PRs, and cannot by itself close #14168. Most features are tested, but some private functions do not have complete tests, as they are better tested in a later PR where the private API is tested via the publicly available API (colors._ensure_multivariate_params() etc.)

@story645
Copy link
Member

story645 commented Jun 27, 2024

attn @timhoffm, can you take a quick skim to see if this is a reasonable API and plan/share any concerns?

@timhoffm
Copy link
Member

The initial description says

NOTE: This PR has been superseded by #28454

Is there still something relevant here, or can we close and move the discussion to #28454.

@trygvrad
Copy link
Contributor Author

trygvrad commented Jun 28, 2024

@timhoffm 'superseded' was maybe a bad choice of word. This PR and #28454 will work together, but I think there will be some conflicts if both this and #28454 are accepted (since some changes exist in both), I therefore believe it is easier if I migrate the changes here to a new branch that builds on top of #28454, and make a new PR (and also includes tests). However, this will take a bit of time.

For now, it I think it is a good idea to keep the discussion of this part of the API here, even if this PR is ultimately replaced by another.
We are trying to separate concerns (and also the discussions), by having #28454 be the new colormap classes, and this be the introduction of vectormappable.

I rebased this code so that it correctly builds on #28454

@trygvrad trygvrad force-pushed the VectorMappable-data-type-objects branch from 8ae3a0d to 6fb5884 Compare July 8, 2024 11:41
@trygvrad trygvrad force-pushed the VectorMappable-data-type-objects branch from 6fb5884 to e889929 Compare July 9, 2024 13:28
@trygvrad trygvrad force-pushed the VectorMappable-data-type-objects branch from e889929 to 81181c9 Compare July 9, 2024 13:52
@timhoffm
Copy link
Member

timhoffm commented Jul 9, 2024

Can you please try to summarize the mapping logic for multivariate and 2D colormaps. I cannot follow the architecture between the implementation details and mixing with ScalarMappable logic.

The current scalar mapping works like this. How do VectorMappable and the new colormap types change the picture?

image

Edit: I think the OP is sufficient. I‘ll have a closer look.

@trygvrad trygvrad force-pushed the VectorMappable-data-type-objects branch from 81181c9 to f11c3bf Compare July 9, 2024 16:31
@trygvrad
Copy link
Contributor Author

trygvrad commented Jul 9, 2024

Can you please try to summarize the mapping logic for multivariate and 2D colormaps. I cannot follow the architecture between the implementation details and mixing with ScalarMappable logic.

vectormappable

The VectorMappable stores vector data as a dtype with multiple fields. This is so that X.shape will return the same shape regardless of the number of variates.

This PR does not contain changes to the plotting methods, but it is intended to go something like this.

class Axes:
    def pcolor(self, data, cmap = None, norm = None, vmin = None, vmax = None):
        cmap = colors._ensure_cmap(cmap)
        data, norm, vmin, vmax = colors._ensure_multivariate_params(cmap.n_variates, data, norm, vmin, vmax)

Which transforms data from what is reasonable to accept (proposed: scalar array, complex array, sequence of arrays*, dtype with multiple fields) to a dtype with the correct number of fields (as determined by cmap.n_variates).

*I prefer an input signature of (cmap.n_variates, n, m) for multivariate data, but we could use (n, m, cmap.n_variates) instead.

Edit: @timhoffm Thank you for taking a look, I didn't see your edit before I made the additional figure :D

@timhoffm
Copy link
Member

timhoffm commented Jul 11, 2024

Disclaimer: This contains some fundamental thoughts beyond the concrete MultiVar case, but since MultiVar requires significant architecture changes any, I believe it's worth looking at the whole picture.

I have not yet come to a complete solution, but I put down my working thoughts.

  1. Looking at the structures involved, the three usecases behave alike for data channels and norms, but quite different in terms of colormaps and postprocessing:

    Scalar Bivar Multivar
    data channels 1 2 2..N
    norms 1 2 2..N
    colormaps 1x 1D 1x 2D 2..N x 1D
    postprocessing - - merge colors

    It is reasonable to abstact the step "normalized data channels -> color" in different Colormap classes. I'm inclined to have a similar abstraction for the normalization so that we have
    image
    The norm may also have multiple channels (in the simple form just a collection of scalar norms). The mappable should not have to care about managing the details (i.e. a list of scalar norms). To be checked: How much of the ScalarMappable interface exposes details on the number of channels?

    Also t.b.d.: ScalarMappable attaches the mapping logic to artists via inheritance. Would it make sense to extract the data -> color logic into some separate "Mapper" entity (name t.b.d.)?
    image

  2. We have mutual references between ScalarMappable and Colormap
    image
    We still have to determine how this interferes with the more general mappings.

  3. A current limitation of the mapping mechanism is an assumed 1:1 relation between ScalarMappable and Norm concerning auto-scaling: The norm does not know about the data. If it has no predefined limits, they are automatically inferred from the first call. While it is possible to assign a Norm to multiple Mappables, autoscaling will thus only take into account values from the first mappable. When touching the whole mapping infrastructure anyway, can we improve on this so that a norm can conistently serve multiple mappables?

@trygvrad
Copy link
Contributor Author

  1. It is reasonable to abstact the step "normalized data channels -> color" in different Colormap classes. I'm inclined to have a similar abstraction for the normalization so that we have ...

It is not quite clear to me what the distinction between between the VectorMappable and Mappble class would be in your case. In the weekly GSOC meetings we have considered having a VectorNorm similar to the way we have a VectorMappable (sugegsted by @story645). VectorNorm would then be a generalization of Normalize to support multiple variates.

Having a VectorNorm class would give us certain advantages, and I believe your inclination to have a Mappable class essentially deals with the same issues. From what I can tell, these are:

  • a. When using the same norm on all axes (point 3. above) the litmits should be autoset using the full data, a Mappable class could manage this.
  • b. If there is a coupled norm, i.e. the normalization on axis 1 depends on the value on axis 2.
  • c. Separation of concerns - fewer if-statements in VectorMappable

The alternative, as this PR suggests, is by simply attaching a list of norms to VectorMappable. I am a proponent of this approach because I believe it makes it easier to trace the code (fewer levels of inheritance). Also, I have looked for a use case where a coupled norm would be desirable (point b. above), and I have yet to find one. An example would be if the data is represented in an orthogonal coordinate system, but the normalization should map this in a shear coordinate system. While this would make some mathematical sense, I cannot see a use case. [Note that a circular 2D colormap places different limits than a 2D square colormap, but this is handled after the norm (in the cmap), and does not require a VectorNorm. Limiting the colorspace in other ways can be done by subclassing BivarColormap or MultivarColormap (should be a tutorial).]

However, (all of) you are more experienced to evaluate what makes code more easy to maintain than I am, and if you think a Mappable class is favorable in this regard, we should implement it.

  1. We have mutual references between ScalarMappable and Colormap ...

I have been thinking we need to determine how the equivalent to Colorbar works in the multivariate case, but as you say this topic is probably a bit more general. I have some thoughts on this, but it might be good to discuss this in a weekly development meeting.

  1. A current limitation of the mapping mechanism is an assumed 1:1 relation between ScalarMappable and Norm concerning auto-scaling: ...

To me, the intuitive behavior has always been that each axis has a separate norm:

ax.imshow((A, B), cmap='BiOrangeBlue')
# or 
ax.imshow((A, B), cmap='BiOrangeBlue', norm='LogNorm')

should give two separate norms with independent limits.

I further believe that the intuitive action for a user to take if they want identical limits on both axis is to simply set the limits:

ax.imshow((A, B), cmap='BiOrangeBlue', vmin = -1, vmax = 2)

However, advanced users might want the behavior you describe above, a shared norm, and we should therefore support the following:

n = mpl.colors.Normalize()
ax.imshow((A, B), cmap='BiOrangeBlue', norm=(n, n))

Where both axis are explicitly declared to have the same norm.
With this call signature, I think it would be comparatively difficult for us to change the default (set limits only with first axis), and comparatively easy for the user to get the desired behavior, i.e.:

n = mpl.colors.Normalize()
norm.vmin = np.min((A, B))
norm.vmax = np.max((A, B))
ax.imshow((A, B), cmap='BiOrangeBlue', norm=(n, n))

NOTE: I made a commit this morning that explicitly forbids the call following call signature:

n = mpl.colors.Normalize()
ax.imshow((A, B), cmap='BiOrangeBlue', norm=n)
# → ValuError('When using a colormap with more than one variate, norm must be None, 
# a valid string, a sequence of strings, or a sequence of mpl.Colors.Normalize objects.'

because it I found it unclear if this should be a shared norm (shared limits) or if the norm should be duplicated (independent limits), and I believe we should only accept input where the users intent is clear.

@timhoffm As I understand it, the weekly development meeting today is cancelled, but I will try to be there next week.
If you give me some more details on how you expect the Mappable class to integrate (call signature?), I could prototype an alternative branch before then.

@timhoffm
Copy link
Member

Thanks for the detailed answer, I'll have a closer look later.

Re: 3. Multiple mappables with one colorbar:
I'm referring to https://matplotlib.org/stable/gallery/images_contours_and_fields/multi_image.html. T.b.d. maybe this can be slightly simplified nowadays, but we still have a fundamental disconnect between the images and the colorbar.

@trygvrad
Copy link
Contributor Author

T.b.d. maybe this can be slightly simplified nowadays

I took a look, and the norm now seems to connect automatically, but you still need to manually connect the images before you swap out the colormap.

If we made a 'Mappable' class, we could definitely simplify this, by having a single object that holds the norm+colormap for all images in the example above. I imagine this class would then handle everything that deals with the data→norm→colormap transform, i.e. VectorMappable.to_rgba(...) and parts of Image.make_image(...).

I think you are right that if we want to implement this, it would be a good idea to bundle it with this PR.

Personally I would suggest NormAndColor as a name for the class (I like descriptive names), and I would suggest you can pass it as an alternative to the norm, i.e.:

nac = mpl.cm.NormAndColor(norm='LogNorm', cmap='Viridis')
ax[0].imshow(A, norm=nac)
ax[1].imshow(B, norm=nac)

If this rhymes with what you are thinking, I'll take a look at implementing it.

@story645
Copy link
Member

story645 commented Jul 11, 2024

b. If there is a coupled norm, i.e. the normalization on axis 1 depends on the value on axis 2.

Maybe the usecase in #19515 (by @greglucas ) of choosing vmin and vmax, which I've often done that based on histograms so a linked zoom on histogram and update vmin and vmax w/ the corresponding values.

Also just to make sure I'm on the same page, I'm interpreting the figures above as:

$Bivariate: channel_1 \times channel_2 \rightarrow norm_1(channel_1) \times norm_2(channel_2) \rightarrow RGB$

$multivariate: channel_1 \times channel_2 \rightarrow norm_1(channel_1) \times norm_2(channel_2) \rightarrow RGB_1 \times RGB_2 \rightarrow RGB$

And now I'm wondering where one of the use cases I was originally thinking of fits:

$nvariate: channel_1 \times channel_2 \rightarrow norm(channel_1, channel_2) \rightarrow RGB$

ETA: nvariate is also weird cause it could maybe work w/ scalermappables and the hard part is it knowing how to input match. And yes these all have the same top level signature:
$vnorm: channel_0 \times ...\times channel_i \times ... \times channel_n \rightarrow RGB$

@trygvrad
Copy link
Contributor Author

trygvrad commented Jul 11, 2024

@story645 The PR as it stands works like this:
$multivariate:\underbrace{channel_1×channel_2}_{VectorMappabe} \underbrace{→ norm_1×norm_2} _{norm\ layer} \underbrace{→RGB×RGB →RGB} _{cmap\ layer}$

$bivariate: \underbrace{channel_1×channel_2} _{VectorMappable} \underbrace{→ norm_1×norm_2} _{norm\ layer} \underbrace{→ RGB} _{cmap\ layer}$

as you say it does not support:
$\underbrace{channel_1×channel_2} _{VectorMappable} \underbrace{→ norm(channel_1,channel_2)} _{norm\ layer} \underbrace{→ RGB} _{cmap\ layer}$
with a 1D colormap, nor does it support
$\underbrace{channel_1×channel_2} _{VectorMappable} \underbrace{→ norm _1(channel_1,channel_2)×norm _2(channel_1,channel_2)} _{norm\ layer} \underbrace{→ RGB} _{cmap\ layer}$
with a 2D colormap, etc..

b. If there is a coupled norm, i.e. the normalization on axis 1 depends on the value on axis 2.

Maybe the usecase in #19515 (by @greglucas ) of choosing vmin and vmax, which I've often done that based on histograms so a linked zoom on histogram and update vmin and vmax w/ the corresponding values.

When I was talking about a coupled norm, I was thinking of norms that take two arguments, i.e. $norm(channel_1,channel_2)$, which this PR does not support right now.

If we want to introduce norms that take two arguments, we definitely need a NormAndColor kind of object to hold them, and classes for $norm(channel_1,channel_2)$ objects.
However, it is not clear to me what mathematical operation $norm(channel_1,channel_2)$ should perform.
@story645 do you have a specific use case (mathematical transform or example) in mind?

EDIT: I'll take a look at implementing a NormAndColor kind of object anyways, and then we will at least have the infrastructure to support $norm(channel_1,channel_2)$ kinds of things

@trygvrad
Copy link
Contributor Author

Re: 3. Multiple mappables with one colorbar: I'm referring to https://matplotlib.org/stable/gallery/images_contours_and_fields/multi_image.html. T.b.d. maybe this can be slightly simplified nowadays, but we still have a fundamental disconnect between the images and the colorbar.

I took a crack at implementing a NormAndColor class (name subject to change).
Using this as a container allows us to easily set up multiple plots that share a $data→norm→color$ pipeline.
https://matplotlib.org/stable/gallery/images_contours_and_fields/multi_image.html can now be simplified as follows:

import matplotlib
matplotlib.use('qtagg')
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)
data = np.linspace(0.1, 0.6, 6)[:, np.newaxis, np.newaxis] * np.random.rand(6, 10, 20)

nac = matplotlib.cm.NormAndColor()
nac.vmin = np.min(data)
nac.vmax = np.max(data)

fig, axs = plt.subplots(2, 3)
fig.suptitle('Multiple images')
for i, ax in enumerate(axs.ravel()):
    im = ax.imshow(data[i], norm = nac)
    ax.label_outer()

cb = fig.colorbar(im, ax=axs, orientation='horizontal', fraction=.1)

plt.show()

The video shows how the images couple:

multiple_images.mp4

(Additionally, in https://matplotlib.org/stable/gallery/images_contours_and_fields/multi_image.html, changing the colormap on the colorbar does not propagate to the images, but these changes propagate in the video above)

This implementation also allows us to couple ND colormaps and 1D colormaps, so that changes in one propagates to the others:

import matplotlib
matplotlib.use('qtagg')
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)
data = np.random.rand(2, 10, 20)

fig, axes = plt.subplots(1, 3, figsize=(10,3))

im_2D = axes[0].imshow(data, cmap='BiOrangeBlue')
im_0 = axes[1].imshow(data[0], norm=im_2D.nac[0])
im_1 = axes[2].imshow(data[1], norm=im_2D.nac[1])

cb = fig.colorbar(im_0)
cb = fig.colorbar(im_1)

for ax in axes:
    ax.label_outer()

plt.show()
coupled_bivar.mp4

@timhoffm Is this what you had in mind?

I had to build this in a branch further down the pipeline than this PR, because it needs some features of the plotting functions not available in this PR. If you want to take a look, the NormAndColor class can be found here: https://github.com/trygvrad/matplotlib/blob/Multivariate_NormAndColor_class/lib/matplotlib/cm.py

trygvrad added 7 commits July 26, 2024 11:06
Creation and tests for classes containing multivariate and bivariate colormaps.
Adds support for __getitem__ on colors.BivarColormap, i.e.: BivarColormap[0] and BivarColormap[1], which returns (1D) Colormap objects along the selected axes
removal of ColormapBase
Addition of _repr_png_() for MultivarColormap
addition of _get_rgba_and_mask() for Colormap to clean up __call__()
Also adds a an improvement to colors.BIvarColormap.__getitem__() so that this returns a ListedColormap object instead of a Colormap object
Also removed all set_ functions, and replaced them with with_extremes()
@jklymak
Copy link
Member

jklymak commented Jul 30, 2024

As part of your transition plan, strongly suggest sketching out what the new version of the docs are going to look like. 99.99% of folks don't want multiple data sets, norms, or colormaps so how are we going to convey this complexity without making our docs unreadable? The idea of a single norm is already hard to get across and a source of confusion.

@timhoffm
Copy link
Member

Thsnjs for the reminder on docs. I propose to think of docs as soon as we have an idea how the architecture could look like.

That said, I‘m not overly worried about docs. They won’t determine whether we can support multi-valued coloring or not. If we can come up with a reasonable architecture and API, I expect that the docs will fall in place (not to say that we should take docs lightly, but the architecture structure will guide the docs). To quote the zen: If the implementation is hard to explain, it's a bad idea. And following: If the implementation is easy to explain, it‘s easy to document.

@story645
Copy link
Member

As part of your transition plan, strongly suggest sketching out what the new version of the docs are going to look like.

Definitely, and @trygvrad has written a couple of GSOC blog posts that could seed new user guide entries:

so how are we going to convey this complexity without making our docs unreadable?

The plan is to write standalone user guide pages/tutorials/examples and cross reference rather than trying to weave this through what we already have. One of the advantages of pipeline approach is that the signature (and therefore signature docs) more or less stay the same, w/ the heavy lifting on describing what's going on being in the API docs for the specific objects and in multivariate/bivariate specific usage docs.

@trygvrad
Copy link
Contributor Author

So we have discussed how this should come in at the weekly GSOC meetings, and also at the Matplotlib weekly meetings, but I see that it has not been clearly communicated via text.
I.e. the PR summary above does not include a summary of the various steps.

As a reminder: the scope of this has changed with @timhoffm s suggestions, but the original plan looked something like this:

  1. PR 1 New colormap classes for bivariate and multivariate colormaps
  2. PR 2 (This PR) Implement cm.VectorMappable as a drop in replacement for cm.ScalarMappable (in reality cm.ScalarMappable becomes a subclass of cm.VectorMappable) [this approach was discussed at Matplotlib weekly meetings last year]
  3. PR 3 Let the relevant plotting fuctions inherit from cm.VectorMappable instead of cm.ScalarMappable to expose the new functionality. This PR would also include tests for the multivariate call signatures for the relevant plotting fuctions.
  4. PR 4 New colormaps. We need to make a set of Multivariate and Bivariate colormaps the user can choose from. This PR would include separate Choosing a Multivariate Colormap and Choosing a Bivariate Colormap pages in the docs.
  5. PR 5 Tutorials for the new functionality. I.e. an example using imshow, pcolormesh and PatchCollection (choropleth map), as well as examples on how to resample, rotate and invert bivariate colormaps, how to subclass MultivariateColormap to create a new way of combining the individual colormaps.

This brings us to a state where we can include this in a release (labeling the new part of the API as provisional).
However this does not end development, and we can keep implementing features that have been requested/suggested or generally improves the codebase.

  • 'Lighten' and 'Darken' as combination modes, as requested by @JBorrow (comment)
  • A splitting norm that separates scalar data into two colorbars as suggested by @timhoffm above (think above/below sea level on a map)
  • A 2D colormap with that maps to alpha and color as suggested by @timhoffm above
  • Colormaps that combine a bivariate colormap with a scalar luminosity. This is used in polarization microscopy, where the degree and direction of polarization is visualized by the saturation and hue, while the luminosity indicates intensity.
  • With the changes discussed above, we could rework the ImageBase._make_image() pipeline, which currently handles image rendering of the form normalize→remesh→color, remesh→normalize→color, normalize→color→remesh or simply normaize→color. This functionality naturally belongs to the new Mapper(name pending) class, and is more general than the normaize→color currently in use by cm.ScalarMappable.
    I believe we could create a single ¿remesh?→normalize→¿remesh?→color→¿remesh? pipeline on the Mapper class that would work for all methods, greatly simplifying the internals.

It is imperative that we divide the functionality into separate PRs, so that we can discuss different implementation details in isolation. We cannot have a situation where we discuss the internal pipeline in the same thread as colormaps and api. To some extent we can discuss all of these things at the same time, but we cannot do it in the same place. Furthermore, if we can keep the discussions sequential, we greatly reduce the workload.

It is of no great importance to me if this is developed on main or in a feature branch.
However, I would like to avoid a situation where design decisions need to be re-litigated in order to merge the feature branch into main.


I'm not convinced a series of PRs on the main branch is the right way to plan this out.

@jklymak
With the suggestions from @timhoffm the scope of this work has changed, and it is reasonable to reassess if a series of PRs is the right approach. The Matplotlib weekly meeting seems to me like the right place to discuss this, as we have done before.

As part of your transition plan, strongly suggest sketching out what the new version of the docs are going to look like.

It is not quite clear to me what part of the docs you want to see. There are code blocks showing how the new API is intended to be used at the top of this post, but if you want to see how this looks in more detail there are some prototype examples for the docs here and here with colormap references here and here. In addition there are the blog posts @story645 linked to previously. As for the docstrings, I am imagining minimal changes, i.e. for the norm_doc:

norm : str or `~matplotlib.colors.Normalize`, optional
    The normalization method used to scale scalar data to the [0, 1] range
    before mapping to colors using *cmap*. By default, a linear scaling is
    used, mapping the lowest value to 0 and the highest to 1.

    If given, this can be one of the following:

    - An instance of `.Normalize` or one of its subclasses
      (see :ref:`colormapnorms`).
    - A scale name, i.e. one of "linear", "log", "symlog", "logit", etc.  For a
      list of available scales, call `matplotlib.scale.get_scale_names()`.
      In that case, a suitable `.Normalize` subclass is dynamically generated
      and instantiated."""

changes to

norm : str or `~matplotlib.colors.Normalize`, optional
    The normalization method used to scale data to the [0, 1] range
    before mapping to colors using *cmap*. By default, a linear scaling is
    used, mapping the lowest value to 0 and the highest to 1.

    If given, this can be one of the following:

    - An instance of `.Normalize` or one of its subclasses
      (see :ref:`colormapnorms`).
    - A scale name, i.e. one of "linear", "log", "symlog", "logit", etc.  For a
      list of available scales, call `matplotlib.scale.get_scale_names()`.
      In that case, a suitable `.Normalize` subclass is dynamically generated
      and instantiated."""
    - For multivariate data, multiple`.Normalize` objects or strings may be 
      provided as a list. A single string is repeated into a list of appropriate 
      length.

But again, the intention was to avoid a discussion of this here, and save these details for the a different PR (or at least until we have a prototype of the Mappable(name pending) class).


[Note that with the changes @timhoffm are suggesting above, we have to merge PR 2 and 3 in the list above into a single PR. This is not ideal from a PR-review point of view, and this is something I will discuss with @story645 @ksunden at the weekly GSOC meetings, unless the circumstances change at the Matplotlib weekly meeting tomorrow.]

@story645
Copy link
Member

It is of no great importance to me if this is developed on main or in a feature branch.
However, I would like to avoid a situation where design decisions need to be re-litigated in order to merge the feature branch into main.

I very much second this concern and am glad that @trygvrad raised it. I'm supportive of doing the work on a feature branch as a way to avoid having partial functionality merged into main, but only if it's mostly a step in managing the release - meaning that the release manager (likely @ksunden or @QuLogic) would be handling this and discussion would be limited to merge specific topics.

I fully agree w/ @trygvrad that discussions on design and implementation should be happening on the PRs merging into the feature-branch and should be scoped to the specific PRs and more general concerns should be raised at the meeting/mailing-list/discourse. I also very much agree with the concern of every decision being relitigated every time as that will effectively stop work as much as any hard block - ultimately this is coming in as a provisional pipeline that we are likely to leave provisional for years (I think it took us 2+ to make subplot mosaic non-provisional) so there will be room to change and refactor, and - as I have been told many times - that will be easier to do with something in place.

@tacaswell
Copy link
Member

When touching the whole mapping infrastructure anyway, can we improve on this so that a norm can conistently serve multiple mappables?

May a weakref registry on the norm/mapper so that on autoscale the norm can go consult everyone who is using it? Can probably be done independent of this work though.

Would it be reasonable to create a longer-lived feature branch for this, to which we would do smaller PRs (I believe the smaller PRs help us structure the problem).

Based on our experience with getting mpl 2.0 out the door, I recommend against this 🤣


I have not fully caught up on the discussion (or the code), but I am still 👍🏻 on getting this in across a couple of PRs. The assumption in doing it that way is that you can split it up nicely, which assumes you understand your interfaces, which sometimes you do not actually understand until you have built the whole thing so it is not risk-free. However, one mega PR (or a long-lived development branch) has its own risks (hard to review, need to rebase, all-or-nothing) and on net I think trying 4-5 PRs is the better option.

@jklymak
Copy link
Member

jklymak commented Jul 31, 2024

on net I think trying 4-5 PRs is the better option.

Thats fine - just the amount of workshopping in this PR gave me pause. And unlike something like subplot_mosaic, this hits much of the core functionality of the library.

@story645
Copy link
Member

story645 commented Jul 31, 2024

  • just the amount of workshopping in this PR gave me pause. And unlike something like subplot_mosaic, this hits much of the core functionality of the library.

That's understandable and also this hitting so much of the core is why there's so much workshopping in the PR - we're trying to get to a rough consensus on the requirements/constraints/interface on this new pipeline so that @trygvrad can move forward on implementation such that the review of the implementation PRs can focus on:

  • given this implementation, is this a good design, which, as @tacaswell alluded to, may very well be caught at the implementation before review stage
  • If there aren't glaring design issues, the focus can be on reviewing the technical aspects of the implementation

@trygvrad
Copy link
Contributor Author

trygvrad commented Aug 1, 2024

I pushed some updates to this branch, but this is largely just because I want to see the code coverage, and this is the easiest way for me to see how the code coverage changes.

Provisional names are Colorizer instead of Mapper (in cm.py)
ColorizingArtist as the base class for image and collection (in artist.py).
and ColorizerShim to provide access from to the variables on the Colorizer from a class that subclasses both ColorizingArtist and ColorizerShim [i.e. to match the ScalarMappable interface] (in cm.py)
Note that the methods on the ColorizerShim are not used internally, and all internal use acts on Colorizer objects directly.

@trygvrad
Copy link
Contributor Author

trygvrad commented Aug 1, 2024

We discussed this in the Matplotlib weekly meeting today, and we will split this into 2 PRs:

  1. Minimal implementation of the Colorizer, ColorizingArtist and ColorizerShim classes
  2. Changes needed to support multivariate data.

I'll make a comment here once 1. is ready.

@trygvrad trygvrad force-pushed the VectorMappable-data-type-objects branch from f9b2bf0 to e271b85 Compare August 2, 2024 09:55
VectorMappable with data type objects

Creation of the VectorMappable class

correctetion to _ensure_multivariate_norm()

since colors.Normalize does not have a copy() function, we cannot easily convert a single norm to a sequence of norms.

better use of union for colormap types

Defines ColorMapType as Union[colors.Colormap, colors.BivarColormap, colors.MultivarColormap] in typing.py

Suport for multivariate colormaps in imshow, pcolor and pcolormesh

safe_masked_invalid() for data types with multiple fields

NormAndColor class

cleanup of _ImageBase._make_image()

Rename To Mapper and reorganizatino

changed name of new class from Mapper to Colorizer and removed calls to old api

changed name of new class from Mapper to Colorizer
Mapper → Colorizer
ScalarMappableShim → ColorizerShim
ColorableArtist → ColorizingArtist

also removed all internal calls using the ColorizerShim api
@trygvrad trygvrad force-pushed the VectorMappable-data-type-objects branch from e271b85 to b4af71e Compare August 2, 2024 09:58
@trygvrad trygvrad mentioned this pull request Aug 3, 2024
3 tasks
@trygvrad
Copy link
Contributor Author

trygvrad commented Aug 3, 2024

I made a new PR with a minimal implementation of the Colorizer class #28658
I would like to move further discussion of the implementation to #28658
and keep any further discussion of feature branch vs main etc. in #28428 (this thread)

@story645
Copy link
Member

story645 commented Aug 4, 2024

and keep any further discussion of feature branch vs main etc. in #28428 (this thread)

Also, just to explain why #28658 was opened on main, consensus from the call is that we're gonna try the "merging to main" approach in the stages/PRs @trygvrad outlined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment