
MEP25 - Artist Manager Class Discussion #3424


Closed
theengineear opened this issue Aug 27, 2014 · 24 comments
Labels
MEP: MEP25 serialization

Comments

@theengineear

This issue thread will serve as a better space from which to launch a solid development branch.

This is related to https://github.com/matplotlib/matplotlib/wiki/MEP25

The main goal of this thread is to pin down, in a timely fashion, the breadth of what MEP25 needs to solve, so that development gets running in the right direction sooner.

@theengineear
Author

@tacaswell, @bryevdv, @jakevdp, @danielballan, perhaps we should move this discussion to a proper issue thread here?

(from: mpld3/mplexporter#29 (comment))

@theengineear
Author

@tacaswell, @jakevdp (re: a grammar or not a grammar?)

If you saw that comment before I deleted it, please disregard.

Ok, so if we have this visitor pattern, it seems like it will work similarly to how mplexporter works now: "here's the information you need, translate into whatever grammar works for you".

For that, we don't need to specify an mpl grammar of sorts since there's no need to translate the info... we just need to get it.

However, I sort of think we do need to create a grammar if we need to do the following:

  • save an mpl plot as json (clearly)
  • instantiate a plot based on a json representation
    • this would allow plots from other libraries (Bokeh, Plotly, etc.) to go in the other direction, into mpl
    • this allows us to reopen a saved figure in a predictable manner

(And these things were on our todo list from SciPy)
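As a minimal sketch of the two requirements above (save to JSON, then instantiate from JSON), assuming a hypothetical dict-based grammar with made-up keys "type", "x", "y", and "style". This is not an existing mpl format, and LineSpec is a stand-in for a real artist so the example runs without matplotlib.

```python
import json
from dataclasses import dataclass, field, asdict


# Hypothetical stand-in for an mpl line; not matplotlib's API.
@dataclass
class LineSpec:
    x: list
    y: list
    style: dict = field(default_factory=dict)


def save_json(line):
    """Serialize a line to a JSON string tagged with its plot type."""
    return json.dumps({"type": "line", **asdict(line)})


def load_json(text):
    """Re-instantiate a line from its JSON representation."""
    doc = json.loads(text)
    if doc.pop("type") != "line":
        raise ValueError("unsupported plot type")
    return LineSpec(**doc)


original = LineSpec([0, 1, 2], [0, 1, 4], {"color": "red", "line_width": 2})
restored = load_json(save_json(original))
print(restored == original)  # True
```

The point of fixing the grammar is that any producer emitting these same keys could be read back by the same loader.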

@tacaswell
Member

This is the relevant comment from the other thread that @theengineear is responding to:


I think we are talking past each other a bit. I don't think I am looking for an uber-grammar; I am looking for a list of those classes that can output themselves to arbitrary grammars.

I am not sure that we want to shoe-horn

  • line plots (2 lists of data, marker style {shape, edge_color, face_color, edge_width, size, mark_every}, line style {color, dashing, line_width})
  • scatter plots (min 2, max 5 (x, y, size, face_color, edge_color) lists of data, color_map, normalization {function, min, max}, marker_style {...})
  • histogram which has a good deal of processing involved
  • imshow where the spatial location of the data is encoded in the data structures.

all into the same interface.

I could also be way out in left field here, and what I am describing may in fact be a grammar.

Disagreeing with myself above, perhaps what we want is a base class for the Controllers that implements a visitor pattern and takes a function with the signature:

def out_put_visitor_function(semantic_label, raw_data, displayed_data, process_info, style, raw_artists):
    """
    semantic_label : str
        What mpl thinks this type of graph element is.  This might just be the name of
        the controller class.
    raw_data : dict
        dict keyed on something sensible with the raw data passed in by the user;
        can be empty if no processing is done.
    displayed_data : dict
        The data that is used to draw to the screen.
        In the case of a histogram this would be the bin heights and edges; for
        contours this is the lines.  For 2D lines this would be the x, y values.
    process_info : dict
        keyed on sensible strings; information about / parameters for turning raw data into
        the processed data.
    style : dict
        keyed on strings naming what is being styled.
    raw_artists : dict
        keyed on sensible strings; access to the artists managed by the Controller.
    """
    pass

And something similar for axes (to specify the non-plotting markup such as axvline, plus ticks, limits, title, axis labels, and so on) and for figures, which manage the recursion.
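A minimal sketch of how a controller base class might drive such a visitor function, assuming hypothetical Controller, HistogramController, and FigureController classes (none of these exist in matplotlib). Binning is done in pure Python so the example is self-contained, and raw_artists is left empty because no real artists are drawn.

```python
class Controller:
    """Hypothetical base class: subclasses describe themselves to a visitor."""

    semantic_label = "controller"

    def accept(self, visitor):
        return visitor(
            semantic_label=self.semantic_label,
            raw_data=self.raw_data(),
            displayed_data=self.displayed_data(),
            process_info=self.process_info(),
            style=self.style(),
            raw_artists={},  # no real artists in this sketch
        )

    def raw_data(self):
        return {}

    def displayed_data(self):
        return {}

    def process_info(self):
        return {}

    def style(self):
        return {}


class HistogramController(Controller):
    semantic_label = "histogram"

    def __init__(self, values, bins, color="C0"):
        self.values, self.bins, self.color = list(values), bins, color

    def raw_data(self):
        return {"values": self.values}

    def process_info(self):
        return {"bins": self.bins}

    def displayed_data(self):
        lo, hi = min(self.values), max(self.values)
        # avoid a zero bin width when all values are equal
        width = (hi - lo) / self.bins or 1.0
        edges = [lo + i * width for i in range(self.bins + 1)]
        heights = [0] * self.bins
        for v in self.values:
            # clamp so the maximum value lands in the last bin
            i = min(int((v - lo) / width), self.bins - 1)
            heights[i] += 1
        return {"edges": edges, "heights": heights}

    def style(self):
        return {"face_color": self.color}


class FigureController:
    """Hypothetical figure-level controller that manages the recursion."""

    def __init__(self, children):
        self.children = children

    def accept(self, visitor):
        return [child.accept(visitor) for child in self.children]


def summarize(semantic_label, raw_data, displayed_data, process_info, style, raw_artists):
    return (semantic_label, displayed_data["heights"])


fig = FigureController([HistogramController([1, 2, 2, 3, 3, 3], bins=2)])
print(fig.accept(summarize))  # [('histogram', [1, 5])]
```

An exporter only has to supply a different visitor function; the controllers themselves never need to know which grammar is being targeted.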

@jakevdp
Contributor

jakevdp commented Aug 28, 2014

Ah - good.

You're right that we will need a grammar to do the serialization, but I'm glad we agree that the interface is the important piece of this.

@bryevdv

bryevdv commented Aug 30, 2014

ping @pzwang

@tacaswell tacaswell added this to the v1.5.x milestone Sep 6, 2014
@theengineear
Author

Responding (from forever ago) to this from @tacaswell:

"As I told @theengineear in an email, where I think we need to start is a list of semantic plot types, the kind of data they take, and the kind of styling they know about.

Both Bokeh and plotly (and d3?) have these things already (I think?), but we need to bring those sets into agreement. Given that mpl has little in this regard, the other two should be sorted out before mpl also starts inventing a third (http://xkcd.com/927/)."

Here are the plot types that Plotly currently has:

  • Area (plots where the area under/between traces holds special meaning)
  • Heatmap
  • Scatter3d
  • Histogram (x data means vertical bars, y data means horizontal bars)
  • Histogram2d (x and y data with colormap for resulting z)
  • Box (box plots)
  • Bar (horizontal or vertical bar charts)
  • Scatter (currently this covers all bubble charts, scatter, line, and polar, but this is arguably too broad)
  • Surface (3d surface)
  • Histogram2dContour (contour plot for histogram2d)
  • Contour

I'll start with the data spec. The style is bound to get hairy. In general, Plotly gets around the semantic difference between style and data by treating any array, e.g. ['red', 'white', 'blue'], ['orange', 'purple'], or even ['yellow'], as data, whereas a scalar such as 'fuchsia' is considered style (for a color field, say).
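The array-is-data, scalar-is-style rule described above can be expressed as a tiny classifier. This is a sketch of the rule as stated in this comment, not Plotly's actual implementation; classify and split_trace are hypothetical names.

```python
def classify(value):
    """Return 'data' for array values and 'style' for scalar values.

    Only lists and tuples count as arrays in this sketch, so a string such
    as 'fuchsia' (which is iterable in Python) is still treated as style.
    """
    return "data" if isinstance(value, (list, tuple)) else "style"


def split_trace(attrs):
    """Partition a trace's attributes into data and style dicts."""
    data = {k: v for k, v in attrs.items() if classify(v) == "data"}
    style = {k: v for k, v in attrs.items() if classify(v) == "style"}
    return data, style


data, style = split_trace({
    "marker_color": ["red", "white", "blue"],
    "line_color": ["yellow"],
    "fill_color": "fuchsia",
})
print(data)   # {'marker_color': ['red', 'white', 'blue'], 'line_color': ['yellow']}
print(style)  # {'fill_color': 'fuchsia'}
```

Note that a single-element array such as ['yellow'] still counts as data, which is what lets one field accept either per-point values or a single styling value.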

@theengineear
Author

I'll use the following format to explain each plot type:

Plot Type

Data Signatures
  • Positional/required arguments:
    • list of required arguments
  • Keyword/defaulted arguments:
    • list of optional keyword arguments
Styling Keywords
  • list of styles
  • that are appropriate
  • for this plot type

Scatter

Data Signatures
  • Positional/required arguments:
    • y, [iterable]
  • Keyword/defaulted arguments:
    • x=None, [iterable]
    • marker_color=None, [string, iterable]
    • marker_shape=None, [string, iterable]
    • marker_size=None, [number, iterable]
    • marker_face_color=None, [string, iterable]
    • marker_edge_color=None, [string, iterable]
    • line_color=None, [string, iterable]
    • line_width=None, [number, iterable]
    • alpha=None, [number, iterable]
Styling Keywords
  • TBD

Bar

Data Signatures
  • Positional/required arguments:
    • y, [iterable]
  • Keyword/defaulted arguments:
    • x=None, [iterable]
Styling Keywords
  • TBD
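To make the Scatter and Bar signatures above concrete, here is a sketch using dataclasses. It assumes two rules that the spec does not state: every list/tuple argument must match the length of y, and x defaults to 0..n-1 when omitted. The class names mirror the spec headings but are illustrative only.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Sequence


def _check_length(name, value, n):
    # Scalars (strings, numbers) are treated as single styling values.
    if isinstance(value, (list, tuple)) and len(value) != n:
        raise ValueError(f"{name} has {len(value)} entries, expected {n}")


@dataclass
class Scatter:
    y: Sequence[float]
    x: Optional[Sequence[float]] = None
    marker_color: Any = None
    marker_shape: Any = None
    marker_size: Any = None
    marker_face_color: Any = None
    marker_edge_color: Any = None
    line_color: Any = None
    line_width: Any = None
    alpha: Any = None

    def __post_init__(self):
        n = len(self.y)
        if self.x is None:
            self.x = list(range(n))  # assumed default: 0..n-1
        for f in fields(self):
            if f.name != "y":
                _check_length(f.name, getattr(self, f.name), n)


@dataclass
class Bar:
    y: Sequence[float]
    x: Optional[Sequence[float]] = None

    def __post_init__(self):
        if self.x is None:
            self.x = list(range(len(self.y)))
        _check_length("x", self.x, len(self.y))


s = Scatter(y=[1, 4, 9], marker_size=[10, 20, 30], line_color="black")
print(s.x)  # [0, 1, 2]
```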

@tacaswell
Member

@theengineear Yes, that is along the lines of what I had in mind, but I wouldn't worry too much about which artists will be involved, as that feels like a detail each library has to deal with. I would have expected scatter to take a large number of iterables (x, y, marker_shape, marker_face_color, marker_edge_color, marker_size, line_color, line_width, marker_orientation(?), alpha?)

I fully agree that plotly and bokeh should come as closely in line with each other as possible before mpl invents schema n+1.

That said, I had a chat with @pzwang last night and am no longer sure that the way I was thinking about it was entirely right (I am mostly still trying to work through how to map what I have in my head onto the firehose of information from Peter). One thing I had not fully appreciated before was the importance of being able to vectorize plotting commands automatically (for example http://bokeh.pydata.org/docs/gallery/periodic.html).

One thing that has crystallized is that there are really three layers to all of these plotting libraries: the user-facing API, the raster-engine-facing API, and a middle layer (which is currently a bit thin in mpl). It is at the middle layer that I think all of the inter-library communication will have to happen. All of the user-facing APIs are just schemes to generate the (serializable!) object model, and the raster-engine-facing side turns that object model into colored pixels/vectors. To do interactive plots, you then just have to manipulate the object model and have the right callbacks in place to update the render-facing side. All of the brains of plot types that require data manipulation in the plotting library (various statistical plots) live in the middle layer.
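A toy sketch of the three layers described above, assuming a hypothetical dict-based object model: a user-facing function builds the model, the model is plain JSON-serializable data, and a "renderer" consumes only the model (here it emits text commands instead of pixels). Interactivity is modeled as mutating the model and re-rendering. None of these names come from mpl, Bokeh, or Plotly.

```python
import json


def line_plot(x, y, color="blue"):
    """User-facing layer: returns a serializable object model."""
    return {"glyphs": [{"kind": "line", "x": list(x), "y": list(y), "color": color}]}


def render(model):
    """Render-facing layer: turns the model into drawing commands."""
    commands = []
    for glyph in model["glyphs"]:
        points = list(zip(glyph["x"], glyph["y"]))
        commands.append(f"polyline {points} stroke={glyph['color']}")
    return commands


model = line_plot([0, 1, 2], [0, 1, 4])
wire = json.dumps(model)  # the middle layer can be shipped over the wire
print(render(json.loads(wire)))

# "Interactive" update: touch only the model, then re-render.
model["glyphs"][0]["color"] = "red"
print(render(model))
```

Because the renderer only ever sees the model, swapping it for a different backend (or a different library) leaves the user-facing layer untouched.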

I think what I have been calling managers/controllers is this middle layer, as is the DOM that @WeatherGod was talking about at SciPy.

It is my understanding that both plotly and bokeh already have these three layers pretty well separated (due to having to ship json documents over the wire).

This might end up being a good application of atom, trait(lets) or other meta-class magic on the mpl side to keep the artist level objects in-sync with the middle-layer objects.

@theengineear
Author

@tacaswell, I'll try to keep the original spec above up to date (it seems silly to repost it for every change, unless the revision history would be helpful).

Also, I'm not quite sure how we've diverged from the original thought. Maybe if I write down my perspective on it, we can at least get on the same page before continuing to spec things out.

Using MVC terminology:

We'd like to make a revision that inserts a high-level, semantic-loving layer into the existing model (the plethora of artists with their accompanying data). This is our controller that we'd like users to interact with.

Here's the generic layout for reference:
[user] -> [controller] <-> [model] <-> [view]

Option one: allow the controller to attach elucidating information to the existing artists
[user] -> [semantic-loving-controller] <-> [matplotlib-artists] <-> [view]

Option two: allow the controller to update two objects, one of which stores the new semantics
[user] -> [semantic-loving-controller] <-> [semantics-object and matplotlib-artists] <-> [view]

In option one, we'd be adding things like raw histogram data to an existing artist, which I think is actually pretty sloppy, so I don't think we'll do this.

In option two, we'd be creating a new model object that knows what all of the pieces of the plot mean. So it can tell whatever it's exporting to things like, "this artist is actually error bars, not just a scatter". Or, "this marker collection and this Line2D represent a single hunk of data, not two".

When we want to export this information, the semantics-object can present each artist, along with a wealth of context, to whoever is asking. E.g., "here are some bars; they're actually a histogram with this binning and this original input data set".

By having a separate object in our model that provides all the missing context that libraries like plotly or bokeh need, we can also provide an 'off' switch so that the semantic-loving-controller doesn't create the additional semantics-object in the first place, matching how things currently run.
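A sketch of option two, assuming hypothetical SemanticController and SemanticsObject classes: the controller creates the artists as before and, unless the 'off' switch is set, also records what a group of artists means. Artists are plain strings here so the example runs without matplotlib.

```python
class SemanticsObject:
    """Records what groups of artists mean, for exporters to query."""

    def __init__(self):
        self.groups = []

    def record(self, kind, artists, **context):
        self.groups.append({"kind": kind, "artists": artists, "context": context})

    def describe(self, artist):
        for group in self.groups:
            if artist in group["artists"]:
                return group["kind"], group["context"]
        return None, {}


class SemanticController:
    def __init__(self, record_semantics=True):
        self.artists = []
        # The 'off' switch: when disabled, behave as mpl does today.
        self.semantics = SemanticsObject() if record_semantics else None

    def hist(self, data, edges):
        # Stand-in "bars": one string per bin instead of real Rectangles.
        bars = [f"Rectangle{i}" for i in range(len(edges) - 1)]
        self.artists.extend(bars)
        if self.semantics is not None:
            self.semantics.record("histogram", bars, raw_data=data, bin_edges=edges)
        return bars


ctrl = SemanticController()
bars = ctrl.hist([1, 2, 2, 3], edges=[1, 2, 3])
print(ctrl.semantics.describe(bars[0]))
```

An exporter asking about Rectangle0 learns that it is part of a histogram, together with the original data and binning, while the artists themselves carry no extra state.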

@theengineear
Author

(Also, please correct me if I made gross errors above.)

@tacaswell
Member

I like option two, but with a slight modification:

[user] -> [semantic controller] <-> [semantics-object] -> [mpl artists] -> [view (s)]

That is, the artists are slaves to the semantics objects, so the controller only has to deal with one set of objects. Under the model advocated by @pzwang, the objects layer would be (more or less) the json representation of the figure.
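One way to make the artists slaves to the semantics objects is a simple observer: the controller writes only to the semantic object, which pushes every change down to the artists registered with it. This is a sketch under that assumption; FakeArtist stands in for a real mpl artist, and SemanticLine is a hypothetical name.

```python
class FakeArtist:
    """Stand-in for an mpl artist; only ever written to by a SemanticLine."""

    def __init__(self):
        self.props = {}

    def set(self, **props):
        self.props.update(props)


class SemanticLine:
    """Source of truth for one line; attached artists mirror its state."""

    def __init__(self, x, y, color="blue"):
        self._state = {"x": list(x), "y": list(y), "color": color}
        self._artists = []

    def attach(self, artist):
        self._artists.append(artist)
        artist.set(**self._state)  # sync the new artist immediately

    def update(self, **changes):
        self._state.update(changes)
        for artist in self._artists:
            artist.set(**changes)  # one-way propagation: model -> artists

    def to_json_dict(self):
        return dict(self._state)


line = SemanticLine([0, 1], [0, 1])
artist = FakeArtist()
line.attach(artist)
line.update(color="red")
print(artist.props["color"], line.to_json_dict()["color"])  # red red
```

Since every change flows through SemanticLine, its state is always a complete, serializable description of what the artists show.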

@theengineear
Author

Awesome, getting somewhere! I'm a little hung up on this (possibly contrived) example:

You make a figure (thus loading up the semantics-object and the mpl artists), but what happens when you visualize and/or save the figure? I was under the impression that this had an effect on the values of the artists (many of these values are currently leveraged by plotly, and likely by bokeh as well). If the mpl artists were slaves to the semantics-object, would this work by rewiring things so that the semantics-object is updated instead of the artists directly (with these newly set values then changing the artists)?

This is definitely getting too close to thinking about implementation for this stage of the conversation, but this seems like it will matter here.

Beyond this, it sounds like we need, at minimum, a set of semantic objects that we agree on. I took a stab at this above already using plotly's object names: #3424 (comment).

I can happily supply all the keys used by plotly to describe these objects, but that would really blow up this thread :).

@tacaswell
Member

Yes, that is exactly the idea: no user or code should talk to the artists directly, only through the [semantic-objects] layer. If this works out the way it is in my head, the [semantic-object] layer is the json representation plotly/bokeh needs.

cc @mdboom @efiring

@bryevdv

bryevdv commented Oct 20, 2014

Hey guys, I am traveling and out of pocket most of this week, but in the meantime

cc @damianavila and @pzwang

@tacaswell tacaswell added the MEP: MEP25 serialization label Nov 26, 2014
@theengineear
Author

@tacaswell and @bryevdv , just giving you a heads up that I'm jumping back into this. I would love some guidance and a chance to catch up on the status of this project.

@tacaswell tacaswell modified the milestones: 1.5.0, v1.5.x Feb 16, 2015
@tacaswell
Member

@theengineear Have a look at #3944, where I have started (but not made a whole lot of progress on) improving the existing notion of Containers in mpl. I now think this is the path of least resistance to getting a sane visitor pattern plus semantic grouping implemented. I think sub-classing Container provides many of the benefits of what I had been talking about without adding too much more complexity.

There is a separate but related discussion about larger-scale refactoring of the library (#3587). It is relevant because some plotting functions that do not currently return Containers may start to. I think we could implement this in a backwards-compatible way by putting the new functions in their own module and then calling them from the Axes methods, which would extract the artists and return them to match the existing API.
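A sketch of that backwards-compatible approach, with hypothetical names throughout: hist_container stands for a new-style function that would live in its own module and return a container, while the Axes stand-in (not matplotlib.axes.Axes) keeps returning the old (counts, edges, patches) tuple.

```python
from dataclasses import dataclass, field


@dataclass
class HistContainer:
    """Hypothetical container grouping a histogram's data and artists."""
    counts: list
    edges: list
    patches: list = field(default_factory=list)


def hist_container(values, edges):
    """New-style function: returns a container instead of loose artists."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            # half-open bins, except the last bin also includes its right edge
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    # Stand-in patches: strings instead of real Rectangle artists.
    patches = [f"Rectangle({edges[i]}, {c})" for i, c in enumerate(counts)]
    return HistContainer(counts, edges, patches)


class Axes:
    """Stand-in for the existing Axes class."""

    def __init__(self):
        self.containers = []

    def hist(self, values, edges):
        # Legacy API: delegate, keep the container, return the old tuple.
        container = hist_container(values, edges)
        self.containers.append(container)
        return container.counts, container.edges, container.patches


ax = Axes()
n, bins, patches = ax.hist([1, 2, 2, 3], [1, 2, 3])
print(n)  # [1, 3]
```

Existing callers that unpack three values keep working, while new code (or an exporter) can reach the semantic grouping through ax.containers.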

There will have to be some discussion about how far we want to preserve backwards compatibility for people who reach in and modify the lines, patches, etc. attributes on Axes objects to remove artists, rather than using the artists' remove method.

Also see #4014 for pointers to the existing artist tree-walking code.

@pzwang

pzwang commented Feb 16, 2015

Yay! FWIW, I had a nice chat with @tacaswell at Strata NYC a few months ago and although I forget the exact details, I think that he was interested in thinking about how a JSON intermediate layer could be introduced in MPL which then Bokeh (and others) could target.

I looked at #3587; what is the fastest way to get up to speed on the current "tip" of the conversation? A particular mailing list thread, or...?

@bryevdv

bryevdv commented Feb 16, 2015

I am pretty swamped this week, but I will try to take a more careful look when I can. I will say that one thing that discouraged me a bit about the conversation in this issue previously, at least on first impression, is that the discussion seemed to be starting from a different place than I would. I believe the most important thing, and the best chance for interop between the 3-4 (more?) libraries, is to start at the lowest and simplest level and surface a capability to coordinate basic, static graphical primitives, completely independently of any semantic information, metadata, or controlling. To be clear, I'm not saying that the higher level of coordination is not important or desirable; I'm saying I wonder whether we have a chance to succeed if we start there.
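To illustrate the primitive-level alternative, here is a sketch in which a histogram is exported purely as rectangles, with no record that they form a histogram. The Rect type and the JSON layout are assumptions for illustration, not a schema any of these libraries uses.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Rect:
    x: float
    y: float
    width: float
    height: float
    fill: str


def histogram_as_primitives(edges, counts, fill="steelblue"):
    """Emit one static rectangle per bin; the 'histogram' meaning is dropped."""
    return [
        Rect(edges[i], 0, edges[i + 1] - edges[i], c, fill)
        for i, c in enumerate(counts)
    ]


scene = [{"type": "rect", **asdict(r)} for r in histogram_as_primitives([0, 1, 2], [3, 5])]
print(json.dumps(scene))
```

Any library that can draw rectangles could consume this, which is the appeal of starting here; the cost is that a consumer can no longer tell these bars came from a histogram.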

@tacaswell
Member

@pzwang My current thinking of the critical path is:

  1. get visitor pattern sorted out
  2. get the ability to dump/return our figures to any serialization format other than pickle
  3. sort out how to map that onto bokeh data model (this may be nasty json <-> json transforms or something nicer)

Unfortunately, this discussion is smeared across a number of locations / in person conversations / in my head.

@tacaswell
Member

I think that someone from yhat/ggplot should at least be aware of this discussion.

cc @glamp

@theengineear
Author

@tacaswell thanks for the info. A bunch to digest! I'll start poking around asap!

@theengineear
Author

@tacaswell, I've looked at the PRs you sent and I've been playing around with models for the visitor pattern. I'm going to try to summarize my thoughts so far so that we're on the same page. This is fairly abstracted away from the actual implementation; I hope that's alright at this stage in the process.

Visitor Pattern vs Serializer

Before the idea of changing the structure of an MPL figure was on the table (#3944, if I'm understanding it right), I thought the visitor pattern was the way to go. However, I now think that putting effort into making the structure of artists and their children more predictable has basically the same effect.

To my understanding, the visitor pattern would be a way for an exporter to walk through a figure, collecting whatever information it wants. If the structure is highly predictable, though, doesn't this become more and more trivial?
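To see why a predictable structure makes the visitor nearly trivial, here is a sketch of a generic depth-first walk. It assumes every artist exposes its children through get_children(), which matplotlib's Artist does provide; the Node class is a stand-in so the example runs without mpl.

```python
class Node:
    """Stand-in for an mpl artist exposing get_children()."""

    def __init__(self, name, *children):
        self.name = name
        self._children = list(children)

    def get_children(self):
        return self._children


def walk(artist, visit, depth=0):
    """Depth-first traversal calling visit(artist, depth) on every artist."""
    visit(artist, depth)
    for child in artist.get_children():
        walk(child, visit, depth + 1)


fig = Node("Figure", Node("Rectangle"), Node("Axes", Node("Line2D"), Node("Text")))
names = []
walk(fig, lambda a, d: names.append("  " * d + a.name))
print("\n".join(names))
```

All of the exporter-specific work lives in the visit callback; the traversal itself stays the same for every target format.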

Given that, I was tinkering around with the idea of just defining serializers for each artist.

Ideal structure

Again, if I'm seeing this right, you're trying to turn containers into artists in their own right, so that changes made there propagate through the complicated inter-artist interactions? At any rate, is this the sort of structure that will exist?

Figure:
    artists:
        Patch:
            artists: ..
            attr1: ..
            attr2: ..
            so on ..
        Axes:
            artists: ..
            attr1: ..
            attr2: ..
            so on ..
    attr1: ..
    attr2: ..
    so on ..

If that's what we're shooting for, then our serialization structure could simply be:

{
    "name": "Figure",
    "children": [],
    "context": {}
}

name is the name of the class, children is an array of the serialized child artists of the object, and context is everything else about the object that is not an artist.
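A sketch of producing that structure recursively, assuming (as in the tree above) that each object separates its child artists from its other attributes. The Artist, Figure, Axes, and Line2D classes here are hypothetical stand-ins, not matplotlib's.

```python
import json


class Artist:
    """Hypothetical stand-in: child artists plus plain attributes."""

    def __init__(self, children=(), **attrs):
        self.children = list(children)
        self.attrs = attrs


class Figure(Artist):
    pass


class Axes(Artist):
    pass


class Line2D(Artist):
    pass


def serialize(artist):
    """Map an artist tree onto the {name, children, context} structure."""
    return {
        "name": type(artist).__name__,
        "children": [serialize(child) for child in artist.children],
        "context": dict(artist.attrs),
    }


fig = Figure([Axes([Line2D(color="red", linewidth=2)], title="demo")], dpi=100)
print(json.dumps(serialize(fig), indent=2))
```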

Thoughts on a serializer

To avoid adding more code to the artists, I thought we could define a mapping from each artist's class name to a serializer, so that given an artist name (name below), you get back a serializer:

serializer = artist_name_to_serializer.get(name, None)

Artists without serializers are simply dropped, which seems acceptable while we build this out.

I'm anticipating some fairly complicated logic for handling all the different ways to serialize an artist's context. Putting that logic in the artists would bog down the artist code, so keeping it in separate serializers gives a nice separation.
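A sketch of the registry idea: per-class serializers are kept outside the artist classes, and artists with no registered serializer are skipped. artist_name_to_serializer follows the name in the snippet above; the serializer functions and the Line2D, Text, and Patch classes are hypothetical stand-ins, not matplotlib's.

```python
import json


# Stand-ins for mpl artists (not matplotlib classes).
class Line2D:
    def __init__(self, xdata, ydata, color):
        self.xdata, self.ydata, self.color = xdata, ydata, color


class Text:
    def __init__(self, text):
        self.text = text


class Patch:  # no serializer is registered for this one
    pass


def serialize_line(artist):
    return {"xdata": artist.xdata, "ydata": artist.ydata, "color": artist.color}


def serialize_text(artist):
    return {"text": artist.text}


# Serialization logic lives here, not in the artist classes.
artist_name_to_serializer = {
    "Line2D": serialize_line,
    "Text": serialize_text,
}


def serialize_all(artists):
    out = []
    for artist in artists:
        name = type(artist).__name__
        serializer = artist_name_to_serializer.get(name, None)
        if serializer is None:
            continue  # artists without serializers are skipped for now
        out.append({"name": name, "context": serializer(artist)})
    return out


result = serialize_all([Line2D([0, 1], [0, 1], "red"), Patch(), Text("title")])
print(json.dumps(result))
```

Adding support for a new artist type then means registering one more function, with no change to the artist class itself.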

Additional notes

The idea of a serializer might go against the controller class that we've been talking about, which still seems like a lofty goal to me. Again, I just want to pick the path of least resistance to get our desired goals.

@tacaswell tacaswell modified the milestones: next point release, proposed next point release Jul 17, 2015
@tacaswell tacaswell modified the milestones: 2.1 (next point release), 2.2 (next next feature release) Oct 3, 2017
@bryevdv

bryevdv commented Jan 17, 2019

Of the four outside projects that were originally interested in this work, two are now completely unmaintained. With no work/discussion in 4 years, I propose that this issue be closed and the MEP rejected.

@tacaswell tacaswell removed this from the needs sorting milestone Jan 21, 2019
@tacaswell
Member

This is still an important thread of work, but I agree that this attempt at it is at a dead-end.


5 participants