Skip to content

[WIP] New __repr__ and/or pretty printing of estimators #7618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 20 commits into from

Conversation

amueller
Copy link
Member

@amueller amueller commented Oct 8, 2016

This adds some spice to the good old __repr__. Fixes #6323.

Interestingly Jupyter used __repr__ not __str__ so I needed to add _pretty_repr_ to call __str__ ....

screenshot from 2016-10-08 19-05-27

Please compare Out[2] and Out[5] from a beginners perspective. I wish I had done this before the book lol. Out[2] is pretty much "wtf" while Out[5] makes obvious sense!

This also adds a partial implementation of get_n_features which would be an addition to the API. This probably shouldn't live in BaseEstimator but in the various mixins. In this PR, not all estimators (but a pretty good chunk!) have get_n_features. Adding the rest should be fairly straight-forward.

The main reason I want to add get_n_features is that it also allows us to check whether an estimator was fitted. I feel that's an important part of the string representation (and currently implicit in the existence of the n_features line).

If we want this, we might want to ask some designer to make this more pretty ;)

@amueller
Copy link
Member Author

amueller commented Oct 8, 2016

screenshot from 2016-10-08 19-46-15

@jnothman
Copy link
Member

jnothman commented Oct 8, 2016

Please compare Out[2] and Out[5] from a beginners perspective. I wish I had done this before the book lol. Out[2] is pretty much "wtf" while Out[5] makes obvious sense!

But Out[2] encourages the user to familiarise themselves with what else is configurable.

Yes, this is pretty, though.

@amueller
Copy link
Member Author

amueller commented Oct 8, 2016

@jnothman they can do tab-complete for that, though ;) -- or rather shift-tab in jupyter.

@amueller
Copy link
Member Author

amueller commented Oct 9, 2016

I'm wondering whether I should split this into to PRs, one for parameters and one for info

@lesshaste
Copy link

Out[2] is helpful for newbies who might not even be aware that these other parameters exist or how to list the defaults. For example if they have copied and pasted the code. I think it's just a question of what you get by default. +1 for showing if an estimator has been fitted.

@amueller
Copy link
Member Author

amueller commented Oct 9, 2016

How about we restrict this to _repr_pretty_ and merge and see what people think? We can always go more fancy with html etc later?

@jnothman
Copy link
Member

jnothman commented Oct 9, 2016

I'm happy to have these sorts of changes and advertise them as
experimental. All the cool kids are doing that these days.

On 10 October 2016 at 08:34, Andreas Mueller notifications@github.com
wrote:

How about we restrict this to repr_pretty and merge and see what people
think? We can always go more fancy with html etc later?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7618 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEz61CB7YeUH91RooAlGfTvh1HBCvpXks5qyV3dgaJpZM4KR2ap
.

@amueller
Copy link
Member Author

@amueller amueller changed the title RFC: New __repr__ and __str__ [MRG] New __repr__ and __str__ Oct 24, 2016
@amueller
Copy link
Member Author

I simplified a bit and got rid of __str__. Now the additional info is only in _repr_pretty_. This is a non-intrusive change and I think it would be great if we could try it.

@amueller
Copy link
Member Author

I added a warning in the docstring. We could be more aggressive and raise a warning on every call, though that feels annoying. Or we could put it into an sklearn.experimental submodule and raise a warning on importing the module? Though warnings are a bit annoying - having an experimental submodule would probably be enough.

Using an experimental module could be interesting to move forward on some API issues ... or it could totally screw us over.

@amueller amueller added this to the 0.19 milestone Oct 24, 2016
@amueller amueller added the API label Oct 24, 2016
sklearn/base.py Outdated
offset=len(class_name),),)

def _repr_pretty_(self, p, cycle):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the point of making a distinction between __repr__ and _repr_pretty is to only affect interactive use (and not, say, library code and log files), I think there should be a comment here to that effect. I suspect there will be some surprise for users related to this distinction; do we care? Do we wish to provide a public function that produces the "info" repr regardless of context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would we do that? I removed __str__ to get as little "side effects" as possible. We could a (private?) method _info?

sklearn/base.py Outdated
offset=len(class_name),),)

def _repr_pretty_(self, p, cycle):
my_repr = self.__repr__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not repr(self)?

@amueller
Copy link
Member Author

@GaelVaroquaux wdyt?

@rgbkrk
Copy link
Contributor

rgbkrk commented Jan 10, 2017

However, there is no plugin system for custom transforms in nteract yet, so one would need to run nteract from source and add the transform

That or add the transform directly to nteract in a PR. 😉

@amueller
Copy link
Member Author

So we are usually fairly conservative with our version requirements, so I'm not sure which options are viable. Though having a cutting-edge feature that only works on new jupyter doesn't seem a bit issue.

I think options are:

  • write html standalone, use everywhere
  • have a html standalone as fallback, use a fancy font-end for notebook and jupyter lab (90% usecase imho)
  • use a fancy font-end for notebook and jupyter lab, use standard string repr as fallback.

In terms of fancy fontends, I think widgets or a notebook + jupyter lab JS plugin are the best options.
For the plugin it would probably be easiest to use the mime types.

I have to admit that I really mostly care about the notebook and jupyter lab, and I don't mind that much if other font-ends don't get the benefit as long as they some working repr. Once nteract gets a plugin system they can get nice reprs - or they built it in.

Clearly the plugin thing is a bigger thing than "just" shipping some html, so the question is what the costs and benefits are.
I am currently on the side of the more powerful framework. I want to click on the grid-search and have a visualization of the results show. I think we will very quickly get to the limits of what's possible with HTML/CSS.

Another interesting question: would the plugins live in scikit-learn or a separate package?
It would be possible to only have the data generating part in scikit-learn and the plugins as an independent optional package that does the rendering in the different font-ends.
That means less (or no) JS in scikit-learn, but a tight coupling of versions between sklearn and the renderer package.

@rgbkrk
Copy link
Contributor

rgbkrk commented Jan 10, 2017

I want to click on the grid-search and have a visualization of the results show. I think we will very quickly get to the limits of what's possible with HTML/CSS.

That helps me see what you're looking for in a visualization.

@amueller
Copy link
Member Author

@rgbkrk To be clear, that was an example of how far we could potentially go, not what I would expect from a first version. My argument was more "I want to have a road towards doing something 'crazy' like this".

@betatim
Copy link
Member

betatim commented Jan 10, 2017

I like the idea of providing a custom mime type. That way users get a text representation if on a terminal, HTML in a browser and some super advanced display if in a frontend that knows what it is doing. And you can build up towards it.

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Jan 11, 2017 via email

@rgbkrk
Copy link
Contributor

rgbkrk commented Jan 11, 2017

To piggy back on this, perhaps taking a step back to put together only the nested static content (markup) for _repr_html_ is what to aim for now, while aiming for bigger improvements as a separate project that uses the scikit learn APIs to provide alternate views.

@amueller
Copy link
Member Author

@GaelVaroquaux I think being able to understand the whole codebase is a good goal, and I agree that it's hard for us to debug JS right now, because most of us are no experts.

The custom mime type would not require any JS in scikit-learn, though. It would "only" require embedding the required data in scikit-learn. If we don't do that, it will be very hard for an outside library to provide this kind of functionality.

That would create some dependencies, though they would be lighter than the outside library trying to do the introspection themselves.
We could also be more "generic" and not implement a MIME type but instead implement some method to get the "interesting" aspects that we would want to report to the user in some way.

If this discussion is blocking us I'm happy for us to go ahead with HTML and see where it goes. I'm crazy busy right now :-/

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Jan 17, 2017 via email

@gnestor
Copy link

gnestor commented Jan 17, 2017

Technically, to support a custom mime type that can be rendered in Jupyter Notebook, Jupyterlab, etc., scikit-learn need only depend on IPython. For example:

from IPython.display import display
bundle = {
    'application/vnd.scikit.learn+json': data.to_json(),
    'application/json': data.to_json(),
    'text/html': data.to_html(),
    'text/plain': data.to_string()
}
display(bundle, raw=True)

On the scikit-learn side, you only need to settle on a custom mimetype (or set of them) and display the data using it. In this example, I provide fallback mimetypes, so if the user doesn't have an extension installed to render your custom mimetype, it can still be rendered as JSON, HTML, or text. Given that, I suggest that you just work from top-to-bottom, meaning provide a text output, then work on providing an HTML output, then JSON, then create a custom mimetype and mimerender extension if necessary. If all you need is to display static HTML, then providing that via the text/html mimetype will suffice.

@amueller
Copy link
Member Author

We are talking about javascript dependencies, right? I am freaked out about javascript dependencies. It's a world with a terrible backward compatibility and dependency culture (left pad!).

Yeah we should avoid those. Though I was talking more about the rendering library possibly having a very strict version dependency on scikit-learn, which will define the mime-type.

Ok so let's do html first and maybe custom mime type later.

What's the opinion on limiting the default repr/str to exclude unchanged parameters?

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Jan 18, 2017 via email

@amueller
Copy link
Member Author

@GaelVaroquaux I meant in the actual __repr__ and __str__ as they appear in the doctests

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Jan 18, 2017 via email

@jnothman
Copy link
Member

jnothman commented Jan 18, 2017 via email

@jnothman
Copy link
Member

jnothman commented Feb 7, 2017

I think we should start with a limited PR basically where this one started: change repr to only show diff from default params, perhaps with a config switch.

@jnothman
Copy link
Member

jnothman commented Feb 7, 2017

However I am a bit concerned that we will break doctests everywhere for things derived from BaseEstimator without a config switch.

@amueller
Copy link
Member Author

amueller commented Jun 7, 2017

I think we should start with a limited PR basically where this one started: change repr to only show diff from default params, perhaps with a config switch.

I totally agree and I'm happy to do that. @GaelVaroquaux I think wanted to limit this to jupyter, I think it would be useful anywhere. In particular because we currently are likely to break people's (and our) doctests whenever we add a parameter anywhere.

@amueller
Copy link
Member Author

amueller commented Jun 7, 2017

@jnothman hm so the config switch would be set to the current behavior by default? I'd like to change that at some point, but we don't really have a way to deprecate that, right? We could set the config switch to something that raises a warning when a repr is printed and forced the user to set the switch. That's pretty annoying, though (unless we have an .rc file, which .... would be a whole other can of worms)

@jnothman
Copy link
Member

jnothman commented Jun 7, 2017 via email

@amueller
Copy link
Member Author

amueller commented Jun 7, 2017

So add the flag now, and schedule a change, add a prominent warning to the whatsnew now, and then again when we change it?

@jnothman
Copy link
Member

jnothman commented Jun 7, 2017 via email

@amueller
Copy link
Member Author

amueller commented Jun 7, 2017

Done in #9039

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants