[WIP] New __repr__ and/or pretty printing of estimators #7618

amueller · 2016-10-08T23:05:49Z

This adds some spice to the good old __repr__. Fixes #6323.

Interestingly, Jupyter uses __repr__, not __str__, so I needed to add _repr_pretty_ to call __str__.

Please compare Out[2] and Out[5] from a beginner's perspective. I wish I had done this before the book lol. Out[2] is pretty much "wtf" while Out[5] makes obvious sense!

This also adds a partial implementation of get_n_features which would be an addition to the API. This probably shouldn't live in BaseEstimator but in the various mixins. In this PR, not all estimators (but a pretty good chunk!) have get_n_features. Adding the rest should be fairly straight-forward.

The main reason I want to add get_n_features is that it also allows us to check whether an estimator was fitted. I feel that's an important part of the string representation (and currently implicit in the existence of the n_features line).
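A minimal sketch of that idea, not the code in this PR: an accessor that returns the number of features once fit has run, and None before, doubles as a fitted check. The names `TinyEstimator`, `n_features_` and `is_fitted` are hypothetical and chosen only for illustration.

```python
class TinyEstimator:
    """Hypothetical estimator used only to illustrate the idea."""

    def fit(self, X):
        # Record the number of columns seen during fit.
        self.n_features_ = len(X[0])
        return self

    def get_n_features(self):
        # Return None before fit so callers can detect an unfitted estimator.
        return getattr(self, "n_features_", None)

    def is_fitted(self):
        # An estimator counts as fitted once it knows its feature count.
        return self.get_n_features() is not None


est = TinyEstimator()
print(est.is_fitted())
est.fit([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
print(est.is_fitted(), est.get_n_features())
```

A string representation could then print an n_features line only when `get_n_features()` is not None, which is the implicit fitted signal described above.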

If we want this, we might want to ask some designer to make this more pretty ;)

amueller · 2016-10-08T23:41:33Z

jnothman · 2016-10-08T23:55:38Z

Please compare Out[2] and Out[5] from a beginner's perspective. I wish I had done this before the book lol. Out[2] is pretty much "wtf" while Out[5] makes obvious sense!

But Out[2] encourages the user to familiarise themselves with what else is configurable.

Yes, this is pretty, though.

amueller · 2016-10-08T23:57:03Z

@jnothman they can do tab-complete for that, though ;) -- or rather shift-tab in jupyter.

amueller · 2016-10-09T00:01:24Z

I'm wondering whether I should split this into two PRs, one for parameters and one for info.

lesshaste · 2016-10-09T06:25:33Z

Out[2] is helpful for newbies who might not even be aware that these other parameters exist or how to list the defaults. For example if they have copied and pasted the code. I think it's just a question of what you get by default. +1 for showing if an estimator has been fitted.

amueller · 2016-10-09T21:34:20Z

How about we restrict this to _repr_pretty_ and merge and see what people think? We can always go more fancy with html etc later?

jnothman · 2016-10-09T22:22:15Z

I'm happy to have these sorts of changes and advertise them as experimental. All the cool kids are doing that these days.

amueller · 2016-10-10T19:25:10Z

@GaelVaroquaux @ogrisel @agramfort ?

amueller · 2016-10-24T18:53:21Z

I simplified a bit and got rid of __str__. Now the additional info is only in _repr_pretty_. This is a non-intrusive change and I think it would be great if we could try it.
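For readers unfamiliar with the hook, here is a minimal sketch of how `__repr__` and `_repr_pretty_` can diverge. IPython calls `_repr_pretty_(self, p, cycle)` with a pretty-printer object; everything else here (the `Demo` class, the extra "fitted" line, and the `RecordingPrinter` stand-in that lets the example run without IPython) is an assumption made for illustration, not the PR's implementation.

```python
class Demo:
    """Hypothetical class; not scikit-learn's BaseEstimator."""

    def __init__(self, C=1.0):
        self.C = C

    def __repr__(self):
        # Used by repr(), logging, doctests and the plain Python REPL.
        return "Demo(C={!r})".format(self.C)

    def _repr_pretty_(self, p, cycle):
        # IPython passes a pretty printer `p`; `cycle` is True when the
        # object is reached again through a reference cycle.
        if cycle:
            p.text("Demo(...)")
        else:
            p.text(repr(self) + "\n  fitted: no")


class RecordingPrinter:
    """Stand-in for IPython's printer so the sketch runs without IPython."""

    def __init__(self):
        self.chunks = []

    def text(self, s):
        self.chunks.append(s)


printer = RecordingPrinter()
Demo(C=10)._repr_pretty_(printer, cycle=False)
print("".join(printer.chunks))  # what an IPython session would display
print(repr(Demo(C=10)))         # what logs and doctests still see
```

Because the extra information lives only in `_repr_pretty_`, code that relies on `repr()` is unaffected.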

amueller · 2016-10-24T19:06:16Z

I added a warning in the docstring. We could be more aggressive and raise a warning on every call, though that feels annoying. Or we could put it into an sklearn.experimental submodule and raise a warning on importing the module? Though warnings are a bit annoying - having an experimental submodule would probably be enough.

Using an experimental module could be interesting to move forward on some API issues ... or it could totally screw us over.

jnothman · 2016-10-26T22:33:37Z

sklearn/base.py

                                               offset=len(class_name),),)

+    def _repr_pretty_(self, p, cycle):


If the point of making a distinction between __repr__ and _repr_pretty_ is to only affect interactive use (and not, say, library code and log files), I think there should be a comment here to that effect. I suspect there will be some surprise for users related to this distinction; do we care? Do we wish to provide a public function that produces the "info" repr regardless of context?

amueller · 2016-10-27

How would we do that? I removed __str__ to get as few "side effects" as possible. We could add a (private?) method _info?

jnothman · 2016-10-26T22:33:49Z

sklearn/base.py

                                               offset=len(class_name),),)

+    def _repr_pretty_(self, p, cycle):
+        my_repr = self.__repr__()


not repr(self)?

amueller · 2016-11-22T17:49:41Z

@GaelVaroquaux wdyt?

rgbkrk · 2017-01-10T18:29:46Z

However, there is no plugin system for custom transforms in nteract yet, so one would need to run nteract from source and add the transform

That or add the transform directly to nteract in a PR. 😉

amueller · 2017-01-10T18:49:13Z

So we are usually fairly conservative with our version requirements, so I'm not sure which options are viable. Though having a cutting-edge feature that only works on new Jupyter doesn't seem a big issue.

I think options are:

write html standalone, use everywhere
have an html standalone as fallback, use a fancy front-end for notebook and jupyter lab (90% use case imho)
use a fancy front-end for notebook and jupyter lab, use standard string repr as fallback.

In terms of fancy front-ends, I think widgets or a notebook + jupyter lab JS plugin are the best options.
For the plugin it would probably be easiest to use the mime types.

I have to admit that I really mostly care about the notebook and jupyter lab, and I don't mind that much if other front-ends don't get the benefit as long as they get some working repr. Once nteract gets a plugin system they can get nice reprs - or they build it in.

Clearly the plugin thing is a bigger thing than "just" shipping some html, so the question is what the costs and benefits are.
I am currently on the side of the more powerful framework. I want to click on the grid-search and have a visualization of the results show. I think we will very quickly get to the limits of what's possible with HTML/CSS.

Another interesting question: would the plugins live in scikit-learn or a separate package?
It would be possible to only have the data generating part in scikit-learn and the plugins as an independent optional package that does the rendering in the different front-ends.
That means less (or no) JS in scikit-learn, but a tight coupling of versions between sklearn and the renderer package.

rgbkrk · 2017-01-10T20:43:40Z

I want to click on the grid-search and have a visualization of the results show. I think we will very quickly get to the limits of what's possible with HTML/CSS.

That helps me see what you're looking for in a visualization.

amueller · 2017-01-10T21:46:26Z

@rgbkrk To be clear, that was an example of how far we could potentially go, not what I would expect from a first version. My argument was more "I want to have a road towards doing something 'crazy' like this".

betatim · 2017-01-10T22:46:13Z

I like the idea of providing a custom mime type. That way users get a text representation if on a terminal, HTML in a browser and some super advanced display if in a frontend that knows what it is doing. And you can build up towards it.

GaelVaroquaux · 2017-01-11T22:42:03Z

I want to click on the grid-search and have a visualization of the results show. I think we will very quickly get to the limits of what's possible with HTML/CSS.

I am very worried about this line of thought: it requires strong expertise in javascript. One of my lines of conduct in designing software projects in the last few years has been to try to limit the amount of expertise required to master a codebase. This has been a guideline for me to define project boundaries. One reason is that the wider the expertise required, the harder it is to find people that understand the whole project and hence can debug and maintain it. Even with the small amount of jquery that we have in the documentation, I often struggle to debug the problems myself. And they are not mission critical. I don't think that I'd like to see much javascript appear in scikit-learn. It's not our DNA. We are experts in numerics, and APIs for statistical learning.

rgbkrk · 2017-01-11T23:59:03Z

To piggy back on this, perhaps taking a step back to put together only the nested static content (markup) for _repr_html_ is what to aim for now, while aiming for bigger improvements as a separate project that uses the scikit learn APIs to provide alternate views.
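A minimal sketch of what "nested static content" via `_repr_html_` could look like, with no JavaScript involved. `_repr_html_` is the hook Jupyter front-ends look for; the `Step` and `Chain` classes and the markup layout are hypothetical and only illustrate the nesting.

```python
from html import escape


class Step:
    """Hypothetical leaf object with a name and parameters."""

    def __init__(self, name, **params):
        self.name = name
        self.params = params

    def _repr_html_(self):
        # Escape every value so parameter reprs cannot inject markup.
        items = "".join(
            "<li>{}={}</li>".format(escape(k), escape(repr(v)))
            for k, v in sorted(self.params.items())
        )
        return "<div><b>{}</b><ul>{}</ul></div>".format(escape(self.name), items)


class Chain:
    """Hypothetical container that nests the HTML of its steps."""

    def __init__(self, steps):
        self.steps = steps

    def _repr_html_(self):
        inner = "".join("<li>{}</li>".format(s._repr_html_()) for s in self.steps)
        return "<div><b>Chain</b><ol>{}</ol></div>".format(inner)


chain = Chain([Step("scale", with_mean=True), Step("svc", C=10, kernel="rbf")])
html_out = chain._repr_html_()
print(html_out)
```

Each object only produces a string, so a container can compose the output of its children without knowing anything about the front-end.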

amueller · 2017-01-17T18:20:51Z

@GaelVaroquaux I think being able to understand the whole codebase is a good goal, and I agree that it's hard for us to debug JS right now, because most of us are no experts.

The custom mime type would not require any JS in scikit-learn, though. It would "only" require embedding the required data in scikit-learn. If we don't do that, it will be very hard for an outside library to provide this kind of functionality.

That would create some dependencies, though they would be lighter than the outside library trying to do the introspection themselves.
We could also be more "generic" and not implement a MIME type but instead implement some method to get the "interesting" aspects that we would want to report to the user in some way.

If this discussion is blocking us I'm happy for us to go ahead with HTML and see where it goes. I'm crazy busy right now :-/

GaelVaroquaux · 2017-01-17T18:24:57Z

The custom mime type would not require any JS in scikit-learn, though.

Custom mime types are a good idea. I like them.

That would create some dependencies, though they would be lighter than the outside library trying to do the introspection themselves.

We are talking about javascript dependencies, right? I am freaked out about javascript dependencies. It's a world with a terrible backward compatibility and dependency culture (left pad!).

We could also be more "generic" and not implement a MIME type but instead implement some method to get the "interesting" aspects that we would want to report to the user in some way.

Sounds good!

If this discussion is blocking us I'm happy for us to go ahead with HTML and see where it goes.

I would suggest to do that. It is not committing us much.

gnestor · 2017-01-17T19:08:37Z

Technically, to support a custom mime type that can be rendered in Jupyter Notebook, Jupyterlab, etc., scikit-learn need only depend on IPython. For example:

from IPython.display import display
bundle = {
    'application/vnd.scikit.learn+json': data.to_json(),
    'application/json': data.to_json(),
    'text/html': data.to_html(),
    'text/plain': data.to_string()
}
display(bundle, raw=True)

On the scikit-learn side, you only need to settle on a custom mimetype (or set of them) and display the data using it. In this example, I provide fallback mimetypes, so if the user doesn't have an extension installed to render your custom mimetype, it can still be rendered as JSON, HTML, or text. Given that, I suggest that you just work from top-to-bottom, meaning provide a text output, then work on providing an HTML output, then JSON, then create a custom mimetype and mimerender extension if necessary. If all you need is to display static HTML, then providing that via the text/html mimetype will suffice.
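The top-to-bottom approach can be sketched with the standard library alone. The helper name `build_mime_bundle` and the HTML and text layouts are assumptions made for this example; the custom mimetype string is the one from the snippet above and is not a registered standard.

```python
import json
from html import escape


def build_mime_bundle(params):
    """Build a display bundle, from richest to plainest format.

    `params` is a plain dict of parameter names to values.
    """
    as_json = json.dumps(params, sort_keys=True)
    pairs = sorted(params.items())
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(escape(str(k)), escape(str(v)))
        for k, v in pairs
    )
    return {
        "application/vnd.scikit.learn+json": as_json,  # custom, needs an extension
        "application/json": as_json,                   # generic JSON fallback
        "text/html": "<table>{}</table>".format(rows),  # static HTML fallback
        "text/plain": ", ".join("{}={}".format(k, v) for k, v in pairs),
    }


bundle = build_mime_bundle({"C": 10, "kernel": "rbf"})
for mimetype, payload in bundle.items():
    print(mimetype, "->", payload)
```

A dict shaped like this is what would be handed to `display(bundle, raw=True)` as in the snippet above; a front-end then renders the richest entry it understands.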

amueller · 2017-01-18T17:46:39Z

We are talking about javascript dependencies, right? I am freaked out about javascript dependencies. It's a world with a terrible backward compatibility and dependency culture (left pad!).

Yeah we should avoid those. Though I was talking more about the rendering library possibly having a very strict version dependency on scikit-learn, which will define the mime-type.

Ok so let's do html first and maybe custom mime type later.

What's the opinion on limiting the default repr/str to exclude unchanged parameters?

GaelVaroquaux · 2017-01-18T17:48:42Z

What's the opinion on limiting the default repr/str to exclude unchanged parameters?

I think that I'd rather have them at the end, smaller and with a lighter color.

amueller · 2017-01-18T17:49:31Z

@GaelVaroquaux I meant in the actual __repr__ and __str__ as they appear in the doctests

GaelVaroquaux · 2017-01-18T17:51:33Z

I don't know. There are pros and cons. The cons being that it decreases the discoverability of arguments. On the other hand, clutter can make these unreadable and people should be looking at docs/docstrings. But they are not. What do other people think?

jnothman · 2017-01-18T21:09:50Z

I've changed my position. I think we should hide default-valued params by default. The main reason is that while showing them helps discoverability, with the number of params most of our estimators have, in practice the entire string just gets ignored.

jnothman · 2017-02-07T02:53:01Z

I think we should start with a limited PR basically where this one started: change repr to only show diff from default params, perhaps with a config switch.
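A minimal sketch of a "diff from defaults" repr with a switch, assuming defaults can be read from the `__init__` signature. The `CONFIG` dict, its `print_changed_only` key, and the `Estimator` and `ToySVC` classes are hypothetical names for this example, not the API proposed in the PR.

```python
import inspect

# Hypothetical global switch; a dict so it can be flipped from any scope.
CONFIG = {"print_changed_only": True}


class Estimator:
    """Hypothetical base class; not scikit-learn's BaseEstimator."""

    def _params(self):
        # Pair every __init__ parameter with its default and current value.
        sig = inspect.signature(type(self).__init__)
        for name, param in sig.parameters.items():
            if name == "self":
                continue
            yield name, param.default, getattr(self, name)

    def __repr__(self):
        # Keep a parameter if the switch is off or its value was changed.
        shown = [
            "{}={!r}".format(name, value)
            for name, default, value in self._params()
            if not CONFIG["print_changed_only"] or value != default
        ]
        return "{}({})".format(type(self).__name__, ", ".join(shown))


class ToySVC(Estimator):
    def __init__(self, C=1.0, kernel="rbf", gamma="auto"):
        self.C = C
        self.kernel = kernel
        self.gamma = gamma


print(repr(ToySVC(C=10)))             # ToySVC(C=10)
CONFIG["print_changed_only"] = False
print(repr(ToySVC(C=10)))             # ToySVC(C=10, kernel='rbf', gamma='auto')
```

With the short form, adding a new parameter with a default value no longer changes the repr of existing code, which is the doctest concern raised in this thread. The `value != default` comparison is a simplification that would need more care for array-valued parameters.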

jnothman · 2017-02-07T02:53:38Z

However I am a bit concerned that we will break doctests everywhere for things derived from BaseEstimator without a config switch.

amueller · 2017-06-07T11:17:50Z

I think we should start with a limited PR basically where this one started: change repr to only show diff from default params, perhaps with a config switch.

I totally agree and I'm happy to do that. I think @GaelVaroquaux wanted to limit this to Jupyter, but I think it would be useful anywhere, in particular because we currently are likely to break people's (and our) doctests whenever we add a parameter anywhere.

amueller · 2017-06-07T11:37:22Z

@jnothman hm so the config switch would be set to the current behavior by default? I'd like to change that at some point, but we don't really have a way to deprecate that, right? We could set the config switch to something that raises a warning when a repr is printed and forces the user to set the switch. That's pretty annoying, though (unless we have an .rc file, which .... would be a whole other can of worms)

jnothman · 2017-06-07T12:00:04Z

The default behaviour can just be changed by a prominent warning in what's new if need be. Or by releasing v1.0.

amueller · 2017-06-07T12:02:48Z

So add the flag now, and schedule a change, add a prominent warning to the whatsnew now, and then again when we change it?

jnothman · 2017-06-07T12:17:19Z

Something like that.

amueller · 2017-06-07T17:12:15Z

Done in #9039

amueller mentioned this pull request Oct 13, 2016

[WIP] Sample weight consistency #5515

Closed

amueller mentioned this pull request Oct 24, 2016

Random search with classifier wrapper sometimes doesn't set the right parameters and errors #7740

Closed

amueller added 9 commits October 24, 2016 14:48

initial draft of new __repr__

b0d252b

add get_n_features

9029f11

add some implementations of get_n_features

6e90a72

minimal smoke test

a8be02b

_fit_X can be None

34ac714

docstring and return value for set_print

c4883c5

change test for misleading ValueError in OneClassSVM

3054824

don't validate parameters in __init__

2d122ef

simplified a bit and removed __str__ as it doesn't really seem to…

1fc4caa

… add anything. ``_repr_pretty_`` for ipython remains

amueller force-pushed the more_repr branch from a020223 to 1fc4caa Compare October 24, 2016 18:50

amueller changed the title ~~RFC: New __repr__ and __str__~~ [MRG] New __repr__ and __str__ Oct 24, 2016

add a warning to the docstring

50e3e5e

amueller added this to the 0.19 milestone Oct 24, 2016

amueller added the API label Oct 24, 2016

jnothman reviewed Oct 26, 2016

fixed doctest on gridsearch, added comment for _repr_pretty.

39d8ef2

amueller mentioned this pull request Nov 23, 2016

Feature/upload flow openml/openml-python#167

Merged

kmike mentioned this pull request Apr 4, 2017

Add "ve" to stopwords #8687

Closed

jnothman mentioned this pull request Jun 7, 2017

[MRG+1] Instance level common tests #9019

Merged

amueller mentioned this pull request Jun 7, 2017

[RFC] Simple __repr__ with global flag #9039

Closed

amueller modified the milestone: 0.19 Jun 12, 2017

NicolasHug mentioned this pull request Dec 9, 2018

[MRG] Add pprint for estimators - continued #11705

Merged

amueller closed this in #11705 Dec 20, 2018
