-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[WIP] New __repr__ and/or pretty printing of estimators #7618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
But Out[2] encourages the user to familiarise themselves with what else is configurable. Yes, this is pretty, though. |
@jnothman they can do tab-complete for that, though ;) -- or rather shift-tab in jupyter. |
I'm wondering whether I should split this into to PRs, one for |
Out[2] is helpful for newbies who might not even be aware that these other parameters exist or how to list the defaults. For example if they have copied and pasted the code. I think it's just a question of what you get by default. +1 for showing if an estimator has been fitted. |
How about we restrict this to |
I'm happy to have these sorts of changes and advertise them as On 10 October 2016 at 08:34, Andreas Mueller notifications@github.com
|
… add anything. ``_repr_pretty_`` for ipython remains
I simplified a bit and got rid of |
I added a warning in the docstring. We could be more aggressive and raise a warning on every call, though that feels annoying. Or we could put it into an Using an experimental module could be interesting to move forward on some API issues ... or it could totally screw us over. |
sklearn/base.py
Outdated
offset=len(class_name),),) | ||
|
||
def _repr_pretty_(self, p, cycle): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the point of making a distinction between __repr__
and _repr_pretty
is to only affect interactive use (and not, say, library code and log files), I think there should be a comment here to that effect. I suspect there will be some surprise for users related to this distinction; do we care? Do we wish to provide a public function that produces the "info" repr regardless of context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would we do that? I removed __str__
to get as little "side effects" as possible. We could a (private?) method _info
?
sklearn/base.py
Outdated
offset=len(class_name),),) | ||
|
||
def _repr_pretty_(self, p, cycle): | ||
my_repr = self.__repr__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not repr(self)
?
@GaelVaroquaux wdyt? |
That or add the transform directly to nteract in a PR. 😉 |
So we are usually fairly conservative with our version requirements, so I'm not sure which options are viable. Though having a cutting-edge feature that only works on new jupyter doesn't seem a bit issue. I think options are:
In terms of fancy fontends, I think widgets or a notebook + jupyter lab JS plugin are the best options. I have to admit that I really mostly care about the notebook and jupyter lab, and I don't mind that much if other font-ends don't get the benefit as long as they some working repr. Once nteract gets a plugin system they can get nice reprs - or they built it in. Clearly the plugin thing is a bigger thing than "just" shipping some html, so the question is what the costs and benefits are. Another interesting question: would the plugins live in scikit-learn or a separate package? |
That helps me see what you're looking for in a visualization. |
@rgbkrk To be clear, that was an example of how far we could potentially go, not what I would expect from a first version. My argument was more "I want to have a road towards doing something 'crazy' like this". |
I like the idea of providing a custom mime type. That way users get a text representation if on a terminal, HTML in a browser and some super advanced display if in a frontend that knows what it is doing. And you can build up towards it. |
I want to click on the grid-search and have a visualization of the results
show. I think we will very quickly get to the limits of what's possible
with HTML/CSS.
I am very worried about this line of thoughts: it requires strong
expertise in javascript.
One of my line of conduct in designing software projects in the last few
years has been to try to limit the amount of expertise required to master
a codebase. This has been a guiding line for me to define project
boundaries. One reason is that the wider the expertise required, the
harder it is to find people that understand the whole project and hence
can debug and maintain it.
Even with the little amount of jquery that we have, in the documentation,
I am myself often struggling to debug the problems. And they are not
mission critical.
I don't think that I'd like to see much javascript appear in
scikit-learn. It's not our DNA. We are experts in numerics, and APIs for
statistical learning.
|
To piggy back on this, perhaps taking a step back to put together only the nested static content (markup) for |
@GaelVaroquaux I think being able to understand the whole codebase is a good goal, and I agree that it's hard for us to debug JS right now, because most of us are no experts. The custom mime type would not require any JS in scikit-learn, though. It would "only" require embedding the required data in scikit-learn. If we don't do that, it will be very hard for an outside library to provide this kind of functionality. That would create some dependencies, though they would be lighter than the outside library trying to do the introspection themselves. If this discussion is blocking us I'm happy for us to go ahead with HTML and see where it goes. I'm crazy busy right now :-/ |
The custom mime type would not require any JS in scikit-learn, though.
Custom mime type are a good idea. I like them.
That would create some dependencies, though they would be lighter than the
outside library trying to do the introspection themselves.
We are talking about javascript dependencies, right? I am freaked out
about javascript dependencies. It's a world with a terrible backward
compatibility and dependency culture (left pad!).
We could also be more "generic" and not implement a MIME type but instead
implement some method to get the "interesting" aspects that we would want to
report to the user in some way.
Sounds good!
If this discussion is blocking us I'm happy for us to go ahead with HTML and
see where it goes.
I would suggest to do that. It is not committing us much.
|
Technically, to support a custom mime type that can be rendered in Jupyter Notebook, Jupyterlab, etc., scikit-learn need only depend on IPython. For example: from IPython.display import display
bundle = {
'application/vnd.scikit.learn+json': data.to_json(),
'application/json': data.to_json(),
'text/html': data.to_html(),
'text/plain': data.to_string()
}
display(bundle, raw=True) On the scikit-learn side, you only need to settle on a custom mimetype (or set of them) and display the data using it. In this example, I provide fallback mimetypes, so if the user doesn't have an extension installed to render your custom mimetype, it can still be rendered as JSON, HTML, or text. Given that, I suggest that you just work from top-to-bottom, meaning provide a text output, then work on providing an HTML output, then JSON, then create a custom mimetype and mimerender extension if necessary. If all you need is to display static HTML, then providing that via the |
Yeah we should avoid those. Though I was talking more about the rendering library possibly having a very strict version dependency on scikit-learn, which will define the mime-type. Ok so let's do html first and maybe custom mime type later. What's the opinion on limiting the default repr/str to exclude unchanged parameters? |
What's the opinion on limiting the default repr/str to exclude unchanged
parameters?
I think that I'd rather have them at the end, smaller and with a lighter
color.
|
@GaelVaroquaux I meant in the actual |
I don't know. There are pros and cons. The cons being that it decreases
the discoverability of arguments. On the other hand, clutter can make
these unreadable and people should be looking at docs/docstrings. But
they are not.
What do other people think?
|
I've changed my position. I think we should hide non-default params by
default. The main reason is that while showing them helps discoverability,
with the number of params most of our estimators have, in practice the
entire string just gets ignored.
…On 19 January 2017 at 04:51, Gael Varoquaux ***@***.***> wrote:
I don't know. There are pros and cons. The cons being that it decreases
the discoverability of arguments. On the other hand, clutter can make
these unreadable and people should be looking at docs/docstrings. But
they are not.
What do other people think?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7618 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz66xW_SKD5tChyDtb2NX6Ryx5ct5zks5rTlEmgaJpZM4KR2ap>
.
|
I think we should start with a limited PR basically where this one started: change repr to only show diff from default params, perhaps with a config switch. |
However I am a bit concerned that we will break doctests everywhere for things derived from |
I totally agree and I'm happy to do that. @GaelVaroquaux I think wanted to limit this to jupyter, I think it would be useful anywhere. In particular because we currently are likely to break people's (and our) doctests whenever we add a parameter anywhere. |
@jnothman hm so the config switch would be set to the current behavior by default? I'd like to change that at some point, but we don't really have a way to deprecate that, right? We could set the config switch to something that raises a warning when a repr is printed and forced the user to set the switch. That's pretty annoying, though (unless we have an .rc file, which .... would be a whole other can of worms) |
The default behaviour can just be changed by a prominent warning in what's
new if need be.
Or by releasing v1.0.
…On 7 June 2017 at 21:37, Andreas Mueller ***@***.***> wrote:
@jnothman <https://github.com/jnothman> hm so the config switch would be
set to the current behavior by default? I'd like to change that at some
point, but we don't really have a way to deprecate that, right? We could
set the config switch to something that raises a warning when a repr is
printed and forced the user to set the switch. That's pretty annoying,
though (unless we have an .rc file, which .... would be a whole other can
of worms)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7618 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz64Jd5qd_v_slxKp2bUC6COU1KtDsks5sBot0gaJpZM4KR2ap>
.
|
So add the flag now, and schedule a change, add a prominent warning to the whatsnew now, and then again when we change it? |
Something like that.
…On 7 June 2017 at 22:02, Andreas Mueller ***@***.***> wrote:
So add the flag now, and schedule a change, add a prominent warning to the
whatsnew now, and then again when we change it?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7618 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz65T9VaCWgoL0LBgFeuQDQd3hRUNuks5sBpFqgaJpZM4KR2ap>
.
|
Done in #9039 |
This adds some spice to the good old
__repr__
. Fixes #6323.Interestingly Jupyter used
__repr__
not__str__
so I needed to add_pretty_repr_
to call__str__
....Please compare
Out[2]
andOut[5]
from a beginners perspective. I wish I had done this before the book lol.Out[2]
is pretty much "wtf" whileOut[5]
makes obvious sense!This also adds a partial implementation of
get_n_features
which would be an addition to the API. This probably shouldn't live inBaseEstimator
but in the various mixins. In this PR, not all estimators (but a pretty good chunk!) haveget_n_features
. Adding the rest should be fairly straight-forward.The main reason I want to add
get_n_features
is that it also allows us to check whether an estimator was fitted. I feel that's an important part of the string representation (and currently implicit in the existence of the n_features line).If we want this, we might want to ask some designer to make this more pretty ;)