[RFC] add pprint for estimators #9099

amueller · 2017-06-10T13:26:11Z

Continuation of #9039.
Right now I didn't replace the __repr__ for easier comparison

amueller · 2017-06-10T13:27:48Z

FYI right now numpy arrays are converted to lists for printing. If there are very large numpy arrays in the repr, this could be problematic. It also doesn't allow the user to distinguish numpy arrays from lists in the arguments, so that's maybe non-optimal. But doing something else shouldn't be that hard.

amueller · 2017-06-10T13:30:41Z

from sklearn.pipeline import make_pipeline
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

large = GridSearchCV(make_pipeline(StandardScaler(), SVC()), param_grid={'C': np.logspace(-3, 3, 7)})

from sklearn._pprint import _Formatter
cf = _Formatter(color_changed=True)
f = _Formatter(color_changed=False)

print(f(large) + "\n")
print(cf(large))

amueller · 2017-06-10T13:30:59Z

Code is horrible right now and could be simplified.
Logic between pipeline (which is special cased) and estimator is duplicated, that should probably be simplified. We could also think about special casing BaseSearchCV but it actually looks fine right now.
Feature union might need a similar special case.

Also, this doesn't detect terminal etc so that's broken, but also fixable.

jnothman · 2017-06-11T07:31:32Z

I'll have to think about whether there's a way to do that kind of layout without special-casing pipeline, which doesn't seem right

GaelVaroquaux · 2017-06-11T09:00:48Z

As mentioned IRL, I think that the functionality is really great (I haven't looked at the code so far)!

+1 for exploring it further

amueller · 2017-06-12T09:33:32Z

@jnothman depends on what you want. You can comment it out and it will still work. But I want to control where the line-break is manually, and I want the indentation of the tuples to be different.

I thought about this for the html quite a bit. I think it's worth special-casing pipeline because it's important to show this nicely and it won't be as nice without explicit treatment.
I want the indentation of steps to be such that lines up with the "steps" string. But In general I want to indent by 4 whenever I do a line-break. If I would always align with the thing in the line before, you'd get to the end of the line way too quickly.

amueller · 2017-06-12T09:38:37Z

without special treatment:

I could always just indent lists by one, but if you have a line that doesn't start with the list, it looks odd. We could change the indentation based on whether the line starts with the list/tuple or not, though.

jnothman · 2017-06-12T10:01:35Z

if we intend that this does not affect doctests, I think it's something we can continue to fine-tune. and it's looking very good for a start! I'll try to review soon

…

On 12 Jun 2017 7:38 pm, "Andreas Mueller" ***@***.***> wrote: without special treatment: [image: image] <https://user-images.githubusercontent.com/449558/27027957-8c942cbc-4f63-11e7-8d2d-00761f1bce28.png> I could always just indent lists by one, but if you have a line that doesn't start with the list, it looks odd. We could change the indentation based on whether the line starts with the list/tuple or not, though. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#9099 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAEz67fy_5rXXHK273LvM-5pRsYZmJbPks5sDQcfgaJpZM4N2HOC> .

jnothman · 2017-08-08T12:38:21Z

Do you think we should also be ordering these parameters in order of the signature, rather than sorting, where possible (i.e. for those with positional arg names)?? I think this would greatly improve readability too as there is often topicalisation (i.e. most important first) and coherence (most related together) in the ordering. On the other hand, sorting allows a user to find the parameter they want...

jnothman

Halfway through I realised you probably weren't seeking a code review on this one.

I like it!

I'd be interested in seeing format_pipeline be extended to other compositions (VotingClassifier, FeatureUnion).

jnothman · 2017-08-14T10:33:25Z

sklearn/_pprint.py

+                   estimator.__init__)
+    init_params = signature(init).parameters
+    for k, v in params.items():
+        if v != init_params[k].default:


This will break for arrays, sparse matrices, Series, etc.

jnothman · 2017-08-14T10:41:10Z

sklearn/_pprint.py

+        self.indent = 0
+        self.step = 4
+        self.width = 79
+        self.set_formater(object, self.__class__.format_object)


double t in formatter

jnothman · 2017-08-14T10:43:21Z

sklearn/_pprint.py

+    def set_formater(self, obj, callback):
+        self.types[obj] = callback
+
+    def __call__(self, value, **args):


Is this **args really needed? It seems a hack.

jnothman · 2017-08-14T10:47:55Z

sklearn/_pprint.py

+        param_repr = self.join_items(items, indent + offset)
+        return '%s(%s)' % (value.__class__.__name__, param_repr)
+
+    def format_pipeline(self, value, indent):


Can we adopt this for all instances of BaseComposition?

jnothman · 2017-08-14T10:52:05Z

sklearn/_pprint.py

+                     + self.format_all(params[key], indent + offset), key)
+                     for key in params]
+        else:
+            items = [str(key) + '='


might as well just define color here and factor this line out.

jnothman · 2017-08-14T10:52:45Z

sklearn/_pprint.py

+
+        self.default_color = default_color
+        self.types = {}
+        self.htchar = ' '


I don't know what htchar stands for.

amueller · 2017-08-14T17:25:16Z

Yeah, the code needs a major cleanup, sorry to waste your time. I should really put a comparison image to master in here to show how much better it actually is ^^. But this is not my highest priority right now.

amueller · 2017-08-14T17:27:12Z

And yes, definitely the special casing should be extended to the other composite estimators. And sorting vs not sorting is a good question.

jnothman · 2017-08-15T00:00:45Z

do we want to write down some specs and open this to a contributor?

…

On 15 Aug 2017 3:27 am, "Andreas Mueller" ***@***.***> wrote: And yes, definitely the special casing should be extended to the other composite estimators. And sorting vs not sorting is a good question. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#9099 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAEz62TDxXfehMQi9hDW2qFL6PA1h32hks5sYINygaJpZM4N2HOC> .

amueller · 2017-08-15T17:16:58Z

The specs would be how the formatting should look like?
And yes, it totally makes sense to open this up.

I like this to be the spec:

The problem is that it's very hard to write down a simple spec that works for everything. In particular we're limited in line-length in jupyter, but we're not limited in nesting depth in sklearn. So if you write RFE(RFE(RFE(RFE(RFE(RFE(RFE(LogisticRegresssion)))))), I have no idea how to format that. It's unlikely someone will write something like this. Basically fixed indentation increase + limited width + arbitrary nesting depth is not solvable. I guess we can use ... once the indented line-length becomes too small, or so small that we can't fit the next thing onto it. Or just for a given nesting level? That will create possibly much longer strings than now, though. Right now we limit the string-length and cut off totally arbitrarily. I don't have a good idea how to do that better.

I think it should be something like this:

Indentation should increase in steps of 4 and be nested (indentation in master is entirely broken).
For composed estimators, make the estimator lists indented by only 1 to match up with opening ( or [.
for composed estimators put each estimator name and each estimator on a separate line.
use (...) for nesting level 5 (?).

amueller · 2017-08-15T17:19:02Z

master for comparison:

jnothman · 2017-08-15T22:00:57Z

and make this the default __repr__ for which estimators and contexts?

…

On 16 Aug 2017 3:19 am, "Andreas Mueller" ***@***.***> wrote: master for comparison: [image: image] <https://user-images.githubusercontent.com/449558/29327174-4d069784-81bc-11e7-8786-dba751d6358c.png> — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#9099 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAEz64FdFczQlZ42p9uo9Id21fRgApXsks5sYdMIgaJpZM4N2HOC> .

amueller · 2017-08-16T21:08:04Z

This would be plain text (maybe with terminal colors?). So all estimators and all contexts until we have better ones for fancier contexts ;) The current __repr__/pprint thing has obvious bugs that can't be fixed without doing something like this PR.

jnothman · 2018-01-11T05:37:02Z

Well:

RFE(RFE(RFE(RFE(RFE(RFE(RFE(LogisticRegresssion))))))

could be written as:

RFE(rfe1)
where rf1 = RFE(rfe2)
where rfe2 = RFE(rfe3)
where rfe3 = RFE(rfe4)
where rfe4 = RFE(rfe5)
where rfe5 = RFE(rfe6)
where rfe6 = RFE(LogisticRegresssion)

Perhaps this is a bit extreme, but we could extract and define separately nested estimators that are too long (vertically or horizontally) for the current indent...

GaelVaroquaux · 2018-01-11T07:57:24Z

How about we simply allow ourselves to extend beyond the line limit when the useful fraction of the line becomes too small? I think that this PR is so useful that we should merge it without too much more changes to it, and possibly revisit after. ⁣Sent from my phone. Please forgive typos and briefness.

…

On Jan 11, 2018, 06:37, at 06:37, Joel Nothman ***@***.***> wrote: Well: ``` RFE(RFE(RFE(RFE(RFE(RFE(RFE(LogisticRegresssion)))))) ``` could be written as: ``` RFE(rfe1) where rf1 = RFE(rfe2) where rfe2 = RFE(rfe3) where rfe3 = RFE(rfe4) where rfe4 = RFE(rfe5) where rfe5 = RFE(rfe6) where rfe6 = RFE(LogisticRegresssion) ``` Perhaps this is a bit extreme, but we could extract and define separately nested estimators that are too long (vertically or horizontally) for the current indent... -- You are receiving this because you commented. Reply to this email directly or view it on GitHub: #9099 (comment)

jnothman · 2018-01-11T09:23:00Z

good idea!

GaelVaroquaux · 2018-01-11T10:05:08Z

Once this simple modification is in, I'll try to review this PR and aim for merge. I think that it is so terribly useful and will bring a lot of value to users.

gxyd · 2018-01-14T16:45:12Z

Is this still valid with 'help wanted'? I could have a go through the discussion (of related PR's as well) and may be try to get it in, in case this still needs help from a contributor.

jnothman · 2018-01-14T22:36:03Z

it needs the change suggested by Gaël, and a few test cases

amueller · 2018-06-04T17:40:18Z

Putting this one back on my stack... let's see...

GaelVaroquaux · 2018-07-14T09:25:45Z

@amueller : I think that we should not be too ambitious with this PR, and just implement my suggested change: "allow ourselves to extend beyond the line limit when the useful fraction of the line becomes too small", add a few tests, and merge. This is very useful to users already.

Do you have the bandwidth to do it, or should we open it for contributions during the sprint?

amueller · 2018-07-14T14:28:00Z

@GaelVaroquaux I agree. I'm not sure about extending the line length. I have to think about that. But yes, we should keep this one small.
And we can totally open it up during the sprint.

jnothman · 2018-07-16T00:06:52Z

In the first place, I don't think 79 characters is an especially reasonable width when code consistency is not the concern. But if the max width becomes a parameter, I suppose we'd better keep to it.

amueller · 2018-12-14T15:50:39Z

replaced by #11705

amueller added 2 commits June 10, 2017 14:56

add pprint for estimators

acc3732

strip color from length, add color option

c99c85a

amueller changed the title ~~add pprint for estimators~~ [RFC] add pprint for estimators Jun 10, 2017

jnothman reviewed Aug 14, 2017

View reviewed changes

amueller added the Need Contributor label Aug 15, 2017

lesteve added help wanted and removed Need Contributor labels Oct 18, 2017

NicolasHug mentioned this pull request Jul 29, 2018

[MRG] Add pprint for estimators - continued #11705

Merged

amueller closed this Dec 14, 2018

Uh oh!

[RFC] add pprint for estimators #9099

[RFC] add pprint for estimators #9099

Uh oh!

Conversation

amueller commented Jun 10, 2017

Uh oh!

amueller commented Jun 10, 2017

Uh oh!

amueller commented Jun 10, 2017

Uh oh!

amueller commented Jun 10, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jnothman commented Jun 11, 2017

Uh oh!

GaelVaroquaux commented Jun 11, 2017

Uh oh!

amueller commented Jun 12, 2017

Uh oh!

amueller commented Jun 12, 2017

Uh oh!

jnothman commented Jun 12, 2017 via email

Uh oh!

jnothman commented Aug 8, 2017

Uh oh!

jnothman left a comment

Choose a reason for hiding this comment

Uh oh!

jnothman Aug 14, 2017

Choose a reason for hiding this comment

Uh oh!

jnothman Aug 14, 2017

Choose a reason for hiding this comment

Uh oh!

jnothman Aug 14, 2017

Choose a reason for hiding this comment

Uh oh!

jnothman Aug 14, 2017

Choose a reason for hiding this comment

Uh oh!

jnothman Aug 14, 2017

Choose a reason for hiding this comment

Uh oh!

jnothman Aug 14, 2017

Choose a reason for hiding this comment

Uh oh!

amueller commented Aug 14, 2017

Uh oh!

amueller commented Aug 14, 2017

Uh oh!

jnothman commented Aug 15, 2017 via email

Uh oh!

amueller commented Aug 15, 2017

Uh oh!

amueller commented Aug 15, 2017

Uh oh!

jnothman commented Aug 15, 2017 via email

Uh oh!

amueller commented Aug 16, 2017

Uh oh!

jnothman commented Jan 11, 2018

Uh oh!

GaelVaroquaux commented Jan 11, 2018 via email

Uh oh!

jnothman commented Jan 11, 2018 via email

Uh oh!

GaelVaroquaux commented Jan 11, 2018

Uh oh!

gxyd commented Jan 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jnothman commented Jan 14, 2018 via email

Uh oh!

amueller commented Jun 4, 2018

Uh oh!

GaelVaroquaux commented Jul 14, 2018

Uh oh!

amueller commented Jul 14, 2018

Uh oh!

amueller commented Jun 10, 2017 •

edited

Loading

gxyd commented Jan 14, 2018 •

edited

Loading