__repr__ not that helpful #6323

Closed
@amueller

Maybe this should be an enhancement proposal...
So I think our current __repr__ is not that helpful.
Most constructor parameters are left at their defaults and never touched by the user, so reporting them is basically noise.
We don't report other important things, though, such as whether the model was fitted at all, what the training score is, or how long training took.
I know R has a very different approach, and I'm not sure their approach is good. But I think our current approach is pretty suboptimal.
I think the __repr__ has become more important because of the popularity of Jupyter notebooks. If I run fit in a notebook cell, I get the __repr__ back as the cell output. And it is just noise.

A slight improvement might be to print only the construction parameters that are not set to their default value. But we could also think about something more helpful, and maybe something more model-specific. GridSearchCV, for example, could report the best score and the best parameters found.
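To make the "only non-default parameters" idea concrete, here is a minimal sketch in plain Python. It is not how scikit-learn implements `__repr__`; `NonDefaultReprMixin` and `ToyPCA` are hypothetical names made up for illustration, and the sketch assumes the scikit-learn convention that every constructor argument is stored as an attribute of the same name.

```python
import inspect


class NonDefaultReprMixin:
    """Hypothetical mixin: show only constructor arguments that differ from their defaults."""

    def __repr__(self):
        sig = inspect.signature(type(self).__init__)
        changed = []
        for name, param in sig.parameters.items():
            # Skip `self` and any *args / **kwargs catch-alls.
            if name == "self" or param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            # Assumption: each constructor argument is stored under the same attribute name.
            value = getattr(self, name)
            # Parameters without a default are always shown.
            # Note: a plain `!=` is not safe for array-valued parameters.
            if param.default is inspect.Parameter.empty or value != param.default:
                changed.append(f"{name}={value!r}")
        return f"{type(self).__name__}({', '.join(changed)})"


class ToyPCA(NonDefaultReprMixin):
    # Hypothetical estimator, not scikit-learn's PCA; parameter names are illustrative only.
    def __init__(self, n_components=None, whiten=False, tol=0.0, iterated_power=3):
        self.n_components = n_components
        self.whiten = whiten
        self.tol = tol
        self.iterated_power = iterated_power


print(repr(ToyPCA()))                # ToyPCA()
print(repr(ToyPCA(n_components=2)))  # ToyPCA(n_components=2)
```

With something like this, adding a new solver-related parameter with a default value would not change what most users see in their notebook output.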

I have been thinking about this for a while, but this is somewhat inspired by looking at #5299. Why does adding a faster solver change the __repr__ of PCA? That seems really weird to me. PCA is one of the simplest and most commonly used methods in ML, in particular in courses. Now people will constantly see something about tol and number of iterations that is probably not relevant for them at all.

Labels: Easy (well-defined and straightforward way to resolve)
