Description
Maybe this should be an enhancement proposal...
So I think our current __repr__ is not that helpful.
Most construction parameters are default parameters that are never seen by a user, so reporting them is basically noise.
We don't report other important things, though, such as whether the model has been fitted at all, what the training score is, or how long training took.
I know R has a very different approach, and I'm not sure their approach is good. But I think our current approach is pretty suboptimal.
I think the __repr__ has become more important because of the popularity of Jupyter notebooks. If I run fit, I get the __repr__ back, and it is just noise.
A slight improvement might be to print only the construction parameters that are not set to their default values. But we could also think about something more helpful, and maybe something more model specific. GridSearchCV, for example, could report the best score and the best parameters found.
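To make the "only non-default parameters" idea concrete, here is a rough sketch of what it could look like. It is not a proposed implementation: `changed_params`, `compact_repr` and `ToyPCA` are hypothetical names made up for illustration, and the toy class only mimics the convention that constructor arguments are stored as attributes of the same name. It also assumes parameter values can be compared with `!=`, which would need more care for arrays or NaN.

```python
import inspect


def changed_params(estimator):
    """Return constructor parameters whose values differ from their defaults.

    Hypothetical helper: assumes each __init__ argument is stored as an
    attribute with the same name on the instance.
    """
    signature = inspect.signature(type(estimator).__init__)
    changed = {}
    for name, param in signature.parameters.items():
        # Skip self and *args / **kwargs, which have no comparable default.
        if name == "self" or param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        value = getattr(estimator, name)
        # Parameters without a default are always reported.
        if param.default is inspect.Parameter.empty or value != param.default:
            changed[name] = value
    return changed


def compact_repr(estimator):
    """Build a repr string that lists only the non-default parameters."""
    args = ", ".join(f"{k}={v!r}" for k, v in changed_params(estimator).items())
    return f"{type(estimator).__name__}({args})"


class ToyPCA:
    """Hypothetical stand-in for an estimator; not the real PCA."""

    def __init__(self, n_components=None, tol=0.0, iterated_power=3):
        self.n_components = n_components
        self.tol = tol
        self.iterated_power = iterated_power


print(compact_repr(ToyPCA(n_components=2)))  # prints ToyPCA(n_components=2)
```

With something like this, solver-related defaults such as tol would only show up when the user actually changed them.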
I have been thinking about this for a while, but this is somewhat inspired by looking at #5299. Why does adding a faster solver change the __repr__ of PCA? That seems really weird to me. PCA is one of the simplest and most commonly used methods in ML, in particular in courses. Now people will constantly see something about tol and number of iterations that is probably not relevant for them at all.