Description
Describe the workflow you want to enable
When I interact with non-advanced users, a recurrent difficulty for them is finding information and understanding what is going on.
Describe your proposed solution
I think that we can guide users with better HTML displays. In general, it would be desirable to give users ways to access all the information that an estimator knows about itself, without adding any lengthy computation during fit. Of course, the difficulty with any UX is that adding more information leads to crowding, and thus the UX needs to be kept light and focused.
I propose to make changes iteratively, adding one feature after the other. Here are the ideas that I have in mind:
- Display the result of "get_params" (not visible by default, either folded or in a hover)
- Add a link to the API documentation. This link would be inferred from the version of the module, the import path and the name of the class. For instance, sklearn.cluster._spectral.SpectralClustering would lead to https://scikit-learn.org/1.2/modules/generated/sklearn.cluster.SpectralClustering.html . Note that we will have to apply heuristics such as dropping the last modules in the path if they are private. Also, we will have to be careful to cater for non-scikit-learn classes inheriting from our BaseEstimator, and thus define an override mechanism and probably check that the imported module corresponds to the one for which the path was defined.
- Display in a light way whether the estimator has been fit or not
- Display the estimator's parameters
- Add a "?" symbol redirecting to the parameter documentation (cf. ENH: Display parameters in HTML representation #30763 (comment))
- Display the (public) fitted attributes
  - at least dtype and shape for array-valued attributes
  - maybe a few summary statistics for array-valued attributes
- Display the methods of an estimator with a tooltip with a documentation portion
- Display the feature names
- Display the shape of outgoing data structures
- Reorganize the HTML diagram to have a more condensed view or a more vertical view
- Fix the string representation (not displays) using a more vertical appearance (cf. black style)
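The documentation-link heuristic from the list above can be sketched as follows. This is only an illustration under stated assumptions: the function name `infer_doc_url` and its `root` and `base_url` parameters are hypothetical, the URL layout is copied from the SpectralClustering example, and returning `None` for classes outside the expected package stands in for the override mechanism that still needs to be designed.

```python
def infer_doc_url(cls, version, root="sklearn",
                  base_url="https://scikit-learn.org"):
    """Guess the API documentation URL of an estimator class.

    Hypothetical helper: the name, signature and fallback behaviour are
    assumptions made for this sketch, not an existing scikit-learn API.
    """
    parts = cls.__module__.split(".")
    # Classes defined outside the expected package (for instance third-party
    # estimators inheriting from BaseEstimator) get no inferred link.
    if parts[0] != root:
        return None
    # Heuristic from the proposal: drop trailing private modules.
    while len(parts) > 1 and parts[-1].startswith("_"):
        parts.pop()
    public_path = ".".join(parts)
    # Versioned documentation is assumed to live under "major.minor".
    major_minor = ".".join(version.split(".")[:2])
    return (
        f"{base_url}/{major_minor}/modules/generated/"
        f"{public_path}.{cls.__name__}.html"
    )
```

For a class whose `__module__` is `sklearn.cluster._spectral` and a version string such as `"1.2.0"`, this produces the URL given in the example above. A class from another package returns `None`, which is where an explicit override would plug in.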
In terms of plan, I propose to first adapt our current display without changing its main philosophy. Hence we need to add lightweight accessors to the information, not a huge list of things (think "Mac" design).
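As an illustration of such an accessor, here is a minimal sketch for the fitted-attributes item of the list: it reports only dtype and shape for array-valued attributes, so it adds no computation. The function name `summarize_fitted_attributes` is hypothetical, and the sketch assumes the usual convention that public fitted attributes end with a trailing underscore and are stored on the instance.

```python
def summarize_fitted_attributes(estimator):
    """Return {attribute name: short description} for public fitted attributes.

    Hypothetical helper written for this sketch. It only inspects metadata
    (type, dtype, shape) and never touches the array contents.
    """
    summary = {}
    # vars() only sees instance attributes; attributes exposed through
    # properties would need separate handling.
    for name, value in vars(estimator).items():
        # Assumed convention: fitted attributes end with "_", and names
        # starting with "_" are private.
        if name.startswith("_") or not name.endswith("_"):
            continue
        shape = getattr(value, "shape", None)
        dtype = getattr(value, "dtype", None)
        if shape is not None and dtype is not None:
            # Array-like value: show dtype and shape only.
            summary[name] = f"array, dtype={dtype}, shape={tuple(shape)}"
        else:
            summary[name] = type(value).__name__
    return summary
```

Calling this on a fitted estimator gives a small dictionary that an HTML display could render in a folded section. Summary statistics would be a separate, optional step since they require reading the data.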
Another important thing to keep in mind is that, for users, the hardest things to comprehend are composite estimators, such as pipelines. Most users do not understand how they can access internal objects in these.
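To make this concrete, the sketch below lists the internal estimators of a composite together with the key under which each can be retrieved. It relies only on `get_params(deep=True)`, which for scikit-learn composites such as pipelines returns nested estimators alongside their `name__param` entries. The helper name `list_nested_estimators` is hypothetical.

```python
def list_nested_estimators(estimator):
    """Map each nested estimator's access key to its class name.

    Hypothetical helper for this sketch: anything returned by
    get_params(deep=True) that itself exposes get_params is treated
    as a nested estimator.
    """
    nested = {}
    for path, value in estimator.get_params(deep=True).items():
        # Skip classes passed as parameters; only instances count.
        if hasattr(value, "get_params") and not isinstance(value, type):
            nested[path] = type(value).__name__
    return nested
```

For a pipeline with steps named `scale` and `clf`, the result would be expected to contain those two keys, and the corresponding objects can then be fetched with `pipe.get_params()["clf"]` or `pipe.named_steps["clf"]`. A display could surface exactly these paths next to each box of the diagram.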
cc @amueller who, if I understand correctly, has been pushing these ideas for a long time. Also cc @thomasjpfan who has always shown impressive HTML skills.
Describe alternatives you've considered, if relevant
No response
Additional context
We should make sure that the displays work and are easy to view in all the relevant environments: Jupyter notebooks, VS Code.
This means that we should avoid JavaScript. If needed, we can consider using https://purecss.io/ for buttons, tabs, ...