UX: Enhance the HTML displays #26595

Open
2 of 14 tasks
GaelVaroquaux opened this issue Jun 16, 2023 · 4 comments
Comments

@GaelVaroquaux
Member

GaelVaroquaux commented Jun 16, 2023

Describe the workflow you want to enable

When I interact with non-advanced users, a recurrent difficulty for them is finding information and understanding what is going on.

Describe your proposed solution

I think that we can guide users with better HTML displays. In general, it would be desirable to give users ways to access all the information that an estimator knows about itself, without adding any lengthy computation during fit. Of course, the difficulty with any UX is that adding more information leads to crowding, so the UX needs to be kept light and focused.

I propose making changes iteratively, adding one feature after another. Here are the ideas that I have in mind:

  • Display the result of "get_params" (not visible by default, either folded or in a hover)
  • Add a link to the API documentation. This link would be inferred from the version of the module, the import path, and the name of the class. For instance, sklearn.cluster._spectral.SpectralClustering would lead to https://scikit-learn.org/1.2/modules/generated/sklearn.cluster.SpectralClustering.html. Note that we will have to apply heuristics, such as dropping the trailing modules in the path if they are private. We will also have to cater for non-scikit-learn classes inheriting from our BaseEstimator, and thus define an override mechanism and probably check that the imported module corresponds to the one for which the path was defined
  • Display in a light way whether the estimator has been fit or not
  • Display the estimator's parameters
  • Display the (public) fitted attributes
    • at least dtype and shape for array-valued attributes
    • maybe a few summary statistics for array-valued attributes.
  • Display the methods of an estimator with a tooltip with a documentation portion
  • Display the feature names
  • Display the shape of outgoing data structures
  • Reorganize the HTML diagram to have a more condensed view or a more vertical view
  • Fix the string representation (not displays) using a more vertical appearance (cf. black style)
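
To make the documentation-link idea concrete, here is a minimal sketch of the inference heuristic. It is only an illustration under stated assumptions: the function names `public_module_path` and `infer_doc_url` are hypothetical, the rule "cut the path at the first private module" is one possible reading of the heuristic, and the override mechanism and development versions are not handled.

```python
from typing import Optional

# Documentation root taken from the example URL in the list above.
DOC_ROOT = "https://scikit-learn.org"


def public_module_path(module_path: str) -> str:
    """Keep the leading public modules; stop at the first private one."""
    public_parts = []
    for part in module_path.split("."):
        if part.startswith("_"):
            break
        public_parts.append(part)
    return ".".join(public_parts)


def infer_doc_url(module_path: str, class_name: str, version: str,
                  root_package: str = "sklearn") -> Optional[str]:
    """Guess the API documentation URL (hypothetical helper, not a library API)."""
    # Classes from other packages that only inherit from BaseEstimator
    # have no page on scikit-learn.org, so no link is produced for them.
    if module_path.split(".")[0] != root_package:
        return None
    # Keep only "major.minor", e.g. "1.2.2" -> "1.2".
    major_minor = ".".join(version.split(".")[:2])
    public_path = public_module_path(module_path)
    return f"{DOC_ROOT}/{major_minor}/modules/generated/{public_path}.{class_name}.html"


print(infer_doc_url("sklearn.cluster._spectral", "SpectralClustering", "1.2.2"))
print(infer_doc_url("mypkg.models", "MyEstimator", "0.1"))  # None: third-party class
```

For a real class, the inputs would presumably come from `cls.__module__`, `cls.__name__`, and the package's `__version__`; returning `None` for foreign packages is where an override mechanism could plug in.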

In terms of plan, I propose to first adapt our current display without changing its main philosophy. Hence we need to add lightweight accessors to the information rather than a huge list of things (think "Mac" design).

Another important thing to keep in mind is that, for users, the hardest things to comprehend are composite estimators, such as pipelines. Most users do not understand how they can access internal objects in these.

cc @amueller who, if I understand correctly, has been pushing these ideas for a long time. Also cc @thomasjpfan, who has always shown impressive HTML skills.

Describe alternatives you've considered, if relevant

No response

Additional context

We should make sure that the displays work and are easy to view in all the relevant environments: Jupyter notebooks, VS Code.

This means that we should avoid JavaScript. If needed, we can consider using https://purecss.io/ for buttons, tabs, ...

@eskayML

eskayML commented Jun 17, 2023

I have a question, though:
Is this display enhancement only for using sklearn in notebooks?
I'm really curious about how some of the things you mentioned could be implemented for those running sklearn in scripts in their terminal.

@GaelVaroquaux
Member Author

GaelVaroquaux commented Jun 18, 2023 via email

@koaning

koaning commented Jun 5, 2024

Something that came up the other day was to perhaps add sizes to the diagram. Suppose I have something relatively complex, like below: it might be nice to be able to figure out the size/shape of the outgoing array/dataframe.

[Screenshot: CleanShot 2024-06-05 at 07 12 19@2x]

@ogrisel
Member

ogrisel commented Oct 18, 2024

That's a good point. This information is a bit redundant with displaying the feature names. However, the feature names cannot be displayed by default (that would take too much screen real estate), whereas the number of output features can always be displayed. Maybe the number of output features would be a natural element for the user to click to expand the list of output feature names.
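
As a rough illustration of that click-to-expand idea, the sketch below renders the count as the always-visible label of an HTML `<details>` element, which expands without any JavaScript. The function name `render_output_features` is hypothetical and the markup is an assumption, not the current scikit-learn display; in practice the names might come from a transformer's `get_feature_names_out()`.

```python
from html import escape
from typing import Sequence


def render_output_features(feature_names: Sequence[str]) -> str:
    """Return a JavaScript-free snippet: a feature count that expands to the names."""
    # Escape each name so that unusual column names cannot break the markup.
    items = "".join(f"<li>{escape(str(name))}</li>" for name in feature_names)
    count = len(feature_names)
    label = "output feature" if count == 1 else "output features"
    # <summary> is always visible; the <ul> is shown only once the user clicks it.
    return (
        f"<details><summary>{count} {label}</summary>"
        f"<ul>{items}</ul></details>"
    )


snippet = render_output_features(["age", "income", "city_Paris", "city_<other>"])
print(snippet)
```

Only the count takes up screen space by default, so the list of names can be arbitrarily long without crowding the diagram.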

I added an item to the bullet list of the issue.

I also added an item about displaying (public) fitted attributes.
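
For that fitted-attributes item, a light summary could be computed along these lines. This is a sketch under assumptions: it relies on the scikit-learn naming convention that fitted attributes end with an underscore, the helper names are hypothetical, and stand-in objects replace a real estimator and NumPy array so that the example needs only the standard library.

```python
from types import SimpleNamespace


def fitted_attribute_names(estimator) -> list:
    """Public fitted attributes: names ending in "_" that do not start with "_"."""
    return sorted(
        name for name in vars(estimator)
        if name.endswith("_") and not name.startswith("_")
    )


def summarize_fitted_attributes(estimator) -> dict:
    """Map each public fitted attribute to a short, cheap-to-compute description."""
    summary = {}
    for name in fitted_attribute_names(estimator):
        value = getattr(estimator, name)
        shape = getattr(value, "shape", None)
        dtype = getattr(value, "dtype", None)
        if shape is not None and dtype is not None:
            # Array-like value: report only dtype and shape, no statistics.
            summary[name] = f"array, dtype={dtype}, shape={shape}"
        else:
            summary[name] = type(value).__name__
    return summary


# Stand-ins for the demo (hypothetical, not real scikit-learn or NumPy objects).
fake_array = SimpleNamespace(shape=(3, 2), dtype="float64")
estimator = SimpleNamespace(
    n_clusters=3, cluster_centers_=fake_array, n_iter_=7, _cache_=None
)

print(summarize_fitted_attributes(estimator))
print("fitted" if fitted_attribute_names(estimator) else "not fitted")
```

The same list of names also gives a cheap "fitted or not" indicator: an estimator with no such attributes has presumably not been fit.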

Projects
Status: Discussion
Development

No branches or pull requests

4 participants