
Building interactive demos for examples #24878


Open
merveenoyan opened this issue Nov 9, 2022 · 36 comments

Comments

@merveenoyan

merveenoyan commented Nov 9, 2022

Describe the issue linked to the documentation

This is not a bug report but rather a good-to-have feature for the documentation, so this issue is more of a discussion.

It would be nice to link to, or embed, interactive demos made with Gradio inside the documentation.
I've built two apps so you can see what this looks like:

We could have the demo below (but a more official version 😅) embedded in your documentation.
[Screenshot: Gradio demo that could be embedded in the documentation]

The current workflow for users is to download the Python code or run Binder. This would reduce the amount of work in that workflow too!
If you think embedded demos are a bit overkill, what we could do instead is create these demos, host them on Spaces, and put a link to the Space (as is done for Keras examples or kornia examples; you can see them here and here, linked from their tutorials on both keras.io/examples and the Kornia docs). We will also host the Spaces on better hardware (8 vCPUs, 32 GiB RAM) to make sure they're always running and all good. ✨

You can see how it's implemented in Kornia for rst docs here. It looks like this inside Kornia docs.

[Screenshot: interactive demo embedded inside the Kornia docs]

As for Keras, we just put a redirect link inside the docs, since their document generation is more layered.
[Screenshot: redirect link to the demo inside the Keras docs]

We definitely want to reduce the core maintainers' workload, so what would be cool is that we (the team at Gradio & the community) develop the demos and maintain them through GitHub Actions, so that they don't break every time there's a breaking change in sklearn (this happens rarely, though 🙂).

Hoping to see how we could collaborate!

@merveenoyan merveenoyan added Documentation Needs Triage Issue requires triage labels Nov 9, 2022
@adrinjalali adrinjalali changed the title Building interactive demos for examples Building interactive demos on Hugging Face Hub for examples Nov 10, 2022
@merveenoyan
Author

merveenoyan commented Nov 10, 2022

@adrinjalali I think changing the title was a mistake. Embedding the demos is not about the Hugging Face Hub itself but about having interactive demos within the sklearn docs (see the Kornia docs for an example).

I updated the issue for better clarity 🙂

@merveenoyan merveenoyan changed the title Building interactive demos on Hugging Face Hub for examples Building interactive demos for examples Nov 10, 2022
@adrinjalali
Member

@merveenoyan where do these demos run? Don't they then run on a Space? If yes, then it's a Hub-related issue. We don't have the compute backend to actually run the demos on our servers (which we don't have many of). We could think of having them as pyodide, though.

What I'm saying is: who's maintaining the demo code, and who's maintaining the compute servers/infra? If the answer to those questions is people at Hugging Face, then it's a Hub thing.

@merveenoyan
Author

merveenoyan commented Nov 14, 2022

@adrinjalali I see, but in the end this is something done for the sklearn docs, and that is what I mean 🙂

Wanted to ask @ArturoAmorQ their opinion 🙂

@ogrisel
Member

ogrisel commented Nov 15, 2022

Thanks for the demos @merveenoyan, they look great and they can really help the reader gain insight into the impact of hyperparameters on model behavior. However, as of now the scikit-learn HTML documentation has no dependency on external online services: it can be generated and browsed locally without any internet connection.

Furthermore, any code change in scikit-learn that breaks an example can be detected on our CI when we generate the HTML doc, which will not be the case for code hosted and executed on a third-party platform. So I am a bit worried about starting to do this.

I would love to have interactive demos directly in the scikit-learn documentation. However, I would prefer if we could find a way that does not rely on Python code execution on a remote server (as gradio requires) for the reason mentioned above. Off the top of my head, I see the following two options:

  • Precomputing the results when executing the example and then relying on something like ipywidgets to display the results interactively in the generated HTML. Integration with sphinx-gallery might be challenging, though. According to https://docs.readthedocs.io/en/stable/guides/jupyter.html#rendering-interactive-widgets it might require using the notebook format for the examples, which we have tried to avoid so far (because of bad interactions between JSON and git).

  • Using Pyodide / JupyterLite and co to directly run interactive examples in the browser. However, based on current investigations by @lesteve in https://github.com/lesteve/scikit-learn-tests-pyodide, there are still a bunch of low-level fatal errors that can happen when using scikit-learn on this platform, so maybe it's a bit too early to invest in this.
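The first option can be illustrated with a small, hypothetical sketch (all names below are made up): every expensive computation happens once at doc-build time, and the "interactive" part in the rendered page reduces to a lookup, so no Python kernel is required in the browser.

```python
# Hypothetical sketch of the "precompute then display" option: the expensive
# work runs once when sphinx-gallery executes the example; the interactive
# widget in the rendered HTML only performs lookups.

def expensive_fit(c):
    # Stand-in for fitting a model with hyperparameter value `c`; a real
    # example would fit a scikit-learn estimator here.
    return round(c / (1.0 + c), 3)  # fake "score" that grows with c

# Done once, at doc-build time:
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
precomputed = {c: expensive_fit(c) for c in grid}

# What a slider callback (e.g. via ipywidgets) would do in the browser:
def on_slider_change(c):
    return precomputed[c]  # constant-time lookup, no kernel needed
```

The trade-off is that only the precomputed grid is explorable, but the generated HTML stays fully static.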

I would like to probe the opinion of other users, contributors and maintainers. Maybe using the gradio / huggingface space combo is fine compared to the alternatives that have their own limitations at this time.

If maintainers do not want to have to deal with a doc dependency on remotely maintained code demos on huggingface spaces, then at least it would be worth publishing a blog post with those demos included on https://blog.scikit-learn.org (via https://github.com/scikit-learn/blog).

@ogrisel
Member

ogrisel commented Nov 15, 2022

BTW, for the classifier comparison demo, it would be great to allow the user to interactively "brush" the 2D dataset, in addition to loading one of the standard datasets such as blobs, two moons & co.

The user would be able to click on a region of the 2D canvas to add random points of a selected class (red vs. blue) around the position of the cursor at the time of the click. And the longer you hold, the higher the density of the points.

I am sure I saw a demo in that spirit on twitter a while ago but I cannot find it.
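As a hedged sketch of that interaction (the function name, the linear density model, and the parameter defaults are all made up for illustration), the brushing could sample points from a Gaussian around the cursor, with the count growing with how long the button is held:

```python
import random

def brush_points(cx, cy, label, hold_seconds, rate=25.0, spread=0.15, seed=None):
    """Sample 2D points of class `label` around the cursor position (cx, cy).

    The longer the mouse button is held, the more points are added, so the
    local density grows with `hold_seconds` (a simple linear model here).
    """
    rng = random.Random(seed)
    n = max(1, round(rate * hold_seconds))
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread), label) for _ in range(n)]

# A half-second "hold" adds a dozen red points near (0.3, 0.7):
points = brush_points(0.3, 0.7, "red", hold_seconds=0.5, seed=0)
```

The classifier comparison demo would then refit and redraw the decision boundary whenever new points are brushed in.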

@glemaitre glemaitre removed the Needs Triage Issue requires triage label Nov 15, 2022
@betatim
Member

betatim commented Nov 16, 2022

I think having interactive examples really helps people get a feeling for a concept by allowing them to test their understanding a la "I think doing X will make change Y", then doing X and comparing what really happened to their expectation.

I think something that is embedded in the docs, like ipywidgets or pyodide, would be the best solution. It means the docs don't depend on an additional service (availability is the product of the individual uptimes, so more services almost always means lower availability) and there are no data-protection questions that need answering (to directly embed many social sharing widgets or YouTube content you need to understand the privacy/regulatory implications).

An approach other projects take is to link out to community-maintained blog posts, examples and tutorials. Maybe this is a model for these examples as well? It makes it clear to the reader that they are not maintained by the project itself, but are somehow noteworthy because they are linked to. It does make it harder to detect that they have drifted from what the docs describe or have broken :-/

I'd invest in pyodide, maybe by starting with examples that don't trigger the known errors. My assumption is that in these examples you'd not necessarily give people a full shell/editor experience but something graphical, as in the examples above. So you can constrain users to the part of scikit-learn that works.


Another example is the demos on https://spacy.io/. They are code-based and rely on an additional service (mybinder.org). spaCy seems happy with this setup, but see my comment above regarding "availability is the product of the individual uptimes".

@ArturoAmorQ
Member

ArturoAmorQ commented Nov 16, 2022

Thanks for the initiative @merveenoyan! I do agree that interactive examples are very effective in building intuition and understanding. I think linking to the demos is doable as long as we make it clear to the reader that they are noteworthy but not maintained by the project itself, as mentioned by @betatim. This should probably include a disclaimer about the privacy/regulatory implications.

Regarding @ogrisel's comment:

The scikit-learn HTML documentation has no dependency on external online services: it can be generated and browsed locally without any internet connection.

I don't fully agree, because we do link to Wikipedia and scientific papers, but I guess the spirit of the comment is more in line with the idea that pre-calculated demos would be better in terms of usability and even computational resources. Doing so would then be the responsibility of the external maintainers.

The only concern about adding external links (even with a disclaimer) is that it may set a precedent that people could exploit to demand links to their blog posts/projects as long as the content is "relevant", instead of actually contributing to the development of the existing content.

In any case, @ogrisel's solution has my +1:

If maintainers do not want to have to deal with a doc dependency on remotely maintained code demos on huggingface spaces, then at least it would be worth publishing a blog post with those demos included on blog.scikit-learn.org (via scikit-learn/blog).

For the moment I encourage other users, contributors and maintainers to give their opinion!

@betatim
Member

betatim commented Nov 16, 2022

The only concern about adding external links (even with a disclaimer) is that it may set a precedent that people could exploit to demand links to their blog posts/projects as long as the content is "relevant", instead of actually contributing to the development of the existing content.

That is true. I think you can deal with this via PR reviews and tooling, as well as an explicit statement about inclusion and spam.

A good thing about linking is that it requires little up-front investment: you can start doing it relatively quickly, and if there is a flood of link-spam PRs you can also stop quite quickly. At least compared to getting pyodide working well :D

@merveenoyan
Author

merveenoyan commented Nov 16, 2022

Would it be possible to add links to their associated demos in the examples? @betatim @ArturoAmorQ
E.g. here you have a link to Binder; could we have a link to the demo instead of having them all together inside one blog post?

It makes sense for someone to see the example and then try the demo if they find it interesting. This is what we do with Keras examples; you can see one here. Also see below for how they look. It's actually inside a simple markdown file with a badge 🙂 some people have put a markdown link directly.

[Screenshot: demo badge linked from a Keras example]

@ogrisel we have a canvas component with which we could build a 2D space for data points 🙂

@merveenoyan
Author

@ogrisel @betatim @ArturoAmorQ we're discussing pyodide support for Gradio. I'll update you on this once everything's all clear 🙂

@betatim
Member

betatim commented Nov 28, 2022

Someone pointed me to https://jupyterlite-sphinx.readthedocs.io/en/latest/directives/replite.html, a sphinx plugin that lets you have pyodide in the docs. Maybe this is a thing to investigate for the examples, or even for inline code in the docs.
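A minimal usage sketch of that directive (the option names follow the jupyterlite-sphinx documentation, but treat the exact option set and the code body as an illustrative assumption to verify):

```rst
.. replite::
   :kernel: python
   :height: 600px

   from sklearn.datasets import make_moons
   from sklearn.tree import DecisionTreeClassifier

   X, y = make_moons(noise=0.3, random_state=0)
   DecisionTreeClassifier(max_depth=3).fit(X, y).score(X, y)
```

The code body is executed by a Pyodide kernel directly in the reader's browser when the page loads.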

@lesteve
Member

lesteve commented Nov 28, 2022

For the scikit-learn examples, some work is needed to integrate JupyterLite into sphinx-gallery, see sphinx-gallery/sphinx-gallery#977 for an attempt by Andreas (full disclosure: I haven't looked at it in detail).

Also there are caveats down the road:

  • there are weird low-level bugs in Pyodide, e.g. see "Fatal error with snippet using np.random and scipy.linalg" pyodide/pyodide#3203
  • "magical" imports don't work for all the packages that we use in our examples. For example, `import plotly` does not work; you need to do `import piplite; await piplite.install('plotly')`. I think not that many examples use plotly, so maybe this is not a crucial issue for now. For the general case, you would need to generate a slightly different notebook when running inside JupyterLite than in a normal notebook (e.g. Binder); not sure how that would be doable ... The alternative would be to use jupyterlite-xeus-python-kernel, which could in principle install all the dependencies from emscripten-forge. In my small tests with jupyterlite-xeus-python-kernel, as soon as you have numpy the start-up time of the kernel takes a very long time, and as things stand today this is not such a nice user experience.

@betatim
Member

betatim commented Dec 19, 2022

A related issue that mentions thebe-light. Might be worth keeping an eye on, as Jupyter Book is based on sphinx (I think). And thebe is an "old" project (probably the OG?!) related to making interactive computation possible (with support for widgets, etc.).

Quansight-Labs/czi-scientific-python-mgmt#6

@merveenoyan
Author

Hey folks! 👋

While waiting for Gradio to integrate pyodide, we decided to kick off a community event to build the demos. For this, we drafted this guiding document (for now, I've enabled comments too, if you have any!).
At the end, we will have prizes for submissions (we have not determined what to give yet).

We would like to do this in collaboration with scikit-learn and were wondering if you would like to be involved.
Looking forward to hearing from you!
Pinging @GaelVaroquaux @ogrisel @ArturoAmorQ @betatim

@glemaitre glemaitre added the RFC label Mar 8, 2023
@lesteve
Member

lesteve commented Apr 26, 2023

FYI, an update on my earlier comment: the dev website examples now have a JupyterLite button. For example, you can run this example inside JupyterLite, or find your favourite example in the gallery and click its JupyterLite button!

For more details about the implementation, you can have a look at #25887.

@adrinjalali
Member

So we kind of need a decision here: with the JupyterLite link now included in our docs, do we think we'd like to do more in terms of interactivity?

@GaelVaroquaux
Member

GaelVaroquaux commented Apr 26, 2023 via email

@adrinjalali
Member

It seems there's a port of streamlit (stlite) which would allow us to do that; here's a demo: https://discuss.streamlit.io/t/new-library-stlite-a-port-of-streamlit-to-wasm-powered-by-pyodide/25556/28

@adrinjalali
Member

@ArturoAmorQ would you have bandwidth to check how easy it would be to do a demo with stlite?

@osanseviero

Hey all! Omar from Hugging Face here 🤗 In the recently co-organized community sprint, community members built ~80 Gradio-based apps based on the scikit-learn documentation. You can check them out at https://huggingface.co/sklearn-docs. It might be worth exploring using some of these.

The Gradio team, including @whitphx (creator of stlite) and @abidlabs, is working on official pyodide support directly in the gradio library.

@ogrisel
Member

ogrisel commented May 4, 2023

As soon as pyodide support for gradio is released (even as a tech preview), it would be interesting to open a DRAFT PR to prototype a possible integration of a small gradio app into a sphinx page of the doc. This way the scikit-learn CI will automatically generate the rendered HTML and we will be able to see the end result.

I am not sure if such interactive examples necessarily need to be treated as sphinx-gallery examples. Maybe it would be better to have a dedicated folder to host the source code of such gradio examples and a dedicated sphinx-gradio-lite (or whatever the future name) plugin to build such examples and include them in a dedicated interactive examples/demos section of the website. In particular, we might not want to display the source code of such examples as notebook-style pages as sphinx-gallery does (but maybe just link to the github URL of the source code of the gradio app in a footer below the demo).

@ogrisel
Member

ogrisel commented May 4, 2023

We could even think of including interactive demos directly in the user guide (as we do for sphinx-gallery-generated matplotlib PNG figures) when it makes sense.

@adrinjalali
Member

If we are to have interactive demos on the website, that means adding a dependency on those libraries, and needing some contributors who understand the framework we're adding.

So there are a few open points:

  • do we want interactive demos?
    • if yes, they shouldn't load by default; we should have the structure to allow users to click on something to load the demo, with a warning about the download size before they do (we should probably do that for JupyterLite as well, cc @lesteve)
    • we should also figure out where to put them. Probably the best place for them would be inside the user guide, not the examples, with a link to their source code.
  • if we're doing it, we should then discuss whether we'd like to use stlite or gradio-lite. Streamlit has a much larger user community, is more mature, and is much better documented compared to gradio, and I've heard from many users that working with streamlit seems easier.

I think we should be careful about what we add here, since in the long term what matters is what we and our contributors feel more comfortable working with, as well as the download size of each demo for each framework.
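The click-to-load idea above could look roughly like this in the rendered page (a hypothetical markup sketch, not tied to gradio-lite or stlite; the `demo.html` URL and the download size are placeholders): nothing demo-related is fetched until the user opts in.

```html
<!-- Hypothetical click-to-load wrapper: the heavy demo assets (WASM runtime
     plus wheels) are only fetched after an explicit user action, and the
     approximate download size is stated up front. -->
<button id="load-demo">Load interactive demo (downloads ~40 MB)</button>
<div id="demo-slot"></div>
<script>
  document.getElementById("load-demo").addEventListener("click", () => {
    const frame = document.createElement("iframe");
    frame.src = "demo.html";  /* placeholder URL for the demo page */
    frame.width = "100%";
    frame.height = "600";
    document.getElementById("demo-slot").replaceChildren(frame);
  });
</script>
```

Keeping the demo in a separate page loaded via an iframe also means the main documentation page stays static and fast regardless of the framework chosen.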

@ogrisel
Member

ogrisel commented May 4, 2023

We might also want to explore the possibility of just using ipywidgets in regular sphinx-gallery examples. Combined with JupyterLite, that might be an interesting alternative with lighter-weight dependencies, albeit maybe at the cost of a more limited UI and the inability to include them as interactive demos in the user guide.

@ogrisel
Member

ogrisel commented May 4, 2023

And finally there is also https://shiny.rstudio.com/py/docs/shinylive.html.

@osanseviero

Yes, I think the scikit-learn team should explore all the possibilities and see what makes the most sense on your side, satisfies your users and community, and is technically sound. From our side, happy to support you if there's interest! <3

if we're doing it, we should then discuss if we'd like to use stlite or gradio-lite. Streamlit has a much larger user community, is more mature, and its much better documented compared to gradio, and I've heard from many users that working with streamlit seems easier.

Happy to discuss this async; I want to make sure there is an objective discussion here. Gradio has 3.4M monthly pip installs and a significant, thriving user base (with big projects such as Auto111, etc.), and from 🤗 Hub usage we've seen significant, organically accelerating adoption. The fact that the community was able to quite independently build 80 demos in a few days also says something, and we're always happy to take feedback in Gradio's repo on how to improve our docs if you feel anything is lacking for sklearn use cases.

In any case, we <3 Streamlit and Shiny and are collaborating closely with them; I hope one of these options (or other pyodide solutions!) work well for you!

@merveenoyan
Author

Hey @adrinjalali @ogrisel thanks a lot for the discussion!
The reason the core developers were very fond of this in the first place was how nice it is to interact with examples, e.g. having a canvas to draw clusters interactively (I think it was Olivier's idea?) or playing with a parameter on a slider and seeing its effect in real time thanks to event listeners.
Besides this, some of the demos are multi-step: e.g. you can generate, visualize and interact with data in 3D, then train a model and see the effects of the parameters you played with. I remember talking to folks at the consortium and thought I'd just reiterate the motive :) now that there are so many cool demos that actually accomplish this, it would be very nice to let more people play with them.

@betatim
Member

betatim commented May 5, 2023

I like the idea of having interactive figures directly in the user guide (activated by user interaction to avoid large downloads). I think being able to explore things as you read about them is super cool. It would also be a lot of work to write the guide so that it works together with the examples, etc. But luckily we have quite a few people with experience in teaching. Count this as a vote for having a way to include interactive figures in the user guide.

@ogrisel
Member

ogrisel commented May 5, 2023

There is another contender, namely voici, the WASM version of voila dashboards for jupyter/ipywidgets.

Here is a demo:

I don't know if they can be embedded in a sphinx rendered HTML page though. We could probably use an iframe if needed but maybe there is a better way to do it.

Edit: I opened a feature request for embeddable apps: voila-dashboards/voici#79.

@freddyaboulton
Contributor

As soon as pyodide support for gradio is released (even as a tech preview)

Good news @ogrisel, that day is today!

We just released @gradio/lite, which lets you run gradio (along with scikit-learn and numpy) entirely in the browser thanks to pyodide.

I ported this demo from the scikit-learn gradio hackathon to gradio-lite: https://huggingface.co/spaces/freddyaboulton/gradio-lite-sklearn

You can run it yourself locally by pasting this into an html file and opening it in your browser:

```html
<!DOCTYPE html>
<html>
	<head>
      <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto&display=swap" >
      <style>
          body {
              font-family: 'Roboto', sans-serif;
              font-size: 16px; 
          }
        .logo {
            height: 1em;
            vertical-align: middle;
            margin-bottom: 0.1em; 
          }
      </style>
      
		<script type="module" crossorigin src="https://cdn.jsdelivr.net/npm/@gradio/lite@0.4.1/dist/lite.js"></script>
		<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@gradio/lite@0.4.1/dist/lite.css" />
	</head>
	<body>
      <h2>
        <img src="https://gradio-builds.s3.amazonaws.com/assets/lite-logo.png" alt="logo" class="logo">
        Gradio and scikit-learn running entirely in your browser thanks to pyodide!
      </h2>
<gradio-lite>
🔥
<gradio-requirements>
scikit-learn
plotly
numpy
</gradio-requirements>

<gradio-file name="app.py" entrypoint>
import numpy as np
import plotly.graph_objects as go

from sklearn import decomposition
from sklearn import datasets

import gradio as gr

np.random.seed(5)

## PCA
def PCA_Pred(x1, x2, x3, x4):
    #Load Data from iris dataset:
    iris = datasets.load_iris()
    X = iris.data
    Y = iris.target
    label_data = [("Setosa", 0), ("Versicolour", 1), ("Virginica", 2)]

    #Create the model with 3 principal components:
    pca = decomposition.PCA(n_components=3)
    
    #Fit model and transform (decrease dimensions) iris dataset:
    pca.fit(X)
    X = pca.transform(X)

    #Create figure with plotly
    fig = go.Figure()

    for name, label in label_data:
        fig.add_trace(go.Scatter3d(
            x=X[Y == label, 0],
            y=X[Y == label, 1],
            z=X[Y == label, 2],
            mode='markers',
            marker=dict(
                size=8,
                color=label,            
                colorscale='Viridis',   
                opacity=0.8),
            name=name
            ))
    
    user_iris_data = np.array([[x1, x2, x3, x4]], ndmin=2)

    #Perform reduction to user data
    pc_output = pca.transform(user_iris_data)
    fig.add_traces([go.Scatter3d(
            x=np.array(pc_output[0, 0]),
            y=np.array(pc_output[0, 1]),
            z=np.array(pc_output[0, 2]),
            mode='markers',
            marker=dict(
                size=12,
                color=4,                # set color
                colorscale='Viridis',   # choose a colorscale
                opacity=0.8),
            name="User data"
            )])
    fig.update_layout(scene = dict(
                            xaxis_title="1st PCA Axis",
                            yaxis_title="2nd PCA Axis",
                            zaxis_title="3rd PCA Axis"),
                     legend_title="Species"
                    )

    return [pc_output, fig]
    
title = "PCA example with Iris Dataset 🌺"
with gr.Blocks(title=title) as demo:
    gr.Markdown(f"## {title}")
    gr.Markdown(
        """
        The following app is a demo for PCA decomposition. It takes 4 dimensions as input, in reference \
        to the following image, and returns the transformed first three principal components (feature \
        reduction), taken from a pre-trained model with Iris dataset. 
        """)
    with gr.Row():
        with gr.Column():
            inp1 = gr.Slider(0, 7, value=1, step=0.1, label="Sepal Length (cm)")
            inp2 = gr.Slider(0, 5, value=1, step=0.1, label="Sepal Width (cm)")
            inp3 = gr.Slider(0, 7, value=1, step=0.1, label="Petal Length (cm)")
            inp4 = gr.Slider(0, 5, value=1, step=0.1, label="Petal Width (cm)")
            output = gr.Textbox(label="PCA Axes")
        with gr.Column():
            plot = gr.Plot(label="PCA 3D Space")

    Reduction = gr.Button("PCA Transform")
    Reduction.click(fn=PCA_Pred, inputs=[inp1, inp2, inp3, inp4], outputs=[output, plot])
    demo.load(fn=PCA_Pred, inputs=[inp1, inp2, inp3, inp4], outputs=[output, plot])

demo.launch()
</gradio-file>

</gradio-lite>		
</body>
</html>
```

@adrinjalali
Member

This would probably require a SLEP comparing alternatives before we move forward. I don't have the bandwidth to drive that SLEP, but another maintainer might.

@merveenoyan
Author

Hello @adrinjalali 👋 Since we already have 135 demos here, built by the community using gradio, I think gradio-lite is a very good candidate, given it doesn't require many changes beyond the HTML parts at the top and bottom. We can ask the contributors to open PRs to the sklearn docs as well; they were very eager to see their code in the docs! 😊

Also pinging @ArturoAmorQ and @francoisgoupil who were very fond of them.
For those who don't use HF Spaces: the Spaces there are stopped because they're not in use; when you open them they'll restart very quickly, so check them out :)

@nalgeon

nalgeon commented Feb 29, 2024

I suggest using Codapi for interactive code examples in the documentation. It's an open-source tool designed for this very purpose (disclaimer: I'm the author).

Reasons to choose Codapi:

  • Non-invasive integration: code blocks in the source .rst files do not need to be modified and will still render correctly on GitHub (and everywhere else).
  • In-browser or server-side sandboxes (whichever you prefer).
  • Jupyter-like code cells.
  • Templates and visualizations.

Here is how it looks: https://codapi.org/try/scikit-learn (source)

@betatim
Member

betatim commented Mar 4, 2024

How is codapi (for Python) different from something like Juniper and Thebe (server based) or using pyodide (browser based)?

The reason I'm asking is that the former (thebe and juniper) are well-tested, community-owned and supported tools that are IMHO unlikely to go unmaintained (at least they've survived for many years so far). They have the downside of needing a server (currently powered by mybinder.org). Pyodide and JupyterLite are newer but don't require a server, and there is already an ongoing effort within scikit-learn to use them (for example, all examples already run in JupyterLite). This means there are quite a lot of options already, and the question is "why add even more options instead of focussing on finishing one?"

@nalgeon

nalgeon commented Mar 4, 2024

I'm no expert on Juniper and Thebe, but at first glance they seem to lack the features you need to make existing documentation interactive (like the ones I mentioned: templates and snippet dependencies).

I've taken one of the tutorials and made it interactive using Codapi (the link is in my comment above). You can try to do the same with Juniper/Thebe and see how it works out.

As for Pyodide, it is just an execution engine. Codapi can also use it.

In any case, I'm not trying to say that Codapi is better and other tools are worse. I've presented it as a solution. I've described its strengths (as I see them). I've given a specific example for one of the existing tutorials. The rest is up to the scikit-learn team.
