Add links to examples from the docstrings and user guide #30621

Open
StefanieSenger opened this issue Jan 10, 2025 · 110 comments
Labels
Documentation good first issue Easy with clear instructions to resolve Meta-issue General issue associated to an identified list of tasks Sprint

Comments

@StefanieSenger
Contributor

StefanieSenger commented Jan 10, 2025

TLDR: Meta-issue for new contributors to add links to the examples in helpful places of the rest of the docs.

Description

This meta-issue is a good place to start with your first contributions to scikit-learn.

This issue builds on top of #26927 and is introduced for easier maintainability. The goal is exactly the same as in the old issue.

Here, we improve the documentation by making the Examples more discoverable by adding links to examples in relevant sections of the documentation in the API documentation and in the User Guide:

  • the API documentation is made from the docstrings of public classes and functions which can be found in the sklearn folder of the project
  • the User Guide can be found in the doc/modules folder of the project

Together with the examples (which are in the examples folder of the project), these files get rendered into HTML when the documentation is built and are then displayed on the scikit-learn website.
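To make this concrete, here is a minimal sketch of a docstring that links to an example from a parameter description. The function `check_fitted_demo` and its logic are hypothetical and only serve as an illustration; the label is the one for examples/developing_estimators/sklearn_is_fitted.py that is quoted in the workflow of this issue.

```python
def check_fitted_demo(estimator):
    """Return True if the object looks fitted (hypothetical helper).

    Parameters
    ----------
    estimator : object
        Any object with instance attributes. For a worked example of a
        custom fitted check, see
        :ref:`sphx_glr_auto_examples_developing_estimators_sklearn_is_fitted.py`.
    """
    # Illustration only: treat attributes ending in "_" as a sign of fitting.
    return any(
        name.endswith("_") and not name.startswith("__")
        for name in vars(estimator)
    )
```

In a real contribution only the docstring text changes; the link is woven into an existing sentence rather than placed in a separate list of examples.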

Important: We estimate that only 70% of the examples in this list will ultimately be referenced. This means part of the task is deciding which examples deserve being referenced and we are aware that this is not a trivial decision, especially for new contributors. We encourage you to share your reasoning, and a team member will make the final call. We hope this isn’t too frustrating, but please know that evaluating an example is not just an exercise for new contributors; it’s a meaningful and valuable contribution to the project, even (and especially) if the example you worked on doesn’t end up being linked.

Workflow

We recommend this workflow for you:

  1. have pre-commit installed in your environment as in point 10 of How to contribute in the development guide (this will re-format your contribution to the standards used in scikit-learn and will spare you a lot of confusion when you are a beginner)

  2. pick an example to work on

    • Make sure your example of interest has not recently been claimed by someone else by looking through the discussion of this issue (you will have to load hidden items in this discussion). Hint: If somebody claimed an example several weeks ago and then never started it, you can take it. You can also take over tasks marked as stalled.
    • search the repo for other links to your example and check if the example is already linked in relevant parts of the docs
      • how to search the repo: a) find the file name of your example in the examples folder (it starts with plot_...); b) use full text search of your IDE to look for where that name appears
      • you can ignore the "Gallery examples" on the website entirely, as they are auto-generated; only look for real links in the repo
    • comment on the issue to claim an example (you don't need to wait for a team member's approval before starting to work)
  3. find suitable spots in either the API documentation or the User Guide (or both) where users would be happy to find your example linked

    • read through your example and understand where it is making its most useful statements
    • how to find a good spot (careful: we are extremely picky here)
      • if the example demonstrates a certain real world use case: find where in the User Guide the same use case is treated or could be treated
      • if the example shows how to use a certain param: the param description in the API documentation might be a good spot to put the link
      • if the example compares different techniques: this highly calls for mentioning it in the more theoretical parts of the User Guide
      • not all the examples listed here need to be referenced: a link to an example that simply shows how to use some estimator doesn't add enough value
        • if you find an example that doesn't add enough value to be linked: please leave a comment here; this kind of contribution is highly appreciated
    • not a good spot: the See Also section, which is (theoretically) reserved for links to other API functionalities, not examples
  4. add links

    • An example with the path examples/developing_estimators/sklearn_is_fitted.py would be referenced like this:
      :ref:`sphx_glr_auto_examples_developing_estimators_sklearn_is_fitted.py`
    
    • see this example PR, which shows how to add a link to the User Guide: DOC add link to sklearn_is_fitted example in check_is_fitted #26926
    • we aim not to use the .. rubric:: Examples section to put the example if possible, but to integrate it into the text; be aware that if you add a link like this :ref:`title <link>`, you can change its title so that the example's title gets substituted by your picked title and the link can be fitted more nicely to the sentences
    • please avoid adding your link to a list of other examples, since we strive to add the links in the most relevant places
    • please avoid adding a new .. rubric:: Examples section
  5. test build the documentation before opening your PR

  6. open PR

    • use a PR title like DOC add links to <name of example> (starting with DOC)
    • do not refer to this issue in the title of the PR; instead:
    • refer to this issue in the Reference Issues/PRs section of your PR using "Towards #30621" (do not use "Closes #..." or "Fixes #...")
  7. check the CI

    • After the CI tests have finished (~90 minutes) you can find one that says "Check the rendered docs here!". In there, you can look into how the CI has built the documentation for the changed files to check if everything looks alright. You will see something like auto_examples/path_to_example, [dev], [stable], where the first link is your branch's version, the second is the main dev branch and the third link is the last released scikit-learn version that is used for the stable documentation on the website.
    • if the CI shows any failure, you should take action by investigating and proposing solutions; as a rule of thumb, you can find the most useful information from the CIs if you click the upper links first; in any case you need to click through several layers until you see actual test results with more information (and until it looks similar to running pytest, ruff or doctest locally)
    • if the CI shows linting issues, check if you have installed and activated pre-commit properly, and fix the issue by the action the CI proposes (for instance adding or deleting an empty line)
    • if you are lost and don't know what to do with a CI failure, look through other PRs from this issue; most things have already happened to others
    • sometimes, http request errors such as 404 or 405 show up in the CI, in which case you should push an empty commit (git commit --allow-empty -m "empty commit to re-trigger CI")
  8. wait for reviews and be ready to adjust your contribution later on
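Steps 2 and 4 can be sketched in a few lines of Python. This is a rough helper under stated assumptions, not part of scikit-learn: the function names `example_label` and `find_references` are made up for illustration, the label is built by following the pattern of the single example given in step 4, and the search is limited to .py and .rst files in the sklearn and doc folders.

```python
from pathlib import Path


def example_label(example_path):
    """Build the reference label for an example file path.

    Follows the pattern shown in step 4: the leading "examples/" becomes
    "sphx_glr_auto_examples_" and the remaining "/" become "_".
    """
    parts = Path(example_path).as_posix().split("/")
    if parts[0] != "examples":
        raise ValueError("expected a path starting with 'examples/'")
    return "sphx_glr_auto_examples_" + "_".join(parts[1:])


def find_references(repo_root, example_path, folders=("sklearn", "doc")):
    """Yield (file, line number, line) for each mention of the example file name."""
    file_name = Path(example_path).name  # e.g. "plot_oneclass.py"
    for folder in folders:
        base = Path(repo_root, folder)
        if not base.is_dir():
            continue  # assumed folder layout; skip anything that is missing
        for path in sorted(base.rglob("*")):
            if path.suffix not in {".py", ".rst"} or not path.is_file():
                continue
            text = path.read_text(encoding="utf-8", errors="ignore")
            for number, line in enumerate(text.splitlines(), start=1):
                if file_name in line:
                    yield path, number, line.strip()
```

Run from the root of a local clone, `for hit in find_references(".", "examples/svm/plot_oneclass.py"): print(*hit)` lists the places where the example is already mentioned, which is the same information a full text search in an IDE gives you.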

Expectation management for new contributors

How long will your first PR take you up until the point you open a PR?

  • 8-16 hours if you have never contributed to any project and have only basic or no understanding of the workflow yet
  • 2-8 hours if you know the workflow and are just new to scikit-learn (more to the shorter end if you know what linting is and a bit of sphinx)
  • 1-2 hours for your 2nd, 3rd, ... PR on the same issue for everyone

How long will it take us to merge your PR?

  • we strive for a scikit-learn member to look at your PR within a few days and suggest changes depending on technical quality of the PR and an assessment of added value to the user
  • we strive for a maintainer to evaluate your PR within a few weeks; they might also suggest changes before approving and merging
  • the whole process takes several weeks on average and can take up to months, depending on the availability of maintainers and on how many review cycles are necessary

ToDo

Here's a list of all the remaining examples:

What comes next?

@StefanieSenger StefanieSenger added Documentation Sprint good first issue Easy with clear instructions to resolve Meta-issue General issue associated to an identified list of tasks labels Jan 10, 2025
@marenwestermann
Member

Leaving a comment so I get updates about incoming PRs :)

@virchan
Member

virchan commented Jan 10, 2025

Commenting to stay updated!

@sarang-26

sarang-26 commented Jan 11, 2025

Hi Sklearn Team,

I am excited to work on this issue. I will be following the workflow and start working on this.

It's my first time contributing to sklearn, so I am hoping to learn as much as possible and also contribute to the open source community!

I will be working on

examples/classification:
plot_classifier_comparison.py
plot_digits_classification.py

Updated:

Hi @StefanieSenger, I have found a few places where I can place examples from SVM in the User Guide.
Could I also select a few more examples to work on:
plot_custom_kernel.py
plot_iris_svc.py
plot_linearsvc_support_vectors.py
plot_oneclass.py
plot_rbf_parameters.py

Is it okay to select a group of this size and work on it in a single PR? (from a review perspective)

@AlviseSembenico

Hey team!
I will also work on this in the coming days as my first contribution!

@StefanieSenger
Contributor Author

Hi @sarang-26, nice to hear you want to contribute.
plot_classifier_comparison.py was recently claimed by someone else.
You can still work on plot_digits_classification.py, I think.

@stefanogaspari
Contributor

Working on examples/covariance/plot_mahalanobis_distances.py

@Crucible0

Hi, I'd like to work on adding references for plot_iris_dtc.py and plot_tree_regression_multioutput.py. Could you confirm if they're available for contribution? Thank you!

@StefanieSenger
Contributor Author

StefanieSenger commented Jan 11, 2025

Hi, I'd like to work on adding references for plot_iris_dtc.py and plot_tree_regression_multioutput.py. Could you confirm if they're available for contribution? Thank you!

Awesome! Happy to see your contribution.
It looks like nobody has worked on it yet, I cannot confirm though. 🤷 You need to check yourself (see workflow 1. b), it's part of the process of submitting a PR.

@Rchintalapati0111

Hi @StefanieSenger!
I am a first-time contributor to Scikit-learn and would like to contribute to the example plot_confusion_matrix.py from the applications category. Please let me know if it is available and if there are any additional guidelines I should follow.
Thank you

@Peeyush2

Hello @StefanieSenger

I am looking for open-source contributions. This will be my first contribution, I would like to work on file
examples/feature_selection:
plot_feature_selection.py

Thank you!

@Vish75

Vish75 commented Jan 11, 2025

Hello @StefanieSenger

I am quite new to open-source contributions. I would like to work on examples/model_selection/plot_cv_predict.py.

Thank you

@hriti99

hriti99 commented Jan 13, 2025

Hello @StefanieSenger
I am looking for open-source contributions, this will be my first contribution.
I will work on the file : plot_image_denoising.py

Thank you!

@StefanieSenger
Contributor Author

Hi @StefanieSenger, I have found a few places where I can place examples from SVM in the User Guide.
Could I also select a few more examples to work on:
plot_custom_kernel.py
plot_iris_svc.py
plot_linearsvc_support_vectors.py
plot_oneclass.py
plot_rbf_parameters.py

Is it okay to select a group of this size and work on it in a single PR? (from a review perspective)

Hi @sarang-26, I'd say that's too many. Please split it up or go one by one.

@Si-ddhartha

Hello @StefanieSenger

I am quite new to open-source contributions. I would like to work on examples/model_selection/plot_underfitting_overfitting.py

Thank you!

@Crucible0

Crucible0 commented Jan 15, 2025

Hi! I have created a pull request: #30650 . It adds a reference to the plot_iris_dtc example in the DecisionTreeClassifier documentation. Please let me know if any further changes are needed!

@simarssidhu
Contributor

Hi @StefanieSenger,

I'm new to open source contributions, I was wondering if I could work on:

  • plot_weighted_samples.py in examples/svm

Thank you,
Simar

@anotherk1nd

Hello! I'd like to claim this example as my first contribution:

scikit-learn/examples/mixture/plot_concentration_prior.py

Thanks!
Josh

@sotagg
Contributor

sotagg commented Jan 17, 2025

Hello @StefanieSenger,

This will be my first contribution to scikit-learn.
I'd like to work on examples/linear_model/plot_ols.py if it's available.

Thank you!

@sotagg
Contributor

sotagg commented Jan 19, 2025

Hello @StefanieSenger,
I just realized that plot_ols.py is already referenced in the User Guide, so there isn’t much to add there.
My apologies for the oversight.

I would still love to contribute, so I’d like to switch to plot_ols_ridge_variance.py if it’s still available.
Please let me know if that works.

Thanks for your patience.

@StefanieSenger
Contributor Author

StefanieSenger commented Jan 19, 2025

I just realized that plot_ols.py is already referenced in the User Guide, so there isn’t much to add there.

That's an excellent finding, thank you, @sotagg!
Finding that we don't need to reference one of the examples is part of the workflow, so that's been just right. I will check it off the list.

plot_ols_ridge_variance.py looks fine, but I didn't check in depth (that's on you again).

@PriyankaWani66

Hi @StefanieSenger ,

I'd like to contribute to this issue by adding references to the following examples in the documentation:
examples/neighbors/plot_lof_novelty_detection.py
examples/neighbors/plot_lof_outlier_detection.py

I have checked the comments and PRs and didn't find any related work for these files. Please confirm if I can proceed with these contributions.

Thanks!

@sidg1215

Hi @StefanieSenger,
I'd like to contribute by adding references to examples/linear_model/plot_logistic.py. I have checked for the comments and PRs and I don't see anything related. Please let me know if I can start working on this.

Thank you,
Sid

@StefanieSenger
Contributor Author

Hello @StefanieSenger, I am a first-time contributor to scikit-learn, and I would like to contribute by adding a reference to plot_feature_selection.py to the documentation.

I checked and haven't found any work related to this file in the PRs or the comments.

Hi @tarek7669, somebody claimed plot_feature_selection.py on Jan 11th, but since 2 months have passed, you can take over.

@DanieSimonlLowe

DanieSimonlLowe commented Mar 16, 2025

I will do examples/miscellaneous/plot_anomaly_comparison.py

I have done some searching and I think all the places that would be useful for it to link to already have it. (I have not done an exhaustive search.)

@marktemi

I will do plot_forest_importances.py

@BrandoNelly

Hello! I can try adding an entry for

examples/cluster/plot_face_compress.py

@Mindlord-rex

Hello @StefanieSenger,

I would like to work on plot_feature_transformation.py

@marktemi

I will do plot_forest_importances.py

Hi @StefanieSenger,

The file examples/ensemble/plot_forest_importances.py is already properly referenced in doc/modules/feature_selection.rst (line 273) and doc/modules/ensemble.rst (line 1164).

I didn’t find any other locations where additional references are needed. Feel free to check them off the list if everything looks good to you.

@StefanieSenger
Contributor Author

Hi @StefanieSenger,

The file examples/ensemble/plot_forest_importances.py is already properly referenced in doc/modules/feature_selection.rst (line 273) and doc/modules/ensemble.rst (line 1164).

I didn’t find any other locations where additional references are needed. Feel free to check them off the list if everything looks good to you.

Awesome, @marktemi. Thanks for your work. I have checked it off.
Feel free to grab another example or check the help wanted label for further issues.

@elhambbi
Contributor

Hi @StefanieSenger. I was looking at plot_f_test_vs_mi.py. It is already referenced in doc/modules/feature_selection.rst line 121. You may want to check it off the list.

There are some sections about "Mutual Information" in doc/modules/clustering.rst (lines 1453 onwards). I am not sure if this example fits here too. If this is the case, let me know please. I will add it. Otherwise, we can just remove it from the list.
Thank you.

@aysh34

aysh34 commented Mar 20, 2025

Hi, I'd like to contribute by improving the documentation for plot_classifier_comparison.py. Please let me know if there are any specific guidelines or suggestions before I proceed.

@StefanieSenger
Contributor Author

Hi @StefanieSenger. I was looking at plot_f_test_vs_mi.py. It is already referenced in doc/modules/feature_selection.rst line 121. You may want to check it off the list.

There are some sections about "Mutual Information" in doc/modules/clustering.rst (lines 1453 onwards). I am not sure if this example fits here too. If this is the case, let me know please. I will add it. Otherwise, we can just remove it from the list.
Thank you.

Hi @elhambbi, yes I agree the one link is enough. I would not add this to the clustering.rst file since it's a regression example. Thus I have checked the example off the list.

Thanks a lot for reporting back!

@ashbleu

ashbleu commented Apr 3, 2025

Hi @StefanieSenger, I'd like to work on plot_gpr_on_structured_data!

@Aftabby

Aftabby commented Apr 8, 2025

If the issue is still open, I'd like to work on it. Kindly assign it to me. @StefanieSenger

@aashirpersonal

Hi, I would like to work on linking the example examples/calibration/plot_compare_calibration.py. @StefanieSenger

@vivaannanavati123

Hi @StefanieSenger! I'd like to work on adding documentation links for the following examples:

  1. plot_gmm_pdf.py
  2. plot_roc_crossval.py

I've reviewed the discussion thread and confirmed that no one has claimed these examples recently, nor do they appear to have any associated PRs or "in progress" markers.

Please let me know if there are any specific considerations I should keep in mind while working on these examples.

Thank you!

@natmokval
Contributor

Hi @StefanieSenger, I would like to work on 'plot_gmm_covariances.py'.

@desainidhi99

Hi @StefanieSenger -

would like to take -
plot_topics_extraction_with_nmf_lda.py. (checked the discussion forum and seems no one has taken it yet)
It is my first open source contribution in ML/AI category and excited for it !

@EngineerDanny

Hey @StefanieSenger,
I would love to contribute and add documentation links to plot_sparse_cov.py.
plot_sparse_cov.py is still free and is the canonical example for sparse inverse-covariance estimation.

@AidenFrank

Hello @StefanieSenger,

I would like to contribute for the first time and add a documentation link for the plot_nnls.py example in the API.

Thank you!

AidenFrank added a commit to AidenFrank/scikit-learn that referenced this issue Apr 30, 2025
This is intended to add a link to the Non-negative least squares example in the LinearRegression API page.
It is towards scikit-learn#30621.
The following example is used: `plot_nnls.py`
This example is linked in the User Guide for Linear Regression, but not anywhere on the API page.
@Manthanjain

Hello @StefanieSenger
I would love to contribute for the first time and add a documentation link for plot_face_compress.py
