diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst index 402711dcd1bf3..a9dc7cb305f61 100644 --- a/doc/developers/contributing.rst +++ b/doc/developers/contributing.rst @@ -7,12 +7,9 @@ Contributing .. currentmodule:: sklearn This project is a community effort, and everyone is welcome to -contribute. - -The project is hosted on https://github.com/scikit-learn/scikit-learn - +contribute. It is hosted on https://github.com/scikit-learn/scikit-learn. The decision making process and governance structure of scikit-learn is laid -out in the governance document: :ref:`governance`. +out in :ref:`governance`. Scikit-learn is somewhat :ref:`selective ` when it comes to adding new algorithms, and the best way to contribute and to help the project @@ -70,10 +67,12 @@ link to it from your website, or simply star to say "I use it": .. raw:: html - Star - +

+ + +

In case a contribution/issue involves changes to the API principles or changes to dependencies or supported versions, it must be backed by a @@ -100,9 +99,8 @@ and follows the decision-making process outlined in :ref:`governance`. * `matplotlib `__ * and so on. - Look for issues marked "help wanted" or similar. - Helping these projects may help Scikit-learn too. - See also :ref:`related_projects`. + Look for issues marked "help wanted" or similar. Helping these projects may help + scikit-learn too. See also :ref:`related_projects`. Submitting a bug report or a feature request ============================================ @@ -135,17 +133,15 @@ following rules before submitting: How to make a good bug report ----------------------------- -When you submit an issue to `Github +When you submit an issue to `GitHub `__, please do your best to follow these guidelines! This will make it a lot easier to provide you with good feedback: - The ideal bug report contains a :ref:`short reproducible code snippet - `, this way - anyone can try to reproduce the bug easily (see `this - `_ for more details). If your snippet is - longer than around 50 lines, please link to a `gist - `_ or a github repo. + `, this way anyone can try to reproduce the bug easily. If your + snippet is longer than around 50 lines, please link to a `Gist + `_ or a GitHub repo. - If not feasible to include a reproducible snippet, please be specific about what **estimators and/or functions are involved and the shape of the data**. @@ -154,18 +150,18 @@ feedback: - Please include your **operating system type and version number**, as well as your **Python, scikit-learn, numpy, and scipy versions**. This information - can be found by running the following code snippet:: + can be found by running: - >>> import sklearn - >>> sklearn.show_versions() # doctest: +SKIP + .. prompt:: bash + + python -c "import sklearn; sklearn.show_versions()" - Please ensure all **code snippets and error messages are formatted in appropriate code blocks**. See `Creating and highlighting code blocks `_ for more details. -If you want to help curate issues, read :ref:`the following -`. +If you want to help curate issues, read about :ref:`bug_triaging`. Contributing code ================= @@ -241,7 +237,7 @@ how to set up your git repository: 3. Clone your fork of the scikit-learn repo from your GitHub account to your local disk: - .. prompt:: bash $ + .. prompt:: bash git clone git@github.com:YourLogin/scikit-learn.git # add --depth 1 if your connection is slow cd scikit-learn @@ -251,7 +247,7 @@ how to set up your git repository: 5. Install the development dependencies: - .. prompt:: bash $ + .. prompt:: bash pip install pytest pytest-cov ruff mypy numpydoc black==24.3.0 @@ -261,12 +257,14 @@ how to set up your git repository: scikit-learn repository, which you can use to keep your repository synchronized with the latest changes: - .. prompt:: bash $ + .. prompt:: bash git remote add upstream git@github.com:scikit-learn/scikit-learn.git 7. Check that the `upstream` and `origin` remote aliases are configured correctly - by running `git remote -v` which should display:: + by running `git remote -v` which should display: + + .. code-block:: text origin git@github.com:YourLogin/scikit-learn.git (fetch) origin git@github.com:YourLogin/scikit-learn.git (push) @@ -282,7 +280,7 @@ The next steps now describe the process of modifying code and submitting a PR: 8. Synchronize your ``main`` branch with the ``upstream/main`` branch, more details on `GitHub Docs `_: - .. prompt:: bash $ + .. prompt:: bash git checkout main git fetch upstream @@ -290,7 +288,7 @@ The next steps now describe the process of modifying code and submitting a PR: 9. Create a feature branch to hold your development changes: - .. prompt:: bash $ + .. prompt:: bash git checkout -b my_feature @@ -300,7 +298,7 @@ The next steps now describe the process of modifying code and submitting a PR: 10. (**Optional**) Install `pre-commit `_ to run code style checks before each commit: - .. prompt:: bash $ + .. prompt:: bash pip install pre-commit pre-commit install @@ -312,7 +310,7 @@ The next steps now describe the process of modifying code and submitting a PR: do the version control. When you're done editing, add changed files using ``git add`` and then ``git commit``: - .. prompt:: bash $ + .. prompt:: bash git add modified_files git commit @@ -320,7 +318,7 @@ The next steps now describe the process of modifying code and submitting a PR: to record your changes in Git, then push the changes to your GitHub account with: - .. prompt:: bash $ + .. prompt:: bash git push -u origin my_feature @@ -333,7 +331,7 @@ The next steps now describe the process of modifying code and submitting a PR: It is often helpful to keep your local feature branch synchronized with the latest changes of the main scikit-learn repository: -.. prompt:: bash $ +.. prompt:: bash git fetch upstream git merge upstream/main @@ -343,7 +341,7 @@ Subsequently, you might need to solve the conflicts. You can refer to the line `_. -.. topic:: Learning git: +.. topic:: Learning Git The `Git documentation `_ and http://try.github.io are excellent resources to get started with git, @@ -355,19 +353,18 @@ Pull request checklist ---------------------- Before a PR can be merged, it needs to be approved by two core developers. -Please prefix the title of your pull request with ``[MRG]`` if the -contribution is complete and should be subjected to a detailed review. An -incomplete contribution -- where you expect to do more work before receiving -a full review -- should be prefixed ``[WIP]`` (to indicate a work in -progress) and changed to ``[MRG]`` when it matures. WIPs may be useful to: +An incomplete contribution -- where you expect to do more work before receiving +a full review -- should be marked as a `draft pull request +`__ +and changed to "ready for review" when it matures. Draft PRs may be useful to: indicate you are working on something to avoid duplicated work, request -broad review of functionality or API, or seek collaborators. WIPs often +broad review of functionality or API, or seek collaborators. Draft PRs often benefit from the inclusion of a `task list `_ in the PR description. In order to ease the reviewing process, we recommend that your contribution -complies with the following rules before marking a PR as ``[MRG]``. The +complies with the following rules before marking a PR as "ready for review". The **bolded** ones are especially important: 1. **Give your pull request a helpful title** that summarizes what your @@ -414,13 +411,11 @@ complies with the following rules before marking a PR as ``[MRG]``. The non-regression tests should fail for the code base in the ``main`` branch and pass for the PR code. - 5. Follow the :ref:`coding-guidelines`. - -6. When applicable, use the validation tools and scripts in the - ``sklearn.utils`` submodule. A list of utility routines available - for developers can be found in the :ref:`developers-utils` page. +6. When applicable, use the validation tools and scripts in the :mod:`sklearn.utils` + module. A list of utility routines available for developers can be found in the + :ref:`developers-utils` page. 7. Often pull requests resolve one or more other issues (or pull requests). If merging your pull request means that some other issues/PRs should @@ -429,61 +424,59 @@ complies with the following rules before marking a PR as ``[MRG]``. The (e.g., ``Fixes #1234``; multiple issues/PRs are allowed as long as each one is preceded by a keyword). Upon merging, those issues/PRs will automatically be closed by GitHub. If your pull request is simply - related to some other issues/PRs, create a link to them without using - the keywords (e.g., ``See also #1234``). + related to some other issues/PRs, or it only partially resolves the target + issue, create a link to them without using the keywords (e.g., ``Towards #1234``). 8. PRs should often substantiate the change, through benchmarks of - performance and efficiency (see :ref:`monitoring_performances`) or through - examples of usage. Examples also illustrate the features and intricacies of - the library to users. Have a look at other examples in the `examples/ - `_ - directory for reference. Examples should demonstrate why the new - functionality is useful in practice and, if possible, compare it to other - methods available in scikit-learn. + performance and efficiency (see :ref:`monitoring_performances`) or through + examples of usage. Examples also illustrate the features and intricacies of + the library to users. Have a look at other examples in the `examples/ + `_ + directory for reference. Examples should demonstrate why the new + functionality is useful in practice and, if possible, compare it to other + methods available in scikit-learn. 9. New features have some maintenance overhead. We expect PR authors - to take part in the maintenance for the code they submit, at least - initially. New features need to be illustrated with narrative - documentation in the user guide, with small code snippets. - If relevant, please also add references in the literature, with PDF links - when possible. + to take part in the maintenance for the code they submit, at least + initially. New features need to be illustrated with narrative + documentation in the user guide, with small code snippets. + If relevant, please also add references in the literature, with PDF links + when possible. 10. The user guide should also include expected time and space complexity of the algorithm and scalability, e.g. "this algorithm can scale to a large number of samples > 100000, but does not scale in dimensionality: - n_features is expected to be lower than 100". + `n_features` is expected to be lower than 100". You can also check our :ref:`code_review` to get an idea of what reviewers will expect. You can check for common programming errors with the following tools: -* Code with a good unittest coverage (at least 80%, better 100%), check - with: +* Code with a good unit test coverage (at least 80%, better 100%), check with: - .. prompt:: bash $ + .. prompt:: bash pip install pytest pytest-cov - pytest --cov sklearn path/to/tests_for_package + pytest --cov sklearn path/to/tests - see also :ref:`testing_coverage` + See also :ref:`testing_coverage`. - Run static analysis with `mypy`: +* Run static analysis with `mypy`: - .. prompt:: bash $ + .. prompt:: bash mypy sklearn - must not produce new errors in your pull request. Using `# type: ignore` + This must not produce new errors in your pull request. Using `# type: ignore` annotation can be a workaround for a few cases that are not supported by mypy, in particular, - - when importing C or Cython modules - - on properties with decorators + - when importing C or Cython modules, + - on properties with decorators. Bonus points for contributions that include a performance analysis with a benchmark script and profiling output (see :ref:`monitoring_performances`). - Also check out the :ref:`performance-howto` guide for more details on profiling and Cython optimizations. @@ -494,7 +487,7 @@ profiling and Cython optimizations. on all new contributions will get the overall code base quality in the right direction. -.. note:: +.. seealso:: For two very well documented and more detailed guides on development workflow, please pay a visit to the `Scipy Development Workflow @@ -546,9 +539,7 @@ Stalled pull requests As contributing a feature can be a lengthy process, some pull requests appear inactive but unfinished. In such a case, taking -them over is a great service for the project. - -A good etiquette to take over is: +them over is a great service for the project. A good etiquette to take over is: * **Determine if a PR is stalled** @@ -615,33 +606,32 @@ the contributor become familiar with the contribution workflow, and for the core devs to become acquainted with the contributor; besides which, we frequently underestimate how easy an issue is to solve! -.. topic:: good first issue tag - - A great way to start contributing to scikit-learn is to pick an item from - the list of `good first issues - `_ - in the issue tracker. Resolving these issues allow you to start contributing - to the project without much prior knowledge. If you have already contributed - to scikit-learn, you should look at Easy issues instead. +- **Good first issue tag** -.. topic:: Easy tag + A great way to start contributing to scikit-learn is to pick an item from + the list of `good first issues + `_ + in the issue tracker. Resolving these issues allow you to start contributing + to the project without much prior knowledge. If you have already contributed + to scikit-learn, you should look at Easy issues instead. - If you have already contributed to scikit-learn, another great way to contribute - to scikit-learn is to pick an item from the list of `Easy issues - `_ in the issue - tracker. Your assistance in this area will be greatly appreciated by the - more experienced developers as it helps free up their time to concentrate on - other issues. +- **Easy tag** -.. topic:: help wanted tag + If you have already contributed to scikit-learn, another great way to contribute + to scikit-learn is to pick an item from the list of `Easy issues + `_ in the issue + tracker. Your assistance in this area will be greatly appreciated by the + more experienced developers as it helps free up their time to concentrate on + other issues. - We often use the help wanted tag to mark issues regardless of difficulty. Additionally, - we use the help wanted tag to mark Pull Requests which have been abandoned - by their original contributor and are available for someone to pick up where the original - contributor left off. The list of issues with the help wanted tag can be found - `here `_. +- **Help wanted tag** - Note that not all issues which need contributors will have this tag. + We often use the help wanted tag to mark issues regardless of difficulty. + Additionally, we use the help wanted tag to mark Pull Requests which have been + abandoned by their original contributor and are available for someone to pick up where + the original contributor left off. The list of issues with the help wanted tag can be + found `here `_. + Note that not all issues which need contributors will have this tag. .. _contribute_documentation: @@ -650,44 +640,49 @@ Documentation We are glad to accept any sort of documentation: -* **function/method/class docstrings** (also known as "API documentation") - - these describe what the object does and details any parameters, attributes and - methods. Docstrings live alongside the code in - `sklearn/ `_. -* **user guide** - these provide more detailed information about the algorithms +* **Function/method/class docstrings:** Also known as "API documentation", these + describe what the object does and details any parameters, attributes and + methods. Docstrings live alongside the code in `sklearn/ + `_, and are generated + generated according to `doc/api_reference.py + `_. To + add, update, remove, or deprecate a public API that is listed in :ref:`api_ref`, this + is the place to look at. +* **User guide:** These provide more detailed information about the algorithms implemented in scikit-learn and generally live in the root `doc/ `_ directory - and - `doc/modules/ `_. -* **tutorials** - these introduce various statistical learning and machine learning + and `doc/modules/ `_. +* **Tutorials:** These introduce various statistical learning and machine learning concepts and are located in `doc/tutorial `_. -* **examples** - these provide full code examples that may demonstrate the use +* **Examples:** These provide full code examples that may demonstrate the use of scikit-learn modules, compare different algorithms or discuss their - interpretation etc. Examples live in - `examples/ `_ -* **other reStructuredText documents** - provide various other - useful information (e.g., the :ref:`contributing` guide) and live in + interpretation, etc. Examples live in + `examples/ `_. +* **Other reStructuredText documents:** These provide various other useful information + (e.g., the :ref:`contributing` guide) and live in `doc/ `_. .. dropdown:: Guidelines for writing docstrings * When documenting the parameters and attributes, here is a list of some - well-formatted examples:: + well-formatted examples + + .. code-block:: text n_clusters : int, default=3 The number of clusters detected by the algorithm. - some_param : {'hello', 'goodbye'}, bool or int, default=True + some_param : {"hello", "goodbye"}, bool or int, default=True The parameter description goes here, which can be either a string literal (either `hello` or `goodbye`), a bool, or an int. The default value is True. - array_parameter : {array-like, sparse matrix} of shape (n_samples, n_features) or (n_samples,) + array_parameter : {array-like, sparse matrix} of shape (n_samples, n_features) \ + or (n_samples,) This parameter accepts data in either of the mentioned forms, with one - of the mentioned shapes. The default value is - `np.ones(shape=(n_samples,))`. + of the mentioned shapes. The default value is `np.ones(shape=(n_samples,))`. list_param : list of int @@ -727,7 +722,9 @@ We are glad to accept any sort of documentation: * Add "See Also" in docstrings for related classes/functions. * "See Also" in docstrings should be one line per reference, with a colon and an - explanation, for example:: + explanation, for example: + + .. code-block:: text See Also -------- @@ -769,7 +766,9 @@ We are glad to accept any sort of documentation: backticks should be used nowadays. * Too much information makes it difficult for users to access the content they - are interested in. Use dropdowns to factorize it by using the following syntax:: + are interested in. Use dropdowns to factorize it by using the following syntax + + .. code-block:: rst .. dropdown:: Dropdown title @@ -808,13 +807,13 @@ We are glad to accept any sort of documentation: use the sphinx directives `:arxiv:` or `:doi:`. For example, see references in :ref:`Spectral Clustering Graphs `. - * For "References" in docstrings, see the Silhouette Coefficient - (:func:`sklearn.metrics.silhouette_score`). + * For the "References" section in docstrings, see + :func:`sklearn.metrics.silhouette_score` as an example. * To cross-reference to other pages in the scikit-learn documentation use the reStructuredText cross-referencing syntax: - * Section - to link to an arbitrary section in the documentation, use + * **Section:** to link to an arbitrary section in the documentation, use reference labels (see `Sphinx docs `_). For example: @@ -834,13 +833,13 @@ We are glad to accept any sort of documentation: existing cross references and external links pointing to specific sections in the scikit-learn documentation. - * Glossary - linking to a term in the :ref:`glossary`: + * **Glossary:** linking to a term in the :ref:`glossary`: .. code-block:: rst :term:`cross_validation` - * Function - to link to the documentation of a function, use the full import + * **Function:** to link to the documentation of a function, use the full import path to the function: .. code-block:: rst @@ -857,8 +856,8 @@ We are glad to accept any sort of documentation: :func:`cross_val_score` - * Class - to link to documentation of a class, use the full import path to the - class, unless there is a 'currentmodule' directive in the document above + * **Class:** to link to documentation of a class, use the full import path to the + class, unless there is a `.. currentmodule::` directive in the document above (see above): .. code-block:: rst @@ -867,8 +866,12 @@ We are glad to accept any sort of documentation: You can edit the documentation using any text editor, and then generate the HTML output by following :ref:`building_documentation`. The resulting HTML files -will be placed in ``_build/html/stable`` and are viewable in a web browser, for -instance by opening the local ``_build/html/stable/index.html`` file. +will be placed in ``_build/html/`` and are viewable in a web browser, for instance by +opening the local ``_build/html/index.html`` file or by running a local server + +.. prompt:: bash + + python -m http.server -d _build/html .. _building_documentation: @@ -879,15 +882,14 @@ Building the documentation **Before submitting a pull request check if your modifications have introduced new sphinx warnings by building the documentation locally and try to fix them.** -First, make sure you have :ref:`properly installed ` -the development version. +First, make sure you have :ref:`properly installed ` the +development version. On top of that, building the documentation requires installing some +additional packages: .. packaging is not needed once setuptools starts shipping packaging>=17.0 -Building the documentation requires installing some additional packages: - -.. prompt:: bash $ +.. prompt:: bash pip install sphinx sphinx-gallery numpydoc matplotlib Pillow pandas \ polars scikit-image packaging seaborn sphinx-prompt \ @@ -897,14 +899,14 @@ Building the documentation requires installing some additional packages: To build the documentation, you need to be in the ``doc`` folder: -.. prompt:: bash $ +.. prompt:: bash cd doc In the vast majority of cases, you only need to generate the full web site, without the example gallery: -.. prompt:: bash $ +.. prompt:: bash make @@ -913,25 +915,22 @@ and are viewable in a web browser, for instance by opening the local ``_build/html/stable/index.html`` file. To also generate the example gallery you can use: -.. prompt:: bash $ +.. prompt:: bash make html -This will run all the examples, which takes a while. If you only want to -generate a few examples, you can use: +This will run all the examples, which takes a while. If you only want to generate a few +examples, which is particularly useful if you are modifying only a few examples, you can +use: -.. prompt:: bash $ +.. prompt:: bash EXAMPLES_PATTERN=your_regex_goes_here make html -This is particularly useful if you are modifying a few examples. - -Set the environment variable `NO_MATHJAX=1` if you intend to view -the documentation in an offline setting. +Set the environment variable `NO_MATHJAX=1` if you intend to view the documentation in +an offline setting. To build the PDF manual, run: -To build the PDF manual, run: - -.. prompt:: bash $ +.. prompt:: bash make latexpdf @@ -977,18 +976,17 @@ subpackages. For a more detailed `pytest` workflow, please refer to the We expect code coverage of new features to be at least around 90%. -.. dropdown:: Writing matplotlib related tests +.. dropdown:: Writing matplotlib-related tests Test fixtures ensure that a set of tests will be executing with the appropriate - initialization and cleanup. The scikit-learn test suite implements a fixture - which can be used with ``matplotlib``. + initialization and cleanup. The scikit-learn test suite implements a ``pyplot`` + fixture which can be used with ``matplotlib``. - ``pyplot`` - The ``pyplot`` fixture should be used when a test function is dealing with - ``matplotlib``. ``matplotlib`` is a soft dependency and is not required. - This fixture is in charge of skipping the tests if ``matplotlib`` is not - installed. In addition, figures created during the tests will be - automatically closed once the test function has been executed. + The ``pyplot`` fixture should be used when a test function is dealing with + ``matplotlib``. ``matplotlib`` is a soft dependency and is not required. + This fixture is in charge of skipping the tests if ``matplotlib`` is not + installed. In addition, figures created during the tests will be + automatically closed once the test function has been executed. To use this fixture in a test function, one needs to pass it as an argument:: @@ -999,13 +997,13 @@ We expect code coverage of new features to be at least around 90%. .. dropdown:: Workflow to improve test coverage To test code coverage, you need to install the `coverage - `_ package in addition to pytest. + `_ package in addition to `pytest`. - 1. Run 'make test-coverage'. The output lists for each file the line - numbers that are not tested. + 1. Run `make test-coverage`. The output lists for each file the line + numbers that are not tested. 2. Find a low hanging fruit, looking at which lines are not tested, - write or adapt a test specifically for these lines. + write or adapt a test specifically for these lines. 3. Loop. @@ -1021,8 +1019,9 @@ When proposing changes to the existing code base, it's important to make sure that they don't introduce performance regressions. Scikit-learn uses `asv benchmarks `_ to monitor the performance of a selection of common estimators and functions. You can view -these benchmarks on the `scikit-learn benchmark page `_. -The corresponding benchmark suite can be found in the `scikit-learn/asv_benchmarks` directory. +these benchmarks on the `scikit-learn benchmark page +`_. +The corresponding benchmark suite can be found in the `asv_benchmarks/` directory. To use all features of asv, you will need either `conda` or `virtualenv`. For more details please check the `asv installation webpage @@ -1030,20 +1029,20 @@ more details please check the `asv installation webpage First of all you need to install the development version of asv: -.. prompt:: bash $ +.. prompt:: bash pip install git+https://github.com/airspeed-velocity/asv and change your directory to `asv_benchmarks/`: -.. prompt:: bash $ +.. prompt:: bash - cd asv_benchmarks/ + cd asv_benchmarks The benchmark suite is configured to run against your local clone of scikit-learn. Make sure it is up to date: -.. prompt:: bash $ +.. prompt:: bash git fetch upstream @@ -1051,20 +1050,20 @@ In the benchmark suite, the benchmarks are organized following the same structure as scikit-learn. For example, you can compare the performance of a specific estimator between ``upstream/main`` and the branch you are working on: -.. prompt:: bash $ +.. prompt:: bash asv continuous -b LogisticRegression upstream/main HEAD The command uses conda by default for creating the benchmark environments. If you want to use virtualenv instead, use the `-E` flag: -.. prompt:: bash $ +.. prompt:: bash asv continuous -E virtualenv -b LogisticRegression upstream/main HEAD You can also specify a whole module to benchmark: -.. prompt:: bash $ +.. prompt:: bash asv continuous -b linear_model upstream/main HEAD @@ -1074,7 +1073,7 @@ the `-f` flag. To run the full benchmark suite, simply remove the `-b` flag : -.. prompt:: bash $ +.. prompt:: bash asv continuous upstream/main HEAD @@ -1084,14 +1083,14 @@ expression for a more complex subset of benchmarks to run. To run the benchmarks without comparing to another branch, use the `run` command: -.. prompt:: bash $ +.. prompt:: bash asv run -b linear_model HEAD^! You can also run the benchmark suite using the version of scikit-learn already installed in your current Python environment: -.. prompt:: bash $ +.. prompt:: bash asv run --python=same @@ -1100,20 +1099,20 @@ avoid creating a new environment each time you run the benchmarks. By default the results are not saved when using an existing installation. To save the results you must specify a commit hash: -.. prompt:: bash $ +.. prompt:: bash asv run --python=same --set-commit-hash= Benchmarks are saved and organized by machine, environment and commit. To see the list of all saved benchmarks: -.. prompt:: bash $ +.. prompt:: bash asv show and to see the report of a specific run: -.. prompt:: bash $ +.. prompt:: bash asv show @@ -1136,11 +1135,11 @@ All issues and pull requests on the `GitHub issue tracker `_ should have (at least) one of the following tags: -:Bug / Crash: +:Bug: Something is happening that clearly shouldn't happen. Wrong results as well as unexpected errors from estimators go here. -:Cleanup / Enhancement: +:Enhancement: Improving performance, usability, consistency. :Documentation: @@ -1151,7 +1150,7 @@ should have (at least) one of the following tags: There are four other tags to help new contributors: -:good first issue: +:Good first issue: This issue is ideal for a first contribution to scikit-learn. Ask for help if the formulation is unclear. If you have already contributed to scikit-learn, look at Easy issues instead. @@ -1163,7 +1162,7 @@ There are four other tags to help new contributors: Might need some knowledge of machine learning or the package, but is still approachable for someone new to the project. -:help wanted: +:Help wanted: This tag marks an issue which currently lacks a contributor or a PR that needs another contributor to take over the work. These issues can range in difficulty, and may not be approachable @@ -1180,12 +1179,15 @@ Maintaining backwards compatibility Deprecation ----------- -If any publicly accessible method, function, attribute or parameter -is renamed, we still support the old one for two releases and issue -a deprecation warning when it is called/passed/accessed. -E.g., if the function ``zero_one`` is renamed to ``zero_one_loss``, -we add the decorator ``deprecated`` (from ``sklearn.utils``) -to ``zero_one`` and call ``zero_one_loss`` from that function:: +If any publicly accessible class, function, method, attribute or parameter is renamed, +we still support the old one for two releases and issue a deprecation warning when it is +called, passed, or accessed. + +.. rubric:: Deprecating a class or a function + +Suppose the function ``zero_one`` is renamed to ``zero_one_loss``, we add the decorator +:class:`utils.deprecated` to ``zero_one`` and call ``zero_one_loss`` from that +function:: from ..utils import deprecated @@ -1193,36 +1195,47 @@ to ``zero_one`` and call ``zero_one_loss`` from that function:: # actual implementation pass - @deprecated("Function 'zero_one' was renamed to 'zero_one_loss' " - "in version 0.13 and will be removed in release 0.15. " - "Default behavior is changed from 'normalize=False' to " - "'normalize=True'") + @deprecated( + "Function `zero_one` was renamed to `zero_one_loss` in 0.13 and will be " + "removed in 0.15. Default behavior is changed from `normalize=False` to " + "`normalize=True`" + ) def zero_one(y_true, y_pred, normalize=False): return zero_one_loss(y_true, y_pred, normalize) -If an attribute is to be deprecated, -use the decorator ``deprecated`` on a property. Please note that the -``deprecated`` decorator should be placed before the ``property`` -decorator for the docstrings to be rendered properly. -E.g., renaming an attribute ``labels_`` to ``classes_`` can be done as:: +One also needs to move ``zero_one`` from ``API_REFERENCE`` to +``DEPRECATED_API_REFERENCE`` and add ``zero_one_loss`` to ``API_REFERENCE`` in the +``doc/api_reference.py`` file to reflect the changes in :ref:`api_ref`. + +.. rubric:: Deprecating an attribute or a method + +If an attribute or a method is to be deprecated, use the decorator +:class:`~utils.deprecated` on the property. Please note that the +:class:`~utils.deprecated` decorator should be placed before the ``property`` decorator +if there is one, so that the docstrings can be rendered properly. For instance, renaming +an attribute ``labels_`` to ``classes_`` can be done as:: - @deprecated("Attribute `labels_` was deprecated in version 0.13 and " - "will be removed in 0.15. Use `classes_` instead") + @deprecated( + "Attribute `labels_` was deprecated in 0.13 and will be removed in 0.15. Use " + "`classes_` instead" + ) @property def labels_(self): return self.classes_ -If a parameter has to be deprecated, a ``FutureWarning`` warning -must be raised too. -In the following example, k is deprecated and renamed to n_clusters:: +.. rubric:: Deprecating a parameter + +If a parameter has to be deprecated, a ``FutureWarning`` warning must be raised +manually. In the following example, ``k`` is deprecated and renamed to n_clusters:: import warnings - def example_function(n_clusters=8, k='deprecated'): - if k != 'deprecated': - warnings.warn("'k' was renamed to n_clusters in version 0.13 and " - "will be removed in 0.15.", - FutureWarning) + def example_function(n_clusters=8, k="deprecated"): + if k != "deprecated": + warnings.warn( + "`k` was renamed to `n_clusters` in 0.13 and will be removed in 0.15", + FutureWarning, + ) n_clusters = k When the change is in a class, we validate and raise warning in ``fit``:: @@ -1235,10 +1248,11 @@ When the change is in a class, we validate and raise warning in ``fit``:: self.k = k def fit(self, X, y): - if self.k != 'deprecated': - warnings.warn("'k' was renamed to n_clusters in version 0.13 and " - "will be removed in 0.15.", - FutureWarning) + if self.k != "deprecated": + warnings.warn( + "`k` was renamed to `n_clusters` in 0.13 and will be removed in 0.15.", + FutureWarning, + ) self._n_clusters = self.k else: self._n_clusters = self.n_clusters @@ -1252,9 +1266,14 @@ adapt their code to the new behaviour. For example, if the deprecation happened in version 0.18-dev, the message should say it happened in version 0.18 and the old behavior will be removed in version 0.20. +The warning message should also include a brief explanation of the change and point +users to an alternative. + In addition, a deprecation note should be added in the docstring, recalling the same information as the deprecation warning as explained above. Use the -``.. deprecated::`` directive:: +``.. deprecated::`` directive: + +.. code-block:: rst .. deprecated:: 0.13 ``k`` was renamed to ``n_clusters`` in version 0.13 and will be removed @@ -1270,7 +1289,7 @@ Change the default value of a parameter --------------------------------------- If the default value of a parameter needs to be changed, please replace the -default value with a specific value (e.g., ``warn``) and raise +default value with a specific value (e.g., ``"warn"``) and raise ``FutureWarning`` when users are using the default value. The following example assumes that the current version is 0.20 and that we change the default value of ``n_clusters`` from 5 (old default for 0.20) to 10 @@ -1278,10 +1297,12 @@ default value of ``n_clusters`` from 5 (old default for 0.20) to 10 import warnings - def example_function(n_clusters='warn'): - if n_clusters == 'warn': - warnings.warn("The default value of n_clusters will change from " - "5 to 10 in 0.22.", FutureWarning) + def example_function(n_clusters="warn"): + if n_clusters == "warn": + warnings.warn( + "The default value of `n_clusters` will change from 5 to 10 in 0.22.", + FutureWarning, + ) n_clusters = 5 When the change is in a class, we validate and raise warning in ``fit``:: @@ -1289,22 +1310,26 @@ When the change is in a class, we validate and raise warning in ``fit``:: import warnings class ExampleEstimator: - def __init__(self, n_clusters='warn'): + def __init__(self, n_clusters="warn"): self.n_clusters = n_clusters def fit(self, X, y): - if self.n_clusters == 'warn': - warnings.warn("The default value of n_clusters will change from " - "5 to 10 in 0.22.", FutureWarning) - self._n_clusters = 5 + if self.n_clusters == "warn": + warnings.warn( + "The default value of `n_clusters` will change from 5 to 10 in 0.22.", + FutureWarning, + ) + self._n_clusters = 5 Similar to deprecations, the warning message should always give both the version in which the change happened and the version in which the old behavior will be removed. The parameter description in the docstring needs to be updated accordingly by adding -a `versionchanged` directive with the old and new default value, pointing to the -version when the change will be effective:: +a ``versionchanged`` directive with the old and new default value, pointing to the +version when the change will be effective: + +.. code-block:: rst .. versionchanged:: 0.22 The default value for `n_clusters` will change from 5 to 10 in version 0.22. @@ -1314,12 +1339,11 @@ not in other cases. The warning should be caught in all other tests (using e.g., ``@pytest.mark.filterwarnings``), and there should be no warning in the examples. -.. currentmodule:: sklearn - .. _code_review: Code Review Guidelines ====================== + Reviewing code contributed to the project as PRs is a crucial component of scikit-learn development. We encourage anyone to start reviewing code of other developers. The code review process is often highly educational for everybody @@ -1438,9 +1462,9 @@ make this task easier and faster (in no particular order). relevant, and which are not. In scikit-learn **a lot** of input checking is performed, especially at the beginning of the :term:`fit` methods. Sometimes, only a very small portion of the code is doing the actual job. - For example looking at the ``fit()`` method of + For example looking at the :meth:`~linear_model.LinearRegression.fit` method of :class:`~linear_model.LinearRegression`, what you're looking for - might just be the call the ``scipy.linalg.lstsq``, but it is buried into + might just be the call the :func:`scipy.linalg.lstsq`, but it is buried into multiple lines of input checking and the handling of different kinds of parameters. - Due to the use of `Inheritance @@ -1470,7 +1494,7 @@ make this task easier and faster (in no particular order). IDE goes a long way towards digesting the code base. Being able to quickly jump (or *peek*) to a function/class/attribute definition helps a lot. So does being able to quickly see where a given name is used in a file. - - `git `_ also has some built-in killer + - `Git `_ also has some built-in killer features. It is often useful to understand how a file changed over time, using e.g. ``git blame`` (`manual `_). This can also be done directly @@ -1482,7 +1506,7 @@ make this task easier and faster (in no particular order). - Configure `git blame` to ignore the commit that migrated the code style to `black`. - .. prompt:: bash $ + .. prompt:: bash git config blame.ignoreRevsFile .git-blame-ignore-revs