===================================================
NEP 44 — Restructuring the NumPy Documentation
===================================================

:Author: Ralf Gommers
:Author: Melissa Mendonça
:Author: Mars Lee
:Status: Draft
:Type: Process
:Created: 2020-02-11

Abstract
========

This document proposes restructuring the NumPy documentation, in both form and
content, with the goal of making it better organized and more discoverable for
beginners and experienced users alike.

Motivation and Scope
====================

The `front page of the latest docs <https://numpy.org/devdocs/>`_ shows that
the current organization is confusing and illogical (e.g., user and developer
docs are mixed). We propose:

- Reorganizing the docs into the four categories described in [1]_, namely
  *Tutorials*, *How-Tos*, *Reference Guide*, and *Explanations* (more about
  this below);
- Creating dedicated sections for Tutorials and How-Tos, including guidance on
  how to create new content;
- Adding an Explanations section for key concepts and techniques that require
  deeper descriptions, some of which will be moved from the Reference Guide.

Usage and Impact
================

Documentation is a fundamental part of any software project, especially open
source projects. In the case of NumPy, many beginners may feel demotivated by
the current structure of the documentation, since it is difficult to discover
what to learn (unless the user already knows what to look for in the Reference
docs, which is not always the case).

The results of a "NumPy Tutorial" search on any search engine also give an idea
of the demand for this kind of content. Official high-level documentation built
on up-to-date content and techniques is likely to bring more users (and
developers/contributors) into the NumPy community.

Backward compatibility
======================

The restructuring will effectively require rewriting links and some of the
current content. Input from the community will be useful for identifying key
links and pages that should not be broken.

Detailed description
====================

As discussed in the article [1]_, there are four categories of doc content:

- Tutorials
- How-to guides
- Explanations
- Reference guide

We propose using these categories, for both writing and reviewing, whenever a
new documentation section is added.

The reasoning is that this makes it clearer, to developers and documentation
writers as well as to users, where each piece of information should go and what
the scope and tone of each document are. For example, if explanations are mixed
with basic tutorials, beginners might be overwhelmed and alienated. On the other
hand, if the reference guide contains basic how-tos, it might be difficult for
experienced users to quickly find the information they need.

Currently, there are many blogs and tutorials on the internet about NumPy or
using NumPy. One problem with this is that users searching for this information
may land on an outdated (unofficial) tutorial before they find the current
official documentation, which can be confusing, especially for beginners. A
better documentation infrastructure also aims to solve this problem by giving
users high-level, up-to-date official documentation that can be easily updated.

Status and ideas of each type of doc content
--------------------------------------------

**Reference guide**

NumPy has a fairly complete reference guide. All functions are documented, most
have examples, and most are well cross-linked through *See Also* sections.
Further improving the reference guide is incremental work that can be done (and
is being done) by many people. There are, however, many explanations in the
reference guide. These can be moved to a dedicated Explanations section of the
docs.

**How-to guides**

NumPy does not have many how-tos. The subclassing and array duck typing section
may be an example of one. Others that could be added are:

- Parallelization (controlling BLAS multithreading with ``threadpoolctl``, using
  multiprocessing, random number generation, etc.)
- Storing and loading data (``.npy``/``.npz`` format, text formats, Zarr, HDF5,
  Bloscpack, etc.)
- Performance (memory layout, profiling, use with Numba, Cython, or Pythran)
- Writing generic code that works with NumPy, Dask, CuPy, pydata/sparse, etc.

**Explanations**

There is a reasonable amount of content on fundamental NumPy concepts such as
indexing, vectorization, broadcasting, (g)ufuncs, and dtypes. This could be
better organized and clarified to ensure it really explains the concepts and is
not mixed with tutorial- or how-to-like content.

There are few explanations of anything beyond those fundamental NumPy concepts.

Some examples of concepts that could be expanded:

- Copies vs. views;
- BLAS and other linear algebra libraries;
- Fancy indexing.

In addition, there are many explanations in the Reference Guide, which should be
moved to this new dedicated Explanations section.

**Tutorials**

There is a lot of scope for writing better tutorials. We have a new *NumPy for
absolute beginners tutorial* [3]_ (a Google Season of Docs project by Anne
Bonner). In addition, we need a number of tutorials addressing different levels
of experience with Python and NumPy. These could be built around engaging data
sets, ideas, or stories. For example, curve fitting with polynomials and
functions in ``numpy.linalg`` could be done with the Keeling curve (decades'
worth of measurements of atmospheric CO2 concentration) rather than with
synthetic random data.

Ideas for tutorials (these illustrate the kinds of topics that make sense; they
are not necessarily the exact topics we propose to implement):

- Conway's game of life with only NumPy (note: already in `Nicolas Rougier's book
  <https://www.labri.fr/perso/nrougier/from-python-to-numpy/#the-game-of-life>`_)
- Using masked arrays to deal with missing data in time series measurements
- Using Fourier transforms to analyze the Keeling curve data and extrapolate it
- Geospatial data (e.g. lat/lon/time to create maps for every year via a stacked
  array, like `gridMet data <http://www.climatologylab.org/gridmet.html>`_)
- Using text data and dtypes (e.g. speeches from different people, stored in an
  array of shape ``(n_speech, n_sentences, n_words)``)

The *Preparing to Teach* document [2]_ from the Software Carpentry Instructor
Training materials is a good summary of how to write effective lesson plans
(and tutorials are very similar). In addition to adding new tutorials, we also
propose a *How to write a tutorial* document, which would help users contribute
new high-quality content to the documentation.

Data sets
---------

Using interesting data in the NumPy docs requires giving all users access to
that data, either inside NumPy or in a separate package. The former is not the
best idea, since it is hard to do without increasing the size of NumPy
significantly. Even for SciPy there has so far been no consensus on this (see
`scipy PR 8707 <https://github.com/scipy/scipy/pull/8707>`_ on adding a new
``scipy.datasets`` subpackage).

So we'll aim for a new (pure Python) package, named ``numpy-datasets``,
``scipy-datasets``, or something similar. That package can take some lessons
from how, e.g., scikit-learn ships data sets. Small data sets can be included in
the repository, while large data sets can be accessed via a downloader class or
function.
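
To make the "small data sets in the repository, large ones via a downloader"
idea more concrete, here is a minimal sketch in Python. Everything in it is
hypothetical: the ``BUNDLED`` and ``REMOTE`` registries, the ``load`` function
and its ``fetch`` parameter are illustrative names, not the API of scikit-learn
or of any existing package. It assumes data sets are stored as ``.npy`` files
and that each downloaded file is checked against a SHA-256 digest before being
cached.

```python
import hashlib
import urllib.request
from pathlib import Path

import numpy as np

# Hypothetical registries for an illustrative ``numpy-datasets`` package.
# Small data sets ship with the package (here: paths to local .npy files);
# large ones are downloaded on first use and cached on disk.
BUNDLED = {}  # name -> path of a .npy file shipped with the package
REMOTE = {}   # name -> (url, expected SHA-256 hex digest)


def _fetch_url(url):
    """Download ``url`` and return the raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def load(name, cache_dir, fetch=_fetch_url):
    """Return the data set ``name`` as a NumPy array.

    Bundled data sets are read directly. Remote ones are downloaded once,
    verified against their SHA-256 digest, and cached in ``cache_dir``.
    """
    if name in BUNDLED:
        return np.load(BUNDLED[name])
    if name not in REMOTE:
        raise KeyError(f"unknown data set: {name!r}")

    url, sha256 = REMOTE[name]
    cached = Path(cache_dir) / f"{name}.npy"
    if not cached.exists():
        payload = fetch(url)
        # Refuse to cache anything whose checksum does not match.
        if hashlib.sha256(payload).hexdigest() != sha256:
            raise ValueError(f"checksum mismatch for {name!r}")
        cached.parent.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(payload)
    return np.load(cached)
```

Passing ``fetch`` as a parameter keeps network access swappable, so the loader
can be tested offline; a real package might instead build on an existing
download helper.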

Related Work
============

Some examples of documentation organization in other projects:

- `Documentation for Jupyter <https://jupyter.org/documentation>`_
- `Documentation for Python <https://docs.python.org/3/>`_
- `Documentation for TensorFlow <https://www.tensorflow.org/learn>`_

These projects make the intended audience for each part of the documentation
more explicit and preview some of the content in each section.

Implementation
==============

Currently, the `documentation for NumPy <https://numpy.org/devdocs/>`_ can be
confusing, especially for beginners. We propose reorganizing the docs into the
following structure:

- For users:

  - Absolute Beginners Tutorial
  - Main Tutorials section
  - How-Tos for common tasks with NumPy
  - Reference Guide (API Reference)
  - Explanations
  - F2PY Guide
  - Glossary

- For developers/contributors:

  - Contributor's Guide
  - Under-the-hood docs
  - Building and extending the documentation
  - Benchmarking
  - NumPy Enhancement Proposals

- Meta information:

  - Reporting bugs
  - Release Notes
  - About NumPy
  - License

Ideas for follow-up
-------------------

Besides rewriting the current documentation to some extent, it would be ideal
to have a technical infrastructure that allows more contributions from the
community. For example, if Jupyter Notebooks could be submitted as-is as
tutorials or How-Tos, this might attract more contributors and broaden the NumPy
community.

Similarly, if people could download some of the documentation in Notebook
format, they would likely rely less on outdated material for learning NumPy.

It would also be valuable if the new documentation structure made translations
easier.

Discussion
==========


References and Footnotes
========================

.. [1] `What nobody tells you about documentation <https://www.divio.com/blog/documentation/>`_

.. [2] `Preparing to Teach <https://carpentries.github.io/instructor-training/15-lesson-study/index.html>`_ (from the `Software Carpentry <https://software-carpentry.org/>`_ Instructor Training materials)

.. [3] `NumPy for absolute beginners Tutorial <https://numpy.org/devdocs/user/absolute_beginners.html>`_ by Anne Bonner

Copyright
=========

This document has been placed in the public domain.