Commit df40e55

Merge pull request numpy#15554 from melissawm/nep44-docs
NEP 44 - Restructuring the NumPy Documentation
2 parents 5eff78b + 17bab2d commit df40e55

1 file changed

+239
-0
lines changed

@@ -0,0 +1,239 @@
1+
===================================================
2+
NEP 44 — Restructuring the NumPy Documentation
3+
===================================================
4+
5+
:Author: Ralf Gommers
6+
:Author: Melissa Mendonça
7+
:Author: Mars Lee
8+
:Status: Draft
9+
:Type: Process
10+
:Created: 2020-02-11
11+
12+
Abstract
13+
========
14+
15+
This document proposes a restructuring of the NumPy Documentation, both in form
16+
and content, with the goal of making it more organized and discoverable for
17+
beginners and experienced users.
18+
19+
Motivation and Scope
20+
====================
21+
22+
See `here <https://numpy.org/devdocs/>`_ for the front page of the latest docs.
23+
The organization is quite confusing and illogical (e.g. user and developer docs
24+
are mixed). We propose the following:
25+
26+
- Reorganizing the docs into the four categories mentioned in [1]_, namely *Tutorials*, *How-Tos*, *Reference Guide* and *Explanations* (more about this below);
27+
- Creating dedicated sections for Tutorials and How-Tos, including orientation
28+
on how to create new content;
29+
- Adding an Explanations section for key concepts and techniques that require
30+
deeper descriptions, some of which will be rearranged from the Reference Guide.
31+
32+
Usage and Impact
33+
================
34+
35+
The documentation is a fundamental part of any software project, especially
36+
open source projects. In the case of NumPy, many beginners might feel demotivated
37+
by the current structure of the documentation, since it is difficult to discover
38+
what to learn (unless the user has a clear view of what to look for in the
39+
Reference docs, which is not always the case).
40+
41+
Looking at the results of a "NumPy Tutorial" search on any search engine also
42+
gives an idea of the demand for this kind of content. Having official high-level
43+
documentation written using up-to-date content and techniques will certainly
44+
mean more users (and developers/contributors) are involved in the NumPy
45+
community.
46+
47+
Backward compatibility
48+
======================
49+
50+
The restructuring will effectively demand a complete rewrite of links and some
51+
of the current content. Input from the community will be useful for identifying
52+
key links and pages that should not be broken.
53+
54+
Detailed description
55+
====================
56+
57+
As discussed in the article [1]_, there are four categories of doc content:
58+
59+
- Tutorials
60+
- How-to guides
61+
- Explanations
62+
- Reference guide
63+
64+
We propose to adopt these categories (for writing and
65+
reviewing) whenever we add a new documentation section.
66+
67+
The reasoning is that this makes it clearer, both for
68+
developers/documentation writers and for users, where each piece of
69+
information should go and what the scope and tone of each document are. For
70+
example, if explanations are mixed with basic tutorials, beginners
71+
might be overwhelmed and alienated. On the other hand, if the reference
72+
guide contains basic how-tos, it might be difficult for experienced
73+
users to find the information they need quickly.
74+
75+
Currently, there are many blogs and tutorials on the internet about NumPy or
76+
using NumPy. One of the issues with this is that if users search for this
77+
information they may end up in an outdated (unofficial) tutorial before
78+
they find the current official documentation. This can be
79+
confusing, especially for beginners. Having a better infrastructure for the
80+
documentation also aims to solve this problem by giving users high-level,
81+
up-to-date official documentation that can be easily updated.
82+
83+
Status and ideas of each type of doc content
84+
--------------------------------------------
85+
86+
**Reference guide**
87+
88+
NumPy has a quite complete reference guide. All functions are documented, most
89+
have examples, and most are cross-linked well with *See Also* sections. Further
90+
improving the reference guide is incremental work that can be done (and is being
91+
done) by many people. There are, however, many explanations in the reference
92+
guide. These can be moved to a more dedicated Explanations section on the docs.
93+
94+
**How-to guides**
95+
96+
NumPy does not have many how-tos. The subclassing and array ducktyping section
97+
may be an example of a how-to. Others that could be added are:
98+
99+
- Parallelization (controlling BLAS multithreading with ``threadpoolctl``, using
100+
multiprocessing, random number generation, etc.)
101+
- Storing and loading data (``.npy``/``.npz`` format, text formats, Zarr, HDF5,
102+
Bloscpack, etc.)
103+
- Performance (memory layout, profiling, use with Numba, Cython, or Pythran)
104+
- Writing generic code that works with NumPy, Dask, CuPy, pydata/sparse, etc.
105+
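As an illustration of what a short how-to on storing and loading data might
cover, here is a minimal sketch of a round trip through the ``.npz`` format.
The array names and values are made up for the example, and the in-memory
buffer is an assumption to keep the snippet self-contained; a real how-to
would more likely write to a file on disk.

```python
import io

import numpy as np

# Two example arrays; the names and values are made up for illustration.
temperatures = np.array([21.5, 22.0, 19.8])
counts = np.arange(6).reshape(2, 3)

# np.savez stores several arrays in one uncompressed .npz archive.
# An in-memory buffer stands in for a file on disk here.
buffer = io.BytesIO()
np.savez(buffer, temperatures=temperatures, counts=counts)

# Rewind the buffer and load; arrays come back under their keyword names.
buffer.seek(0)
with np.load(buffer) as archive:
    loaded_temperatures = archive["temperatures"]
    loaded_counts = archive["counts"]

print(np.array_equal(temperatures, loaded_temperatures))
print(loaded_counts.shape)
```

Passing a filename instead of the buffer works the same way, and
``np.savez_compressed`` is a drop-in alternative when file size matters.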
106+
**Explanations**
107+
108+
There is a reasonable amount of content on fundamental NumPy concepts such as
109+
indexing, vectorization, broadcasting, (g)ufuncs, and dtypes. This could be
110+
organized better and clarified to ensure it's really about explaining the concepts
111+
and not mixed with tutorial-like or how-to-like content.
112+
113+
There are few explanations about anything other than those fundamental NumPy
114+
concepts.
115+
116+
Some examples of concepts that could be expanded:
117+
118+
- Copies vs. Views;
119+
- BLAS and other linear algebra libraries;
120+
- Fancy indexing.
121+
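To give a sense of the kind of behavior an expanded explanation of copies,
views, and fancy indexing would need to cover, here is a minimal sketch. The
array and variable names are chosen only for illustration.

```python
import numpy as np

a = np.arange(6)

# Basic slicing returns a view: it shares memory with the original array,
# so writing through the view changes `a` as well.
view = a[1:4]
view[0] = 100

# Fancy (integer-array) indexing returns a copy with its own memory,
# so writing to the result leaves `a` untouched.
fancy = a[[1, 2, 3]]
fancy[1] = -1

print(a)
print(np.shares_memory(a, view))
print(np.shares_memory(a, fancy))
```

An Explanations page would go on to describe why this difference exists,
which is a matter of concepts rather than of steps to follow.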
122+
In addition, there are many explanations in the Reference Guide, which should be
123+
moved to this new dedicated Explanations section.
124+
125+
**Tutorials**
126+
127+
There's a lot of scope for writing better tutorials. We have a new *NumPy for
128+
absolute beginners tutorial* [3]_ (GSoD project of Anne Bonner). In addition, we
129+
need a number of tutorials addressing different levels of experience with Python
130+
and NumPy. This could be done using engaging data sets, ideas or stories. For
131+
example, curve fitting with polynomials and functions in ``numpy.linalg`` could
132+
be done with the Keeling curve (decades' worth of CO2 concentration in air
133+
measurements) rather than with synthetic random data.
134+
135+
Ideas for tutorials (these capture the types of things that make sense; they're
136+
not necessarily the exact topics we propose to implement):
137+
138+
- Conway's game of life with only NumPy (note: already in `Nicolas Rougier's book
139+
<https://www.labri.fr/perso/nrougier/from-python-to-numpy/#the-game-of-life>`_)
140+
- Using masked arrays to deal with missing data in time series measurements
141+
- Using Fourier transforms to analyze the Keeling curve data, and extrapolate it.
142+
- Geospatial data (e.g. lat/lon/time to create maps for every year via a stacked
143+
array, like `gridMet data <http://www.climatologylab.org/gridmet.html>`_)
144+
- Using text data and dtypes (e.g. use speeches from different people, shape
145+
``(n_speech, n_sentences, n_words)``)
146+
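As a rough sketch of the starting point for one of these ideas, the masked
array tutorial might open with something like the following. The readings are
hypothetical values invented for this example, not real measurements; an
actual tutorial would use a real time series.

```python
import numpy as np

# Hypothetical hourly readings; NaN marks hours with no measurement.
readings = np.array([10.0, np.nan, 14.0, 12.0, np.nan, 16.0])

# Mask the invalid entries so that statistics skip them.
masked = np.ma.masked_invalid(readings)

mean_value = masked.mean()   # mean of the four valid readings only
n_valid = masked.count()     # number of unmasked entries

# Replace the gaps with the mean of the valid readings (one simple choice).
filled = masked.filled(mean_value)

print(mean_value, n_valid)
print(filled)
```

Filling gaps with the mean is only one option; a tutorial could compare it
with interpolation or with simply leaving the values masked.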
147+
The *Preparing to Teach* document [2]_ from the Software Carpentry Instructor
148+
Training materials is a nice summary of how to write effective lesson plans (and
149+
tutorials would be very similar). In addition to adding new tutorials, we also
150+
propose a *How to write a tutorial* document, which would help users contribute
151+
new high-quality content to the documentation.
152+
153+
Data sets
154+
---------
155+
156+
Using interesting data in the NumPy docs requires giving all users access to
157+
that data, either inside NumPy or in a separate package. The former is not the
158+
best idea, since it's hard to do without increasing the size of NumPy
159+
significantly. Even for SciPy there has so far been no consensus on this (see
160+
`scipy PR 8707 <https://github.com/scipy/scipy/pull/8707>`_ on adding a new
161+
``scipy.datasets`` subpackage).
162+
163+
So we'll aim for a new (pure Python) package, named ``numpy-datasets`` or
164+
``scipy-datasets`` or something similar. That package can take some lessons from
165+
how, e.g., scikit-learn ships data sets. Small data sets can be included in the
166+
repo, large data sets can be accessed via a downloader class or function.
167+
168+
Related Work
169+
============
170+
171+
Some examples of documentation organization in other projects:
172+
173+
- `Documentation for Jupyter <https://jupyter.org/documentation>`_
174+
- `Documentation for Python <https://docs.python.org/3/>`_
175+
- `Documentation for TensorFlow <https://www.tensorflow.org/learn>`_
176+
177+
These projects make the intended audience for each part of the documentation
178+
more explicit, and they preview some of the content in each section.
179+
180+
Implementation
181+
==============
182+
183+
Currently, the `documentation for NumPy <https://numpy.org/devdocs/>`_ can be
184+
confusing, especially for beginners. Our proposal is to reorganize the docs in
185+
the following structure:
186+
187+
- For users:
188+
    - Absolute Beginners Tutorial
189+
    - main Tutorials section
190+
    - How Tos for common tasks with NumPy
191+
    - Reference Guide (API Reference)
192+
    - Explanations
193+
    - F2Py Guide
194+
    - Glossary
195+
- For developers/contributors:
196+
    - Contributor's Guide
197+
    - Under-the-hood docs
198+
    - Building and extending the documentation
199+
    - Benchmarking
200+
    - NumPy Enhancement Proposals
201+
- Meta information
202+
    - Reporting bugs
203+
    - Release Notes
204+
    - About NumPy
205+
    - License
206+
207+
Ideas for follow-up
208+
-------------------
209+
210+
Besides rewriting the current documentation to some extent, it would be ideal
211+
to have a technical infrastructure that would allow more contributions from the
212+
community. For example, if Jupyter Notebooks could be submitted as-is as
213+
tutorials or How-Tos, this might create more contributors and broaden the NumPy
214+
community.
215+
216+
Similarly, if people could download some of the documentation in Notebook
217+
format, people would likely rely less on outdated material for
218+
learning NumPy.
219+
220+
It would also be interesting if the new structure for the documentation makes
221+
translations easier.
222+
223+
Discussion
224+
==========
225+
226+
227+
References and Footnotes
228+
========================
229+
230+
.. [1] `What nobody tells you about documentation <https://www.divio.com/blog/documentation/>`_
231+
232+
.. [2] `Preparing to Teach <https://carpentries.github.io/instructor-training/15-lesson-study/index.html>`_ (from the `Software Carpentry <https://software-carpentry.org/>`_ Instructor Training materials)
233+
234+
.. [3] `NumPy for absolute beginners Tutorial <https://numpy.org/devdocs/user/absolute_beginners.html>`_ by Anne Bonner
235+
236+
Copyright
237+
=========
238+
239+
This document has been placed in the public domain.
