Commit 5dc2add

Marc Garcia committed: working on the pandas documentation refactoring

1 parent 0851a66, commit 5dc2add

2 files changed (+392, -362 lines)

pandas/guide/source/contents.rst

Lines changed: 23 additions & 362 deletions
@@ -1,369 +1,30 @@

New version (lines 1-27):

===========================
Pandas documentation sprint
===========================

This document provides all the necessary information to participate in the
pandas documentation sprint.

The pandas documentation sprint is a worldwide event that will take place
on the 10th of March 2018. During the sprint, open source hackers will work
on improving the `pandas API documentation
<https://pandas.pydata.org/pandas-docs/stable/api.html>`_.

While most of the pandas documentation is great, very extensive, and easy to
follow, the API documentation could in many cases be better. For example, many
of the `DataFrame` or `Series` methods are documented with just a one-line
summary. In some cases, the documented parameters are not up to date with the
actual method parameters. And while the docstrings use the numpy docstring
convention, they could benefit from some pandas-specific conventions.

There are around 1,000 API pages in pandas, meaning that the effort to fix,
standardize and improve all the API documentation is huge. But the pandas
user base is also huge. And the Python community is known for being the most
active of any programming language (citation needed, but you know it is true).

So, Python/PyData user groups from all around the world

Removed from the old version (lines 1-369):

====================================
How to write a good pandas docstring
====================================

About docstrings and standards
------------------------------

A Python docstring is a string used to document a Python function or method,
so programmers can understand what it does without having to read the details
of the implementation.

It is also common practice to generate online (HTML) documentation
automatically from docstrings. `Sphinx <http://www.sphinx-doc.org>`_ serves
this purpose.

The next example gives an idea of what a docstring looks like:

.. code-block:: python

    def add(num1, num2):
        """Add up two integer numbers.

        This function simply wraps the `+` operator, and does not
        do anything interesting, except for illustrating what the
        docstring of a very simple function looks like.

        Parameters
        ----------
        num1 : int
            First number to add.
        num2 : int
            Second number to add.

        Returns
        -------
        int
            The sum of `num1` and `num2`.

        Examples
        --------
        >>> add(2, 2)
        4
        >>> add(25, 0)
        25
        >>> add(10, -10)
        0
        """
        return num1 + num2
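
The sketch below looks at the docstring from the programmer's side. It uses a trimmed copy of `add` that keeps only the summary and the Examples section. The sketch reads the docstring back with the standard library `inspect` module and runs its examples with `doctest`. Neither step is specific to pandas. It only shows that the docstring is an attribute of the function object and that the Examples section is executable.

.. code-block:: python

    import doctest
    import inspect


    def add(num1, num2):
        """Add up two integer numbers.

        Examples
        --------
        >>> add(2, 2)
        4
        >>> add(25, 0)
        25
        >>> add(10, -10)
        0
        """
        return num1 + num2


    # The docstring is stored on the function object itself.
    print(add.__doc__.splitlines()[0])  # Add up two integer numbers.

    # inspect.getdoc() also strips the common indentation; this cleaned
    # text is what help(add) is built from.
    print(inspect.getdoc(add))

    # doctest runs every ">>>" line of the Examples section and compares
    # the result with the expected output written below it.
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    for test in finder.find(add, "add", globs={"add": add}):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    print(f"{attempted} examples run, {failed} failed")  # 3 examples run, 0 failed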

To make docstrings easier to understand, and to make it possible to export
them to HTML, some standards exist.

The first conventions every Python docstring should follow are defined in
`PEP-257 <https://www.python.org/dev/peps/pep-0257/>`_.

As PEP-257 is quite open, some other standards exist. In the case of pandas,
the numpy docstring convention is followed. There are two main documents
that explain this convention:

- `Guide to NumPy/SciPy documentation <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_
- `numpydoc docstring guide <http://numpydoc.readthedocs.io/en/latest/format.html>`_

numpydoc is a Sphinx extension to support the numpy docstring convention.
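
Here is a minimal sketch of enabling numpydoc, assuming it is installed (for example with `pip install numpydoc`). A Sphinx project enables it in its `conf.py` by listing it in `extensions`. It usually sits next to `sphinx.ext.autodoc`, which extracts the docstrings from the documented objects. The project name is a placeholder, and this is not the configuration used by pandas.

.. code-block:: python

    # conf.py: Sphinx build configuration (minimal sketch).
    project = "example-project"  # hypothetical placeholder name

    extensions = [
        "sphinx.ext.autodoc",  # pull docstrings from the documented objects
        "numpydoc",            # parse them with the numpy docstring convention
    ]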

The standard uses reStructuredText (reST). reStructuredText is a markup
language that allows encoding styles in plain text files. Documentation
about reStructuredText can be found in:

- `Sphinx reStructuredText primer <http://www.sphinx-doc.org/en/stable/rest.html>`_
- `Quick reStructuredText reference <http://docutils.sourceforge.net/docs/user/rst/quickref.html>`_
- `Full reStructuredText specification <http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html>`_

The rest of this document summarizes all the above guides, and provides
additional conventions specific to the pandas project.

Writing a docstring
-------------------

General rules
~~~~~~~~~~~~~

Docstrings must be defined with three double quotes. No blank lines should be
left before or after the docstring. The text starts immediately after the
opening quotes (not on the next line). The closing quotes have their own line
(and are not added at the end of the last sentence).

**Good:**

.. code-block:: python

    def func():
        """Some function.

        With a good docstring.
        """
        foo = 1
        bar = 2
        return foo + bar

**Bad:**

.. code-block:: python

    def func():

        """
        Some function.

        With several mistakes in the docstring.

        It has a blank line after the signature `def func():`.

        The text 'Some function' should go in the same line as the
        opening quotes of the docstring, not in the next line.

        There is a blank line between the docstring and the first line
        of code `foo = 1`.

        The closing quotes should be in the next line, not in this one."""

        foo = 1
        bar = 2
        return foo + bar
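
The general rules can be checked mechanically. The following is a hypothetical helper, `check_general_rules`, which is not part of pandas or numpydoc. It is sketched with the standard library `ast` module. It assumes Python 3.8 or later (for `end_lineno`), a signature that fits on one line, and a docstring as the first statement of the body.

.. code-block:: python

    import ast
    import textwrap


    def check_general_rules(source):
        """Return the general-rule violations found in a function's source.

        Heuristic sketch: assumes `source` holds a single function whose
        signature fits on one line and whose body starts with a docstring.
        """
        func = ast.parse(textwrap.dedent(source)).body[0]
        doc_node = func.body[0]
        # clean=False keeps the raw text, including leading/trailing newlines.
        raw = ast.get_docstring(func, clean=False)
        problems = []
        if doc_node.lineno != func.lineno + 1:
            problems.append("blank line before the docstring")
        if raw.startswith("\n"):
            problems.append("text does not start right after the opening quotes")
        if not raw.rstrip(" ").endswith("\n"):
            problems.append("closing quotes are not on their own line")
        if len(func.body) > 1 and func.body[1].lineno != doc_node.end_lineno + 1:
            problems.append("blank line after the docstring")
        return problems


    GOOD = '''
    def func():
        """Some function.

        With a good docstring.
        """
        foo = 1
        return foo
    '''

    BAD = '''
    def func():

        """
        Some function."""

        foo = 1
        return foo
    '''

    print(check_general_rules(GOOD))  # []
    print(check_general_rules(BAD))   # all four violations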

Section 1: Short summary
~~~~~~~~~~~~~~~~~~~~~~~~

The short summary is a single sentence that expresses what the function does
in a concise way.

The short summary must start with an infinitive verb, end with a dot, and fit
on a single line. It needs to express what the function does without providing
details.

**Good:**

.. code-block:: python

    def astype(dtype):
        """Cast Series type.

        This section will provide further details.
        """
        pass

**Bad:**

.. code-block:: python

    def astype(dtype):
        """Casts Series type.

        Verb in third-person of the present simple, should be infinitive.
        """
        pass

    def astype(dtype):
        """Method to cast Series type.

        Does not start with a verb.
        """
        pass

    def astype(dtype):
        """Cast Series type

        Missing dot at the end.
        """
        pass

    def astype(dtype):
        """Cast Series type from its current type to the new type defined in
        the parameter dtype.

        Summary is too verbose and doesn't fit in a single line.
        """
        pass
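
Below is a rough, hypothetical checker for these rules that uses only `inspect.getdoc`. Properly testing for an infinitive verb would need a grammar-aware tool, so this sketch applies a naive heuristic and can both miss mistakes and raise false alarms. The `cast_*` functions are illustrative copies of the examples above.

.. code-block:: python

    import inspect


    def check_short_summary(func):
        """List the short-summary rules that `func`'s docstring breaks.

        Heuristic sketch: the infinitive-verb test only flags a first word
        ending in a single "s" (such as "Casts") or the word "Method".
        """
        doc = inspect.getdoc(func) or ""
        # The short summary is the first paragraph: everything before the
        # first blank line.
        summary_lines = doc.split("\n\n")[0].splitlines()
        summary = " ".join(summary_lines)
        first_word = summary.split(" ")[0]
        problems = []
        if len(summary_lines) != 1:
            problems.append("does not fit in a single line")
        if not summary.endswith("."):
            problems.append("does not end with a dot")
        if first_word == "Method" or (first_word.endswith("s")
                                      and not first_word.endswith("ss")):
            problems.append("may not start with an infinitive verb")
        return problems


    def cast_good(dtype):
        """Cast Series type.

        This section will provide further details.
        """


    def cast_third_person(dtype):
        """Casts Series type.
        """


    def cast_no_dot(dtype):
        """Cast Series type
        """


    def cast_verbose(dtype):
        """Cast Series type from its current type to the new type defined in
        the parameter dtype.
        """


    for func in (cast_good, cast_third_person, cast_no_dot, cast_verbose):
        print(func.__name__, check_short_summary(func))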

Section 2: Extended summary
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The extended summary provides details on what the function does. It should not
go into the details of the parameters, or discuss implementation notes, which
go in other sections.

A blank line is left between the short summary and the extended summary, and
every paragraph in the extended summary ends with a dot.

.. code-block:: python

    def unstack():
        """Pivot a row index to columns.

        When using a multi-index, a level can be pivoted so each value in
        the index becomes a column. This is especially useful when a subindex
        is repeated for the main index, and data is easier to visualize as a
        pivot table.

        The index level will be automatically removed from the index when
        added as columns.
        """
        pass
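
The split between the two summaries can be sketched with a hypothetical helper, `split_summaries`. It treats the first paragraph as the short summary. It treats the following paragraphs, up to the first section title, as the extended summary. The `Notes` section in the sample docstring exists only to show where the cut happens.

.. code-block:: python

    import inspect


    def split_summaries(func):
        """Split `func`'s docstring into its short and extended summary.

        Sketch that assumes a section title is a line followed by a line
        of hyphens of the same length, as in "Parameters".
        """
        lines = inspect.getdoc(func).splitlines()
        # Cut the docstring where the first section title starts.
        for i in range(len(lines) - 1):
            if lines[i] and lines[i + 1] == "-" * len(lines[i]):
                lines = lines[:i]
                break
        paragraphs = [p.strip() for p in "\n".join(lines).split("\n\n")
                      if p.strip()]
        return paragraphs[0], paragraphs[1:]


    def unstack():
        """Pivot a row index to columns.

        When using a multi-index, a level can be pivoted so each value in
        the index becomes a column.

        The index level will be automatically removed from the index when
        added as columns.

        Notes
        -----
        This section is not part of the extended summary.
        """


    short, extended = split_summaries(unstack)
    print(short)                                   # Pivot a row index to columns.
    print(len(extended))                           # 2
    print(all(p.endswith(".") for p in extended))  # True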

Section 3: Parameters
~~~~~~~~~~~~~~~~~~~~~

The details of the parameters will be added in this section. This section has
the title "Parameters", followed by a line with a hyphen under each letter of
the word "Parameters". A blank line is left before the section title, but not
after, and not between the line with the word "Parameters" and the one with
the hyphens.

After the title, each parameter in the signature must be documented, including
`*args` and `**kwargs`, but not `self`.

The parameters are defined by their name, followed by a space, a colon, another
space, and the type (or types). Note that the space between the name and the
colon is important. Types are not defined for `*args` and `**kwargs`, but must
be defined for all other parameters. After the parameter definition, it is
required to have a line with the parameter description, which is indented, and
can have multiple lines. The description must start with a capital letter, and
finish with a dot.

**Good:**

.. code-block:: python

    class Series:
        def plot(self, kind, **kwargs):
            """Generate a plot.

            Render the data in the Series as a matplotlib plot of the
            specified kind.

            Parameters
            ----------
            kind : str
                Kind of matplotlib plot.
            **kwargs
                These parameters will be passed to the matplotlib plotting
                function.
            """
            pass

**Bad:**

.. code-block:: python

    class Series:
        def plot(self, kind, **kwargs):
            """Generate a plot.

            Render the data in the Series as a matplotlib plot of the
            specified kind.

            Note the blank line between the parameters title and the first
            parameter. Also, note that after the name of the parameter `kind`
            and before the colon, a space is missing.

            Also, note that the parameter descriptions do not start with a
            capital letter, and do not finish with a dot.

            Finally, the `**kwargs` parameter is missing.

            Parameters
            ----------

            kind: str
                kind of matplotlib plot
            """
            pass
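
Every parameter in the signature must be documented, so the Parameters section can be compared against `inspect.signature`. The hypothetical helper below is a sketch. It assumes the layout described above, with one unindented `name : type` line per parameter (or a bare `*args`/`**kwargs` line), and it assumes that a blank line ends the section.

.. code-block:: python

    import inspect


    def undocumented_parameters(func):
        """Return the signature parameters missing from the Parameters section.

        A definition written without the space before the colon (such as
        "kind: str") is not recognized, so it is reported as missing too.
        """
        lines = inspect.getdoc(func).splitlines()
        documented = set()
        if "Parameters" in lines:
            start = lines.index("Parameters") + 2  # skip the title and hyphens
            for line in lines[start:]:
                if not line:  # a blank line ends the section
                    break
                if not line.startswith(" "):  # unindented: a definition line
                    documented.add(line.split(" : ")[0])
        expected = []
        for name, param in inspect.signature(func).parameters.items():
            if name == "self":
                continue
            if param.kind == inspect.Parameter.VAR_POSITIONAL:
                name = "*" + name
            elif param.kind == inspect.Parameter.VAR_KEYWORD:
                name = "**" + name
            expected.append(name)
        return [name for name in expected if name not in documented]


    class GoodSeries:
        def plot(self, kind, **kwargs):
            """Generate a plot.

            Parameters
            ----------
            kind : str
                Kind of matplotlib plot.
            **kwargs
                These parameters will be passed to the matplotlib plotting
                function.
            """


    class BadSeries:
        def plot(self, kind, **kwargs):
            """Generate a plot.

            Parameters
            ----------
            kind: str
                kind of matplotlib plot
            """


    print(undocumented_parameters(GoodSeries.plot))  # []
    print(undocumented_parameters(BadSeries.plot))   # ['kind', '**kwargs']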

Parameter types
^^^^^^^^^^^^^^^

When specifying the parameter types, Python built-in data types can be used
directly:

- int
- float
- str

For complex types, define the subtypes:

- list of int
- dict of str : int
- tuple of (str, int, int)
- set of str

When only a fixed set of values is allowed, list them in curly brackets,
separated by commas (each followed by a space):

- {0, 10, 25}
- {'simple', 'advanced'}

If the type is defined in a module of the Python standard library, the module
must be specified:

- datetime.date
- datetime.datetime
- decimal.Decimal

If the type is in a third-party package, the module must also be specified:

- numpy.ndarray
- scipy.sparse.coo_matrix

If the type is a pandas type, also specify pandas:

- pandas.Series
- pandas.DataFrame

If more than one type is accepted, separate them by commas, except for the
last two types, which need to be separated by the word 'or':

- int or float
- float, decimal.Decimal or None
- str or list of str
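
The rules for combining types are regular enough to write as two small hypothetical helpers, which are not part of pandas or numpydoc. This sketch only makes the placement of the commas, the 'or', and the curly brackets explicit.

.. code-block:: python

    def format_types(types):
        """Join accepted types: commas between them, 'or' before the last one."""
        types = list(types)
        if len(types) == 1:
            return types[0]
        return ", ".join(types[:-1]) + " or " + types[-1]


    def format_choices(values):
        """Write a fixed set of allowed values in curly brackets."""
        # repr() keeps the quotes around strings, so 'simple' stays quoted.
        return "{" + ", ".join(repr(value) for value in values) + "}"


    print(format_types(["int", "float"]))                      # int or float
    print(format_types(["float", "decimal.Decimal", "None"]))  # float, decimal.Decimal or None
    print(format_choices([0, 10, 25]))                         # {0, 10, 25}
    print(format_choices(["simple", "advanced"]))              # {'simple', 'advanced'}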

Section 4: Returns or Yields
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the method returns a value, or yields its output, it is documented in this
section.

The title of the section is defined in the same way as the "Parameters" one:
the word "Returns" or "Yields", followed by a line with as many hyphens as
letters in that word.

The documentation of the return value is also similar to the parameters, but
in this case no name is provided, unless the method returns or yields more
than one value (a tuple of values).

For example, with a single value:

.. code-block:: python

    import random


    def sample():
        """Generate and return a random number.

        The value is sampled from a continuous uniform distribution between
        0 and 1.

        Returns
        -------
        float
            Random number generated.
        """
        return random.random()
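
The text above mentions Yields but only shows Returns. In this illustrative sketch (the `random_numbers` function is hypothetical), a generator documents each yielded value the same way, under a "Yields" title underlined with six hyphens.

.. code-block:: python

    import random


    def random_numbers(count, seed=None):
        """Generate random numbers one at a time.

        Each value is sampled from a continuous uniform distribution between
        0 and 1.

        Parameters
        ----------
        count : int
            Number of values to generate.
        seed : int or None
            Seed for the random number generator. If None, the values are
            not reproducible.

        Yields
        ------
        float
            Random number generated.
        """
        rng = random.Random(seed)
        for _ in range(count):
            yield rng.random()


    values = list(random_numbers(3, seed=0))
    print(len(values))  # 3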

With more than one value:

.. code-block:: python

    import random
    import string


    def random_letters():
        """Generate and return a sequence of random letters.

        The length of the returned string is also random, and is also
        returned.

        Returns
        -------
        length : int
            Length of the returned string.
        letters : str
            String of random letters.
        """
        length = random.randint(1, 10)
        letters = ''.join(random.choice(string.ascii_lowercase)
                          for _ in range(length))
        return length, letters
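
Naming each returned value documents the order of the tuple, so the caller can unpack it accordingly. The sketch below uses a trimmed copy of `random_letters` and adds a hypothetical `returned_names` helper. The helper reads the names back from the Returns section, assuming the same unindented `name : type` layout used for parameters.

.. code-block:: python

    import inspect
    import random
    import string


    def random_letters():
        """Generate and return a sequence of random letters.

        Returns
        -------
        length : int
            Length of the returned string.
        letters : str
            String of random letters.
        """
        length = random.randint(1, 10)
        letters = "".join(random.choice(string.ascii_lowercase)
                          for _ in range(length))
        return length, letters


    def returned_names(func):
        """List the names defined in the Returns section of `func`."""
        lines = inspect.getdoc(func).splitlines()
        start = lines.index("Returns") + 2  # skip the title and hyphens
        names = []
        for line in lines[start:]:
            if not line:  # a blank line ends the section
                break
            if not line.startswith(" "):  # unindented: a definition line
                names.append(line.split(" : ")[0])
        return names


    print(returned_names(random_letters))  # ['length', 'letters']

    # The caller unpacks the tuple in the documented order.
    length, letters = random_letters()
    print(length == len(letters))  # True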

New version (lines 28-30):

.. toctree::

   pandas_setup
   pandas_docstring
