DOC: update structured array docs to reflect #6053 #9056
numpy/doc/structured_arrays.py
Outdated
changes the structured array, the field view also changes: ::
If ``fieldname`` is the empty string (``''``) the field will be given a default
name of the form ``f#``, where ``#`` is the integer index of the field,
counting from 0 from the left::
Not a complaint about this PR, but I don't think I like this behaviour:
>>> np.dtype([('', 'f4'),('f0', 'i4'),('z', 'i8')])
ValueError: field 'f0' occurs more than once
In #9054, I change this in some cases to be "index within the unnamed values". Is that a good thing?
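A minimal sketch of the default-name behaviour under discussion, assuming nothing beyond `np.dtype`. The first part shows the generated `f#` names. The second part only reports what happens when an explicit `f0` follows an unnamed field, because that outcome may differ between NumPy versions.

```python
import numpy as np

# Two unnamed fields receive the default names 'f0' and 'f1'.
dt = np.dtype([('', 'f4'), ('', 'i4')])
default_names = dt.names
print(default_names)

# An explicit 'f0' after an unnamed first field may collide with the
# generated default name. The result is reported rather than assumed.
try:
    clash = np.dtype([('', 'f4'), ('f0', 'i4'), ('z', 'i8')])
    print(clash.names)
except ValueError as exc:
    print("ValueError:", exc)
```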
numpy/doc/structured_arrays.py
Outdated
to a datatype, and shape is a tuple of integers specifying subarray shape.

>>> x = np.zeros(3, dtype=[('x', 'f4'), ('y', np.float32), ('z', 'f4', (2,2))])
>>> x
I think it might be clearer to show these as >>> dt = np.dtype(...), >>> np.zeros(1, dtype=dt)
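One possible reading of that suggestion, sketched with the dtype from the hunk above: build the dtype first, then pass it to `np.zeros`. The variable names `dt` and `arr` are illustrative only.

```python
import numpy as np

# Define the structured dtype once, separately from the array.
dt = np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2, 2))])

# Reuse it when creating the array.
arr = np.zeros(1, dtype=dt)

print(dt)
print(arr['z'].shape)  # array shape followed by the (2, 2) subarray shape
```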
numpy/doc/structured_arrays.py
Outdated
array([(0, 0.0), (0, 0.0), (0, 0.0)],
dtype=[('col1', '>i4'), ('col2', '>f4')])
In this shorthand notation any of the :ref:`string dtype specifications
<arrays.dtypes.constructing>` may be used in a string, separated by commas. The
It seems that arrays.dtypes.rst duplicates a lot of the contents here, and perhaps these should be condensed into a single help page.
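For reference, a short sketch of the comma-separated string shorthand that the hunk describes. The specific format strings are illustrative choices, not taken from the PR.

```python
import numpy as np

# Each comma-separated item is a dtype specification; fields get default names.
dt = np.dtype('i8, f4, S3')
print(dt.names)

# A shape may precede a format to declare a subarray field.
dt2 = np.dtype('3int8, float32, (2,3)float64')
print(dt2)
```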
numpy/doc/structured_arrays.py
Outdated

Filling structured arrays
=========================
Note that unlike other numpy scalars void structured scalars act like views
Missing comma
numpy/doc/structured_arrays.py
Outdated
the arrays will result in a boolean array with the dimension of the original
arrays, with elements set to True where all fields of the correspnding
structures are equal. Structured dtypes are equivalent if the field names,
dtypes and titles are the same, ignoring endianness.
This is misleading - we should clarify whether "equivalent dtypes" are such that dt1 == dt2, or if simply np.can_cast(dt1, dt2).
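To make the distinction concrete, here is a hedged sketch using two dtypes that differ only in byte order. The names `dt_le` and `dt_be` are hypothetical, and `casting='equiv'` is just one possible way to express "equal up to endianness"; the PR itself does not prescribe it.

```python
import numpy as np

# Same field name and integer width, opposite byte order.
dt_le = np.dtype([('a', '<i4')])
dt_be = np.dtype([('a', '>i4')])

# Strict dtype equality distinguishes byte order ...
dtypes_equal = bool(dt_le == dt_be)

# ... while an 'equiv' cast permits byte-order changes only.
castable_equiv = bool(np.can_cast(dt_le, dt_be, casting='equiv'))

print(dtypes_equal, castable_equiv)
```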
numpy/doc/structured_arrays.py
Outdated
>>> np.zeros(3, dtype={'names': ['col1', 'col2'],
...                    'formats': ['i4','f4'],
...                    'offsets': [0, 4],
...                    'itemsize': 12})
Might be tempting to write this in the following form:
>>> np.zeros(3, dtype=dict(names=    ['col1', 'col2'],
...                        formats=  ['i4','f4'],
...                        offsets=  [0, 4],
...                        itemsize= 12))
Which, of course, raises the question of whether np.dtype(**dict) should be added as a shorthand for np.dtype(dict).
Aligning the [...] isn't PEP8. It may take a bit of getting used to but eventually the unaligned versions become easier to read.
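A sketch of the `dict(...)` spelling in unaligned, PEP8-style form. It passes the dictionary itself to `np.dtype`, which is the existing API; the `np.dtype(**dict)` form floated above is only a proposal and is not used here. The name `spec` is illustrative.

```python
import numpy as np

# Keyword form of the dictionary specification, without value alignment.
spec = dict(names=['col1', 'col2'],
            formats=['i4', 'f4'],
            offsets=[0, 4],
            itemsize=12)

dt = np.dtype(spec)
arr = np.zeros(3, dtype=dt)

# The explicit itemsize leaves 4 bytes of padding after 'col2'.
print(dt.itemsize, dt.fields['col2'][1])
```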
@ahaldane ping.
Got it, I'll go over it soon.
f07ea87 to 3a7f388
Updated, and ready to read through. You can view an html version of the current state at https://ahaldane.github.io/user/basics.rec.html
602e22d to 5be7def
numpy/doc/structured_arrays.py
Outdated
this array is a structure that contains three items, a 32-bit integer, a 32-bit
float, and a string of length 10 or less. If we index this array at the second
position we get the second structure: ::
Here ``x`` is a one-dimensional array length 2, whose datatype is a structure
"array of length two" and omit the comma. The clause after the comma is essential.
numpy/doc/structured_arrays.py
Outdated
dtype=[('name', 'S10'), ('age', '<i4'), ('weight', '<f4')])

Structured arrays are designed for low-level manipulation of structured data,
for example for interpreting binary blobs. Structured datatypes are designed to
"for example, interpreting ..."
numpy/doc/structured_arrays.py
Outdated
Structured arrays are designed for low-level manipulation of structured data,
for example for interpreting binary blobs. Structured datatypes are designed to
mimic 'structs' in the C language, making them useful for interfacing with C
code. For these purposes numpy supports specialized features such as subarrays
comma after "purposes"
numpy/doc/structured_arrays.py
Outdated
and nested datatypes, and allows manual control over the memory layout of the
structure.

If you only wish to manipulate tabular data with labelled columns, you are
Maybe "For simple manipulation of tabular data, other pydata projects, such as pandas, xarray, or DataArray, provide higher-level interfaces that may be more suitable."
numpy/doc/structured_arrays.py
Outdated
structured datatypes, and it may also be a :term:`sub-array` which behaves like
an ndarray of a specified shape. The offsets of the fields are arbitrary, and
fields may even overlap. These offsets are usually determined automatically by
numpy but can also be manually specified.
"by numpy, but can be manually specified."
numpy/doc/structured_arrays.py
Outdated
Structured Datatype Creation
----------------------------

Structured datatypes may be created using the function :func:`numpy.dtype` with
Could use a rewrite. I'd probably start a new sentence instead of using "with".
numpy/doc/structured_arrays.py
Outdated
:ref:`Data Type Objects <arrays.dtypes.constructing>` reference page, and in
summary they are:

1. A list of tuples, one tuple per field
Do we want to number subtitles? Maybe a simple enumerated list would do.
numpy/doc/structured_arrays.py
Outdated
>>> np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2,2))])
dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4', (2, 2))])

If ``fieldname`` is the empty string (``''``) the field will be given a default
"is an empty string, '', the ..."
numpy/doc/structured_arrays.py
Outdated
```````````````````````````````````````````````````

In this shorthand notation any of the :ref:`string dtype specifications
<arrays.dtypes.constructing>` may be used in a string, separated by commas. The
"... string and separated ..."
numpy/doc/structured_arrays.py
Outdated
The dictionary has two required keys, 'names' and 'formats', and four optional
keys, 'offsets', 'itemsize', 'aligned' and 'titles'. 'names' and 'formats'
should respectively correspond to a list of field names and a list of dtype
specifications of the same length. The optional 'offsets' key must correspond
"specifications, all of the same length."
numpy/doc/structured_arrays.py
Outdated
keys, 'offsets', 'itemsize', 'aligned' and 'titles'. 'names' and 'formats'
should respectively correspond to a list of field names and a list of dtype
specifications of the same length. The optional 'offsets' key must correspond
to a list of integer byte-offsets of each field within the structure, of the
"The optional 'offsets' key is a list of integer byte offsets, one for each field within the structure."
numpy/doc/structured_arrays.py
Outdated
same length. If 'offsets' is not given the offsets are determined
automatically. The optional 'itemsize' key should correspond to an integer
describing the total size in bytes of the dtype, which must be large enough
that all the fields are contained. ::
"... to contain all the fields."
numpy/doc/structured_arrays.py
Outdated
Because of this, and because the ``names`` attribute preserves the field order
while the ``fields`` attribute may not, it is recommended to iterate through
the fields of a dtype using the ``names`` attribute of the dtype (which will
not list titles), as in::
Commas rather than parentheses. I don't know when the current parenthetical scourge originated, but it seems to be everywhere these days :-(
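For context, a small sketch of the iteration pattern the hunk recommends. The dtype, its titles, and the variable names are made up for illustration.

```python
import numpy as np

# A dtype whose fields also carry titles.
d = np.dtype({'names': ['a', 'b'],
              'formats': ['i4', 'f8'],
              'titles': ['title a', 'title b']})

# `fields` is keyed by both names and titles, so iterate over `names`,
# which lists each field once and in declaration order.
listed = [(name, d.fields[name][:2]) for name in d.names]
for name, (field_dtype, offset) in listed:
    print(name, field_dtype, offset)
```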
numpy/doc/structured_arrays.py
Outdated
For the last example: ::
A scalar assigned to a structured element will be assigned to all fields. This
happens when a scalar is assigned to a structured array, or when a scalar array
is assigned to a structured array::
Is there an example of using a scalar array for the rhs? Does scalar array mean 1-D array here?
I guess by "scalar array" I mean "unstructured array", will fix.
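A sketch of what such an example could look like, with an unstructured 1-D array on the right-hand side. The dtype string and variable names are illustrative, not taken from the PR.

```python
import numpy as np

x = np.zeros(2, dtype='i8, f4, ?, S1')

# A Python scalar is broadcast to every field of every element.
x[:] = 3
after_scalar = x.copy()
print(after_scalar)

# An unstructured array assigns element i's value to all fields of x[i].
x[:] = np.arange(2)
print(x)
```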
numpy/doc/structured_arrays.py
Outdated
dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')])

Structured arrays can also be assigned to scalar arrays, but only if the
structured datatype has just a single field::
Is there an example of that? Which side has the single field, or is that both sides?
I'll try to modify the example, perhaps using different variable names.
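One way the example might read with more descriptive names (all hypothetical). The single-field structured array is on the right-hand side and the unstructured array is the assignment target; the two-field case is attempted separately and its failure is only reported.

```python
import numpy as np

twofield = np.zeros(2, dtype=[('A', 'i4'), ('B', 'i4')])
onefield = np.array([(5,), (7,)], dtype=[('A', 'i4')])
nostruct = np.zeros(2, dtype='i4')

# A single-field structured array can be assigned to an unstructured one.
nostruct[:] = onefield
print(nostruct)

# With two fields there is no unambiguous value to copy.
scratch = np.zeros(2, dtype='i4')
try:
    scratch[:] = twofield
except (TypeError, ValueError) as exc:
    print("assignment rejected:", exc)
```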
numpy/doc/structured_arrays.py
Outdated
dtype=[('a', '<i8'), ('b', '<i4'), ('c', '<f8')])

The resulting array is a view into the original array, such that assignment to
the view modifies the original array. This view's fields will be in the order
Maybe "The view's" instead of "This view's".
numpy/doc/structured_arrays.py
Outdated
The resulting array is a view into the original array, such that assignment to
the view modifies the original array. This view's fields will be in the order
they were indexed. Note that unlike for single-field indexing, the view's dtype
has the same itemsize as the original array and has fields at the same offsets
and <- comma
numpy/doc/structured_arrays.py
Outdated
has the same itemsize as the original array and has fields at the same offsets
as in the original array, and unindexed fields are merely missing.

Since this view is a structured array itself, it obeys the assignment rules
this <- the
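A hedged sketch of the multi-field indexing behaviour this paragraph describes. It assumes a NumPy release in which the view semantics from #6053 are in effect; earlier releases returned a copy instead. The array `a` reuses the dtype from the hunk above.

```python
import numpy as np

a = np.zeros(3, dtype=[('a', 'i8'), ('b', 'i4'), ('c', 'f8')])

# Indexing with a list of field names yields a structured view.
v = a[['a', 'c']]

# Same itemsize and offsets as the original; 'b' is simply absent.
print(v.dtype.names, v.dtype.itemsize, a.dtype.itemsize)

# Writing through the view changes the original array.
v['a'] = 7
print(a['a'])
```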
numpy/doc/structured_arrays.py
Outdated
>>> type(scalar)
numpy.void

Importantly, unlike other numpy scalars, structured scalars are mutable and act
Could omit "Importantly".
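A brief sketch of the mutability the sentence refers to; the field names `foo` and `bar` are illustrative.

```python
import numpy as np

x = np.array([(1, 2.), (3, 4.)], dtype=[('foo', 'i8'), ('bar', 'f4')])

# Integer indexing returns a structured scalar of type numpy.void.
s = x[0]

# Modifying the scalar writes through to the array it came from.
s['bar'] = 100
print(x)
```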
numpy/doc/structured_arrays.py
Outdated
numpy.void

Importantly, unlike other numpy scalars, structured scalars are mutable and act
like views into the original array, such that modifying the scalar will modify
"such that" <- "so that".
numpy/doc/structured_arrays.py
Outdated

Notice that `x` is created with a list of tuples. ::
Thus, tuples might be though of as the native Python equivalent to numpy's
though <- thought
numpy/doc/structured_arrays.py
Outdated
>>> x[['y','x']]
array([(2.5, 1.5), (4.0, 3.0), (3.0, 1.0)],
dtype=[('y', '<f4'), ('x', '<f4')])
In order to prevent clobbering of object pointers in fields of
Omit "of".
numpy/doc/structured_arrays.py
Outdated

Structured arrays can be filled by field or row by row. ::
If the dtypes of two structured arrays are equivalent, testing the equality of
What does "equivalent" mean?
equal, will fix
numpy/doc/structured_arrays.py
Outdated

Structured arrays can be filled by field or row by row. ::
If the dtypes of two structured arrays are equivalent, testing the equality of
the arrays will result in a boolean array with the dimension of the original
dimension <- dimensions.
numpy/doc/structured_arrays.py
Outdated
Structured arrays can be filled by field or row by row. ::
If the dtypes of two structured arrays are equivalent, testing the equality of
the arrays will result in a boolean array with the dimension of the original
arrays, with elements set to True where all fields of the corresponding
``True`` ?
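A small sketch of the equal-dtype comparison this paragraph describes, using two made-up arrays with identical dtypes.

```python
import numpy as np

dt = [('a', 'i4'), ('b', 'i4')]
a = np.array([(1, 1), (2, 2)], dtype=dt)
b = np.array([(1, 1), (2, 3)], dtype=dt)

# An element compares True only when every field matches.
result = a == b
print(result)
```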
numpy/doc/structured_arrays.py
Outdated
If you fill it in row by row, it takes a take a tuple
(but not a list or array!)::
Currently, if the dtypes of two arrays are not equivalent all comparisons will
return ``False``. This behavior is deprecated as of numpy 1.10 and may change
What might be the alternative?
Currently in this case we get:
[1]: a = np.zeros(3, dtype='f,f,f')
[2]: b = np.zeros(3, dtype='f,f')
[3]: a == b
FutureWarning: elementwise == comparison failed and returning scalar instead; this will raise an error or perform elementwise comparison in the future
I'll reword the text to more accurately describe what happens.
numpy/doc/structured_arrays.py
Outdated
>>> arr
array([(10.0, 20.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
dtype=[('var1', '<f8'), ('var2', '<f8')])
Currently, the ``<`` and ``>`` operators will always return ``False`` when
Maybe "The .... operators always return ..."?
numpy/doc/structured_arrays.py
Outdated
array([(10.0, 20.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
dtype=[('var1', '<f8'), ('var2', '<f8')])
Currently, the ``<`` and ``>`` operators will always return ``False`` when
comparing structured arrays. Many other pairwise operators are not supported.
Many or all? Maybe "no other"?
numpy/doc/structured_arrays.py
Outdated
which allows field access by attribute on the individual elements of the array.
As an optional convenience numpy provides an ndarray subclass,
:class:`numpy.recarray`, and associated helper functions in the
:mod:`numpy.rec` submodule, which allows access to fields of structured arrays
which <- that.
numpy/doc/structured_arrays.py
Outdated
As an optional convenience numpy provides an ndarray subclass,
:class:`numpy.recarray`, and associated helper functions in the
:mod:`numpy.rec` submodule, which allows access to fields of structured arrays
by attribute, instead of only by index. Record arrays also use a special
Comma not needed.
numpy/doc/structured_arrays.py
Outdated
:class:`numpy.recarray`, and associated helper functions in the
:mod:`numpy.rec` submodule, which allows access to fields of structured arrays
by attribute, instead of only by index. Record arrays also use a special
datatype, :class:`numpy.record`, which allows field access by attribute on the
which <- that
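For readers following along, a sketch of the attribute access that the record array paragraph describes. The data and field names are invented for illustration.

```python
import numpy as np

rec = np.rec.array([(1, 2., 'Hello'), (2, 3., 'World')],
                   dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])

# Fields are reachable as attributes on the array ...
print(rec.bar)

# ... and on individual elements, which are numpy.record scalars.
second = rec[1]
print(second.baz)

# Index-style access still works alongside attribute access.
print(rec['foo'])
```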
I'm not sure that we should be using 'title' instead of italicized. NumPy should hire a copy editor; there should be plenty out there in this time of self-publishing.
47f1041 to b282aff
Updated, thanks a lot. You make a great copy editor! For the styling of the parts of the dtype, my idea was to use them as normal nouns through most of the document (e.g., "the field name is.."), except in the dtype-specification section where I need to make clear the format of the tuple, in which case I write
6ca3fda to 4280fb3
4280fb3 to a08da3f
OK, let's get this in. Thanks Allan.
These are updated structured array docs to reflect the changes in #6053.
Don't merge this before #6053.
I'm putting them here now for comments to accompany #6053.
While this PR is open I will maintain an HTML compiled version of these docs at https://ahaldane.github.io/user/basics.rec.html