
DOC: update structured array docs to reflect #6053 #9056


Merged: 2 commits merged on Nov 11, 2017


@ahaldane (Member) commented May 5, 2017

These are updated structured array docs to reflect the changes in #6053.

Don't merge this before #6053.

I'm putting them here now for comments to accompany #6053.

While this PR is open, I will maintain a compiled HTML version of these docs at https://ahaldane.github.io/user/basics.rec.html

changes the structured array, the field view also changes: ::
If ``fieldname`` is the empty string (``''``) the field will be given a default
name of the form ``f#``, where ``#`` is the integer index of the field,
counting from 0 from the left::

Review comment (Member):

Not a complaint about this PR, but I don't think I like this behaviour:

>>> np.dtype([('', 'f4'),('f0', 'i4'),('z', 'i8')])
ValueError: field 'f0' occurs more than once

In #9054, I change this in some cases to be "index within the unnamed values". Is that a good thing?
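Here is a minimal sketch of the default-naming rule quoted above, assuming a recent NumPy. The field names `x` and `z` are illustrative. The unnamed middle field is named after its index, counting from 0. The collision case from the comment is wrapped in a `try`, because the comment reports a `ValueError` and how this behaves may depend on the NumPy version.

```python
import numpy as np

# The unnamed second field receives the default name 'f1'
# (its integer index, counting from 0 from the left).
dt = np.dtype([('x', 'f4'), ('', 'i4'), ('z', 'i8')])
print(dt.names)

# The case raised in the review: the unnamed first field would be
# named 'f0', which clashes with the explicitly named 'f0' field.
try:
    np.dtype([('', 'f4'), ('f0', 'i4'), ('z', 'i8')])
    collided = False
except ValueError:
    collided = True
print(collided)
```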

to a datatype, and shape is a tuple of integers specifying subarray shape.

>>> x = np.zeros(3, dtype=[('x', 'f4'), ('y', np.float32), ('z', 'f4', (2,2))])
>>> x

Review comment (Member):

I think it might be clearer to show these as >>> dt = np.dtype(...), >>> np.zeros(1, dtype=dt)
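This sketch shows the reviewer's suggested form: build the dtype first, then create an array from it. It uses the same field list as the example being reviewed. The subarray field `z` adds its `(2, 2)` shape to the shape of the array.

```python
import numpy as np

# Build the dtype once, then create the array from it.
dt = np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2, 2))])
x = np.zeros(1, dtype=dt)

print(dt)            # 'x' and 'y' are both 32-bit floats
print(x['z'].shape)  # the (2, 2) subarray shape is appended to the array shape
```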

array([(0, 0.0), (0, 0.0), (0, 0.0)],
dtype=[('col1', '>i4'), ('col2', '>f4')])
In this shorthand notation any of the :ref:`string dtype specifications
<arrays.dtypes.constructing>` may be used in a string, separated by commas. The

Review comment (Member):

It seems that arrays.dtypes.rst duplicates a lot of the contents here, and perhaps these should be condensed into a single help page


Filling structured arrays
=========================
Note that unlike other numpy scalars void structured scalars act like views

Review comment (Member):

Missing comma

the arrays will result in a boolean array with the dimension of the original
arrays, with elements set to True where all fields of the corresponding
structures are equal. Structured dtypes are equivalent if the field names,
dtypes and titles are the same, ignoring endianness.

Review comment (Member):

This is misleading - we should clarify whether "equivalent dtypes" are such that dt1 == dt2, or if simply np.can_cast(dt1, dt2)

>>> np.zeros(3, dtype={'names': ['col1', 'col2'],
... 'formats': ['i4','f4'],
... 'offsets': [0, 4],
... 'itemsize': 12})

Review comment (Member):

Might be tempting to write this in the following form:

>>> np.zeros(3, dtype=dict(names=    ['col1', 'col2'],
...                        formats=  ['i4','f4'],
...                        offsets=  [0, 4],
...                        itemsize= 12))

Reply (Member):

Which, of course, raises the question of whether np.dtype(**dict) should be added as a shorthand for np.dtype(dict).

Reply (Member):

Aligning the [...] isn't PEP8. It may take a bit of getting used to but eventually the unaligned versions become easier to read.
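This sketch writes the dictionary form from this thread without column alignment, in line with the PEP8 remark. The dictionary is passed as a single argument, as in the existing `np.dtype(dict)` form. The `np.dtype(**dict)` shorthand raised above is only a suggestion and is not used here.

```python
import numpy as np

# The dictionary form from the review thread, written without alignment.
dt = np.dtype(dict(names=['col1', 'col2'], formats=['i4', 'f4'],
                   offsets=[0, 4], itemsize=12))
arr = np.zeros(3, dtype=dt)

print(dt.itemsize)           # the explicit itemsize, leaving 4 trailing bytes unused
print(dt.fields['col2'][1])  # the explicit byte offset of 'col2'
```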

@charris (Member) commented Sep 21, 2017

@ahaldane Now that #6053 is in we should get this finished up.

@charris (Member) commented Sep 24, 2017

@ahaldane ping.

@ahaldane (Member Author)

Got it, I'll go over it soon.

@ahaldane force-pushed the structure_docs branch 2 times, most recently from f07ea87 to 3a7f388, September 25, 2017 21:33
@ahaldane (Member Author)

Updated, and ready to read through.

You can view an html version of the current state at https://ahaldane.github.io/user/basics.rec.html

@ahaldane force-pushed the structure_docs branch 2 times, most recently from 602e22d to 5be7def, September 27, 2017 23:33
this array is a structure that contains three items, a 32-bit integer, a 32-bit
float, and a string of length 10 or less. If we index this array at the second
position we get the second structure: ::
Here ``x`` is a one-dimensional array length 2, whose datatype is a structure

Review comment (Member):

"array of length two" and omit the comma. The clause after the comma is essential.

dtype=[('name', 'S10'), ('age', '<i4'), ('weight', '<f4')])
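This sketch builds an array with the `name`/`age`/`weight` dtype shown above and indexes it by position and by field name. The record values (`Rex`, `Fido` and their numbers) are made-up illustrative data.

```python
import numpy as np

# A one-dimensional array of length two; each element is a structure
# holding a 10-byte string, a 32-bit integer and a 32-bit float.
x = np.array([(b'Rex', 9, 81.0), (b'Fido', 3, 27.0)],
             dtype=[('name', 'S10'), ('age', 'i4'), ('weight', 'f4')])

second = x[1]    # indexing at position 1 gives the second structure
ages = x['age']  # indexing by field name gives that field for every element
```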

Structured arrays are designed for low-level manipulation of structured data,
for example for interpreting binary blobs. Structured datatypes are designed to

Review comment (Member):

"for example, interpreting ..."

Structured arrays are designed for low-level manipulation of structured data,
for example for interpreting binary blobs. Structured datatypes are designed to
mimic 'structs' in the C language, making them useful for interfacing with C
code. For these purposes numpy supports specialized features such as subarrays

Review comment (Member):

comma after "purposes"

and nested datatypes, and allows manual control over the memory layout of the
structure.

If you only wish to manipulate tabular data with labelled columns, you are

Review comment (Member):

Maybe "For simple manipulation of tabular data, other pydata projects, such as pandas, xarray, or DataArray, provide higher-level interfaces that may be more suitable."
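The "interpreting binary blobs" use case can be sketched as follows. The C-style record layout (an `int32` id followed by two `float32` values) is a hypothetical example. Here `struct.pack` stands in for bytes that would normally come from a file or from C code.

```python
import struct

import numpy as np

# Two packed little-endian records: int32 id, float32 x, float32 y.
# The '<' prefix disables padding, so each record is exactly 12 bytes.
blob = struct.pack('<iff', 1, 2.5, 3.5) + struct.pack('<iff', 2, 4.5, 5.5)

# A structured dtype mirroring the hypothetical C struct layout.
dt = np.dtype([('id', '<i4'), ('x', '<f4'), ('y', '<f4')])
records = np.frombuffer(blob, dtype=dt)  # read-only view of the bytes, no copy
```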

structured datatypes, and it may also be a :term:`sub-array` which behaves like
an ndarray of a specified shape. The offsets of the fields are arbitrary, and
fields may even overlap. These offsets are usually determined automatically by
numpy but can also be manually specified.

Review comment (Member):

"by numpy, but can be manually specified."
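This sketch shows manually specified, overlapping offsets. The field names `whole`, `lo` and `hi` are hypothetical. Explicit little-endian formats are used so that the byte layout, and therefore the values read back, do not depend on the machine.

```python
import numpy as np

# Three fields at manual offsets; 'lo' and 'hi' overlap 'whole'.
dt = np.dtype({'names': ['whole', 'lo', 'hi'],
               'formats': ['<u4', '<u2', '<u2'],
               'offsets': [0, 0, 2]})
a = np.zeros(1, dtype=dt)
a['whole'] = 0x00020001  # writes all four bytes: 01 00 02 00

lo, hi = int(a['lo'][0]), int(a['hi'][0])  # read the two halves separately
```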

Structured Datatype Creation
----------------------------

Structured datatypes may be created using the function :func:`numpy.dtype` with

Review comment (Member):

Could use a rewrite. I'd probably start a new sentence instead of using "with".

:ref:`Data Type Objects <arrays.dtypes.constructing>` reference page, and in
summary they are:

1. A list of tuples, one tuple per field

Review comment (Member):

Do we want to number subtitles? Maybe a simple enumerated list would do.

>>> np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2,2))])
dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4', (2, 2))])

If ``fieldname`` is the empty string (``''``) the field will be given a default

Review comment (Member):

"is an empty string, '', the ..."

```````````````````````````````````````````````````

In this shorthand notation any of the :ref:`string dtype specifications
<arrays.dtypes.constructing>` may be used in a string, separated by commas. The

Review comment (Member):

"... string and separated ..."
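This is a short sketch of the comma-separated shorthand under discussion. Each comma-separated dtype string becomes one field, and the fields receive default names. The particular format codes are arbitrary choices.

```python
import numpy as np

# Comma-separated shorthand: fields get default names f0, f1, f2.
dt = np.dtype('i8, f4, S3')
print(dt.names)
print(dt.itemsize)  # 8 + 4 + 3 bytes, packed without padding
```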

The dictionary has two required keys, 'names' and 'formats', and four optional
keys, 'offsets', 'itemsize', 'aligned' and 'titles'. 'names' and 'formats'
should respectively correspond to a list of field names and a list of dtype
specifications of the same length. The optional 'offsets' key must correspond

Review comment (Member):

"specifications, all of the same length."

keys, 'offsets', 'itemsize', 'aligned' and 'titles'. 'names' and 'formats'
should respectively correspond to a list of field names and a list of dtype
specifications of the same length. The optional 'offsets' key must correspond
to a list of integer byte-offsets of each field within the structure, of the

Review comment (Member):

"The optional 'offsets' key is a list of integer byte offsets, one for each field within the structure."

same length. If 'offsets' is not given the offsets are determined
automatically. The optional 'itemsize' key should correspond to an integer
describing the total size in bytes of the dtype, which must be large enough
that all the fields are contained. ::

Review comment (Member):

"... to contain all the fields."
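This sketch covers the two rules in the passage above. Without `'offsets'`, offsets are determined automatically (packed here, since alignment is not requested). An `'itemsize'` too small to contain all the fields is rejected. The field names and formats are arbitrary.

```python
import numpy as np

# Without 'offsets', NumPy places the fields one after another.
packed = np.dtype({'names': ['a', 'b'], 'formats': ['i1', 'f8']})
offsets = [packed.fields[name][1] for name in packed.names]

# An 'itemsize' too small to contain both 4-byte fields is rejected.
try:
    np.dtype({'names': ['a', 'b'], 'formats': ['i4', 'i4'], 'itemsize': 4})
    rejected = False
except ValueError:
    rejected = True
```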

Because of this, and because the ``names`` attribute preserves the field order
while the ``fields`` attribute may not, it is recommended to iterate through
the fields of a dtype using the ``names`` attribute of the dtype (which will
not list titles), as in::

Review comment (Member):

Commas rather than parentheses. I don't know when the current parenthetical scourge originated, but it seems to be everywhere these days :-(
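The recommendation to iterate over `names` rather than `fields` can be seen in a small sketch. Titles become extra keys in `dt.fields`, while `dt.names` lists each field once, in order. The titles `x coordinate` and `y coordinate` are illustrative.

```python
import numpy as np

# Two fields, each given a title via the (title, name) form.
dt = np.dtype([(('x coordinate', 'x'), 'f4'), (('y coordinate', 'y'), 'i4')])

print(len(dt.fields))  # counts both the names and the titles
# Iterating over dt.names visits each field once, in order, without titles.
layout = [(name, dt.fields[name][1]) for name in dt.names]
```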

For the last example: ::
A scalar assigned to a structured element will be assigned to all fields. This
happens when a scalar is assigned to a structured array, or when a scalar array
is assigned to a structured array::

Review comment (Member):

Is there an example of using a scalar array for the rhs? Does scalar array mean 1-D array here?

Reply (Member Author):

I guess by "scalar array" I mean "unstructured array", will fix.
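This sketch shows assignment of a scalar and of an unstructured array to a structured array. It uses the `'i8, f4, ?, S1'` dtype seen in the output below. It answers the reviewer's question in the sense of the author's reply: "scalar array" means an unstructured array, whose elements are each assigned to all fields of the corresponding structure.

```python
import numpy as np

x = np.zeros(2, dtype='i8, f4, ?, S1')

# A Python scalar is assigned to every field of every element.
x[:] = 3
after_scalar = x.copy()

# An unstructured array is assigned element by element, each value
# going to all fields of the corresponding structure.
x[:] = np.arange(2)
```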

dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')])

Structured arrays can also be assigned to scalar arrays, but only if the
structured datatype has just a single field::

Review comment (Member):

Is there an example of that? Which side has the single field, or is that both sides?

Reply (Member Author):

I'll try to modify the example, perhaps using different variable names.
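This sketch addresses the reviewer's question about which side has the single field. The single-field restriction applies to the structured source being assigned into the unstructured array. The variable names are illustrative. Both `TypeError` and `ValueError` are caught because the exact exception type may vary between NumPy versions; that is an assumption.

```python
import numpy as np

onefield = np.zeros(2, dtype=[('A', 'i4')])
twofield = np.zeros(2, dtype=[('A', 'i4'), ('B', 'i4')])
nostruct = np.zeros(2, dtype='i4')

# Allowed: the structured source has exactly one field.
onefield['A'] = [5, 6]
nostruct[:] = onefield

# Not allowed: the structured source has two fields.
try:
    nostruct[:] = twofield
    rejected = False
except (TypeError, ValueError):
    rejected = True
```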

dtype=[('a', '<i8'), ('b', '<i4'), ('c', '<f8')])

The resulting array is a view into the original array, such that assignment to
the view modifies the original array. This view's fields will be in the order

Review comment (Member):

Maybe "The view's" instead of "This view's".

The resulting array is a view into the original array, such that assignment to
the view modifies the original array. This view's fields will be in the order
they were indexed. Note that unlike for single-field indexing, the view's dtype
has the same itemsize as the original array and has fields at the same offsets

Review comment (Member):

and <- comma

has the same itemsize as the original array and has fields at the same offsets
as in the original array, and unindexed fields are merely missing.

Since this view is a structured array itself, it obeys the assignment rules

Review comment (Member):

this <- the
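This sketch illustrates the multi-field indexing behavior described above. It assumes a NumPy version in which multi-field indexing returns a view, as the text describes. The view keeps the original itemsize and offsets, lists fields in the order they were indexed, and writes through to the original array.

```python
import numpy as np

a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])

# Multi-field indexing returns a view; fields appear in indexing order.
v = a[['c', 'a']]

# Assigning through the view modifies the original array.
v['c'] = 1.5
```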

>>> type(scalar)
numpy.void

Importantly, unlike other numpy scalars, structured scalars are mutable and act

Review comment (Member):

Could omit "Importantly".

numpy.void

Importantly, unlike other numpy scalars, structured scalars are mutable and act
like views into the original array, such that modifying the scalar will modify

Review comment (Member):

"such that" <- "so that".
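Structured scalars are mutable and act as views into the original array, so modifying the scalar modifies the array. A minimal sketch with illustrative field names `foo` and `bar`:

```python
import numpy as np

x = np.array([(1, 2.0), (3, 4.0)], dtype=[('foo', 'i8'), ('bar', 'f4')])

# Indexing one element of a structured array yields a numpy.void scalar.
s = x[0]

# Unlike other NumPy scalars, it is mutable and writes through to x.
s['bar'] = 100
```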


Notice that `x` is created with a list of tuples. ::
Thus, tuples might be though of as the native Python equivalent to numpy's

Review comment (Member):

though <- thought
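Tuples act as the native Python counterpart of a structured element, which shows when filling an array by field or row by row. A minimal sketch, reusing the `var1`/`var2` field names that appear elsewhere in this review:

```python
import numpy as np

arr = np.zeros(3, dtype=[('var1', 'f8'), ('var2', 'f8')])

# Fill by field: assign a sequence to one named field.
arr['var1'] = [1.0, 2.0, 3.0]

# Fill by row: a tuple is treated as one structure, one value per field.
arr[0] = (10.0, 20.0)
```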

>>> x[['y','x']]
array([(2.5, 1.5), (4.0, 3.0), (3.0, 1.0)],
dtype=[('y', '<f4'), ('x', '<f4')])
In order to prevent clobbering of object pointers in fields of

Review comment (Member):

Omit "of".


Structured arrays can be filled by field or row by row. ::
If the dtypes of two structured arrays are equivalent, testing the equality of

Review comment (Member):

What does "equivalent" mean?

Reply (Member Author):

equal, will fix


Structured arrays can be filled by field or row by row. ::
If the dtypes of two structured arrays are equivalent, testing the equality of
the arrays will result in a boolean array with the dimension of the original

Review comment (Member):

dimension <- dimensions.

Structured arrays can be filled by field or row by row. ::
If the dtypes of two structured arrays are equivalent, testing the equality of
the arrays will result in a boolean array with the dimension of the original
arrays, with elements set to True where all fields of the corresponding

Review comment (Member):

``True`` ?
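This sketch covers equality testing between two structured arrays that share the same dtype, which is the "equal dtypes" case the author settled on. The result has the shape of the inputs, and an element is `True` only where every field matches. The field names and values are illustrative. The unequal-dtype case is left out because its behavior was still slated to change.

```python
import numpy as np

dt = np.dtype([('x', 'i4'), ('y', 'f4')])
a = np.array([(1, 2.0), (3, 4.0)], dtype=dt)
b = np.array([(1, 2.0), (3, 5.0)], dtype=dt)

# With equal dtypes, == compares element by element; an element is True
# only where every field of the corresponding structures is equal.
eq = a == b
```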

If you fill it in row by row, it takes a tuple
(but not a list or array!)::
Currently, if the dtypes of two arrays are not equivalent all comparisons will
return ``False``. This behavior is deprecated as of numpy 1.10 and may change

Review comment (Member):

What might be the alternative?

Reply (Member Author):

Currently in this case we get:

[1]: a = np.zeros(3, dtype='f,f,f')
[2]: b = np.zeros(3, dtype='f,f')
[3]: a == b
FutureWarning: elementwise == comparison failed and returning scalar instead; this will raise an error or perform elementwise comparison in the future

I'll reword the text to more accurately describe what happens.

>>> arr
array([(10.0, 20.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
dtype=[('var1', '<f8'), ('var2', '<f8')])
Currently, the ``<`` and ``>`` operators will always return ``False`` when

Review comment (Member):

Maybe "The .... operators always return ..."?

array([(10.0, 20.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
dtype=[('var1', '<f8'), ('var2', '<f8')])
Currently, the ``<`` and ``>`` operators will always return ``False`` when
comparing structured arrays. Many other pairwise operators are not supported.

Review comment (Member):

Many or all? Maybe "no other"?

which allows field access by attribute on the individual elements of the array.
As an optional convenience numpy provides an ndarray subclass,
:class:`numpy.recarray`, and associated helper functions in the
:mod:`numpy.rec` submodule, which allows access to fields of structured arrays

Review comment (Member):

which <- that.

As an optional convenience numpy provides an ndarray subclass,
:class:`numpy.recarray`, and associated helper functions in the
:mod:`numpy.rec` submodule, which allows access to fields of structured arrays
by attribute, instead of only by index. Record arrays also use a special

Review comment (Member):

Comma not needed.

:class:`numpy.recarray`, and associated helper functions in the
:mod:`numpy.rec` submodule, which allows access to fields of structured arrays
by attribute, instead of only by index. Record arrays also use a special
datatype, :class:`numpy.record`, which allows field access by attribute on the

Review comment (Member):

which <- that
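The record-array convenience described above can be sketched with `numpy.rec.array`. It gives attribute access on the whole array and on individual `numpy.record` elements. The field names and values are illustrative.

```python
import numpy as np

rec = np.rec.array([(1, 2.0, b'Hello'), (2, 3.0, b'World')],
                   dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])

bars = rec.bar        # field access by attribute on the whole array
last = rec[1]         # a single element is a numpy.record scalar,
greeting = last.baz   # which also supports attribute access
```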

@charris (Member) commented Nov 10, 2017

I'm not sure that we should be using 'title' instead of italicized title or title, the same for other names for parts of the dtype.

NumPy should hire a copy editor; there should be plenty out there in this time of self-publishing.

@ahaldane (Member Author)

Updated, thanks a lot. You make a great copy editor!

For the styling of the parts of the dtype, my idea was to use them as normal nouns through most of the document (e.g., "the field name is..."). The exception is the dtype-specification section, where I need to make clear the format of the tuple. There I write (fieldname, datatype, shape) and refer to the three variables in the tuple in code styling, like fieldname.

@ahaldane force-pushed the structure_docs branch 2 times, most recently from 6ca3fda to 4280fb3, November 10, 2017 02:51
@charris merged commit de26584 into numpy:master on Nov 11, 2017
@charris (Member) commented Nov 11, 2017

OK, let's get this in. Thanks Allan.
