-
Notifications
You must be signed in to change notification settings - Fork 6
Add metadata to array objects #641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metadata to array objects #641
Conversation
See below for use cases: | ||
|
||
>>> # add metadata to/from an array | ||
>>> arr = ndtest((3, 3), meta=OrderedDict([('title', 'array for testing'), ('author', 'John Smith')]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, there is a missing parenthesis at the end, but wouldn't
>>> arr = ndtest((3, 3), meta=[('title', 'array for testing'), ('author', 'John Smith')])
be more readable anyway?
For Python 3.6+, I think we should push for (ie mention this in the release notes & examples):
>>> arr = ndtest((3, 3), meta=Metadata(title='array for testing', author='John Smith'))
This is longer but less punctuation-heavy and thus more readable IMO and Metadata does not require an extra import like OrderedDict does.
FWIW, when Python 3.7+ is out, we will also have the possibility to use:
>>> arr = ndtest((3, 3), meta=dict(title='array for testing', author='John Smith'))
@@ -29,6 +29,32 @@ New features | |||
>>> a02.equals(a['a1:a3'] >> 'group_a') | |||
False | |||
|
|||
* allowed arrays to have metadata (e.g. title, description, authors, ...). | |||
Metadata can be accesed using the syntax ``array.meta.name`` or ``array.meta[name]``. | |||
Warning: currently, only saving and loading to/from HDF file support metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence has a weird word order. Either:
- currently saving and loading metadata is only supported for the HDF (.h5) file format.
- currently, only the HDF (.h5) file format supports saving and loading metadata.
larray/core/array.py
Outdated
@@ -895,15 +901,18 @@ def __getattr__(self, key): | |||
|
|||
# needed to make *un*pickling work (because otherwise, __getattr__ is called before .axes exists, which leads to | |||
# an infinite recursion) | |||
# TODO: include metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably already the case because .meta should be in self.__dict__
. Now, pickle may not work on Metadata objects but that's another story.
larray/core/array.py
Outdated
def __getstate__(self): | ||
return self.__dict__ | ||
|
||
# TODO: include metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem
larray/core/array.py
Outdated
return list(set(dir(self.__class__)) | set(self.__dict__.keys()) | names) | ||
metadata = set(self.meta.keys()) | ||
axis_names = set(axis.name for axis in self.axes if axis.name is not None) | ||
return list(set(dir(self.__class__)) | set(self.__dict__.keys()) | axis_names | metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change should be reverted (it is wrong AFAICT). The metadata keys are not available directly on LArray but rather on LArray.meta
larray/core/array.py
Outdated
@@ -6593,7 +6629,7 @@ def set_labels(self, axis=None, labels=None, inplace=False, **kwargs): | |||
self.axes = axes | |||
return self | |||
else: | |||
return LArray(self.data, axes) | |||
return LArray(self.data, axes, meta=self.meta) | |||
|
|||
def astype(self, dtype, order='K', casting='unsafe', subok=True, copy=True): | |||
return LArray(self.data.astype(dtype, order, casting, subok, copy), self.axes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should copy metadata in this case.
larray/core/array.py
Outdated
@@ -7168,11 +7221,13 @@ def zeros_like(array, title='', dtype=None, order='K'): | |||
""" | |||
if not title: | |||
title = array.title | |||
return LArray(np.zeros_like(array, dtype, order), array.axes, title) | |||
if meta is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a good idea to copy metadata by default in all the *_like functions. At least it does not match current usage. Usually, our users create new arrays to represent a different "concept" than the original array. They just want the same axes. Having an easy option to copy the metadata might be a good idea (e.g. meta='copy'). Unsure it worth it compared to the user doing it explicitly:
>>> a = ndtest((2, 3))
>>> b = zeros_like(a, meta=a.meta)
In either case, it would be nice if you mentioned in the docstring whether or not metadata is copied.
larray/tests/test_array.py
Outdated
assert res.meta == meta | ||
|
||
def test_compact(meta): | ||
arr = LArray([[1, 2], [1, 2]], [Axis('sex=M,F'), Axis('nat=BE,FO')], meta=meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LArray([[1, 2], [1, 2]], 'sex=M,F;nat=BE,FO') ?
larray/tests/test_array.py
Outdated
@@ -176,40 +174,53 @@ def small_array(meta): | |||
io_narrow_missing_values[2, 'b1', 'c1'] = np.nan | |||
|
|||
|
|||
def test_ndrange(): | |||
arr = ndtest('a=a0..a2') | |||
def test_concat(meta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name of the function is misleading... it seems to be general while it only tests metadata.
>>> # delete an item | ||
>>> del arr.meta.city | ||
>>> # or | ||
>>> del arr.meta['city'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- More important than showing the two different syntaxes for modifying/deleting metadata would be to show metadata propagation and explain the rule used to decide whether a method propagates metadata or not. To avoid surprises for users, it is extremely important that we have a simple rule for this and that users understand it. I think the rule should be something like: all operations which only modify the axes (including reordering, subset, etc.) preserve metadata, all operations which modify existing values drop metadata. I already see one notable exception:
__setitem__
should not drop metadata since we modify the object in-place. - All functions which take a meta argument should have three options: keep previous metadata, take some new (explicit) metadata, or drop metadata. I am still unsure which should be the default. Maybe dropping by default with an easy option to keep it might be saner and an easier rule for users to remember (FWIW, this is what xarray advertise it does, but I don't see the option in their API documentation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all operations which only modify the axes (including reordering, subset, etc.) preserve metadata, all operations which modify the values drop metadata.
by modifying values, you include adding/removing labels? For example, what to do with insert
or with_total
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure about those but I would rather keep metadata. I thus propose the rule to be "all operations which modify existing values drop metadata"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work!
hmm, wait... I see a few remaining problems... |
larray/core/array.py
Outdated
@@ -287,6 +291,9 @@ def concat(arrays, axis=0, dtype=None): | |||
a0 0 1 2 0 0 1 | |||
a1 3 4 5 1 0 1 | |||
""" | |||
if meta is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we do not propagate metadata, I guess we should not propagate here either
larray/core/array.py
Outdated
Title of the array. It will be included in metadata but will be accessible via both syntax | ||
'.title' or '.meta.title'. | ||
meta : dict or OrderedDict or Metadata, optional | ||
Deprecated. See 'meta' bellow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
below
@@ -3286,16 +3322,16 @@ def indicesofsorted(self, axis=None, ascending=True, kind='quicksort'): | |||
def copy(self): | |||
"""Returns a copy of the array. | |||
""" | |||
return LArray(self.data.copy(), axes=self.axes[:], title=self.title) | |||
return LArray(self.data.copy(), axes=self.axes[:], meta=self.meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.copy() should either be mentioned in the release notes as an operation which propagates metadata OR avoid propagating them here too.
larray/core/array.py
Outdated
@@ -7076,14 +7116,19 @@ def aslarray(a): | |||
c 3.0 1.0 | |||
""" | |||
if isinstance(a, LArray): | |||
if meta is not None: | |||
a.meta = meta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is bad practice to modify input arguments inplace. I guess we should use a.copy() somewhere in there.
larray/core/array.py
Outdated
"""Returns an array with the specified axes and filled with zeros. | ||
|
||
Parameters | ||
---------- | ||
axes : int, tuple of int, Axis or tuple/list/AxisCollection of Axis | ||
Collection of axes or a shape. | ||
title : str, optional | ||
Title. | ||
Deprecated. See 'meta' bellow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
below
larray/core/array.py
Outdated
"""Returns an array with the specified axes and filled with ones. | ||
|
||
Parameters | ||
---------- | ||
axes : int, tuple of int, Axis or tuple/list/AxisCollection of Axis | ||
Collection of axes or a shape. | ||
title : str, optional | ||
Title. | ||
Deprecated. See 'meta' bellow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
below
larray/core/array.py
Outdated
"""Returns an array with the same axes as array and filled with ones. | ||
|
||
Parameters | ||
---------- | ||
array : LArray | ||
Input array. | ||
title : str, optional | ||
Title. | ||
Deprecated. See 'meta' bellow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
below
larray/core/array.py
Outdated
@@ -7134,25 +7182,31 @@ def zeros(axes, title='', dtype=float, order='C'): | |||
BE 0.0 0.0 | |||
FO 0.0 0.0 | |||
""" | |||
if title is not None: | |||
import warnings | |||
warnings.warn("Argument title is deprecated. Use meta instead", FutureWarning, stacklevel=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when title is provided, and shouldn't we use meta['title'] = title; title = None
(and initialize meta if None)? Otherwise, you will get twice the same warning: once in zeros (or other array creation functions) then once from LArray.__init__
. You probably want to make an helper function for this to avoid duplication this code over and over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still the "bellow" typo all over the place
larray/core/array.py
Outdated
meta = Metadata() | ||
if title is not None: | ||
import warnings | ||
warnings.warn("title argument is deprecated. Please use meta argument instead", FutureWarning, stacklevel=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you test this? shouldn't that be stacklevel=3?
larray/core/array.py
Outdated
@@ -7076,14 +7115,20 @@ def aslarray(a): | |||
c 3.0 1.0 | |||
""" | |||
if isinstance(a, LArray): | |||
return a | |||
res = a.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only copy if meta is not None? (np.asarray does not copy if possible)
larray/core/array.py
Outdated
warnings.warn("title argument is deprecated. Please use meta argument instead", FutureWarning, stacklevel=2) | ||
meta['title'] = title | ||
title = None | ||
return meta, title |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, you don't have to return title since it is (or at least should) never (be) used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, except for the ndrange change.
larray/core/array.py
Outdated
@@ -7681,10 +7680,10 @@ def array_or_full(a, axis, initial): | |||
create_sequential = renamed_to(sequence, 'create_sequential') | |||
|
|||
@_check_axes_argument | |||
def ndrange(axes, start=0, title=None, dtype=int): | |||
def ndrange(axes, start=0, title=None, dtype=int, meta=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should modify deprecated functions.
13dea31
to
bf98b50
Compare
- replaced XXX by TODO in a comment
- fix larray-project#79 : included metadata when saving/loading an array to/from an HDF file.
bf98b50
to
0499907
Compare
See comment #78 (comment) for specifications