Skip to content

Add metadata to array objects #641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

alixdamman
Copy link
Collaborator

See comment #78 (comment) for specifications

@alixdamman alixdamman requested a review from gdementen June 5, 2018 13:06
@alixdamman alixdamman changed the title Add metadata to larray objects Add metadata to array objects Jun 5, 2018
See below for use cases:

>>> # add metadata to/from an array
>>> arr = ndtest((3, 3), meta=OrderedDict([('title', 'array for testing'), ('author', 'John Smith')])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, there is a missing parenthesis at the end, but wouldn't

>>> arr = ndtest((3, 3), meta=[('title', 'array for testing'), ('author', 'John Smith')])

be more readable anyway?

For Python 3.6+, I think we should push for (ie mention this in the release notes & examples):

>>> arr = ndtest((3, 3), meta=Metadata(title='array for testing', author='John Smith'))

This is longer but less punctuation-heavy and thus more readable IMO and Metadata does not require an extra import like OrderedDict does.

FWIW, when Python 3.7+ is out, we will also have the possibility to use:

>>> arr = ndtest((3, 3), meta=dict(title='array for testing', author='John Smith'))

@@ -29,6 +29,32 @@ New features
>>> a02.equals(a['a1:a3'] >> 'group_a')
False

* allowed arrays to have metadata (e.g. title, description, authors, ...).
Metadata can be accesed using the syntax ``array.meta.name`` or ``array.meta[name]``.
Warning: currently, only saving and loading to/from HDF file support metadata.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence has a weird word order. Either:

  • currently saving and loading metadata is only supported for the HDF (.h5) file format.
  • currently, only the HDF (.h5) file format supports saving and loading metadata.

@@ -895,15 +901,18 @@ def __getattr__(self, key):

# needed to make *un*pickling work (because otherwise, __getattr__ is called before .axes exists, which leads to
# an infinite recursion)
# TODO: include metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably already the case because .meta should be in self.__dict__. Now, pickle may not work on Metadata objects but that's another story.

def __getstate__(self):
return self.__dict__

# TODO: include metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idem

return list(set(dir(self.__class__)) | set(self.__dict__.keys()) | names)
metadata = set(self.meta.keys())
axis_names = set(axis.name for axis in self.axes if axis.name is not None)
return list(set(dir(self.__class__)) | set(self.__dict__.keys()) | axis_names | metadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change should be reverted (it is wrong AFAICT). The metadata keys are not available directly on LArray but rather on LArray.meta

@@ -6593,7 +6629,7 @@ def set_labels(self, axis=None, labels=None, inplace=False, **kwargs):
self.axes = axes
return self
else:
return LArray(self.data, axes)
return LArray(self.data, axes, meta=self.meta)

def astype(self, dtype, order='K', casting='unsafe', subok=True, copy=True):
return LArray(self.data.astype(dtype, order, casting, subok, copy), self.axes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should copy metadata in this case.

@@ -7168,11 +7221,13 @@ def zeros_like(array, title='', dtype=None, order='K'):
"""
if not title:
title = array.title
return LArray(np.zeros_like(array, dtype, order), array.axes, title)
if meta is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good idea to copy metadata by default in all the *_like functions. At least it does not match current usage. Usually, our users create new arrays to represent a different "concept" than the original array. They just want the same axes. Having an easy option to copy the metadata might be a good idea (e.g. meta='copy'). Unsure it worth it compared to the user doing it explicitly:

>>> a = ndtest((2, 3))
>>> b = zeros_like(a, meta=a.meta)

In either case, it would be nice if you mentioned in the docstring whether or not metadata is copied.

assert res.meta == meta

def test_compact(meta):
arr = LArray([[1, 2], [1, 2]], [Axis('sex=M,F'), Axis('nat=BE,FO')], meta=meta)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LArray([[1, 2], [1, 2]], 'sex=M,F;nat=BE,FO') ?

@@ -176,40 +174,53 @@ def small_array(meta):
io_narrow_missing_values[2, 'b1', 'c1'] = np.nan


def test_ndrange():
arr = ndtest('a=a0..a2')
def test_concat(meta):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name of the function is misleading... it seems to be general while it only tests metadata.

>>> # delete an item
>>> del arr.meta.city
>>> # or
>>> del arr.meta['city']
Copy link
Contributor

@gdementen gdementen Jun 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. More important than showing the two different syntaxes for modifying/deleting metadata would be to show metadata propagation and explain the rule used to decide whether a method propagates metadata or not. To avoid surprises for users, it is extremely important that we have a simple rule for this and that users understand it. I think the rule should be something like: all operations which only modify the axes (including reordering, subset, etc.) preserve metadata, all operations which modify existing values drop metadata. I already see one notable exception: __setitem__ should not drop metadata since we modify the object in-place.
  2. All functions which take a meta argument should have three options: keep previous metadata, take some new (explicit) metadata, or drop metadata. I am still unsure which should be the default. Maybe dropping by default with an easy option to keep it might be saner and an easier rule for users to remember (FWIW, this is what xarray advertise it does, but I don't see the option in their API documentation).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all operations which only modify the axes (including reordering, subset, etc.) preserve metadata, all operations which modify the values drop metadata.

by modifying values, you include adding/removing labels? For example, what to do with insert or with_total?

Copy link
Contributor

@gdementen gdementen Jun 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure about those but I would rather keep metadata. I thus propose the rule to be "all operations which modify existing values drop metadata"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much better

@alixdamman alixdamman mentioned this pull request Jun 8, 2018
@alixdamman alixdamman requested a review from gdementen June 11, 2018 12:36
Copy link
Contributor

@gdementen gdementen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work!

@gdementen
Copy link
Contributor

hmm, wait... I see a few remaining problems...

@@ -287,6 +291,9 @@ def concat(arrays, axis=0, dtype=None):
a0 0 1 2 0 0 1
a1 3 4 5 1 0 1
"""
if meta is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we do not propagate metadata, I guess we should not propagate here either

Title of the array. It will be included in metadata but will be accessible via both syntax
'.title' or '.meta.title'.
meta : dict or OrderedDict or Metadata, optional
Deprecated. See 'meta' bellow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below

@@ -3286,16 +3322,16 @@ def indicesofsorted(self, axis=None, ascending=True, kind='quicksort'):
def copy(self):
"""Returns a copy of the array.
"""
return LArray(self.data.copy(), axes=self.axes[:], title=self.title)
return LArray(self.data.copy(), axes=self.axes[:], meta=self.meta)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.copy() should either be mentioned in the release notes as an operation which propagates metadata OR avoid propagating them here too.

@@ -7076,14 +7116,19 @@ def aslarray(a):
c 3.0 1.0
"""
if isinstance(a, LArray):
if meta is not None:
a.meta = meta
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is bad practice to modify input arguments inplace. I guess we should use a.copy() somewhere in there.

"""Returns an array with the specified axes and filled with zeros.

Parameters
----------
axes : int, tuple of int, Axis or tuple/list/AxisCollection of Axis
Collection of axes or a shape.
title : str, optional
Title.
Deprecated. See 'meta' bellow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below

"""Returns an array with the specified axes and filled with ones.

Parameters
----------
axes : int, tuple of int, Axis or tuple/list/AxisCollection of Axis
Collection of axes or a shape.
title : str, optional
Title.
Deprecated. See 'meta' bellow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below

"""Returns an array with the same axes as array and filled with ones.

Parameters
----------
array : LArray
Input array.
title : str, optional
Title.
Deprecated. See 'meta' bellow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below

@@ -7134,25 +7182,31 @@ def zeros(axes, title='', dtype=float, order='C'):
BE 0.0 0.0
FO 0.0 0.0
"""
if title is not None:
import warnings
warnings.warn("Argument title is deprecated. Use meta instead", FutureWarning, stacklevel=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when title is provided, and shouldn't we use meta['title'] = title; title = None (and initialize meta if None)? Otherwise, you will get twice the same warning: once in zeros (or other array creation functions) then once from LArray.__init__. You probably want to make an helper function for this to avoid duplication this code over and over.

@alixdamman alixdamman requested a review from gdementen June 12, 2018 14:16
Copy link
Contributor

@gdementen gdementen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still the "bellow" typo all over the place

meta = Metadata()
if title is not None:
import warnings
warnings.warn("title argument is deprecated. Please use meta argument instead", FutureWarning, stacklevel=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you test this? shouldn't that be stacklevel=3?

@@ -7076,14 +7115,20 @@ def aslarray(a):
c 3.0 1.0
"""
if isinstance(a, LArray):
return a
res = a.copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only copy if meta is not None? (np.asarray does not copy if possible)

warnings.warn("title argument is deprecated. Please use meta argument instead", FutureWarning, stacklevel=2)
meta['title'] = title
title = None
return meta, title
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, you don't have to return title since it is (or at least should) never (be) used

@alixdamman alixdamman requested a review from gdementen June 13, 2018 07:28
Copy link
Contributor

@gdementen gdementen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except for the ndrange change.

@@ -7681,10 +7680,10 @@ def array_or_full(a, axis, initial):
create_sequential = renamed_to(sequence, 'create_sequential')

@_check_axes_argument
def ndrange(axes, start=0, title=None, dtype=int):
def ndrange(axes, start=0, title=None, dtype=int, meta=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should modify deprecated functions.

@alixdamman alixdamman force-pushed the add_metadata_to_larray_objects branch from 13dea31 to bf98b50 Compare June 14, 2018 07:44
- replaced XXX by TODO in a comment
- fix larray-project#79 : included metadata when saving/loading an array to/from an HDF file.
@alixdamman alixdamman force-pushed the add_metadata_to_larray_objects branch from bf98b50 to 0499907 Compare June 14, 2018 07:53
@alixdamman alixdamman merged commit 613f097 into larray-project:master Jun 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants