implement arbitrary metadata for LArray objects #78

Closed
gdementen opened this issue Feb 1, 2017 · 8 comments

@gdementen
Contributor

all array creation functions should be updated + I/O functions
we probably want to generalize to any metadata (store title and description in a .meta dictionary) or something like that.

@alixdamman alixdamman self-assigned this Feb 8, 2017
@gdementen gdementen changed the title implement description implement arbitrary metadata for LArray objects Aug 8, 2017
alixdamman added a commit to alixdamman/larray that referenced this issue Aug 22, 2017
+ handle metadata in I/O functions relying on HDF format
alixdamman added a commit to alixdamman/larray that referenced this issue Aug 29, 2017
+ handle metadata in I/O functions relying on HDF format
@alixdamman alixdamman added this to the 0.27 milestone Sep 4, 2017
alixdamman added a commit to alixdamman/larray that referenced this issue Sep 14, 2017
+ handle metadata in I/O functions relying on HDF format
alixdamman added a commit to alixdamman/larray that referenced this issue Oct 3, 2017
+ handle metadata in I/O functions relying on HDF format
alixdamman added a commit to alixdamman/larray that referenced this issue Oct 9, 2017
+ handle metadata in I/O functions relying on HDF format
alixdamman added a commit to alixdamman/larray that referenced this issue Oct 9, 2017
+ handle metadata in I/O functions relying on HDF format
alixdamman added a commit to alixdamman/larray that referenced this issue Oct 12, 2017
+ handle metadata in I/O functions relying on HDF format
alixdamman added a commit to alixdamman/larray that referenced this issue Oct 12, 2017
+ handle metadata in I/O functions relying on HDF format
alixdamman added a commit to alixdamman/larray that referenced this issue Oct 13, 2017
+ handle metadata in I/O functions relying on HDF format
alixdamman added a commit to alixdamman/larray that referenced this issue Oct 22, 2017
+ handle metadata in I/O functions relying on HDF format
alixdamman added a commit to alixdamman/larray that referenced this issue Oct 22, 2017
+ handle metadata in I/O functions relying on HDF format
@alixdamman
Collaborator

alixdamman commented Oct 22, 2017

Let us sum up a little bit:

  1. access name:
    meta or attrs?

  2. passing metadata at array creation:

  • solution a: by passing an (ordered) dict
       >>> arr = LArray(data, axes, {'title': 'my title'})
  • solution b: via kwargs
       >>> arr = LArray(data, axes, title='my title')
  3. What to do with the current argument title?
  • solution a: implement "blessed" metadata. The question is which ones.
  • solution b: pass metadata via kwargs.
  • solution c: if we pass metadata via a dict and do not implement blessed metadata, we could keep the current title argument for a few releases and display a FutureWarning
  4. Usage:
    Current proposal is:
   # add new metadata
   >>> arr.attrs.creation_date = datetime(2017, 2, 10)
   # access metadata
   >>> arr.attrs.creation_date
   datetime.datetime(2017, 2, 10, 0, 0)
   # modify metadata
   >>> arr.attrs.creation_date = datetime(2017, 2, 16)
   # delete metadata
   >>> del arr.attrs.creation_date
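
For illustration, the attribute-style usage proposed above could be built on a dict subclass. This is only a minimal sketch: the Metadata class below is a hypothetical stand-in written for this comment, not larray's actual implementation, and it assumes that metadata keys are valid Python identifiers.

```python
from datetime import datetime


class Metadata(dict):
    """Hypothetical metadata container: a dict whose keys are also attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .keys() or .items() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Every attribute assignment is stored as a dict entry.
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name)


attrs = Metadata()
attrs.creation_date = datetime(2017, 2, 10)   # add new metadata
first_value = attrs.creation_date             # access metadata
attrs.creation_date = datetime(2017, 2, 16)   # modify metadata
second_value = attrs['creation_date']         # dict-style access still works
del attrs.creation_date                       # delete metadata
```

One consequence of this design is that a key named like a dict method (for example "items") can be stored but is only reachable with the attrs['items'] syntax, because normal attribute lookup finds the method first.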

@gdementen
Contributor Author

gdementen commented Oct 23, 2017

I still mostly have the same opinion.

  1. either is fine by me. I have a slight personal preference for .attrs but we need to poll our users to see what is most intuitive for them.
  2. a) must be implemented in any case to disambiguate metadata keywords which collide with other arguments.
    b) could be done in addition, but I am reluctant to do it because it means that all array creation functions would have to treat extra kwargs as metadata, and the list of "forbidden" metadata keywords would differ from one function to the next. It is also problematic for stack(), which already uses kwargs for something else. Also, a way to set metadata could in fact be useful for any method returning an LArray (some of which already use **kwargs for other purposes, e.g. aggregates), and losing argument name validation (the flip side of using **kwargs) for all methods is a heavy price to pay IMO.
  3. given the above, either solution a or c. We could postpone the decision by starting with only "title" as a blessed argument, and later either deprecate it or add a few other "blessed" arguments depending on how it goes.
  4. ok
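
The two objections to kwargs raised in point 2 (name collisions and lost argument validation) can be shown with a small sketch. Both functions below are hypothetical and only mimic an array creation function by returning a plain dict; they are not part of larray.

```python
def zeros_with_kwargs(axes, dtype=float, **metadata):
    """Hypothetical creation function treating extra kwargs as metadata."""
    return {'axes': axes, 'dtype': dtype, 'meta': dict(metadata)}


def zeros_with_dict(axes, dtype=float, meta=None):
    """Hypothetical creation function taking metadata as one explicit argument."""
    return {'axes': axes, 'dtype': dtype, 'meta': dict(meta or {})}


# Collision: a metadata key named like a real argument is consumed as that
# argument, so "dtype" is a forbidden metadata keyword for this function.
a = zeros_with_kwargs('a0..a2', dtype='survey data type')

# Lost validation: a misspelled argument is not rejected, it becomes metadata.
b = zeros_with_kwargs('a0..a2', dtyp=int)

# With an explicit dict, any key is allowed...
c = zeros_with_dict('a0..a2', meta={'dtype': 'survey data type'})

# ...and a misspelled argument is still reported by Python itself.
try:
    zeros_with_dict('a0..a2', dtyp=int)
    typo_rejected = False
except TypeError:
    typo_rejected = True
```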

@alixdamman
Collaborator

alixdamman commented Apr 23, 2018

Specifications for metadata:

  1. Usage:
   # add new metadata
   >>> arr.attrs.creation_date = datetime(2017, 2, 10)
   # access metadata
   >>> arr.attrs.creation_date
   datetime.datetime(2017, 2, 10, 0, 0)
   # modify metadata
   >>> arr.attrs.creation_date = datetime(2017, 2, 16)
   # delete metadata
   >>> del arr.attrs.creation_date
  2. Passing metadata at array creation:
    Users do it by passing an (ordered) dict or a Metadata instance
   >>> arr = LArray(data, axes, attrs=OrderedDict([('title', 'my title')]))
  3. Backward compatibility with the title argument of LArray.__init__:
    We implement a "blessed" metadata title.

  4. Copying metadata:
    a) When to copy them:

  • arr.copy()
  • arr.set_axes()
  • arr.rename()
  • arr.set_labels()
  • arr.drop_labels()
  • arr.combine_axes()
  • arr.split_axes()
  • arr.sort_axes()
  • arr.sort_values()
  • arr.reshape()
  • arr.reshape_like()
  • arr.compact()
  • arr.transpose()
  • arr.expand()
  • arr.broadcast_with()
  • arr.reindex()
  • arr.append()
  • arr.prepend()
  • arr.extend()
  • arr.insert()
  • arr.align()
  • arr.filter()
  • arr.with_total()
  • arr.__setitem__()
  • arr.set()

    b) When NOT to copy them:

  • all aggregation functions
  • all ufuncs
  • all unary and binary ops
  • to_frame, to_series
  • arr.nonzero()
  • arr.describe() and describe_by()
  • all, any, min, max, labelofmin, ...
  • arr.ratio() and arr.rationot0()
  • arr.percent()
  • arr.divnot0()
  • arr.shift() and arr.diff()
  • arr.growth_rate()

@gdementen feel free to edit this comment. I will use it as specifications for implementing metadata.
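
The copy rules listed above could be sketched as follows. TinyArray is a hypothetical stand-in (a flat list of values plus a metadata dict), used only to show which kind of operation keeps the metadata and which kind drops it; it says nothing about how LArray would actually implement this.

```python
class TinyArray:
    """Hypothetical stand-in for LArray: a flat list of values plus metadata."""

    def __init__(self, data, meta=None):
        self.data = list(data)
        # Always store a private copy so two arrays never share one dict.
        self.meta = dict(meta or {})

    # Operations that rearrange or duplicate the same data: metadata is copied.
    def copy(self):
        return TinyArray(self.data, self.meta)

    def sort_values(self):
        return TinyArray(sorted(self.data), self.meta)

    # Operations that compute new values: metadata is NOT copied, because a
    # title such as "population" would no longer describe the result.
    def __neg__(self):
        return TinyArray([-v for v in self.data])

    def __add__(self, other):
        return TinyArray([a + b for a, b in zip(self.data, other.data)])


arr = TinyArray([3, 1, 2], meta={'title': 'population'})
ordered = arr.sort_values()    # keeps {'title': 'population'}
total = arr + ordered          # starts again with empty metadata
```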

@gdementen
Contributor Author

  1. I would still like to poll our users for deciding attrs vs meta (unless you did already). Maybe they don't care, maybe they do.
  2. This is also possible:
>>> arr = LArray(data, axes, Attrs(title='my title'))

and I think it would be useful to implement this in addition to the dict syntax, in the same vein that we support an AxisCollection object or a string or xyz... We do not necessarily need to advertise this though.
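
Accepting both syntaxes could come down to one normalization step at the start of the constructor. The sketch below is an assumption about how that might look: Attrs is modeled as a plain dict subclass and as_attrs is a hypothetical helper name, neither taken from larray.

```python
from collections import OrderedDict
from collections.abc import Mapping


class Attrs(dict):
    """Hypothetical metadata container; Attrs(title='x') builds it from keywords."""


def as_attrs(value):
    """Normalize None, any mapping, or an Attrs instance to an Attrs instance."""
    if value is None:
        return Attrs()
    if isinstance(value, Attrs):
        return value                  # already the right type, reuse it
    if isinstance(value, Mapping):
        return Attrs(value)           # dict and OrderedDict both end up here
    raise TypeError("metadata must be a mapping or an Attrs instance, "
                    "not %s" % type(value).__name__)


from_dict = as_attrs({'title': 'my title'})
from_ordered = as_attrs(OrderedDict([('title', 'my title')]))
from_attrs = as_attrs(Attrs(title='my title'))
```

All three calls give an equal Attrs object, so the rest of the constructor only has to deal with one type.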

@alixdamman
Collaborator

How would you like to run the poll? Via the Google group larray-users? By mail?

@alixdamman
Collaborator

The result of the poll is meta.

@alixdamman
Collaborator

For the record: after some discussion, we decided to NOT propagate metadata. Propagating metadata through the code may lead to final arrays with irrelevant metadata.

@gdementen
Contributor Author

gdementen commented Jun 6, 2018

Please keep a backup of your branch somewhere so that if we change our minds later (or are pressured to do so) you do not need to redo everything...

alixdamman added a commit to alixdamman/larray that referenced this issue Jun 14, 2018
alixdamman added a commit to alixdamman/larray that referenced this issue Jun 14, 2018
- fix larray-project#79 : included metadata when saving/loading an array to/from an HDF file.
gdementen pushed a commit that referenced this issue Aug 31, 2018
- fix #79 : included metadata when saving/loading an array to/from an HDF file.

2 participants