Skip to content

ENH: drop dtype metadata #23371

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 23, 2023
Merged

ENH: drop dtype metadata #23371

merged 7 commits into from
May 23, 2023

Conversation

ninousf
Copy link
Contributor

@ninousf ninousf commented Mar 10, 2023

the continuation of the PR #23185

(Doesn't actually fix the descr part including things, but the important thing here)
Closes gh-23169, gh-15488

@ninousf ninousf changed the title ENH: rename dtype_without_metadata to drop_metadata, and do not use d… ENH: drop dtype metadata Mar 10, 2023
Copy link
Contributor

@MatteoRaso MatteoRaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. It looks great, just a few small suggestions.

Comment on lines 1151 to 1156
l.append((
n, # name
drop_metadata(t[0]), # format
t[1], # offset
t[2] if len(t) == 3 else None, # title
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
l.append((
n, # name
drop_metadata(t[0]), # format
t[1], # offset
t[2] if len(t) == 3 else None, # title
))
if len(t) == 3:
l.append((n, drop_metadata(t[0]), t[1], t[2]))
else:
l.append((n, drop_metadata(t[0]), t[1], None))

I think this is much more readable.

@@ -1140,4 +1140,31 @@ def _opt_info():
enabled_features += f" {feature}?"

return enabled_features

def drop_metadata(dt):
align = dt.isalignedstruct
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
align = dt.isalignedstruct

This variable is unused, so it should be removed.

@seberg
Copy link
Member

seberg commented Apr 5, 2023

@ninousf sorry for not following up earlier? Did you have a use-case beyond dropping metadata in .npy?
Otherwise, I am tempted to follow-up to use this here and consider this semi-private (which it effectively is).

@ninousf
Copy link
Contributor Author

ninousf commented Apr 6, 2023

Hi Sebastian
We can keep it in utils.py
My use case is to be able to copy a hdf5 file which contains arrays with dtypes containing metadata in a zarr file
See #1152

@seberg
Copy link
Member

seberg commented Apr 12, 2023

It looks a bit like we may add np.dtypes as a namespace (I am starting to assume we will have enough use for it). That is a bit annoying, because adding it here might be too much right now, so I can follow-up on it.

One thing I am wondering: I somewhat like the idea of adding the behavior:

  • Check whether metadata is None ahead of time, if it is return the dtype unmodified (the identity).
  • For a structured/subarray dtypes we would have to also return the identity if all fields/the subdtype and the dtype itself has no metadata.

That would allow us to do:

no_metadata = drop_metadata(dtype)
if no_metadata is not dtype:
    warnings.warn("Dropping metadata which cannot be stored in an npy/npz file", UserWarning)

Although, I am not sure we actually need the warning there, maybe it is useful to be able to know that a change happened?

@seberg seberg force-pushed the dtype-without-metadata branch from 856ba1a to e2cbe53 Compare April 27, 2023 15:56
@seberg
Copy link
Member

seberg commented Apr 27, 2023

@ninousf could you review my changes? I made some heavier changes to use this for np.save and np.savez and also just to make the code cleaner, IMO.

Now, I don't like np.lib.utils.drop_metadata as a home, so I added a fuzzy "this is somewhat public" comment, but 🤷.
I would have liked to put this into the new np.dtypes namespace. But the function really has some problems (poor record dtype and results are plain wrong for user dtypes). That fits OK with npy saving, which has the same problem, but makes me not want to put it there yet. We should find a proper solution (probably in C).

The new code is a bit more robust than the previous one, because it returns the original dtype when there is no metadata attached. So at least if no metadata is used things should be fine even for user dtypes.

@ninousf
Copy link
Contributor Author

ninousf commented May 12, 2023

@seberg it's ok for me, thanks

@seberg seberg added the triage review Issue/PR to be discussed at the next triage meeting label May 17, 2023
@seberg seberg added this to the 1.25.0 release milestone May 17, 2023
@charris
Copy link
Member

charris commented May 19, 2023

This could use a release note.

@charris charris added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label May 19, 2023
@charris
Copy link
Member

charris commented May 22, 2023

@ninousf This needs a release note. If you cannot add that today, I'm going to push this off to the next release.

@ninousf
Copy link
Contributor Author

ninousf commented May 22, 2023

@charris Tell me if it's ok for you

@charris
Copy link
Member

charris commented May 22, 2023

Looks fine, thanks.

@charris charris merged commit 0a2c895 into numpy:main May 23, 2023
@charris
Copy link
Member

charris commented May 23, 2023

Thanks @ninousf

@ntessore
Copy link
Contributor

ntessore commented Jun 13, 2023

Please consider marking this as fully public. Metadata is great to forego ndarray subclasses, but until this function came along, there was no good way to e.g. update a value in the metadata.

PS: Actually, maybe consider changing the order of the metadata update, so that all new_metadata survives in np.dtype(dtype, metadata=new_metadata).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes triage review Issue/PR to be discussed at the next triage meeting
Projects
Development

Successfully merging this pull request may close these issues.

BUG: unable to read .npy file with non-eval'able metadata
5 participants