Skip to content

Add Index.should_add_coord_to_array #10137

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 35 commits into from
May 6, 2025

Conversation

benbovy
Copy link
Member

@benbovy benbovy commented Mar 17, 2025

Same goal than #10116 using the alternative approach suggested in #10116 (comment) where the propagation (validation) of coordinates in a DataArray is delegated to their index (if any).

I find this approach cleaner than #10116. Here is the same notebook example adapted for this PR.

Index API alternatives

  1. Index.validate_dataarray_coord(self, name: Hashable, var: Variable, dims: set[Hashable]) -> None

    • check one index coordinate at a time, thus called multiple times for a multi-coordinate index
  2. Index.validate_dataarray_coords(self, variables: dict[Hashable, Variable], dims: set[Hashable]) -> None

    • check all coordinates of an index at once
    • perhaps could make things easier for developers of custom Xarray indexes (perhaps not?), but it would make Xarray internals more complicated with possible significant impact on performance (we'd have to rely on Indexes.group_by_index while in option 1 we simply need to iterate over _variables, _coords and/or _indexes).
  3. Index.validate_dataarray_coords(self, variables: dict[Hashable, Variable], dims: set[Hashable]) -> Coordinates

    • Same than option 2 but allow returning a new set of coordinate variables and their indexes.
    • This has much more flexibility (e.g., convert a custom 2D spatial index to a 1D default PandasIndex when the result has only one of the two spatial dimensions), but I find it also complicated and prone to confusion.

Option 1, the simplest and working well with IntervalIndex, is currently implemented in this PR.

cc @shoyer @dcherian

@benbovy benbovy marked this pull request as ready for review March 17, 2025 12:19
Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick thoughts

benbovy added 4 commits March 18, 2025 12:09
Functions with a leading underscore are marked by pyright as unused if
they are not used from within the module in which they are defined.

Also remove unneeded nested import.
Move check_dataarray_coords in xarray.core.coordinates module and rename
it to validate_dataarray_coords (name consistent with
Index.validate_dataarray_coord).

Move CoordinateValidationError from xarray.core.indexes to
xarray.core.coordinates module.
@benbovy
Copy link
Member Author

benbovy commented Mar 18, 2025

Thanks @shoyer for taking a look!

This is now ready for (another round of) review.

Compared to #10116 this PR still represents a major data model change for DataArray (in the sense that it allows overriding strict enforcement of the DataArray model in certain cases), but I think that the risk is mitigated since it is here done explicitly via Index.validate_dataarray_coord and the default behavior is to keep conforming to the DataArray model. Limiting an index to (in)validate its coordinates may also be good to keep things simple and predictable for end-users.

@dcherian regarding your #10116 (comment), in the example notebook the DataArray reprs shown without any list of dimensions for the Coordinates section do not strike me much. That said, the Coordinates repr is still basic and might benefit from such addition (it doesn't have any html repr BTW).

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@dcherian
Copy link
Contributor

dcherian commented Mar 29, 2025

That said, the Coordinates repr is still basic and might benefit from such addition (it doesn't have any html repr BTW).

Yeah I think it could be improved, but certainly not a blocker here.

IMO we can merge. I'm not a huge fan of the method name validate_dataarray_coord but I couldn't come up with anything better

benbovy and others added 2 commits March 31, 2025 09:14
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
@dcherian
Copy link
Contributor

Yeah this method's a hard one to name. The only thing that comes to mind is should_propagate_variable_with_dataarray :/

@benbovy
Copy link
Member Author

benbovy commented Apr 25, 2025

Another possible name: is_coord_needed_in_dataarray()

@dcherian
Copy link
Contributor

I added suggestions for should_add_coord_to_array to save a few characters

@dcherian dcherian merged commit e42aa6f into pydata:main May 6, 2025
30 of 32 checks passed
@benbovy benbovy changed the title Add Index.validate_dataarray_coord Add Index.should_add_coord_to_array May 7, 2025
@benbovy benbovy deleted the index-validate-dataarray-coords branch May 7, 2025 06:58
dcherian added a commit to dcherian/xarray that referenced this pull request May 7, 2025
* main:
  dev whats-new (pydata#10294)
  Add SeasonGrouper, SeasonResampler (pydata#9524)
  (fix): remove `PandasExtensionArray` from repr (pydata#10291)
  Do not rely on `np.broadcast_to` to perform trivial dimension insertion (pydata#10277)
  DOC: Remove reference to absolufy (pydata#10290)
  Fix for scalar detection (pydata#8821)
  Add Index.validate_dataarray_coord (pydata#10137)
  add redirect for contributing guide (pydata#10282)
  Update pre-commit hooks (pydata#10288)
  Adding xarray-eopf to ecosystem.rst (pydata#10289)
  Add public typing.py module (pydata#10215)
  Rename Twitter to X (pydata#10283)
  Add `xarray-lmfit` extension for curve fitting to ecosystem documentation (pydata#10262)
@benbovy benbovy mentioned this pull request May 7, 2025
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants