Skip to content

ENH: Added atleast_nd #7804

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

madphysicist
Copy link
Contributor

@madphysicist madphysicist commented Jul 6, 2016

Introduced generalized atleast_nd function. Similar to atleast_1d, atleast_2d, atleast_3d. Also added for masked arrays. Tests included.

This PR also fixes an issue with masked arrays that would crash the atleast_* functions if multiple scalars were input, e.g. np.ma.atleast_1d(1.0, 2.0, 3.0) would raise an error.

"""
res = []
for ary in arys:
ary = asanyarray(ary)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted on the mailing list, if we go for this at all, you can just write

ary = array(ary, copy=False, subok=True, ndim=n)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not if I add an option to place the dimensions at the end. I may still work this in without the ndim argument for that case though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not use your suggestion literally because zero-dim arrays need special handling, but I did something quite similar, just without the ndmin parameter (it's not ndim for array).

@madphysicist
Copy link
Contributor Author

So yeah...

@madphysicist
Copy link
Contributor Author

bump

@juliantaylor
Copy link
Contributor

can you implement the lower dimension variants in terms of this one?

@madphysicist
Copy link
Contributor Author

Not atleast_3d until I introduce a more complex spec for adding dimensions. atleast_3d adds dimensions on both sides of a 1D array. The other two are fine.

@juliantaylor
Copy link
Contributor

unless this function is significantly slower I would suggest to implement 1d and 2d in terms of this one for the extra test coverage and less code.

@madphysicist
Copy link
Contributor Author

Done. Reduced atleast_[12]d to three lines of code. Could have probably been one, but the input parameters are weird.

@homu
Copy link
Contributor

homu commented Aug 27, 2016

☔ The latest upstream changes (presumably #7823) made this pull request unmergeable. Please resolve the merge conflicts.

-------
res : ndarray
An array with ``a.ndim >= n``. Copies are avoided where
possible, and a view with `n` or more dimensions is returned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a copy ever needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know enough about the internals to know if res = array(ary, copy=False, subok=True) will ever return a copy under normal circumstances. My guess is no, but I wanted to do some mild CYA in the docs, just in case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will only make a copy for non numpy arrays (e.g., lists).
On Mon, Aug 29, 2016 at 8:12 AM Joseph Fox-Rabinovitz <
notifications@github.com> wrote:

In numpy/core/shape_base.py
#7804 (comment):


  • ary : array_like
  •    The input array. Non-array inputs are converted to arrays.
    
  •    Arrays that already have `n` or more dimensions are preserved.
    
  • n : scalar
  •    The minimum number of dimensions required for each of the input
    
  •    arrays.
    
  • pos : {'start', 'end'}, optional
  •    Where to add new dimensions. All new dimensions are added on the
    
  •    same side of the shape. Default is 'start'.
    
  • Returns

  • res : ndarray
  •    An array with `a.ndim >= n`. Copies are avoided where
    
  •    possible, and a view with `n` or more dimensions is returned.
    

I do not know enough about the internals to know if res = array(ary,
copy=False, subok=True) will ever return a copy under normal
circumstances. My guess is no, but I wanted to do some mild CYA in the
docs, just in case.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/numpy/numpy/pull/7804/files/5b443132ec6c30aef974c73b30b510e6b60e8755#r76622781,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABKS1p0XcxxYEGc_49tnk41VdMQUHXgoks5qkvSFgaJpZM4JFvBv
.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the docstring to reflect that now.

@madphysicist madphysicist force-pushed the atleast_nd branch 2 times, most recently from 8794a2e to 0f13b53 Compare August 29, 2016 15:42
@homu
Copy link
Contributor

homu commented Oct 20, 2016

☔ The latest upstream changes (presumably #7922) made this pull request unmergeable. Please resolve the merge conflicts.

@mattharrigan
Copy link
Contributor

see issue 8214

Arrays that already have `n` or more dimensions are preserved.
n : scalar
The minimum number of dimensions required for each of the input
arrays.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's only ever 1 array, for each of the input arrays doesn't seem to make sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Fixed now.

@shoyer
Copy link
Member

shoyer commented Oct 26, 2016

This looks pretty solid to me. Did we ever discuss it on the mailing list?

Two suggestions:

  1. Note that the default pos='start' creates arrays that are consistent with NumPy's broadcasting rules.
  2. atleast_3d should have a loud note that its implementation is inconsistent with the others, for legacy reasons.

@madphysicist
Copy link
Contributor Author

Mailing list thread start: https://mail.scipy.org/pipermail/numpy-discussion/2016-July/075759.html. There was no solid consensus to merge, but no strong objections either if I remember correctly.

@madphysicist
Copy link
Contributor Author

I think that suggestion 2 is adequately covered in the notes section already. I added another entry for suggestion 1.

@shoyer
Copy link
Member

shoyer commented Apr 17, 2018

A couple of comments on the API:

  • Let's call the parameter for specifying the number of dimension ndim. atleast_nd(array, ndim=3) is self-explanatory in a way that atleast_nd(array, n=3) is not. Many people will skip the keyword argument, but if you're going to use it anyways it's better to have something descriptive. NumPy has too many functions already with non-descriptive argument names.
  • I am -1 on an inplace argument, for numerous reasons:
    • As @eric-wieser notes, modifying shape in place is an antipattern, which should be discouraged as much as possible.
    • None of the other array manipulation functions in NumPy have an inplace argument.
    • It's not clear if what the return value is now. Is it None like most inplace functions or the original array?
    • There is nothing to be gained from an efficiency or code expressiveness perspective from using atleast_nd(array, 3, inplace=True) rather than array = atleast_nd(array, 3)

@madphysicist
Copy link
Contributor Author

@shoyer. I've squashed in all the changes you requested. inplace really does not make sense when you put it that way.

The minimum number of dimensions required.
pos : int, optional
The index to insert the new dimensions. May range from
``-ary.ndim`` to ``+ary.ndim``. Non-negative indices indicate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not true - the lower bound is -ary.ndim - 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. I had a feeling at the back of my head that the indices should not be the same, but was in too much of a hurry to place it. Fixed now.

@eric-wieser
Copy link
Member

eric-wieser commented Apr 17, 2018

@Erotemic

In these cases there is only one new dimension or all dimensions are inserted (so ordering doesn't matter)

I agree

I would absolutely expect the new axis to be grouped together.

Can you give a concrete example of where you'd expect this? As you note, for the examples you've given so far, it doesn't matter.

Given that we've come up with no convincing use cases for intermediate values, perhaps we should just allow only pos=0 and pos=-1?

@madphysicist
Copy link
Contributor Author

@eric-wieser The only use-case I can think of is something I did a while back where I needed to do a dot product over the last two dimensions in a 2D image that contained essentially small 2D matrices for each pixel, which would eventually be converted to color values (so a 4D array). After all the axis rolling, it turned out that the best way to broadcast to the particular RGB format I was interested in was to insert two unit axes at index 2. I honestly can't remember the exact details, and I'm sure there is a better way to do it, but I think it's better to split the existing dimensions rather than the new ones. I would also like to leave the ability to insert into an arbitrary index, not just 0 and -1.

@madphysicist
Copy link
Contributor Author

I forgot to remove the inplace parameter in the function signature. Fixed now. I believe that fixes the last remaining issue, until the next one.

@madphysicist
Copy link
Contributor Author

Rebased. Is everyone more or less happy with the state of this PR?

@Erotemic
Copy link

@eric-wieser

Can you give a concrete example of where you'd expect this? As you note, for the examples you've given so far, it doesn't matter.

Say I have a 1D array with shape (N,), and I want to interpret it as an image so I can pass it to my image processing tools that all expect 3D arrays where the first two dimensions are spatial and the last are channels. To ensure that inputs are in correct formats I would use atleast_nd with n=3 and pos=-1. In this case I would expect the new shape to be (N, 1, 1). A similar case might arise when working with 4D tensor libraries like pytorch. I don't think its an entirely common case, but I think it would exist and in that rare case I would expect new axis to be grouped.

I think the key idea that illustrates my expectation is what happens when there are 3 new axes? Will there be two at the front and one at the end or the other way. It seems ambiguous to allow for any other case than grouped axes.

Given that we've come up with no convincing use cases for intermediate values, perhaps we should just allow only pos=0 and pos=-1?

This is my preference, but I'm not attached to it. I just think its conceptually easier to understand document and use. It prevents users from shooting themselves in the foot.

@madphysicist
Copy link
Contributor Author

@Erotemic While I agree with you that grouping the new axes rather than the existing ones is by far the more sensible approach, I prefer to allow positions other than 0 and -1. For the work that I do, the color planes are stored separately, so it is often easier to have the channel number as the first rather than the last dimension. In that case it would be very helpful to insert the unit dimension into index -3.

@madphysicist madphysicist force-pushed the atleast_nd branch 3 times, most recently from c74febd to c8a0caa Compare July 9, 2018 20:53
@madphysicist
Copy link
Contributor Author

Bump?

Function only accepts one array and an indexed placement spec.
Masked arrays fully supported.
atleast_1d and atleast_2d re-implemented in terms of new func.

Tests included.
@charris
Copy link
Member

charris commented Sep 30, 2018

@eric-wieser Want to revisit this? You seemed the most critical.

@madphysicist Was this discussed on the list at some point?

@madphysicist
Copy link
Contributor Author

@charris. At some length, but a very long time ago: https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html

@ahaldane
Copy link
Member

ahaldane commented Oct 2, 2018

I only skimmed the discussions, but:

As syntactic sugar, I think it's a good addition. I was just looking at code that would use this: The block PRs happen to define their own _atleast_nd for clarity, and in multiarraymodule.c there is a C function _prepend_ones which essentially does this, used by the C code for np.concatenate, and for the ndim kwd of np.array.

Suggestion for this PR: Optimize for the most common case where pos == 0, by having a clause if pos == 0: return np.array(a, ndmin=ndim, copy=False, subok=True) at the start. That would make nice use of the _prepend_ones code path.

[[[[1, 2, 3]]]]
"""
ary = array(ary, copy=False, subok=True)
if ary.ndim:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this if?

Why not move the pos = line into the if extra >0 block?

pos = normalize_axis_index(pos, ary.ndim + 1)
extra = ndim - ary.ndim
if extra > 0:
ind = pos * (slice(None),) + extra * (None,) + (Ellipsis,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code reminds me it would be nice to have an axis argument to put: see issue #8765

@charris
Copy link
Member

charris commented Dec 17, 2020

This addition looks desirable. @madphysicist I'm going to close this, but feel free to rebase and resubmit. That will bring it closer to the head of the list :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants