Skip to content

BUG: Revert sort optimization in np.unique. #10608

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 17, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 21 additions & 15 deletions numpy/lib/arraysetops.py
Original file line number Diff line number Diff line change
Expand Up @@ -135,16 +135,18 @@ def unique(ar, return_index=False, return_inverse=False,
return_counts : bool, optional
If True, also return the number of times each unique item appears
in `ar`.

.. versionadded:: 1.9.0
axis : int or None, optional
The axis to operate on. If None, `ar` will be flattened beforehand.
Otherwise, duplicate items will be removed along the provided axis,
with all the other axes belonging to the each of the unique elements.
Object arrays or structured arrays that contain objects are not
supported if the `axis` kwarg is used.
.. versionadded:: 1.13.0

axis : int or None, optional
The axis to operate on. If None, `ar` will be flattened. If an integer,
the subarrays indexed by the given axis will be flattened and treated
as the elements of a 1-D array with the dimension of the given axis,
see the notes for more details. Object arrays or structured arrays
that contain objects are not supported if the `axis` kwarg is used. The
default is None.

.. versionadded:: 1.13.0

Returns
-------
Expand All @@ -166,6 +168,17 @@ def unique(ar, return_index=False, return_inverse=False,
numpy.lib.arraysetops : Module with a number of other functions for
performing set operations on arrays.

Notes
-----
When an axis is specified the subarrays indexed by the axis are sorted.
This is done by making the specified axis the first dimension of the array
and then flattening the subarrays in C order. The flattened subarrays are
then viewed as a structured type with each element given a label, with the
effect that we end up with a 1-D array of structured types that can be
treated in the same way as any other 1-D array. The result is that the
flattened subarrays are sorted in lexicographic order starting with the
first element.

Examples
--------
>>> np.unique([1, 1, 2, 2, 3, 3])
Expand Down Expand Up @@ -217,14 +230,7 @@ def unique(ar, return_index=False, return_inverse=False,
ar = ar.reshape(orig_shape[0], -1)
ar = np.ascontiguousarray(ar)

if ar.dtype.char in (np.typecodes['AllInteger'] +
np.typecodes['Datetime'] + 'S'):
# Optimization: Creating a view of your data with a np.void data type of
# size the number of bytes in a full row. Handles any type where items
# have a unique binary representation, i.e. 0 is only 0, not +0 and -0.
dtype = np.dtype((np.void, ar.dtype.itemsize * ar.shape[1]))
else:
dtype = [('f{i}'.format(i=i), ar.dtype) for i in range(ar.shape[1])]
dtype = [('f{i}'.format(i=i), ar.dtype) for i in range(ar.shape[1])]

try:
consolidated = ar.view(dtype)
Expand Down
9 changes: 9 additions & 0 deletions numpy/lib/tests/test_arraysetops.py
Original file line number Diff line number Diff line change
Expand Up @@ -453,6 +453,15 @@ def test_unique_masked(self):
assert_array_equal(v.data, v2.data, msg)
assert_array_equal(v.mask, v2.mask, msg)

def test_unique_sort_order_with_axis(self):
# These tests fail if sorting along axis is done by treating subarrays
# as unsigned byte strings. See gh-10495.
fmt = "sort order incorrect for integer type '%s'"
for dt in 'bhilq':
a = np.array([[-1],[0]], dt)
b = np.unique(a, axis=0)
assert_array_equal(a, b, fmt % dt)

def _run_axis_tests(self, dtype):
data = np.array([[0, 1, 0, 0],
[1, 0, 0, 0],
Expand Down