Skip to content

The norm of a size zero array should be zero. #5420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
argriffing opened this issue Jan 3, 2015 · 20 comments
Closed

The norm of a size zero array should be zero. #5420

argriffing opened this issue Jan 3, 2015 · 20 comments

Comments

@argriffing
Copy link
Contributor

Just as the empty sum is 0 and the empty product is 1, the empty norm should be 0 in my opinion. In numpy, this convention has been implemented and is unit-tested for the default vector and matrix norms:

    def test_empty(self):
        assert_equal(norm([]), 0.0)
        assert_equal(norm(array([], dtype=self.dt)), 0.0)
        assert_equal(norm(atleast_2d(array([], dtype=self.dt))), 0.0)

Some of the non-default empty norms raise exceptions. The induced matrix 2-norm raises exceptions for two reasons -- first, attempting to compute the singular values of a size-0 matrix raises an exception, and second, even if the the empty array of singular values were to be returned, attempting to compute the max of an empty sequence also raises an exception. The empty induced matrix 1-norm and np.inf-norm each also raise an exception because they try to compute an empty max.

@maniteja123
Copy link
Contributor

@seberg

For a inf-norm of vectors, the empty array is causing exception.
max is a ufunc without identity. The function PyArray_InitializeReduceResult is used for ufunc without identity like max function.

>>>la.norm(a,np.inf)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 2072,
 in norm
 return abs(x).max(axis=axis)
 File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 26, i
 n _amax
 return umr_maximum(a, axis, None, out, keepdims)
 ValueError: zero-size array to reduction operation maximum which has no identity

I think this is the control flow
max
_amax which is um.maximum.reduce
PyUfunc_Reduce
PyUFunc_ReduceWrapper
PyArray_InitializeReduceResult casues exception for zero-sized array

In PR #3861 the cases for accumulate and reduceat are handled, but for this case of reduce, I don't think it will prevent this from happening. Correct me if I am wrong please. :-)

@argriffing
Copy link
Contributor Author

Personally I'd like to have max_abs and sum_abs numpy ufuncs. Unlike max, the max_abs ufunc could have empty max 0. Also these ufuncs could save time and memory, but of course they would bloat the code, pollute the namespace, and invite discussion about which other combinations should be added. Yesterday I looked at adding these myself, but I was too intimidated by the code.

@ewmoore
Copy link
Contributor

ewmoore commented Jan 5, 2015

On Monday, January 5, 2015, argriffing notifications@github.com wrote:

Personally I'd like to have max_abs and sum_abs numpy ufuncs. Unlike max,
the max_abs ufunc could have empty max 0. Also these ufuncs could save
time and memory, but of course they would bloat the code, pollute the
namespace, and invite discussion about which other combinations should be
added. Yesterday I looked at adding these myself, but I was too intimidated
by the code.


Reply to this email directly or view it on GitHub
#5420 (comment).

Since the first and obvious use would be in linalg.norm perhaps they could
be added, and exposed only through there. If doing so, it might be worth
copying the dnrm2 algorthm too instead of calling dot as norm does now.

@seberg
Copy link
Member

seberg commented Jan 5, 2015

@maniteja123, yeah that PR has nothing to do with normal ufuncs. It is to allow more things for gufuncs, not sure that it would have any changes to accumulate/reduceat. In that case would need to check if everything is right there.

@maniteja123
Copy link
Contributor

@seberg Thanks for giving a clear picture. I couldn't distinguish between ufunc and gufunc. I was thinking that the changes could be affecting reduceat and accumulate since they were modifying edge-case code under PyUFunc_Reduceat and PyUFunc_Accumulate.

@argriffing
Copy link
Contributor Author

Since the first and obvious use would be in linalg.norm perhaps they could
be added, and exposed only through there.

This sounds great!

At first I had hoped that I could add a max_abs ufunc by just copying a block of ufunc code, replacing a max(...) somewhere with a max(abs(...)), and rename the function, but it looks more complicated than this. For example it would need to modify generate_umath.py by describing a new ufunc that blends or composes

    'maximum':
    Ufunc(2, 1, ReorderableNone,
          docstrings.get('numpy.core.umath.maximum'),
          'PyUFunc_SimpleBinaryOperationTypeResolver',
          TD(noobj),
          TD(O, f='npy_ObjectMax')
          ),

and

    'absolute':
    Ufunc(1, 1, None,
          docstrings.get('numpy.core.umath.absolute'),
          'PyUFunc_AbsoluteTypeResolver',
          TD(bints+flts+timedeltaonly),
          TD(cmplx, out=('f', 'd', 'g')),
          TD(O, f='PyNumber_Absolute'),
          ),

I think it would look like

    'max_abs':
    Ufunc(2, 1, Zero,
          docstrings.get('numpy.core.umath.max_abs'),
          [...],
          ),

but I don't know what goes in the [...] and I don't know what other functions to create or modify to make this work.

@seberg
Copy link
Member

seberg commented Jan 6, 2015

Well, those things define the dtype stuff basically. Type resolving should be fine. I think
TD(bints+flts+timedeltaonly) should probabyl be TD(bints+flts+timedeltaonly+O) (or such) and you need to define your own object inner loop and remove TD(O, f='PyNumber_Absolute') since there is no naive python function that can do what you want.
I never actually did this, so I might be wrong though :).

@argriffing
Copy link
Contributor Author

I looked through the ufunc tutorial but I couldn't find an example of an ndim-reducing ufunc. I guess these functions are called "reductions" in numpy, and are called aggregate functions, "aggregations" or "aggfuncs" in pandas and other literature. I'm also unsure about the connection between numpy ufunc "reductions" and the "combining function" of a fold. For example, numpy sum and max reduction ufuncs correspond to the "combining functions" add(x, y) and maximum(x, y). Would the numpy ufunc framework for a max_abs reduction ufunc require a corresponding max_abs "combining function" max_abs_combine(x, y) = maximum(absolute(x), absolute(y)) in addition to defining an "object inner loop"?

@ewmoore
Copy link
Contributor

ewmoore commented Jan 6, 2015

@argriffing, Have you looked at loops.c.src? When running a regular ufunc (e.g. numpy.add) reduction the same function is called as is used for the binary call. The IS_BINARY_REDUCE macro shows how this can be detected if necessary.

@argriffing
Copy link
Contributor Author

That looks a lot more complicated than I was hoping. There are 7 loop implementations for absolute in that file, not counting all of the templated variables within each implementation, and 6 maximum loop implementations. I assume this would mean I'd need to add between 7 and 42 loop implementations for a new max_abs ufunc, in that file alone.

@ewmoore
Copy link
Contributor

ewmoore commented Jan 6, 2015

I think it looks a lot worse than it actually ends up being. Here is a branch where I add a max_abs ufunc that works for all integer and floating point types. It takes 64 lines to do so which doesn't seem so bad.

@ewmoore
Copy link
Contributor

ewmoore commented Jan 6, 2015

Eh, I guess it needs the same special handing in the type resolver that the absolute ufunc has since it throws a warning with complex reductions right now. So a few more than 64 lines.

@argriffing
Copy link
Contributor Author

Thanks, this looks like a great start! Leaving aside the docstring and the timedelta and complex dtypes, this would need more hacking before it actually does a reduction rather than just a binary operation, right?

>>> import numpy as np
>>> a = np.array([[1, 2], [-3, 4]])
>>> np.max_abs(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid number of arguments
>>> np.max_abs(a, axis=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid number of arguments
>>> np.max_abs(a, -3)
array([[3, 3],
       [3, 4]])

Edit: I see that np.core.umath.max_abs.reduce(a, axis=..., keepdims=...) works! I guess max_abs should be aliased to that reduction somewhere.

@ewmoore
Copy link
Contributor

ewmoore commented Jan 6, 2015

No, it works now. Try np.max_abs.reduce(arr)

On Tuesday, January 6, 2015, argriffing notifications@github.com wrote:

Thanks, this looks like a great start! Leaving aside the docstring and the
timedelta and complex dtypes, this would need more hacking before it
actually does a reduction rather than just a binary operation, right?

import numpy as np>>> a = np.array([[1, 2], [-3, 4]])>>> np.max_abs(a)
Traceback (most recent call last):
File "", line 1, in ValueError: invalid number of arguments>>> np.max_abs(a, axis=0)
Traceback (most recent call last):
File "", line 1, in ValueError: invalid number of arguments>>> np.max_abs(a, -3)
array([[3, 3],
[3, 4]])


Reply to this email directly or view it on GitHub
#5420 (comment).

@argriffing
Copy link
Contributor Author

Yes, this is great, it works analogously to np.maximum. The np.amax wrapper of the maximum ufunc is defined here (_methods._amax is numpy.core.umath.maximum.reduce). I was wondering if max_abs should also have a wrapper. Or maybe the current max_abs could be renamed to maximum_absolute and its wrapper could be called max_abs or amax_abs?

@ewmoore
Copy link
Contributor

ewmoore commented Jan 7, 2015

I should point out, that on a glibc system, this is actually slower than the float64 path in np.linalg.norm for the inf norm until you get to very large blocks of vectors. I'm fairly sure that this is entirely due to @juliantaylor's excellent work adding vector code paths (#3419). I'm not sure off hand how to produce a build that disables those paths to check that. For complex128, where there are no SIMD code paths it does handily beat norm for relatively small sizes.

This means that for this to really be a viable replacement in norm it almost certainly needs to be vectorized. This shouldn't be too bad, since this can build from the SIMD loops for maximum that already exist.

Alex, do you want to work on this? I made the branch that I linked to above mostly as a demonstration for adding the ufunc since I touched that code before. I can in principle work on it or parts of it, but I don't want to step on your toes. And in fairness, I can't commit to a timeline nor have I written code using vector intrinsics before and I'll probably work on getting scipy/scipy#4249 merged first.

@juliantaylor
Copy link
Contributor

as implemented non-vectorized code has to branch due to the nan check which likely makes it slower, this could be handled via the fpu flags like the vector code does, but of course its better to just also vectorize it.
I can do this when something is ready (we need the non-vector code in any case for other architectures) or I can help out it if someone is interested.

@oldclesleycode
Copy link

what was the conclusion as to how to handle vectors of size 0?

@eric-wieser
Copy link
Member

I think this is a duplicate of #3763

@seberg
Copy link
Member

seberg commented Nov 18, 2019

Closing as duplicate as noted by Eric. (inf norm on empty array raising error since it calls max([])). It sounds like there may have been more issues here, but those have been resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants