API: Remove broadcasting ambiguity from np.linalg.solve #25914


Merged: 5 commits into numpy:main on Mar 6, 2024

Conversation

@asmeurer (Member) commented Mar 1, 2024

Previously the np.linalg.solve documentation stated:

a : (..., M, M) array_like
    Coefficient matrix.
b : {(..., M,), (..., M, K)}, array_like

however, this is inherently ambiguous. For example, if a has shape (2, 2, 2) and b has shape (2, 2), b could be treated as a (2,) stack of (2,) column vectors, in which case the result should have shape (2, 2), or as a single 2x2 matrix, in which case the result should have shape (2, 2, 2).

NumPy resolved this ambiguity in a confusing way, which was to treat b as (..., M) whenever b.ndim == a.ndim - 1, and as (..., M, K) otherwise.
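The two dispatch rules can be contrasted with a pure-shape sketch (the helper names and formulation here are illustrative, not NumPy's actual implementation):

```python
def uses_vector_path_old(a_ndim, b_ndim):
    # NumPy < 2.0: b treated as a stack of vectors whenever
    # b.ndim == a.ndim - 1
    return b_ndim == a_ndim - 1

def uses_vector_path_new(a_ndim, b_ndim):
    # NumPy >= 2.0: b treated as a single vector iff it is exactly 1-D
    return b_ndim == 1

# The ambiguous case above: a.shape == (2, 2, 2), b.shape == (2, 2)
print(uses_vector_path_old(3, 2))  # True:  result shape (2, 2)
print(uses_vector_path_new(3, 2))  # False: result shape (2, 2, 2)
```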

A much more consistent way to handle this ambiguity is to treat b as a single vector if and only if it is 1-dimensional, i.e., use

b : {(M,), (..., M, K)}, array_like

This is consistent with, for instance, the matmul operator, which only uses the special 1-D vector logic if an operand is exactly 1-dimensional, and treats the operands as (stacks of) 2-D matrices otherwise.
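For reference, matmul's exactly-1-D special case is easy to see from shapes alone (this behaviour is the same on any recent NumPy):

```python
import numpy as np

a = np.ones((3, 2, 2))

# A 1-D operand gets the vector treatment: the vector axis is appended,
# the product computed, and the axis removed again.
v = np.ones(2)
print((a @ v).shape)   # (3, 2)

# A 2-D operand is a matrix, never a stack of vectors.
m = np.ones((2, 2))
print((a @ m).shape)   # (3, 2, 2)
```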

This updates np.linalg.solve() to use this behavior.

This is a backwards compatibility break, as any instance where the b array has more than one dimension and exactly one fewer dimension than the a array will now use the matrix logic, potentially returning a different result with a different shape. The previous behavior can be manually emulated with something like

np.linalg.solve(a, b[..., None])[..., 0]

since b as a (M,) vector is equivalent to b as a (M, 1) matrix (or the user could manually import and use the internal solve1() gufunc which implements the b-as-vector logic).
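That equivalence can be sketched with an unstacked a, where the behaviour is identical before and after this change:

```python
import numpy as np

a = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x_vec = np.linalg.solve(a, b)                 # b as an (M,) vector
x_mat = np.linalg.solve(a, b[:, None])[:, 0]  # same b as an (M, 1) matrix

print(np.allclose(x_vec, x_mat))  # True
```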

This change aligns the solve() function with the array API, which resolves this broadcasting ambiguity in this way. See
https://data-apis.org/array-api/latest/extensions/generated/array_api.linalg.solve.html#array_api.linalg.solve and data-apis/array-api#285.

Fixes #15349
Fixes #25583

@asmeurer (Member, Author) commented Mar 2, 2024

spin test is passing for me locally, but doesn't seem to be including the tests that are failing on CI. Is spin test not the proper way to run the test suite?

@rgommers (Member) commented Mar 3, 2024

spin test is passing for me locally, but doesn't seem to be including the tests that are failing on CI. Is spin test not the proper way to run the test suite?

It is. spin test -m full -- numpy/linalg/tests/test_linalg.py -k TestSolve will reproduce the failures. When tests are marked with @pytest.mark.slow they only run when you add the -m full.

@rgommers (Member) commented Mar 3, 2024

The failures are for:

>>> a = np.ones((3, 0, 0))
>>> b = np.ones((3, 0))

>>> np.linalg.solve(a, b)  # with `main`
array([], shape=(3, 0), dtype=float64)

>>> np.linalg.solve(a, b)  # with this PR
ValueError: solve: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m,n)->(m,n) (size 3 is different from 0)

Second failure is the same, but then with shapes (3, 2, 2) / (3, 2).

@seberg added this to the 2.0.0 release milestone on Mar 4, 2024
@asmeurer (Member, Author) commented Mar 4, 2024

That's really annoying. Why are the full tests not run by default? The full test suite only takes two and a half minutes to run on my machine. And this option isn't even documented in spin test -h.

@asmeurer (Member, Author) commented Mar 4, 2024

The linalg tests are formulated in a rather odd way. All the test cases have b matrices, but almost no tests use them. Only solve and lstsq do, and of them, only solve uses these particular "generalized" cases where a and b are stacked. All the other cases take b as an argument but ignore it, e.g.,

class InvCases(LinalgSquareTestCase, LinalgGeneralizedSquareTestCase):
    def do(self, a, b, tags):
        a_inv = linalg.inv(a)
        assert_almost_equal(dot_generalized(a, a_inv),
                            identity_like_generalized(a))
        assert_(consistent_subclass(a_inv, a))

asmeurer added 2 commits March 4, 2024 16:11
The dot_generalized() function is no longer needed. It was primarily there to
handle the odd case for solve(), but the updated logic for that is now put in
TestSolve itself, and dot_generalized is replaced with matmul() everywhere
else.
@asmeurer (Member, Author) commented Mar 4, 2024

OK, I've updated the tests. The linalg test cases had some odd dot_generalized helper which seemed to exist primarily to handle the odd solve() behavior. I've replaced it with matmul everywhere and moved the (updated) special handling of the solve logic into the TestSolve test itself.

The linalg test cases are still not great, but I'm not sure I'm up for trying to improve them here. Like I said, they all oddly reuse test cases with a and b even for functions that only test a. A probably more pertinent issue is that none of them really test broadcasting at all. I suppose that sort of thing might be standard since broadcasting logic is always handled by the (g)ufunc, but in this case, it might be a good idea to test it explicitly since solve() isn't exactly a gufunc, but rather two different gufuncs depending on the dimensionality of b.
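A minimal sketch of such an explicit broadcasting test, using a case whose dispatch is unambiguous under both the old and the new rules (the test name is illustrative):

```python
import numpy as np

def test_solve_broadcasting():
    # Stacked a with a matching stacked matrix rhs broadcasts over the
    # leading stack axis; b.ndim != a.ndim - 1 and b.ndim != 1, so both
    # the old and new rules take the matrix path here.
    a = np.stack([np.eye(2)] * 3)   # shape (3, 2, 2)
    b = np.ones((3, 2, 5))          # shape (3, 2, 5)
    assert np.linalg.solve(a, b).shape == (3, 2, 5)

test_solve_broadcasting()
```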

@mattip (Member) commented Mar 6, 2024

Aside:

And this option isn't even documented in spin test -h

For me spin test --help has the line

By default, spin will run -m 'not slow'. To run the full test suite, use spin -m full

@mattip (Member) left a comment
Historically, there is a difference between the broadcasting rules of np.dot and np.matmul; maybe the current behaviour is due to that discrepancy somehow. The rule here and in the Array API is much clearer.

This does need a release note. I sent a message to the mailing list since it is an API change.

@stefanv (Contributor) commented Mar 6, 2024

Agreed that reducing this ambiguity is a good idea, especially since we have the opportunity to do so with 2.0.

@mattip (Member) commented Mar 6, 2024

Let's merge this, we can revert if the mailing list conversation gets heated (unlikely).

@mattip merged commit 7fc3d0f into numpy:main on Mar 6, 2024
@mattip (Member) commented Mar 6, 2024

Thanks @asmeurer

@asmeurer (Member, Author) commented Mar 6, 2024

Ah, I completely forgot that I never wrote a release note for this. Do you still want me to do that, or did you take care of it?

@rgommers (Member) commented Mar 8, 2024

Ah, I completely forgot that I never wrote a release note for this. Do you still want me to do that, or did you take care of it?

For completeness: the release note is included in gh-25937. Thanks for this PR!
