Fix _fill_or_add_to_diagonal when reshape returns copy #31445


Open
wants to merge 15 commits into base: main

Conversation

lucyleeow
Member

@lucyleeow lucyleeow commented May 28, 2025

Reference Issues/PRs

Related https://github.com/scikit-learn/scikit-learn/pull/29822/files#r2101935180

What does this implement/fix? Explain your changes.

When the array is not C-contiguous (e.g., after a transpose), reshape must create a copy and cannot return a view. Thus we need to return the reshaped array_flat instead of returning nothing (i.e., we cannot rely on modifying a view, as array_flat is not always a view). The array API docs also advise against mutating views: https://data-apis.org/array-api/latest/design_topics/copies_views_and_mutation.html

Note that numpy.fill_diagonal avoids this problem, as it uses .flat instead of reshape.
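The view-vs-copy behaviour can be checked directly with numpy (an illustrative sketch of the issue, not the scikit-learn code):

```python
import numpy as np

a = np.zeros((3, 3))
t = a.T  # transposed view: F-contiguous, not C-contiguous

# For the C-contiguous array, reshape returns a view of `a`...
assert np.reshape(a, (-1,)).base is a
# ...but for the transposed array it has to copy:
flat = np.reshape(t, (-1,))
assert flat.base is not t

# So mutating the flattened array does not touch `t` (the diagonal
# step for an n x n array is n + 1 = 4 here):
flat[::4] = 1.0
assert t[0, 0] == 0.0
```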

I checked that the new test fails with main, but passes with the new changes.

Edit: I guess this didn't get picked up previously by test_pairwise_parallel (which checks that n_jobs=1 and n_jobs=2 give the same results), as we created the array with order='F' instead of transposing, which does not seem to cause the same view/copy problem.

Any other comments?

@lesteve you were right, it was entirely due to the transpose/C->F order change. The failing/passing tests found in test_pairwise_parallel_array_api were probably because the numerical instability in the euclidean calculation varied. E.g., I think it may have passed with array-api-strict float32 but not float64 just because, with the lower-precision float32, the numerical instability was under tolerance?

cc @ogrisel as you reviewed #29822


github-actions bot commented May 28, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 7c237dc. Link to the linter CI: here

Member

@thomasjpfan thomasjpfan left a comment


Overall looks good.

lucyleeow and others added 3 commits May 29, 2025 12:42
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Contributor

@OmarManzoor OmarManzoor left a comment


LGTM. Thanks @lucyleeow

@@ -557,6 +557,9 @@ def _fill_or_add_to_diagonal(array, value, xp, add_value=True, wrap=False):
array_flat[:end:step] += value
else:
array_flat[:end:step] = value
# When `array` is not C-contiguous, `reshape` must create
Contributor


Suggested change
# When `array` is not C-contiguous, `reshape` must create
# When `array` is not C-contiguous, `reshape(array, (-1,))` creates

@lesteve
Member

lesteve commented Jun 3, 2025

Maybe we should go back to using .flat for numpy and a for loop for non-numpy, similarly to what it was before #29433?

I have a PoC in a branch: https://github.com/lesteve/scikit-learn/blob/4a270381c2c21c715f58923893df87b8c503af0f/sklearn/utils/_array_api.py#L530-L575

The main arguments I have for using .flat/loop (vs reshape + use return value):

  • avoid a copy in the numpy case for a non C-contiguous array. This also avoids a copy for other namespaces but needs to do a for loop.
  • slightly less tricky code maybe? I guess this avoids the caveat that .reshape can be a copy for non C-contiguous arrays.
  • slightly harder to misuse. It's kind of easy to think that _fill_or_add_diagonal is in-place like np.fill_diagonal and not use the return value. This will mostly work in tests because array will be C-contiguous but will be broken in non C-contiguous cases as we discovered.
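A rough sketch of that `.flat`/loop idea (hypothetical names and code, not the code in the linked branch):

```python
import numpy as np

def _fill_diagonal_sketch(array, value, xp):
    # Hypothetical sketch: ``.flat`` for numpy, an explicit loop otherwise.
    min_rows_columns = min(array.shape)
    if xp is np:
        # ``.flat`` assigns through a strided view, so this works in
        # place for non C-contiguous arrays too, with no copy.
        step = array.shape[1] + 1
        array.flat[: min_rows_columns * step : step] = value
    else:
        # Other namespaces: a plain loop, also in place and copy-free.
        for i in range(min_rows_columns):
            array[i, i] = value

a = np.zeros((3, 3)).T  # not C-contiguous
_fill_diagonal_sketch(a, 1.0, np)
```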

@lucyleeow
Member Author

lucyleeow commented Jun 5, 2025

Thanks @lesteve ! You make some good points, I'm pretty convinced, especially by your last point. I had a look at your implementation, which is clever, but I thought it was a bit confusing with the assignment_function.

I created another implementation option and tested the performance of your implementation, the original implementation and my new implementation. Here are the functions in a gist. Summary of the 6 functions tested:

  • fill_reshape and add_reshape -> the original _fill_or_add_to_diagonal, named _fill_or_add_to_diagonal_reshape in the gist, as it uses reshape. I checked with add_value as both True and False.
  • fill_loop and add_loop -> my implementation, which updates value (with diagonal() + value when add_value=True, so we always assign with arr[i,i] = value[i]). I checked with add_value as both True and False.
  • fill_helper and add_helper -> your implementation, using the _fill_diagonal_helper
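For reference, the fill_loop/add_loop option can be sketched roughly like this (hypothetical code, not the gist's exact functions):

```python
import numpy as np

def _fill_or_add_to_diagonal_loop(array, value, xp, add_value=True):
    # Hypothetical sketch: fold the current diagonal into ``value`` when
    # adding, so the loop body is always ``array[i, i] = value[i]``.
    min_rows_columns = min(array.shape)
    value = xp.asarray(value, dtype=array.dtype)
    if value.ndim == 0:
        value = xp.broadcast_to(value, (min_rows_columns,))
    if add_value:
        current = xp.stack([array[i, i] for i in range(min_rows_columns)])
        value = current + value
    for i in range(min_rows_columns):
        array[i, i] = value[i]

a = np.ones((3, 3))
_fill_or_add_to_diagonal_loop(a, 2.0, np, add_value=True)
```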

With array (xp.zeros) of size 1e4, 1e4 (note I couldn't make the array much bigger as it resulted in maxing out all the memory in my colab session):

Benchmarking with array shape (10000, 10000)

Timing Results (seconds):
------------------------------------------------------------
Function             NumPy           CuPy           
------------------------------------------------------------
fill_reshape               0.000449       0.001825
add_reshape                0.000523       0.001568
fill_loop                  0.000853       0.157981
add_loop                   0.000919       0.106993
fill_helper                0.000714       0.102225
add_helper                 0.001321       0.178947

With transposed array (xp.zeros) of size 1e4, 1e4:

Benchmarking with array shape (10000, 10000)

Timing Results (seconds):
------------------------------------------------------------
Function             NumPy           CuPy           
------------------------------------------------------------
fill_reshape               1.268526       0.017735
add_reshape                1.268547       0.017209
fill_loop                  0.000769       0.106146
add_loop                   0.001124       0.101842
fill_helper                0.001136       0.101706
add_helper                 0.001776       0.180543

I also tried to measure the difference in peak memory usage with numpy with array and transposed array with memory_profiler.

C-contiguous array:

Profiling normal array:
Array info:
  Strides: (80000, 8)
  C-contiguous: True
  F-contiguous: False

Memory profile:
  Initial memory: 1010.2 MB
  Peak memory:    1010.6 MB
  Memory increase: 0.4 MB
  Final memory:   1010.6 MB

Transposed F-contiguous array

Profiling transposed array:
Array info:
  Strides: (8, 80000)
  C-contiguous: False
  F-contiguous: True

Memory profile:
  Initial memory: 1010.6 MB
  Peak memory:    1773.6 MB
  Memory increase: 763.0 MB
  Final memory:   1773.6 MB

Performance:

  • The transposed array slows down the 'reshape' implementation, but there is no consistent change to the other 2 implementations that use loops
  • The 'reshape' implementation is much faster for non-transposed arrays, but its peak memory is higher
  • My implementation is faster than yours when adding a value, but yours is faster when filling a value

wrap

  • Currently we never use wrap=True.
  • It would be difficult to implement wrap=True using diagonal() + value as diagonal does not have a wrap parameter
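For context, this is what numpy's wrap=True does in a tall matrix (illustrative only):

```python
import numpy as np

# numpy's ``wrap=True`` restarts the diagonal from column 0 in tall
# matrices instead of stopping after min(n_rows, n_cols) entries.
a = np.zeros((5, 3), dtype=int)
np.fill_diagonal(a, 1, wrap=True)
# Rows 0-2 get the usual diagonal; the diagonal then wraps to row 4
# (row 3 is skipped, because the flat step is n_cols + 1 = 4).
```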

Maybe we could have 2 separate functions, with fill_diagonal supporting wrap (just because numpy does) and add_to_diagonal not supporting wrap ?

Summary:

reshape is much faster for C-contiguous arrays, which is a shame, but I think I agree with your points (note that avoiding the copy also benefits torch, where reshape also returned a copy). Let me push a suggestion using my implementation for add and yours for fill.

I have now gone down this rabbit hole way too far 😅

Member Author

@lucyleeow lucyleeow left a comment


I've realised that wrap is a lot harder to implement in a loop.

I think I would vote for a minimal implementation that does not include things we never use - wrap and 'repeating' value (see comment below).

wrap would be easier to implement than the 'repeating' value, but we currently never use it, so maybe I would go with YAGNI. We can always amend if, in some future case, it becomes necessary.

Edit: note that for fill_loop and add_loop (in the performance checks above), I used numpy.fill_diagonal directly, so the current implementation should be slower than shown above.

if add_value:
array_flat[:end:step] += value
if _is_numpy_namespace(xp):
assignment_function(array.flat, slice(None, end, step), value)
Member Author

@lucyleeow lucyleeow Jun 5, 2025


I've realised that this implementation technically differs from our loop implementation: if value is 1D but shorter than the diagonal, value is repeated to fill all diagonal entries, as numpy does; see the docs:

If array-like, the flattened val is written along the diagonal, repeating if necessary to fill all diagonal entries.

(Though looking at all our use cases, value should always be the correct size)

With the loop implementation, there is no repeating.
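The difference can be seen directly (illustrative only):

```python
import numpy as np

# numpy repeats a too-short ``value`` along the diagonal...
a = np.zeros((4, 4))
np.fill_diagonal(a, [1, 2])
# ...whereas a per-index loop like ``array[i, i] = value[i]`` would
# raise an IndexError for a length-2 value on a 4 x 4 array.
```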

I am tempted to use the loop, even for numpy namespace, just to ensure consistent results...

@lucyleeow
Member Author

lucyleeow commented Jun 6, 2025

Test failures are because array-api-strict doesn't have diagonal, which is technically in the linalg section.
The array api compat has:

# xp.diagonal and xp.trace operate on the first two axes whereas these
# operates on the last two
def diagonal(x: Array, /, xp: Namespace, *, offset: int = 0, **kwargs: object) -> Array:
return xp.diagonal(x, offset=offset, axis1=-2, axis2=-1, **kwargs)

which does not work for array api strict...?

Edit: Ignore this, I did not realise there was a linalg extension 😬

array[i, i] = value
else:
for i in range(min_rows_columns):
array[i, i] = value[i]
Member


Suggested change
array[i, i] = value[i]
array[i, i] = value[i]
return array

We use this as x = _fill_or_add_to_diagonal() so we need to return something (or change how we use it). Otherwise x ends up being None.

Member

@ogrisel ogrisel Jun 10, 2025


I agree. Furthermore, for functions that mutate the values of the array passed as argument, we might want to use "inplace" in their name to avoid bad surprises: _inplace_fill_or_add_to_diagonal.

Finally, we might want to anticipate the fact that __setitem__ is not supported by jax. The alternative would be to use the .at operator of array-api-extra (https://data-apis.org/array-api-extra/generated/array_api_extra.at.html) to anticipate the support of libraries that do not support inplace operations but instead emulate them.

BTW, I wouldn't mind splitting this helper function into two functions:

  • _inplace_add_to_diagonal
  • _inplace_fill_diagonal

I think that would help make the callers' code more readable.

Member Author


So we decided to change this to be in-place because:

It's kind of easy to think that _fill_or_add_diagonal is in-place like np.fill_diagonal and not use the return value. This will mostly work in tests because array will be C-contiguous but will be broken in non C-contiguous cases as we discovered.

(from above)

I was not familiar with numpy's fill_diagonal before, so I can't comment on whether this would be a common mistake to make.
I do see that most functions in _array_api.py return a value, but I think their original numpy counterparts also do, unlike np.fill_diagonal, which is in-place.

@lesteve may like to comment?
