Fix _fill_or_add_to_diagonal when reshape returns copy #31445


Open
wants to merge 15 commits into base: main

Conversation

lucyleeow
Member

@lucyleeow lucyleeow commented May 28, 2025

Reference Issues/PRs

Related https://github.com/scikit-learn/scikit-learn/pull/29822/files#r2101935180

What does this implement/fix? Explain your changes.

When the array is not C-contiguous (e.g., after a transpose), reshape must create a copy and cannot return a view. Thus we need to return the reshaped array_flat instead of returning nothing (i.e., we cannot rely on modifying a view, as array_flat is not always a view). The array API docs also advise against mutating views: https://data-apis.org/array-api/latest/design_topics/copies_views_and_mutation.html

Note that numpy.fill_diagonal avoids this problem, as it uses .flat instead of reshape.
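The view-vs-copy behaviour can be checked directly with numpy (an illustrative sketch of the issue, not the scikit-learn code):

```python
import numpy as np

a = np.zeros((3, 3))
t = a.T  # transposed view: F-contiguous, not C-contiguous

# For the C-contiguous array, reshape returns a view of `a`...
assert np.reshape(a, (-1,)).base is a
# ...but for the transposed array it has to copy:
flat = np.reshape(t, (-1,))
assert flat.base is not t

# So mutating the flattened array does not touch `t` (the diagonal
# step for an n x n array is n + 1 = 4 here):
flat[::4] = 1.0
assert t[0, 0] == 0.0
```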

I checked that the new test fails with main, but passes with the new changes.

Edit: I guess this didn't get picked up previously by test_pairwise_parallel (which checks that n_jobs=1 and n_jobs=2 give the same results), as we created the array with order='F' instead of transposing, which does not seem to cause the same view/copy problem.

Any other comments?

@lesteve you were right, it was entirely due to the transpose/C->F order change. The failing/passing tests found in test_pairwise_parallel_array_api were probably because the numerical instability in the euclidean calculation varied. E.g., I think it may have passed with array-api-strict float32 but not float64 just because, with the lower-precision float32, the numerical instability was under tolerance?

cc @ogrisel as you reviewed #29822


github-actions bot commented May 28, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 7c237dc. Link to the linter CI: here

Member

@thomasjpfan thomasjpfan left a comment


Overall looks good.

lucyleeow and others added 3 commits May 29, 2025 12:42
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Contributor

@OmarManzoor OmarManzoor left a comment


LGTM. Thanks @lucyleeow

@@ -557,6 +557,9 @@ def _fill_or_add_to_diagonal(array, value, xp, add_value=True, wrap=False):
array_flat[:end:step] += value
else:
array_flat[:end:step] = value
# When `array` is not C-contiguous, `reshape` must create
Contributor


Suggested change
# When `array` is not C-contiguous, `reshape` must create
# When `array` is not C-contiguous, `reshape(array, (-1,))` creates

@lesteve
Member

lesteve commented Jun 3, 2025

Maybe we should go back to using .flat for numpy and a for loop for non-numpy, similarly to what it was before #29433?

I have a PoC in a branch: https://github.com/lesteve/scikit-learn/blob/4a270381c2c21c715f58923893df87b8c503af0f/sklearn/utils/_array_api.py#L530-L575

The main arguments I have for using .flat/loop (vs reshape + use return value):

  • avoid a copy in the numpy case for a non C-contiguous array. This also avoids a copy for other namespaces but needs to do a for loop.
  • slightly less tricky code maybe? I guess this avoids the caveat that .reshape can be a copy for non C-contiguous arrays.
  • slightly harder to misuse. It's kind of easy to think that _fill_or_add_diagonal is in-place like np.fill_diagonal and not use the return value. This will mostly work in tests because array will be C-contiguous but will be broken in non C-contiguous cases as we discovered.
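A rough sketch of that `.flat`/loop idea (hypothetical names and code, not the code in the linked branch):

```python
import numpy as np

def _fill_diagonal_sketch(array, value, xp):
    # Hypothetical sketch: ``.flat`` for numpy, an explicit loop otherwise.
    min_rows_columns = min(array.shape)
    if xp is np:
        # ``.flat`` assigns through a strided view, so this works in
        # place for non C-contiguous arrays too, with no copy.
        step = array.shape[1] + 1
        array.flat[: min_rows_columns * step : step] = value
    else:
        # Other namespaces: a plain loop, also in place and copy-free.
        for i in range(min_rows_columns):
            array[i, i] = value

a = np.zeros((3, 3)).T  # not C-contiguous
_fill_diagonal_sketch(a, 1.0, np)
```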

@lucyleeow
Member Author

lucyleeow commented Jun 5, 2025

Thanks @lesteve ! You make some good points, I'm pretty convinced, especially by your last point. I had a look at your implementation, which is clever, but I thought it was a bit confusing with the assignment_function.

I created another implementation option and tested the performance of your implementation, the original implementation and my new implementation. Here are the functions in a gist. Summary of the 6 functions tested:

  • fill_reshape and add_reshape -> the original _fill_or_add_to_diagonal, named _fill_or_add_to_diagonal_reshape in the gist, as it uses reshape. I checked with add_value as both True and False.
  • fill_loop and add_loop -> my implementation, which updates value (with diagonal() + value when add_value=True, so we always assign with arr[i,i] = value[i]). I checked with add_value as both True and False.
  • fill_helper and add_helper -> your implementation, using the _fill_diagonal_helper
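For reference, the fill_loop/add_loop option can be sketched roughly like this (hypothetical code, not the gist's exact functions):

```python
import numpy as np

def _fill_or_add_to_diagonal_loop(array, value, xp, add_value=True):
    # Hypothetical sketch: fold the current diagonal into ``value`` when
    # adding, so the loop body is always ``array[i, i] = value[i]``.
    min_rows_columns = min(array.shape)
    value = xp.asarray(value, dtype=array.dtype)
    if value.ndim == 0:
        value = xp.broadcast_to(value, (min_rows_columns,))
    if add_value:
        current = xp.stack([array[i, i] for i in range(min_rows_columns)])
        value = current + value
    for i in range(min_rows_columns):
        array[i, i] = value[i]

a = np.ones((3, 3))
_fill_or_add_to_diagonal_loop(a, 2.0, np, add_value=True)
```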

With array (xp.zeros) of size 1e4, 1e4 (note I couldn't make the array much bigger as it resulted in maxing out all the memory in my colab session):

Benchmarking with array shape (10000, 10000)

Timing Results (seconds):
------------------------------------------------------------
Function             NumPy           CuPy           
------------------------------------------------------------
fill_reshape               0.000449       0.001825
add_reshape                0.000523       0.001568
fill_loop                  0.000853       0.157981
add_loop                   0.000919       0.106993
fill_helper                0.000714       0.102225
add_helper                 0.001321       0.178947

With transposed array (xp.zeros) of size 1e4, 1e4:

Benchmarking with array shape (10000, 10000)

Timing Results (seconds):
------------------------------------------------------------
Function             NumPy           CuPy           
------------------------------------------------------------
fill_reshape               1.268526       0.017735
add_reshape                1.268547       0.017209
fill_loop                  0.000769       0.106146
add_loop                   0.001124       0.101842
fill_helper                0.001136       0.101706
add_helper                 0.001776       0.180543

I also tried to measure the difference in peak memory usage with numpy with array and transposed array with memory_profiler.

C-contiguous array:

Profiling normal array:
Array info:
  Strides: (80000, 8)
  C-contiguous: True
  F-contiguous: False

Memory profile:
  Initial memory: 1010.2 MB
  Peak memory:    1010.6 MB
  Memory increase: 0.4 MB
  Final memory:   1010.6 MB

Transposed F-contiguous array

Profiling transposed array:
Array info:
  Strides: (8, 80000)
  C-contiguous: False
  F-contiguous: True

Memory profile:
  Initial memory: 1010.6 MB
  Peak memory:    1773.6 MB
  Memory increase: 763.0 MB
  Final memory:   1773.6 MB

Performance:

  • The transposed array slows down the 'reshape' implementation, but there is no consistent change to the other 2 implementations that use loops
  • The 'reshape' implementation is much faster for non-transposed arrays, but its peak memory is higher
  • My implementation is faster than yours when adding a value, but yours is faster when filling a value

wrap

  • Currently we never use wrap=True.
  • It would be difficult to implement wrap=True using diagonal() + value as diagonal does not have a wrap parameter
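For context, this is what numpy's wrap=True does in a tall matrix (illustrative only):

```python
import numpy as np

# numpy's ``wrap=True`` restarts the diagonal from column 0 in tall
# matrices instead of stopping after min(n_rows, n_cols) entries.
a = np.zeros((5, 3), dtype=int)
np.fill_diagonal(a, 1, wrap=True)
# Rows 0-2 get the usual diagonal; the diagonal then wraps to row 4
# (row 3 is skipped, because the flat step is n_cols + 1 = 4).
```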

Maybe we could have 2 separate functions, with fill_diagonal supporting wrap (just because numpy does) and add_to_diagonal not supporting wrap ?

Summary:

reshape is much faster for C-contiguous arrays, which is a shame, but I think I agree with your points (note that avoiding the copy also benefits torch, where reshape also returned a copy). Let me push a suggestion using my implementation for add and yours for fill.

I have now gone down this rabbit hole way too far 😅

Member Author

@lucyleeow lucyleeow left a comment


I've realised that wrap is a lot harder to implement in a loop.

I think I would vote for a minimal implementation that does not include things we never use - wrap and 'repeating' value (see comment below).

wrap would be easier to implement than the 'repeating' value, but we currently never use it, so maybe I would go with YAGNI. We can always amend if, in some future case, it becomes necessary.

Edit: note that for fill_loop and add_loop (in the performance checks above), I used numpy.fill_diagonal directly, so the current implementation should be slower than shown above.

if add_value:
array_flat[:end:step] += value
if _is_numpy_namespace(xp):
assignment_function(array.flat, slice(None, end, step), value)
Member Author

@lucyleeow lucyleeow Jun 5, 2025


I've realised that this implementation technically differs from our loop implementation: if value is 1D but shorter than the diagonal, value is repeated to fill all diagonal entries, as numpy does; see the docs:

If array-like, the flattened val is written along the diagonal, repeating if necessary to fill all diagonal entries.

(Though looking at all our use cases, value should always be the correct size)

With the loop implementation, there is no repeating.
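The difference can be seen directly (illustrative only):

```python
import numpy as np

# numpy repeats a too-short ``value`` along the diagonal...
a = np.zeros((4, 4))
np.fill_diagonal(a, [1, 2])
# ...whereas a per-index loop like ``array[i, i] = value[i]`` would
# raise an IndexError for a length-2 value on a 4 x 4 array.
```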

I am tempted to use the loop, even for numpy namespace, just to ensure consistent results...

@lucyleeow
Member Author

lucyleeow commented Jun 6, 2025

Test failures are because array-api-strict doesn't have diagonal, which is technically in the linalg section.
The array api compat has:

# xp.diagonal and xp.trace operate on the first two axes whereas these
# operates on the last two
def diagonal(x: Array, /, xp: Namespace, *, offset: int = 0, **kwargs: object) -> Array:
return xp.diagonal(x, offset=offset, axis1=-2, axis2=-1, **kwargs)

which does not work for array api strict...?

Edit: Ignore this, I did not realise there was a linalg extension 😬

array[i, i] = value
else:
for i in range(min_rows_columns):
array[i, i] = value[i]
Member


Suggested change
array[i, i] = value[i]
array[i, i] = value[i]
return array

We use this as x = _fill_or_add_to_diagonal() so we need to return something (or change how we use it). Otherwise x ends up being None.

Member

@ogrisel ogrisel Jun 10, 2025


I agree. Furthermore, for functions that mutate the values of the array passed as argument, we might want to use "inplace" in their name to avoid bad surprises: _inplace_fill_or_add_to_diagonal.

Finally, we might want to anticipate the fact that __setitem__ is not supported by jax. The alternative would be to use the .at operator of array-api-extra (https://data-apis.org/array-api-extra/generated/array_api_extra.at.html) to anticipate the support of libraries that do not support inplace operations but instead emulate them.

BTW, I wouldn't mind splitting this helper function into two functions:

  • _inplace_add_to_diagonal
  • _inplace_fill_diagonal

I think that would help make the callers' code more readable.

Member Author


So we decided to change this to be in-place because:

It's kind of easy to think that _fill_or_add_diagonal is in-place like np.fill_diagonal and not use the return value. This will mostly work in tests because array will be C-contiguous but will be broken in non C-contiguous cases as we discovered.

(from above)

I was not familiar with numpy's fill_diagonal before, so I can't comment on whether this would be a common mistake to make.
I do see that most functions in _array_api.py return a value, but I think their original numpy counterparts also do, unlike np.fill_diagonal, which is in-place.

@lesteve may like to comment?
