`tf.linalg.lu_solve` returns inconsistent output from CPU vs GPU

### Issue type

Bug

### Have you reproduced the bug with TensorFlow Nightly?

Yes

### Source

source

### TensorFlow version

2.21.0-dev20250801

### Custom code

Yes

### OS platform and distribution

Linux Ubuntu 24.04

### Mobile device

_No response_

### Python version

3.12

### Bazel version

_No response_

### GCC/compiler version

_No response_

### CUDA/cuDNN version

_No response_

### GPU model and memory

_No response_

### Current behavior?

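When `tf.linalg.lu_solve` is run on a `float64` tensor of large (but not overflowing) values, the CPU and GPU return wildly different results. In the reproduction below, element `[0, 2]` of the output is `-0.0` on CPU and `-47201.82360354644` on GPU.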

This is also reproducible with TensorFlow `2.19.0`; please take a look at the [gist](https://colab.research.google.com/gist/jiren-the-gray/d9e7de1423286e756a272e7a3d263a74/tf_playground.ipynb).

### Standalone code to reproduce the issue

```python
import tensorflow as tf
import numpy as np

print("TensorFlow version:", tf.__version__)  # TensorFlow version: 2.21.0-dev20250801

# Fixed seed so the inputs are identical on every run.
rng = np.random.default_rng(957)

# Packed LU factors with entries close to -2**63 (large, but far from float64 overflow).
lower_upper = tf.constant(rng.uniform(-9223372036854772000., -9223372036854771000., size=(3, 3)), dtype=tf.float64)
# uniform(0., 0.) yields all zeros, so perm is [0, 0, 0].
perm = tf.constant(rng.uniform(0., 0., size=(3,)), dtype=tf.int64)
rhs = tf.constant(rng.uniform(-100., 100., size=(3, 3)), dtype=tf.float64)
validate_args = True
name = "constant"

# Same inputs, solved on CPU.
with tf.device("/CPU:0"):
    result = tf.linalg.lu_solve(
        lower_upper=lower_upper,
        perm=perm,
        rhs=rhs,
        validate_args=validate_args,
        name=name
    )
    print("Result (CPU):\n", result.numpy()[0, 2])

# Same inputs, solved on GPU.
with tf.device("/GPU:0"):
    result = tf.linalg.lu_solve(
        lower_upper=lower_upper,
        perm=perm,
        rhs=rhs,
        validate_args=validate_args,
        name=name
    )
    print("Result (GPU):\n", result.numpy()[0, 2])
```
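
For context on the `lower_upper` input: the sampled range is only 1000 wide, while adjacent `float64` values just below 2^63 are 1024 apart, so very few representable values fall in or next to that range. The standard-library check below is a small sketch added for illustration and is not part of the original reproduction; the names `low` and `high` are just the two bounds copied from the call to `rng.uniform`.

```python
import math

# The two bounds passed to rng.uniform for lower_upper.
low, high = -9223372036854772000.0, -9223372036854771000.0

# Spacing between adjacent float64 values at this magnitude.
print(math.ulp(abs(low)))  # 1024.0

# The decimal literals are themselves rounded to the nearest float64.
print(int(low))   # -9223372036854771712
print(int(high))  # -9223372036854770688
```

This suggests the nine entries of `lower_upper` are nearly identical, which may make the solve very sensitive to rounding on either device.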

### Relevant log output

```shell
...
TensorFlow version: 2.21.0-dev20250801
...
Result (CPU):
 -0.0
Result (GPU):
 -47201.82360354644
```
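
To judge which device is closer to the mathematically expected value, one option is an exact-arithmetic reference. The sketch below is illustrative only: `lu_solve_exact` is a hypothetical helper, not a TensorFlow API, and it assumes that `lu_solve` gathers the rows of `rhs` by `perm`, then performs a forward solve with the unit-diagonal lower factor and a back solve with the upper factor, both read from the packed `lower_upper`. That convention is an assumption and has not been checked against the TensorFlow source.

```python
from fractions import Fraction


def lu_solve_exact(lower_upper, perm, rhs):
    """Solve L @ U @ x = rhs[perm] in exact rational arithmetic.

    L is the strict lower triangle of `lower_upper` with an implicit unit
    diagonal; U is its upper triangle including the diagonal.
    All arguments are plain nested Python lists.
    """
    n = len(lower_upper)
    lu = [[Fraction(v) for v in row] for row in lower_upper]
    # Assumed convention: rows of rhs are gathered by perm.
    b = [[Fraction(v) for v in rhs[p]] for p in perm]
    cols = len(b[0])

    # Forward substitution with the unit-diagonal lower factor.
    y = [[Fraction(0)] * cols for _ in range(n)]
    for i in range(n):
        for c in range(cols):
            y[i][c] = b[i][c] - sum(lu[i][k] * y[k][c] for k in range(i))

    # Back substitution with the upper factor.
    x = [[Fraction(0)] * cols for _ in range(n)]
    for i in reversed(range(n)):
        for c in range(cols):
            s = sum(lu[i][k] * x[k][c] for k in range(i + 1, n))
            x[i][c] = (y[i][c] - s) / lu[i][i]
    return x


# Small sanity check: L = [[1, 0], [0.5, 1]], U = [[2, 1], [0, 3]].
x = lu_solve_exact([[2.0, 1.0], [0.5, 3.0]], [0, 1], [[4.0], [11.0]])
print([[float(v) for v in row] for row in x])  # [[0.5], [3.0]]
```

Passing `lower_upper.numpy().tolist()`, `perm.numpy().tolist()`, and `rhs.numpy().tolist()` from the reproduction into this helper and converting entry `[0][2]` with `float(...)` would give a reference value to compare against both device outputs.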

`tf.linalg.lu_solve` returns inconsistent output from CPU vs GPU #98345
