[WIP] cast to bf16 before mul op in flex bwd #154922
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154922
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Unrelated Failure as of commit be3407e with merge base 48807d5.
NEW FAILURES - The following jobs have failed:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Commits: a15c930 to be3407e
# If mul was upcasted from fp8 to bf16, we need to downcast it back to fp8.
if upcast_from_fp8:
    mul_delta = lowerings[prims.convert_element_type](mul_delta, orig_fp8_dtype)

delta = lowerings[aten.sum](mul_delta, axis=-1)
delta = lowerings[aten.sub](delta, grad_lse_exp2)
hmm, separate from the error you're seeing, are you going to run into similar issues trying to run aten.sum/sub in triton with fp8 inputs? You may need to delay downcasting back to fp8 until after these ops.
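A sketch of that reordering, reusing the names from the diff above (this is not code from the PR, only an illustration of the suggestion, and it assumes the surrounding lowering context still provides `lowerings`, `mul_delta`, `grad_lse_exp2`, `upcast_from_fp8`, and `orig_fp8_dtype`): keep the reduction and subtraction in bf16 and only convert back to the original fp8 dtype at the end.

```python
# Sketch only: run sum/sub while still in bf16, then downcast the result.
delta = lowerings[aten.sum](mul_delta, axis=-1)
delta = lowerings[aten.sub](delta, grad_lse_exp2)
if upcast_from_fp8:
    delta = lowerings[prims.convert_element_type](delta, orig_fp8_dtype)
```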
I feel like this is something we could do elsewhere, potentially similar to how we upcast fp16 for unsupported operators. See pytorch/torch/_inductor/codegen/triton.py, line 775 in a1a268a:

def maybe_upcast_float32(convert_output: bool = True) -> Callable[[_T], _T]:

although that's just pointwise, not for sum.
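For illustration, here is the general shape of such an upcast wrapper as a standalone eager-mode sketch; this is not the actual maybe_upcast_float32 implementation, and the helper name, dtype list, and defaults are assumptions made for the example.

```python
import torch
from typing import Callable

# Assumed set of fp8 dtypes to intercept (illustrative, not exhaustive).
FP8_DTYPES = (torch.float8_e4m3fn, torch.float8_e5m2)

def upcast_fp8_inputs(
    fn: Callable,
    compute_dtype: torch.dtype = torch.bfloat16,
    convert_output: bool = True,
) -> Callable:
    """Hypothetical helper: upcast fp8 tensor args before `fn`, optionally downcast the result."""
    def wrapper(*args, **kwargs):
        orig_dtype = None
        new_args = []
        for a in args:
            if isinstance(a, torch.Tensor) and a.dtype in FP8_DTYPES:
                orig_dtype = a.dtype
                a = a.to(compute_dtype)
            new_args.append(a)
        out = fn(*new_args, **kwargs)
        # Downcast only if something was upcast and the caller asked for it.
        if convert_output and orig_dtype is not None and isinstance(out, torch.Tensor):
            out = out.to(orig_dtype)
        return out
    return wrapper

# Example: wrap an elementwise op that lacks an fp8 kernel.
mul_with_fp8 = upcast_fp8_inputs(torch.mul)
```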
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Issue: #154750
fp8 flex attention works in the forward pass but errors in the backward pass because the mul op is not supported for fp8 dtypes.
This PR is a WIP to figure out the best way to address this.
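As a plain eager-mode illustration of the idea in the title (the PR itself changes the inductor lowering, not eager code; the shapes and dtype choice here are arbitrary): upcast the fp8 operands to bf16 for the multiply, then cast the product back to fp8.

```python
import torch

# Illustration only: elementwise mul is generally unavailable for fp8 tensors,
# so compute the product in bf16 and convert the result back to the original
# fp8 dtype afterwards.
a = torch.randn(8, 8).to(torch.float8_e4m3fn)
b = torch.randn(8, 8).to(torch.float8_e4m3fn)

prod = (a.to(torch.bfloat16) * b.to(torch.bfloat16)).to(a.dtype)
```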
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben