[cuDNN] cuDNN frontend for LayerNorm RMSNorm #159682

AaronWang04 · 2025-08-01T22:13:32Z

cuDNN performance as of 9.10 is still not great.

If it ever becomes good, this PR could streamline the process of adding an experimental cuDNN backend for LayerNorm and RMSNorm.

Follows examples at: https://github.com/NVIDIA/cudnn-frontend/tree/main/samples/cpp/norm

pytorch-bot · 2025-08-01T22:13:36Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159682

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ghstack-mergeability-check and Check labels failing with 'Resource not accessible by integration'

❌ 2 New Failures, 2 Unrelated Failures

As of commit 6030457 with merge base aeb5321:

NEW FAILURES - The following jobs have failed:

Lint / Link checks / lint-urls / linux-job (gh)
RuntimeError: Command docker exec -t 0db45f368af7ed9be81055b6f89246092c6e7ec72e24019d1976072fa59d3307 /exec failed with exit code 1
pull / linux-jammy-rocm-py3.10 / build (gh)
ninja: build stopped: subcommand failed

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

Check Labels / Check labels (gh) (#159894)
pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable) (gh) (#158876)
/var/lib/jenkins/workspace/xla/torch_xla/csrc/runtime/BUILD:476:14: Compiling torch_xla/csrc/runtime/xla_util_test.cpp failed: (Exit 1): gcc failed: error executing CppCompile command (from target //torch_xla/csrc/runtime:xla_util_test) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 229 arguments skipped)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2025-08-01T22:14:07Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.

If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Skylion007 · 2025-08-03T16:11:40Z

FYI: @eqy had a stale PR doing it

eqy · 2025-08-03T16:54:51Z

haha yes this is based on that

Skylion007 · 2025-08-04T15:48:59Z

aten/src/ATen/native/cudnn/LayerNorm_v8.cpp

+};
+
+void setLayerNormParams(LayerNormParams& params, const Tensor& X, int64_t M, int64_t N) {
+  memset(&params, 0, sizeof(params));


Use stdlib std::memset
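
A minimal sketch of what this suggestion could look like. `ExampleParams` and its fields are hypothetical stand-ins, since the body of `LayerNormParams` is not shown in the excerpt above. Structs like this are often zeroed because they are later hashed or compared byte-wise as a cache key, so padding bytes need a defined value; whether that is the reason here is an assumption.

```cpp
#include <cstdint>
#include <cstring>  // declares std::memset

// Hypothetical stand-in for the PR's params struct (real fields not shown).
struct ExampleParams {
  int64_t M;
  int64_t N;
  float eps;
};

void setExampleParams(ExampleParams& params, int64_t M, int64_t N) {
  // Qualified call: <cstring> guarantees std::memset, while the unqualified
  // global name is only guaranteed by <string.h>.
  std::memset(&params, 0, sizeof(params));
  params.M = M;
  params.N = N;
}
```

The same change would apply to any other setter that zeroes its params struct with an unqualified `memset`.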

Skylion007 · 2025-08-04T15:50:10Z

aten/src/ATen/native/cudnn/LayerNorm_v8.cpp

+      {Y_fe, Y->data_ptr()}};
+    variant_pack = std::move(variant_pack_);
+    auto result = std::make_tuple(layernorm_graph, X_fe, mean_fe, inv_variance_fe, scale_fe, bias_fe, Y_fe);
+    layernorm_forward_graph_cache.update(key, result);


Can result not be moved here too?
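
One way this could be sketched, assuming `result` is not read again after the cache update. `ExampleGraphCache` and `ExampleEntry` are hypothetical miniatures, not the PR's types. Note that an rvalue cannot bind to a non-const lvalue reference, so passing `std::move(result)` only compiles if `update` accepts its argument by value, by rvalue reference, or by forwarding reference.

```cpp
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>

// Hypothetical miniature cache; the PR's real cache type is not shown in full.
template <typename T, typename KeyType>
struct ExampleGraphCache {
  std::unordered_map<KeyType, T> map;

  // By-value "sink" parameter: lvalue arguments are copied in, rvalue
  // arguments are moved in, and the map then takes ownership via std::move.
  void update(const KeyType& key, T results) {
    map.insert_or_assign(key, std::move(results));
  }
};

// Stand-in for the tuple of graph and tensor handles built in the PR.
using ExampleEntry = std::tuple<std::string, int>;

void storeExample(ExampleGraphCache<ExampleEntry, int>& cache, int key) {
  auto result = std::make_tuple(std::string("graph"), 7);
  // result is not used after this line, so it can be moved into the cache.
  cache.update(key, std::move(result));
}
```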

Skylion007 · 2025-08-04T15:50:44Z

aten/src/ATen/native/cudnn/LayerNorm_v8.cpp

+      {DX_fe, dX->data_ptr()}};
+    variant_pack = std::move(variant_pack_);
+    auto result = std::make_tuple(layernorm_graph, X_fe, DY_fe, mean_fe, inv_variance_fe, scale_fe, dscale_fe, dbias_fe, DX_fe);
+    layernorm_backward_graph_cache.update(key, result);


Skylion007 · 2025-08-04T15:51:01Z

aten/src/ATen/native/cudnn/RMSNorm_v8.cpp

+};
+
+void setRMSNormParams(RMSNormParams& params, const Tensor& X, int64_t M, int64_t N) {
+  memset(&params, 0, sizeof(params));


Skylion007 · 2025-08-04T15:51:21Z

aten/src/ATen/native/cudnn/RMSNorm_v8.cpp

+  return &(it->second);
+}
+
+void update(const KeyType& key, T& results) {


Should use perfect forwarding, no?
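
A rough sketch of a forwarding-reference `update`, under the assumption that the cache wraps a `std::unordered_map`. `ExampleCache` is a hypothetical name and the member layout is guessed from the two visible lines of the diff. With this signature, lvalue arguments are copied and rvalue arguments are moved, without needing two overloads.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache shaped after the snippet above; not the PR's actual class.
template <typename T, typename KeyType>
struct ExampleCache {
  std::unordered_map<KeyType, T> map;

  // Returns a pointer to the cached entry, or nullptr when the key is absent.
  T* find(const KeyType& key) {
    auto it = map.find(key);
    if (it == map.end()) {
      return nullptr;
    }
    return &(it->second);
  }

  // U&& is a forwarding reference: std::forward preserves whether the caller
  // passed an lvalue (copy) or an rvalue (move).
  template <typename U>
  void update(const KeyType& key, U&& results) {
    map.insert_or_assign(key, std::forward<U>(results));
  }
};
```

A caller could then write `cache.update(key, std::move(result))` when `result` is no longer needed, or `cache.update(key, result)` to keep a copy.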

Skylion007 · 2025-08-04T15:51:41Z

aten/src/ATen/native/cudnn/RMSNorm_v8.cpp

+      {DX_fe, dX->data_ptr()}};
+    variant_pack = std::move(variant_pack_);
+    auto result = std::make_tuple(rmsnorm_graph, X_fe, DY_fe, inv_variance_fe, scale_fe, dscale_fe, DX_fe);
+    rmsnorm_backward_graph_cache.update(key, result);


AaronWang04 · 2025-08-05T21:10:43Z

@pytorchbot rebase

pytorchmergebot · 2025-08-05T21:12:15Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-08-05T21:12:18Z

Successfully rebased cudnn_layer_norm onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cudnn_layer_norm && git pull --rebase)

pytorchbot added the open source label Aug 1, 2025

Skylion007 reviewed Aug 4, 2025

AaronWang04 added 5 commits August 5, 2025 21:12

wip

195a845

epsilojn change?

77e4411

idek man

c53acdd

it works i think

b86d944

lint

6030457

pytorchmergebot force-pushed the cudnn_layer_norm branch from d4144f0 to 6030457 Compare August 5, 2025 21:12
