Allow exposing more functions during initial template expansion #159554

charlie-wt · 2025-07-31T08:24:39Z

Also adds a _register_hook utility, and documents & type annotates PartialRender.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

pytorch-bot · 2025-07-31T08:24:42Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159554

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ghstack-mergeability-check and Check labels failing with 'Resource not accessible by integration'

❌ 4 New Failures, 2 Unrelated Failures

As of commit 4c5f0c4 with merge base 2a286cb:

NEW FAILURES - The following jobs have failed:

pull / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 1, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu) (gh)
inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_causal_mask_cuda
pull / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu) (gh)
inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_pow_2_headdim_head_dim_121_float16_cuda_float16
pull / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 3, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu) (gh)
inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_bool
pull / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 5, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu) (gh)
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_block_sizes_dynamic_shapes_cuda

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 4, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu) (gh) (similar failure)
inductor/test_torchinductor.py::GPUTests::test_large_block_sizes_cuda

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge, unstable) (gh) (#158876)
sccache: error: couldn't connect to server

This comment was automatically generated by Dr. CI and updates every 15 minutes.

charlie-wt · 2025-07-31T08:26:19Z

@pytorchbot label "topic: not user facing"

eellison · 2025-08-05T01:44:43Z

@laithsakka want to take this?

charlie-wt · 2025-08-06T09:39:32Z

these failures seem to be coming from the `assert hook_name not in self.render_hooks` in the new `TritonTemplateKernel._register_hook`, for the hook `<ARGDEFS>`. sure enough, this was the one place where i swapped in a call to `_register_hook` without that assert having been there before. i can have a look in the meantime, but others more familiar with that part of the code may be able to tell me whether it's intentional that `<ARGDEFS>` can be overwritten when the other hooks can't.

charlie-wt · 2025-08-06T13:16:19Z

have added an `allow_overwriting` param to `_register_hook`, only set to `True` in `gen_argdefs`; still happy to hear if people have other opinions
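A minimal sketch of the behaviour described above, assuming a plain dictionary of hooks. `HookRegistry`, `register_hook`, and the `<EXAMPLE_HOOK>` placeholder are hypothetical stand-ins for illustration, not the actual `TritonTemplateKernel` code; only the `allow_overwriting` flag and the `<ARGDEFS>` hook name come from the discussion.

```python
from typing import Callable, Dict

HookFn = Callable[[], str]


class HookRegistry:
    """Hypothetical stand-in for the hook bookkeeping on a template kernel."""

    def __init__(self) -> None:
        self.render_hooks: Dict[str, HookFn] = {}

    def register_hook(
        self, hook_name: str, hook_fn: HookFn, *, allow_overwriting: bool = False
    ) -> str:
        # Refuse to silently replace an existing hook unless the caller opts in.
        if not allow_overwriting:
            assert hook_name not in self.render_hooks, (
                f"hook {hook_name} was already registered"
            )
        self.render_hooks[hook_name] = hook_fn
        # The name doubles as the placeholder left in the partially rendered code.
        return hook_name


registry = HookRegistry()
registry.register_hook("<EXAMPLE_HOOK>", lambda: "pass")
registry.register_hook("<ARGDEFS>", lambda: "a, b")
# Re-registering <ARGDEFS> is permitted only with the explicit opt-in.
registry.register_hook("<ARGDEFS>", lambda: "a, b, c", allow_overwriting=True)
```

With this shape, every hook other than the one that opts in keeps the strict "register once" check, so an accidental double registration still fails loudly.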

charlie-wt · 2025-08-06T16:43:20Z

don't think the 'new failures' relate to me: they're all `cudaErrorNoKernelImageForDevice`, which i believe indicates a mismatch between the GPU architectures the build was compiled for and the compute capability of the hardware. that might relate to this earlier log message:

2025-08-06T14:46:54.3732695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:283: UserWarning: 
2025-08-06T14:46:54.3733931Z     Found GPU0 NVIDIA L4 which is of cuda capability 8.9.
2025-08-06T14:46:54.3734423Z     Minimum and Maximum cuda capability supported by this version of PyTorch is
2025-08-06T14:46:54.3734834Z     (5.2) - (5.2)
2025-08-06T14:46:54.3735024Z     
2025-08-06T14:46:54.3735207Z   warnings.warn(
2025-08-06T14:46:54.3735629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:304: UserWarning: 
2025-08-06T14:46:54.3736141Z     Please install PyTorch with a following CUDA
2025-08-06T14:46:54.3736510Z     configurations:  12.6 12.8 12.9 following instructions at
2025-08-06T14:46:54.3736885Z     https://pytorch.org/get-started/locally/
2025-08-06T14:46:54.3737184Z     
2025-08-06T14:46:54.3737384Z   warnings.warn(matched_cuda_warn.format(matched_arches))
2025-08-06T14:46:54.3737821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:326: UserWarning: 
2025-08-06T14:46:54.3738322Z NVIDIA L4 with CUDA capability sm_89 is not compatible with the current PyTorch installation.
2025-08-06T14:46:54.3738769Z The current PyTorch install supports CUDA capabilities sm_52.
2025-08-06T14:46:54.3739259Z If you want to use the NVIDIA L4 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

the single 'flaky' failure is for the same reason too.

locally, two of the 'newly-failing' tests pass but i don't have enough memory for the other two.

laithsakka · 2025-08-11T12:30:21Z

can you explain more in the summary about the use case for this change, and give some context,
for example for someone who is not very familiar with the code?

laithsakka · 2025-08-11T12:35:49Z

torch/_inductor/select_algorithm.py

+        assert hook != "finalized", "hook_key can only be called once"
+        self._code = self._code.replace(hook_key, hook())
+
+        self.replacement_hooks[hook_key] = "finalized"


did you consider using `None` as a proxy for finalized,
with `MaybeHookFn` being `Optional[HookFn]`?

you could use `FINALIZED = None`.

yeah, to be honest i kept it as something other than `None` because it actually used to be `None` but got changed to something else; i'm not totally sure of the reason for the change (i can't find any references to `FINALIZED_HOOK` outside of `PartialRender`). i just made it a literal because i couldn't see a way to type-annotate the raw `object()` nicely.
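For comparison, here is a rough sketch of the reviewer's suggestion, where `None` marks a hook that has already been expanded and the hook type becomes `Optional[HookFn]`. `PartialRenderSketch` and its method names are illustrative assumptions, not the real `PartialRender` class; only `replacement_hooks`, `_code`, and the replace-then-mark-finalized step mirror the diff excerpt above.

```python
from typing import Callable, Dict, Optional

HookFn = Callable[[], str]
MaybeHookFn = Optional[HookFn]  # None means "this hook was already finalized"


class PartialRenderSketch:
    """Simplified stand-in: code with placeholders plus hooks that fill them."""

    def __init__(self, code: str, hooks: Dict[str, MaybeHookFn]) -> None:
        self._code = code
        self.replacement_hooks = dict(hooks)

    def finalize_hook(self, hook_key: str) -> None:
        hook = self.replacement_hooks[hook_key]
        assert hook is not None, f"{hook_key} can only be finalized once"
        # Expand the placeholder, then mark the hook as used.
        self._code = self._code.replace(hook_key, hook())
        self.replacement_hooks[hook_key] = None

    def finalize_all(self) -> str:
        # Expand whatever has not been finalized yet and return the final code.
        for key, hook in list(self.replacement_hooks.items()):
            if hook is not None:
                self.finalize_hook(key)
        return self._code


partial = PartialRenderSketch(
    "def kernel(<ARGDEFS>): pass",
    {"<ARGDEFS>": lambda: "x_ptr, y_ptr"},
)
partial.finalize_hook("<ARGDEFS>")
result = partial.finalize_all()
```

The trade-off raised in the thread still applies: `None` is easy to annotate, a string literal is easy to annotate but shares a type with ordinary strings, and a raw `object()` sentinel is unambiguous at runtime but awkward to express in type hints.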

laithsakka · 2025-08-11T12:38:45Z

torch/_inductor/select_algorithm.py

+
+    def _register_extra_template_env_fns(self, *fns: Callable[..., Any]):
+        """
+        Register some extra functions to expose when performing the initial


can you explain more what "to expose" means here?

i guess you could say it means "to make available to be used by jinja expressions": jinja template strings can include expressions inside double curly brackets, and these expressions can include things like calls to python functions. the `template_env` dictionary passed to `template.render` inside `TritonTemplateKernel.render` lays out which functions you want to be in scope for any expressions in the template being rendered.

shall i add more info to the comment along these lines?
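As a small illustration of what "exposing" a function to a jinja template means, the sketch below passes callables to `jinja2.Template.render` as keyword arguments so that `{{ ... }}` expressions can call them. `TemplateEnvSketch`, `register_extra_fns`, and `my_backend_helper` are hypothetical names for this example and do not reflect the real inductor code; it assumes the `jinja2` package is installed.

```python
from typing import Any, Callable, Dict

import jinja2


class TemplateEnvSketch:
    """Hypothetical holder for the functions made visible to a template."""

    def __init__(self) -> None:
        self._extra_fns: Dict[str, Callable[..., Any]] = {}

    def register_extra_fns(self, *fns: Callable[..., Any]) -> None:
        # Each function becomes callable from the template under its own name.
        for fn in fns:
            self._extra_fns[fn.__name__] = fn

    def render(self, source: str, **kwargs: Any) -> str:
        # Everything passed to render() is in scope for {{ ... }} expressions.
        template_env = {**self._extra_fns, **kwargs}
        return jinja2.Template(source).render(**template_env)


def my_backend_helper(name: str) -> str:
    # Stand-in for a backend-specific code generation helper.
    return f"custom_{name}"


renderer = TemplateEnvSketch()
renderer.register_extra_fns(my_backend_helper)
rendered = renderer.render("{{ my_backend_helper('load') }}({{ n }})", n=4)
print(rendered)  # custom_load(4)
```

A function that is never registered is simply not in scope for the template, which is why a hard-coded environment forces monkey-patching when a backend needs its own helpers.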

charlie-wt · 2025-08-11T17:27:43Z

> can you explain more in the summary about the use case for this change, and give some context, for example for someone who is not very familiar with the code?

the main use case i had for this was for people implementing their own triton inductor backends, who'd want to reuse most existing constructs but may want to use their own hooks when rendering template kernels. at the moment the list of hooks passed to `template.render` is hard-coded, so you'd need to monkey-patch to make yours available during rendering; this change lets you register them properly.

Allow exposing more functions during initial template expansion

a9d9c67

Also adds a `_register_hook` utility, and documents & type annotates PartialRender.

pytorch-bot bot added the module: inductor label Jul 31, 2025

pytorch-bot bot added the topic: not user facing label Jul 31, 2025

pytorchbot added the open source label Jul 31, 2025

Merge commit '2a286cb' into charliew/register-template-hooks

0356de3

janeyx99 requested a review from eellison August 4, 2025 23:03

janeyx99 added the triaged label Aug 4, 2025

eellison requested review from laithsakka and removed request for eellison August 5, 2025 01:44

allow overwriting some templates

4c5f0c4

laithsakka reviewed Aug 11, 2025

laithsakka approved these changes Aug 11, 2025

laithsakka requested a review from kundaMwiza August 11, 2025 12:49
