
[inductor] dont reuse buffers if it affects peak (#145883) #159530


Open
wants to merge 10 commits into base: gh/v0i0/4/base

Conversation

v0i0 (Contributor) commented Jul 30, 2025


pytorch-bot bot commented Jul 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159530

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 45698e2 with merge base 2507ae6:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

  • pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge, unstable) (gh) (#158876)
    /var/lib/jenkins/workspace/xla/torch_xla/csrc/runtime/BUILD:476:14: Compiling torch_xla/csrc/runtime/xla_util_test.cpp failed: (Exit 1): gcc failed: error executing CppCompile command (from target //torch_xla/csrc/runtime:xla_util_test) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 229 arguments skipped)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

v0i0 added a commit that referenced this pull request Jul 30, 2025
v0i0 (Contributor, Author) commented Jul 30, 2025

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jul 30, 2025
@v0i0 v0i0 requested a review from eellison July 30, 2025 23:43
) * get_dtype_size(self.node.get_dtype())
if free_line_scheduler_node >= self_scheduler_node:
return False
peak_memory_in_range = max(
Contributor

After we reuse a buffer, we need to update the memory of the nodes in its reuse window.

Comment on lines 615 to 616
peak_memory_per_scheduler_node[free_line_scheduler_node:self_scheduler_node]
)
Contributor

This is potentially O(n^2), because at each node we iterate through O(n) nodes.

Is there an O(n log n) solution we could use? From looking around a bit, maybe https://en.wikipedia.org/wiki/Fenwick_tree? Note: I haven't looked especially closely at this yet.

If we can't figure out an O(n log n) solution, we could also do a sliding window, or add other heuristics, such as disallowing buffer reuse for tensors above a certain size.
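For illustration, a minimal sketch of the O(log n) alternative being discussed: a segment tree with lazy propagation supporting range-max queries and range "raise to at least v" updates. The class name `RangeMaxTree` and its API are assumptions for this sketch, not the PR's actual implementation.

```python
class RangeMaxTree:
    """Sketch: range-max queries and range chmax updates, both O(log n).

    Assumes non-negative values, so 0 works as the identity for max.
    """

    def __init__(self, values):
        self.n = len(values)
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        self.tree = [0] * (2 * self.size)
        # Leaves live at tree[size .. size + n); internal nodes above them.
        self.tree[self.size:self.size + self.n] = values
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])
        self.lazy = [None] * (2 * self.size)  # pending chmax per subtree

    def _push(self, node):
        # Push a pending chmax down to both children.
        v = self.lazy[node]
        if v is None:
            return
        for child in (2 * node, 2 * node + 1):
            self.tree[child] = max(self.tree[child], v)
            self.lazy[child] = v if self.lazy[child] is None else max(self.lazy[child], v)
        self.lazy[node] = None

    def _update(self, node, start, end, l, r, value):
        if r < start or end < l:
            return
        if l <= start and end <= r:
            self.tree[node] = max(self.tree[node], value)
            self.lazy[node] = value if self.lazy[node] is None else max(self.lazy[node], value)
            return
        self._push(node)
        mid = (start + end) // 2
        self._update(2 * node, start, mid, l, r, value)
        self._update(2 * node + 1, mid + 1, end, l, r, value)
        self.tree[node] = max(self.tree[2 * node], self.tree[2 * node + 1])

    def update_range(self, l, r, value):
        """Raise every element in [l, r] to at least `value`."""
        self._update(1, 0, self.size - 1, l, r, value)

    def _query(self, node, start, end, l, r):
        if r < start or end < l:
            return float("-inf")
        if l <= start and end <= r:
            return self.tree[node]
        self._push(node)
        mid = (start + end) // 2
        return max(self._query(2 * node, start, mid, l, r),
                   self._query(2 * node + 1, mid + 1, end, l, r))

    def query_range(self, l, r):
        """Max over [l, r], inclusive."""
        return self._query(1, 0, self.size - 1, l, r)
```

With a structure like this, the per-node slice `max(...)` above becomes a `query_range` call, turning the overall pass from O(n^2) into O(n log n).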

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
v0i0 added a commit that referenced this pull request Jul 31, 2025
v0i0 added a commit that referenced this pull request Jul 31, 2025
v0i0 added a commit that referenced this pull request Aug 2, 2025
v0i0 added a commit that referenced this pull request Aug 5, 2025
v0i0 added a commit that referenced this pull request Aug 7, 2025
v0i0 added a commit that referenced this pull request Aug 7, 2025
v0i0 added a commit that referenced this pull request Aug 7, 2025
v0i0 added a commit that referenced this pull request Aug 7, 2025
@eellison eellison self-requested a review August 11, 2025 23:16
Contributor

eellison left a comment

Looks good, a couple of questions about the tree. Would you mind doing one dashboard run? I believe we expect to see memory improvements in the timm benchmark.

Comment on lines +85 to +86
if lazy_node is not None:
# Apply lazy update to current node
Contributor

nit: early return instead of nesting?

self.size = 1
while self.size < self.n:
self.size *= 2
self.size *= 2
Contributor

is it worth describing the data layout of the tree

[1 ... n ] base values then [1... n // 2, n // 2... n //4 ] ?

Contributor Author

the opposite. will add a comment
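For context, "the opposite" layout the author describes can be sketched like this (an illustrative example with assumed names, not the PR's exact comment or code): in a heap-ordered tree, summaries come first and base values last.

```python
# Hedged sketch of the implicit heap layout: tree[1] is the root,
# tree[2*i] and tree[2*i + 1] are the children of tree[i]. Internal
# summary nodes occupy tree[1 .. size - 1]; the base values live in the
# leaves tree[size .. size + n - 1], i.e. the opposite of
# "[base values] then [summaries]".
size = 4  # leaf count, padded to a power of two
values = [2, 7, 1, 5]
tree = [0] * (2 * size)
tree[size:size + len(values)] = values          # leaves: tree[4..7]
for i in range(size - 1, 0, -1):                # internal: tree[1..3]
    tree[i] = max(tree[2 * i], tree[2 * i + 1])
# tree is now [0, 7, 7, 5, 2, 7, 1, 5]: index 0 unused, root max at
# index 1, base values at the end.
```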


# Initialize tree and lazy arrays
self.tree = [identity_element] * self.size
self.lazy: list[Optional[T]] = [None] * self.size
Contributor

nit: describe what the lazy array will do?

Contributor Author

added a comment

self._build(values, right_child, mid + 1, end)

# Update current node with summary of children
self.tree[node] = self.summary_op(self.tree[left_child], self.tree[right_child])
Contributor

Nice to use the identity element instead of having to handle special cases.

Contributor Author

done

Contributor

Oh, I meant: I think this is what the code is already doing, right?

) * get_dtype_size(self.node.get_dtype())
if self.should_reuse_buffer(free_line, size):
free_line.is_reused = True
self.wrapper.estimate_peak.update_peak_between(free_line, self)
Contributor

So I guess the difference is: it used to be that queries are O(n), and now just updates are O(n). Is that correct?

Contributor Author

Both are O(log n): consider the binary tree of nodes and an interval between two leaves. To process the interval, we look at the paths from the tree root to the left and right boundaries of the interval (2 * log n nodes). Updates modify those nodes and their direct neighbors inside the interval; queries read those nodes and potentially push lazy updates down to the direct neighbors. So both are 2 * 2 * log2(n).

return

mid = (start + end) // 2
left_child = 2 * node
Contributor

Thinking aloud: do we need both _push_lazy and build? I guess build could be replaced by iteratively _push_lazy'ing each element in values?

Contributor Author

I'd keep it as is. Replacing build with update_range/push_lazy turns it from O(n) into O(n log n).
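The O(n) claim follows from the build doing one O(1) combine per tree node, and a tree over n leaves having O(n) nodes in total. A minimal sketch (illustrative names, not the PR's actual helpers):

```python
# Hedged sketch of a recursive O(n) build for a max segment tree.
# One O(1) combine per node; a tree over n leaves has O(n) nodes total,
# so the whole build is O(n). Building via n range updates instead
# would cost O(log n) per element, i.e. O(n log n) overall.
def build(tree, values, node, start, end):
    if start == end:
        tree[node] = values[start]
        return
    mid = (start + end) // 2
    build(tree, values, 2 * node, start, mid)
    build(tree, values, 2 * node + 1, mid + 1, end)
    tree[node] = max(tree[2 * node], tree[2 * node + 1])

values = [3, 1, 4, 1, 5]
n = len(values)
tree = [0] * (4 * n)  # loose upper bound on node count
build(tree, values, 1, 0, n - 1)
# tree[1] now holds the max over all values
```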

Comment on lines 90 to 104
left_child = 2 * node
right_child = 2 * node + 1

# Propagate to children
lazy_left_child = self.lazy[left_child]
if lazy_left_child is None:
self.lazy[left_child] = lazy_node
else:
self.lazy[left_child] = self.update_op(lazy_left_child, lazy_node)

lazy_right_child = self.lazy[right_child]
if lazy_right_child is None:
self.lazy[right_child] = lazy_node
else:
self.lazy[right_child] = self.update_op(lazy_right_child, lazy_node)
Contributor

nit: for loop over left/right?

Contributor Author

done

Comment on lines 140 to 150
lazy_left_child = self.lazy[left_child]
if lazy_left_child is None:
self.lazy[left_child] = value
else:
self.lazy[left_child] = self.update_op(lazy_left_child, value)

lazy_right_child = self.lazy[right_child]
if lazy_right_child is None:
self.lazy[right_child] = value
else:
self.lazy[right_child] = self.update_op(lazy_right_child, value)
Contributor

nit: for loop? Also, this seems the same as lines 87-104 above. Refactor?
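A refactor along the lines the reviewer suggests might look like this (a sketch with an assumed helper name, `_propagate_to_children`; not necessarily the PR's final code): both duplicated propagation sites collapse into one helper that loops over the two children.

```python
# Hedged sketch of the suggested refactor. Folds `value` into each
# child's pending lazy entry, replacing the two copy-pasted left/right
# blocks with a single loop.
def _propagate_to_children(lazy, node, value, update_op):
    for child in (2 * node, 2 * node + 1):
        lazy_child = lazy[child]
        lazy[child] = value if lazy_child is None else update_op(lazy_child, value)

# Usage: with max as the update op, a second smaller update is absorbed.
lazy = [None] * 8
_propagate_to_children(lazy, 1, 5, max)   # children 2 and 3 get 5
_propagate_to_children(lazy, 1, 3, max)   # max(5, 3) keeps 5
```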

v0i0 (Contributor, Author) commented Aug 12, 2025

v0i0 added a commit that referenced this pull request Aug 12, 2025

self.overall_peak_memory, peak_by_scheduler_node = estimate_peak_memory(
V.graph.scheduler.nodes,
{},
Contributor

Sorry, one last thing: names_to_freeable_bufs is important for the backward, when we need to know which activations will be deallocated in order to get an accurate memory estimate.
