Allow inplacing buffer when other users are inconsequential #138383
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138383
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e9fe5cc with merge base e080c89.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 6a512a4 to 5a1d35a.
Force-pushed from be4fc34 to 6d063e9.
Force-pushed from 05c4937 to ed2f275.
Cool! Nice to see this required minimal changes. cc @shunting314
Before we land, maybe let's do a dashboard run for safety. This should be motivation for landing donated buffers and turning on #133368. cc @BoyuanFeng
torch/_inductor/scheduler.py
Outdated
@@ -408,6 +408,12 @@ def decide_inplace_update(self) -> None:
            }

            ordered_reads = sorted(self.read_writes.reads, key=lambda x: x.name)
            # NOTE remove V.graph.removed_operations once deps issue is fixed
            inconsequential_nodes = (
                (self.ancestors - {self.get_name()})
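For readers following the diff, here is a toy model of the rule being added, based on the PR summary; the function name, arguments, and example node names are illustrative, not the merged scheduler code.

# Toy model of the rule described in the PR summary: a node may write in
# place into a buffer if every *other* user of that buffer is
# "inconsequential" -- removed from the graph, already completed, or an
# ancestor of the candidate node. All names here are illustrative.
def can_inplace(candidate: str, buffer_users: set[str], ancestors: set[str],
                removed: set[str], completed: set[str]) -> bool:
    inconsequential = (ancestors - {candidate}) | removed | completed
    other_users = buffer_users - {candidate}
    return other_users <= inconsequential

# Example: a matmul output read only by the layer norm doing the inplacing
# and by a mean computation that is an already-scheduled ancestor.
print(can_inplace("layer_norm", {"layer_norm", "mean"},
                  ancestors={"matmul", "mean"}, removed=set(), completed=set()))
# -> True: the only other user ("mean") is in the ancestors set.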
I'm surprised a node is an ancestor of itself...
I'm not sure if it's possible; do you know if the IR has been de-cycled by the time it gets to the scheduler? I added the check defensively, in case there was a cycle, to prevent inplacing.
Pretty sure a node should not be an ancestor of itself - you can add an assertion, run tests, then remove it.
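If someone does add that temporary assertion, a minimal sketch (attribute and method names assumed from the diff above) could be:

# Temporary sanity check, to be removed after running the test suite:
# a scheduler node should never appear in its own ancestor set.
assert self.get_name() not in self.ancestors, (
    f"{self.get_name()} is listed as its own ancestor"
)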
@@ -12544,6 +12544,29 @@ def fn(x):
        self.assertTrue("in_out_ptr" not in code)
        self.assertEqual(fn_opt(*inps), fn(*inps))

    @config.patch(inplace_buffers=True)
    def test_layer_norm_inplaces_after_matmul(self):
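For reference, a rough sketch of what such a test could look like; the shapes, the `run_and_get_code` usage, and the exact assertions are guesses modeled on neighboring tests in this file, not the committed test body.

@config.patch(inplace_buffers=True)
def test_layer_norm_inplaces_after_matmul(self):
    # The matmul output's only remaining user is the layer norm, so the
    # scheduler should be able to reuse it as an "in_out_ptr" buffer.
    def fn(x, w):
        y = x @ w
        return torch.nn.functional.layer_norm(y, y.shape[-1:])

    inps = [
        torch.rand(64, 64, device=self.device),
        torch.rand(64, 64, device=self.device),
    ]
    fn_opt = torch.compile(fn)
    _, (code,) = run_and_get_code(fn_opt, *inps)
    self.assertTrue("in_out_ptr" in code)
    self.assertEqual(fn_opt(*inps), fn(*inps))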
nit - can make the test work for both devices by passing in self.device, as done above
Cool!
Would you share the generated in-place kernel?
Can we do some benchmarking on the toy examples (see the sketch after this list for one possible approach) regarding:
- memory saving
- perf impact
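As a hedged sketch of how one might measure both on a toy matmul + LayerNorm example (CUDA-specific, arbitrary shapes and iteration counts; not an existing benchmark in the repo):

import torch
import torch.nn.functional as F

def fn(x, w):
    y = x @ w
    return F.layer_norm(y, y.shape[-1:])

x = torch.rand(4096, 4096, device="cuda")
w = torch.rand(4096, 4096, device="cuda")
fn_opt = torch.compile(fn)
fn_opt(x, w)  # warm-up / compilation

# Memory saving: peak allocation should drop if the matmul output is inplaced.
torch.cuda.reset_peak_memory_stats()
fn_opt(x, w)
print("peak MiB:", torch.cuda.max_memory_allocated() / 2**20)

# Perf impact: simple wall-clock timing with CUDA events.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    fn_opt(x, w)
end.record()
torch.cuda.synchronize()
print("ms/iter:", start.elapsed_time(end) / 100)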
test/inductor/test_torchinductor.py
Outdated
@@ -12531,7 +12531,7 @@ def fn(x):
        FileCheck().check("copy_").check_same("True").run(code)

    @config.patch(inplace_buffers=True)
Can we remove this config override, since 'inplace_buffers' is on by default?
test/inductor/test_torchinductor.py
Outdated
@@ -12544,6 +12544,29 @@ def fn(x):
        self.assertTrue("in_out_ptr" not in code)
        self.assertEqual(fn_opt(*inps), fn(*inps))

    @config.patch(inplace_buffers=True)
Same
Force-pushed from 51486ee to ce0a7f7.
Is the base commit of your benchmarking run the same one you are comparing to?
In your screenshot there you're comparing to 96b, not 86d. 86d shows a speedup: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Mon%2C%2021%20Oct%202024%2018%3A16%3A04%20GMT&stopTime=Mon%2C%2028%20Oct%202024%2018%3A16%3A04%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&deviceName=cuda%20(a100)&lBranch=exclamaforte/inplacing-reductions&lCommit=ce0a7f711bcabdf888654eae312a0e692d0ad720&rBranch=main&rCommit=86d4b7d60b264cae5a04a1b20719bcd7a5752a4c You can copy a link to the comparison with the link button shown here (screenshot omitted).
Ah cool! Sorry, still getting used to the perf dashboard, so I got the order flipped. Anyways, good to merge on perf after tests pass?
Can you share a generated inplace kernel? In some odd situations, Inductor codegens an 'in_out_ptr' but does not really do the load/store.
Force-pushed from 066b2d7 to a316a64.
Force-pushed from a316a64 to f759d6b.
Force-pushed from f759d6b to a9b6b6e.
Force-pushed from a9b6b6e to 6be89ca.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Force-pushed from 6be89ca to e9fe5cc.
Merge failed. Reason: 1 job has failed; the first few of them are: pull / linux-focal-py3.11-clang10 / test (default, 1, 5, linux.4xlarge). Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…138383) Summary: I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer. Implements: pytorch#132826 Test Plan: New unit test of matmul followed by LayerNorm, make sure there's an inplaced buffer. Pull Request resolved: pytorch#138383 Approved by: https://github.com/eellison
…ytorch#138383)" This reverts commit 8840889. Reverted pytorch#138383 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it seems to break trunk after landing ([comment](pytorch#138383 (comment)))
…138383) Summary: I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer. Implements: pytorch#132826 Test Plan: New unit test of matmul followed by LayerNorm, make sure there's an inplaced buffer. Pull Request resolved: pytorch#138383 Approved by: https://github.com/eellison
…ytorch#138383)" This reverts commit 030f70b. Reverted pytorch#138383 on behalf of https://github.com/huydhn due to Sorry for reverting this again, but I think it has a test failing internally and also on ROCm ([comment](pytorch#138383 (comment)))
…138383) Summary: I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer. Implements: pytorch#132826 Test Plan: New unit test of matmul followed by LayerNorm, make sure there's an inplaced buffer. Pull Request resolved: pytorch#138383 Approved by: https://github.com/eellison
Summary:
I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer.
Implements:
#132826
Test Plan:
New unit test of a matmul followed by LayerNorm, making sure there's an inplaced buffer.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov