Allow inplacing buffer when other users are inconsequential #138383
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138383
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e9fe5cc with merge base e080c89.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 6a512a4 to 5a1d35a.
Force-pushed from be4fc34 to 6d063e9.
Force-pushed from 05c4937 to ed2f275.
Cool! Nice to see this required minimal changes. cc @shunting314
Before we land, maybe let's do a dashboard run for safety. This should be motivation for landing donated buffers and turning on #133368. cc @BoyuanFeng
torch/_inductor/scheduler.py
Outdated
@@ -408,6 +408,12 @@ def decide_inplace_update(self) -> None:
            }

            ordered_reads = sorted(self.read_writes.reads, key=lambda x: x.name)
            # NOTE remove V.graph.removed_operations once deps issue is fixed
            inconsequential_nodes = (
                (self.ancestors - {self.get_name()})
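For readers following the diff, here is a toy model of the rule being added, based on the PR summary; the function name, arguments, and example node names are illustrative, not the merged scheduler code.

# Toy model of the rule described in the PR summary: a node may write in
# place into a buffer if every *other* user of that buffer is
# "inconsequential" -- removed from the graph, already completed, or an
# ancestor of the candidate node. All names here are illustrative.
def can_inplace(candidate: str, buffer_users: set[str], ancestors: set[str],
                removed: set[str], completed: set[str]) -> bool:
    inconsequential = (ancestors - {candidate}) | removed | completed
    other_users = buffer_users - {candidate}
    return other_users <= inconsequential

# Example: a matmul output read only by the layer norm doing the inplacing
# and by a mean computation that is an already-scheduled ancestor.
print(can_inplace("layer_norm", {"layer_norm", "mean"},
                  ancestors={"matmul", "mean"}, removed=set(), completed=set()))
# -> True: the only other user ("mean") is in the ancestors set.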
I'm surprised a node is an ancestor of itself...
I'm not sure if it's possible; do you know if the IR has been de-cycled by the time it gets to the scheduler? I added the check defensively, in case there was a cycle, to prevent inplacing.
Pretty sure a node should not be an ancestor of itself - you can add an assertion, run tests, then remove it.
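If someone does add that temporary assertion, a minimal sketch (attribute and method names assumed from the diff above) could be:

# Temporary sanity check, to be removed after running the test suite:
# a scheduler node should never appear in its own ancestor set.
assert self.get_name() not in self.ancestors, (
    f"{self.get_name()} is listed as its own ancestor"
)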
@@ -12544,6 +12544,29 @@ def fn(x):
        self.assertTrue("in_out_ptr" not in code)
        self.assertEqual(fn_opt(*inps), fn(*inps))

    @config.patch(inplace_buffers=True)
    def test_layer_norm_inplaces_after_matmul(self):
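For reference, a rough sketch of what such a test could look like; the shapes, the `run_and_get_code` usage, and the exact assertions are guesses modeled on neighboring tests in this file, not the committed test body.

@config.patch(inplace_buffers=True)
def test_layer_norm_inplaces_after_matmul(self):
    # The matmul output's only remaining user is the layer norm, so the
    # scheduler should be able to reuse it as an "in_out_ptr" buffer.
    def fn(x, w):
        y = x @ w
        return torch.nn.functional.layer_norm(y, y.shape[-1:])

    inps = [
        torch.rand(64, 64, device=self.device),
        torch.rand(64, 64, device=self.device),
    ]
    fn_opt = torch.compile(fn)
    _, (code,) = run_and_get_code(fn_opt, *inps)
    self.assertTrue("in_out_ptr" in code)
    self.assertEqual(fn_opt(*inps), fn(*inps))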
nit - can make the test work for both devices by passing in self.device, as done above
Cool!
Would you share the generated in-place kernel?
Can we do some benchmarking on the toy examples (see the sketch after this list for one possible approach) regarding:
- memory saving
- perf impact
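As a hedged sketch of how one might measure both on a toy matmul + LayerNorm example (CUDA-specific, arbitrary shapes and iteration counts; not an existing benchmark in the repo):

import torch
import torch.nn.functional as F

def fn(x, w):
    y = x @ w
    return F.layer_norm(y, y.shape[-1:])

x = torch.rand(4096, 4096, device="cuda")
w = torch.rand(4096, 4096, device="cuda")
fn_opt = torch.compile(fn)
fn_opt(x, w)  # warm-up / compilation

# Memory saving: peak allocation should drop if the matmul output is inplaced.
torch.cuda.reset_peak_memory_stats()
fn_opt(x, w)
print("peak MiB:", torch.cuda.max_memory_allocated() / 2**20)

# Perf impact: simple wall-clock timing with CUDA events.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    fn_opt(x, w)
end.record()
torch.cuda.synchronize()
print("ms/iter:", start.elapsed_time(end) / 100)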
test/inductor/test_torchinductor.py
Outdated
@@ -12531,7 +12531,7 @@ def fn(x):
        FileCheck().check("copy_").check_same("True").run(code)

    @config.patch(inplace_buffers=True)
Can we remove this config override, since 'inplace_buffers' is on by default?
test/inductor/test_torchinductor.py
Outdated
@@ -12544,6 +12544,29 @@ def fn(x):
        self.assertTrue("in_out_ptr" not in code)
        self.assertEqual(fn_opt(*inps), fn(*inps))

    @config.patch(inplace_buffers=True)
Same
Force-pushed from 51486ee to ce0a7f7.
Is the base commit of your benchmarking run the same one you are comparing to?
In your screenshot there you're comparing to 96b, not 86d. 86d shows a speedup: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Mon%2C%2021%20Oct%202024%2018%3A16%3A04%20GMT&stopTime=Mon%2C%2028%20Oct%202024%2018%3A16%3A04%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&deviceName=cuda%20(a100)&lBranch=exclamaforte/inplacing-reductions&lCommit=ce0a7f711bcabdf888654eae312a0e692d0ad720&rBranch=main&rCommit=86d4b7d60b264cae5a04a1b20719bcd7a5752a4c You can copy a link to the comparison with the link button shown here (screenshot omitted).
Ah cool! Sorry, still getting used to the perf dashboard, so I got the order flipped. Anyways, good to merge on perf after tests pass?
Can you share a generated inplace kernel? In some odd situations, Inductor codegens an 'in_out_ptr' but does not really do the load/store.
Force-pushed from 066b2d7 to a316a64.
Force-pushed from a316a64 to f759d6b.
Force-pushed from f759d6b to a9b6b6e.
Force-pushed from a9b6b6e to 6be89ca.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Force-pushed from 6be89ca to e9fe5cc.
Merge failed. Reason: 1 job has failed; the first few of them are: pull / linux-focal-py3.11-clang10 / test (default, 1, 5, linux.4xlarge). Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…138383) Summary: I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer. Implements: pytorch#132826 Test Plan: New unit test of matmul followed by LayerNorm, make sure there's an inplaced buffer. Pull Request resolved: pytorch#138383 Approved by: https://github.com/eellison
…ytorch#138383)" This reverts commit 8840889. Reverted pytorch#138383 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it seems to break trunk after landing ([comment](pytorch#138383 (comment)))
…138383) Summary: I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer. Implements: pytorch#132826 Test Plan: New unit test of matmul followed by LayerNorm, make sure there's an inplaced buffer. Pull Request resolved: pytorch#138383 Approved by: https://github.com/eellison
…ytorch#138383)" This reverts commit 030f70b. Reverted pytorch#138383 on behalf of https://github.com/huydhn due to Sorry for reverting this again, but I think it has a test failing internally and also on ROCm ([comment](pytorch#138383 (comment)))
…138383) Summary: I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer. Implements: pytorch#132826 Test Plan: New unit test of matmul followed by LayerNorm, make sure there's an inplaced buffer. Pull Request resolved: pytorch#138383 Approved by: https://github.com/eellison
Summary:
I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer.
Implements:
#132826
Test Plan:
New unit test of a matmul followed by LayerNorm, making sure there's an inplaced buffer.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov