[inductor] [cpp] fix max-autotune for single-thread dynamic shapes #136418
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136418
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit f0e56a7 with merge base e9bfbf7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
512,
1024,
Perhaps just using `512` would be good enough? I'm concerned about the added test time...
Changed to just keep `512`.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…no epilogue_nodes (#136518)

The `template_buffer_has_other_users` function checks the case where there are epilogue nodes and the template output has users other than these epilogue nodes. When there are no epilogue nodes, the function can return `False` directly.

Pull Request resolved: #136518
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #136418
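For readers following along, here is a minimal sketch of the early-return described in that commit, assuming simplified inputs: the real `template_buffer_has_other_users` helper in the inductor CPP codegen operates on scheduler/IR nodes, not the plain name strings used here.

```python
# Hedged sketch only: the argument types are simplified stand-ins (plain name
# strings) for the scheduler/IR objects the real inductor helper works with.
from typing import Sequence


def template_buffer_has_other_users(
    template_buffer_users: Sequence[str],
    epilogue_node_names: Sequence[str],
) -> bool:
    """Return True if the template output is consumed by any node that is not
    one of the fused epilogue nodes."""
    if not epilogue_node_names:
        # No epilogue nodes at all: there is nothing to compare against, so
        # the check can short-circuit to False (the behavior #136518 makes explicit).
        return False
    epilogue = set(epilogue_node_names)
    return any(user not in epilogue for user in template_buffer_users)
```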
[inductor] [cpp] fix max-autotune for single-thread dynamic shapes (pytorch#136418)

Fixes the compilation error of max-autotune for `maml_omniglot` (AMP and FP32) and `soft_actor_critic` (AMP) in Torchbench for the single-thread dynamic shapes case:

```
/tmp/torchinductor_user/uv/cuvq6wenwp7us423onuvntkfx4cspmagha5beiknob7tiebzhupa.cpp: In function ‘void kernel(const bfloat16*, const bfloat16*, const bfloat16*, bfloat16*, int64_t)’:
/tmp/torchinductor_user/uv/cuvq6wenwp7us423onuvntkfx4cspmagha5beiknob7tiebzhupa.cpp:279:41: error: the value of ‘Mr_blocks’ is not usable in a constant expression
  279 |         constexpr int64_t m_block_end = Mr_blocks;
      |                                         ^~~~~~~~~
/tmp/torchinductor_user/uv/cuvq6wenwp7us423onuvntkfx4cspmagha5beiknob7tiebzhupa.cpp:237:19: note: ‘Mr_blocks’ was not initialized with a constant expression
  237 |     const int64_t Mr_blocks = (M + Mr - 1) / Mr;
      |                   ^~~~~~~~~
```

The PR also updates the UT to add a test for `BS`=512 in single thread. The previous case has `BS`=1024, equal to the `K` and `N` values; the generated code does not have symbolic shapes and thus fails to capture the above issue. By adding a case with `BS`=512, the generated code has a symbolic shape for the M dim and is able to reproduce the issue that this PR addresses.

Pull Request resolved: pytorch#136418
Approved by: https://github.com/jgong5
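To see roughly how this configuration is reached from user code, here is a hedged repro sketch; the `nn.Linear` module and shapes are illustrative stand-ins rather than the Torchbench models named above, and additional inductor settings may be needed for the CPP GEMM template to actually be selected.

```python
# Illustrative repro sketch, not the Torchbench benchmarks themselves.
import torch
import torch._inductor.config as inductor_config

# Single-thread CPP codegen plus max-autotune is the configuration that hit
# the `Mr_blocks` constexpr error quoted above.
inductor_config.cpp.threads = 1

model = torch.nn.Linear(1024, 1024, bias=False).eval()
compiled = torch.compile(model, mode="max-autotune", dynamic=True)

with torch.no_grad():
    # BS=512 differs from K=N=1024, so the M dimension stays symbolic in the
    # generated kernel, which is the situation the fix targets.
    out = compiled(torch.randn(512, 1024))
```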
Fixes the max-autotune failure of `soft_actor_critic` of Torchbench in the FP32 single-thread dynamic shape case:

```log
  File "/home/user/inductor/pytorch/torch/_inductor/codegen/cpp_micro_gemm.py", line 136, in codegen_call
    C_ptr = f"&({kernel.index(C, [0, 0])})"
  File "/home/user/inductor/pytorch/torch/_inductor/codegen/cpp_template_kernel.py", line 135, in index
    else self.args.input(node.get_name())
  File "/home/user/inductor/pytorch/torch/_inductor/codegen/common.py", line 1251, in input
    assert name not in V.graph.removed_buffers, name
AssertionError: buf_GemmOut
```

The 1st and 2nd linear do not need to use a local buffer while the 3rd linear does. The 3rd linear, which uses a local buffer, adds its global buffer (named `buf_GemmOut`) to `V.graph.removed_buffers`. When scheduling the nodes, the 1st linear (which won't use a local buffer) gets its output buffer (also named `buf_GemmOut`) from the input, finds that it is in `V.graph.removed_buffers`, and raises the AssertionError.

The issue is that the output buffers of all these linears share the name `buf_GemmOut`, which conflicts. Rename these buffers by adding the name of the `template_buffer` as the prefix.

Pull Request resolved: #136419
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #136418, #136518
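A hedged sketch of the renaming idea, assuming a hypothetical helper; the exact name format used by the inductor CPP GEMM template codegen may differ, the point is only that each template derives a unique local-buffer name from its own `template_buffer`.

```python
# Hypothetical helper illustrating the renaming scheme; the real change lives
# in the inductor CPP GEMM template codegen and may format the name differently.
def local_gemm_buffer_name(template_buffer_name: str) -> str:
    # Prefix the previously shared "GemmOut" name with the owning template
    # buffer, so that multiple GEMM templates in one graph no longer collide
    # when one of them puts its global output into V.graph.removed_buffers.
    return f"{template_buffer_name}_GemmOut"


# With the prefix, the three linears in the failing model get distinct names,
# e.g. "buf0_GemmOut" and "buf2_GemmOut", instead of all sharing "buf_GemmOut".
assert local_gemm_buffer_name("buf0") != local_gemm_buffer_name("buf2")
```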
Stack from ghstack (oldest at bottom):
Fixes the compilation error of max-autotune for `maml_omniglot` (AMP and FP32) and `soft_actor_critic` (AMP) in Torchbench for the single-thread dynamic shapes case.

The PR also updates the UT to add a test for `BS`=512 in single thread. The previous case has `BS`=1024, equal to the `K` and `N` values; the generated code does not have symbolic shapes and thus fails to capture the above issue. By adding a case with `BS`=512, the generated code has a symbolic shape for the M dim and is able to reproduce the issue that this PR addresses (a test-style sketch follows below).

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang
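As a rough illustration of the shape relationship in the UT change, here is a hedged test-style sketch; the helper name, tolerances, and harness are assumptions and do not mirror the actual inductor CPU select-algorithm test.

```python
# Illustrative test-style sketch; the helper name and tolerances are assumptions.
import torch
import torch._inductor.config as inductor_config


def check_single_thread_dynamic_linear(batch_size: int) -> None:
    inductor_config.cpp.threads = 1  # single-thread CPP codegen
    model = torch.nn.Linear(1024, 1024).eval()  # K = N = 1024 as in the UT
    compiled = torch.compile(model, mode="max-autotune", dynamic=True)
    x = torch.randn(batch_size, 1024)
    with torch.no_grad():
        torch.testing.assert_close(compiled(x), model(x), atol=1e-4, rtol=1e-4)


# BS=1024 matches K and N, so the generated code ends up without symbolic
# shapes; BS=512 keeps the M dim symbolic and is the case added (and, per the
# review thread above, the one kept) to cover this fix.
check_single_thread_dynamic_linear(batch_size=512)
```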