CARVIEW |
Navigation Menu
-
Notifications
You must be signed in to change notification settings - Fork 24.7k
[AOTI] Assert misaligned input #142136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AOTI] Assert misaligned input #142136
Conversation
Summary: Fixes #141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen input alignment check at the entry point and throws an error if any misalignment exists. [ghstack-poisoned]
π Helpful Linksπ§ͺ See artifacts and rendered test results at hud.pytorch.org/pr/142136
Note: Links to docs will display an error until the docs builds have been completed. β No FailuresAs of commit 4f8fd16 with merge base 91d3054 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Summary: Fixes #141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen input alignment check at the entry point and throws an error if any misalignment exists. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames chauhang aakhundov [ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know that we should error here vs not error in torch.compile.. but this is still an improvement, and i think a rare case, given our lack of imas so far.
if ((long({name}.data_ptr()) & ({GPU_ALIGN_BYTES} -1)) != 0) {{ | ||
throw std::runtime_error("{name} is not aligned to {GPU_ALIGN_BYTES} bytes"); | ||
}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within inductor, we don't necessarily specialize on all inputs being aligned. We'll infer alignment from first invocation. It's taken from inputs_to_check
.
# JIT Inductor does not guard on input alignment. It relies on copy_misaligned_inputs to | ||
# copy misaligned inputs to aligned buffers. For AOTInductor, we expect users to use it | ||
# as non-Python deployment for its best performance, so implicitly copying misaligned inputs | ||
# to aligned buffers is going to bring a surprising performance hit. Instead, we check input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not convinced it's that much of a performance hit. a single copy is usually not that slow..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenyang78 , soliciting your opinion. Which one do you think makes more sense for production scenario, clone or assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off the top of my head, I don't recall any production model with misaligned inputs. I slightly prefer adding checks rather than creating clones. If we hit any assertion failures in a production model, we can conduct a real benchmark to check the performance impact of using clones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must not do this test on inputs that don't have alignment requirement
@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Fixes #141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen input alignment check at the entry point and throws an error if any misalignment exists. ghstack-source-id: bba4c68 Pull Request resolved: #142136
@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged) |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Summary: Fixes pytorch#141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen input alignment check at the entry point and throws an error if any misalignment exists. Differential Revision: [D66881038](https://our.internmc.facebook.com/intern/diff/D66881038) Pull Request resolved: pytorch#142136 Approved by: https://github.com/eellison, https://github.com/ezyang ghstack dependencies: pytorch#142133
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead. [ghstack-poisoned]
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead. ghstack-source-id: 76513df Pull Request resolved: #143236
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames chauhang aakhundov [ghstack-poisoned]
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames chauhang aakhundov [ghstack-poisoned]
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead. ghstack-source-id: 989e60a Pull Request resolved: #143236
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead. Differential Revision: [D67287922](https://our.internmc.facebook.com/intern/diff/D67287922) Pull Request resolved: #143236 Approved by: https://github.com/malfet, https://github.com/chenyang78
+1 @desertfire, having dynamo/inductor silently copying inputs is kind of nuts, and from what I have seen in my workload it is a serious overhead for small compiled functions. Is there a way exposed to users to assert (raise an error) instead of copy when using the high-level |
@fxmarty-amd yes, this should even be easy, not sure if there's already an issue, if there isn't one can you file us one |
Stack from ghstack (oldest at bottom):
Summary: Fixes #141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen input alignment check at the entry point and throws an error if any misalignment exists.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @chauhang @aakhundov
Differential Revision: D66881038