[AOTI] Assert misaligned input #142136

desertfire · 2024-12-05T15:17:06Z

Summary: Fixes #141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen an input alignment check at the entry point and throw an error if any misalignment exists.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @chauhang @aakhundov

Differential Revision: D66881038

pytorch-bot · 2024-12-05T15:17:10Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142136

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4f8fd16 with merge base 91d3054:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

eellison

I don't really know that we should error here vs. not error in torch.compile, but this is still an improvement, and I think a rare case, given our lack of IMAs (illegal memory accesses) so far.

eellison · 2024-12-05T23:14:16Z

torch/_inductor/codegen/cpp_wrapper_gpu.py

+                    if ((long({name}.data_ptr()) & ({GPU_ALIGN_BYTES} -1)) != 0) {{
+                        throw std::runtime_error("{name} is not aligned to {GPU_ALIGN_BYTES} bytes");
+                    }}


Within inductor, we don't necessarily specialize on all inputs being aligned. We'll infer alignment from first invocation. It's taken from inputs_to_check.
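To make the bit mask in the generated check concrete, here is a minimal Python sketch of the same test on plain integer addresses. `ALIGN_BYTES = 16`, the helper name `is_aligned`, and the base address are assumptions for illustration, not names or values taken from the PR. The mask form only works when the alignment is a power of two.

```python
ALIGN_BYTES = 16  # assumed alignment for illustration; must be a power of two


def is_aligned(address: int, align: int = ALIGN_BYTES) -> bool:
    """Return True if `address` is a multiple of `align`.

    Mirrors the `(ptr & (align - 1)) != 0` test in the generated C++:
    for a power-of-two `align`, the low bits of an aligned address are zero.
    """
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    return (address & (align - 1)) == 0


base = 0x7F0000001000        # hypothetical 16-byte-aligned base address
print(is_aligned(base))      # the base pointer itself
print(is_aligned(base + 4))  # e.g. a float32 view that starts at element 1
```

A view into the middle of an aligned allocation is the typical way a misaligned pointer shows up, which is why the second call offsets the base by one 4-byte element.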

eellison · 2024-12-05T23:15:14Z

torch/_inductor/codegen/cpp_wrapper_gpu.py

+        # JIT Inductor does not guard on input alignment. It relies on copy_misaligned_inputs to
+        # copy misaligned inputs to aligned buffers. For AOTInductor, we expect users to use it
+        # as non-Python deployment for its best performance, so implicitly copying misaligned inputs
+        # to aligned buffers is going to bring a surprising performance hit. Instead, we check input


I'm not convinced it's that much of a performance hit. A single copy is usually not that slow.

desertfire · Dec 6, 2024

@chenyang78, soliciting your opinion. Which one do you think makes more sense for a production scenario, clone or assert?

chenyang78 · Dec 8, 2024

Off the top of my head, I don't recall any production model with misaligned inputs. I slightly prefer adding checks rather than creating clones. If we hit any assertion failures in a production model, we can conduct a real benchmark to check the performance impact of using clones.
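The two options weighed in this thread can be sketched side by side. This is a toy model, not the Inductor implementation: `FakeBuffer`, `aligned_copy`, `prepare_input`, and the `policy` strings are all hypothetical names, and the "copy" only computes a new aligned address rather than moving any data.

```python
from dataclasses import dataclass

ALIGN_BYTES = 16  # assumed alignment requirement, for illustration only


@dataclass
class FakeBuffer:
    """Hypothetical stand-in for a tensor: just an address and a size."""
    address: int
    nbytes: int


def aligned_copy(buf: FakeBuffer) -> FakeBuffer:
    """Pretend to clone `buf` into storage that starts at an aligned address."""
    new_address = (buf.address + ALIGN_BYTES - 1) // ALIGN_BYTES * ALIGN_BYTES
    return FakeBuffer(new_address, buf.nbytes)


def prepare_input(buf: FakeBuffer, policy: str) -> FakeBuffer:
    """Apply one of the two policies discussed in the review."""
    if buf.address % ALIGN_BYTES == 0:
        return buf                    # fast path: already aligned
    if policy == "assert":
        raise RuntimeError(f"input is not aligned to {ALIGN_BYTES} bytes")
    if policy == "clone":
        return aligned_copy(buf)      # costs one extra copy on every such call
    raise ValueError(f"unknown policy: {policy}")


misaligned = FakeBuffer(address=0x1004, nbytes=64)
fixed = prepare_input(misaligned, policy="clone")
print(hex(fixed.address))
```

Both policies share the same fast path for aligned inputs, so the disagreement is only about what happens in the misaligned case: a loud failure or a silent copy.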

ezyang

Must not do this test on inputs that don't have an alignment requirement.
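One way to read this request, together with the earlier note that alignment assumptions come from inputs_to_check, is that the check should be emitted only for a subset of inputs. The following is a rough sketch of that idea, not the code from the PR: the function name `codegen_alignment_checks`, its parameters, the input names, and `GPU_ALIGN_BYTES = 16` are assumptions for illustration. The emitted C++ follows the shape of the diff above.

```python
from typing import Iterable, List

GPU_ALIGN_BYTES = 16  # assumed value, for illustration


def codegen_alignment_checks(
    input_names: List[str], indices_to_check: Iterable[int]
) -> List[str]:
    """Emit a C++ alignment check only for inputs listed in `indices_to_check`."""
    lines = []
    for idx in sorted(set(indices_to_check)):
        name = input_names[idx]
        lines.append(
            f"if ((long({name}.data_ptr()) & ({GPU_ALIGN_BYTES} - 1)) != 0) {{\n"
            f'    throw std::runtime_error("{name} is not aligned to {GPU_ALIGN_BYTES} bytes");\n'
            f"}}"
        )
    return lines


# Hypothetical inputs: only the first and third were assumed aligned at compile time.
checks = codegen_alignment_checks(["arg0_1", "arg1_1", "arg2_1"], indices_to_check=[0, 2])
print("\n".join(checks))
```

Inputs that were never assumed to be aligned get no check, so passing a misaligned tensor for them does not raise.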

desertfire · 2024-12-06T15:17:00Z

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

desertfire · 2024-12-06T15:58:39Z

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

desertfire · 2024-12-06T17:33:51Z

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

desertfire · 2024-12-08T01:05:11Z

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-12-08T15:05:39Z

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot · 2024-12-08T15:07:21Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Summary: Fixes pytorch#141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen input alignment check at the entry point and throws an error if any misalignment exists. Differential Revision: [D66881038](https://our.internmc.facebook.com/intern/diff/D66881038) Pull Request resolved: pytorch#142136 Approved by: https://github.com/eellison, https://github.com/ezyang ghstack dependencies: pytorch#142133

Summary: #142136 added a runtime alignment assertion, but that assumption is probably too strict for more flexible uses of AOTI, such as Python deployment. For details, see an error torchchat recently ran into: https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in C++ instead. Differential Revision: [D67287922](https://our.internmc.facebook.com/intern/diff/D67287922) Pull Request resolved: #143236 Approved by: https://github.com/malfet, https://github.com/chenyang78

fxmarty-amd · 2025-06-24T16:21:43Z

+1 @desertfire, having dynamo/inductor silently copying inputs is kind of nuts, and from what I have seen in my workload it is a serious overhead for small compiled functions.

Is there a way exposed to users to assert (raise an error) instead of copy when using the high-level torch.compile?

ezyang · 2025-06-25T01:43:47Z

@fxmarty-amd yes, this should even be easy, not sure if there's already an issue, if there isn't one can you file us one
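Independent of any built-in option, a caller can approximate "assert instead of copy" with a check of their own before invoking the compiled function. The sketch below is an assumption-laden illustration, not an existing torch.compile feature: `assert_aligned_inputs` and `StubTensor` are hypothetical names, 16 bytes is an assumed alignment, and checking every positional argument may be stricter than what Inductor itself requires. It relies only on the argument exposing a `data_ptr()` method, as `torch.Tensor` does.

```python
import functools

ALIGN_BYTES = 16  # assumed alignment, for illustration


def assert_aligned_inputs(fn, align: int = ALIGN_BYTES):
    """Wrap `fn` so misaligned tensor-like positional args raise instead of being copied."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        for i, arg in enumerate(args):
            data_ptr = getattr(arg, "data_ptr", None)
            if callable(data_ptr) and data_ptr() % align != 0:
                raise RuntimeError(
                    f"positional input {i} is not aligned to {align} bytes"
                )
        return fn(*args, **kwargs)

    return wrapper


class StubTensor:
    """Hypothetical stand-in exposing only `data_ptr()`, so the sketch runs without torch."""

    def __init__(self, address: int):
        self._address = address

    def data_ptr(self) -> int:
        return self._address


checked = assert_aligned_inputs(lambda *tensors: "ran")
print(checked(StubTensor(64)))  # aligned address, so the wrapped function runs
```

In real use the wrapped callable would be the function returned by torch.compile, and non-tensor arguments pass through unchecked because they have no `data_ptr` method.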

This was referenced Dec 5, 2024

[AOTI] Remove WrapperCodegen.expr_printer #141388

Closed

[AOIT] Remove several overloaded members from WrapperCodegen #141387

Closed

[AOTI] Refactor additional_files generation #141979

Closed

pytorch-bot bot added ciflow/inductor module: inductor labels Dec 5, 2024

This was referenced Dec 5, 2024

[AOTI] Refactor codegen_inputs in wrapper codegen #141965

Closed

[AOTI] Refactor codegen_inputs signature #142133

Closed

desertfire added topic: bug fixes topic category release notes: inductor labels Dec 5, 2024

desertfire requested review from eellison and ezyang December 5, 2024 21:02

eellison approved these changes Dec 5, 2024

ezyang requested changes Dec 6, 2024

Update

ba9271d

[ghstack-poisoned]

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 6, 2024

Update

2fbd578

[ghstack-poisoned]

Update

a08e6ed

[ghstack-poisoned]

ezyang approved these changes Dec 7, 2024

Update

4f8fd16

[ghstack-poisoned]

pytorchmergebot added the merging label Dec 8, 2024

pytorchmergebot added the Merged label Dec 8, 2024

pytorchmergebot closed this in 2c6d094 Dec 8, 2024

pytorchmergebot removed the merging label Dec 8, 2024

desertfire mentioned this pull request Dec 13, 2024

[AOTI] Relax input alignment assertion #143236

Closed

github-actions bot deleted the gh/desertfire/518/head branch January 8, 2025 02:05
