[AOTI] Fix a two-pass kernel mismatch #141041
Summary: Fixes #140766. In AOTI's two-pass codegen, the first pass generates triton_per_fused_add_native_layer_norm_4, and the second pass generates triton_red_fused_add_native_layer_norm_4. While this problem will go away with the incoming one-pass implementation, further debugging reveals a mismatch in has_non_contiguous_pw_in_reduction_kernel between the two passes, caused by a symbol comparison problem in stride1_for_last_dim. [ghstack-poisoned]
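To illustrate the kind of symbol comparison problem the summary describes, here is a minimal, self-contained sketch. It is not the Inductor code and not the fix in this PR: `Sym`, `last_stride_is_one_naive`, and `last_stride_is_one_by_value` are hypothetical names, and the assumption that one pass sees a stride as a symbol while the other sees a plain integer is made purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Sym:
    """Hypothetical stand-in for a symbolic stride (not a PyTorch class)."""
    name: str
    hint: int  # concrete value observed for this symbol


Stride = Union[int, Sym]


def last_stride_is_one_naive(strides: Sequence[Stride]) -> bool:
    # Plain `==` on a Sym is structural: Sym("s1", 1) == 1 is False,
    # even though the value behind the symbol is 1.
    return strides[-1] == 1


def last_stride_is_one_by_value(strides: Sequence[Stride]) -> bool:
    # Resolve a symbol to its value before comparing, so both
    # representations of the same layout give the same answer.
    last = strides[-1]
    value = last.hint if isinstance(last, Sym) else last
    return value == 1


# Assumption for illustration: the same layout, seen two different ways.
symbolic_pass = (Sym("s0", 4), Sym("s1", 1))
concrete_pass = (4, 1)

naive = (
    last_stride_is_one_naive(symbolic_pass),
    last_stride_is_one_naive(concrete_pass),
)
by_value = (
    last_stride_is_one_by_value(symbolic_pass),
    last_stride_is_one_by_value(concrete_pass),
)
print(naive, by_value)
```

In this toy model the naive predicate disagrees between the two representations while the value-based one agrees. A disagreement of that kind in a heuristic input is one plausible way for two codegen passes to end up choosing different kernels.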
ghstack-source-id: e431769 Pull Request resolved: #141041
Helpful Links: See artifacts and rendered test results at hud.pytorch.org/pr/141041
Note: Links to docs will display an error until the docs builds have been completed. 1 Active SEV: There is 1 currently active SEV. No Failures: As of commit cbe5448 with merge base b379a28. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames chauhang aakhundov Differential Revision: [D66203298](https://our.internmc.facebook.com/intern/diff/D66203298) [ghstack-poisoned]
ghstack-source-id: 4fbeb19 Pull Request resolved: #141041
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Differential Revision: [D66203298](https://our.internmc.facebook.com/intern/diff/D66203298) Pull Request resolved: pytorch#141041 Approved by: https://github.com/shunting314