[Inductor][CPP] Fix issue in CPP GEMM Template Prune Tensor #141798
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141798

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures) As of commit 8eb1f98 with merge base b7a45db:

FLAKY - The following job failed but was likely due to flakiness present on trunk.
UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc @jianan-gu

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):

**Summary**

When addressing issue #134998, we verify whether any node in the current graph shares the same storage as the node we intend to prune. The implementation assumed that when the `GraphLowering` is created in the post-grad phase there are no `submodules`, so every `get_attr` node corresponds to a `torch.Tensor`. This assumption breaks when `FlexAttention` is enabled: in that scenario, `submodules` appear as `get_attr` nodes in the post-grad graph. For example:

```
V1128 23:23:47.071000 1965794 torch/_inductor/compile_fx.py:875] [0/1] [__post_grad_graphs] class sdpa_score30(torch.nn.Module):
V1128 23:23:47.071000 1965794 torch/_inductor/compile_fx.py:875] [0/1] [__post_grad_graphs]     def forward(self, arg0_1: "bf16[][]cpu", arg1_1: "i32[][]cpu", arg2_1: "i32[][]cpu", arg3_1: "i32[][]cpu", arg4_1: "i32[][]cpu"):
V1128 23:23:47.071000 1965794 torch/_inductor/compile_fx.py:875] [0/1] [__post_grad_graphs]         return arg0_1
V1128 23:23:45.482000 1965794 torch/_inductor/freezing.py:118] [0/1] sdpa_score30 = self.sdpa_score30
V1128 23:23:45.482000 1965794 torch/_inductor/freezing.py:118] [0/1] sdpa_mask30 = self.sdpa_mask30
V1128 23:23:45.482000 1965794 torch/_inductor/freezing.py:118] [0/1] flex_attention_30 = torch.ops.higher_order.flex_attention(add_276, index_put_60, index_put_61, sdpa_score30, (_frozen_param293, _frozen_param295, _frozen_param296, _frozen_param297, _frozen_param298, _frozen_param299, _frozen_param300, _frozen_param301, 64, 64, sdpa_mask30), 0.08838834764831843, {'SKIP_MASK_SCORE': True, 'PRESCALE_QK': False, 'ROWS_GUARANTEED_SAFE': False, 'BLOCKS_ARE_CONTIGUOUS': False, 'OUTPUT_LOGSUMEXP': False}, (), (_frozen_param294,)); add_276 = sdpa_score30 = sdpa_mask30 = None
V1128 23:23:45.482000 1965794 torch/_inductor/freezing.py:118] [0/1] getitem_60: "bf16[1, 32, 1, 128]" = flex_attention_30[0]; flex_attention_30 = None
```

We added an extra check in the implementation so that only `get_attr` nodes that refer to a `torch.Tensor` are compared (sketched below). It is difficult to reproduce this issue using pure higher-order operators; adding a unit test after #141453 lands would be more straightforward.

Pull Request resolved: pytorch#141798
Approved by: https://github.com/jgong5

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
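A minimal, hypothetical sketch of the guard described above, assuming the post-grad graph is available as a plain `torch.fx.GraphModule`; the helper names `shares_storage` and `is_prunable` are illustrative and not the actual Inductor functions. The point it shows: before comparing storages during pruning, any `get_attr` target that is not a `torch.Tensor` (such as FlexAttention's `sdpa_score*`/`sdpa_mask*` submodules) is simply skipped.

```python
# Illustrative sketch only -- not the actual Inductor code path. It assumes
# pruning decides aliasing by comparing the untyped storages of `get_attr`
# tensors in the post-grad graph.
import operator

import torch
from torch.fx import GraphModule


def shares_storage(a: torch.Tensor, b: torch.Tensor) -> bool:
    # Two tensors alias each other when they are views over the same untyped storage.
    return a.untyped_storage().data_ptr() == b.untyped_storage().data_ptr()


def is_prunable(gm: GraphModule, candidate: torch.Tensor) -> bool:
    """Return True if no other `get_attr` tensor in `gm` aliases `candidate`."""
    for node in gm.graph.nodes:
        if node.op != "get_attr":
            continue
        # Resolve dotted targets such as "sub.weight".
        attr = operator.attrgetter(node.target)(gm)
        # The extra check from this PR: FlexAttention's score/mask submodules
        # also appear as `get_attr` nodes, and a submodule has no storage to compare.
        if not isinstance(attr, torch.Tensor):
            continue
        if attr is not candidate and shares_storage(attr, candidate):
            return False
    return True
```

Without the `isinstance` guard, the loop would try to take the storage of a `torch.nn.Module` (e.g. `sdpa_score30`) and fail; with it, submodules are skipped and only real tensors participate in the comparison.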