[inductor] Triton codegen: Use scalar when creating f64 constant instead of 1-element tensor #136858

masnesral · 2024-09-27T14:25:01Z

Stack from ghstack (oldest at bottom):

-> [inductor] Triton codegen: Use scalar when creating f64 constant instead of 1-element tensor #136858

This is a retry of #136594, which is having trouble landing.

Summary: We have an internal report of a Triton compiler error `ValueError: Cannot broadcast, rank mismatch: [1], [1, 2048]` coming from a line like this:

`tmp25 = tl.broadcast_to(((tl.full([1], 1.00000000000000, tl.float64)) + ((ks0 // 3278).to(tl.float64))) / (((tl.full([1], 0.500000000000000, tl.float64))*(libdevice.sqrt((1 + ((ks0 // 3278)*(ks0 // 3278)) + ((-2)*(ks0 // 3278))).to(tl.float64).to(tl.float32)))) + ((tl.full([1], 0.500000000000000, tl.float64))*((1 + (ks0 // 3278)).to(tl.float64)))), [XBLOCK, RBLOCK])`

#135260 is the cause, presumably because we turn a constant into a 1-element tensor with `(tl.full([1], const, tl.float64))`. That makes the whole expression rank 1, which cannot be broadcast to the rank-2 shape `[XBLOCK, RBLOCK]`. It looks like changing the syntax to `(tl.full([], const, tl.float64))`, which creates a scalar instead, gives us what we want?
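To illustrate the reasoning above, here is a minimal pure-Python sketch. It is not the inductor codegen or Triton's implementation: `f64_constant_expr` and `broadcast_to_shape` are hypothetical names, and the rank check is a toy model that assumes a rank-0 value is treated as a scalar that can be splatted to any shape, while a rank-1 value must match the target rank.

```python
def f64_constant_expr(value, scalar=True):
    """Hypothetical helper: emit the source text for an f64 constant.

    scalar=True  -> tl.full([], ...)  (the form proposed in this PR)
    scalar=False -> tl.full([1], ...) (the form that triggered the error)
    """
    shape = "[]" if scalar else "[1]"
    return f"tl.full({shape}, {value!r}, tl.float64)"


def broadcast_to_shape(src_shape, dst_shape):
    """Toy model of a broadcast rank check (an assumption, not Triton's code)."""
    src, dst = list(src_shape), list(dst_shape)
    if not src:
        # Rank 0: a scalar can be expanded to any target shape.
        return dst
    if len(src) != len(dst):
        raise ValueError(f"Cannot broadcast, rank mismatch: {src}, {dst}")
    for s, d in zip(src, dst):
        # Each source dimension must equal the target or be 1.
        if s != d and s != 1:
            raise ValueError(f"Cannot broadcast: {src}, {dst}")
    return dst


if __name__ == "__main__":
    print(f64_constant_expr(0.5, scalar=False))
    print(f64_constant_expr(0.5))
    print(broadcast_to_shape([], [1, 2048]))
    try:
        broadcast_to_shape([1], [1, 2048])
    except ValueError as exc:
        print(exc)
```

Under this model, the `[1]` constant keeps the whole expression at rank 1 and the broadcast to `[XBLOCK, RBLOCK]` fails, while the `[]` constant leaves it a scalar and the broadcast goes through.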

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

Differential Revision: D63540693


pytorch-bot · 2024-09-27T14:25:05Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136858


Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit e9b2c2f with merge base a2d2a30:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

inductor-periodic / cuda12.1-py3.10-gcc9-sm80 / test (inductor_torchbench_smoketest_perf, 1, 1, linux.gcp.a100) (gh) (similar failure)
moco

This comment was automatically generated by Dr. CI and updates every 15 minutes.


masnesral · 2024-09-27T14:26:02Z

@masnesral has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

atalman

lgtm

atalman · 2024-09-27T15:12:14Z

@pytorchmergebot merge -f "Tests already executed, lint is green"

pytorchmergebot · 2024-09-27T15:14:01Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


pytorch-bot bot added ciflow/inductor module: inductor labels Sep 27, 2024

masnesral requested a review from atalman September 27, 2024 14:25

masnesral added ciflow/trunk Trigger trunk jobs on your pull request topic: not user facing topic category labels Sep 27, 2024

atalman approved these changes Sep 27, 2024


masnesral mentioned this pull request Sep 27, 2024

[inductor] Triton codegen: Use scalar when creating f64 constant instead of 1-element tensor #136594

Closed

pytorchmergebot added the merging label Sep 27, 2024

pytorchmergebot closed this in 45a8b56 Sep 27, 2024

pytorchmergebot added Merged and removed merging labels Sep 27, 2024

github-actions bot deleted the gh/masnesral/119/head branch October 28, 2024 02:09
