Add scaling arguments to bsr_dense_addmm #136104
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136104
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 825a557 with merge base 357b7fb.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#136104
Approved by: https://github.com/cpuhrsch
As in the title.
Tackles https://github.com/pytorch/ao/pull/821/files#r1759821413
The PR assumes that the existing tuning parameters are also good when the scaling arguments are used; this needs to be verified as a follow-up task.
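To make the new interface concrete, here is a minimal usage sketch. The argument names `left_alpha`/`right_alpha`, the call signature, the import path, and the dense-equivalent semantics are assumptions inferred from the linked discussion rather than facts stated on this page:

```python
import torch
from torch.sparse._triton_ops import bsr_dense_addmm  # private module; path assumed

# Hypothetical example: the `left_alpha`/`right_alpha` keyword arguments
# below are an assumption. The Triton kernel requires a CUDA device.
M, K, N, BS = 64, 64, 64, 16
bsr = torch.randn(M, K, device="cuda").to_sparse_bsr((BS, BS))
dense = torch.randn(K, N, device="cuda")
inp = torch.randn(M, N, device="cuda")
left_alpha = torch.rand(M, device="cuda")   # per-row scaling of the product
right_alpha = torch.rand(N, device="cuda")  # per-column scaling of the product

out = bsr_dense_addmm(
    inp, bsr, dense, beta=1, alpha=1,
    left_alpha=left_alpha, right_alpha=right_alpha,
)

# Assumed dense-equivalent semantics:
#   out = beta * inp + alpha * left_alpha[:, None] * (bsr @ dense) * right_alpha[None, :]
ref = inp + left_alpha[:, None] * (bsr.to_dense() @ dense) * right_alpha[None, :]
torch.testing.assert_close(out, ref, rtol=1e-3, atol=1e-3)
```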
Also, this PR redefines triton-contiguous tensors: the tensor must have strides not larger than 1. This now allows zero strides that previously triggered a `contiguous` call even though the underlying memory buffer was already contiguous.

Re: "a considerable slow-down occurs because tensor data is copied element-wise rather than chunk-wise" - this note should refer to the code (in torch or triton?) that implements the element/chunk-wise copy, so that we could verify that allowing zero strides indeed does not trigger element-wise copies. At the moment, the performance increase in the ViT-H benchmarks (which involve 0 strides) is evidence that allowing zero strides does not lead to slow-downs.
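A minimal sketch of what the relaxed check could look like. The helper name and the exact condition (copy only when the smallest stride exceeds 1) are assumptions; the PR's actual implementation is not quoted on this page:

```python
import torch

def make_triton_contiguous(t: torch.Tensor) -> torch.Tensor:
    # Assumed form of the relaxed check: force a copy only when the
    # smallest stride exceeds 1. Strides of 0 (broadcasted dimensions)
    # or 1 now pass through without triggering t.contiguous().
    if min(t.stride()) > 1:
        return t.contiguous()
    return t

# A broadcasted row vector has strides (0, 1): its underlying buffer is
# contiguous, so the relaxed check leaves it alone instead of copying.
row = torch.randn(1, 64).expand(64, 64)
assert make_triton_contiguous(row).data_ptr() == row.data_ptr()
```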
Stack from ghstack (oldest at bottom):