CARVIEW |
Navigation Menu
-
Notifications
You must be signed in to change notification settings - Fork 24.7k
[PyTorch] Port ExecuTorch bfdot improvement back to ATen BlasKernel #136331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. Supersedes #127488 . Includes not-yet-reviewed pytorch/executorch#5444 . Differential Revision: [D63045939](https://our.internmc.facebook.com/intern/diff/D63045939/) [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136331
Note: Links to docs will display an error until the docs builds have been completed. ❌ 5 New Failures, 1 Unrelated FailureAs of commit 63d29fa with merge base 99eb47f ( NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This pull request was exported from Phabricator. Differential Revision: D63045939 |
ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. Supersedes #127488 . Includes not-yet-reviewed pytorch/executorch#5444 . Differential Revision: [D63045939](https://our.internmc.facebook.com/intern/diff/D63045939/) ghstack-source-id: 243596406 Pull Request resolved: #136331
Note that this code is gated off under C10_MOBILE, but we have ExecuTorch for that anyway, right? |
c10/macros/Macros.h
Outdated
#if defined(_MSC_VER) | ||
#define C10_ALWAYS_INLINE_ATTRIBUTE | ||
#elif __has_attribute(always_inline) || defined(__GNUC__) | ||
#define C10_ALWAYS_INLINE_ATTRIBUTE __attribute__((__always_inline__)) | ||
#else | ||
#define C10_ALWAYS_INLINE_ATTRIBUTE | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please move it to a separate PR? Also, why is this needed?(any perf numbers) And why exclude clang?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed?(any perf numbers)
not having it prevents ForcedUnroll from working properly under -Oz. See pytorch/executorch#5247 .
why exclude clang
clang defines __GNUC__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate PR
done. #136445
…lasKernel" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. Supersedes #127488 . Includes not-yet-reviewed pytorch/executorch#5444 . Differential Revision: [D63045939](https://our.internmc.facebook.com/intern/diff/D63045939/) [ghstack-poisoned]
This pull request was exported from Phabricator. Differential Revision: D63045939 |
…lasKernel" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. Supersedes #127488 . Includes not-yet-reviewed pytorch/executorch#5444 . Differential Revision: [D63045939](https://our.internmc.facebook.com/intern/diff/D63045939/) [ghstack-poisoned]
This pull request was exported from Phabricator. Differential Revision: D63045939 |
Pull Request resolved: #136331 ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. Supersedes #127488 . Includes not-yet-reviewed pytorch/executorch#5444 . ghstack-source-id: 244187352 Differential Revision: [D63045939](https://our.internmc.facebook.com/intern/diff/D63045939/)
…lasKernel" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. Supersedes #127488 . Includes pytorch/executorch#5444 . Differential Revision: [D63045939](https://our.internmc.facebook.com/intern/diff/D63045939/) [ghstack-poisoned]
This pull request was exported from Phabricator. Differential Revision: D63045939 |
Pull Request resolved: #136331 ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. Supersedes #127488 . Includes pytorch/executorch#5444 . ghstack-source-id: 245455630 Differential Revision: [D63045939](https://our.internmc.facebook.com/intern/diff/D63045939/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI failure look very much relevant!
@@ -301,7 +302,7 @@ static constexpr auto kF16RegistersPerIterationShift = kF16ElementsPerIterationS | |||
static constexpr auto kF16RegistersPerIteration = 1 << kF16RegistersPerIterationShift; | |||
static_assert(kF16RegistersPerIteration == kF16ElementsPerIteration / kF16ElementsPerRegister); | |||
|
|||
static inline double reduce(float16x8_t x[kF16RegistersPerIteration]) { | |||
static inline float reduce(float16x8_t x[kF16RegistersPerIteration]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't this done on purpose to preserve accuracy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe @malfet asked for it. perhaps I got confused though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, in any case, the numerical error on macos arm seem relevant and needs investigation right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, in any case, the numerical error on macos arm seem relevant and needs investigation right?
yes, it's fixed. I accidentally replaced the float16 kernel with the bfloat16 kernel due to name changes
…lasKernel" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. Supersedes #127488 . Includes pytorch/executorch#5444 . Differential Revision: [D63045939](https://our.internmc.facebook.com/intern/diff/D63045939/) [ghstack-poisoned]
This pull request was exported from Phabricator. Differential Revision: D63045939 |
Pull Request resolved: #136331 ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. Supersedes #127488 . Includes pytorch/executorch#5444 . ghstack-source-id: 245903000 Differential Revision: [D63045939](https://our.internmc.facebook.com/intern/diff/D63045939/)
I will probably reapply this and just disable the BF16 option for GCC. |
If I add below code:
And modified
So the issue here might me like in function Correct me if I am wrong. |
It should be fine to inline a not-specific-to-bf16 function in a specific-to-bf16 function. In fact, clang does it correctly. I would prefer not to duplicate code to work around compiler bugs. |
it will be slightly ugly, but we can put the body of the one function we'd need to duplicate in a macro so as not to punish people for using ARM GCC. I will try that. |
…Try #2 ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…Try #2 ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) ghstack-source-id: 246411194 Pull Request resolved: #137377
#137377 is the second attempt |
…t back to ATen BlasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…lasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…t back to ATen BlasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…lasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…Try #2 Pull Request resolved: #137377 ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . ghstack-source-id: 246616406 Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/)
…t back to ATen BlasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…lasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…fdot improvement back to ATen BlasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
… back to ATen BlasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…vement back to ATen BlasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…Ten BlasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…Try #2 Pull Request resolved: #137377 ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . ghstack-source-id: 246956192 Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/)
…t back to ATen BlasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…lasKernel, Try #2" ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
…Try #2 (#137377) ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) Pull Request resolved: #137377 Approved by: https://github.com/malfet
…Try #2 ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. First attempt was #136331 . Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/) [ghstack-poisoned]
Stack from ghstack (oldest at bottom):
ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with demonstration that it helps. Port it back to PyTorch. Supersedes #127488 . Includes pytorch/executorch#5444 .
Differential Revision: D63045939