CARVIEW |
Navigation Menu
-
Notifications
You must be signed in to change notification settings - Fork 24.7k
[dynamo] implement framelocals mapping as c++ object #140063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[ghstack-poisoned]
đź”— Helpful Linksđź§Ş See artifacts and rendered test results at hud.pytorch.org/pr/140063
Note: Links to docs will display an error until the docs builds have been completed. âś… You can merge normally! (1 Unrelated Failure)As of commit c01e3fd with merge base 7ab3177 ( FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
#endif | ||
|
||
void framelocals_mapping_free(FrameLocalsMapping* map); | ||
|
||
// Borrowed reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you say borrowed, but above you use py::reinterpret_steal
which seems wrong if this is borrowed.
|
||
typedef struct VISIBILITY_HIDDEN FrameLocalsMapping { | ||
private: | ||
std::unordered_map<std::string, PyObject*> _map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a std::unordered_map
actually much faster than a Python dict?
The thing I was suggesting in #93753 was replacing:
f_locals["foo"]
with
fastlocals[3]
where the conversion to figure out that the variable "foo"
is stored at index 3
can be done ahead of time. Thus making the lookup just a single load instruction (rather than a hashtable lookup).
Since both PyDict and unordered_map are hashtables, I'd expect similar performance.
You should compute the fastlocals offset ahead of time when the guard object is constructed, and store it on the guard. So when we evaluate the guard you can replace the hashtable lookup with an array lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you could use pybind11 to do this casting for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe there is a guarantee on the ordering of the fastlocals names (but in practice, this may be true). The safe thing to do would be to add a check that fastlocals name order is the same.
I'd also think that a C++ unordered has less overhead than using Python dict, although I'd have to measure to be sure. The point of this PR is not to introduce the complete optimization, but rather, is a step to moving framelocals/fastlocals lookup to C++. I plan on incrementally introducing optimizations in followup PRs. For this PR, we just need to make sure there are no regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe there is a guarantee on the ordering of the fastlocals names (but in practice, this may be true). The safe thing to do would be to add a check that fastlocals name order is the same.
Isn't the order defined on the code object which is a constant? How could the order change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I hadn't considered that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think directly using fastlocals will actually be easier to implement than a unordered_map. This also matches what CPython does, I believe:
LOAD_FAST 3
is just push(fastlocals[3])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments above.
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
TODO: update PR description Benchmark update: - 713.2us -> 630.0us (3.10) - 598.8us -> 530.7us (3.12) cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Move frame local guard accessors to C++. Before, we used dict accessors on a Python dict representing the frame's fastlocals that we manually build. We move this accessor to C++ and additionally use the fastlocal index whenever possible. Some implementation notes: - `FrameLocalsMapping` is now initialized as a C++ vector of `PyObject`s. We do not just use the frame's localsplus/fastlocals buffer because we also unbox cells. - `FrameLocalsMapping` can still be converted into a Python dict representing the frame's fastlocals, but it is done lazily. - We update `LeafGuard`, `GuardAccessor`, and `GuardManager`'s `check_nopybind` methods to accept `FrameLocalsMapping`. By default, we convert the `FrameLocalsMapping` to a Python dict and run the original `check_nopybind` on it, but in some cases, conversion is not needed. - We add a new guard accessor `FrameLocalsGuardAccessor`, which is similar to `DictGetItemGuardAccessor` but has special handling for `FrameLocalsMapping`. dynamo_guard_eval.py microbenchmark update: - 713.2us -> 630.0us (3.10) - 598.8us -> 530.7us (3.12) cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
@@ -616,9 +617,15 @@ class GuardManagerType(enum.Enum): | |||
DICT_SUBCLASS_GUARD_MANAGER = 3 | |||
|
|||
|
|||
@functools.lru_cache(None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a limit here?
framelocals_names_reversed = code_framelocals_names_reversed_cached( | ||
self.f_code | ||
) | ||
framelocals_idx = ( | ||
len(framelocals_names_reversed) | ||
- framelocals_names_reversed.index(source.local_name) | ||
- 1 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the first one return a dict so you can compute framelocals_idx in O(1) time?
ghstack-source-id: f1f02aa Pull Request resolved: pytorch/pytorch#140063
@pytorchbot rebase |
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
Successfully rebased |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Summary: Implements pytorch/pytorch#93753 - move frame local guard accessors to C++. Before, we used dict accessors on a Python dict representing the frame's fastlocals that we manually build. We move this accessor to C++ and additionally use the fastlocal index whenever possible. Some implementation notes: - `FrameLocalsMapping` is now initialized as a C++ vector of `PyObject`s. We do not just use the frame's localsplus/fastlocals buffer because we also unbox cells. - `FrameLocalsMapping` can still be converted into a Python dict representing the frame's fastlocals, but it is done lazily. - We update `LeafGuard`, `GuardAccessor`, and `GuardManager`'s `check_nopybind` methods to accept `FrameLocalsMapping`. By default, we convert the `FrameLocalsMapping` to a Python dict and run the original `check_nopybind` on it, but in some cases, conversion is not needed. - We add a new guard accessor `FrameLocalsGuardAccessor`, which is similar to `DictGetItemGuardAccessor` but has special handling for `FrameLocalsMapping`. We create a separate class to emphasize different use cases, but we could probably combine these two (can do in a follow up) dynamo_guard_eval.py microbenchmark update: - 713.2us -> 630.0us (3.10) - 598.8us -> 530.7us (3.12) Other followups: - Add `FrameLocalsMapping` version for `check_verbose_nopybind` in order to match behavior between `check_nopybind` and `check_verbose_nopybind`. This can prevent difficult debugging situations where guards fail (`check_nopybind` returns false) but no guard error message is generated (`check_verbose_nopybind` succeeds). - Rewrite the `SHAPE_ENV` guard into C++ - it is a fairly common guard that results in `FrameLocalsMapping` needing to convert to a dict X-link: pytorch/pytorch#140063 Approved by: https://github.com/jansel ghstack dependencies: #142117, #142430 Reviewed By: huydhn Differential Revision: D67355697 Pulled By: williamwen42 fbshipit-source-id: 930855f49db629b2bc2d7b7ff14c065bd044e894
Stack from ghstack (oldest at bottom):
Implements #93753 - move frame local guard accessors to C++.
Before, we used dict accessors on a Python dict representing the frame's fastlocals that we manually build. We move this accessor to C++ and additionally use the fastlocal index whenever possible.
Some implementation notes:
FrameLocalsMapping
is now initialized as a C++ vector ofPyObject
s. We do not just use the frame's localsplus/fastlocals buffer because we also unbox cells.FrameLocalsMapping
can still be converted into a Python dict representing the frame's fastlocals, but it is done lazily.LeafGuard
,GuardAccessor
, andGuardManager
'scheck_nopybind
methods to acceptFrameLocalsMapping
. By default, we convert theFrameLocalsMapping
to a Python dict and run the originalcheck_nopybind
on it, but in some cases, conversion is not needed.FrameLocalsGuardAccessor
, which is similar toDictGetItemGuardAccessor
but has special handling forFrameLocalsMapping
. We create a separate class to emphasize different use cases, but we could probably combine these two (can do in a follow up)dynamo_guard_eval.py microbenchmark update:
Other followups:
FrameLocalsMapping
version forcheck_verbose_nopybind
in order to match behavior betweencheck_nopybind
andcheck_verbose_nopybind
. This can prevent difficult debugging situations where guards fail (check_nopybind
returns false) but no guard error message is generated (check_verbose_nopybind
succeeds).SHAPE_ENV
guard into C++ - it is a fairly common guard that results inFrameLocalsMapping
needing to convert to a dictcc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames