CARVIEW |
Navigation Menu
-
Notifications
You must be signed in to change notification settings - Fork 5.8k
[SOT][Faster Guard] implement faster lookup #71994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
你的PR提交成功,感谢你对开源项目的贡献! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a faster guard lookup mechanism by introducing a guard tree approach in the SOT executor. Key changes include:
- Adding new make_faster_guard methods decorated with check_faster_guard across variables.
- Modifying the executor and executor cache to propagate and utilize guard_nodes alongside guard_fn.
- Removing the legacy make_faster_guard helper from the guard module and splitting guard-related properties in the function graph.
Reviewed Changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
python/paddle/jit/sot/opcode_translator/executor/variables/iter.py | Added make_faster_guard method with check_faster_guard decorator. |
python/paddle/jit/sot/opcode_translator/executor/variables/container.py | Introduced make_faster_guard methods for container variables. |
python/paddle/jit/sot/opcode_translator/executor/variables/callable.py | Added make_faster_guard implementations for callable variables. |
python/paddle/jit/sot/opcode_translator/executor/variables/basic.py | Implemented make_faster_guard across several basic variable classes. |
python/paddle/jit/sot/opcode_translator/executor/opcode_executor.py | Propagated and stored guard_nodes in addition to guard_fn. |
python/paddle/jit/sot/opcode_translator/executor/guard.py | Removed the legacy make_faster_guard function. |
python/paddle/jit/sot/opcode_translator/executor/function_graph.py | Split guard properties into guard_nodes and guard_fn along with event registration. |
python/paddle/jit/sot/opcode_translator/executor/executor_cache.py | Updated caching logic to include guard_nodes and ensure consistency of lookup. |
Files not reviewed (2)
- paddle/fluid/pybind/jit.cc: Language not supported
- paddle/fluid/pybind/sot/guards.h: Language not supported
Comments suppressed due to low confidence (3)
python/paddle/jit/sot/opcode_translator/executor/executor_cache.py:182
- The assertion assumes that the guard_tree lookup always returns the exact index of the matching guard function. Consider handling mismatches more gracefully instead of relying on a strict assertion, which may lead to unexpected cache lookup failures.
assert (index == cache_index), "cache_index is not equal to index"
python/paddle/jit/sot/opcode_translator/executor/executor_cache.py:338
- Using a dummy GuardNode along with a TODO comment suggests a placeholder implementation. It's recommended to implement a proper zero-expression constructor for GuardNode or explicitly handle this fallback case to avoid potential runtime issues.
paddle.framework.core.GuardNode( paddle.framework.core.DummyGuard(), paddle.framework.core.ExprNode(), )
python/paddle/jit/sot/opcode_translator/executor/variables/basic.py:300
- [nitpick] Consider providing a descriptive error message in the NotImplementedError (e.g., including the class name) to maintain consistency and aid debugging.
raise NotImplementedError
04341c1
to
096124c
Compare
d9f51cf
to
9339a27
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
paddle/fluid/pybind/jit.cc:189
- The py::init binding for ExternVarExprNode appears to be missing a closing parenthesis after listing the constructor arguments. Please verify the syntax to ensure the binding compiles correctly.
py::class_<ExternVarExprNode, ExprNode, std::shared_ptr<ExternVarExprNode>>(
29ae43c
to
eac2656
Compare
return False | ||
|
||
def need_guard(self) -> bool: | ||
# TODO(zrr1999): to implement gen_instructions and trace_value_from_frame | ||
# TODO(zrr1999): implemented in #68555 | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SymbolicOperationTracker
相关的后面我会看看怎么实现
BinaryOperatorTracker
会在后续 PR 移除
if paddle.framework.use_pir_api(): | ||
# TODO(zrr1999): Better log | ||
log(0, "[Warning] Guard Tree Only support PIR") | ||
return [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
现在我们会在老 IR 跑 make_faster_guard
么?直接 assert 可以吗?如果前两天动转静单测有这个问题,现在可能没了,因为前两个 PR 已经将 SOT+PT 的单测去掉了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
看来我应该再拖两天再改
paddle.framework.core.AttributeExprNode( | ||
expr_node, "stop_gradient" | ||
) | ||
], | ||
), | ||
# TODO(zrr1999): add dist_info check | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里后面直接用 TensorMetaMatchGuard
应该就不用在这里考虑 dist_info
了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if not enable_strict_guard and enable_guard_tree: | ||
guard_tree = paddle.framework.core.GuardTree(guard_nodes_list) | ||
cache_index = guard_tree.lookup(frame) | ||
if cache_index is not None: | ||
# TODO(zrr1999): add a mapping between custom_code and cache_index | ||
return guarded_fns[cache_index][0] | ||
|
||
for custom_code, guard_fn in guarded_fns: | ||
else: | ||
if enable_strict_guard: | ||
mirror_guard_error = None | ||
try: | ||
with EventGuard("try mirror guard"): | ||
mirror_guard_result = guard_fn.mirror_guard(frame) | ||
except Exception as e: | ||
log(2, f"[Cache] Mirror guard error: {e}\n") | ||
mirror_guard_error = e | ||
guard_tree = paddle.framework.core.GuardTree(guard_nodes_list) | ||
cache_index = guard_tree.lookup(frame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这块逻辑有点复杂呀,为啥不是用 enable_guard_tree or enable_strict_guard
判断是否要跑 guard_tree.lookup
,然后用是否有 enable_strict_guard
判断是否直接返回呢?感觉不需要让后续代码整体加一个缩进层级
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有道理
eac2656
to
1bc1c05
Compare
奇怪 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Category
Execute Infrastructure
PR Types
Performance
Description
实现了guard tree版本的lookup