[SOT][PIR] Mark `item` to breakgraph && add unittest of tensor array related APIs #72631
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
NOTE: Why not put `create_array`, `array_write`, `array_read`, and `array_length` into SIR?
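Strictly speaking, these APIs are not unified across dynamic and static graphs. They are self-consistent in pure dynamic-graph mode and in pure static-graph / AST dynamic-to-static mode: pure dynamic graph always operates on `list[Tensor]`, and pure static graph always operates on `TensorArray`. That no longer holds when dynamic and static execution are mixed.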
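First, SOT can break the graph at any point, so any one of these APIs may end up running in either the dynamic graph or the static graph. A simple case: if `create_array` runs in the dynamic graph while `array_write` runs in the static graph, the former creates a `list[Tensor]`, so the latter receives a type it does not expect (it should be a `TensorArray`, but is not) and fails. The reverse case fails in the same way.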
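The mismatch described above can be sketched in plain Python. This is only an illustration, not Paddle code: `FakeTensorArray`, the `static_mode` flag, and these simplified `create_array` / `array_write` signatures are hypothetical stand-ins for the two branches of the real APIs.

```python
class FakeTensorArray:
    """Hypothetical stand-in for a static-graph TensorArray."""

    def __init__(self):
        self.items = []


def create_array(static_mode):
    # Assumed behavior: the static branch yields a TensorArray-like object,
    # the dynamic branch yields a plain Python list.
    return FakeTensorArray() if static_mode else []


def array_write(value, index, array, static_mode):
    # Each branch only accepts the container type its own branch creates.
    expected = FakeTensorArray if static_mode else list
    if not isinstance(array, expected):
        raise TypeError(
            f"expected {expected.__name__}, got {type(array).__name__}"
        )
    target = array.items if static_mode else array
    if index == len(target):
        target.append(value)
    else:
        target[index] = value
    return array


def mixed_mode_error():
    # create_array ran eagerly, then array_write ran in a static subgraph.
    arr = create_array(static_mode=False)
    try:
        array_write(1.0, 0, arr, static_mode=True)
    except TypeError as exc:
        return str(exc)
    return None
```

Calling `mixed_mode_error()` returns the message of the `TypeError` raised by the static branch, which is the toy equivalent of the failure described in the comment.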
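In fact, there is a problem even when `create_array` and `array_write` both run in the static graph. If a graph break occurs between them, the result of `create_array` necessarily becomes an output of the preceding subgraph. The dynamic graph has no `TensorArray`, which means a `TensorArray` cannot be a subgraph output. From this point of view, these APIs must never take the static-graph branch under SOT, so for now the decision is that they keep running the dynamic-graph branch (via simulation).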
TODO:
- During simulation, can we detect the presence of a `TensorArray` and, once one is found, convert it to `list[Tensor]` at the subgraph boundary?
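A minimal sketch of what the boundary conversion in this TODO might look like, assuming a subgraph's outputs are available as a flat list. `FakeTensorArray` and `to_subgraph_outputs` are hypothetical names used only for illustration. They are not part of SOT or Paddle.

```python
class FakeTensorArray:
    """Hypothetical stand-in for a static-graph TensorArray."""

    def __init__(self, items=None):
        self.items = list(items or [])


def to_subgraph_outputs(values):
    """Replace every FakeTensorArray among the outputs with a plain list.

    Other values pass through unchanged, so the eager side of the
    boundary only ever sees list-like containers.
    """
    return [
        list(v.items) if isinstance(v, FakeTensorArray) else v
        for v in values
    ]
```

For example, `to_subgraph_outputs([1, FakeTensorArray([2, 3])])` yields `[1, [2, 3]]`. A real implementation would also need the reverse conversion when a list re-enters a static subgraph, which this sketch does not cover.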
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #72631 +/- ##
==========================================
Coverage ? 96.55%
==========================================
Files ? 1
Lines ? 29
Branches ? 0
==========================================
Hits ? 28
Misses ? 1
Partials ? 0
@@ -124,6 +124,7 @@ def _get_tensor_methods():
     'numpy',
     'clear_gradient',
     'tolist',
+    'item',
TODO: A non-breaking implementation whose output is a `SymbolicVariable` can be explored later.
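To illustrate what marking a method as graph-breaking means, here is a toy sketch that splits a sequence of traced method calls into subgraphs. The set name `BREAK_GRAPH_METHODS` and the function `split_at_breaks` are hypothetical. Only the four method names come from the diff above, and the real SOT tracer works quite differently.

```python
# Method names taken from the diff; the container name is an assumption.
BREAK_GRAPH_METHODS = {"numpy", "clear_gradient", "tolist", "item"}


def split_at_breaks(method_calls):
    """Group traced calls into subgraphs.

    A graph-breaking method closes the current subgraph and runs on its
    own in eager mode, marked here with an "eager:" prefix.
    """
    subgraphs, current = [], []
    for name in method_calls:
        if name in BREAK_GRAPH_METHODS:
            if current:
                subgraphs.append(current)
            subgraphs.append([f"eager:{name}"])
            current = []
        else:
            current.append(name)
    if current:
        subgraphs.append(current)
    return subgraphs
```

With `item` in the set, a trace such as `["add", "mean", "item", "relu"]` is cut into three pieces, with `item` executed eagerly between two captured subgraphs. The non-breaking alternative mentioned in the TODO would keep the whole trace in one subgraph.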
# NOTE: This is to maintain consistency with the original code.
return self
TODO: This early `return` is here mainly to stay consistent with the previous behavior. Without this early `return`, `test_asgd_op` returns NaN in PIR + mixed-precision mode.
"Test pir graph mode"
output1_pir = self.pir_asgd_mp(mp=True)  # <--- the first result is correct, the following 4 results are all NaN
output2_pir = self.pir_asgd_mp(mp=False)
for idx in range(5):
    np.testing.assert_allclose(
        output1_pir[idx].astype('float32'),
        output2_pir[idx].astype('float32'),
        rtol=1e-05,
        atol=0.1,
    )
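When chasing a failure like this one, it can help to list which outputs contain NaN before comparing them. The helper below is a hypothetical debugging aid, not part of the test. It assumes each output has already been flattened into a plain list of Python floats.

```python
import math


def nan_output_indices(outputs):
    """Return the indices of outputs that contain at least one NaN."""
    return [
        idx
        for idx, values in enumerate(outputs)
        if any(math.isnan(v) for v in values)
    ]
```

Applied to five outputs where only the first is finite, it reports indices 1 through 4, matching the pattern noted in the comment above.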
PR Category
Execute Infrastructure
PR Types
Bug fixes
Description
PCard-66972