【AutoParallism】Support semi auto amp #61221
Conversation
@@ -44,9 +44,9 @@ if((WITH_GPU) AND (LINUX))
endif()
if((WITH_GPU) AND (LINUX))
  py_test_modules(
    test_semi_auto_parallel_llama_model_vpp MODULES
why delete vpp
The unit test should be added to test/auto_parallel/hybrid_strategy/testslist.csv, and then the script should be run to generate the corresponding file. The original vpp unit test was added manually, so it does not run in CI, and Zhonghui reported that it has problems. It will be added back after it is fixed later.
@@ -56,7 +56,7 @@ def check_results(
        # the number of operators of this type +2
        self.assertEqual(
            int(op_list['transfer_dtype'].split(',')[0]),
            total_steps + total_steps * 2 + 2,
why change here
After changing phi::AddKernel<T, CONTEXT>(*cpu_ctx, src_tensor, *dst_tensor, dst_tensor); in gradient_accumulator.cc to phi::AddKernel<T, CONTEXT>(*cpu_ctx, *dst_tensor, src_tensor, dst_tensor);, there are fewer casts in master_grad. Confirmed with Zhang Ting that this change is correct.
scaler (paddle.amp.GradScaler): The GradScaler to be sharded.

Returns:
    An GradScaler with distributed view.
An -> a
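A minimal usage sketch based on this docstring; it assumes the new interface is exposed as paddle.distributed.shard_scaler and simply wraps an ordinary paddle.amp.GradScaler (an illustration, not the PR's test code):

```python
import paddle
import paddle.distributed as dist

# Build a normal dygraph GradScaler first.
scaler = paddle.amp.GradScaler(init_loss_scaling=2.0**16)

# Wrap it so that unscaling and the found_inf check are performed with a
# distributed view (assumed entry point: dist.shard_scaler, added in this PR).
scaler = dist.shard_scaler(scaler)
```

After wrapping, the usual scale / step / update calls are expected to follow the standard GradScaler workflow.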
from paddle.base import unique_name
from paddle.base.dygraph import to_variable
use to_tensor
done
// initialize output dist_attr's process_mesh, batch_dim and dynamic dims with
// input dist_attr.
TensorDistAttr out_dist_attr =
    CopyTensorDistAttrWithPartialForOutput(x_dist_attr_src);
You can reuse the ElementwiseUnary function except for this line.
ElementwiseUnary clears the partial status, but cast can propagate the partial status onward.
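A conceptual sketch of the behavior discussed in this thread: a tensor that is partial across the mesh stays partial after a dtype cast, so the reduction can be deferred (for example until after grads are cast to float32 for master_grad). This is only an illustration under assumptions: it uses the public semi-auto Python API (ProcessMesh, shard_tensor, Shard, placements), needs GPUs, and would have to be launched with two ranks via paddle.distributed.launch; it is not the unit test from this PR.

```python
import paddle
import paddle.distributed as dist

mesh = dist.ProcessMesh([0, 1], dim_names=['mp'])

# Shard the contracted axis of a matmul across the mesh; the product is then
# a partial (pending-allreduce) tensor.
x = dist.shard_tensor(paddle.ones([4, 8], dtype='float16'), mesh, [dist.Shard(1)])
w = dist.shard_tensor(paddle.ones([8, 8], dtype='float16'), mesh, [dist.Shard(0)])
y = paddle.matmul(x, w)  # placements expected to contain a Partial entry

# With the cast SPMD rule added in this PR, the cast keeps the partial status
# instead of clearing it (as ElementwiseUnary would), so the allreduce can be
# performed once, after the dtype change.
y_fp32 = paddle.cast(y, 'float32')
print(y_fp32.placements)
```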
if len(amp_global_state().mesh2params):
    for _, params in amp_global_state().mesh2params.items():
        core.eager.set_master_grads(params)
plz comment why
done
temp_found_inf = dist.reshard(
    temp_found_inf, src_mesh, temp_found_inf.placements
)
plz comment why
done
LGTM for the unit test removal.
LGTM
LGTM for shard_scaler docs.
Please also provide the Chinese documentation~
LGTM for API changes
PR types: New features
PR changes: Others
Description
Pcard-76459
Support the AMP strategy under dynamic semi-auto parallelism. The main changes in this PR to support semi-auto AMP are as follows (a usage sketch is given after the list):
- Add the shard_scaler interface, which serves as the scaler entry point for semi-auto parallelism
- Adapt the master_grad logic involved in auto_cast to support semi-auto parallelism
- Fix the logic error in TensorAdd so that the "half + float32" case is handled correctly
- Add an SPMD rule for cast so that it can propagate the partial status
- Add the __hash__ member to make it convenient to collect grads under different PP meshes
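A hedged end-to-end sketch of how the pieces above could fit together in one semi-auto AMP training step. The mesh, model, and optimizer are illustrative placeholders; shard_scaler / shard_optimizer / shard_tensor follow the public paddle.distributed API, and master_grad is passed through paddle.amp.decorate as in the existing dygraph AMP interface, so treat this as a sketch rather than the PR's own test:

```python
import paddle
import paddle.distributed as dist

mesh = dist.ProcessMesh([0, 1], dim_names=['dp'])  # illustrative two-card mesh

model = paddle.nn.Linear(8, 8)
opt = paddle.optimizer.AdamW(parameters=model.parameters())

# O2 AMP with float32 master grads; the master_grad path is what this PR
# adapts for semi-auto parallelism.
model, opt = paddle.amp.decorate(
    models=model, optimizers=opt, level='O2', master_grad=True
)

opt = dist.shard_optimizer(opt)  # semi-auto view of the optimizer
scaler = dist.shard_scaler(paddle.amp.GradScaler(init_loss_scaling=2.0**16))

x = dist.shard_tensor(paddle.rand([4, 8]), mesh, [dist.Shard(0)])
with paddle.amp.auto_cast(level='O2'):
    loss = model(x).mean()
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
opt.clear_grad()
```

Like any semi-auto example, this would be started with python -m paddle.distributed.launch across the ranks in the mesh.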