[AutoParallel] Add local view reshape and nd_mesh_alltoall reshard for moe #68187
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open-source project!
Sorry to inform you that 685f95d's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
    return local_shape

def infer_pos_shape(src_shape, tgt_shape):
What is `pos` an abbreviation for?
It's short for "positive"; renamed it to `positive`.
    ), "At most one -1 is allowed in target shape."

    nelem = np.prod(src_shape)
    ret_shape[neg_one_idx[0]] = 1
This line of code is redundant.
This function infers the concrete value that the -1 in the input shape corresponds to. Setting the -1 entry to 1 here ensures that `np.prod(ret_shape)` below evaluates to a positive number.
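To make the logic above concrete, here is an illustrative re-implementation of such a helper (a sketch, not the exact PR code; names follow the snippet above):

```python
import numpy as np


def infer_positive_shape(src_shape, tgt_shape):
    # Replace a single -1 in tgt_shape with the positive value inferred
    # from the total number of elements in src_shape.
    ret_shape = list(tgt_shape)
    neg_one_idx = [i for i, d in enumerate(ret_shape) if d == -1]
    assert len(neg_one_idx) <= 1, "At most one -1 is allowed in target shape."
    if neg_one_idx:
        nelem = np.prod(src_shape)
        # Temporarily set the -1 to 1 so np.prod(ret_shape) is positive,
        # then divide to obtain the inferred dimension.
        ret_shape[neg_one_idx[0]] = 1
        ret_shape[neg_one_idx[0]] = int(nelem // np.prod(ret_shape))
    return ret_shape
```

For example, `infer_positive_shape([4, 6], [2, -1])` resolves the -1 to 12, since the source tensor holds 24 elements.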
    if src_mesh == mesh or src_mesh.process_ids != mesh.process_ids:
        return False

    # only the mesh shapes are different,
For this case, could we also add a guard in dist_reshape that raises an error, to prevent unexpected behavior when users call dist_reshape directly?
done
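A guard along the lines suggested above could look like the following sketch. The `Mesh` stand-in and function name are hypothetical (the real Paddle `ProcessMesh` carries more state); it only illustrates rejecting meshes that do not cover the same processes:

```python
from collections import namedtuple

# Hypothetical minimal stand-in for a process mesh, for illustration only.
Mesh = namedtuple("Mesh", ["shape", "process_ids"])


def check_local_reshape_mesh(src_mesh, tgt_mesh):
    # The local-view reshape path only applies when both meshes cover
    # exactly the same processes (only the mesh shapes may differ);
    # reject anything else early with a clear error.
    if list(src_mesh.process_ids) != list(tgt_mesh.process_ids):
        raise ValueError(
            "dist_reshape requires source and target meshes to share the "
            f"same process ids, got {src_mesh.process_ids} vs "
            f"{tgt_mesh.process_ids}"
        )
```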
@@ -772,6 +777,17 @@ def reshard(
    if len(partial_dims) > 0:
        dist_attr._set_partial_dims(partial_dims)

    alltoall_dim = _specific_alltoall_dim(dist_tensor, mesh, placements)
    if alltoall_dim is not None:
        # return _nd_mesh_alltoall_reshard(dist_tensor, mesh, placements, alltoall_dim)
Can this line be removed?
done
        " sharding strategies yet (i.e. [Shard(0), Shard(0)])",
        shard_dim,
        dim_map[shard_dim]));
// PADDLE_ENFORCE_EQ(
Would it be better to change this check to a WARNING?
done
LGTM
PR Category
Auto Parallel
PR Types
New features
Description
Pcard-67164
All operations in auto parallel are identical to the serial case, i.e. all operations are expressed in the global view. In MoE, the global-view reshape introduces additional communication.
This PR adds a local-view reshape and an nd_mesh_alltoall reshard so that MoE training behaves the same as the original distributed training, achieving higher performance.
Additionally, it adds a deepcopy function for placements.
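The distinction between the two views can be illustrated with plain NumPy (a conceptual sketch, not Paddle API; the sharding here is simulated by hand). A global-view reshape may regroup elements that live on different ranks and therefore forces communication, while a local-view reshape transforms each rank's local shard independently, moving no data:

```python
import numpy as np

# A (4, 4) global tensor sharded along dim 0 across 2 simulated ranks,
# so each rank holds a (2, 4) local shard.
global_tensor = np.arange(16).reshape(4, 4)
local_shards = [global_tensor[:2], global_tensor[2:]]

# Local-view reshape (2, 4) -> (8,): each rank reshapes only its own
# shard, so the operation is communication-free.
local_reshaped = [shard.reshape(8) for shard in local_shards]
```

Rank 0 ends up with elements 0..7 and rank 1 with 8..15, exactly the data each already held before the reshape.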