[AutoParallel] Support pipeline parallelism forward non-computation clip. #58126
Conversation
… pythonc_inner_shard
paddle/phi/api/lib/data_transform.cc
Outdated
@@ -613,8 +613,10 @@ std::string ReshardDebugInfo(
    const phi::distributed::DistTensor& src_tensor,
    const phi::distributed::TensorDistAttr& dist_attr) {
  std::stringstream sstream;
  phi::DDim local_dims =
      src_tensor.defined() ? src_tensor.local_dims() : phi::DDim();
Is reshard not being skipped now?
Reshard is skipped; this change has been removed in the current revision. thx~
# """ | ||
# if (!computation_clip_for_pp) {{ | ||
# using kernel_signature = {}; | ||
# auto* kernel_fn = kernel.GetVariadicKernelFn<kernel_signature>(); | ||
# (*kernel_fn)({}, {}); | ||
# }} else {{ | ||
# {} |
How do you plan to handle this later?
Deleted. thx~
Could you attach an example of a generated API function body to the PR description?
""" | ||
# """ |
Is this code still needed here?
Deleted, thx~
// DenseTensor, such as under pipeline parallel. But grad node construction
// needs its place, we need to assign its place to DistTensor. It's in
// accordance with its value's place as long as its value is initialized.
Place place_;
An Allocation holding a nullptr can still report its place; please double-check whether adding place is really necessary.
The earlier SetGradOutMeta error about DistTensor.place() happened because the DistTensor returned by paddle.distributed.reshard after executing Send was undefined; this has been fixed.
        )

    def forward(self, x):
        y = paddle.matmul(
Mixed inputs are already supported, so calling shard_tensor on just the parameters should be enough; is the shard here still needed?
Done, thx~
The part of the PR description about adding place could be updated.
        self.create_parameter(
            shape=[IMAGE_SIZE, IMAGE_SIZE],
            attr=paddle.framework.ParamAttr(
                name="pp_demo_weight_0",
Does this need to be changed following that PR?
OK, I'll change it together in PR 58238.
…lip. (PaddlePaddle#58126)
* [AutoParallel] Support operators have mixed inputs like DenseTensor and DistTensor.
* Polish code with review comments.
* [AutoParallel] Support pipeline parallelism forward non-computation clip.
* Polish code.
* Fix some problem.
* Fix some compilation problems.
* Fix problem of compilation problem.
* Fix compilation problem.
* Polish code. Remove place property of DistTensor.
* Fix problem of multi initialization cross-mesh send/recv.
PR types: New features
PR changes: Others
Description
Pcard-73145
Support clipping the computation of non-computation ranks in the pipeline-parallel forward pass. In the dynamic semi-auto-parallel architecture every card builds the same network, so pipeline parallelism has to decide at the PHI API level, based on the device rank, whether the current Layer needs to be computed. Layers that need no computation are skipped.
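As a rough illustration of that decision, here is a self-contained sketch (hypothetical names, not Paddle's actual implementation): a layer launches kernels only when the current device rank belongs to the process mesh of its pipeline stage.

# Hypothetical sketch of the per-rank clipping decision described above.
def needs_computation(cur_rank, stage_mesh_ranks):
    """A layer is computed only on ranks inside its stage's process mesh."""
    return cur_rank in stage_mesh_ranks

# Two pipeline stages: stage 0 lives on rank 0, stage 1 on rank 1.
stage_meshes = {0: [0], 1: [1]}
cur_rank = 0
for stage, ranks in stage_meshes.items():
    if needs_computation(cur_rank, ranks):
        print(f"rank {cur_rank}: launch kernels for stage {stage}")
    else:
        print(f"rank {cur_rank}: clip stage {stage}, run metadata inference only")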
At the PHI API level, a DistTensor goes through sharding-info inference (InferSPMD), global shape inference (InferMeta), input reshard, and kernel launch. Because of the design of the Reshard module, ChooseProperReshardFunction needs the input's dist_attr (specifically its process_mesh, dims_mapping, and partial_status), so InferSPMD still has to run to derive the correct dist_attr. InferSPMD needs the input's real shape, which depends on InferMeta's shape inference (the correctness of InferMeta still has to be verified across all ops later; the shape inference of matmul and elementwise is correct in the vast majority of cases, and ops whose correct shape can only be obtained by launching the kernel need special handling). The current non-computation-rank clipping in the PHI API only runs InferSPMD and InferMeta and creates an empty DistTensor. This relies on the correctness of each InferMeta implementation and on ops that must launch the kernel to get the real shape; the latter case is rare, and one fallback for such ops is to insert paddle.distributed.reshard manually, or to generate special reshard logic for them at the PHI API level that first obtains the input's dist_attr from the correct device and then degrades to executing the op on all devices, where the resulting output releases its device memory and keeps only its shape information. All other ops keep the non-computation-rank clipping logic.

The dist_attr and global_shape information has to flow between different layers on the same device, so an uninitialized DistTensor is created. It occupies zero device memory but carries the correct dist_attr and global_shape.
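A minimal, self-contained sketch of this metadata-only flow (hypothetical Python stand-ins, not the actual C++ DistTensor): the clipped path returns an object that owns no storage but still carries the dist_attr and global shape needed by the next layer.

from dataclasses import dataclass, field

@dataclass
class DistAttrSketch:
    process_mesh: list            # ranks in the mesh of this stage
    dims_mapping: list            # mesh axis each tensor dim is sharded on (-1 = replicated)
    partial_status: dict = field(default_factory=dict)  # pending-reduction state per mesh axis

@dataclass
class UninitializedDistTensorSketch:
    global_shape: list            # from InferMeta
    dist_attr: DistAttrSketch     # from InferSPMD
    # deliberately no local value / holder: zero bytes of device memory

def clipped_op(out_shape, out_dist_attr):
    # On a non-computation rank, InferSPMD and InferMeta have already produced
    # out_dist_attr and out_shape; the kernel launch is skipped and only the
    # metadata is handed to the next layer.
    return UninitializedDistTensorSketch(out_shape, out_dist_attr)

meta_only = clipped_op([8, 16], DistAttrSketch(process_mesh=[1], dims_mapping=[-1, -1]))
print(meta_only.global_shape, meta_only.dist_attr.process_mesh)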
dtype and layout: the PHI API was designed under the assumption that every input Tensor is defined and therefore carries dtype, layout, and place. For reshard, the send/recv path needs the dtype of paddle.distributed.reshard's input to select the right kernel, and layout also has to be passed correctly between layers (building the backward graph also uses these three attributes in several places). The current approach is that, during InferMeta, the dtype and layout of DistTensor.value are set. This may be set redundantly (the local DenseTensor also goes through InferMeta).
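The fragment below only illustrates why the recorded dtype matters (a hypothetical table and helper, not Paddle's kernel registry): the send/recv path of a cross-mesh reshard keys its kernel choice on the input's dtype, so the dtype has to survive even when no storage was allocated.

# Hypothetical dtype-to-kernel table for the send side of a cross-mesh reshard.
SEND_KERNELS = {"float32": "p_send_fp32", "float16": "p_send_fp16"}

def pick_send_kernel(recorded_dtype):
    # If InferMeta had not written the dtype onto the (empty) DistTensor's value,
    # this lookup would have nothing to key on.
    return SEND_KERNELS[recorded_dtype]

print(pick_send_kernel("float16"))  # -> p_send_fp16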
place: handling place is more involved, because it is read from DistTensor.value()->holder_->place(). For a DistTensor with no device memory allocated, place obviously cannot be obtained. The current approach is that, for ops that need no computation, the PHI API builds an empty DistTensor whose value is initialized with a holder of zero device memory. In other cases, if DistTensor.value has no holder allocated, the default place can be returned. Example code is shown below:
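As a minimal sketch of the place fallback just described (hypothetical Python stand-ins for DistTensor.value, its holder, and place; not the generated C++ body):

from dataclasses import dataclass
from typing import Optional

DEFAULT_PLACE = "gpu:0"   # stand-in for the framework's default place

@dataclass
class HolderSketch:
    place: str
    nbytes: int = 0       # a zero-size allocation is allowed

@dataclass
class ValueSketch:
    holder: Optional[HolderSketch] = None

def dist_tensor_place(value):
    # Non-computation ops build an empty DistTensor whose value carries a
    # zero-size holder, so its place is still readable; if no holder exists
    # at all, fall back to the default place.
    if value.holder is not None:
        return value.holder.place
    return DEFAULT_PLACE

print(dist_tensor_place(ValueSketch(HolderSketch("gpu:1"))))  # gpu:1
print(dist_tensor_place(ValueSketch()))                       # default place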