[AutoParallel] Support pipeline parallelism forward non-computation clip. #58126
Conversation
… pythonc_inner_shard
paddle/phi/api/lib/data_transform.cc
Outdated
@@ -613,8 +613,10 @@ std::string ReshardDebugInfo(
    const phi::distributed::DistTensor& src_tensor,
    const phi::distributed::TensorDistAttr& dist_attr) {
  std::stringstream sstream;
  phi::DDim local_dims =
      src_tensor.defined() ? src_tensor.local_dims() : phi::DDim();
Is reshard not being skipped now?
Reshard is skipped; this change has been removed in the current revision. thx~
# """ | ||
# if (!computation_clip_for_pp) {{ | ||
# using kernel_signature = {}; | ||
# auto* kernel_fn = kernel.GetVariadicKernelFn<kernel_signature>(); | ||
# (*kernel_fn)({}, {}); | ||
# }} else {{ | ||
# {} |
How do you plan to handle this later?
Deleted. thx~
Could you attach an example of a generated API function body to the PR description?
""" | ||
# """ |
Is this code still needed here?
Deleted, thx~
// DenseTensor, such as under pipeline parallel. But grad node construction
// needs its place, we need to assign its place to DistTensor. It's in
// accordance with its value's place as long as its value is initialized.
Place place_;
An Allocation holding a nullptr can still report its place; please double-check whether adding place is really necessary.
The earlier SetGradOutMeta error about DistTensor.place() happened because the DistTensor returned by paddle.distributed.reshard after executing Send was undefined; this has been fixed.
        )

    def forward(self, x):
        y = paddle.matmul(
Mixed inputs are already supported, so calling shard_tensor on just the parameters should be enough; is the shard here still needed?
Done, thx~
The part of the PR description about adding place could be updated.
        self.create_parameter(
            shape=[IMAGE_SIZE, IMAGE_SIZE],
            attr=paddle.framework.ParamAttr(
                name="pp_demo_weight_0",
Does this need to be changed following that PR?
OK, I'll change it together in PR 58238.
…lip. (PaddlePaddle#58126)
* [AutoParallel] Support operators have mixed inputs like DenseTensor and DistTensor.
* Polish code with review comments.
* [AutoParallel] Support pipeline parallelism forward non-computation clip.
* Polish code.
* Fix some problem.
* Fix some compilation problems.
* Fix problem of compilation problem.
* Fix compilation problem.
* Polish code. Remove place property of DistTensor.
* Fix problem of multi initialization cross-mesh send/recv.
PR types: New features
PR changes: Others
Description
Pcard-73145
Support clipping the computation of non-computation ranks in the pipeline-parallel forward pass. In the dynamic semi-auto-parallel architecture every card builds the same network, so pipeline parallelism has to decide at the PHI API level, based on the device rank, whether the current Layer needs to be computed. Layers that need no computation are skipped.
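As a rough illustration of that decision, here is a self-contained sketch (hypothetical names, not Paddle's actual implementation): a layer launches kernels only when the current device rank belongs to the process mesh of its pipeline stage.

# Hypothetical sketch of the per-rank clipping decision described above.
def needs_computation(cur_rank, stage_mesh_ranks):
    """A layer is computed only on ranks inside its stage's process mesh."""
    return cur_rank in stage_mesh_ranks

# Two pipeline stages: stage 0 lives on rank 0, stage 1 on rank 1.
stage_meshes = {0: [0], 1: [1]}
cur_rank = 0
for stage, ranks in stage_meshes.items():
    if needs_computation(cur_rank, ranks):
        print(f"rank {cur_rank}: launch kernels for stage {stage}")
    else:
        print(f"rank {cur_rank}: clip stage {stage}, run metadata inference only")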
At the PHI API level, a DistTensor goes through sharding-info inference (InferSPMD), global shape inference (InferMeta), input reshard, and kernel launch. Because of the design of the Reshard module, ChooseProperReshardFunction needs the input's dist_attr (specifically its process_mesh, dims_mapping, and partial_status), so InferSPMD still has to run to derive the correct dist_attr. InferSPMD needs the input's real shape, which depends on InferMeta's shape inference (the correctness of InferMeta still has to be verified across all ops later; the shape inference of matmul and elementwise is correct in the vast majority of cases, and ops whose correct shape can only be obtained by launching the kernel need special handling). The current non-computation-rank clipping in the PHI API only runs InferSPMD and InferMeta and creates an empty DistTensor. This relies on the correctness of each InferMeta implementation and on ops that must launch the kernel to get the real shape; the latter case is rare, and one fallback for such ops is to insert paddle.distributed.reshard manually, or to generate special reshard logic for them at the PHI API level that first obtains the input's dist_attr from the correct device and then degrades to executing the op on all devices, where the resulting output releases its device memory and keeps only its shape information. All other ops keep the non-computation-rank clipping logic.

The dist_attr and global_shape information has to flow between different layers on the same device, so an uninitialized DistTensor is created. It occupies zero device memory but carries the correct dist_attr and global_shape.
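A minimal, self-contained sketch of this metadata-only flow (hypothetical Python stand-ins, not the actual C++ DistTensor): the clipped path returns an object that owns no storage but still carries the dist_attr and global shape needed by the next layer.

from dataclasses import dataclass, field

@dataclass
class DistAttrSketch:
    process_mesh: list            # ranks in the mesh of this stage
    dims_mapping: list            # mesh axis each tensor dim is sharded on (-1 = replicated)
    partial_status: dict = field(default_factory=dict)  # pending-reduction state per mesh axis

@dataclass
class UninitializedDistTensorSketch:
    global_shape: list            # from InferMeta
    dist_attr: DistAttrSketch     # from InferSPMD
    # deliberately no local value / holder: zero bytes of device memory

def clipped_op(out_shape, out_dist_attr):
    # On a non-computation rank, InferSPMD and InferMeta have already produced
    # out_dist_attr and out_shape; the kernel launch is skipped and only the
    # metadata is handed to the next layer.
    return UninitializedDistTensorSketch(out_shape, out_dist_attr)

meta_only = clipped_op([8, 16], DistAttrSketch(process_mesh=[1], dims_mapping=[-1, -1]))
print(meta_only.global_shape, meta_only.dist_attr.process_mesh)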
dtype and layout: the PHI API was designed under the assumption that every input Tensor is defined and therefore carries dtype, layout, and place. For reshard, the send/recv path needs the dtype of paddle.distributed.reshard's input to select the right kernel, and layout also has to be passed correctly between layers (building the backward graph also uses these three attributes in several places). The current approach is that, during InferMeta, the dtype and layout of DistTensor.value are set. This may be set redundantly (the local DenseTensor also goes through InferMeta).
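The fragment below only illustrates why the recorded dtype matters (a hypothetical table and helper, not Paddle's kernel registry): the send/recv path of a cross-mesh reshard keys its kernel choice on the input's dtype, so the dtype has to survive even when no storage was allocated.

# Hypothetical dtype-to-kernel table for the send side of a cross-mesh reshard.
SEND_KERNELS = {"float32": "p_send_fp32", "float16": "p_send_fp16"}

def pick_send_kernel(recorded_dtype):
    # If InferMeta had not written the dtype onto the (empty) DistTensor's value,
    # this lookup would have nothing to key on.
    return SEND_KERNELS[recorded_dtype]

print(pick_send_kernel("float16"))  # -> p_send_fp16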
place: handling place is more involved, because it is read from DistTensor.value()->holder_->place(). For a DistTensor with no device memory allocated, place obviously cannot be obtained. The current approach is that, for ops that need no computation, the PHI API builds an empty DistTensor whose value is initialized with a holder of zero device memory. In other cases, if DistTensor.value has no holder allocated, the default place can be returned. Example code is shown below:
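As a minimal sketch of the place fallback just described (hypothetical Python stand-ins for DistTensor.value, its holder, and place; not the generated C++ body):

from dataclasses import dataclass
from typing import Optional

DEFAULT_PLACE = "gpu:0"   # stand-in for the framework's default place

@dataclass
class HolderSketch:
    place: str
    nbytes: int = 0       # a zero-size allocation is allowed

@dataclass
class ValueSketch:
    holder: Optional[HolderSketch] = None

def dist_tensor_place(value):
    # Non-computation ops build an empty DistTensor whose value carries a
    # zero-size holder, so its place is still readable; if no holder exists
    # at all, fall back to the default place.
    if value.holder is not None:
        return value.holder.place
    return DEFAULT_PLACE

print(dist_tensor_place(ValueSketch(HolderSketch("gpu:1"))))  # gpu:1
print(dist_tensor_place(ValueSketch()))                       # default place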