dist.to_static support pir program #62560

zhiqiu · 2024-03-08T09:15:28Z

PR types

New features

PR changes

APIs

Description

dist.to_static support pir program
Pcard-76459

In the original static auto-parallel process, the dynamic model is first converted into a serial program without DistTensors, and then the dist attributes are added to the program.

However, the new static auto-parallel implementation infers the SPMD info while each operation is built, so the inputs and parameters of the dynamic model that are DistTensors should be converted to the static DistDenseTensorType natively and eagerly.

This PR handles the inputs, which are sharded by the dataloader; the parameters will be handled in the next PR.

For example, given the following DemoNet, call shard_dataloader to shard the inputs:

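import paddle
import paddle.distributed as dist
from paddle import nn
from paddle.distributed import Shard

# Sizes inferred from the tensor shapes in the IR dump below (16x16 and 16x8 weights).
IMAGE_SIZE = 16
CLASS_NUM = 8


class DemoNet(nn.Layer):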
    def __init__(self, mesh):
        super().__init__()
        self._mesh = mesh
        self.linear_0 = nn.Linear(IMAGE_SIZE, IMAGE_SIZE)
        self.linear_1 = nn.Linear(IMAGE_SIZE, CLASS_NUM)
        self.relu = nn.ReLU()
        # shard the weights of this layer
        self.linear_0.weight = dist.shard_tensor(
            self.linear_0.weight,
            self._mesh,
            [Shard(1)],
            stop_gradient=False,
        )
        self.linear_1.weight = dist.shard_tensor(
            self.linear_1.weight,
            self._mesh,
            [Shard(0)],
            stop_gradient=False,
        )

    def forward(self, x):
        out = self.linear_0(x)
        out = self.relu(out)
        out = self.linear_1(out)
        return out


mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
layer = DemoNet(mesh)
opt = paddle.optimizer.SGD(
    learning_rate=0.1, parameters=layer.parameters()
)
loss_fn = nn.MSELoss()
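# create_data_loader() is a helper that is not shown in this description.
loader = create_data_loader()
# Shard the loader's outputs over the mesh so the model inputs are DistTensors.
dist_loader = dist.shard_dataloader(loader, meshes=[mesh])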
dist_model = dist.to_static(layer, dist_loader, loss_fn, opt)
main_program = dist_model._engine._fwd_main_progs["train"]

The resulting serial main program contains a mix of DenseTensor and DistDenseTensor values:

{
    (%0) = "builtin.parameter" () {is_persistable:[true],parameter_name:"parameter_3",stop_gradient:[false]} : () -> builtin.tensor<8xf32>
    (%1) = "builtin.parameter" () {is_persistable:[true],parameter_name:"parameter_2",stop_gradient:[false]} : () -> builtin.tensor<16x8xf32>
    (%2) = "builtin.parameter" () {is_persistable:[true],parameter_name:"parameter_1",stop_gradient:[false]} : () -> builtin.tensor<16xf32>
    (%3) = "builtin.parameter" () {is_persistable:[true],parameter_name:"parameter_0",stop_gradient:[false]} : () -> builtin.tensor<16x16xf32>
    (%4) = "pd_op.data" () {dtype:(pd_op.DataType)float32,name:"input0",place:(pd_op.Place)Place(undefined:0),shape:(pd_op.IntArray)[4,16],stop_gradient:[true]} : () -> pd_dist.tensor<4x16xf32, mesh: {shape: [2], process_ids: [0,1], dim_names: [x]}, dims_mappings: [-1,-1]>
    (%5) = "pd_op.data" () {dtype:(pd_op.DataType)float32,name:"label0",place:(pd_op.Place)Place(undefined:0),shape:(pd_op.IntArray)[4,8],stop_gradient:[true]} : () -> pd_dist.tensor<4x8xf32, mesh: {shape: [2], process_ids: [0,1], dim_names: [x]}, dims_mappings: [-1,-1]>
    (%6) = "pd_op.matmul" (%4, %3) {stop_gradient:[false],transpose_x:false,transpose_y:false} : (pd_dist.tensor<4x16xf32, mesh: {shape: [2], process_ids: [0,1], dim_names: [x]}, dims_mappings: [-1,-1]>, builtin.tensor<16x16xf32>) -> builtin.tensor<4x16xf32>
    (%7) = "pd_op.add" (%6, %2) {stop_gradient:[false]} : (builtin.tensor<4x16xf32>, builtin.tensor<16xf32>) -> builtin.tensor<4x16xf32>
    (%8) = "pd_op.relu" (%7) {stop_gradient:[false]} : (builtin.tensor<4x16xf32>) -> builtin.tensor<4x16xf32>
    (%9) = "pd_op.matmul" (%8, %1) {stop_gradient:[false],transpose_x:false,transpose_y:false} : (builtin.tensor<4x16xf32>, builtin.tensor<16x8xf32>) -> builtin.tensor<4x8xf32>
    (%10) = "pd_op.add" (%9, %0) {stop_gradient:[false]} : (builtin.tensor<4x8xf32>, builtin.tensor<8xf32>) -> builtin.tensor<4x8xf32>
    (%11) = "pd_op.subtract" (%10, %5) {stop_gradient:[false]} : (builtin.tensor<4x8xf32>, pd_dist.tensor<4x8xf32, mesh: {shape: [2], process_ids: [0,1], dim_names: [x]}, dims_mappings: [-1,-1]>) -> builtin.tensor<4x8xf32>
    (%12) = "pd_op.square" (%11) {stop_gradient:[false]} : (builtin.tensor<4x8xf32>) -> builtin.tensor<4x8xf32>
    (%13) = "pd_op.mean" (%12) {axis:(pd_op.IntArray)[],keepdim:false,stop_gradient:[false]} : (builtin.tensor<4x8xf32>) -> builtin.tensor<f32>
}
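
The mix of result types in the dump above can be checked mechanically. The following is a minimal sketch, not part of this PR: it assumes the printed format shown above, where each op line ends with `-> <result type>`. The helper name `classify_results` is hypothetical, and `SAMPLE` is an abridged copy of three lines from the dump with most attributes dropped.

```python
import re

# One printed op line: result id, quoted op name, and the text after "-> ",
# which is the result type. Assumes the print format shown in the dump above.
OP_LINE = re.compile(r'^\s*\((%\d+)\) = "([\w.]+)".*-> (.+)$')


def classify_results(program_text):
    """Map each result id to (op name, "dist" or "dense")."""
    kinds = {}
    for line in program_text.splitlines():
        match = OP_LINE.match(line)
        if match is None:
            continue  # braces and blank lines
        value_id, op_name, result_type = match.groups()
        kind = "dist" if result_type.startswith("pd_dist.tensor") else "dense"
        kinds[value_id] = (op_name, kind)
    return kinds


# Abridged from the dump above (attributes shortened for readability).
SAMPLE = """{
    (%3) = "builtin.parameter" () {parameter_name:"parameter_0"} : () -> builtin.tensor<16x16xf32>
    (%4) = "pd_op.data" () {name:"input0"} : () -> pd_dist.tensor<4x16xf32, mesh: {shape: [2], process_ids: [0,1], dim_names: [x]}, dims_mappings: [-1,-1]>
    (%6) = "pd_op.matmul" (%4, %3) {transpose_x:false} : (pd_dist.tensor<4x16xf32, mesh: {shape: [2], process_ids: [0,1], dim_names: [x]}, dims_mappings: [-1,-1]>, builtin.tensor<16x16xf32>) -> builtin.tensor<4x16xf32>
}"""

for value_id, (op_name, kind) in classify_results(SAMPLE).items():
    print(value_id, op_name, kind)
```

On this sample only the `pd_op.data` result is classified as `dist`; the parameter and the matmul output are still dense.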

Currently, only the inputs of the model, input0 and label0, are DistDenseTensors.
TODO: make the parameters DistDenseTensors as well.
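
To make the `dims_mappings: [-1,-1]` entries above easier to read, here is a small sketch of how a placement list relates to a dims mapping. It assumes the usual convention that entry `i` of the mapping is the mesh axis along which tensor axis `i` is split, or `-1` if that axis is not split. The `Shard` and `Replicate` classes below are plain-Python stand-ins, not Paddle's own classes, and `to_dims_mapping` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    """Stand-in placement: split tensor axis `dim` along one mesh axis."""

    dim: int


@dataclass(frozen=True)
class Replicate:
    """Stand-in placement: keep a full copy along one mesh axis."""


def to_dims_mapping(placements, tensor_ndim):
    """placements[i] describes mesh axis i; the result has one entry per tensor axis."""
    dims_mapping = [-1] * tensor_ndim
    for mesh_axis, placement in enumerate(placements):
        if isinstance(placement, Shard):
            if dims_mapping[placement.dim] != -1:
                raise ValueError("tensor axis is already sharded on another mesh axis")
            dims_mapping[placement.dim] = mesh_axis
    return dims_mapping


# The inputs in the dump are replicated over the 1-D mesh.
print(to_dims_mapping([Replicate()], tensor_ndim=2))
# The two weights in DemoNet use [Shard(1)] and [Shard(0)].
print(to_dims_mapping([Shard(1)], tensor_ndim=2))
print(to_dims_mapping([Shard(0)], tensor_ndim=2))
```

Under this convention, once the parameters become DistDenseTensors, the weights of linear_0 and linear_1 would be expected to carry the mappings `[-1, 0]` and `[0, -1]` respectively, while the inputs keep `[-1, -1]`.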

paddle-bot · 2024-03-08T09:15:33Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

winter-wang · 2024-03-08T09:34:44Z

paddle/fluid/pir/dialect/distributed/ir/dist_dialect.cc

@@ -37,18 +39,45 @@ void DistDialect::initialize() {
 void DistDialect::PrintType(pir::Type type, std::ostream &os) const {
  if (auto dist_dense_tensor_type = type.dyn_cast<DistDenseTensorType>()) {
    // Todo: Design the dist dense tensor type print format.
-    os << dist_dense_tensor_type.dense_tensor_type();
+    os << type.dialect().name();


Suggested change

-    os << type.dialect().name();
+    os << name();

Mostly they are equal, but I wonder whether there is a type that does not belong to the dist dialect?

JZ-LIANG · 2024-03-08T12:12:42Z

python/paddle/jit/dy2static/function_spec.py

+                    )
+
+                    if isinstance(var_spec, DistributedInputSpec):
+                        dist_dense_tensor_type = paddle.base.libpaddle.pir.create_dist_dense_tensor_type_by_dense_tensor(


To be changed to use the shard_tensor Python API in the future, once that API is adapted for PIR. @hitywt

JZ-LIANG · 2024-03-08T12:52:25Z

test/auto_parallel/pir/test_to_static_pir_program.py

+        main_program = dist_model._engine._fwd_main_progs["train"]
+        for op in main_program.global_block().ops:
+            tensor = op.result(0)
+            if op.name() == 'pd_op.data':


Enable the check for "builtin.parameter" after the shard_tensor API is adapted for PIR. @hitywt

JZ-LIANG · 2024-03-08T12:53:11Z

test/auto_parallel/pir/test_to_static_pir_program.py

+                self.assertEqual(tensor.process_mesh.process_ids, [0, 1])
+                self.assertEqual(tensor.dims_mapping, [-1, -1])
+                self.assertEqual(tensor.partial_dims, set())
+            else:


Enable the check for all other forward computation ops after build is adapted for DistTensor. @winter-wang
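
The shape of the check discussed in the review comments above can be sketched without a Paddle installation. This is an illustration only: `FakeMesh`, `FakeValue`, `FakeOp`, and `check_data_ops` are hypothetical stand-ins that mimic the attributes used in the quoted test hunks (`op.name()`, `op.result(0)`, `process_mesh.process_ids`, `dims_mapping`, `partial_dims`); they are not the actual test code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMesh:
    process_ids: list


@dataclass
class FakeValue:
    process_mesh: FakeMesh
    dims_mapping: list
    partial_dims: set = field(default_factory=set)


@dataclass
class FakeOp:
    op_name: str
    value: object

    def name(self):
        return self.op_name

    def result(self, index):
        return self.value


def check_data_ops(ops, expected_ids, expected_mapping):
    """Check only pd_op.data results; return how many ops were checked."""
    checked = 0
    for op in ops:
        if op.name() != "pd_op.data":
            # Parameters and compute ops are skipped for now, matching the
            # follow-ups requested in the review comments.
            continue
        tensor = op.result(0)
        assert tensor.process_mesh.process_ids == expected_ids
        assert tensor.dims_mapping == expected_mapping
        assert tensor.partial_dims == set()
        checked += 1
    return checked


ops = [
    FakeOp("builtin.parameter", None),
    FakeOp("pd_op.data", FakeValue(FakeMesh([0, 1]), [-1, -1])),
    FakeOp("pd_op.data", FakeValue(FakeMesh([0, 1]), [-1, -1])),
    FakeOp("pd_op.matmul", None),
]
print(check_data_ops(ops, [0, 1], [-1, -1]))
```

Extending the check to "builtin.parameter" and to the compute ops would mean removing the early `continue` for those op names once their results are DistDenseTensors.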

JZ-LIANG

LGTM

* auto_parallel engine build pir program
* skip prepare_op_amp_options in build_program
* add ut
* fix cmake
* remove print

zhiqiu added 3 commits March 8, 2024 15:50

auto_parallel engine build pir program

d403642

skip prepare_op_amp_options in build_program

21df6fd

add ut

d10360b

winter-wang reviewed Mar 8, 2024

JZ-LIANG reviewed Mar 8, 2024

zhiqiu added 2 commits March 8, 2024 21:02

fix cmake

3711801

remove print

7d1fc17

JZ-LIANG approved these changes Mar 9, 2024

JZ-LIANG merged commit bc56513 into PaddlePaddle:develop Mar 9, 2024

co63oc pushed a commit to co63oc/Paddle that referenced this pull request Mar 10, 2024

dist.to_static support pir program (PaddlePaddle#62560)

82eba3f

hitywt pushed a commit to hitywt/Paddle that referenced this pull request Mar 11, 2024

dist.to_static support pir program (PaddlePaddle#62560)

01f21eb

hitywt pushed a commit to hitywt/Paddle that referenced this pull request Mar 11, 2024

dist.to_static support pir program (PaddlePaddle#62560)

23d7a5a
