mix2dist_pass support shard randomly sampled data #67589

jeff41404 · 2024-08-20T12:57:40Z

PR Category

Auto Parallel

PR Types

New features

Description

pcard-86321
The Dit/LargeDit model samples a large amount of random data during training. In a distributed environment, this randomly sampled data also needs to be sharded across devices to reduce memory usage.
However, when mix2dist_pass was first designed for PIR mode, it supported sharding only input data and model parameters, not other data. This PR extends that capability to randomly sampled data.

import paddle
import paddle.distributed as dist

process_mesh = dist.ProcessMesh([0, 1], dim_names=['mp'])

# Before this PR: only input data and model parameters could be sharded.
linear1 = paddle.nn.Linear(100, 200)
linear1.weight = dist.shard_tensor(linear1.weight, process_mesh, [dist.Shard(1)])

# In other cases, placements had to be all Replicate; otherwise an error was reported.
# (`x` and `self.num_timesteps` come from the surrounding model code, not shown here.)
noise = paddle.randn(x.shape)
noise = dist.shard_tensor(noise, process_mesh, [dist.Replicate()])

# After this PR: randomly sampled data can also be sharded.
timesteps = paddle.randint(0, self.num_timesteps, (x.shape[0],))
timesteps = dist.shard_tensor(timesteps, process_mesh, [dist.Shard(1)])  # randomly sampled data (int)
noise = paddle.randn(x.shape)
noise = dist.shard_tensor(noise, process_mesh, [dist.Shard(1)])  # randomly sampled data (float)
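
To make the memory argument concrete, here is a minimal plain-Python sketch of the per-device shape arithmetic behind a replicated versus a sharded placement on a 1-D process mesh. It does not call Paddle and is not how mix2dist_pass is implemented; the helpers `local_shape` and `num_elements` and the example shape `[8, 256]` are hypothetical and chosen only for illustration.

```python
# Illustrative arithmetic only; these helpers are hypothetical, not Paddle APIs.

def local_shape(global_shape, mesh_size, shard_dim=None):
    """Return the per-device shape of a tensor on a 1-D process mesh.

    shard_dim=None models a replicated placement (each device holds the
    full tensor); an integer models an even split along that dimension.
    """
    shape = list(global_shape)
    if shard_dim is None:
        return shape
    if shape[shard_dim] % mesh_size != 0:
        raise ValueError("this sketch only handles evenly divisible dimensions")
    shape[shard_dim] //= mesh_size
    return shape


def num_elements(shape):
    """Number of elements in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


noise_shape = [8, 256]  # assumed shape of `x`, for illustration only
mesh_size = 2           # two devices, as in ProcessMesh([0, 1])

replicated = local_shape(noise_shape, mesh_size)
sharded = local_shape(noise_shape, mesh_size, shard_dim=1)

print("replicated:", replicated, num_elements(replicated))
print("sharded on dim 1:", sharded, num_elements(sharded))
```

Under these assumptions, each device holds half as many noise elements when the tensor is sharded along dimension 1 as when it is replicated, which is the saving the description refers to.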

paddle-bot · 2024-08-20T12:57:45Z

Your PR has been submitted. Thanks for your contribution to this open source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

test/auto_parallel/pir/test_static_pir_program.py

JZ-LIANG

LGTM and revise in next PR

jeff41404 · 2024-08-21T03:17:42Z

LGTM and revise in next PR

Thanks. It would be better to add the necessary unit test cases in this PR, so I have committed them.

JZ-LIANG

LGTM

mix2dist_pass support shard randomly sampled data

a1bb319

JZ-LIANG reviewed Aug 21, 2024

test/auto_parallel/pir/test_static_pir_program.py

JZ-LIANG previously approved these changes Aug 21, 2024

add unit test case of checking full_int_array op upstream and its result

91aa024

jeff41404 dismissed JZ-LIANG’s stale review via 91aa024 August 21, 2024 03:10

JZ-LIANG approved these changes Aug 21, 2024

jeff41404 merged commit fb57ee7 into PaddlePaddle:develop Aug 21, 2024
28 checks passed

jeff41404 deleted the mix2dist_pass_support_shard_randomly_sampled_data branch August 21, 2024 08:53

jeff41404 mentioned this pull request Sep 12, 2024

dist2dense_pass fix shape errors in shard randomly sampled data #68067

Merged
