Add new API local_map #71804
Conversation
Your PR has been submitted successfully. Thank you for your contribution to the open-source project!
@@ -0,0 +1,280 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
2024 -> 2025
Done
It still says 2024.
Done
__all__ = ["local_map"]


PlacementType = Sequence[dist.Placement] | None
InputPlacements = tuple[PlacementType, ...] | None
Why are the supported Placements parameter types different for inputs and outputs?
They have been unified.
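As later diffs in this thread show, the unified form uses the same structure for both directions:

out_placements: list[list[dist.Placement]],
in_placements: list[list[dist.Placement]] | None,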
def local_map(
    func: Callable,
    out_placements: OutputPlacements,
    in_placements: InputPlacements | None,
Suggested change:
-    in_placements: InputPlacements | None,
+    in_placements: Optional[tuple[list[dist.Placement], ...]],
There is no need to create so many new type names; they differ from the conventions used by other modules in the framework and only increase users' comprehension cost.
for idx, arg in enumerate(flat_args):
    if _is_distributed_tensor(arg):
        # TODO: the current code doesn't consider the uneven sharding case
What does this comment mean?
Deleted; this case is not considered for now.
    redistribute_inputs: bool | None,
):
    """
    :meth:`local_map` is an experimental API that allows users to pass dist_tensors
We don't have the notion of an "experimental API".
Fixed.
if arg.placements != spec:
    if redistribute_inputs:
        # Redistribute to input placements
        arg = arg.redistribute(process_mesh, spec)
Do we have a `redistribute` API?
Removed and rewritten following Paddle's own APIs.
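For reference, a minimal sketch of what the rewritten branch might look like using Paddle's public `dist.reshard` API; the helper name `_maybe_reshard` and the error message are illustrative, while `arg`, `spec`, and `process_mesh` mirror the diff above (`reshard_inputs` is the parameter's final name from later in this thread):

import paddle.distributed as dist

def _maybe_reshard(arg, process_mesh, spec, reshard_inputs):
    # If a dist_tensor's placements differ from the declared in_placements,
    # either reshard it with dist.reshard (when allowed) or raise.
    if arg.placements != spec:
        if reshard_inputs:
            return dist.reshard(arg, process_mesh, spec)
        raise ValueError(
            f"Input placements {arg.placements} do not match the required "
            f"{spec}; pass reshard_inputs=True to reshard automatically."
        )
    return arg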
    in_placements: InputPlacements | None,
    process_mesh: ProcessMesh | None,
    *,
    redistribute_inputs: bool | None,
Naming should follow the existing framework's conventions; we don't use the term `redistribute`.
Done
else:
    return out


def _is_distributed_tensor(tensor) -> bool:
Suggested change:
- def _is_distributed_tensor(tensor) -> bool:
+ def is_dist_tensor(tensor) -> bool:
This is a very basic utility; it should live somewhere more public so that other modules can reuse it.
Done
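For reference, a minimal sketch of such a shared helper, assuming Paddle's dynamic-graph `Tensor.is_dist()` check (static-graph/PIR values would need their own branch; the PR has the authoritative version):

import paddle

def is_dist_tensor(tensor) -> bool:
    # A dist_tensor is a paddle.Tensor that carries distributed attributes.
    return isinstance(tensor, paddle.Tensor) and tensor.is_dist()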
    return pack_sequence_as(out, flat_dist_out)
else:
    return out
If the user's inputs contain no dist_tensor but distributed placements are specified for the outputs, is simply ignoring the output placements reasonable behavior?
Handling for this case has been added.
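A sketch of the guard this implies, matching the AssertionError later documented in the docstring (the helper name and message are illustrative):

def _check_out_placements(out_placements, has_dist_input, process_mesh):
    # If no input is a dist_tensor but the user still requested distributed
    # outputs, a process_mesh is required to construct them instead of
    # silently ignoring the output placements.
    if not has_dist_input and any(p is not None for p in out_placements):
        assert process_mesh is not None, (
            "process_mesh must be specified when out_placements contains "
            "non-None values but no input is a dist_tensor."
        )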
if TYPE_CHECKING:
    from paddle.distributed import ProcessMesh


__all__ = ["local_map"]
This is not exposed publicly via the auto_parallel.local_map path, so it should not be added to this file's `__all__`.
Done
@@ -19,6 +19,8 @@ if(WITH_DISTRIBUTE AND WITH_GPU)
  py_test_modules(test_mlp MODULES test_mlp ENVS FLAGS_enable_pir_api=1)
  py_test_modules(test_local_layer MODULES test_local_layer ENVS
                  FLAGS_enable_pir_api=1)
  py_test_modules(test_local_map MODULES test_local_map ENVS
                  FLAGS_enable_pir_api=1)
Why is FLAGS_enable_pir_api=1 still needed?
It was originally there to match LocalLayer, but it seems the PIR flag is indeed unnecessary now; the unit tests run automatically either way. Fixed.
LGTM
LGTM
According to Paddle's specification for newly added APIs, the Chinese API documentation must be written in the docs repo so users can consult it on the official website. Please add a link to the docs repo PR in the description above.
LGTM
    in_placements: list[list[dist.Placement]] | None,
    process_mesh: ProcessMesh | None,
Shouldn't the type annotation also get the default value `= None`? @SigureMo please take a look.
That depends on the shape of the API; if there is no default value, there is no need to add `= None`.
> That depends on the shape of the API; if there is no default value, there is no need to add `= None`.

Could you please check whether there are any other major issues? If not, could you approve first? I'll submit a follow-up PR to fix the formatting.
Done
Done
I never said this needed to change. Whether `= None` should be added here depends on the shape of the API; this comment came entirely from @sunzhongkai588, who didn't understand this part, and I was only explaining that point to him.
Please review this change: if the API's shape does require a default value of None, then change it here; otherwise don't.
        Default: None

    reshard_inputs (bool, optional):
        the bool value indicating whether to reshard the input :dist_tensor` s when
Is :dist_tensor` a typo?
Done
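The corrected docstring line presumably uses double backticks for the inline literal:

    the bool value indicating whether to reshard the input ``dist_tensor`` s when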
Documentation issues can be fixed in a follow-up PR. @SigureMo, don't forget to reply.
The Chinese documentation PR hasn't been written either? That can go in the next PR, since @sunzhongkai588 has agreed anyway.
def local_map(
    func: Callable,
Spell out the inner parameter types of the generic.
Yes, it's needed: `Callable[..., Any]`.
Done
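Putting the review feedback together, the signature quoted later in this thread becomes roughly the following; the return annotation is an assumption (local_map returns a wrapped callable), since the quoted diff predates that fix:

from __future__ import annotations

from typing import TYPE_CHECKING, Any, Callable

import paddle.distributed as dist

if TYPE_CHECKING:
    from paddle.distributed import ProcessMesh


def local_map(
    func: Callable[..., Any],
    out_placements: list[list[dist.Placement]],
    in_placements: list[list[dist.Placement]] | None,
    process_mesh: ProcessMesh | None,
    reshard_inputs: bool = False,
) -> Callable[..., Any]:  # return type assumed: the wrapper itself is callable
    ...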
    in_placements: list[list[dist.Placement]] | None,
    process_mesh: ProcessMesh | None,
    reshard_inputs: bool = False,
):
Spell out the return type.
Done
I don't see it…
Sorry, I misunderstood what you meant; it's added now.
        in_placements.

    Example:
        >>> from __future__ import annotations
Isn't this example code mis-formatted? Will it render correctly? Even if the English docs render it, the Chinese docs won't be able to pull it in with COPY-FROM.
Done
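For reference, Paddle docstrings wrap examples in a `.. code-block:: python` directive with `>>>` prompts so the docs repo can extract them via COPY-FROM. A hedged sketch of what a correctly formatted example might look like; the exported path `dist.local_map` and the placements are assumptions, the PR has the authoritative example:

Examples:
    .. code-block:: python

        >>> # doctest: +REQUIRES(env:DISTRIBUTED)
        >>> import paddle
        >>> import paddle.distributed as dist
        >>> mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
        >>> def local_fn(x):
        ...     return x * 2  # operates on each rank's local shard
        >>> x = dist.shard_tensor(paddle.ones([4, 8]), mesh, [dist.Shard(0)])
        >>> f = dist.local_map(
        ...     local_fn,
        ...     out_placements=[[dist.Shard(0)]],
        ...     in_placements=[[dist.Shard(0)]],
        ...     process_mesh=mesh,
        ... )
        >>> y = f(x)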
Raises:
    AssertionError: If the number of output placements does not match the number
        of function outputs.

    AssertionError: If a non-tensor output has a non-None placement specified.

    AssertionError: If process_mesh is None and there are no dist_tensor inputs
        but out_placements contains non-None values.

    ValueError: If the input dist_tensor placements don't match the required
        in_placements.
Per the documentation guidelines, don't include a Raises section.
Done
The Chinese documentation has been written. OK, I'll fix the formatting issues in one pass.
OK.
Sorry to inform you that fdbfcf3's CIs passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.
798c1a6
PaddlePaddle/docs#7245 is the corresponding Chinese documentation PR.
Understood; this one does indeed take a default value.
LGTM
LGTM
* Add new API local_map
* Fix file formatting
* Optimize some functionality of local_map
* Add reshard support, and make local_map callable in both dynamic and static graph modes
* Fix formatting conventions
* Fix the API naming in the unit tests
* Replace LocalLayer with local_map
* Adjust unit-test arguments when using local_map, setting reshard to True
* Fix unit tests
* Modify unit tests
* Fix formatting
* Fix formatting
* Fix test example formatting
* Fix test example formatting
PR Category
Auto Parallel
PR Types
Others
Description
Add new API local_map
1. Background
In distributed training, one often needs to pass distributed tensors (dist_tensor) to functions that can only handle ordinary tensors (dense_tensor), or that must process local tensors from a per-device perspective. To simplify this, a utility is needed that converts distributed tensors to ordinary tensors and then reattaches distributed attributes to the function's results. The local_map API is designed to solve exactly this problem.
2. Goals
The main purpose of the local_map function is to allow users to pass distributed tensors to functions written for ordinary tensors.
3. Significance
It gives Paddle distributed training a more convenient way to handle tensors, letting users easily reuse functions written for ordinary tensors in a distributed environment.
4. Common use cases
* Masked loss computation: the loss over masked tokens must be computed independently on each card (see the sketch after this list)
* MoE (Mixture-of-Experts) computations:
  * aux_loss: computed from the local token counts assigned to the experts on each card
  * z_loss: computed independently over each card's logits
* Tensor reshape: shape transformations on local dimensions
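As a concrete illustration of the first use case above, a minimal sketch of a per-card masked loss via local_map; the exported path `dist.local_map`, the placements, and the Partial reduction are assumptions, not the PR's exact code:

import paddle
import paddle.distributed as dist

def masked_loss(logits, labels, mask):
    # Runs on each card's local shard: only local masked tokens contribute.
    loss = paddle.nn.functional.cross_entropy(logits, labels, reduction="none")
    return (loss * mask).sum() / mask.sum().clip(min=1.0)

mesh = dist.ProcessMesh([0, 1], dim_names=["dp"])
loss_fn = dist.local_map(
    masked_loss,
    out_placements=[[dist.Partial()]],  # per-card partial loss; placement illustrative
    in_placements=[[dist.Shard(0)], [dist.Shard(0)], [dist.Shard(0)]],
    process_mesh=mesh,
)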
5. Improvements of local_map over LocalLayer