Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.
chenwhql changed the title from "[AutoParallel] Generate replicated spmd for PHI API" to "[AutoParallel] Generate replicated spmd for PHI API and verify DP MP strategy" on Sep 21, 2023
PR types
New features
PR changes
Others
Description
Pcard-73145
[AutoParallel] Generate replicated spmd for PHI API and verify DP MP strategy
This PR generates a generic sharding-inference (spmd) rule and the associated conversion logic for APIs whose inputs and outputs contain only Tensors. The generic rule converts all of an API's inputs to the Replicate state before running the kernel, which is equivalent to every device independently performing the complete computation.
Although performance under this rule is poor, once it is generated a substantial portion of APIs can exercise the basic execution flow of dynamic semi-auto parallelism. Currently matmul is the only operator with a dedicated sharding-inference rule (and its backward rule is still incomplete); adding forward and backward rules for the remaining operators will be a relatively long, operator-by-operator effort. The generic rule ensures that execution under the dynamic semi-auto architecture does not fail outright simply because a dedicated sharding-inference rule is missing.
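The generic fallback described above can be sketched in plain Python (a minimal simulation; the names `reshard_to_replicate` and `replicated_spmd_call` are illustrative, not Paddle's actual API): gather every sharded input into a full replicated copy, then run the dense kernel on each rank as if it were a single card.

```python
def reshard_to_replicate(shards):
    """Concatenate per-rank shards into one full (replicated) value."""
    full = []
    for s in shards:
        full.extend(s)
    return full

def replicated_spmd_call(kernel, *sharded_inputs):
    """Generic rule: replicate all inputs, then run the kernel everywhere."""
    replicated = [reshard_to_replicate(inp) for inp in sharded_inputs]
    # Every rank now redundantly executes the complete computation.
    return kernel(*replicated)

def elementwise_add(a, b):
    return [u + v for u, v in zip(a, b)]

# Two ranks each hold half of a 4-element vector (row-wise shard).
x_shards = [[1.0, 2.0], [3.0, 4.0]]
y_shards = [[10.0, 20.0], [30.0, 40.0]]

out = replicated_spmd_call(elementwise_add, x_shards, y_shards)
print(out)  # [11.0, 22.0, 33.0, 44.0], same as the single-card result
```

The redundancy is exactly why performance is poor, but the call never fails for lack of a dedicated rule.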
Given the situation above, this PR uses a simple demo network to verify the correctness of the standalone DP and MP strategies under the dynamic semi-auto architecture.
Demo network
It contains only two matmuls plus an mse loss (composed of the subtract, square, and mean operators), and runs only the forward and backward passes.
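As a single-card reference for the demo network, the forward pass can be sketched with NumPy (the shapes and variable names here are illustrative assumptions, not the PR's actual test code):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_hidden, d_out = 4, 8, 16, 8

x = rng.standard_normal((batch, d_in))
w0 = rng.standard_normal((d_in, d_hidden))
w1 = rng.standard_normal((d_hidden, d_out))
label = rng.standard_normal((batch, d_out))

h = x @ w0                  # first matmul
y = h @ w1                  # second matmul
diff = y - label            # subtract
loss = (diff ** 2).mean()   # square + mean, i.e. mse loss
print(loss.shape)           # prints (): the loss is a scalar
```

Under DP the batch dimension of `x` would be sharded across ranks; under MP the weight matrices would be sharded instead.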
DP demo rewrite
DP sharding execution diagram:

MP demo rewrite
MP sharding execution diagram:

Testing principle
A tensor in dynamic semi-auto mode has a global view: printing the value of any tensor should yield the same result as on a single card. If the tensor is in the Shard or Replicate state when its value is fetched, communication is triggered automatically to complete the data.
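The check above can be illustrated with a small pure-Python simulation (not Paddle's API; `fetch_value` and the placement strings are hypothetical names): fetching a sharded tensor implicitly gathers all shards, so every rank observes the same full value as a single card would.

```python
def fetch_value(placement, per_rank_shards):
    """Return the full single-card value regardless of placement."""
    if placement == "Replicate":
        # Every rank already holds the complete value.
        return per_rank_shards[0]
    if placement == "Shard":
        # Simulated all-gather communication to complete the data.
        gathered = []
        for shard in per_rank_shards:
            gathered.extend(shard)
        return gathered
    raise ValueError(f"unknown placement: {placement}")

single_card = [1, 2, 3, 4]
assert fetch_value("Shard", [[1, 2], [3, 4]]) == single_card
assert fetch_value("Replicate", [single_card, single_card]) == single_card
```

This is the property the DP/MP demos verify: distributed execution must be observationally identical to single-card execution.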
TODO: adjusting the original design
The current demo rewrite is quite complex and not user-friendly. Our original design constraint required every API input in dynamic semi-auto mode to be a DistTensor. As a result, users not only had to shard the key parameters, but also had to convert every non-sharded input from DenseTensor into a Replicate DistTensor via the shard_tensor API, e.g. label data, or the Optimizer's learning_rate (the user passes a float, which cannot be sharded explicitly). This constraint now appears to need relaxing: DenseTensor inputs should be allowed and automatically converted into Replicate DistTensors, otherwise the usage is too cumbersome. This feature is currently under development.
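The planned relaxation might look roughly like the following sketch (all names here are hypothetical stand-ins, not Paddle's real classes or functions): any input that is still a plain dense value gets auto-promoted to a replicated distributed tensor, so users no longer need to call shard_tensor on labels or learning rates.

```python
class FakeDistTensor:
    """Hypothetical stand-in for a distributed tensor with a placement."""
    def __init__(self, data, placement="Replicate"):
        self.data = data
        self.placement = placement

def to_dist_if_needed(value):
    """Auto-promote dense inputs to a Replicate distributed tensor."""
    if isinstance(value, FakeDistTensor):
        return value                           # already distributed, keep as-is
    return FakeDistTensor(value, "Replicate")  # implicit replication

# e.g. the Optimizer's learning_rate arrives as a plain float:
lr = to_dist_if_needed(0.001)
print(lr.placement)  # prints Replicate

# An already-sharded tensor passes through untouched:
w = FakeDistTensor([[1.0], [2.0]], placement="Shard")
assert to_dist_if_needed(w) is w
```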
Other changes