Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.
PaddlePaddle#71973)
* Fix out-of-range loop index in Split/Fuse (the index must be less than the total number of loops).
PR Category
CINN
PR Types
Bug fixes
Description
Pcard-89071
Previously, vectorize could hit an index-out-of-range bug during split/fuse:
The loop index in Split should be less than total loop's number.
The loop index in Fuse should be less than total loop's number!
After FuseAxisGroups(sch, block_id), the output layout is 3-D, [b, p, b], if the original layout is NCHW, but 2-D, [b, p], if it is NHWC.

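1) When the layout is NHWC and the output is 2-D, split/fuse was still handled as if the layout were 3-D, which caused the out-of-range error:
In the resnet50-NHWC network, vectorize in the NHWC case also has a performance problem because it uses too many registers.
This patch fixes the functionality first: in the NHWC case, TileBroadcastTactic falls back to the CINN implementation.
A follow-up will try to predict register usage to decide whether to enable vectorize, and will then re-enable the NHWC vectorize implementation.
2) When the layout is NCHW and the output is 3-D:
The original code contained a for loop; this patch tunes its performance. vectorize is enabled for TileBroadcastTactic in the NCHW case.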
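
The out-of-range failure in case 1) can be reproduced with a toy model. The sketch below is not CINN code: `split`, `tile_with_vectorize`, and the loop extents are hypothetical names and values chosen for illustration. It only assumes, as described above, that the vectorized path addresses the third loop (index 2) and that a 2-D layout must take a non-vectorized fallback.

```python
def split(loops, loop_index, factor):
    """Toy stand-in for a schedule Split: break one loop into (outer, inner)."""
    if not 0 <= loop_index < len(loops):
        raise IndexError(
            "The loop index in Split should be less than total loop's number.")
    extent = loops[loop_index]
    if extent % factor != 0:
        raise ValueError("toy model only handles evenly divisible extents")
    return loops[:loop_index] + [extent // factor, factor] + loops[loop_index + 1:]


def tile_with_vectorize(loops, factor):
    """Hypothetical guard: only use the 3-D vectorized tiling when it applies."""
    if len(loops) != 3:
        return None  # caller is expected to fall back to the non-vectorized path
    return split(loops, 2, factor)


nchw_loops = [8, 64, 16]  # assumed extents for a 3-D [b, p, b] layout
nhwc_loops = [128, 64]    # assumed extents for a 2-D [b, p] layout

print(split(nchw_loops, 2, 4))           # index 2 exists when there are 3 loops
try:
    split(nhwc_loops, 2, 4)              # index 2 does not exist with 2 loops
except IndexError as err:
    print("error:", err)
print(tile_with_vectorize(nhwc_loops, 4))  # guard returns None instead of failing
```

Checking the number of loops before choosing a loop index is the point here; how the actual patch selects the NHWC fallback is not shown in this description.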