Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.
PR Category
CINN
PR Types
Performance
Description
Optimize non-divisible loops using the entailment relation between loop conditions
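This PR actually implements two optimizations. The first eliminates the `if` inside a non-divisible loop. For example, take a loop over `k` whose loop condition is `k < 8` and whose body is guarded by `if (((k * 32) + thread.x) < 234)`. Observe that the `if` condition in fact entails the loop condition: whenever `((k * 32) + thread.x) < 234` is true, `k < 8` must also be true. This can be shown by contradiction. Suppose `k < 8` does not hold, for example `k = 8`. Then whatever the value of `thread.x`, `((k * 32) + thread.x)` is at least 256 and fails the less-than-234 condition (a larger `k` only makes the value larger), so the entailment holds. The `if` condition can therefore replace the loop condition and the `if` itself can be removed, giving a new loop whose condition is `((k * 32) + thread.x) < 234`.

With the first optimization done, a second one becomes visible: `k * 32` is now a common subexpression of the whole loop and can be hoisted into `k_strided = k * 32`. This saves one more register and gives the final optimized loop.

As for why the two optimizations are implemented in one pass: they are closely related. Both concern the loop variable and both need to modify the loop's extent and step, so a single pass is more convenient. In addition, optimization two brings no benefit on its own and only pays off in combination with optimization one, so implementing them separately could lead to optimization two being applied where optimization one did not match.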
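The two rewrites can be sketched in plain C++ as three versions of the same loop. This is only an illustration and not the IR that CINN emits: the constants 8, 32 and 234 come from the description, `thread_x` stands in for `thread.x`, and the function names are hypothetical. Each version records the indices it would touch, so the versions can be compared directly.

```cpp
#include <vector>

// Constants taken from the example in the description.
constexpr int kExtent = 8;   // original loop bound: k < 8
constexpr int kStride = 32;  // multiplier applied to k
constexpr int kBound = 234;  // number of valid elements

// Original form: loop bounded by k < 8, body guarded by an if.
std::vector<int> VisitOriginal(int thread_x) {
  std::vector<int> visited;
  for (int k = 0; k < kExtent; ++k) {
    if ((k * kStride) + thread_x < kBound) {
      visited.push_back((k * kStride) + thread_x);
    }
  }
  return visited;
}

// Optimization one: the if condition replaces the loop condition.
std::vector<int> VisitEntailed(int thread_x) {
  std::vector<int> visited;
  for (int k = 0; (k * kStride) + thread_x < kBound; ++k) {
    visited.push_back((k * kStride) + thread_x);
  }
  return visited;
}

// Optimization two: k * 32 is hoisted into the loop variable itself.
std::vector<int> VisitStrided(int thread_x) {
  std::vector<int> visited;
  for (int k_strided = 0; k_strided + thread_x < kBound;
       k_strided += kStride) {
    visited.push_back(k_strided + thread_x);
  }
  return visited;
}

// Brute-force check of the entailment for thread_x in [0, 32):
// whenever the if condition holds, k < 8 holds as well.
bool EntailmentHolds() {
  for (int thread_x = 0; thread_x < kStride; ++thread_x) {
    for (int k = 0; k < 64; ++k) {
      bool if_cond = (k * kStride) + thread_x < kBound;
      if (if_cond && !(k < kExtent)) return false;
    }
  }
  return true;
}
```

Calling the three `Visit*` functions with any `thread_x` in `[0, 32)` should return identical vectors. Note that the sketch also relies on the `if` condition being monotone in `k`: once it fails for some `k` it fails for every larger `k`, so stopping at the first failure skips no valid iteration.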
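Note: optimization one can be applied on its own and accounts for most of the performance gain. Optimization two may indeed fail to match (especially with dynamic shapes), but that is acceptable because most of the gain has already been captured.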
About the implementation
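Because the new loop condition is an arbitrary expression rather than the form `k < extent`, it needs PolyFor to be represented, and PolyFor has been removed from our new IR. To get this pass landed as soon as possible, I use a C macro as a temporary trick: the IR actually generated for the loop above keeps the original loop and adds a `CINN_ENTAIL_LOOP_CONDITION` line. In other words, the macro breaks the original loop and then reconstructs a PolyFor-like structure. This is admittedly not very clean, but it is at least safe in the current IR and leaves the loop structure textually unchanged. When debugging, you can simply delete the `CINN_ENTAIL_LOOP_CONDITION` line and the performance test still runs normally.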
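One way such a macro could work is sketched below. This is an assumption for illustration only: `DEMO_ENTAIL_LOOP_CONDITION` is a hypothetical stand-in, and the real `CINN_ENTAIL_LOOP_CONDITION` may take different arguments or expand differently. The sketch closes the original loop body early and opens a new `for` with the entailed condition, so the statements that follow the macro line run under the new loop.

```cpp
// Hypothetical stand-in for the macro described above. It closes the
// enclosing loop body, leaving that loop empty, and opens a new loop
// with an arbitrary condition. The closing brace of the original loop
// then closes the new loop instead.
#define DEMO_ENTAIL_LOOP_CONDITION(var, cond, stride) \
  }                                                   \
  for (int var = 0; (cond); var += (stride)) {

// Loop written the way the generated code might look: the original
// for header is kept, and the macro line redirects the body.
int SumWithMacro(int thread_x) {
  int sum = 0;
  for (int k = 0; k < 8; k += 1) {
    DEMO_ENTAIL_LOOP_CONDITION(k, (k * 32) + thread_x < 234, 1)
    sum += (k * 32) + thread_x;
  }
  return sum;
}

// Reference version with the explicit if guard, for comparison.
int SumReference(int thread_x) {
  int sum = 0;
  for (int k = 0; k < 8; k += 1) {
    if ((k * 32) + thread_x < 234) {
      sum += (k * 32) + thread_x;
    }
  }
  return sum;
}
```

In this sketch, deleting the macro line leaves an ordinary `for (k < 8)` loop that still compiles and runs, only without the bound check. That matches the debugging behaviour described above, though for some `thread_x` values the results would then differ.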
Pcard-85711