[CINN] EliminateCommonGlobalVar pass, optimize performance #62517
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
}
tensor_indices.push_back(new_indice);
}
for (const auto& [var, extent] : for_var_extents_) {
Does this loop body do nothing?
This for loop should be deleted, thanks.
return AllIndiceAndExtentEqual(indice_and_extent);
};

std::unordered_set<std::string> global_tensor;
It would be better to rename every global_tensor to global_buffer_name.
Done, thanks~
};

std::unordered_set<std::string> global_tensor;
for (const auto& [tensor_name, indice_and_extent] :
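tensor_name -> buffer_name
tensor_name and buffer_name are not guaranteed to be identical in every case. Judging from the code, this pass indexes everything by buffer_name, including when it collects Load nodes, so it is best to use buffer_name everywhere.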
Done, thanks~
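To illustrate the concern raised in this thread, here is a minimal sketch. It is not code from the PR: `LoadAccess`, `CountLoadsByTensorName`, `CountLoadsByBufferName`, and the example names are all hypothetical. It assumes that two tensors with different names can be backed by the same buffer, in which case grouping loads by tensor name would not reveal that they read the same memory.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a Load node: the name of the tensor
// being read and the name of the buffer that backs it.
struct LoadAccess {
  std::string tensor_name;
  std::string buffer_name;
};

// Groups loads by tensor name. Two tensors that alias one buffer end up
// under different keys.
std::map<std::string, int> CountLoadsByTensorName(
    const std::vector<LoadAccess>& loads) {
  std::map<std::string, int> counts;
  for (const auto& load : loads) {
    ++counts[load.tensor_name];
  }
  return counts;
}

// Groups loads by buffer name. All reads of the same memory share one key.
std::map<std::string, int> CountLoadsByBufferName(
    const std::vector<LoadAccess>& loads) {
  std::map<std::string, int> counts;
  for (const auto& load : loads) {
    ++counts[load.buffer_name];
  }
  return counts;
}
```

With the loads `{"var_1", "buf_0"}` and `{"var_1_view", "buf_0"}`, the tensor-name grouping reports two keys with one load each, while the buffer-name grouping reports a single key with two loads, which is the view a pass that reasons about repeated global reads would need.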
}

std::unordered_set<std::string> eliminate_tensor_names_;
std::unordered_map<std::string, ir::Expr> global_tensor_to_local_tensor_;
The key here is also a buffer_name.
Done, thanks~
…dle#62517) * [CINN] EliminateCommonGlobalVar pass, optimize performance * std::cerr->VLOG * Fix trick codes * CHECK->PADDLE_ENFORCE * Fix typo
PR types
Performance optimization
PR changes
Others
Description
pcard-76996
Performance was validated on Softmax with shape=[128, 12, 128, 128]: phi = 248 us and CINN = 308 us before this PR. After the optimization CINN takes 252 us, a 22.2% performance improvement (308 / 252 ≈ 1.222), which is roughly on par with phi.
Note: the EliminateCommonGlobalTensor pass in this PR adopts a fairly conservative GlobalTensor replacement strategy: a GlobalTensor is replaced with a LocalTensor if and only if its index expressions are exactly equal across the different ScheduleBlocks that access it.
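The conservative rule in the note can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code in this PR: `Access`, `AllAccessesEqual`, and `CollectReplaceableBuffers` are hypothetical names, index expressions are modeled as plain strings rather than `ir::Expr`, and the sketch additionally assumes that a buffer only qualifies when it is read more than once.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical record of one read of a global buffer inside a ScheduleBlock:
// the index expressions (modeled as strings) and the enclosing loop extents.
struct Access {
  std::vector<std::string> indices;
  std::vector<int> extents;

  bool operator==(const Access& other) const {
    return indices == other.indices && extents == other.extents;
  }
};

// Returns true when every recorded access matches the first one exactly.
bool AllAccessesEqual(const std::vector<Access>& accesses) {
  for (std::size_t i = 1; i < accesses.size(); ++i) {
    if (!(accesses[i] == accesses[0])) {
      return false;
    }
  }
  return true;
}

// Collects the names of buffers that may be replaced by a local tensor:
// read more than once (an assumption of this sketch), always with
// identical indices and extents.
std::unordered_set<std::string> CollectReplaceableBuffers(
    const std::unordered_map<std::string, std::vector<Access>>&
        buffer_to_accesses) {
  std::unordered_set<std::string> replaceable;
  for (const auto& [buffer_name, accesses] : buffer_to_accesses) {
    if (accesses.size() > 1 && AllAccessesEqual(accesses)) {
      replaceable.insert(buffer_name);
    }
  }
  return replaceable;
}
```

Under this rule a buffer read as `[i, j]` in two blocks qualifies, while a buffer read as `[i, j]` in one block and `[j, i]` in another does not. The strict equality check gives up some opportunities, but it avoids having to prove that two different index expressions address the same element.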