[Inference] Refine global search optimization for cuBLASLt and apply it in INT8 GEMM. #65597
Conversation
Your PR was submitted successfully. Thank you for contributing to this open-source project!
int repeats = search_times_;

for (int loop = 0; loop < repeats; loop++) {
  status = dynload::cublasLtMatmul(handle,
Should there be a warmup before the timed runs start?
In our measurements, running with or without a warmup made no difference to the final performance.
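A claim like this can be checked by timing the same loop with and without untimed warmup iterations. The following is a minimal sketch under stated assumptions, not the PR's code: `AverageMillis` is a hypothetical helper, and it substitutes host-side `std::chrono` timing for the CUDA events (`start_event`, `stop_event`) used in the PR, so for asynchronous GPU work the callable would have to synchronize the stream itself.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of the PR): returns the average wall-clock
// time in milliseconds of `repeats` calls to `fn`. If `warmup` > 0, that many
// calls are executed first and excluded from the measurement.
inline double AverageMillis(const std::function<void()>& fn,
                            int repeats,
                            int warmup = 0) {
  assert(repeats > 0);
  for (int i = 0; i < warmup; ++i) {
    fn();  // untimed warmup iteration
  }
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repeats; ++i) {
    fn();  // timed iteration
  }
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> elapsed = stop - start;
  return elapsed.count() / repeats;
}
```

Calling it once with `warmup = 0` and once with a small positive `warmup` on the same workload gives two averages that can be compared directly.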
}

template <typename InT, typename OutT>
void TestMatmulRun(cublasLtHandle_t handle,
Why is this function named Test***?
It has been renamed to RunAndMeasureAlgo.
@@ -0,0 +1,703 @@
/* Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
2023 -> 2024
Fixed.
#pragma once

#pragma once
Duplicate.
Fixed.

namespace phi {
namespace funcs {
namespace cublaslt_internal {
Is it necessary to add a new cublaslt_internal namespace?
This feature is disabled by default and is not intended to be exposed externally, so adding a namespace to mark that should be fine, right?
TestMatmulRun(handle,
              matmul_desc,
              a_desc,
              b_desc,
              bias_desc,
              c_desc,
              alpha,
              beta,
              a,
              b,
              bias,
              c,
              params[i],
              start_event,
              stop_event,
              stream);
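The function checks for failure internally, but nothing at this call site makes use of that. Consider having TestMatmulRun return a bool indicating whether it failed, and add the corresponding handling logic here.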
The handling is done inside the function: when a failure is detected, the time is recorded as max.
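The effect of recording a failed run's time as the maximum value can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `AlgoTiming`, `EffectiveTime`, and `SelectFastest` are hypothetical names, and the PR's real parameter struct is not shown in this excerpt.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical record of one candidate algorithm's measurement.
struct AlgoTiming {
  bool succeeded;  // whether the matmul call returned success
  float millis;    // measured time; only meaningful when succeeded is true
};

// A failed run is treated as infinitely slow by mapping it to the max float.
inline float EffectiveTime(const AlgoTiming& t) {
  return t.succeeded ? t.millis : std::numeric_limits<float>::max();
}

// Returns the index of the fastest candidate, or -1 when every candidate
// failed (no effective time is strictly below the max float).
inline int SelectFastest(const std::vector<AlgoTiming>& timings) {
  int best = -1;
  float best_time = std::numeric_limits<float>::max();
  for (std::size_t i = 0; i < timings.size(); ++i) {
    const float t = EffectiveTime(timings[i]);
    if (t < best_time) {
      best_time = t;
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

With this convention the caller needs no separate failure branch, because a failed candidate can never win the comparison. The case where all candidates fail still has to be handled, which is what the `-1` return value signals in this sketch.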
LGTM for adding the flags
PR Category
Inference
PR Types
New features
Description
Pcard-71500
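Extracts the cuBLASLt matmul global search algorithm from the FP8 code into a standalone header file and applies it to both FP8 and INT8 matmul computation.
Adds a new flag, FLAGS_enable_blaslt_global_search, which defaults to false (feature disabled).
When enabled, INT8 matmul uses cuBLASLt global search to find the optimal kernel and caches the result in "./paddle_cublaslt_cache". The first search takes somewhat longer; after that, identical matrix multiplications reuse the cache and do not search again.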
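The search-once, reuse-afterwards behavior described above can be sketched as a lookup keyed by the problem shape. This is a minimal in-memory illustration under assumptions: `GemmShape` and `AlgoCache` are hypothetical names, the key is assumed to be just (m, n, k), and the PR's actual cache key and the on-disk format of "./paddle_cublaslt_cache" are not shown in this excerpt.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <tuple>

// Hypothetical cache key: the matmul problem size (m, n, k).
using GemmShape = std::tuple<int, int, int>;

// Hypothetical in-memory cache mapping a shape to a chosen algorithm id.
class AlgoCache {
 public:
  // Returns the cached algorithm id for `shape`. On a miss, runs the
  // (expensive) `search` callback once and stores its result.
  int GetOrSearch(const GemmShape& shape, const std::function<int()>& search) {
    const auto it = cache_.find(shape);
    if (it != cache_.end()) {
      return it->second;  // cache hit: no search
    }
    const int algo_id = search();  // cache miss: search, then remember
    cache_.emplace(shape, algo_id);
    return algo_id;
  }

 private:
  std::map<GemmShape, int> cache_;
};
```

Only the first call for a given shape pays the search cost; later calls with the same shape return the stored id immediately.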