[DCU] New features for LLM #65398
Conversation
Your PR has been submitted successfully. Thank you for contributing to the open-source project!
@@ -959,6 +960,7 @@ REGISTER_OP_CPU_KERNEL(matmul_grad_grad,
#if defined(PADDLE_WITH_HIP)
REGISTER_OP_CUDA_KERNEL(
    matmul,
    ops::MatMulKernel<phi::GPUContext, int8_t>,
The quantization-related kernels could be split out into the inference PR.
This part of the code will change very little in Paddle going forward, so let's land it in this PR; it is the not-yet-merged inference files that are likely to change a lot in the near term.
@@ -51,6 +59,7 @@ extern void* flashattn_dso_handle;
  __macro(flash_attn_fwd_with_bias_and_mask); \
  __macro(flash_attn_bwd_with_bias_and_mask);
Could these two kernels also be implemented on DCU? That would put us fully on par with the GPU path, so this branching macro would not be needed.
The DCU-adapted version has not gotten this interface working yet. It is only used by fused gate attention, and the interfaces that are already adapted cover most scenarios.
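For reference, a minimal sketch of the kind of branching macro under discussion. Only the two bias-and-mask entries from the diff above are taken from the source; the other names and the split into a "common" list are illustrative assumptions, not the actual Paddle header.

#define FLASHATTN_ROUTINE_EACH_COMMON(__macro) \
  __macro(flash_attn_fwd);                     \
  __macro(flash_attn_bwd);

#if defined(PADDLE_WITH_HIP)
// DCU build: load only the entry points that have been adapted.
#define FLASHATTN_ROUTINE_EACH(__macro) FLASHATTN_ROUTINE_EACH_COMMON(__macro)
#else
// CUDA build: additionally load the bias-and-mask variants used by fused gate attention.
#define FLASHATTN_ROUTINE_EACH(__macro)        \
  FLASHATTN_ROUTINE_EACH_COMMON(__macro)       \
  __macro(flash_attn_fwd_with_bias_and_mask);  \
  __macro(flash_attn_bwd_with_bias_and_mask);
#endif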
@@ -1014,13 +1014,16 @@ struct is_pod<phi::dtype::float16> {
      is_standard_layout<phi::dtype::float16>::value;
};

#if !(defined(PADDLE_WITH_CUSTOM_KERNEL) && defined(PADDLE_WITH_HIP))
Can you confirm that this logic is only reached from DCU-related code paths?
This block is only skipped when building third-party operators on DCU; every other scenario is unaffected. Without the guard, building the third-party operators in paddlenlp fails. Together with the cpp_extension change, this modification lets third-party operator builds use half static_cast without touching Paddle's existing logic.
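To illustrate the guard's effect, a self-contained sketch follows. The float16 stand-in type and the guarded body are illustrative only; the real guarded block sits after the is_pod<phi::dtype::float16> specialization shown in the diff above.

#include <type_traits>

namespace demo {
struct float16 {  // stand-in for phi::dtype::float16
  unsigned short x;
};
}  // namespace demo

// Kept in every build EXCEPT a third-party (custom) kernel compiled for DCU,
// where PADDLE_WITH_CUSTOM_KERNEL and PADDLE_WITH_HIP are both defined and
// the specializations would clash with the HIP toolchain's half handling.
#if !(defined(PADDLE_WITH_CUSTOM_KERNEL) && defined(PADDLE_WITH_HIP))
namespace std {
template <>
struct is_pod<demo::float16>
    : std::integral_constant<bool,
                             std::is_trivial<demo::float16>::value &&
                                 std::is_standard_layout<demo::float16>::value> {};
}  // namespace std
#endif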
paddle/phi/infermeta/ternary.cc
Outdated
}
#else
  auto out_dims = q.dims();
  PADDLE_ENFORCE_EQ(out_dims.size(),
What is the reason for changing the original default GPU logic here?
paddle/phi/infermeta/unary.cc
Outdated
  PADDLE_ENFORCE_EQ(
      ((arch == 70) || (arch == 75) || (arch == 80) || (arch == 86) ||
       (arch == 89) || (arch == 90)),
      ((arch == 80) || (arch == 86) || (arch == 70) || (arch == 75)),
We need 琦姐 to help confirm the scope of impact.
I'll fix this; it was merged in by mistake.
@@ -321,7 +357,7 @@ void FlashAttnBaseKernel(
    DenseTensor* softmax,
    DenseTensor* softmax_lse,
    DenseTensor* seed_offset) {
#ifdef PADDLE_WITH_FLASHATTN
#if defined(PADDLE_WITH_FLASHATTN) || defined(PADDLE_WITH_HIP)
Same as above.
@@ -496,7 +582,7 @@ void FlashAttnQKVPackedKernel(
    DenseTensor* softmax,
    DenseTensor* softmax_lse,
    DenseTensor* seed_offset) {
#ifdef PADDLE_WITH_FLASHATTN
#if defined(PADDLE_WITH_FLASHATTN) || defined(PADDLE_WITH_HIP)
Same as above.
@@ -19,13 +19,13 @@
#include "paddle/phi/core/enforce.h"
#include "paddle/phi/kernels/empty_kernel.h"

#ifdef PADDLE_WITH_FLASHATTN
#if defined(PADDLE_WITH_FLASHATTN) || defined(PADDLE_WITH_HIP)
Same as above.
#include "paddle/phi/backends/dynload/flashattn.h" | ||
#endif | ||
|
||
namespace phi { | ||
|
||
#ifdef PADDLE_WITH_FLASHATTN | ||
#if defined(PADDLE_WITH_FLASHATTN) || defined(PADDLE_WITH_HIP) |
Same as above.
@@ -468,7 +468,7 @@ def unix_custom_single_compiler(
        # Note(qili93): HIP require some additional flags for CMAKE_C_FLAGS
        if core.is_compiled_with_rocm():
            cflags.append('-D__HIP_PLATFORM_HCC__')
            cflags.append('-D__HIP_NO_HALF_CONVERSIONS__=1')
            # cflags.append('-D__HIP_NO_HALF_CONVERSIONS__=1')
Why is this line commented out here? Will it be re-enabled later?
It has to be commented out when building third-party operators on DCU; otherwise half static_cast cannot be used inside those operators. It will not be re-enabled later.
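For context, a minimal sketch of the pattern this unblocks. The kernel below is hypothetical (not from this PR); it only shows why -D__HIP_NO_HALF_CONVERSIONS__=1 gets in the way: with that define the __half conversion constructors and operators are compiled out, so these static_casts fail to build inside a custom operator targeting DCU.

#include <hip/hip_fp16.h>

__global__ void scale_half(const __half* x, __half* y, float alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float v = static_cast<float>(x[i]);     // needs the half -> float conversion operator
    y[i] = static_cast<__half>(v * alpha);  // needs the float -> half converting constructor
  }
}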
LGTM for op-benchmark
LGTM
LGTM for setup
LGTM for setup.py.in
LGTM
PR Category
Performance Optimization
PR Types
New features
Description
Support flash attention (MHA and GQA, forward and backward; unit tests pass)
Support a8w8-related operators (unit tests pass)
Support quant_linear-related operators (unit tests pass)
Support fused rope-related operators (unit tests pass)
Support the multiclass_nms3 op (unit tests pass)
Support batch norm via MIOpen (enabled with FLAGS_batch_norm_use_miopen=1; v1 and v2 unit tests pass)
Support fp16 compute type for GEMM (enabled with FLAGS_gemm_use_half_precision_compute_type=1); see the usage sketch after this list
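Usage sketch for the two flags above: they are process-wide switches rather than new APIs, so on a DCU build they can be exported before Paddle initializes. The flag names come verbatim from the description; the model code below is hypothetical and only illustrates that no API changes are required.

import os

# Set the flags before Paddle initializes (they are read from the environment).
os.environ["FLAGS_batch_norm_use_miopen"] = "1"                 # route batch norm through MIOpen
os.environ["FLAGS_gemm_use_half_precision_compute_type"] = "1"  # fp16 compute type for GEMM

import paddle

x = paddle.randn([4, 16, 32, 32])
y = paddle.nn.BatchNorm2D(16)(x)        # expected to take the MIOpen path on a DCU build

a = paddle.randn([64, 128]).astype("float16")
b = paddle.randn([128, 256]).astype("float16")
c = paddle.matmul(a, b)                 # GEMM using the fp16 compute type when enabled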