[DCU] New features for LLM #65398
Conversation
Your PR has been submitted successfully. Thank you for contributing to the open-source project!
@@ -959,6 +960,7 @@ REGISTER_OP_CPU_KERNEL(matmul_grad_grad,
#if defined(PADDLE_WITH_HIP)
REGISTER_OP_CUDA_KERNEL(
    matmul,
    ops::MatMulKernel<phi::GPUContext, int8_t>,
The quantization-related kernels could be split out into the inference PR.
This part of the code will change very little in Paddle going forward, so let's land it in this PR; it is the not-yet-merged inference files that are likely to change a lot in the near term.
@@ -51,6 +59,7 @@ extern void* flashattn_dso_handle;
  __macro(flash_attn_fwd_with_bias_and_mask); \
  __macro(flash_attn_bwd_with_bias_and_mask);
Could these two kernels also be implemented on DCU? That would put us fully on par with the GPU path, so this branching macro would not be needed.
The DCU-adapted version has not gotten this interface working yet. It is only used by fused gate attention, and the interfaces that are already adapted cover most scenarios.
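For reference, a minimal sketch of the kind of branching macro under discussion. Only the two bias-and-mask entries from the diff above are taken from the source; the other names and the split into a "common" list are illustrative assumptions, not the actual Paddle header.

#define FLASHATTN_ROUTINE_EACH_COMMON(__macro) \
  __macro(flash_attn_fwd);                     \
  __macro(flash_attn_bwd);

#if defined(PADDLE_WITH_HIP)
// DCU build: load only the entry points that have been adapted.
#define FLASHATTN_ROUTINE_EACH(__macro) FLASHATTN_ROUTINE_EACH_COMMON(__macro)
#else
// CUDA build: additionally load the bias-and-mask variants used by fused gate attention.
#define FLASHATTN_ROUTINE_EACH(__macro)        \
  FLASHATTN_ROUTINE_EACH_COMMON(__macro)       \
  __macro(flash_attn_fwd_with_bias_and_mask);  \
  __macro(flash_attn_bwd_with_bias_and_mask);
#endif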
@@ -1014,13 +1014,16 @@ struct is_pod<phi::dtype::float16> {
      is_standard_layout<phi::dtype::float16>::value;
};

#if !(defined(PADDLE_WITH_CUSTOM_KERNEL) && defined(PADDLE_WITH_HIP))
Can you confirm that this logic is only reached from DCU-related code paths?
This block is only skipped when building third-party operators on DCU; every other scenario is unaffected. Without the guard, building the third-party operators in paddlenlp fails. Together with the cpp_extension change, this modification lets third-party operator builds use half static_cast without touching Paddle's existing logic.
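To illustrate the guard's effect, a self-contained sketch follows. The float16 stand-in type and the guarded body are illustrative only; the real guarded block sits after the is_pod<phi::dtype::float16> specialization shown in the diff above.

#include <type_traits>

namespace demo {
struct float16 {  // stand-in for phi::dtype::float16
  unsigned short x;
};
}  // namespace demo

// Kept in every build EXCEPT a third-party (custom) kernel compiled for DCU,
// where PADDLE_WITH_CUSTOM_KERNEL and PADDLE_WITH_HIP are both defined and
// the specializations would clash with the HIP toolchain's half handling.
#if !(defined(PADDLE_WITH_CUSTOM_KERNEL) && defined(PADDLE_WITH_HIP))
namespace std {
template <>
struct is_pod<demo::float16>
    : std::integral_constant<bool,
                             std::is_trivial<demo::float16>::value &&
                                 std::is_standard_layout<demo::float16>::value> {};
}  // namespace std
#endif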
paddle/phi/infermeta/ternary.cc
Outdated
}
#else
  auto out_dims = q.dims();
  PADDLE_ENFORCE_EQ(out_dims.size(),
What is the reason for changing the original default GPU logic here?
paddle/phi/infermeta/unary.cc
Outdated
  PADDLE_ENFORCE_EQ(
      ((arch == 70) || (arch == 75) || (arch == 80) || (arch == 86) ||
       (arch == 89) || (arch == 90)),
      ((arch == 80) || (arch == 86) || (arch == 70) || (arch == 75)),
We need 琦姐 to help confirm the scope of impact.
I'll fix this; it was merged in by mistake.
@@ -321,7 +357,7 @@ void FlashAttnBaseKernel(
    DenseTensor* softmax,
    DenseTensor* softmax_lse,
    DenseTensor* seed_offset) {
#ifdef PADDLE_WITH_FLASHATTN
#if defined(PADDLE_WITH_FLASHATTN) || defined(PADDLE_WITH_HIP)
Same as above.
@@ -496,7 +582,7 @@ void FlashAttnQKVPackedKernel(
    DenseTensor* softmax,
    DenseTensor* softmax_lse,
    DenseTensor* seed_offset) {
#ifdef PADDLE_WITH_FLASHATTN
#if defined(PADDLE_WITH_FLASHATTN) || defined(PADDLE_WITH_HIP)
Same as above.
@@ -19,13 +19,13 @@
#include "paddle/phi/core/enforce.h"
#include "paddle/phi/kernels/empty_kernel.h"

#ifdef PADDLE_WITH_FLASHATTN
#if defined(PADDLE_WITH_FLASHATTN) || defined(PADDLE_WITH_HIP)
Same as above.
#include "paddle/phi/backends/dynload/flashattn.h" | ||
#endif | ||
|
||
namespace phi { | ||
|
||
#ifdef PADDLE_WITH_FLASHATTN | ||
#if defined(PADDLE_WITH_FLASHATTN) || defined(PADDLE_WITH_HIP) |
Same as above.
@@ -468,7 +468,7 @@ def unix_custom_single_compiler(
        # Note(qili93): HIP require some additional flags for CMAKE_C_FLAGS
        if core.is_compiled_with_rocm():
            cflags.append('-D__HIP_PLATFORM_HCC__')
            cflags.append('-D__HIP_NO_HALF_CONVERSIONS__=1')
            # cflags.append('-D__HIP_NO_HALF_CONVERSIONS__=1')
Why is this line commented out here? Will it be re-enabled later?
It has to be commented out when building third-party operators on DCU; otherwise half static_cast cannot be used inside those operators. It will not be re-enabled later.
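For context, a minimal sketch of the pattern this unblocks. The kernel below is hypothetical (not from this PR); it only shows why -D__HIP_NO_HALF_CONVERSIONS__=1 gets in the way: with that define the __half conversion constructors and operators are compiled out, so these static_casts fail to build inside a custom operator targeting DCU.

#include <hip/hip_fp16.h>

__global__ void scale_half(const __half* x, __half* y, float alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float v = static_cast<float>(x[i]);     // needs the half -> float conversion operator
    y[i] = static_cast<__half>(v * alpha);  // needs the float -> half converting constructor
  }
}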
LGTM for op-benchmark
LGTM
LGTM for setup
LGTM for setup.py.in
LGTM
PR Category
Performance Optimization
PR Types
New features
Description
Support flash attention (MHA and GQA, forward and backward; unit tests pass)
Support a8w8-related operators (unit tests pass)
Support quant_linear-related operators (unit tests pass)
Support fused rope-related operators (unit tests pass)
Support the multiclass_nms3 op (unit tests pass)
Support batch norm via MIOpen (enabled with FLAGS_batch_norm_use_miopen=1; v1 and v2 unit tests pass)
Support fp16 compute type for GEMM (enabled with FLAGS_gemm_use_half_precision_compute_type=1); see the usage sketch after this list
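Usage sketch for the two flags above: they are process-wide switches rather than new APIs, so on a DCU build they can be exported before Paddle initializes. The flag names come verbatim from the description; the model code below is hypothetical and only illustrates that no API changes are required.

import os

# Set the flags before Paddle initializes (they are read from the environment).
os.environ["FLAGS_batch_norm_use_miopen"] = "1"                 # route batch norm through MIOpen
os.environ["FLAGS_gemm_use_half_precision_compute_type"] = "1"  # fp16 compute type for GEMM

import paddle

x = paddle.randn([4, 16, 32, 32])
y = paddle.nn.BatchNorm2D(16)(x)        # expected to take the MIOpen path on a DCU build

a = paddle.randn([64, 128]).astype("float16")
b = paddle.randn([128, 256]).astype("float16")
c = paddle.matmul(a, b)                 # GEMM using the fp16 compute type when enabled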