Add phi kernel fp8_fp8_half_gemm_fused #64955
Conversation
Your PR has been submitted successfully. Thank you for contributing to the open-source project!
cudaGetDevice(&dev);
if (dev == 0) {
  std::ofstream outfile;
  outfile.open(config_filename_, std::ios::out | std::ios::trunc);
The way the file is created could be further optimized.
Keeping it in the destructor still looks appropriate; the whole feature will be moved out later and turned into a utility API.
  infile.close();
}

std::string config_filename_{"/tmp/paddle_cublaslt_cache"};
This should be a configurable parameter, with the current directory as the default.
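A minimal sketch of what such a configurable path could look like, assuming a hypothetical PADDLE_CUBLASLT_CACHE_DIR environment variable (the variable name and helper are illustrative, not Paddle's actual flag):

#include <cstdlib>
#include <string>

// Resolve the cache file path: use the environment override when present,
// otherwise fall back to the current working directory.
// PADDLE_CUBLASLT_CACHE_DIR is a hypothetical name used for illustration.
std::string GetCublasLtCachePath() {
  const char* dir = std::getenv("PADDLE_CUBLASLT_CACHE_DIR");
  std::string base = (dir != nullptr) ? dir : ".";
  return base + "/paddle_cublaslt_cache";
}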
  return &algo_in_map;
}

~CublasLtAlgoCache() {
shape_range_info serializes its data to disk in the destructor; see https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/api/analysis_predictor.cc#L3196 for reference.
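For reference, a minimal sketch of the serialize-on-destruction pattern being pointed to; the class and member names below are placeholders, not Paddle's actual ones:

#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>

// Illustrative cache that persists its contents when destroyed, mirroring
// the shape_range_info pattern referenced above.
class AlgoCacheSketch {
 public:
  ~AlgoCacheSketch() { SerializeToFile(); }

 private:
  void SerializeToFile() {
    std::ofstream out(path_, std::ios::out | std::ios::trunc);
    if (!out.is_open()) return;  // best effort: silently skip on failure
    out << entries_.size() << "\n";
    for (const auto& kv : entries_) {
      out << kv.first << " " << kv.second << "\n";
    }
  }

  std::string path_{"./paddle_cublaslt_cache"};
  std::unordered_map<int64_t, int64_t> entries_;  // hashed key -> algo index
};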
int64_t value) {
  *seed ^= hash_fn(value) + 0x9e3779b9 + (*seed << 6) + (*seed >> 2);
}
};
The tuning code above looks generic to matrix multiplication rather than specific to FP8; it should later be extracted as general-purpose tuning code.
Yes, the global-search matmul tuning will be turned into a standalone tool later.
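For context, the snippet above is the classic boost-style hash combine. A self-contained sketch of how a GEMM cache key could be accumulated with it (GemmShapeKey and its parameter set are illustrative; the real code hashes more attributes such as dtypes and the epilogue):

#include <cstdint>
#include <functional>

// Boost-style hash combine: folds one value into an accumulated seed.
void HashValue(int64_t* seed, const std::hash<int64_t>& hash_fn,
               int64_t value) {
  *seed ^= hash_fn(value) + 0x9e3779b9 + (*seed << 6) + (*seed >> 2);
}

// Illustrative cache key over the GEMM shape (m, n, k).
int64_t GemmShapeKey(int64_t m, int64_t n, int64_t k) {
  std::hash<int64_t> hash_fn;
  int64_t seed = 0;
  HashValue(&seed, hash_fn, m);
  HashValue(&seed, hash_fn, n);
  HashValue(&seed, hash_fn, k);
  return seed;
}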
&epilogue,
sizeof(epilogue));
PADDLE_CUBLASLT_STATUS_CHECK(cublasLtMatmulDescSetAttribute);
}
An else branch needs to be added here.
Added an else branch that raises a PADDLE_THROW error.
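A sketch of the shape of that guard, assuming the surrounding code maps the activation attribute to a cuBLASLt epilogue (the function name and message text are illustrative; the CUBLASLT_EPILOGUE_* values are real cuBLASLt enums):

#include <string>
#include <cublasLt.h>
#include "paddle/phi/core/enforce.h"

// Map the activation attribute to a cuBLASLt epilogue; anything
// unrecognized now fails loudly instead of being silently ignored.
cublasLtEpilogue_t EpilogueFor(const std::string& activation_type) {
  if (activation_type == "gelu") {
    return CUBLASLT_EPILOGUE_GELU_BIAS;
  } else if (activation_type == "relu") {
    return CUBLASLT_EPILOGUE_RELU_BIAS;
  } else if (activation_type == "identity") {
    return CUBLASLT_EPILOGUE_BIAS;
  } else {
    PADDLE_THROW(phi::errors::Unimplemented(
        "fp8_fp8_half_gemm_fused does not support activation_type %s.",
        activation_type));
  }
}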
ctx, batch_count, m, n, k, x, y, scale, bias, activation_type, out);
}

}  // namespace cutlass_internal
The code above is cuBLASLt, not cutlass_xx.
Same as above: the namespace is reused; it will be renamed together with the cutlass one.
namespace phi {
namespace fusion {
namespace cutlass_internal {
cublaslt_internal ?
We forgot to differentiate this when splitting it out from the cutlass kernel. The upcoming cutlass PR shares the phi kernel and API definitions with this PR, so it is hard to rename the namespace separately; the best option is to rename both sides to fp8_internal once both are merged.
    return False
if get_cuda_version() < 12010:
    return False
return True
The code above duplicates the previous PR.
In the unit tests, yes; but this code does not seem easy to make generic, does it?
    return False
if get_cuda_version() < 12010:
    return False
return True
Same as above: duplicated code.
)
# there exists some problem in cpu fp8 cast
if self.device == "gpu":
    self.assertTrue(paddle.equal_all(input2, expect))
What exactly is the problem on CPU here? It should be fixed later, right?
The skip was left in this unit test by mistake. Non-arithmetic data-manipulation ops such as cast, full, and reshape all support FP8; matmul is only supported on specific GPUs and CUDA versions.
expect = paddle.to_tensor([[1, 1]]).astype("float32")
# there exists some problem in cpu fp8 full
if self.device == "gpu":
    self.assertTrue(paddle.equal_all(expect, input_fp32))
Same as above.
The skip has been removed from the unit test.
        paddle.cast(output_bf16, "float32"),
        paddle.to_tensor(expect_result),
    )
)
The code above computes the gelu/relu and bias combinations, but the results are never checked for correctness here.
OK, correctness checks for the gelu/relu and bias combinations have been added.
python/paddle/tensor/linalg.py
@@ -324,6 +324,164 @@ def __check_input(x, y):
    return out

def fp8_fp8_fp16_gemm_fused(
This is not the right place for the API.
How about adding an attribute to control the output dtype? Then a single API would suffice, which should remove a lot of duplicated code.
We considered that before; controlling it through an attribute is less convenient and direct than separate APIs.
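For context, a sketch of the two API shapes being debated; the signatures below are illustrative, not the ones merged:

// Option A (reviewer's suggestion): one entry point, output dtype as an
// attribute, so the fp16/bf16 paths share one API.
enum class OutDType { kFloat16, kBFloat16 };
void Fp8Fp8HalfGemmFused(/* x, y, bias, scale, activation_type, */
                         OutDType out_dtype /* , out */);

// Option B (kept in this PR): one entry point per output dtype, which the
// author considers more direct to call.
void Fp8Fp8Fp16GemmFused(/* x, y, bias, scale, activation_type, out */);
void Fp8Fp8Bf16GemmFused(/* x, y, bias, scale, activation_type, out */);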
#define PADDLE_CUBLASLT_STATUS_CHECK(name)                                 \
  PADDLE_ENFORCE_EQ(                                                       \
      status,                                                              \
      CUBLAS_STATUS_SUCCESS,                                               \
      phi::errors::External(                                               \
          #name                                                            \
          " execution error; "                                             \
          "refer to https://docs.nvidia.com/cuda/cublas/index.html for "   \
          "more information"))
Doesn't Paddle already provide a utility like this?
There is no existing definition of the cuBLAS error enum, as far as I know.
Then please add this part to paddle/phi/core/enforce.h.
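A sketch of the kind of status-to-string helper that could live there; the function name is illustrative, while the enum values are cuBLAS's own:

#include <cublas_v2.h>

// Translate a cublasStatus_t into a readable name for error messages.
inline const char* CublasStatusToString(cublasStatus_t status) {
  switch (status) {
    case CUBLAS_STATUS_SUCCESS:          return "CUBLAS_STATUS_SUCCESS";
    case CUBLAS_STATUS_NOT_INITIALIZED:  return "CUBLAS_STATUS_NOT_INITIALIZED";
    case CUBLAS_STATUS_ALLOC_FAILED:     return "CUBLAS_STATUS_ALLOC_FAILED";
    case CUBLAS_STATUS_INVALID_VALUE:    return "CUBLAS_STATUS_INVALID_VALUE";
    case CUBLAS_STATUS_ARCH_MISMATCH:    return "CUBLAS_STATUS_ARCH_MISMATCH";
    case CUBLAS_STATUS_MAPPING_ERROR:    return "CUBLAS_STATUS_MAPPING_ERROR";
    case CUBLAS_STATUS_EXECUTION_FAILED: return "CUBLAS_STATUS_EXECUTION_FAILED";
    case CUBLAS_STATUS_INTERNAL_ERROR:   return "CUBLAS_STATUS_INTERNAL_ERROR";
    case CUBLAS_STATUS_NOT_SUPPORTED:    return "CUBLAS_STATUS_NOT_SUPPORTED";
    default:                             return "unknown cuBLAS status";
  }
}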
namespace phi {
namespace fusion {
namespace cutlass_internal {
Adding this new namespace does not seem necessary.
It is developed together with the upcoming cutlass kernels, which reuse the same phi kernel and API definitions and live in the same namespace; they have not been differentiated yet. It can be renamed to fp8_internal after both are merged. The namespace itself should be needed, since some functions and definitions in the cutlass implementation might otherwise collide with names elsewhere.
#include "paddle/phi/common/place.h" | ||
#include "paddle/phi/core/allocator.h" | ||
|
||
namespace dyl = phi::dynload; |
This abbreviation seems unnecessary; the full name is only a few letters longer.
It is used in more than sixty places, which is quite a lot.
  infile.close();
}

std::string config_filename_{"./paddle_cublaslt_cache"};
What is this path? Using a relative path like this does not seem appropriate.
It currently uses the working directory; the whole feature will be extracted later, with every parameter configurable.
HashValue_(seed, hash_fn, static_cast<int64_t>(batch_offset));
}

void HashValue_(int64_t* seed,
Paddle's coding style has no rule or precedent for class member functions ending with an underscore.
Renamed to the normal member-function naming convention.
Changing it to a single API would also work.
LGTM for your excellent work
LBTM for PR-CI-Paddle-Doc-Preview
LGTM for const_cast and @unittest.SkipIf
class TestFP8CastOp(unittest.TestCase):
    def setUp(self):
        self.dtype_dict = {
            "float8_e4m3fn": core.VarDesc.VarType.FP8_E4M3FN,
Suggest dropping the dependency on core.VarDesc and using the DataType from phi directly, e.g. paddle.float32.
no docs changes, LGTM
PR Category
Performance Optimization
PR Types
New features
Description
pcard-71500
Add the API and phi kernel for fp8_fp8_half_gemm_fused, which can fuse gemm+bias+scale+act (cuBLASLt with global-search tuning).
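A minimal reference for the fused computation's semantics, assuming out = act(scale * (x @ y) + bias) as described above; the naive float loop is illustrative only (the kernel itself takes FP8 inputs and produces half/bf16 output via cuBLASLt):

#include <algorithm>
#include <vector>

// Naive reference: out = act(scale * (x @ y) + bias), with relu standing
// in for the configurable activation. x is m x k, y is k x n (row-major),
// bias has n elements.
std::vector<float> FusedGemmReference(const std::vector<float>& x,
                                      const std::vector<float>& y,
                                      const std::vector<float>& bias,
                                      float scale, int m, int n, int k,
                                      bool use_relu) {
  std::vector<float> out(static_cast<size_t>(m) * n, 0.f);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      float acc = 0.f;
      for (int p = 0; p < k; ++p) acc += x[i * k + p] * y[p * n + j];
      float v = scale * acc + bias[j];
      out[i * n + j] = use_relu ? std::max(v, 0.f) : v;
    }
  }
  return out;
}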