Unify fused_dot_product_attention with scaled_dot_product_attention #65507

deepllz · 2024-06-27T02:22:53Z

PR Category

Performance Optimization

PR Types

Performance

Description

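This PR builds on #63028.

It refactors the cuDNN flash attention API to align with scaled_dot_product_attention.
Performance tests:
On Hopper GPUs, the cuDNN flash attention (FA) is substantially faster than the open-source FA. Tests across different input shapes show that the speedup is positively correlated with head_dim. Speedups by head_dim:
causal_mask=True (that is, mask=None):
head_dim=64: forward+backward speedup of 36%~45%
head_dim=128: forward+backward speedup of 65%~75%
Arbitrary mask:
head_dim=64: forward+backward speedup of 13%~16%
head_dim=128: forward+backward speedup of 1%~5%
On Ampere GPUs, the cuDNN FA is 30%~60% slower than the open-source version. A follow-up will consider choosing between the two APIs based on GPU architecture.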
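As a rough illustration of the architecture-based selection mentioned above, the sketch below picks a backend from the GPU's compute capability. It is not Paddle's actual dispatch logic: the function name `choose_attention_backend`, the backend labels, and the way the `PADDLE_DISABLE_CUDNN_FA` flag (named in a follow-up commit message) is parsed are all assumptions made for illustration. The only external fact relied on is that Hopper GPUs report compute capability 9.x and Ampere GPUs report 8.x.

```python
import os


def choose_attention_backend(compute_capability_major, env=None):
    """Hypothetical helper: pick a flash attention backend by GPU architecture.

    compute_capability_major: 9 for Hopper, 8 for Ampere.
    env: mapping used to look up the flag; defaults to os.environ.
    """
    env = os.environ if env is None else env
    # Assumption: any value other than "" or "0" disables the cuDNN path.
    # How Paddle actually parses this flag is not stated in this PR.
    disabled = env.get("PADDLE_DISABLE_CUDNN_FA", "0") not in ("", "0")
    if compute_capability_major == 9 and not disabled:
        return "cudnn"  # faster on Hopper per the measurements above
    return "open_source"  # faster on Ampere per the measurements above


if __name__ == "__main__":
    print(choose_attention_backend(9, {}))
    print(choose_attention_backend(8, {}))
```

Passing the environment as an argument keeps the helper easy to test without mutating process state.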
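Accuracy tests:
Because this unit test currently depends on the GPU model and on the CUDA and cuDNN versions, CI does not run it yet. The unit test results are below: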

Pcard-73145

fix error fix pir pass fix ci add more ut minor changes remove workspace_opt env minor change update api doc

paddle-bot · 2024-06-27T02:22:58Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Wong4j · 2024-07-04T08:54:24Z

LGTM

zhiqiu

LGTM

sneaxiy · 2024-07-05T03:51:24Z

python/paddle/incubate/nn/functional/fused_dot_product_attention.py

+    cu_seqlen_q = None
+    cu_seqlen_k = None
+    head_dim = query.shape[3]
+    scaling_factor = head_dim**-0.5


scaling_factor can keep None as its default, so that other values can be passed in later. When it is None, it automatically becomes head_dim**-0.5.
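A minimal sketch of the suggested behavior, for illustration only. The helper name `resolve_scaling_factor` is hypothetical and is not the code merged in this PR; it takes a shape tuple rather than a tensor so that it runs without Paddle. It assumes a 4-D query whose last axis is head_dim, consistent with `query.shape[3]` in the diff above; the meaning of the other three axes is an assumption.

```python
def resolve_scaling_factor(query_shape, scaling_factor=None):
    """Hypothetical helper: return the softmax scale for attention.

    query_shape: 4-D shape whose index 3 is head_dim (as in the diff);
                 the order of the first three axes is assumed.
    scaling_factor: explicit scale, or None to use head_dim ** -0.5.
    """
    head_dim = query_shape[3]
    if scaling_factor is None:
        # Default requested in review: 1 / sqrt(head_dim).
        scaling_factor = head_dim**-0.5
    return scaling_factor


if __name__ == "__main__":
    print(resolve_scaling_factor((2, 128, 8, 64)))        # default scale
    print(resolve_scaling_factor((2, 128, 8, 64), 0.2))   # caller override
```

Keeping None as the sentinel lets callers override the scale later without changing the signature.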

sneaxiy

Please revert the changes to graph_pattern_detector.cc and restore the scaling_factor parameter option on fused_dot_product_attention.

deepllz · 2024-07-05T04:14:29Z

Please revert the changes to graph_pattern_detector.cc and restore the scaling_factor parameter option on fused_dot_product_attention.

Done

…addlePaddle#65507) * update cudnn flash attention fix error fix pir pass fix ci add more ut minor changes remove workspace_opt env minor change update api doc * polish * fix static test * revert graph_pattern_detector.cc modify --------- Co-authored-by: Shijie Wang <jaywan@nvidia.com>

* update cudnn flash attention (#63028) update cudnn flash attention * Unify fused_dot_product_attention with scaled_dot_product_attention (#65507) * update cudnn flash attention fix error fix pir pass fix ci add more ut minor changes remove workspace_opt env minor change update api doc * polish * fix static test * revert graph_pattern_detector.cc modify --------- Co-authored-by: Shijie Wang <jaywan@nvidia.com> * add hopper arch support for flash_attention * add flag PADDLE_DISABLE_CUDNN_FA to disable use cudnn fa * fix environment * fix --------- Co-authored-by: Shijie <jaywan@nvidia.com> Co-authored-by: zhengzhonghui <zhengzhonghui@baidu.com>

Wong4j and others added 4 commits June 24, 2024 19:12

update cudnn flash attention

a5bf80e

fix error fix pir pass fix ci add more ut minor changes remove workspace_opt env minor change update api doc

Merge remote-tracking branch 'upstream/develop' into cudnn_fa

7bef2a2

polish

0b03d4c

Merge remote-tracking branch 'upstream/develop' into cudnn_fa

7b76834

deepllz added 4 commits June 27, 2024 17:36

fix static test

a5411ea

Merge remote-tracking branch 'upstream/develop' into cudnn_fa

8d277fc

fix conflict

3be383a

Merge remote-tracking branch 'upstream/develop' into cudnn_fa

d7e4636

deepllz changed the title ~~Update cudnn flash attention~~ Unify fused_dot_product_attention with scaled_dot_product_attention and polish some codes Jul 3, 2024

Wong4j self-requested a review July 4, 2024 08:51

zhiqiu previously approved these changes Jul 5, 2024

sneaxiy approved these changes Jul 5, 2024

deepllz added 2 commits July 5, 2024 11:56

Merge remote-tracking branch 'upstream/develop' into cudnn_fa

0eee3ca

Merge remote-tracking branch 'upstream/develop' into cudnn_fa

211ca93

Wong4j approved these changes Jul 5, 2024

sneaxiy requested changes Jul 5, 2024

deepllz dismissed zhiqiu’s stale review via e8c609a July 5, 2024 04:05

revert graph_pattern_detector.cc modify

e8c609a

deepllz changed the title ~~Unify fused_dot_product_attention with scaled_dot_product_attention and polish some codes~~ Unify fused_dot_product_attention with scaled_dot_product_attention Jul 5, 2024

deepllz requested a review from sneaxiy July 8, 2024 03:16

sneaxiy approved these changes Jul 8, 2024

sneaxiy merged commit 47461b5 into PaddlePaddle:develop Jul 8, 2024
31 of 32 checks passed

deepllz mentioned this pull request Jul 9, 2024

use cudnn fa on Hopper devices #65884

Closed
