[Paddle Inference] support inference in dynamic graph #65962
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
    # This is the inner_most decorator, ie. when user invoke the function decorated by @paddle.jit.to_static(backend='inference', )
    # he is actually invoke this internel function.
    def innermost_decorator(*args, **kwargs):
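As a minimal illustration of the structure under review, here is a sketch of a decorator factory whose innermost function is what a decorated call actually reaches. All names here (`inference_like`, `double`) are hypothetical stand-ins, not Paddle's actual internals:

```python
import functools

def inference_like(backend="inference"):
    """Hypothetical stand-in for the decorator factory under review."""
    def decorator(func):
        @functools.wraps(func)
        def innermost_decorator(*args, **kwargs):
            # The real implementation would dispatch to a compiled /
            # static-graph path here; this sketch just forwards the call.
            return func(*args, **kwargs)
        return innermost_decorator
    return decorator

@inference_like(backend="inference")
def double(x):
    return 2 * x

print(double(21))  # 42
```

The user writes `@inference_like(...)`, but every later call to `double` lands in `innermost_decorator`, which is the point the review comment is making.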
This is the innermost call of the decorator.
        delete_pass_lists=delete_pass_lists,
    )
Suggested change:
- # This is the inner_most decorator, ie. when user invoke the function decorated by @paddle.jit.to_static(backend='inference', )
+ # This is the inner_most decorator, ie. when user invoke the function decorated by @paddle.incubate.jit.inference(mylayer)
THX
def inference(
    function=None,
    cache_static_model=False,
    save_model_dir=None,
    memory_pool_init_size_mb=1000,
    precision_mode="float32",
    switch_ir_optim=True,
    switch_ir_debug=False,
    enable_cinn=False,
    with_trt=False,
    trt_precision_mode="float32",
    trt_use_static=False,
    collect_shape=False,
    enable_new_ir=False,
    exp_enable_use_cutlass=False,
    delete_pass_lists=None,
):
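For reference, a hedged sketch of what an annotated version of this signature could look like. The parameter names and defaults come from the diff above; the annotation style (PEP 604 unions enabled via `from __future__ import annotations`) is an assumption and may differ from Paddle's actual conventions:

```python
from __future__ import annotations

def inference(
    function=None,
    cache_static_model: bool = False,
    save_model_dir: str | None = None,
    memory_pool_init_size_mb: int = 1000,
    precision_mode: str = "float32",
    switch_ir_optim: bool = True,
    switch_ir_debug: bool = False,
    enable_cinn: bool = False,
    with_trt: bool = False,
    trt_precision_mode: str = "float32",
    trt_use_static: bool = False,
    collect_shape: bool = False,
    enable_new_ir: bool = False,
    exp_enable_use_cutlass: bool = False,
    delete_pass_lists: list[str] | None = None,
):
    ...  # body omitted; only the signature shape is illustrated
```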
Public APIs need type annotations.
> Public APIs need type annotations.

Done, thanks!
LGTM
5ce3e7e
These type annotations don't quite match our current conventions; I'll revise them.
def inference(
    function=None,
    cache_static_model: Optional[bool] = False,
This doesn't support passing None, does it? Why is it Optional?
Fixed it.
> Fixed it.

Great, thank you!
> This doesn't support passing None, does it? Why is it Optional?

When it is None, it is internally treated as the default path.
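The behavior described here, where None falls back to a default path, is the usual reason such an argument is typed `Optional`. A minimal sketch of that pattern; the fallback location used below is hypothetical, since the PR does not show Paddle's actual default:

```python
import os
import tempfile
from typing import Optional

def resolve_save_model_dir(save_model_dir: Optional[str]) -> str:
    # None means "use the default path"; the actual default location in
    # Paddle is not shown in this thread, so this directory is made up.
    if save_model_dir is None:
        return os.path.join(tempfile.gettempdir(), "paddle_jit_inference")
    return save_model_dir
```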
LGTM for setup.py.in
…5962) * support inference in dynamic graph * refine annotations --------- Co-authored-by: SigureMo <sigure.qaq@gmail.com>
PR Category
Inference
PR Types
New features
Description
Pcard-71500
Background

Converting to a static graph for paddle.jit.save can have a high barrier to entry (learning cost) -> is it really necessary to convert the entire model to a static graph? jit.to_static basically always succeeds, but whole-graph jit.save runs into errors on some complex models, which can only be avoided by modifying user code. We therefore propose a mode that mixes dynamic-graph and static-graph inference.

Usage via decorator

This feature only supports users deploying Python dynamic-graph inference. During dynamic-graph inference, when the user realizes that some module is time-consuming, they can wrap that module in a Python function and add the decorator paddle.incubate.jit.inference() to obtain an inference speedup. Usage is very simple.

Example

In a transformer-architecture model, for instance, the vast majority of the time is spent in a statement such as [x, y] = self.transformer_blocks(x, y, c, mask) (code 1). Decorating it with @paddle.incubate.jit.inference is enough to get the speedup; equivalently, the calls can be replaced with model1 = paddle.incubate.jit.inference()(model1) and model2 = paddle.incubate.jit.inference(model2).

When an input dimension changes, jit.save is performed again and the value of that dimension is marked as None, indicating that the dimension can vary; when it changes again, no further jit.save is needed.
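The dynamic-dimension handling described above (re-running jit.save once when a dimension's value changes, and marking that dimension as None so later changes need no re-save) can be sketched as shape merging. This is an illustrative sketch of the idea, not Paddle's actual implementation:

```python
def merge_shape(cached, observed):
    """Merge a newly observed input shape into the cached one.

    A dimension whose value differs from the cached value is marked
    None ("variable"), so later changes in it need no re-save.
    Returns (merged_shape, needs_resave).
    """
    if cached is None:                      # first observation: save once
        return list(observed), True
    merged, needs_resave = [], False
    for c, o in zip(cached, observed):
        if c is None or c == o:
            merged.append(c)                # already variable, or unchanged
        else:
            merged.append(None)             # changed: mark variable, re-save
            needs_resave = True
    return merged, needs_resave

shape, resave = merge_shape(None, (1, 128, 768))    # first call: save
shape, resave = merge_shape(shape, (4, 128, 768))   # batch changed: re-save once
shape, resave = merge_shape(shape, (8, 128, 768))   # varies again: no re-save
```

After the second call the cached shape is `[None, 128, 768]`, so any further change in the batch dimension no longer triggers a re-save, which is the behavior the description promises.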