[Paddle TensorRT] Support inference using predictor.run #67755
Conversation
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
python/paddle/tensorrt/util.py
Outdated
@@ -110,3 +110,22 @@ def warmup_shape_infer(program, min_shape_feed, max_shape_feed):
        executor.run(
            program, feed=max_shape_feed, fetch_list=[output_var]
        )


def warmup_shape_infer_v2(
Use a single unified warmup_shape_infer; don't create v1/v2 variants.

Changed to a unified warmup_shape_infer.
)


class TestConverterResNet50(unittest.TestCase):
What you're testing here isn't ResNet50, is it? Just rename this file to test_predict_model.

Renamed to test_paddle_to_tensorrt_conversion_cumsum.
python/paddle/tensorrt/predictor.py
Outdated
if self.input_data_type and not self.input_range:
    if 'int' in self.input_data_type:
        self.input_min_data = np.random.randint(
            1, 10, size=self.min_input_shape
        )
        self.input_max_data = np.random.randint(
            1, 10, size=self.max_input_shape
        )
    else:
        low, high = self.input_range
        self.input_min_data = np.random.uniform(
            low, high, size=self.input_min_data
        ).astype(self.input_data_type)
        self.input_max_data = np.random.uniform(
            low, high, size=self.input_max_data
        ).astype(self.input_data_type)

elif self.input_data_type and self.input_range:
    low, high = self.input_range
    self.input_min_data = np.random.uniform(
        low, high, size=self.min_input_shape
    ).astype(self.input_data_type)
    self.input_max_data = np.random.uniform(
        low, high, size=self.max_input_shape
    ).astype(self.input_data_type)
This code is a bit convoluted. You could set defaults for input_data_type and input_range up front; then the random-data generation wouldn't need to keep checking whether input_data_type and input_range exist.

Fixed.
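The reviewer's suggestion can be sketched as follows. This is a hypothetical helper (make_warmup_data is not part of the PR), shown only to illustrate resolving the defaults once before generating data:

```python
import numpy as np

def make_warmup_data(min_shape, max_shape, dtype=None, input_range=None):
    # Resolve defaults up front so the generation code below needs no
    # branching on whether the optional fields were provided.
    dtype = dtype or 'float32'
    low, high = input_range if input_range else (1, 10)
    if 'int' in dtype:
        gen = lambda shape: np.random.randint(low, high, size=shape).astype(dtype)
    else:
        gen = lambda shape: np.random.uniform(low, high, size=shape).astype(dtype)
    return gen(min_shape), gen(max_shape)
```

With the defaults settled first, both the min-shape and max-shape tensors come from one code path.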
python/paddle/tensorrt/predictor.py
Outdated
else:
    self.input_min_data = np.ones(self.min_input_shape).astype(
        'float32'
    )
    self.input_max_data = np.ones(self.max_input_shape).astype(
        'float32'
    )
Use random number generation consistently; this code doesn't seem to serve any purpose.

Fixed.
python/paddle/tensorrt/predictor.py
Outdated
@@ -0,0 +1,275 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved
Rename this file to export.py.

Renamed to export.py.
python/paddle/tensorrt/predictor.py
Outdated
input_min_data=None,
input_max_data=None,
input_optim_data=None,
Now that there is an input argument, are these still needed?

Deleted.
# Step1.1: get original results(for tests only)
pred_res = predict_program(
    program,
    {feed_name[0]: input_data_min_shape},
    fetch_var_list=output_var,
)

# # Step2: run warmup for collecting shape
warmup_shape_infer_v2(
    program,
    min_shape_feed={feed_name[0]: input_data_min_shape},
    max_shape_feed={feed_name[0]: input_data_min_shape},
    fetch_var_list=output_var,
)

# # Step3: run pir pass(including some fusion pass and trt_op_marker_pass)
# program_with_pir = run_pir_pass(program, partition_mode=False)

program_with_pir = run_pir_pass(program, partition_mode=True)

output_var = program_with_pir.list_vars()[-2]

trt_output_var = []

for op in program_with_pir.global_block().ops:
    if op.name() == "pd_op.fetch":
        for operand in op.operands():
            source = operand.source()
            trt_output_var.append(source)

# Step5: run TRTConverter(would lower group_op into tensorrt_engine_op)
converter = PaddleToTensorRTConverter(program_with_pir, scope)
converter.convert_program_to_trt()

# #executor.run
# output_converted=predict_program(
#     program_with_pir,
#     {feed_target_names[0]: input_data_min_shape},
#     trt_output_var
# )
Replace this logic with the newly wrapped interfaces.

None of these have been changed yet.

All fixed; the code above was committed while first getting things to run.
test/tensorrt/get_program.py
Outdated
def get_program(model_dir, prefix, use_pir=False):
    """
    Load a PaddlePaddle inference model with optional PIR API support.

    Args:
        model_dir (str): The directory where the model and parameters are stored.
        prefix (str): The prefix of the model files without file extension.
        use_pir (bool, optional): Flag to determine if PIR API should be used. Default is False.

    Returns:
        tuple: A tuple containing the loaded program, scope, feed target names, fetch targets, and model filename.
    """
    scope = paddle.static.global_scope()
    place = paddle.CUDAPlace(0)
    exe = static.Executor(place)

    # Check if we should use PIR API
    if use_pir:
        # Use PIR API context manager if required
        model_filename = os.path.join(model_dir, prefix + ".json")
        params_filename = os.path.join(model_dir, prefix + ".pdiparams")
    else:
        model_filename = os.path.join(model_dir, prefix + ".pdmodel")
        params_filename = os.path.join(model_dir, prefix + ".pdiparams")

    with paddle.pir_utils.IrGuard():
        # Load the model
        [program, feed_target_names, fetch_targets] = (
            paddle.static.io.load_inference_model(
                model_dir,
                executor=exe,
                model_filename=model_filename,
                params_filename=params_filename,
            )
        )
    return program, scope, feed_target_names, fetch_targets
Similar functionality already exists above; can this function be removed?

This function is needed for loading.

Can't you use the earlier get_trt_program?

Oh, right, I forgot.

get_program has been deleted; it was code committed during the bring-up stage.
python/paddle/tensorrt/predictor.py
Outdated
        op.set_bool_attr("__l_trt__", False)


def converter_trt_program(program, trt_config, scope):
Rename it to convert_to_trt.

Renamed to convert_to_trt.
python/paddle/tensorrt/converter.py
Outdated
if not source.initialized():
    _logger.warning(f"Skipping uninitialized source: {source}")
    continue
else:
With the continue in the if above, the else here is unnecessary.

done
python/paddle/tensorrt/export.py
Outdated
def forbid_op_lower_trt(self, program, disabled_ops):
    if isinstance(disabled_ops, str):
        disabled_ops = [disabled_ops]
    for op in program.global_block().ops:
        if op.name() in disabled_ops:
            op.set_bool_attr("__l_trt__", False)
This function isn't needed here, is it? util.py already has similar functionality; can't you just call that from converter_to_trt?

Kept it in util.py.
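The helper's behavior can be demonstrated with stand-in op objects. FakeOp below is a test double, not a Paddle class; it only assumes ops expose name() and set_bool_attr() as the snippet above does:

```python
class FakeOp:
    """Minimal stand-in for a PIR op, for illustration only."""
    def __init__(self, name):
        self._name = name
        self.attrs = {}
    def name(self):
        return self._name
    def set_bool_attr(self, key, value):
        self.attrs[key] = value

def forbid_op_lower_trt(ops, disabled_ops):
    # Accept a single op name or a list of names, then mark matches
    # so they are not lowered into the TensorRT engine.
    if isinstance(disabled_ops, str):
        disabled_ops = [disabled_ops]
    for op in ops:
        if op.name() in disabled_ops:
            op.set_bool_attr("__l_trt__", False)

ops = [FakeOp("pd_op.matmul"), FakeOp("pd_op.relu")]
forbid_op_lower_trt(ops, "pd_op.relu")
```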
python/paddle/tensorrt/export.py
Outdated
# return an optimized program with pd_op.tensorrt_engine operations.
def converter_to_trt(program, trt_config, scope):
Rename it; how about convert_to_trt? converter is a noun, after all.

done
python/paddle/tensorrt/util.py
Outdated
if fetch_var_list is None:
    for _ in range(1):
        output_original = executor.run(
            program, feed=min_shape_feed, fetch_list=[output_var]
        )

    # Run the program with input_data_max_shape (fake max_shape input)
    for _ in range(1):
        executor.run(
            program, feed=max_shape_feed, fetch_list=[output_var]
        )
else:
    for _ in range(1):
        output_original = executor.run(
            program, feed=min_shape_feed, fetch_list=fetch_var_list
        )

    # Run the program with input_data_max_shape (fake max_shape input)
    for _ in range(1):
        executor.run(
            program, feed=max_shape_feed, fetch_list=fetch_var_list
        )
This code can be simplified, right? Just put output_var into fetch_var_list; there's no need for so many executor.run calls.

done
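A minimal sketch of the simplification the reviewer suggests, with FakeExecutor standing in for paddle.static.Executor so the control flow is self-contained (both class and function here are illustrative, not the PR's actual code):

```python
class FakeExecutor:
    """Records run calls; stands in for paddle.static.Executor."""
    def __init__(self):
        self.calls = []
    def run(self, program, feed, fetch_list):
        self.calls.append((feed, tuple(fetch_list)))
        return [f"out:{name}" for name in fetch_list]

def warmup_shape_infer(executor, program, min_shape_feed, max_shape_feed,
                       output_var, fetch_var_list=None):
    # Choose the fetch list once; both branches collapse into one path,
    # and each feed needs only a single run call.
    fetch_list = fetch_var_list if fetch_var_list else [output_var]
    output_original = executor.run(program, feed=min_shape_feed,
                                   fetch_list=fetch_list)
    executor.run(program, feed=max_shape_feed, fetch_list=fetch_list)
    return output_original

exe = FakeExecutor()
res = warmup_shape_infer(exe, None, {"x": 1}, {"x": 2}, "y")
```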
model_dir = self.save_path
# Obtain tensorrt_engine_op by passing the model path and trt_config.(converted_program)
program_with_trt = export_loaded_model(model_dir, trt_config)
Please add another unit test for the export interface; the network doesn't need to be complex. Also, rename this unit test to test_export.py.

done
python/paddle/tensorrt/export.py
Outdated
else:
    for param_or_buffer in concrete_program.parameters:
        if param_or_buffer.type == core.VarDesc.VarType.VOCAB:
            scr_tensor = param_or_buffer.value().get_map_tensor()
            tgt_var = scope.var(param_or_buffer.name)
            tgt_var.set_vocab(scr_tensor)
        else:
            # share to scope
            param_or_buffer_tensor = scope.var(
                param_or_buffer.name
            ).get_tensor()
            src_tensor = (
                state_var_dict[param_or_buffer.name]
                .value()
                .get_tensor()
            )
            param_or_buffer_tensor._share_data_with(src_tensor)
        # record var info
        if param_or_buffer.name not in extra_var_info:
            extra_info_dict = {}
            if param_or_buffer.name in state_names_dict:
                extra_info_dict['structured_name'] = (
                    state_names_dict[param_or_buffer.name]
                )
            extra_info_dict['stop_gradient'] = (
                param_or_buffer.stop_gradient
            )
            if isinstance(param_or_buffer, EagerParamBase):
                extra_info_dict['trainable'] = (
                    param_or_buffer.trainable
                )
            extra_var_info[param_or_buffer.name] = extra_info_dict
This code is unreachable, isn't it? Our interface runs only in PIR mode.

Indeed; deleted.
python/paddle/tensorrt/export.py
Outdated
# Obtain a program with tensorrt_op for dynamic-to-static scenarios.
def export(funtion=None, input_spec=None, config=None, **kwargs):
Suggested change:
- def export(funtion=None, input_spec=None, config=None, **kwargs):
+ def export(function=None, input_spec=None, config=None, **kwargs):

Fixed.
python/paddle/tensorrt/export.py
Outdated
    extra_var_info[param_or_buffer.name] = extra_info_dict

with paddle.pir_utils.IrGuard():
    main_program = static_net.forward.main_program
If functions are supported, then this is problematic: you should use the previously constructed concrete_program.main_program rather than static_net.forward.main_program.

Fixed.
python/paddle/tensorrt/export.py
Outdated
combine_vars = {}
combine_program = []
Neither of these appears to be used.

Deleted.
python/paddle/tensorrt/export.py
Outdated
optim_input_shape : tuple, optional
    The shape of the optimal input tensor (default is None).
input_data_type : str, optional
    The data type for the input tensors, such as 'float32' or 'int64' (default is None).
Are only float32 and int64 currently supported? What about int32 and bool? From the code below, the default is float32, so documenting the default as None here is imprecise.

Both int and float types are supported; bool is not. I'll update it.
place = paddle.CUDAPlace(0)
exe = paddle.static.Executor(place)

paddle.static.save_inference_model(
The unified static-graph save interface is now paddle.jit.save; is there any problem with using it?

save_inference_model can save a program directly, whereas paddle.jit.save takes a Layer as its argument.
test/tensorrt/test_convert.py
Outdated
    return out


class TestConvert_loaded_model(unittest.TestCase):
Make the class name follow the convention: use CamelCase.

done
test/tensorrt/test_convert.py
Outdated
@@ -0,0 +1,173 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
How about naming the file test_export after all? The interfaces we test are all defined in export.py, and it also distinguishes this test from the other unit tests.

done
trt_output_var = []

for op in program_with_pir.global_block().ops:
    if op.name() == "pd_op.fetch":
        for operand in op.operands():
            source = operand.source()
            trt_output_var.append(source)
This code has a logic problem; it must be moved after converter.convert_program_to_trt().

done
python/paddle/tensorrt/export.py
Outdated
inputs: list,
min_subgraph_size: int | None = 3,
save_model_dir: str | None = None,
disable_ops: str | None = None,
disable_ops should be a list, shouldn't it?

It can be a str or a list: a single str is converted to a list, and a list is kept as-is.
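The str-or-list behavior described in the reply can be sketched as a small normalization helper (the name normalize_disable_ops is hypothetical, not from the PR):

```python
def normalize_disable_ops(disable_ops):
    # None means nothing is disabled; a bare op name is wrapped in a
    # list; an existing sequence of names is kept as a list.
    if disable_ops is None:
        return []
    if isinstance(disable_ops, str):
        return [disable_ops]
    return list(disable_ops)
```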
python/paddle/tensorrt/export.py
Outdated
disable_ops : str, optional
    A string representing the names of operations that should not be entering by TensorRT (default is None).
Update the docstring as well.

done
PR Category
Inference
PR Types
Others
Description
card-71500
This PR implements:
1. The Paddle TensorRT user-facing API, consisting of four interfaces: paddle.tensorrt.Input, paddle.tensorrt.TensorRTConfig, paddle.tensorrt.export, and paddle.tensorrt.export_loaded_model.
2. Running Paddle-TensorRT under the new IR using the predictor. This fixed the data_layout issue of tensorrt_op, and fixed a bug where paddle.static.save_inference_model failed when calling get_pir_parameters because vars in the program lack an is_parameter attribute.
3. A new local_var_names method on Scope in pybind, which lists the names of the vars in a scope for easier debugging.
4. A fix for the dynamic-to-static interface, where start_program could not be obtained, causing the executor run inside warmup_shape_infer to report that a var could not be found in the scope.
This PR provides four user interfaces:
1. paddle.tensorrt.Input: describes a user input.
Parameters:
min_input_shape: the minimum input shape
max_input_shape: the maximum input shape
optim_input_shape: the optimal input shape
input_data_type: the input data type, such as float32 or int32
input_range: the input value range, such as (0, 1)
2. paddle.tensorrt.TensorRTConfig:
Parameters:
inputs: a list of inputs, to accommodate models with multiple inputs
min_subgraph_size: the minimum number of Paddle ops a subgraph must contain to be lowered to a TensorRT op
save_model_dir: the path for saving the program containing TensorRT ops as JSON
disable_ops: the names of ops forbidden from being lowered to TRT
3. paddle.tensorrt.export: for dynamic-to-static scenarios. This interface converts a dynamic-graph model to a static graph and returns an optimized program containing TensorRT ops.
4. paddle.tensorrt.export_loaded_model: for directly loading a saved model (pdmodel or JSON) and obtaining an optimized program containing TensorRT ops.
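To illustrate the intended call pattern, here is a hedged, self-contained sketch of the two configuration objects as plain dataclasses. These are simplified stand-ins written for illustration only, not the actual paddle.tensorrt implementations:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Input:
    # Shape bounds drive TensorRT's dynamic-shape optimization profile.
    min_input_shape: tuple
    max_input_shape: tuple
    optim_input_shape: tuple | None = None
    input_data_type: str = 'float32'       # e.g. 'float32', 'int32'
    input_range: tuple | None = None       # e.g. (0, 1)

@dataclass
class TensorRTConfig:
    inputs: list                           # one Input per model input
    min_subgraph_size: int = 3             # smallest subgraph lowered to TRT
    save_model_dir: str | None = None      # where to save the JSON program
    disable_ops: list = field(default_factory=list)  # ops kept out of TRT

# Example: a single image input with batch size varying from 1 to 8.
cfg = TensorRTConfig(
    inputs=[Input(min_input_shape=(1, 3, 224, 224),
                  max_input_shape=(8, 3, 224, 224))],
)
```

A config like this would then be passed to export (dynamic-to-static) or export_loaded_model (saved pdmodel/JSON), per the descriptions above.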