[Paddle TensorRT] Paddle-TensorRT support fp16 and dynamic graph APIs into static graph support predictor #69597
Conversation
Your PR was submitted successfully. Thank you for contributing to the open source project!
python/paddle/tensorrt/converter.py
Outdated
@@ -93,6 +102,16 @@ def __init__(self, paddle_program, scope):
            # weights = trt.Weights(weight_array)
            param_dict.update({name: weight_array})
        self.param_dict = param_dict
        global force_fp32_ops
Avoid introducing a global object unless it's really necessary; this is a bit odd.
Done, changed it to a global singleton.
python/paddle/tensorrt/export.py
Outdated
@@ -150,6 +151,8 @@ def __init__(
        min_subgraph_size: int | None = 3,
        save_model_dir: str | None = None,
        disable_ops: str | list | None = None,
        tensorrt_precision_mode: str | None = "FP32",
The names could drop the tensorrt_ prefix.
done
python/paddle/tensorrt/export.py
Outdated
@@ -150,6 +151,8 @@ def __init__(
        min_subgraph_size: int | None = 3,
        save_model_dir: str | None = None,
        disable_ops: str | list | None = None,
        tensorrt_precision_mode: str | None = "FP32",
        tensorrt_ops_run_float: set | None = None,
Same as above.
This one keeps the naming in sync with the old IR.
python/paddle/tensorrt/export.py
Outdated
        - "INT32": 32-bit integer precision.
        - "BFP16": 16-bit Brain Floating Point precision. Only supported in TensorRT versions greater than 9.0.
        - "INT64": 64-bit integer precision. Only supported in TensorRT versions greater than 10.0.
    tensorrt_ops_run_float (set, optional):
disable_ops above is typed str | list; it would be better to keep this consistent.
Changed it to the str | list type.
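The str | list | None convention discussed above can be captured by a small normalization helper. This is a minimal sketch; the helper name `normalize_ops` is hypothetical and not part of the PR:

```python
def normalize_ops(ops):
    """Normalize a str | list | None ops argument into a list of op names."""
    if ops is None:
        return []
    if isinstance(ops, str):
        return [ops]
    if isinstance(ops, list):
        return list(ops)
    raise TypeError("ops should be a string, a list of strings, or None.")


print(normalize_ops("pd_op.matmul"))                      # ['pd_op.matmul']
print(normalize_ops(["pd_op.softmax", "pd_op.layer_norm"]))
print(normalize_ops(None))                                # []
```

Centralizing the normalization keeps every str | list parameter (disable_ops, ops_run_float) behaving identically.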
python/paddle/tensorrt/converter.py
Outdated
class PrecisionMode(Enum):
    FP32 = "FP32"
    FP16 = "FP16"
    BF16 = "BF16"
    INT8 = "INT8"

    @staticmethod
    def from_string(mode_str):
        mode_map = {
            "FP32": PrecisionMode.FP32,
            "FP16": PrecisionMode.FP16,
            "BF16": PrecisionMode.BF16,
            "INT8": PrecisionMode.INT8,
        }
        mode_upper = mode_str.upper()
        return mode_map.get(mode_upper, PrecisionMode.FP32)


class TensorRTConfigManager:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            cls._instance = super().__new__(cls, *args, **kwargs)
        return cls._instance

    def _init(self):
        self.force_fp32_ops = []

    def set_force_fp32_ops(self, ops):
        if ops is None:
            self.force_fp32_ops = []
        elif isinstance(ops, str):
            self.force_fp32_ops = [ops]
        elif isinstance(ops, list):
            self.force_fp32_ops = ops
        else:
            raise ValueError("Ops should be a string, list, or None.")

    def get_force_fp32_ops(self):
        return self.force_fp32_ops


# In TensorRT FP16 inference, this function sets the precision of specific
# operators to FP32, ensuring numerical accuracy for these operations.
def support_fp32_mix_precision(op_type, layer):
    trt_manager = TensorRTConfigManager()
    force_fp32_ops = trt_manager.get_force_fp32_ops()
    if op_type in force_fp32_ops:
        layer.reset_precision()
        layer.precision = trt.DataType.FLOAT
Put all of these into util.py; otherwise it's easy to end up with circular imports.
done
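For reference, the singleton pattern the author switched to can be sketched in isolation. This is a simplified, standalone version: state is initialized inside `__new__` (rather than via a separate `_init` that is never invoked), so it is guaranteed to run exactly once:

```python
class TensorRTConfigManager:
    """Process-wide singleton holding the op types forced to run in FP32."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.force_fp32_ops = []  # initialize state exactly once
        return cls._instance

    def set_force_fp32_ops(self, ops):
        # Accept str | list | None, mirroring the PR's normalization logic.
        if ops is None:
            self.force_fp32_ops = []
        elif isinstance(ops, str):
            self.force_fp32_ops = [ops]
        elif isinstance(ops, list):
            self.force_fp32_ops = ops
        else:
            raise ValueError("Ops should be a string, list, or None.")


a = TensorRTConfigManager()
b = TensorRTConfigManager()
a.set_force_fp32_ops("pd_op.softmax")
print(a is b)            # True: every construction returns the same instance
print(b.force_fp32_ops)  # ['pd_op.softmax']
```

Note the sketch drops the `*args, **kwargs` forwarding from the reviewed diff, since `object.__new__` can raise a TypeError on Python 3 when extra arguments are supplied.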
python/paddle/tensorrt/export.py
Outdated
    precision_mode (str, optional):
        Specifies the precision mode for TensorRT optimization. The options are:
        - "FP32": 32-bit floating point precision (default).
        - "FP16": 16-bit floating point precision.
        - "INT8": 8-bit integer precision.
        - "BFP16": 16-bit Brain Floating Point precision. Only supported in TensorRT versions greater than 9.0.
Why pass a string here? PrecisionMode already exists, so users could just pass the enum directly.
done
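The reviewer's point about passing the enum rather than a string can be illustrated with a standalone sketch. The helper `select_precision` is hypothetical, but it shows the key benefit: an invalid value fails loudly at the call site instead of silently falling back to FP32, which is what string parsing with a default did:

```python
from enum import Enum


class PrecisionMode(Enum):
    FP32 = "FP32"
    FP16 = "FP16"
    BF16 = "BF16"
    INT8 = "INT8"


def select_precision(mode):
    # Accepting the enum directly removes the need for from_string parsing;
    # a typo'd string is a TypeError here, not a silent FP32 fallback.
    if not isinstance(mode, PrecisionMode):
        raise TypeError(
            f"precision_mode must be a PrecisionMode, got {type(mode).__name__}"
        )
    return mode


print(select_precision(PrecisionMode.FP16).value)  # FP16
```

Call sites then read `config.precision_mode = PrecisionMode.FP16`, and misuse is caught by type checkers as well as at runtime.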
python/paddle/tensorrt/export.py
Outdated
     program_with_trt = convert_to_trt(main_program, config, scope)
     return program_with_trt, scope


 # Obtain a program with tensorrt_op by directly loading the model.
-def convert_loaded_model(model_dir, config):
+def convert(model_dir, config):
Also add this interface name to __init__.py so users can call it directly as paddle.tensorrt.convert; it would also be best to try paddle.tensorrt.convert in the unit test to verify it works.
Tested it this way in test_export.py; it works.
python/paddle/tensorrt/export.py
Outdated
@@ -150,6 +151,8 @@ def __init__(
        min_subgraph_size: int | None = 3,
        save_model_dir: str | None = None,
        disable_ops: str | list | None = None,
        precision_mode: str | None = "FP32",
Why is this still str?
Fixed.
python/paddle/tensorrt/export.py
Outdated
    disable_ops : (str|list, optional):
        A string representing the names of operations that should not be entering by TensorRT (default is None).
    precision_mode (str, optional):
Same as above; mind the details.
Fixed.
python/paddle/tensorrt/converter.py
Outdated
precision_mode = PrecisionMode.from_string(
    self.trt_config.precision_mode
)
Since PrecisionMode is now passed in directly, is from_string still necessary here?
done
python/paddle/tensorrt/util.py
Outdated
def from_string(mode_str):
    mode_map = {
        "FP32": PrecisionMode.FP32,
        "FP16": PrecisionMode.FP16,
        "BF16": PrecisionMode.BF16,
        "INT8": PrecisionMode.INT8,
    }
    mode_upper = mode_str.upper()
    if mode_upper not in mode_map:
        _logger.warning(
            f"Unsupported precision mode: '{mode_str}'. Defaulting to FP32."
        )
    return mode_map.get(mode_upper, PrecisionMode.FP32)
Is this still necessary?
done
PR Category
Inference
PR Types
New features
Description
card-71500
本pr支持了pir-trt float16精度推理,并支持了将某些op强制转换成float32精度推理,本pr支持了动转静场景下可以使用predictor.run,之前使用executor.run,并修改了接口,动转静场景下使用paddle.tensorrt.export.convert,直接加载json或者pdmodel模型使用paddle.tensorrt.export.convert