API Improvement: fix paddle.median (usability improvement) #64444

NKNaN · 2024-05-20T03:52:19Z

PR Category

User Experience

PR Types

Bug fixes

Description

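Fix paddle.median's min branch, which did not support inputs of any type other than floating point.
Because paddle.topk does not support bool, neither the avg branch nor the min branch can support bool input unless bool is handled separately. Should support for bool be added?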
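
To make the two modes concrete, here is a minimal plain-Python sketch of how 'avg' and 'min' differ on an even-length input. It is not Paddle's implementation, and the helper name `median_1d` is hypothetical. 'avg' averages the two middle values and so naturally produces a float, while 'min' returns the lower middle value and can keep an integer input as an integer.

```python
def median_1d(values, mode="avg"):
    """Hypothetical helper: median of a 1-D list in 'avg' or 'min' mode."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    if mode not in ("avg", "min"):
        raise ValueError("mode must be 'avg' or 'min'")
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[(n - 1) // 2]  # lower of the two middle elements
    upper = ordered[n // 2]        # upper middle (same element when n is odd)
    if mode == "min":
        return lower               # stays an int for integer input
    return (lower + upper) / 2     # always a float


print(median_1d([4, 1, 3, 2], mode="avg"))  # 2.5
print(median_1d([4, 1, 3, 2], mode="min"))  # 2
```

This is why an integer input is a natural fit for the min branch: no averaging is involved, so no floating-point conversion is needed.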

paddle-bot · 2024-05-20T03:52:24Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

zhwesky2010 · 2024-05-21T12:02:52Z

python/paddle/tensor/stat.py

@@ -463,6 +463,28 @@ def median(x, axis=None, keepdim=False, mode='avg', name=None):
            >>> print(median_indices)
            Tensor(shape=[3], dtype=int64, place=Place(cpu), stop_gradient=True,
            [1, 1, 1])
+
+            >>> # cases containing nan values
+            >>> x = paddle.to_tensor(np.array([[1,2,3,float('nan')],[1,2,3,4],[float('nan'),1,2,3]])


Isn't this x a float64 Tensor?

Yes. The intent here is to add an example where the input contains nan.

zhwesky2010 · 2024-05-21T12:09:17Z

test/legacy_test/test_median.py

@@ -230,6 +289,47 @@ def test_index_odd_case(self):
        np.testing.assert_allclose(out.numpy(), [4.0, 14.0, 24.0])
        np.testing.assert_equal(index.numpy(), [4, 4, 4])

+    def test_nan(self):


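Isn't the case that fails to run the int32/int64 one? What does that have to do with nan? Also, I don't see int32/int64 being tested here.

int32/int64 are tested above, in test_median_static and test_median_dygraph.

int32/int64 failed because, when I added the min branch earlier, I placed it before the nan-handling code. Even when the input has no nan, an int32/int64 input errors out here: out_tensor is int32/int64, while the sum that follows is cast to float64.

The change now has the min and avg branches handle nan separately. avg keeps the previous logic: a float32 input gives a float32 output, and every other input gives a float64 output. For min, the cast dtype here is changed to x.dtype so that the input and output dtypes match, and if the index is also requested, the index gets the corresponding handling.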
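
The dtype rule described in the reply above can be summarized as a small lookup. This is a sketch only: `median_output_dtype` is a hypothetical helper that works on dtype names as strings, and it encodes just what the thread states, not Paddle's actual code.

```python
def median_output_dtype(input_dtype, mode):
    """Hypothetical helper: result dtype for each mode, as stated in the thread."""
    if mode == "min":
        # min returns an element of the input, so the dtype is preserved.
        return input_dtype
    if mode == "avg":
        # avg keeps float32 and uses float64 for every other input dtype.
        return "float32" if input_dtype == "float32" else "float64"
    raise ValueError("mode must be 'avg' or 'min'")


for dtype in ("int32", "int64", "float32", "float64"):
    print(dtype, median_output_dtype(dtype, "avg"), median_output_dtype(dtype, "min"))
```

With this rule an int32 input yields float64 in 'avg' mode and int32 in 'min' mode.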

zhwesky2010 · 2024-05-22T07:35:27Z

python/paddle/tensor/stat.py

@@ -521,6 +543,11 @@ def median(x, axis=None, keepdim=False, mode='avg', name=None):
                ),


The dtype setting on line 525 isn't really reasonable. Put it under the avg branch only, so that it doesn't affect the min branch.

zhwesky2010 · 2024-05-22T07:39:25Z

python/paddle/tensor/stat.py

@@ -538,12 +565,29 @@ def median(x, axis=None, keepdim=False, mode='avg', name=None):
                out_idx = paddle.slice(
                    idx, axes=[axis], starts=[kth], ends=[kth + 1]
                )
+        # if contain nan on axis, return nan for that axis
+        out_tensor = out_tensor + paddle.sum(


This final astype(x.dtype) isn't needed, is it?

paddle.sum returns int64 when its input is int32; the final astype(x.dtype) is there for that int32 case.
https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/sum_cn.html

zhwesky2010 · 2024-05-22T07:41:22Z

python/paddle/tensor/stat.py

+            axis=axis,
+            keepdim=True,
+        ).astype(x.dtype)
+        if need_idx:


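Let's not make big changes to the nan handling for now; keep the previous logic. The main goal is to adapt to the dtype effects.

OK. The if need_idx branch here handles the case where the input contains nan and the index is requested. Should it be removed? If so, would that mean no index is returned when the input contains nan? torch's median currently does return an index when the input contains nan. I didn't consider this case when I added the min branch, so I'd like to fill it in here.

Then go ahead and add it.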

zhwesky2010 · 2024-05-22T07:43:01Z

test/legacy_test/test_median.py

@@ -164,6 +182,40 @@ def test_median_exception(self):
        self.assertRaises(ValueError, paddle.median, x, 2, False, 'max')
        self.assertRaises(ValueError, paddle.median, paddle.to_tensor([]))

+    def test_nan(self):


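Please add a separate, dedicated test for int32/int64. There's no need to dig further into the nan issue for now; keeping the previous logic is fine.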

zhwesky2010 · 2024-05-23T04:10:40Z

The dtype on line 525 is set to fp32/fp64. That setting isn't really reasonable; put it under the avg branch only. Please change this.

zhwesky2010 · 2024-05-23T04:11:07Z

PR-CI-Static-Check reports an error in the example code; this needs to be fixed.

zhwesky2010 · 2024-05-23T04:28:21Z

python/paddle/tensor/stat.py

+            index_along_axis = paddle.argsort(
+                x_all_zero, axis=axis, stable=True
+            )
+            nan_index = paddle.sum(


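If there are multiple nans, taking paddle.sum will probably go wrong as well. With multiple nans, the calculation should use the coordinate of the first nan.

Changed it: with multiple nans, the coordinate of the first nan is used.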
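
The "first nan wins" behavior agreed on above can be illustrated in plain Python. This is a sketch under stated assumptions rather than the code in stat.py: the helper names are hypothetical, and it works on a single 1-D list in 'min' mode.

```python
import math


def first_nan_index(row):
    """Return the position of the first NaN in row, or None if there is none."""
    for i, value in enumerate(row):
        if math.isnan(value):
            return i
    return None


def median_min_with_index(row):
    """Hypothetical helper: lower median and its original index.

    If the row contains NaN, the result is NaN at the first NaN's position.
    """
    nan_at = first_nan_index(row)
    if nan_at is not None:
        return float("nan"), nan_at
    # sorted() is stable, so ties keep their original relative order.
    order = sorted(range(len(row)), key=lambda i: row[i])
    k = order[(len(row) - 1) // 2]
    return row[k], k


print(median_min_with_index([float("nan"), 1.0, float("nan"), 3.0]))  # (nan, 0)
print(median_min_with_index([3, 1, 2, 4]))                            # (2, 2)
```

In the first call the nans sit at positions 0 and 2. Summing those positions would give 2, which only happens to be a nan position here and in general is not the first one; returning the first position directly avoids that problem.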

NKNaN · 2024-05-24T05:49:06Z

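The dtype on line 525 is set to fp32/fp64. That setting isn't really reasonable; put it under the avg branch only. Please change this.

Fixed.

PR-CI-Static-Check reports an error in the example code; this needs to be fixed.

Fixed.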

zhwesky2010 · 2024-05-27T11:20:41Z

python/paddle/tensor/stat.py

+                index_along_axis * x_isnan, axis=axis, keepdim=True
+            )
+            nan_index_mask = paddle.sum(x_isnan, axis=axis, keepdim=True)
+            out_idx = (


This can be written more simply:

out_idx = out_idx * paddle.logical_not(nan_index_mask) + nan_index

zhwesky2010 · 2024-05-28T03:53:43Z

python/paddle/tensor/stat.py

+                index_along_axis * x_isnan, axis=axis, keepdim=True
+            )
+            nan_index_mask = paddle.sum(x_isnan, axis=axis, keepdim=True)
+            out_idx = out_idx * paddle.logical_not(nan_index_mask) + nan_index


out_idx = out_idx * paddle.logical_not(nan_index_mask).astype('int64') + nan_index
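
The masking arithmetic in this suggestion can be traced element by element. The plain-Python sketch below is illustrative only: `merge_indices` is a hypothetical name, the inputs are flat lists with one entry per reduced slice, and it assumes nan_index is 0 for slices that contain no nan. The explicit cast in the suggestion presumably ensures the multiplication runs on integers rather than booleans; `int(not count)` plays that role here.

```python
def merge_indices(out_idx, nan_count, nan_index):
    """Hypothetical helper: pick the nan position where a slice has nan,
    otherwise keep the regular median index."""
    merged = []
    for idx, count, nan_pos in zip(out_idx, nan_count, nan_index):
        keep = int(not count)  # 1 when the slice has no nan, else 0
        merged.append(idx * keep + nan_pos)
    return merged


# Slices 0 and 2 contain nan (counts 1 and 2); slice 1 does not.
print(merge_indices([2, 1, 3], [1, 0, 2], [3, 0, 0]))  # [3, 1, 0]
```

Slices with nan have their median index zeroed out and replaced by the nan position, while the slice without nan keeps its median index unchanged.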

zhwesky2010

LGTM

luotao1 · 2024-05-29T03:27:00Z

@NKNaN The corresponding Chinese documentation needs to be submitted.

paddle-bot bot added the contributor External developers label May 20, 2024

NKNaN force-pushed the median-fix branch from 3a481a6 to aa2d3aa Compare May 20, 2024 10:13

NKNaN added 4 commits May 20, 2024 21:52

fix median min dtype

a5c540e

fix median min dtype

e8e93cd

fix test

df58412

fix test

ac87e05

NKNaN force-pushed the median-fix branch from aa2d3aa to ac87e05 Compare May 20, 2024 13:53

NKNaN added 3 commits May 21, 2024 10:31

fix idx calculation branch

658bd7c

fix code example

24e9936

fix code example

2146178

zhwesky2010 reviewed May 21, 2024

zhwesky2010 reviewed May 22, 2024

zhwesky2010 mentioned this pull request May 22, 2024

[Not Merge] fix cast error of paddle.median #64489

Closed

zhwesky2010 reviewed May 23, 2024

update

0cb37fe

update docs

e72dda9

zhwesky2010 reviewed May 27, 2024

update

7d6c4e7

zhwesky2010 reviewed May 28, 2024

update

b533318

zhwesky2010 approved these changes May 28, 2024

luotao1 merged commit 1b91822 into PaddlePaddle:develop May 29, 2024

NKNaN mentioned this pull request May 29, 2024

API Improvement: fix paddle.median (usability improvement) PaddlePaddle/docs#6662

Merged
