【Hackathon 6th No.25】[Typing] Functional alignment and enhancement for paddle.histogram -part #63346
Conversation
Your PR has been submitted successfully. Thank you for contributing to the open-source project!
Regarding the compatibility of the weight parameter, do I need to follow that document and add …
@AndPuQing If the new parameter is appended at the end of the original API with a default value, and that default does not change the existing behavior, there is no compatibility issue and nothing else needs to be modified. So while making this change, you need to design for compatibility and make sure previously written code does not fail at runtime (syntax errors, changed results).
paddle/phi/api/yaml/ops.yaml
Outdated
@@ -1351,8 +1351,9 @@
   backward : heaviside_grad

 - op : histogram
-  args : (Tensor input, int64_t bins = 100, int min = 0, int max = 0)
+  args : (Tensor input, Tensor weight, int64_t bins = 100, int min = 0, int max = 0, bool density = false)
This weight parameter is not placed at the end; this is an incompatible way to modify the API.
python/paddle/tensor/linalg.py
Outdated
@@ -2190,21 +2190,26 @@ def bmm(x, y, name=None):
     return out


-def histogram(input, bins=100, min=0, max=0, name=None):
+def histogram(
The weight parameter should go at the end so the change stays backward compatible.
Suddenly inserting a parameter in the middle of an interface shifts the positions of all following parameters, which is an incompatible change.
For example, existing code may call the API positionally without keywords: paddle.histogram(inputs, 4, 0, 3). Inserting a parameter before the existing ones breaks that call immediately, while appending it at the end does not (see the sketch below).
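For illustration, a backward-compatible extension appends the new arguments after the existing ones so that every old call site keeps its meaning (a sketch only; the parameter names follow the PR, but the defaults shown here are assumptions, not the merged code):

# New arguments appended at the end, with defaults that preserve the old behavior.
def histogram(input, bins=100, min=0, max=0, weight=None, density=False, name=None):
    ...

# An old positional call such as paddle.histogram(inputs, 4, 0, 3) still binds
# bins=4, min=0, max=3 exactly as before.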
Yes, op_version.yaml needs to be modified as well.
Sorry to inform you that 4e8b5ac's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
python/paddle/tensor/linalg.py
Outdated
    else:
        helper = LayerHelper('histogram', **locals())
        check_variable_and_dtype(
            input, 'X', ['int32', 'int64', 'float32', 'float64'], 'histogram'
        )
        out = helper.create_variable_for_type_inference(VarDesc.VarType.INT64)

        if density or weight:
Should this be if weight: ?
Sorry to inform you that ef946d9's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
python/paddle/tensor/linalg.py
Outdated
@@ -2205,10 +2207,13 @@ def histogram(input, bins=100, min=0, max=0, name=None):
         bins (int, optional): number of histogram bins. Default: 100.
         min (int, optional): lower end of the range (inclusive). Default: 0.
         max (int, optional): upper end of the range (inclusive). Default: 0.
+        weight (Tensor, optional): If provided, it must have the same shape as input.
Besides documenting the shape of weight, the docstring also needs to describe what weight does.
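For example, the weight entry could describe its role along these lines (a suggested wording only, not the merged text):

#   weight (Tensor, optional): Weight for each value in the input tensor.
#       Must have the same shape as input. If provided, each value in input
#       contributes its associated weight to its bin instead of a count of 1.
#       Default: None.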
float gap = static_cast<float>(nbins) /
            static_cast<float>((output_max - output_min)) / *sum_data;
for (int64_t i = 0; i < nbins; i++) {
  out_data[i] *= gap;
Isn't the formula simply a division by sum_data? The way gap is written above makes the formula hard to read; please clean it up.
Here I divide by sum_data up front, so there is no extra division in every loop iteration.
Here I divide by sum_data up front, so there is no extra division in every loop iteration.
Why is it additionally multiplied by static_cast<float>(nbins) / static_cast<float>((output_max - output_min))? I don't quite follow the formula.
Because a continuous probability density function (PDF) has to satisfy

$\int_{-\infty}^{+\infty} f(x)\,dx = 1$

that is, the area enclosed by the density function and the x-axis is 1, where $f(x)$ is the density value.
This is different from the probability mass function (PMF) defined for discrete random variables:

$\sum_{x} p(x) = 1$

The difference between the two is that the PMF gives the probability of each point directly, while the density function has to be integrated over an interval to yield a probability.
So during the computation, after obtaining each bin's (weighted) count, it still has to be divided by the total sum and by the bin width (output_max - output_min) / nbins, which gives the code:
float* sum_data = sum.data<float>();
// bin width = (output_max - output_min) / nbins
float gap = static_cast<float>((output_max - output_min)) /
            static_cast<float>(nbins);
for (int64_t i = 0; i < nbins; i++) {
  out_data[i] /= (gap * *sum_data);
}
Since division is slower than multiplication, I rearranged it a bit:
float* sum_data = sum.data<float>();
// fold the division by the total sum and by the bin width into one multiplier
float gap = static_cast<float>(nbins) /
            static_cast<float>((output_max - output_min)) / *sum_data;
for (int64_t i = 0; i < nbins; i++) {
  out_data[i] *= gap;
}
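For a quick numerical illustration of this normalization (not part of the PR; NumPy is used here only as a reference, and the names are illustrative):

import numpy as np

x = np.random.rand(1000)
# with density=True each bin value equals count / (total * bin_width),
# so density * bin_width sums to 1
hist, edges = np.histogram(x, bins=10, range=(0.0, 1.0), density=True)
bin_width = (1.0 - 0.0) / 10
assert np.isclose((hist * bin_width).sum(), 1.0)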
Please revise it accordingly.
paddle/phi/infermeta/unary.cc
Outdated
@@ -2072,8 +2072,30 @@ void HashInferMeta(const MetaTensor& x,
   out->set_dtype(x.dtype());
 }

-void HistogramInferMeta(
-    const MetaTensor& input, int64_t bins, int min, int max, MetaTensor* out) {
+void HashInferMeta(const MetaTensor& x,
Is this change from the wrong commit? It looks like content from another commit got mixed in here.
    }
  }
  if (density) {
    DenseTensor sum = phi::Sum<float, Context>(
After summing all the elements of output to get sum and then dividing each element of output by sum, isn't the result already the probability density?
Has this output been checked against PyTorch's result?
LGTM
Let's merge it first; we will verify the results again later via PyTorch code conversion.
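A possible way to run that cross-check later (a sketch; it assumes the final Python signature appends weight=None and density=False, and it has not been run against the merged code):

import numpy as np
import paddle
import torch

x = np.random.rand(1000).astype("float32") * 3.0
w = np.random.rand(1000).astype("float32")

# assumed final signature: histogram(input, bins, min, max, weight, density, name)
p_hist = paddle.histogram(
    paddle.to_tensor(x), bins=10, min=0, max=3,
    weight=paddle.to_tensor(w), density=True,
)
t_hist, _ = torch.histogram(
    torch.from_numpy(x), bins=10, range=(0.0, 3.0),
    weight=torch.from_numpy(w), density=True,
)
np.testing.assert_allclose(p_hist.numpy(), t_hist.numpy(), rtol=1e-5)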
paddle/phi/infermeta/unary.h
Outdated
-void HistogramInferMeta(
-    const MetaTensor& input, int64_t bins, int min, int max, MetaTensor* out);
+void HistogramInferMeta(const MetaTensor& input,
+                        const MetaTensor& weight,
+                        int64_t bins,
+                        int min,
+                        int max,
+                        bool density,
+                        MetaTensor* out);
With the new tensor parameter this becomes a binary-type op, so the declaration needs to move to a different file.
python/paddle/tensor/linalg.py
Outdated
    min: float = 0,
    max: float = 0,
Are these two float or int in the end? Why are the docstring and the type hints inconsistent?
The implementation needs to be moved as well, right?
Please update the corresponding Chinese documentation.
LGTM
PR Category
User Experience
PR Types
New features
Description
Split PR for Hackathon task No. 25.