maximum minimum support 0-size by wanghuancoder · Pull Request #71829 · PaddlePaddle/Paddle · GitHub
wanghuancoder
requested review from
xiaoguoguo626807 and
JiabinYang
as code owners
March 24, 2025 03:01
Merged
wanghuancoder
merged 15 commits into
PaddlePaddle:develop
from
wanghuancoder:0_size_maximum
Mar 26, 2025
Your PR was submitted successfully. Thank you for contributing to this open-source project!
xiaoguoguo626807
approved these changes
Mar 26, 2025
XiaoguangHu01
approved these changes
Mar 26, 2025
LGTM
co63oc
pushed a commit
to co63oc/Paddle
that referenced
this pull request
Mar 27, 2025
* maximum minimum support 0-size
YqGe585
pushed a commit
to YqGe585/Paddle
that referenced
this pull request
May 7, 2025
* maximum minimum support 0-size
This was referenced Jun 9, 2025
PR Category
Execute Infrastructure
PR Types
New features
Description
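maximum, minimum, and mean now support 0-size Tensors.
Fixed the OpTest mechanism so that it supports 0-size tests.
This PR serves as a reference example for 0-size changes. The modification process is described below: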
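a. Search the Paddle code for def maximum. The core implementation of maximum calls maximum from _C_ops.
b. Search paddle/phi/ops/yaml for the _C_ops maximum. maximum uses two InferMeta functions: ElementwiseInferMeta and ElementwiseRawInferMeta.
c. Search the code for ElementwiseInferMeta and ElementwiseRawInferMeta and check whether their dims (shape) inference is correct. For maximum the inference is already correct, so no change is needed.
d. If you modify maximum's InferMeta function, you must also modify its symbolic shape inference function: search for maximum under paddle/fluid/pir/dialect/operator/interface/infer_symbolic_shape and modify the corresponding function (if a change is needed). For background on symbolic shape inference, see: 【快乐开源】CINN编译器符号推导扩量 #66444 (an open-source task on extending CINN compiler symbolic inference).
e. Search paddle/phi/kernels for maximum and find every Kernel that implements it. A Kernel usually has CPU, GPU, and XPU versions, and some Kernels also have OneDNN, Sparse, and SelectedRows versions. Be sure to find and change all of them.
f. For a single-output Kernel, the change generally looks like this:
g. dev_ctx.template Alloc allocates, for out, an Allocation whose device memory length is 0. Although a 0-size Tensor holds no device memory, it still carries place (device) information. In Paddle, an Allocation stores the Place and the memory block information, so an empty Allocation that has a Place must be allocated.
h. The reason for returning immediately: without the return, many Kernels crash because launching a CUDA kernel fails when numel is 0.
i. Note that some Kernels have multiple outputs. If only one output has numel 0, the remaining outputs still need to be computed. Read the Kernel and modify it precisely so that it still executes correctly.
j. For maximum, paddle/phi/kernels/legacy/cpu/elementwise_kernel.cc and paddle/phi/kernels/legacy/xpu/elementwise_kernel.cc were modified. No GPU Kernel was modified because no separate one was found: GPU and CPU share the implementation in paddle/phi/kernels/legacy/cpu/elementwise_kernel.cc. Because maximum and minimum use the same processing logic, both were modified together.
k. No OneDNN, Sparse, or SelectedRows Kernel implementations were found for maximum and minimum, so none were handled.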
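The forward steps above (broadcast shape inference, then an early return that still records the place) can be illustrated with a minimal, framework-free sketch. This is not Paddle code and not the change made in this PR: ToyTensor, broadcast_shape, and toy_maximum_kernel are hypothetical names, and the "allocation" is only a flag on a Python object.

```python
from dataclasses import dataclass, field
from math import prod


def broadcast_shape(x_shape, y_shape):
    """NumPy-style broadcast of two shapes; an extent of 0 is a valid dimension."""
    ndim = max(len(x_shape), len(y_shape))
    xs = (1,) * (ndim - len(x_shape)) + tuple(x_shape)
    ys = (1,) * (ndim - len(y_shape)) + tuple(y_shape)
    out = []
    for a, b in zip(xs, ys):
        if a == b or b == 1:
            out.append(a)
        elif a == 1:
            out.append(b)  # covers 1 vs 0: the result extent is 0
        else:
            raise ValueError(f"cannot broadcast {x_shape} with {y_shape}")
    return tuple(out)


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a tensor: shape, flat data, place, allocation flag."""
    shape: tuple
    data: list = field(default_factory=list)
    place: str = "cpu"
    allocated: bool = False

    @property
    def numel(self):
        return prod(self.shape)


def toy_maximum_kernel(x, y, place="cpu"):
    """Elementwise maximum with an early return for 0-size outputs."""
    out = ToyTensor(shape=broadcast_shape(x.shape, y.shape), place=place)
    if out.numel == 0:
        # Mimic "allocate an empty buffer that still knows its place",
        # then return before any compute would be launched.
        out.allocated = True
        return out
    if x.shape != y.shape:
        raise NotImplementedError("sketch only computes same-shape inputs")
    out.data = [max(a, b) for a, b in zip(x.data, y.data)]
    out.allocated = True
    return out


x = ToyTensor(shape=(3, 1), data=[1.0, 2.0, 3.0])
y = ToyTensor(shape=(3, 0))
out = toy_maximum_kernel(x, y, place="gpu:0")
print(out.shape, out.numel, out.place, out.allocated)
```

The point of the sketch is the ordering: the output shape is inferred first, and the 0-size check happens on the output before any elementwise work starts.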
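3. Backward fix:
a. Search paddle/phi/ops/yaml for maximum_grad. Its InferMeta function is GeneralBinaryGradInferMeta.
b. The error log suggested that the problem was most likely in backward shape inference, but analysis found nothing wrong with GeneralBinaryGradInferMeta. Debugging showed the real cause: when the forward pass requires Broadcast, MaximumGradKernel has to Reduce in the backward pass, and Reduce does not account for 0-size. The relevant logic is in ReduceWrapper in paddle/phi/kernels/gpu/elementwise_grad.h.
c. The Reduce logic does not need to change. It is enough to special-case 0-size in the GradKernel. maximum_grad was modified as follows; for other operators, use this code as a reference and adapt it to the mathematical meaning of that operator's backward gradient:
d. The CPU, GPU, and XPU Kernels were modified following the principles above. No OneDNN, Sparse, or SelectedRows Kernel implementations were found, so none were handled.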
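To make the "mathematical meaning" of the backward case concrete, here is a small sketch of one plausible reading, not the code from this PR. It assumes that a broadcast input's gradient is a sum-reduction of the output gradient over the broadcast axis, so reducing over an axis of extent 0 yields zeros. reduce_sum_last_axis and zero_size_binary_grad are hypothetical helper names.

```python
from math import prod


def reduce_sum_last_axis(rows):
    """Sum a 2-D nested list over its last axis; an empty row sums to 0.0."""
    return [float(sum(row)) for row in rows]


def zero_size_binary_grad(x_shape, y_shape, dout_shape):
    """Shortcut for a 0-size output gradient: skip the reduce entirely.

    Each input gradient takes its input's shape. A gradient with elements is
    filled with zeros (the value of a sum over an empty set); a gradient with
    no elements is simply empty. Returned as flat lists.
    """
    if prod(dout_shape) != 0:
        raise ValueError("this shortcut only applies when dout has no elements")
    dx = [0.0] * prod(x_shape)
    dy = [0.0] * prod(y_shape)
    return dx, dy


# Forward: x has shape (3, 1), y has shape (3, 0), so out and dout have shape (3, 0).
dout_rows = [[], [], []]                      # dout with shape (3, 0)
dx_by_reduce = reduce_sum_last_axis(dout_rows)  # what a 0-size-aware reduce would give
dx, dy = zero_size_binary_grad((3, 1), (3, 0), (3, 0))
print(dx_by_reduce, dx, dy)
```

The shortcut and the explicit reduction agree for this case, which is why special-casing the grad kernel can stand in for making the reduce itself 0-size aware.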
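a. test/legacy_test/test_maximum_op.py verifies the correctness of the backward maximum_grad.
b. test/legacy_test/test_elementwise_max_op.py verifies forward correctness and checks that 0-size is supported under Paddle's different mechanisms, such as dynamic graph, static graph, legacy static graph, CINN, and composite operators.
c. Local unit-test verification exposed some new problems, so test/legacy_test/op_test.py and python/paddle/incubate/autograd/composite_rules.py were modified (this kind of problem is uncommon and does not need much attention).
d. Local unit-test verification also showed that the maximum logic of the composite operators had to change, so paddle/fluid/prim/api/composite_backward/composite_backward_api.h and paddle/fluid/primitive/decomp_rule/decomp_vjp/details.h were modified.
e. During local unit-test verification, the OpTest base class used mean, which exposed that mean did not support 0-size, so a 0-size fix for mean was added. Two observations:
i. The forward handling for mean is more complex. Operators differ in mathematical meaning, so their handling differs, and each one has to be analyzed individually.
ii. Besides GPU, CPU, and XPU, mean also has a OneDNN Kernel, so all of them were modified together.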
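The remark that mean needs different forward handling can be seen in a tiny sketch. Unlike maximum, whose 0-size output is simply empty, a full reduction such as mean produces a scalar even when the input has no elements, so an early return is not enough and some value has to be written. The sketch below assumes the NaN convention (0 divided by 0), which is what NumPy's mean returns for an empty array; the source does not state which value this PR chose. toy_mean_all is a hypothetical name.

```python
import math


def toy_mean_all(values):
    """Mean over a flat list of floats; an empty input yields NaN by assumption."""
    if len(values) == 0:
        # The output is a scalar with one element, so it cannot be left empty.
        return float("nan")
    return sum(values) / len(values)


empty_result = toy_mean_all([])
regular_result = toy_mean_all([1.0, 2.0, 3.0])
print(math.isnan(empty_result), regular_result)
```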
In summary, the general workflow is:
Pcard-67164