[Dy2St] Cleanup no need buffer inputs in grad node by SigureMo · Pull Request #69043 · PaddlePaddle/Paddle · GitHub
Merged: SigureMo merged 3 commits into PaddlePaddle:develop from cattidea:dy2st/cleanup-no-need-buffer-inputs-in-grad-node on Oct 30, 2024
Your PR was submitted successfully. Thank you for contributing to the open-source project!
wanghuancoder approved these changes on Oct 30, 2024
LGTM, the description is excellent!
PR Category
Execute Infrastructure
PR Types
Bug fixes
Description
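In Dy2St, the grad node holds the forward inputs. For no need buffer Tensors, however, the backward pass needs only the meta, not the holder. So for these no need buffer Tensors, the Tensor held by the backward grad node is no longer the original Tensor but a copy that does not own a holder (the holder has been moved out).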
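As shown in the figure (which shows only the Tensors that are no need buffer in the backward pass; bar length represents lifetime),
x -> y
means x holds a reference to y, so y's lifetime is necessarily at least as long as x's. Because these Tensors are set into the backward GradNode before they have been released (their release is scheduled by the Python side, i.e. when the PyObject reference count drops to 0), they are not actually freed until the backward pass finishes.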
This PR therefore creates, for each of these no need buffer Tensors, a copied Tensor that holds a copied DenseTensor without a holder. This guarantees that the backward pass holds a Tensor that has meta but no holder.
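The lifetime argument above can be illustrated with a toy Python model. This is a minimal sketch, not Paddle's implementation: `Allocation`, `DenseTensor`, `GradNode` and `buffer_freed_after_python_release` are hypothetical stand-ins for the holder, the DenseTensor and the grad node, and CPython reference counting stands in for Python-side release scheduling. It compares a grad node that saves the original input with one that saves a meta-only copy.

```python
import weakref
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


class Allocation:
    """Hypothetical stand-in for the memory block owned by a tensor's holder."""

    def __init__(self, nbytes: int) -> None:
        self.data = bytearray(nbytes)


@dataclass
class DenseTensor:
    """Hypothetical tensor: meta (shape, dtype) plus an optional holder."""

    shape: Tuple[int, ...]
    dtype: str
    holder: Optional[Allocation]  # None means "meta only, no buffer"


class GradNode:
    """Hypothetical grad node that saves forward inputs for the backward pass."""

    def __init__(self, fwd_inputs: Dict[str, DenseTensor],
                 no_need_buffer: Set[str], meta_only_copy: bool) -> None:
        self.saved: Dict[str, DenseTensor] = {}
        for name, t in fwd_inputs.items():
            if meta_only_copy and name in no_need_buffer:
                # Save a copy with the same meta but no holder, so the grad
                # node does not extend the lifetime of the original buffer.
                self.saved[name] = DenseTensor(t.shape, t.dtype, holder=None)
            else:
                # Save the original tensor, which also keeps its buffer alive.
                self.saved[name] = t


def buffer_freed_after_python_release(meta_only_copy: bool) -> bool:
    """Return True if x's buffer is freed as soon as Python drops x."""
    x = DenseTensor((1024,), "float32", Allocation(4 * 1024))
    buf = weakref.ref(x.holder)  # observe the buffer without owning it
    node = GradNode({"x": x}, no_need_buffer={"x"}, meta_only_copy=meta_only_copy)
    del x  # the Python side drops its reference to the original tensor
    freed = buf() is None
    # The backward pass can still read the meta it needs.
    assert node.saved["x"].shape == (1024,)
    return freed


if __name__ == "__main__":
    # Grad node holds the original tensor: the buffer outlives the Python reference.
    print(buffer_freed_after_python_release(meta_only_copy=False))  # False
    # Grad node holds a meta-only copy: the buffer is freed immediately.
    print(buffer_freed_after_python_release(meta_only_copy=True))   # True
```

The meta-only copy keeps shape and dtype available to the backward computation while leaving the buffer's lifetime entirely to the Python side.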
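PT has the same problem but does not expose it: PT adds an extra cast on its inputs, so the no need buffer value is the input after the cast rather than the input itself, and values that are not inputs do not have this problem.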
GPU memory usage before and after the fix:
Max allocated is now clearly lower than PT (PT has not fixed this problem).
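Note, however, that because x is an input of the ad func, its release is scheduled by Python-side GC, and the ad func must not delete it on its own; otherwise later uses of x could break. As a result, under Dy2St an input is always garbage-collected only after the entire subgraph has finished executing, rather than being freed as soon as the OP that uses it completes, so peak GPU memory is somewhat higher than in dynamic graph mode. With SOT, which produces multiple subgraphs, some intermediate variables become subgraph inputs, making this problem more pronounced. For now, there is no good solution to it.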
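The peak-memory effect described above can be made concrete with a toy simulation. This is a hypothetical sketch, not how Paddle schedules memory: the `peak_memory` function, the op-list format and the sizes are illustrative assumptions, intermediates are freed right after their last use, and tensors saved for backward are ignored. Subgraph inputs are either freed after their last use (dynamic-graph-like) or kept alive until the whole subgraph finishes (Dy2St-like, where the Python side owns them).

```python
from typing import Dict, List, Tuple

# (op name, input names, output name, output size in MB); all values hypothetical.
Op = Tuple[str, List[str], str, int]


def peak_memory(inputs: Dict[str, int], ops: List[Op], free_inputs_early: bool) -> int:
    """Simulate one subgraph's forward execution and return peak live memory.

    Intermediates are freed after their last use in both modes. Subgraph
    inputs are freed after their last use only if free_inputs_early is True;
    otherwise they stay alive until the subgraph ends.
    """
    last_use: Dict[str, int] = {}
    for i, (_, ins, _, _) in enumerate(ops):
        for name in ins:
            last_use[name] = i

    live: Dict[str, int] = dict(inputs)
    peak = sum(live.values())
    for i, (_, ins, out, out_size) in enumerate(ops):
        live[out] = out_size  # allocate the op's output
        peak = max(peak, sum(live.values()))
        for name in set(ins):
            is_subgraph_input = name in inputs
            if last_use[name] == i and (free_inputs_early or not is_subgraph_input):
                live.pop(name, None)  # release after last use
    return peak


if __name__ == "__main__":
    inputs = {"x": 100}
    ops = [("op1", ["x"], "h", 100), ("op2", ["h"], "y", 100)]
    print(peak_memory(inputs, ops, free_inputs_early=True))   # 200
    print(peak_memory(inputs, ops, free_inputs_early=False))  # 300
```

In this toy model, the more tensors that count as subgraph inputs (as with SOT, where intermediates between subgraphs become inputs), the more memory is held until the subgraph ends.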
PCard-66972