[Prim/Comp] Fix place setting to avoid redundant H2D/D2H copy for backward decomposition by HydrogenSulfate · Pull Request #69479 · PaddlePaddle/Paddle · GitHub
HydrogenSulfate requested review from xiaoguoguo626807 and JiabinYang as code owners on November 18, 2024, 12:12
HydrogenSulfate changed the title from "[Prim/Comp] Fix place setting to avoid redundant H2D/D2H copy" to "[Prim/Comp] Fix place setting to avoid redundant H2D/D2H copy for backward decomposition" on Nov 19, 2024
Merged: HydrogenSulfate merged 4 commits into PaddlePaddle:develop from HydrogenSulfate:fix_prim_comp_place on Nov 19, 2024
Your PR was submitted successfully. Thank you for contributing to the open-source project!
zyfncg approved these changes on Nov 19, 2024
This was referenced Nov 25, 2024
github-merge-queue bot pushed a commit to deepmodeling/deepmd-kit that referenced this pull request on Dec 17, 2024
Summary of this PR:

1. upload DPA-1 related code
2. merge much develop code
3. add all eager composite operators except `softmax_grad`, `p_norm_grad`, `split_grad`, and `concat_grad` to the composite operator blacklist (<https://github.com/deepmodeling/deepmd-kit/pull/4414/files#diff-e678abb052b278f8a479f8d13b839a9ec0effd9923478a850bc13758f918e1e9R134-R148>) to significantly improve model execution speed (reducing the time taken from 100% more than PyTorch to about 10% to 15% more).

related PR: lanpa/tensorboardX#728

### Training curve:

### Accuracy test (left: paddle, right: torch):

Related optimization of Paddle framework:

- [x] PaddlePaddle/Paddle#69349
- [x] PaddlePaddle/Paddle#69333
- [x] PaddlePaddle/Paddle#69479
- [x] PaddlePaddle/Paddle#69515
- [x] PaddlePaddle/Paddle#69487
- [x] PaddlePaddle/Paddle#69661
- [x] PaddlePaddle/Paddle#69660
- [x] PaddlePaddle/Paddle#69596
- [x] PaddlePaddle/Paddle#69556

## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Introduced several new classes for molecular descriptors, including `DescrptDPA1`, `DescrptBlockSeAtten`, and `LayerNorm`, enhancing the modeling capabilities for molecular simulations.
  - Added new JSON configuration files for model parameters and multitask models related to water simulations.
  - Implemented new test classes for validating the functionality of the `DPAtomicModel` and various descriptor classes.
  - Added new test classes for evaluating denoising models, including `TestDenoiseModelDPA1` and `TestDenoiseModelDPA2`.
  - Enhanced the `ModelWrapper` class to clarify the handling of model parameters and state management.
- **Bug Fixes**
  - Improved internal logic for handling model state saving and loading, ensuring consistency in outputs.
- **Documentation**
  - Enhanced type hints and return annotations across various classes and methods for better clarity.
- **Tests**
  - Expanded the testing framework with new test cases for denoising models and descriptor functionalities, ensuring robust validation of features.
  - Activated previously skipped tests for energy models, improving test coverage.
  - Enhanced multitask training tests with new configuration handling and test classes.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
github-merge-queue bot pushed a commit to deepmodeling/deepmd-kit that referenced this pull request on Dec 25, 2024
Support DPA-2 in paddle backend. This PR will be updated after #4414 is merged.

### Training curve:

### Accuracy test (left: paddle, right: torch):

Related optimization of Paddle framework:

- [x] PaddlePaddle/Paddle#69349
- [x] PaddlePaddle/Paddle#69333
- [x] PaddlePaddle/Paddle#69479
- [x] PaddlePaddle/Paddle#69515
- [x] PaddlePaddle/Paddle#69487
- [x] PaddlePaddle/Paddle#69661
- [x] PaddlePaddle/Paddle#69660
- [x] PaddlePaddle/Paddle#69596
- [x] PaddlePaddle/Paddle#69556

## Summary by CodeRabbit

- **New Features**
  - Introduced new classes for molecular descriptors: `DescrptDPA2`, `DescrptBlockRepformers`, `DescrptSeTTebd`, and `DescrptBlockSeTTebd`.
  - Added new functions for tensor operations and descriptor management, enhancing the capabilities of the module.
  - Updated JSON configurations for multitask models to refine selection criteria and data paths.
- **Bug Fixes**
  - Improved error handling and parameter validation across various descriptor classes.
- **Documentation**
  - Enhanced test coverage for new descriptor functionalities and configurations.
- **Tests**
  - Added new test classes to validate the functionality of `DescrptDPA2` and multitask training scenarios.
  - Expanded test capabilities for descriptor classes based on installed dependencies.
  - Updated existing tests to support new configurations and functionalities.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
PR Category
Performance Optimization
PR Types
Bug fixes
Description
Pcard-75624
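Problem description
The existing dynamic-graph and static-graph composite operators make heavy use of primitive operators that carry `place` information, such as `full`, `full_scalar`, and `full_with_tensor`. The trailing `place` parameter of these operators determines the device on which the tensor is first created. However, none of the current composite operator call sites set the `place` argument explicitly, so all of these tensors are created on the CPU by default.
Paddle/paddle/fluid/eager/backward.cc
Lines 379 to 382 in d774689
As a result, during gradient accumulation in `backward.cc` (taking the dynamic graph as an example), the accumulation step checks which devices the source and target tensors are on and copies the target to the source's device. This repeatedly triggers H2D or D2H copies and slows down the backward pass.
Paddle/paddle/fluid/imperative/gradient_accumulator.cc
Lines 194 to 197 in 7aae194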
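The copy pattern described above can be illustrated with a toy model. This is a minimal sketch, not Paddle code: `Tensor`, `full`, `accumulate`, and the `copies` log are hypothetical stand-ins, and the `"cpu"` default only mirrors the behavior reported in this description.

```python
from dataclasses import dataclass

copies = []  # log of simulated cross-device copies (hypothetical bookkeeping)


@dataclass
class Tensor:
    value: float
    place: str  # "cpu" or "gpu"


def full(value, place="cpu"):
    """Toy creation op: falls back to the CPU when place is omitted."""
    return Tensor(value, place)


def accumulate(source, target):
    """Toy gradient accumulation: move target to source's device if they differ."""
    if target.place != source.place:
        copies.append("H2D" if target.place == "cpu" else "D2H")
        target = Tensor(target.value, source.place)
    return Tensor(source.value + target.value, source.place)


grad = Tensor(0.0, "gpu")
for _ in range(3):
    # place is omitted, so each partial gradient starts on the CPU
    grad = accumulate(grad, full(1.0))

print(grad, copies)
```

In this model every accumulation step pays for one host-to-device copy, which is the overhead the PR sets out to remove.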
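Solution
This PR fixes the relevant calls in all dynamic-graph and static-graph composite backward operators by setting `place` to match the input tensor, which avoids a large number of H2D/D2H copies. The upper timeline is after the fix and the lower timeline is before the fix; after the fix, the red copy operations are almost entirely gone.

The upper figure shows the H2D/D2H copy time before the fix and the lower figure shows the copy time after the fix; the previously abnormal copy time returns to normal after the fix.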
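The call-site change can be sketched in a few lines. This is an illustration under stated assumptions, not the actual Paddle diff: `Tensor`, `full`, `needs_copy`, and the two `grad_rule_*` functions are hypothetical names, and the real composite rules are written in C++.

```python
from collections import namedtuple

# Hypothetical stand-in for a framework tensor: a value plus its device.
Tensor = namedtuple("Tensor", ["value", "place"])


def full(value, place="cpu"):
    """Toy creation op: falls back to the CPU when place is omitted."""
    return Tensor(value, place)


def needs_copy(a, b):
    """True if combining a and b would require a cross-device copy."""
    return a.place != b.place


def grad_rule_before(out_grad):
    # Before the fix: place is omitted, so the constant is created on the CPU.
    return full(2.0)


def grad_rule_after(out_grad):
    # After the fix: the constant is created on the input tensor's device.
    return full(2.0, place=out_grad.place)


out_grad = Tensor(1.0, "gpu")
print(needs_copy(out_grad, grad_rule_before(out_grad)))
print(needs_copy(out_grad, grad_rule_after(out_grad)))
```

Forwarding the input's `place` keeps the helper tensor on the same device as the gradient it is combined with, so no copy is needed in this model.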

Case test results:
DeePMD-kit DPA2 (21s -> 18s, a 14% improvement)
ldc_2d (second-order differential equation + first-order backward, a 63% performance improvement)
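As a quick check of the DPA2 figure, the 14% matches the relative reduction in run time. The sketch below uses only the 21s and 18s timings reported above; the variable names are illustrative, and the throughput figure is derived here rather than stated in the PR.

```python
before_s, after_s = 21.0, 18.0  # timings reported for DeePMD-kit DPA2

# Fraction of the original run time that was saved.
time_reduction = (before_s - after_s) / before_s

# Equivalent gain in throughput (work done per second).
throughput_gain = before_s / after_s - 1

print(f"{time_reduction:.1%} {throughput_gain:.1%}")  # prints "14.3% 16.7%"
```

No raw timings are given for ldc_2d, so its 63% figure cannot be cross-checked the same way.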
TODO: fix the `place` issue in the static-graph forward composite `composite.h`; this will be addressed in the next PR.