[PHI] Optimize Gather kernel with vectorization by lshpku · Pull Request #72225 · PaddlePaddle/Paddle · GitHub
lshpku force-pushed the vectorize-gather-kernel branch (April 13, 2025 15:48)
lshpku force-pushed the vectorize-gather-kernel branch (April 14, 2025 12:03)
Merged
Your PR was submitted successfully. Thank you for contributing to this open-source project!
Force-push: 91cf113 to 236c493
Force-push: 236c493 to 5ad51a7
zyfncg approved these changes on Apr 14, 2025
YqGe585 pushed a commit to YqGe585/Paddle that referenced this pull request on May 7, 2025
PR Category
Performance Optimization
PR Types
Performance
Description
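Use vectorization to optimize the performance of GatherGPUKernel, and merge the two existing Gather implementations into one.
Note: the two original implementations handled the high-dimensional and low-dimensional cases separately. I found this split unnecessary and merged them into one, but kept both calling interfaces because some other kernels still depend on the deprecated interface.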
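To illustrate the idea of a single vectorized gather path, here is a minimal host-side C++ sketch. It is not the code from this PR: `ChooseVecSize` and `GatherSketch` are hypothetical names, the 16-byte maximum vector width is an assumption, and the flat loop only stands in for what a GPU kernel would distribute across threads. The input is viewed as `[outer_size, axis_size, inner_size]`, and whole vectors are copied whenever the inner dimension and the pointers allow it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not a Paddle API): pick the widest vector width, in
// elements, that divides the contiguous inner dimension and keeps both base
// pointers aligned. Assumes vector accesses of at most 16 bytes.
template <typename T>
int ChooseVecSize(const T* in, const T* out, int64_t inner_size) {
  for (int vec = static_cast<int>(16 / sizeof(T)); vec > 1; vec /= 2) {
    const std::uintptr_t bytes = vec * sizeof(T);
    if (inner_size % vec == 0 &&
        reinterpret_cast<std::uintptr_t>(in) % bytes == 0 &&
        reinterpret_cast<std::uintptr_t>(out) % bytes == 0) {
      return vec;
    }
  }
  return 1;  // scalar fallback for the non-vectorizable case
}

// CPU reference of a gather along one axis. The input is viewed as
// [outer_size, axis_size, inner_size]; the output has shape
// [outer_size, index.size(), inner_size]. T must be trivially copyable.
template <typename T>
std::vector<T> GatherSketch(const std::vector<T>& in,
                            const std::vector<int64_t>& index,
                            int64_t outer_size, int64_t axis_size,
                            int64_t inner_size) {
  assert(static_cast<int64_t>(in.size()) ==
         outer_size * axis_size * inner_size);
  const int64_t index_size = static_cast<int64_t>(index.size());
  std::vector<T> out(outer_size * index_size * inner_size);

  const int vec = ChooseVecSize(in.data(), out.data(), inner_size);
  const int64_t inner_vecs = inner_size / vec;
  const int64_t total_vecs = outer_size * index_size * inner_vecs;

  // One flat loop over output vectors, used for both low- and
  // high-dimensional inputs.
  for (int64_t i = 0; i < total_vecs; ++i) {
    const int64_t inner = i % inner_vecs;   // vector slot inside the row
    const int64_t row = i / inner_vecs;     // flattened (outer, idx_pos)
    const int64_t idx_pos = row % index_size;
    const int64_t outer = row / index_size;
    const int64_t src_row = index[idx_pos];
    assert(src_row >= 0 && src_row < axis_size);

    const T* src = in.data() +
                   (outer * axis_size + src_row) * inner_size + inner * vec;
    T* dst = out.data() + i * vec;
    std::memcpy(dst, src, vec * sizeof(T));  // copies `vec` elements at once
  }
  return out;
}
```

For example, calling `GatherSketch<float>(in, {2, 0}, 2, 3, 4)` on a 2x3x4 input selects rows 2 and 0 along the middle axis for each outer slice. Because the inner size of 4 is divisible by the float vector width, the sketch takes the vectorized path when the buffers are suitably aligned, and falls back to element-wise copies otherwise.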
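Performance test
Setup: A100, float16, with the length of index assumed equal to shape[axis]; times are in microseconds (us).
The test results show that this PR mainly delivers a large speedup in vectorizable cases. Non-vectorizable cases also improve slightly, because the index computation was optimized and the loop count was increased.
In addition, I ran shape-coverage tests on the order of a thousand shapes and checked float32 performance for some of them; no problems were found.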
Pcard-85711