Enhance FusedROPE with MQA and GQA by MarioLulab · Pull Request #60979 · PaddlePaddle/Paddle · GitHub
MarioLulab changed the title from "Enhance MQA and GQA for FusedROPE" to "Enhance FusedROPE with MQA and GQA" on Jan 19, 2024
Merged: zhangting2020 merged 12 commits into PaddlePaddle:develop from MarioLulab:luqi/enhance_fused_rope on Jan 29, 2024
Conversation
Your PR was submitted successfully. Thank you for contributing to this open-source project!
Has this been verified on the llama-2 model?
This comment was marked as resolved.
zhangting2020
approved these changes
Jan 29, 2024
Galaxy1458
approved these changes
Jan 29, 2024
PR types
Others
PR changes
Others
Description
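Main work:
Support MQA and GQA at the operator level in fused_rotary_position_embedding. Specifically, if the num_heads of q differs from that of k and v, the MQA/GQA path is enabled and the CUDA kernel is invoked three separate times.
Update the unit tests for the fused_rotary_position_embedding operator and add unit tests for MQA and GQA.
On the current develop branch, the fp32 unit test for fused_rotary_position_embedding shows a small precision error: when num_heads >= 4 (for example q.shape=[2, 8, 4, 16], in the order batch_size, seqlen, num_heads, head_dim), the relative error is about 1.1436 * 1e-5, whereas the rtol for fp32 unit tests should normally be 1e-05. The current workaround is to change the input shape used by the TestFusedRotaryPositionEmbedding unit test under the GQA strategy so that the relative error stays within the 1e-05 threshold.
k and v are both optional. The following input scenarios are currently supported and verified as correct: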
Pcard-79849
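To illustrate the idea in the description, here is a minimal, dependency-free sketch of rotary position embedding applied to q, k, and v when their num_heads differ. It is not the operator's CUDA implementation: the helper names (rope_reference, fused_rope_reference, make_tensor), the base of 10000, and the half-split pairing of channels are assumptions chosen for illustration. The point it shows is that each tensor can be rotated independently, so the head counts do not need to match and k and v can be omitted.

```python
import math
import random


def rope_reference(x, base=10000.0):
    """Rotate x, laid out as [batch_size, seq_len, num_heads, head_dim].

    Assumption: channel i is paired with channel i + head_dim // 2.
    """
    out = []
    for batch in x:
        out_batch = []
        for pos, heads in enumerate(batch):
            out_heads = []
            for vec in heads:
                head_dim = len(vec)
                half = head_dim // 2
                rotated = [0.0] * head_dim
                for i in range(half):
                    # The angle depends only on position and channel index,
                    # never on how many heads the tensor has.
                    theta = pos * base ** (-2.0 * i / head_dim)
                    c, s = math.cos(theta), math.sin(theta)
                    a, b = vec[i], vec[i + half]
                    rotated[i] = a * c - b * s
                    rotated[i + half] = a * s + b * c
                out_heads.append(rotated)
            out_batch.append(out_heads)
        out.append(out_batch)
    return out


def fused_rope_reference(q, k=None, v=None):
    """Hypothetical stand-in: one independent pass per provided tensor."""
    return tuple(None if t is None else rope_reference(t) for t in (q, k, v))


def make_tensor(batch_size, seq_len, num_heads, head_dim, seed):
    """Build a random nested-list tensor with a fixed seed."""
    rng = random.Random(seed)
    return [[[[rng.uniform(-1.0, 1.0) for _ in range(head_dim)]
              for _ in range(num_heads)]
             for _ in range(seq_len)]
            for _ in range(batch_size)]


# GQA-style input: q has 4 heads, k and v share 2 heads.
q = make_tensor(2, 8, 4, 16, seed=0)
k = make_tensor(2, 8, 2, 16, seed=1)
v = make_tensor(2, 8, 2, 16, seed=2)
out_q, out_k, out_v = fused_rope_reference(q, k, v)

# MQA-style input: k has a single head and v is omitted.
mqa_q, mqa_k, mqa_v = fused_rope_reference(q, make_tensor(2, 8, 1, 16, seed=3))
```

Because the rotation is applied per head, the three passes here loosely mirror the three separate kernel invocations mentioned in the description; a real unit test would compare the operator's output against a reference like this within the chosen rtol.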