Vectorize `rotate` even better by AlexGuteniev · Pull Request #5525 · microsoft/STL · GitHub
Merged
StephanTLavavej approved these changes on May 22, 2025:
Thanks! 😻 This was easier to review once I realized it was mostly copied/moved code, slightly modified. I pushed trivial changes.
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
When things spin faster, science happens faster! 🚀 ⏱️ 😹
Follow-up to #5502.
Reasons to consider a follow-up:
Weakness of the original approach
It deals well with extremes:
The worst case is when the rotation is small, but still too large to engage the small-rotation branch.
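To see why the middle ground is the weak spot, here is a small model. It assumes that the path outside the small-rotation branch behaves like a classic block-swap (Gries-Mills) rotation built only from pairwise `swap_ranges` steps; `block_swap_rotate` and `block_swap_writes` are hypothetical names, not STL internals. It counts element writes: a rotation by exactly half needs n writes, while a small rotation approaches 2n.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not STL code: rotates [first, last) so that *mid becomes
// the first element using only pairwise swap_ranges steps (the Gries-Mills
// block-swap scheme). Returns the number of element writes, counting each
// swapped pair as two writes.
template <class T>
std::size_t block_swap_rotate(T* first, T* mid, T* last) {
    std::size_t writes = 0;
    while (first != mid && mid != last) {
        const std::size_t left  = static_cast<std::size_t>(mid - first);
        const std::size_t right = static_cast<std::size_t>(last - mid);
        if (left <= right) {
            // [A | B0 rest] -> [B0 | A rest]: B0 is final, continue with [A | rest].
            std::swap_ranges(first, mid, mid);
            writes += 2 * left;
            first = mid;
            mid += left;
        } else {
            // [rest A1 | B] -> [rest B | A1]: A1 is final, continue with [rest | B].
            std::swap_ranges(mid, last, mid - right);
            writes += 2 * right;
            last = mid;
            mid -= right;
        }
    }
    return writes;
}

// Writes needed to rotate an n-byte array left by k under this model.
std::size_t block_swap_writes(std::size_t n, std::size_t k) {
    std::vector<unsigned char> v(n);
    return block_swap_rotate(v.data(), v.data() + k, v.data() + n);
}
```

For example, `block_swap_writes(12, 6)` returns 12 (a single swap of the two halves), `block_swap_writes(12, 4)` returns 16, and `block_swap_writes(12, 1)` returns 22, close to 2n.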
Mitigation approaches
Generally, we need a multi-range rotating swap to make fewer element assignments. From the original PR:
So, how can we get some improvement while avoiding unnecessary complication? Possible approaches:
- `swap_3_ranges`, `swap_4_ranges`, etc., but no more than two of them, as separate functions.
- `swap_N_ranges` using a single source and metaprogramming, picking the best one at runtime.
- `swap_N_ranges` that would work with a number of ranges that varies at runtime.
This makes me think that it would be good to:
`swap_N_ranges`
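The approaches listed above differ mainly in when the range count is fixed. Below is a scalar sketch, assuming a plain element-wise loop stands in for vectorized code; `swap_n_ranges_runtime`, `swap_n_ranges_fixed`, and `swap_n_ranges` are hypothetical names, and the rule a real rotate would use to choose the count is left open. Each position is carried through all ranges with one temporary, so k ranges cost k writes per position, while chaining k - 1 pairwise swaps costs 2(k - 1).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical multi-range rotating swap: for every index i, range j receives
// the old value of range j + 1, and the last range receives the old value of
// range 0. Each position of each range is written exactly once.

// Runtime-count variant: the number of ranges is only known at run time.
template <class T>
void swap_n_ranges_runtime(T* const* ranges, std::size_t count, std::size_t len) {
    if (count < 2) {
        return;
    }
    for (std::size_t i = 0; i != len; ++i) {
        T tmp = ranges[0][i];
        for (std::size_t j = 0; j + 1 != count; ++j) {
            ranges[j][i] = ranges[j + 1][i];
        }
        ranges[count - 1][i] = tmp;
    }
}

// Single-source variant: one template instantiated per fixed count N, so the
// compiler sees a constant trip count for the inner loop.
template <std::size_t N, class T>
void swap_n_ranges_fixed(const std::array<T*, N>& ranges, std::size_t len) {
    static_assert(N >= 2, "need at least two ranges");
    for (std::size_t i = 0; i != len; ++i) {
        T tmp = ranges[0][i];
        for (std::size_t j = 0; j + 1 != N; ++j) {
            ranges[j][i] = ranges[j + 1][i];
        }
        ranges[N - 1][i] = tmp;
    }
}

// Runtime choice between fixed instantiations, falling back to the
// runtime-count loop for counts that have no instantiation.
template <class T>
void swap_n_ranges(T* const* ranges, std::size_t count, std::size_t len) {
    switch (count) {
    case 2: swap_n_ranges_fixed<2, T>({ranges[0], ranges[1]}, len); break;
    case 3: swap_n_ranges_fixed<3, T>({ranges[0], ranges[1], ranges[2]}, len); break;
    case 4: swap_n_ranges_fixed<4, T>({ranges[0], ranges[1], ranges[2], ranges[3]}, len); break;
    default: swap_n_ranges_runtime(ranges, count, len); break;
    }
}
```

With three ranges of two `int`s over `{1, 2, 3, 4, 5, 6}`, `swap_n_ranges` leaves `{3, 4, 5, 6, 1, 2}`: the same result as two pairwise swaps, but with 6 writes instead of 8.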
The code changes
So I've tried `_Swap_3_ranges`. It resulted in a speedup of at most 1.40, and it fixed the slightly regressed cases. I think this indicates both that the approach is good enough to use, and that the gain is not large enough to justify trying something more complex.
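A scalar model of what a three-range swap buys inside a rotation, as a sketch under stated assumptions: `swap_3_ranges`, `rotate_3`, and `rotate_3_writes` are hypothetical names, the loop is element-wise rather than vectorized, and the actual interface and call sites of `_Swap_3_ranges` are not reproduced here. When at least two more blocks of the shorter side's size fit, one three-range cycle puts two blocks in their final place with 3 writes per position, instead of the 4 that two pairwise swaps would need.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar stand-in for a three-range swap (not the STL's
// _Swap_3_ranges): a <- b, b <- c, c <- old a, for len elements each.
template <class T>
void swap_3_ranges(T* a, T* b, T* c, std::size_t len) {
    for (std::size_t i = 0; i != len; ++i) {
        T tmp = a[i];
        a[i]  = b[i];
        b[i]  = c[i];
        c[i]  = tmp;
    }
}

// Rotates [first, last) so that *mid becomes the first element. Uses one
// three-range cycle (two blocks placed) when it fits, otherwise one pairwise
// swap_ranges step (one block placed). Returns the number of element writes.
template <class T>
std::size_t rotate_3(T* first, T* mid, T* last) {
    std::size_t writes = 0;
    while (first != mid && mid != last) {
        const std::size_t left  = static_cast<std::size_t>(mid - first);
        const std::size_t right = static_cast<std::size_t>(last - mid);
        if (left <= right) {
            if (right >= 2 * left) {
                // [A | B0 B1 rest] -> [B0 B1 | A rest]
                swap_3_ranges(first, mid, mid + left, left);
                writes += 3 * left;
                first += 2 * left;
                mid += 2 * left;
            } else {
                // [A | B0 rest] -> [B0 | A rest]
                std::swap_ranges(first, mid, mid);
                writes += 2 * left;
                first = mid;
                mid += left;
            }
        } else {
            if (left >= 2 * right) {
                // [rest A1 A2 | B] -> [rest B | A1 A2]
                swap_3_ranges(mid, mid - right, mid - 2 * right, right);
                writes += 3 * right;
                last -= 2 * right;
                mid -= 2 * right;
            } else {
                // [rest A1 | B] -> [rest B | A1]
                std::swap_ranges(mid, last, mid - right);
                writes += 2 * right;
                last = mid;
                mid -= right;
            }
        }
    }
    return writes;
}

// Writes needed to rotate an n-byte array left by k under this model.
std::size_t rotate_3_writes(std::size_t n, std::size_t k) {
    std::vector<unsigned char> v(n);
    return rotate_3(v.data(), v.data() + k, v.data() + n);
}
```

In this model, `rotate_3_writes(12, 1)` returns 17, whereas a rotation built only from pairwise block swaps needs 22 writes for the same input.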
I've moved `_Rotating` closer to `__std_swap_ranges_trivially_swappable_noalias` to make the similarity between it and `_Swap_3_ranges` more obvious.
Coverage
The tests lacked arrays long enough to exercise the range swapping properly. I've expanded the test to use more elements; to save run time, I did this only for one of the 8-bit element cases. The algorithm does not distinguish element sizes internally anyway.
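A sketch of that kind of coverage, assuming a naive copy-based rotation is an acceptable oracle: rotate an 8-bit array at every split point and compare. `rotate_reference` and `check_every_mid` are hypothetical helpers, and the array length needed to reach the range-swapping path is an assumption here, not a value taken from the STL test suite.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Naive oracle: builds the rotated sequence in a fresh buffer.
std::vector<unsigned char> rotate_reference(const std::vector<unsigned char>& v, std::size_t mid) {
    const auto split = v.begin() + static_cast<std::ptrdiff_t>(mid);
    std::vector<unsigned char> out(split, v.end());
    out.insert(out.end(), v.begin(), split);
    return out;
}

// Rotates a random n-byte array at every split point with rotate_under_test
// and compares against the oracle. Returns the number of mismatching splits.
template <class RotateFn>
std::size_t check_every_mid(RotateFn rotate_under_test, std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<unsigned char> input(n);
    for (auto& x : input) {
        x = static_cast<unsigned char>(dist(gen));
    }

    std::size_t mismatches = 0;
    for (std::size_t mid = 0; mid <= n; ++mid) {
        std::vector<unsigned char> actual = input;
        rotate_under_test(actual.data(), actual.data() + mid, actual.data() + n);
        if (actual != rotate_reference(input, mid)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Passing a lambda that calls `std::rotate` with n = 1000 should report zero mismatches; swapping in an experimental rotation is then a one-line change.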
Likewise for the benchmark: I've added just two examples of the case that had become worse.
Benchmark results
The Before #5502 / After #5502 numbers may vary slightly from those in the previous PR description, because I've rerun the benchmarks.