Vectorize `find_end`, make sure ASan passes by AlexGuteniev · Pull Request #5042 · microsoft/STL · GitHub
Merged
StephanTLavavej approved these changes on Oct 26, 2024
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
Thanks for figuring out how to fix this! 🛠️ 🧠 🪄
Re-try of #4943
Reverts #5041 and fixes the bug that triggered ASan.
🦠 The Bug
The overall structure
We use SSE4.2 checks that search for a needle of up to 16 bytes within a 16-byte block. Such a search can report a full match, or a match of only the beginning of the needle, which may turn out to be a match or a mismatch once the further bytes are examined.
There are two separate code branches: one for needles that fit within 16 bytes and one for longer needles. The bug is in the long needle branch.
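The difference between a full match and a beginning-only match can be illustrated with a scalar model. This is only a sketch, not the PR's code: `BlockMatch`, `BlockResult`, and `match_in_block` are hypothetical names, the loop stands in for the SSE4.2 string instruction, and scanning from the highest offset is a choice made here because `find_end` looks for the last occurrence.

```cpp
#include <cassert>
#include <cstddef>

// Outcome of searching for a needle inside one block of bytes.
enum class BlockMatch { None, Partial, Full };

struct BlockResult {
    BlockMatch kind;
    std::size_t offset; // meaningful only when kind != BlockMatch::None
};

// Hypothetical scalar model of a substring search within a single block.
// A candidate whose needle fits entirely inside the block is a Full match.
// A candidate that runs off the end of the block only has its beginning
// compared, so it is Partial and needs further confirmation.
inline BlockResult match_in_block(const unsigned char* block, std::size_t block_len,
                                  const unsigned char* needle, std::size_t needle_len) {
    assert(needle_len >= 1); // assumption of this sketch
    // Scan from the highest offset down to 0.
    for (std::size_t off = block_len; off-- > 0;) {
        const std::size_t avail = block_len - off;
        const std::size_t cmp_len = needle_len < avail ? needle_len : avail;
        bool equal = true;
        for (std::size_t i = 0; i < cmp_len; ++i) {
            if (block[off + i] != needle[i]) {
                equal = false;
                break;
            }
        }
        if (equal) {
            const BlockMatch kind = needle_len <= avail ? BlockMatch::Full : BlockMatch::Partial;
            return {kind, off};
        }
    }
    return {BlockMatch::None, 0};
}
```

With a 16-byte block ending in `ab` and the needle `abc`, this model reports a partial match at offset 14: whether it is a real match depends on the byte that follows the block.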
Short needle branch
In the branch for needles of 16 bytes or fewer, an SSE4.2 match is either confirmed, when the needle fits fully, or needs further confirmation, when only the beginning of the needle matched.
The search starts 16 bytes before the end, so that the first SSE4.2 instruction has data to work on.
The whole haystack is split into three parts:
Long needle branch
Any match needs further confirmation.
The search starts from the byte offset, counted from the end, at which the first match is possible. This offset is greater than 16 bytes.
The whole haystack is split into two parts:
`memcmp` the remaining. For a match starting at zero offset we can go directly to `memcmp`, as the first 16 bytes are already checked.
Comparing the first 16 bytes before going to `memcmp` is an optimization that checks those bytes faster. We already have the needle in a vector register, and we know that there are more than 16 bytes, so we can compare them faster and skip `memcmp` altogether if there's a mismatch.
Meet the bug
On the first iteration of the main part, a match may have a nonzero offset. In this case, either the 16-byte check or `memcmp` would read out of range.
💥 The Impact
💊 The Fix
There are two possible approaches:
The second approach is more appealing from a performance perspective, but it is harder to reason about. One of its difficulties is that skipping more than 16 bytes may produce an offset before the beginning of the haystack. In that case we would have to do one matching at the beginning offset and apply a proper mask to its result.
So, this fix implements the first approach. It uses `pxor`/`ptest` for matching: to match only the first offset we don't need `pcmp*str*` and can use faster instructions. If they confirm the match, it uses `memcmp` for the rest.
⏱️ The Fix Impact
The benchmark results are about the same, within the usual variation.
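For reference, the first-offset check described under The Fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `first16_equal` and `confirm_long_needle` are hypothetical names, the explicit range check is a safeguard added for this sketch rather than the PR's mechanism, and the intrinsics path is only compiled when the compiler defines `__SSE4_1__` (otherwise a plain `memcmp` fallback is used).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(__SSE4_1__)
#include <smmintrin.h> // SSE4.1 intrinsics, including _mm_testz_si128
#endif

// Hypothetical helper: compares exactly 16 bytes at a and b.
inline bool first16_equal(const unsigned char* a, const unsigned char* b) {
#if defined(__SSE4_1__)
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    const __m128i diff = _mm_xor_si128(va, vb); // pxor: all zeros when equal
    return _mm_testz_si128(diff, diff) != 0;    // ptest: 1 when diff is all zeros
#else
    return std::memcmp(a, b, 16) == 0; // portable fallback for this sketch
#endif
}

// Hypothetical helper: confirms a candidate match of a needle longer than
// 16 bytes at haystack + pos. The first 16 bytes are compared with the vector
// check, and memcmp handles the rest only if they are equal.
inline bool confirm_long_needle(const unsigned char* haystack, std::size_t haystack_len,
                                std::size_t pos,
                                const unsigned char* needle, std::size_t needle_len) {
    assert(needle_len > 16);
    // Safeguard of this sketch: never read past the end of the haystack.
    if (pos > haystack_len || haystack_len - pos < needle_len) {
        return false;
    }
    const unsigned char* candidate = haystack + pos;
    if (!first16_equal(candidate, needle)) {
        return false; // mismatch in the first 16 bytes, memcmp is skipped
    }
    return std::memcmp(candidate + 16, needle + 16, needle_len - 16) == 0;
}
```

Checking a single known offset needs only a byte-wise equality test, which is why the substring-search instructions are unnecessary at this step.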