CARVIEW |
Select Language
HTTP/2 200
date: Wed, 23 Jul 2025 20:06:26 GMT
content-type: text/html; charset=utf-8
cache-control: no-cache
content-security-policy: default-src 'none'; base-uri 'self'; child-src github.githubassets.com github.com/assets-cdn/worker/ github.com/assets/ gist.github.com/assets-cdn/worker/; connect-src 'self' uploads.github.com www.githubstatus.com collector.github.com raw.githubusercontent.com api.github.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com *.rel.tunnels.api.visualstudio.com wss://*.rel.tunnels.api.visualstudio.com objects-origin.githubusercontent.com copilot-proxy.githubusercontent.com proxy.individual.githubcopilot.com proxy.business.githubcopilot.com proxy.enterprise.githubcopilot.com *.actions.githubusercontent.com wss://*.actions.githubusercontent.com productionresultssa0.blob.core.windows.net/ productionresultssa1.blob.core.windows.net/ productionresultssa2.blob.core.windows.net/ productionresultssa3.blob.core.windows.net/ productionresultssa4.blob.core.windows.net/ productionresultssa5.blob.core.windows.net/ productionresultssa6.blob.core.windows.net/ productionresultssa7.blob.core.windows.net/ productionresultssa8.blob.core.windows.net/ productionresultssa9.blob.core.windows.net/ productionresultssa10.blob.core.windows.net/ productionresultssa11.blob.core.windows.net/ productionresultssa12.blob.core.windows.net/ productionresultssa13.blob.core.windows.net/ productionresultssa14.blob.core.windows.net/ productionresultssa15.blob.core.windows.net/ productionresultssa16.blob.core.windows.net/ productionresultssa17.blob.core.windows.net/ productionresultssa18.blob.core.windows.net/ productionresultssa19.blob.core.windows.net/ github-production-repository-image-32fea6.s3.amazonaws.com github-production-release-asset-2e65be.s3.amazonaws.com insights.github.com wss://alive.github.com api.githubcopilot.com api.individual.githubcopilot.com api.business.githubcopilot.com api.enterprise.githubcopilot.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com copilot-workspace.githubnext.com objects-origin.githubusercontent.com; frame-ancestors 'none'; frame-src viewscreen.githubusercontent.com notebooks.githubusercontent.com; img-src 'self' data: blob: github.githubassets.com media.githubusercontent.com camo.githubusercontent.com identicons.github.com avatars.githubusercontent.com private-avatars.githubusercontent.com github-cloud.s3.amazonaws.com objects.githubusercontent.com release-assets.githubusercontent.com secured-user-images.githubusercontent.com/ user-images.githubusercontent.com/ private-user-images.githubusercontent.com opengraph.githubassets.com copilotprodattachments.blob.core.windows.net/github-production-copilot-attachments/ github-production-user-asset-6210df.s3.amazonaws.com customer-stories-feed.github.com spotlights-feed.github.com objects-origin.githubusercontent.com *.githubusercontent.com; manifest-src 'self'; media-src github.com user-images.githubusercontent.com/ secured-user-images.githubusercontent.com/ private-user-images.githubusercontent.com github-production-user-asset-6210df.s3.amazonaws.com gist.github.com; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com; upgrade-insecure-requests; worker-src github.githubassets.com github.com/assets-cdn/worker/ github.com/assets/ gist.github.com/assets-cdn/worker/
referrer-policy: no-referrer-when-downgrade
server-timing: pull_request_layout-fragment;desc="pull_request_layout fragment";dur=476.834646,conversation_content-fragment;desc="conversation_content fragment";dur=735.884856,conversation_sidebar-fragment;desc="conversation_sidebar fragment";dur=2438.015309,nginx;desc="NGINX";dur=1.052118,glb;desc="GLB";dur=100.843055
strict-transport-security: max-age=31536000; includeSubdomains; preload
vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, X-Requested-With,Accept-Encoding, Accept, X-Requested-With
x-content-type-options: nosniff
x-frame-options: deny
x-voltron-version: fd8fbbc
x-xss-protection: 0
server: github.com
content-encoding: gzip
accept-ranges: bytes
set-cookie: _gh_sess=paMl5faQr%2BivNDGQcCZH4pTz%2BFH%2BFIp2p8EwgqIotOJ2aNEvIBUX8KexwBWh6xmdkBLNEARAtn83sEOOELe8hS3ZTaQIZOz4h8iH1hfHKLh%2FDYimwcaWnOw8bz8ovO7zcorFyHewWbQqExN%2FisnK4PTgGWpXRsUj5nZgj9Vg9W3eyALtKf%2BLRpyXonoxKotJoP5nOA4qpk9SWU5Z9pp1IKQ3cUw4cdwciA46DzybUNl%2BHpK5uqD70%2F9NuEbr6R3E0B258%2B01yQyJZ86Tg%2Fywtw%3D%3D--QYxhMDEF%2BRx2JCFQ--VG5YD3Zo005%2BJ07JZVqDzA%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
set-cookie: _octo=GH1.1.471192220.1753301183; Path=/; Domain=github.com; Expires=Thu, 23 Jul 2026 20:06:23 GMT; Secure; SameSite=Lax
set-cookie: logged_in=no; Path=/; Domain=github.com; Expires=Thu, 23 Jul 2026 20:06:23 GMT; HttpOnly; Secure; SameSite=Lax
x-github-request-id: C126:3AEC1B:B208C:E415A:688140BF
`<regex>`: Simplify matching of `_N_if` NFA nodes in leftmost-longest mode by muellerj2 · Pull Request #5405 · microsoft/STL · GitHub
StephanTLavavej
added
enhancement
Something can be improved
regex
meow is a substring of homeowner
labels
Apr 14, 2025
Skip to content
Navigation Menu
{{ message }}
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
<regex>
: Simplify matching of _N_if
NFA nodes in leftmost-longest mode
#5405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
StephanTLavavej
merged 1 commit into
microsoft:main
from
muellerj2:regex-simplify-if-matching
Apr 22, 2025
Merged
<regex>
: Simplify matching of _N_if
NFA nodes in leftmost-longest mode
#5405
StephanTLavavej
merged 1 commit into
microsoft:main
from
muellerj2:regex-simplify-if-matching
Apr 22, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
StephanTLavavej
approved these changes
Apr 15, 2025
Thanks for the exceptional analysis and writeup - I wouldn't have noticed that this code was dead! 😻 Anything to mitigate stack consumption is also greatly appreciated. |
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
Thanks again for this simplification! 🧹 ✨ 🐱 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
You can’t perform that action at this time.
Since we settled on some reasonable semantics for leftmost-longest matching in #5218, I think we should remove the code for some other (abandoned) attempt to implement the leftmost-longest rule in
_Matcher::_Do_if
: An attempt to setTgt_state
to the leftmost-longest match found under this_N_if
node.This is neither necessary (in leftmost-longest mode, we take the final result from
_Res
, not_Tgt_state
) nor sufficient (it assigns one of the longest matches to_Tgt_state
, but if there are several of the same length it doesn't necessarily pick the correct one among these matches).On the other hand, this saves 96 bytes of stack space per call to
_Do_if
in debug mode, slightly alleviating the stack overflow issues (#997, #1528).Potentially leaving
_Tgt_state
in a garbage state is fine here, because none of the functions further up on the callstack rely on its value:_Match_pat
in the_N_if
case: Will just return immediately to its caller without changing_Tgt_state
._Do_rep0
: Can't be a caller of_Match_pat
because a repetition containing an_N_if
node anywhere is not simple._Do_assert
/_Do_neg_assert
: Can't be the callers of_Match_pat
because there are no lookahead assertions in the POSIX grammars that demand application of the leftmost-longest rule. (But even if there were lookahead assertions -- notwithstanding their currently unknown semantics -- it would be fine in the sense that the matcher wouldn't crash:_Do_assert
would reset the position pointer to a savepoint, while_Do_neg_assert
would fail, resulting in_Match_pat
returning immediately to its caller so that the remaining analysis here applies. Even so, there is the issue that_Do_assert
would not reset the capture groups in_Tgt_state
-- but we don't know what to set them to either as long as we don't know the semantics of such assertions. Nevertheless, all "valid" capture groups would still point to legal ranges in the input, so even the matcher with this PR's change wouldn't crash. This means we wouldn't have to worry about a newer parser emiting assertion nodes because running them with the old matcher would at worst produce wrong results. To get correct semantics, an updated parser and matcher are required, but this would also be the case if we didn't do this PR's change.)_Do_if
: Will either reset_Tgt_state
to some savepoint before calling_Match_pat
or immediately return to its caller and leave_Tgt_state
as-is._Do_rep
: Will either reset_Tgt_state
to some savepoint before doing the next_Match_pat
call or return to its caller while leaving_Tgt_state
as-is._Match_pat
in the_N_rep
or_N_end_rep
cases: Will just return immediately to its caller without changing_Tgt_state
._Match
: Will evaluate_Res
in leftmost-longest mode, not_Tgt_state
. (_Match
only evaluted_Res
before<regex>
: Fix depth-first and leftmost-longest matching rules #5218 as well, so this change also doesn't pose a problem if old and new functions are mixed.)So in all cases, either
Tgt_state
isn't evaluated anymore or it is reset to some savepoint before it is used again.