CARVIEW |
Select Language
HTTP/2 200
date: Wed, 23 Jul 2025 20:13:05 GMT
content-type: text/html; charset=utf-8
cache-control: no-cache
content-security-policy: default-src 'none'; base-uri 'self'; child-src github.githubassets.com github.com/assets-cdn/worker/ github.com/assets/ gist.github.com/assets-cdn/worker/; connect-src 'self' uploads.github.com www.githubstatus.com collector.github.com raw.githubusercontent.com api.github.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com *.rel.tunnels.api.visualstudio.com wss://*.rel.tunnels.api.visualstudio.com objects-origin.githubusercontent.com copilot-proxy.githubusercontent.com proxy.individual.githubcopilot.com proxy.business.githubcopilot.com proxy.enterprise.githubcopilot.com *.actions.githubusercontent.com wss://*.actions.githubusercontent.com productionresultssa0.blob.core.windows.net/ productionresultssa1.blob.core.windows.net/ productionresultssa2.blob.core.windows.net/ productionresultssa3.blob.core.windows.net/ productionresultssa4.blob.core.windows.net/ productionresultssa5.blob.core.windows.net/ productionresultssa6.blob.core.windows.net/ productionresultssa7.blob.core.windows.net/ productionresultssa8.blob.core.windows.net/ productionresultssa9.blob.core.windows.net/ productionresultssa10.blob.core.windows.net/ productionresultssa11.blob.core.windows.net/ productionresultssa12.blob.core.windows.net/ productionresultssa13.blob.core.windows.net/ productionresultssa14.blob.core.windows.net/ productionresultssa15.blob.core.windows.net/ productionresultssa16.blob.core.windows.net/ productionresultssa17.blob.core.windows.net/ productionresultssa18.blob.core.windows.net/ productionresultssa19.blob.core.windows.net/ github-production-repository-image-32fea6.s3.amazonaws.com github-production-release-asset-2e65be.s3.amazonaws.com insights.github.com wss://alive.github.com api.githubcopilot.com api.individual.githubcopilot.com api.business.githubcopilot.com api.enterprise.githubcopilot.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com copilot-workspace.githubnext.com objects-origin.githubusercontent.com; frame-ancestors 'none'; frame-src viewscreen.githubusercontent.com notebooks.githubusercontent.com; img-src 'self' data: blob: github.githubassets.com media.githubusercontent.com camo.githubusercontent.com identicons.github.com avatars.githubusercontent.com private-avatars.githubusercontent.com github-cloud.s3.amazonaws.com objects.githubusercontent.com release-assets.githubusercontent.com secured-user-images.githubusercontent.com/ user-images.githubusercontent.com/ private-user-images.githubusercontent.com opengraph.githubassets.com copilotprodattachments.blob.core.windows.net/github-production-copilot-attachments/ github-production-user-asset-6210df.s3.amazonaws.com customer-stories-feed.github.com spotlights-feed.github.com objects-origin.githubusercontent.com *.githubusercontent.com; manifest-src 'self'; media-src github.com user-images.githubusercontent.com/ secured-user-images.githubusercontent.com/ private-user-images.githubusercontent.com github-production-user-asset-6210df.s3.amazonaws.com gist.github.com; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com; upgrade-insecure-requests; worker-src github.githubassets.com github.com/assets-cdn/worker/ github.com/assets/ gist.github.com/assets-cdn/worker/
referrer-policy: no-referrer-when-downgrade
server-timing: pull_request_layout-fragment;desc="pull_request_layout fragment";dur=534.607955,conversation_content-fragment;desc="conversation_content fragment";dur=950.921478,conversation_sidebar-fragment;desc="conversation_sidebar fragment";dur=494.422218,nginx;desc="NGINX";dur=0.960339,glb;desc="GLB";dur=100.526337
strict-transport-security: max-age=31536000; includeSubdomains; preload
vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, X-Requested-With,Accept-Encoding, Accept, X-Requested-With
x-content-type-options: nosniff
x-frame-options: deny
x-voltron-version: fd8fbbc
x-xss-protection: 0
server: github.com
content-encoding: gzip
accept-ranges: bytes
set-cookie: _gh_sess=LJ%2Fpi6fxpSnNkOy%2BNN7bMprW9CJYI7WTJJsjJbH2iaOSIukCoWWBqkVzg%2F9zU1Hoqb9b319sNQznu47YlRZQ0uKIrkN6qrjHZUsCwK2GM%2Bcm3AAXfb2lwW6BXeBQwMfzQpUO6T7rNSCwUU0FGaOl44OIbxD%2FmrdoRuogrQhtADnXfRzqh1xooVFEBPlLVBqoQo3DwFqQitWphGt5feLksZ%2B%2FUV1V1rirdYh3mEnugPR4b5%2BC16T8SQNystA4uR%2Fp6fftqzdAFqga1D3Va9ZrTQ%3D%3D--oP11wlTuhRKgdney--cxlq6noRMuHZrBQ5d0hRIA%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
set-cookie: _octo=GH1.1.1394436269.1753301584; Path=/; Domain=github.com; Expires=Thu, 23 Jul 2026 20:13:04 GMT; Secure; SameSite=Lax
set-cookie: logged_in=no; Path=/; Domain=github.com; Expires=Thu, 23 Jul 2026 20:13:04 GMT; HttpOnly; Secure; SameSite=Lax
x-github-request-id: 9E64:2A88C5:B8436:EDC7D:6881424F
`<regex>`: Reject empty repetitions when required by regex grammars by muellerj2 · Pull Request #5494 · microsoft/STL · GitHub
muellerj2
changed the title
May 11, 2025
StephanTLavavej
added
bug
Something isn't working
regex
meow is a substring of homeowner
labels
May 11, 2025
Skip to content
Navigation Menu
{{ message }}
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
<regex>
: Reject empty repetitions when required by regex grammars
#5494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
StephanTLavavej
merged 6 commits into
microsoft:main
from
muellerj2:regex-fix-empty-repetitions
May 17, 2025
Merged
<regex>
: Reject empty repetitions when required by regex grammars
#5494
StephanTLavavej
merged 6 commits into
microsoft:main
from
muellerj2:regex-fix-empty-repetitions
May 17, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<regex>
: Reject to match empty repetitions when required by regex grammars<regex>
: Reject empty repetitions when required by regex grammars
Thanks! 😻 I expanded the test coverage slightly and fixed a couple of test bugs. |
StephanTLavavej
approved these changes
May 15, 2025
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
🪹 🪫 🔅 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
You can’t perform that action at this time.
Fixes #5490.
Standards
ECMA-262 15.10.2.5 "Term" specifies in the internal continuation of the internal helper function "RepeatMatcher" for production
Term :: Atom Quantifier
:Here, min is decremented for each successful repeat match until it reaches zero, so the match is supposed to fail if an additional repetition yields an empty match and the specified minimum number of repetititions have been reached.
Similary, POSIX says (here for basic regexes, but the standard contains similar language for extended regexes):
Note the subtle difference that POSIX allows an empty match if there is only one repetition.
Changes
_Matcher::_Do_rep()
The logic has been restructured:
_Progress
is true), we do another repetition. If the last match was empty, then we can speed this up by doing only one more match, because the repetitions are internally independent, so the difference can't be observed if we reorder when empty matches appear when repeating the loop. (This is the same change I already mentioned in<regex>
: Fix depth-first and leftmost-longest matching rules #5218 (comment), but I wasn't aware at that time that ECMAScript requires this change as well.)_Matched0
stays false)._Matcher::_Do_rep0()
The logic changes are similar, but one difference is exploited: Since the repeated subexpression must be branchless, all repetitions must either be non-empty or empty. Therefore, we only have to check at the first repetition if the match is non-empty, and we also don't have to try one last match to achieve minimum requirements, because the match will be empty again anyway.
However, the logic changes can't be observed with current NFAs, because capture groups currently can't appear in loops that the simple loop optimization is enabled for. This is because the parser generates unnecessary
_N_if
nodes.By applying the following diff, these unnecessary nodes are no longer generated and we can check that the tests still pass when the updated logic for
_Do_rep0()
is actually exerted:(I haven't proposed this change yet because I'm not sure yet what to do about its side effects and what is most appropriate for a non-recursive matcher, but I do want to make sure that this change doesn't become impossible due to some erroneous change in
_Do_rep0()
.)_Builder::_Add_rep2()
In this function, quantifiers
?
,??
,{0,1}
and{0,1}?
were not represented using_N_rep
nodes, but instead emitted as two branches of an_N_if
node. But this will accept an empty match, so if(a?)?
is matched to the empty string, the capture group will be matched to the empty string as well. However, it should actually be unmatched according to the ECMAScript rule.For this reason, this PR disables this alternative representation when the subexpression contains a capture group or a non-capturing group (for simplicity, because otherwise we would have to keep searching in the non-capturing group for capture groups).
_Parser::_Calculate_loop_simplicity()
Disabling the alternative representation would have the consequence that some loops would no longer be marked as simple. This is undesirable because it can make regexes containing them newly susceptible to stack overflow.
To avoid this, the loop simplicity calculation now also marks loops as simple if they are branchless and all outer loops can be repeated at most once (so in particular, if they were all generated by
?
quantifiers).Note that we must mark a loop is non-simple if some outer loop can be repeated at least twice (and contains a capture group somewhere) because
_Do_rep0()
does not reset the positions of capture groups when recursive_Match_pat()
calls return. If an outer loop were repeated twice, i.e., another repetition of the outer loop could happen in one of_Do_rep0()
's_Match_pat()
, the positions of capture groups in the outer loop would change. Since_Do_rep0()
does not reset the capture groups, the starting positions of any of these capture groups would be corrupted during the remaining processing in_Do_rep0()
.Miscellaneous
_Any_posix
because there was a need for it in_Do_rep()
and_Do_rep0()
now as well._Do_rep()
and_Do_rep0()
no longer reset_Tgt_state
when they didn't match. The correctness argument for these changes is the same as for the corresponding change to_Do_if()
in<regex>
: Simplify matching of_N_if
NFA nodes in leftmost-longest mode #5405.