In icase and collate mode, characters are supposed to be passed through translate_nocase() and translate() of the traits class before comparing them. This PR makes sure we always apply these translations exactly once before any character or string comparison.
The test deliberately defines non-idempotent translation functions. They are very weird, but they make it easy to catch any repeated or unbalanced applications of these functions before character or string comparisons.
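To illustrate why non-idempotent translations are useful here, the following is a minimal sketch and not the code from this PR's test. The names `ascii_lower`, `weird_translate`, `match_once` and `match_unbalanced` are hypothetical. With an idempotent translation, applying it twice to one side goes unnoticed; with a non-idempotent one, the unbalanced application changes the comparison result.

```cpp
#include <cassert>

// Idempotent translation: applying it twice gives the same result as once.
constexpr char ascii_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical non-idempotent translation: shifts 'a'..'y' by one,
// so a second application shifts the character again.
constexpr char weird_translate(char c) {
    return (c >= 'a' && c < 'z') ? static_cast<char>(c + 1) : c;
}

// Intended behavior: translate pattern and input character exactly once each.
template <class Translate>
constexpr bool match_once(char pattern_ch, char input_ch, Translate tr) {
    return tr(pattern_ch) == tr(input_ch);
}

// Simulated bug: the pattern character is translated twice
// (for example, once while parsing and once more while matching).
template <class Translate>
constexpr bool match_unbalanced(char pattern_ch, char input_ch, Translate tr) {
    return tr(tr(pattern_ch)) == tr(input_ch);
}

inline void run_translation_checks() {
    // The idempotent translation hides the unbalanced application.
    assert(match_once('A', 'a', ascii_lower));
    assert(match_unbalanced('A', 'a', ascii_lower));

    // The non-idempotent translation exposes it.
    assert(match_once('c', 'c', weird_translate));
    assert(!match_unbalanced('c', 'c', weird_translate));
}
```

Calling `run_translation_checks()` exercises both cases: only the non-idempotent translation makes the simulated bug observable.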
This PR also replaces _Cmp_cs by equal_to. While this means for now that a static call is replaced by a non-static one, there is already machinery in place to recognize when equal_to can be vectorized, and vectorization will help us significantly improve performance in one of the following PRs.
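As a rough illustration of why a standard predicate type helps, here is a sketch of how a library could recognize `std::equal_to` at compile time and switch to a bulk comparison. This is not the STL's actual machinery: the trait `is_plain_equality` and the function `chars_equal` are hypothetical names, and `std::memcmp` merely stands in for a vectorized code path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <type_traits>

// Hypothetical trait: true only for predicate types known to be plain equality.
template <class Pred>
struct is_plain_equality : std::false_type {};

template <class T>
struct is_plain_equality<std::equal_to<T>> : std::true_type {};

// Generic path: call the predicate element by element.
template <class Pred>
bool chars_equal_impl(const char* a, const char* b, std::size_t n, Pred pred, std::false_type) {
    for (std::size_t i = 0; i < n; ++i) {
        if (!pred(a[i], b[i])) {
            return false;
        }
    }
    return true;
}

// Fast path: the predicate is known to be equality, so compare in bulk.
template <class Pred>
bool chars_equal_impl(const char* a, const char* b, std::size_t n, Pred, std::true_type) {
    return std::memcmp(a, b, n) == 0;
}

// Dispatches on the predicate type at compile time.
template <class Pred>
bool chars_equal(const char* a, const char* b, std::size_t n, Pred pred) {
    return chars_equal_impl(a, b, n, pred, is_plain_equality<Pred>{});
}
```

An opaque custom comparator always takes the element-wise path, because the library cannot tell what it does. `std::equal_to<>` and `std::equal_to<char>` are recognizable by type alone and can take the bulk path.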
I realized later that the tests didn't cover two cases related to character ranges in icase mode:
A non-collating range that triggers the small range optimization (right bound - left bound < 4 after translation).
A non-collating range that triggers neither the small range optimization nor the bitmap optimization.
For the idempotent translation functions used in the test, this means we need code points with values >= 0x200 for the lower-case variants, so I had to make use of wide strings for the two additional tests.
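The three storage cases for a translated range can be pictured with the sketch below. It is a simplification under stated assumptions, not the actual `<regex>` implementation: the "fewer than 4" threshold comes from the description above, while the `bitmap_limit` parameter, the order of the checks, and the names `RangeStorage` and `classify_range` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class RangeStorage { Bitmap, SmallRange, GenericRange };

// Classifies a range [lo, hi] of already translated code points.
// Precondition: lo <= hi.
// bitmap_limit is an assumed parameter: code points below it are taken
// to be representable in a bitmap.
constexpr RangeStorage classify_range(std::uint32_t lo, std::uint32_t hi, std::uint32_t bitmap_limit) {
    return hi < bitmap_limit ? RangeStorage::Bitmap
         : (hi - lo < 4)     ? RangeStorage::SmallRange    // right bound - left bound < 4
                             : RangeStorage::GenericRange; // neither optimization applies
}
```

Under these assumptions, with a limit of 0x100, a range such as 0x201 to 0x203 falls into the small range case and 0x201 to 0x220 into the generic case. Both lie outside the assumed bitmap, which is the kind of input the two additional tests need.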
@muellerj2
The number of <regex> bug fixes you've introduced for the last few months is a bit shocking.
Is <regex> really that buggy, how did it even functionate? 🤔
Anyway, good job and thanks for making the STL better for all of us.
Is <regex> really that buggy, how did it even functionate? 🤔
That's relatively easy to say: Most of the fixed bugs are or were specific to some subclasses of regexes (rarely used syntax options, specific escapes, wregex with specific character classes, usage of collating elements, icase or collate mode etc.), so if your regex wasn't in any of these subclasses, you weren't affected.
The bug fixed by this PR, for example, would only be observed if the traits class provided (a) a non-trivial (or worse, a non-idempotent) translate() function or (b) a non-idempotent translate_nocase() function. I'm not aware of any code that defines such translation functions. Well, except for the test code in this PR, but the translation functions in the test code are deliberately weird so that the test catches any case where they aren't applied exactly once. They aren't designed to do anything useful.