You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
__builtin_wmemcmp and __builtin_wmemchr are simply lowered to wchar-by-wchar loops by MSVC backend, which may perform poorly than vectorized CRT wmemcmp and wmemchr. Therefore, for non-constexpr cases in string view, we call CRT functions instead of the builtins.
I'd like to see a benchmark and its results. I expect that maintainers would be interested in that too. #4387 is an example for a similar benchmark added to this repo.
I also recall my past observation that wmemchr is slow and not vectorized. See DevCom-1614562. If this still confirms, I suggest reusing the std::find vectorization implementation instead of wmemchr
@mcfi's internal OS-PR-11111227 "[Perf] Optimize wmemcmp/wmemchr in UCRT" has enhanced <wchar.h> with vectorized implementations for x64 and ARM64, which have been benchmarked there as up to 10x faster. Although it'll be some unspecified amount of time before we're able to pick up updated UCRT headers, I'm fine with calling the UCRT now - this is our historical practice and only in highly unusual cases (e.g. #4654) do we avoid the UCRT.
Since this is deferring to our usual dependency, I also don't think it's absolutely necessary to add benchmarks to the STL repo.
Also, we're already calling wmemcmp and wmemchr in C++14 mode (expand context below the diffs), so I have absolutely no concerns here.
StephanTLavavej
changed the title
Call CRT wmemcmp/wmemchr when possible in string view for better performance
Call CRT wmemcmp/wmemchr when possible in char_traits for better performance
Aug 2, 2024
Good catch @AlexGuteniev! I think we can handle those in a followup PR (except that changing the ABI-retained function is pointless). I've recorded a note.
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
except that changing the ABI-retained function is pointless
Don't think so. The function used to be the optimization that we reverted to make ASAN happy.
Using wmemchr brings the optimization back, mitigating the ASAN happiness impact. (Not fully eliminating though, as there are 32 and 64 bit versions too)
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
__builtin_wmemcmp and __builtin_wmemchr are simply lowered to wchar-by-wchar loops by MSVC backend, which may perform poorly than vectorized CRT wmemcmp and wmemchr. Therefore, for non-constexpr cases in string view, we call CRT functions instead of the builtins.