You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The algorithm is not very similar though. I have not attempted the one-index pcmpestri instruction and variable step, but it looked like multiple-indices pcmpestrm and fixed step would work better for this direction, because the former would return more partial false matches than in forward direction, as the match with higher index is more likely to be partial.
If that's not convincing enough, I can try the other approach
pcmpestrm approach could have been made more efficient if DevCom-10689455 is fixed, but honestly I don't hope for that much
Also restructured control flow a vit in __std_search_impl, the place where _Match_1st_16 is introduced. I basically implemented the same thing from scratch, and the new attempt gave clearer control flow implementation. They are not shared though, as _First1 > _Stop1 check is needed for variable step, but not needed for fixed step, so here's still some variation.
The types larger than 2 bytes were excluded from test and benchmark -- same way as we don't test replace with smaller elements.
The benchmark patterns list is expanded to test small and large needle closer to the other end. I deliberately left coverage for search + closer to start and find_end + closer to end, It is useful to test the case where startup cost contributes more.
Also added couple of "evil" patterns in benchmark. The goal for them is not to make them too much worse. I leave up to maintainers to decide if this goal is achieved
πΏ The Evil case
Unlike many other algorithms, search algorithm run time highly depends on the data. It is possible to craft data which takes way more time than typical data of the same length.
The worst case could be haystack of a single repeating value, and the needle with the same repeating values and another value in the end. This is bad for plain search, although Boyer-Moore or similar algorithms are expected to handle that better.
β οΈ The results for find_end are much worse for that case β οΈ
β οΈ The results for search are also worse after vectorization for such case, but not that much as find_endβ οΈ
It look possible to detect the evil case (by the amount of matched beginnings per some amount of data) and switch to another algorithm, or to modify the whole algorithm to be less affected. It can be done in this or subsequent PR.
Also added couple of "evil" patterns in benchmark. The goal for them is not to make them too much worse. I leave up to maintainers to decide if this goal is achieved
I can look into improving the "evil" case results by detecting it and switching strategy
Thanks! πΈ I pushed some fixes, please double-check.
Final perf results on my 5950X look good. The regressions for the highly pathological cases aren't too bad, and I don't think we need to add extra implementation complexity to avoid them.
Nah, I just wasn't consistently complaining. πΉ When an expression in a conditional operator is especially long then I don't find extra parens to be as objectionable.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
π Overview
std::search
of 1 and 2 bytes elements withpcmpestri
Β #4745 in the opposite directionpcmpestri
instruction and variable step, but it looked like multiple-indicespcmpestrm
and fixed step would work better for this direction, because the former would return more partial false matches than in forward direction, as the match with higher index is more likely to be partial.pcmpestrm
approach could have been made more efficient if DevCom-10689455 is fixed, but honestly I don't hope for that much__std_search_impl
, the place where_Match_1st_16
is introduced. I basically implemented the same thing from scratch, and the new attempt gave clearer control flow implementation. They are not shared though, as_First1 > _Stop1
check is needed for variable step, but not needed for fixed step, so here's still some variation.replace
with smaller elements.search
+ closer to start andfind_end
+ closer to end, It is useful to test the case where startup cost contributes more.πΏ The Evil case
Unlike many other algorithms,
search
algorithm run time highly depends on the data. It is possible to craft data which takes way more time than typical data of the same length.The worst case could be haystack of a single repeating value, and the needle with the same repeating values and another value in the end. This is bad for plain
search
, although Boyer-Moore or similar algorithms are expected to handle that better.find_end
are much worse for that casesearch
are also worse after vectorization for such case, but not that much asfind_end
It look possible to detect the evil case (by the amount of matched beginnings per some amount of data) and switch to another algorithm, or to modify the whole algorithm to be less affected. It can be done in this or subsequent PR.
I doubt how much the Evil case is important.
π Benchmark results
Legend for the benchmark parameter:
STL/benchmarks/src/search.cpp
Lines 71 to 77 in 60825c4
Re-testong of the search PR against then-main as there are more cases in the benchmark: