You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We have three outcomes of comparison of elements pair:
Haystack element less. Advance only haystack
Needle element less. We found mismatch, return false
Elements equal. Advance haystack and needle
Out of these outcomes, we can get one in one branch, the rest will be two branches.
Since we have "less" predicate, we can't have equal in one branch.
Out of haystack less and needle less, the current std:: algorithm favors needle less, and the ranges algorithm favors haystack less.
Favoring haystack less optimizes long algorithm run, where haystack is way bigger than needle.
Favoring needle less optimizes early return, which is not so useful, as it happens only once per algorithm run.
π Benchmarks
Not sure what is common case of includes, trying these three:
(0) Needle is seen contiguously in haystack
(1) Needle is seen not completely contiguously in haystack, but still compact (stride is geometric distribution)
(2) Needle is spread in haystack with almost the same stride (plus-minus one)
(3) Needle is sampled randomly from haystack
All these are multiplied by
(1) The needle is actually found
(0) The middle element of the needle does not match anything in haystack
I think these combinations are enough to explore the algorithm performance properties
Benchmark results show being affected by codegen gremlin, and expectedly 300/290 cases are less faster and more slower.
But overall looks like an improvement.
π Benchmark results
β οΈ mixed result with improvements over 2.0 sometimes, but degradation below 0.3 sometimes too β οΈ
β οΈ whole benchmark runs long β οΈ
Mixed results with overall improvement, but some great degradation
I think the chance of UB were lower, than your commit message says. We pick the middle needle elemnt to break the match, not an arbitrary random number.
Otherwise fine.
Please confirm that my results where some cases became worse, don't concern you too much.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
β Why ranges approach
We have three outcomes of comparison of elements pair:
return false
Out of these outcomes, we can get one in one branch, the rest will be two branches.
Since we have "less" predicate, we can't have equal in one branch.
Out of haystack less and needle less, the current
std::
algorithm favors needle less, and the ranges algorithm favors haystack less.Favoring haystack less optimizes long algorithm run, where haystack is way bigger than needle.
Favoring needle less optimizes early return, which is not so useful, as it happens only once per algorithm run.
π Benchmarks
Not sure what is common case of
includes
, trying these three:(0) Needle is seen contiguously in haystack
(1) Needle is seen not completely contiguously in haystack, but still compact (stride is geometric distribution)
(2) Needle is spread in haystack with almost the same stride (plus-minus one)
(3) Needle is sampled randomly from haystack
All these are multiplied by
(1) The needle is actually found
(0) The middle element of the needle does not match anything in haystack
I think these combinations are enough to explore the algorithm performance properties
Benchmark results show being affected by codegen gremlin, and expectedly 300/290 cases are less faster and more slower.
But overall looks like an improvement.
π Benchmark results
Mixed results with overall improvement, but some great degradation
And random variation for ranges part