Resolves #5111
I decided to look into this issue because I like coroutines and wanted to try contributing something.
It turns out that the implementation is quite straightforward: it just adds a yield_value overload taking a generator by lvalue reference, which is otherwise exactly the same as the overload taking a generator by rvalue reference.
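To make the overload selection concrete, here is a minimal sketch using hypothetical stand-in types. `Gen`, `ElementsOf`, `CurrentPromise`, `PatchedPromise` and the `dispatch_*` helpers are illustrative names only, not the STL's actual code. The sketch assumes, as with std::ranges::elements_of, that the wrapper's deduction guide preserves the value category of its argument, so an lvalue generator produces a wrapper over `Gen&` and a moved generator produces a wrapper over `Gen&&`.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for std::generator<int>.
struct Gen {};

// Hypothetical stand-in for std::ranges::elements_of: R is Gen& or Gen&&.
template <class R>
struct ElementsOf {
    R range;
};
template <class R>
ElementsOf(R&&) -> ElementsOf<R&&>;

// Models the situation before the patch: only the rvalue wrapper has a
// dedicated overload, everything else falls through to the general template.
struct CurrentPromise {
    std::string yield_value(ElementsOf<Gen&&>) { return "specialised"; }
    template <class R>
    std::string yield_value(ElementsOf<R>) { return "general"; }
};

// Models the situation after the patch: an additional overload for the
// lvalue wrapper that behaves like the rvalue one.
struct PatchedPromise {
    std::string yield_value(ElementsOf<Gen&&>) { return "specialised"; }
    std::string yield_value(ElementsOf<Gen&>) { return "specialised"; }
    template <class R>
    std::string yield_value(ElementsOf<R>) { return "general"; }
};

// Reports which overload an lvalue generator selects.
template <class Promise>
std::string dispatch_lvalue() {
    Gen g;
    return Promise{}.yield_value(ElementsOf{g});
}

// Reports which overload a moved (rvalue) generator selects.
template <class Promise>
std::string dispatch_rvalue() {
    Gen g;
    return Promise{}.yield_value(ElementsOf{std::move(g)});
}
```

With these definitions, `dispatch_lvalue<CurrentPromise>()` selects the general template, while `dispatch_lvalue<PatchedPromise>()` selects the dedicated overload. The rvalue case selects the dedicated overload in both.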
Since the LWG issue doesn't state specific numbers, and to test the implementation, I wrote a short benchmark, which I ran with both the current STL (as shipped with VS 17.13.0) and my patched version, in both Debug and Release configurations.
#include <algorithm>
#include <chrono>
#include <functional>
#include <generator>
#include <print>
#include <ranges>
#include <utility>

constexpr int N = 100000;

// Inner generator: yields a single element.
std::generator<int> f() {
    co_yield 1;
}

// Delegates to an rvalue generator N times.
std::generator<int> g_rvalue()
{
    for (int i = 0; i < N; i++) {
        auto f1 = f();
        co_yield std::ranges::elements_of(std::move(f1));
    }
}

// Delegates to an lvalue generator N times (the case covered by LWG-3899).
std::generator<int> g_lvalue()
{
    for (int i = 0; i < N; i++) {
        auto f1 = f();
        co_yield std::ranges::elements_of(f1);
    }
}

int main()
{
    {
        auto start = std::chrono::high_resolution_clock::now();
        int res = std::ranges::fold_left(g_rvalue(), 0, std::plus<int>());
        auto duration = std::chrono::high_resolution_clock::now() - start;
        std::println("With elements_of(rvalue generator)\nresult: {}, time: {}, per element: {}",
                     res, std::chrono::duration_cast<std::chrono::microseconds>(duration), duration / N);
    }
    {
        auto start = std::chrono::high_resolution_clock::now();
        int res = std::ranges::fold_left(g_lvalue(), 0, std::plus<int>());
        auto duration = std::chrono::high_resolution_clock::now() - start;
        std::println("With elements_of(lvalue generator)\nresult: {}, time: {}, per element: {}",
                     res, std::chrono::duration_cast<std::chrono::microseconds>(duration), duration / N);
    }
}
Results on my machine (time per element):
| | Debug, rvalue | Debug, lvalue | Release, rvalue | Release, lvalue |
|---|---|---|---|---|
| current | ~870ns | ~1700ns | ~49ns | 91ns |
| patched | ~870ns | ~870ns | ~49ns | ~49ns |
The results are a bit noisy across multiple runs, but they clearly show that the general overload of yield_value (which the current version uses for lvalue generators) takes almost twice as long as the generator-specialised overload, both unoptimised and optimised. This is unsurprising, since the general overload wraps the range in an extra generator, resulting in two coroutine calls per element. The results also show that the difference disappears in the patched version, since the lvalue generator then also uses the specialised overload.
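The "two coroutine calls per element" effect can be illustrated with a toy pull-based model. This is not std::generator and not the STL's implementation: `Source`, `Counter`, `make_inner`, `wrap`, `drain` and the `resumes_*` helpers are hypothetical names, and one call to a `Source` merely stands in for one coroutine resumption. The sketch only shows why an extra wrapping layer roughly doubles the number of calls.

```cpp
#include <functional>
#include <optional>
#include <utility>

// One call to a Source models one coroutine resumption (hypothetical model).
using Source = std::function<std::optional<int>()>;

struct Counter {
    int resumes = 0;
};

// Inner source: produces `count` elements, then signals the end.
Source make_inner(int count, Counter& c) {
    return [count, &c]() mutable -> std::optional<int> {
        ++c.resumes;
        if (count == 0) return std::nullopt;
        --count;
        return 1;
    };
}

// Extra layer, standing in for the additional generator that the general
// overload creates: each pull resumes the wrapper and then the inner source.
Source wrap(Source inner, Counter& c) {
    return [inner = std::move(inner), &c]() mutable -> std::optional<int> {
        ++c.resumes;
        return inner();
    };
}

// Consumes a source to the end and returns the sum of its elements.
int drain(Source s) {
    int sum = 0;
    while (auto v = s()) sum += *v;
    return sum;
}

// Number of modelled resumptions when the inner source is consumed directly.
int resumes_direct(int n) {
    Counter c;
    drain(make_inner(n, c));
    return c.resumes;
}

// Number of modelled resumptions when the inner source is wrapped first.
int resumes_wrapped(int n) {
    Counter c;
    drain(wrap(make_inner(n, c), c));
    return c.resumes;
}
```

In this model, consuming n elements directly takes n + 1 calls (the last one reports the end), while the wrapped variant takes 2(n + 1). That matches the roughly 2x ratio in the measurements only qualitatively.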
StephanTLavavej changed the title from "LWG-3899 co_yielding elements of an lvalue generator is unnecessarily inefficient #5111" to "LWG-3899 co_yielding elements of an lvalue generator is unnecessarily inefficient" on Feb 18, 2025
Thanks, looks perfect! I've edited the PR title to remove the issue number - it doesn't get linked to anything there, and would appear in the git history of PR titles which otherwise only contain PR numbers.
I'll get this merged this week - we have a semi-manual process of merging simultaneously to the GitHub and MSVC-internal repos.
Labels: generator (C++23 generator), LWG (Library Working Group issue)