C++20 Coroutines
There seems to be a lot of confusion around the implementation of C++20 coroutines. I think this is because the draft technical specification for C++20 describes coroutines as a work in progress, so we can’t expect full compiler and library support at this point in time.
A lot of the problems probably arise from the lack of official documentation about working with coroutines. We have been given C++ syntax support for coroutines (the co_yield and co_return) but without all of what I consider full library support. The standard library has hooks and basic functionality for supporting coroutines, but we must incorporate this into our own classes. I anticipate that there will be full library support for generator style coroutines in C++23.
The C++20 specification is clearly aiming to support parallel (or asynchronous) coroutines using co_await, which makes implementing simpler, generator-style synchronous coroutines more complex. The implementation requirements for our coroutines use a Future and Promise mechanism similar to the std::async mechanism for asynchronous threads.
If you are a Python or C# developer expecting a simple coroutine mechanism, you’ll be disappointed because the C++20 general purpose framework is incomplete. Having said that, there are many blogs and articles on the web that include a template class that supports generator style coroutines. This blog contains a usable coroutine template and example code after discussing what a coroutine is.
What is a Coroutine?
I first came across coroutines via the yield statement in CLU which, like generators in Python (and yield return in C#), are defined using function syntax and accessed using the for loop syntax. They were described as cooperating routines (not concurrent routines) which execute on a single thread. There are other styles of coroutines, and Wikipedia provides a good starting point for comparing functions, generators and threads.
For this blog, I’ll concentrate on coroutines that execute in the context of the caller and allow two separate blocks of code to interleave flow-of-control between them.
The new C++20 co_yield statement allows one routine to provide a piece of data and, at the same time, return control flow to the calling routine for processing. This is a long-winded way of saying they provide a single threaded implementation of the producer-consumer pattern.
We can see a classic coroutine’s producer-consumer interaction in the following UML Sequence Diagram:
The control bars on the diagram show the flow of control moving from one routine to another.
When the flow of control is transferred from one routine to another, the current state of the routine must be saved and then restored when the routine resumes. In the case of the consumer, this happens as part of the usual function call mechanism where the current stack frame holds the state of the routine. In the case of the producer (the coroutine), extra support from the compiler and runtime system is required to save the producer’s stack frame whenever a value is yielded up to the consumer.
The C++20 specification says that the coroutine state is saved on the heap which means they are not suitable for embedded systems that do not use dynamic memory. But, the specification does state that a given implementation can optimise away heap usage if:
- the lifetime of the coroutine is strictly within the lifetime of the caller
- the size of coroutine state can be determined at compile time
In practice, the simple generator coroutines we are considering in this blog meet these criteria and could save the coroutine state in the caller’s stack frame. Examining the heap usage for both examples in this blog shows that GCC-11 and Clang-12 (the latest at the time of writing) both use the heap to save the coroutine state. Given that compiler support for coroutines is relatively new and evolving, it is quite possible that later versions may optimise this code, or support compiler options to enable or disable saving coroutine state in dynamic memory.
To support the save and restore of the coroutine state we must provide a supporting class that integrates with the coroutine library support provided in the #include <coroutine> header file. This is where the current complexity of implementing a coroutine lies.
C++20 Coroutine Support
To put C++20 coroutines into context, we can create a coroutine to yield up “Hello world!” as three separate objects as follows (note we need to include the <coroutine> header file):
#include <coroutine>
X coroutine()
{
co_yield "Hello ";
co_yield "world";
co_return "!";
}
The first thing to note is that this is not a function definition! We have just used the function syntax to define a block of code that can be passed arguments when instantiated. A function would have a return statement (or an implied return for void functions), whereas here, we yield three separate values. Note that you cannot put a return statement in a coroutine.
Secondly, we return some unknown (for now) object of type X. This is the object that implements the coroutine. The compiler will reorder our code block to implement the coroutine mechanism with its save and restore state code, but it currently needs a little help from us in writing this supporting class X.
Inside the coroutine code block, we use co_yield to yield a value and save the coroutine state, and co_return to yield a value and discard the state.
This is a naive example of a coroutine in that it must be used exactly three times as shown by this example:
auto x = coroutine();
std::cout << x.next();
std::cout << x.next();
std::cout << x.next();
std::cout << std::endl;
Once we have consumed all of the yield values, the coroutine terminates and releases all memory used to store its state.
In our example, the coroutine object has a next() method which will:
- suspend the current consumer code
- restore the state of the coroutine (producer)
- resume the coroutine code from the last yield statement (or the start of the code block)
- save the value of the next yield statement
- save the coroutine state
- restore the consumer state
- resume the consumer by returning the saved value from the yield statement
Currently, there is no standard library template for our class X. Hence, we have to look at the available library support for coroutines. An example templated class is shown later when we look at a practical application of coroutines but at the moment we’ll look at a basic hard-coded example.
To write our own coroutine support class X we need to support the lifecycle operations by providing implementations for specific methods. The C++20 standard defines these method requirements using concepts which we introduce in our blog posts for concepts part1 and part2. To keep this blog focused on coroutines we’ll adopt the traditional C++ approach of stating the implied methods we need to provide as part of our class.
Unfortunately, C++20 coroutines require us to provide two inter-related supporting classes:
- a class to save the coroutine state and save the yield data – usually called the promise
- a class to manage the coroutine (promise) object – this is our class X, traditionally called the future
In the promise object we will need to provide several lifecycle methods. For now, we’ll just look at supporting the yield statements and ignore the methods required for managing the coroutine state.
As our coroutine uses a co_yield statement with a const char* value we need a method with the following signature:
std::suspend_always yield_value(const char* value);
The argument is the yield object, and the return type tells the runtime system whether to suspend the coroutine and save its state which, for a single threaded coroutine, we always want to do by returning a std::suspend_always object. There is the ability to return a std::suspend_never which allows for asynchronous coroutines, but that leads to a lot of complications about managing and resuming suspended threads which we don’t want to get involved with for our simple synchronous coroutine.
The yield_value method must save its argument so it can be returned to the calling routine (consumer). A typical implementation is:
std::suspend_always yield_value(const char* value) {
this->value = value;
return {};
}
If you haven’t already come across the modern C++ syntax of return {} it simply means create a default constructed object of the return type for this method. We could have also used return std::suspend_always{}.
To support the co_return statement which yields a value but doesn’t save state we need a second lifecycle method:
void return_value(const char* value) {
this->value = std::move(value);
}
As co_return terminates the coroutine, the lifecycle function has void return type because the coroutine state will be destroyed.
Without looking in detail at the implementation of class X we can show how the compiler might expand our consumer code into inline sequential operations to see the lifecycle methods. In the following code the method promise() provides access to the promise object which saves the coroutine state and the yield value. A next() method can retrieve the saved value from the promise:
auto x = coroutine();
x.promise().yield_value("Hello "); // save value and state
std::cout << x.next();
x.promise().yield_value("world"); // save value and state
std::cout << x.next();
x.promise().return_value("!"); // save value, discard state
std::cout << x.next();
std::cout << std::endl;
You can see that the compiler reorders our two code blocks into a single sequential set of interleaved method calls.
Just before we look at a complete example with all the templated code for the promise and future classes, we should look at an alternate style of writing our generator coroutine:
X coroutine()
{
co_yield "Hello ";
co_yield "world";
co_yield "!";
// implied co_return;
}
In this approach, we use co_yield for all our values and don’t identify the final yield with a separate co_return value; we simply allow the code block to terminate. The compiler will provide a co_return statement (with no return value) to terminate the coroutine.
For a co_return (no value) statement we need a separate lifecycle method void return_void():
void return_void() {
this->value = nullptr;
}
A promise class cannot provide both a return_value and a return_void method; the two are mutually exclusive.
For this trivial example, our consumer code does not change as it reads exactly three values. In a more realistic example where we loop reading values from our coroutine, we will have to mark the end of the data stream in some way. This example uses pointers, so a nullptr can be used to terminate a loop; otherwise a std::optional object is the most general approach.
Our new terminating consumer would look like:
auto x = coroutine();
while (const char* item = x.next()) {
std::cout << item;
}
std::cout << std::endl;
We could have coded the loop as
while (auto item = x.next()) {
but have chosen to keep the explicit type declaration so it is clear how the generator is used.
The full example of this code is in the file char_demo.cpp in the accompanying GitHub repo called coroutines-blog.
Working with Coroutines
Coroutines are a convenient mechanism for implementing multiple algorithms in separate code blocks rather than combining those algorithms into a single block of convoluted code.
As an example consider an embedded device that monitors data values, such as temperature, and writes these values to a serial port (RS232) complete with a timestamp, which could be time from device boot or a network synchronised clock time.
The timestamp and value are stored as float values (4 bytes each) and simply dumped as binary to the serial byte stream to reduce code complexity and data size. The data stream looks like the following (big-endian byte order, most significant byte first):
In our data collector application, we want to read this stream into a struct with two float values and then print out those values to a logging device, such as a display screen, with an alarm message if the value exceeds a given threshold.
Our combined algorithm would involve:
- read 4 bytes to construct the time stamp
- read 4 bytes to construct the data value
- create the data structure with both float values
- print the data structure values
- print a warning message if the data value exceeds a threshold
Now consider what happens when the data stream ends part way through a float value. Our code has to handle the end of stream error condition for each of the separate read operations (that’s eight separate one byte read operations). Even with good use of functions this code will be a complex set of conditional tests and data reconstruction statements – challenging to write and maintain.
Using coroutines we can break this down into two steps:
- parse the data
- display the data and optional warning message
In practice we’ll go further and break the first step into:
- parse raw bytes into float values
- store a timestamp and data point in a structure
Coroutine Future Template
The first step is to create a template for the classes we’ve discussed representing the coroutine future and the data promise.
Promise data holder
Here’s the Promise as a nested structure in the Future class:
template <typename T>
class Future
{
class Promise
{
public:
using value_type = std::optional<T>;
Promise() = default;
std::suspend_always initial_suspend() { return {}; }
std::suspend_always final_suspend() noexcept { return {}; }
void unhandled_exception() {
std::rethrow_exception(std::current_exception());
}
std::suspend_always yield_value(T value) {
this->value = std::move(value);
return {};
}
void return_void() {
this->value = std::nullopt;
}
inline Future get_return_object();
value_type get_value() {
return std::move(value); // move out so move-only types such as std::unique_ptr work
}
private:
value_type value{};
};
…
};
The Promise structure (which we define as private to the enclosing Future class) saves a single data value as a private std::optional object with a get_value accessor method. By using a std::optional object we can use std::nullopt to test for the end of the coroutine after the return_void method has been called. We’ve followed the C++ template meta-programming style of defining the value_type type trait so we can interrogate the class to determine its underlying data type.
We provide a default constructor and the two lifecycle methods required for a coroutine promise object (initial_suspend and final_suspend) which will always suspend the coroutine so that we work on a single thread. These lifecycle methods are required but are just standard implementations that don’t need to be examined further.
We also need to specify how the framework should handle uncaught exceptions. Rather than digress into exception handling and recovery mechanisms we’ll just simply rethrow the exception and let the caller deal with it.
The yield_value and return_void methods discussed earlier are defined to copy or move the yield value to the std::optional holder or use std::nullopt to indicate the end of the coroutine. Note the use of std::move to ensure we support move semantics for our pass by value function argument: this is necessary if we want to yield a std::unique_ptr for example.
The other method we have to provide is get_return_object() which must return the Future object for this promise. As we haven’t yet completed the class Future definition, we need to implement this method after completing the Future/Promise classes.
Future coroutine context manager
The Future class itself provides a constructor/destructor for managing the composite promise object and a mechanism to obtain the values yielded by the coroutine (our next method discussed previously):
template <typename T>
class Future
{
class Promise { … };
public:
using value_type = T;
using promise_type = Promise;
explicit Future(std::coroutine_handle<Promise> handle)
: handle (handle)
{}
~Future() {
if (handle) { handle.destroy(); }
}
// Promise::value_type next() { … }
private:
std::coroutine_handle<Promise> handle;
};
We have standard library support for managing the promise object via the std::coroutine_handle template class passed as an argument to the future constructor. We need to store this coroutine_handle object and ensure we call its destroy() method when we destroy the future object.
We have one further library requirement for our class Future in that it must define a nested type called promise_type so the standard library templates can determine the underlying promise class type:
using promise_type = Promise;
Our implementation of the next method must check that the coroutine exists and has not already run to completion; otherwise it returns an empty std::optional object:
Promise::value_type next() {
if (handle && !handle.done()) {
handle.resume();
return handle.promise().get_value();
}
else {
return {};
}
}
To return the yield value from our coroutine:
- we check that the handle is valid and that the coroutine has not finished (resuming a coroutine that is suspended at its final suspend point is undefined behaviour)
- call resume() on the coroutine_handle to execute code to the next co_yield statement
- return the value saved by the promise yield_value() method: the library support will handle the restore and save of the coroutine state.
- if there is no coroutine, or it has already finished, we return an empty value (std::nullopt).
Now we have defined the Future class we can complete the Promise object with the required get_return_object() method:
template <typename T>
inline Future<T> Future<T>::Promise::get_return_object()
{
return Future{ std::coroutine_handle<Promise>::from_promise(*this) };
}
We use the static member function std::coroutine_handle<Promise>::from_promise to create the coroutine_handle that is passed to the Future constructor.
As you can see, this is just boilerplate code for creating the Future/Promise classes.
That’s the background work done, and this template can be reused with most data types and classes: I’d say all classes, but there will always be an edge case that this template cannot support.
Data Collection Coroutine
Now we can focus on solving our real data handling problem. The first step is to write a coroutine to read from an istream object and yield up float values:
Future<float> read_stream(std::istream& in)
{
std::uint32_t data{}; // accumulates four bytes, most significant byte first
int count{};
char byte;
while (in.get(byte)) {
data = data << 8 | static_cast<unsigned char>(byte);
if (++count == 4) {
co_yield std::bit_cast<float>(data); // needs <bit> and <cstdint>
data = 0;
count = 0;
}
}
}
We just read blocks of 4 bytes, shift them into a 32-bit word, and use std::bit_cast to reinterpret those bits as a float for the co_yield statement (this assumes float is a 32-bit type). If the data stream ends part way through a 4-byte word we ignore any partial value and terminate the coroutine.
We can prove this coroutine works by just printing out each float value that we read from standard input:
auto raw_data = read_stream(std::cin);
while (auto next = raw_data.next()) {
std::cout << *next << std::endl;
}
As we want to save pairs of values into a data structure, we can use a second coroutine to encapsulate that algorithm:
struct DataPoint
{
float timestamp;
float data;
};
Future<DataPoint> read_data(std::istream& in)
{
std::optional<float> first{};
auto raw_data = read_stream(in);
while (auto next = raw_data.next()) {
if (first) {
co_yield DataPoint{*first, *next};
first = std::nullopt;
}
else {
first = next;
}
}
}
Again, if the input stream terminates part way through a timestamp and data point, we discard the incomplete datum.
The last step in this example is to process our timestamp data values:
static constexpr float threshold{21.0};
int main()
{
std::cout << std::fixed << std::setprecision(2);
std::cout << "Time (ms) Data" << std::endl;
auto values = read_data(std::cin);
while (auto n = values.next()) {
std::cout << std::setw(8) << n->timestamp
<< std::setw(8) << n->data
<< (n->data > threshold ? " ***Threshold exceeded***" : "")
<< std::endl;
}
return 0;
}
This code shows how we intend to process the data and all the nitty-gritty code for converting bytes to float values to a data structure is hidden inside the coroutines.
Hopefully you can now start to see the benefit of using coroutines to separate out different aspects of a complex algorithm into simpler code blocks. Currently, with C++20, we need to jump through the hoop of creating the Future/Promise classes. Still, I hope C++23 will provide something similar to this template so we can concentrate on writing our code not working with the supporting code.
You can test this code using a simple Python script such as the following to generate four hardcoded datapoints:
import struct
import sys
start = 0.0
for ms, value in enumerate([20.1, 20.9, 20.8, 21.1]):
sys.stdout.buffer.write(struct.pack('>ff', start + ms*0.1, value))
If our compiled executable is called datapoint_demo, we can use the following Linux shell pipeline to show the coroutines working:
# Linux
python3 test_temp.py | ./datapoint_demo
With the following output:
Time (ms) Data
0.00 20.10
0.10 20.90
0.20 20.80
0.30 21.10 ***Threshold exceeded***
The full example of this code is in the files future.h and datapoint_demo.cpp in the accompanying GitHub repo called coroutines-blog. In order to compile these examples using GCC (version 10 onwards) you will need to add the -std=c++20 and -fcoroutines options to the g++ command line.
In the next follow up post I’ll add iterator support to the Future template class so the coroutine can be used in a for loop or as an input iterator to a library algorithm.
Summary
Coroutines are a powerful programming technique for separating different aspects of a complex algorithm into discrete and simpler code blocks.
C++20, like Python and C#, uses the function syntax to define the coroutine code blocks, which many people initially find confusing as this is just syntax to provide the coroutine statements.
The current absence of a simple standard library template for the generator style coroutines we’ve shown here makes it harder for developers to start using coroutines. It’s a bit like a jigsaw puzzle where you haven’t been shown the picture for the completed puzzle – initially daunting but the puzzle can be solved. Hopefully the Future template shown here has provided a picture you can use for your own puzzles.