Embedded Expertise: Beyond Fixed-Size Integers; Exploring Fast and Least Types

The Challenge of Fixed-Size Integers Before C99

In embedded programming, before adopting the C99 standard (ISO/IEC 9899:1999), a significant challenge was ensuring the consistent sizing of key data objects. This complexity stemmed from the C standard’s (ISO/IEC 9899) non-committal stance on the size of an int. We knew:

  • A short is at least 16 bits.
  • A long is at least 32 bits.
  • An int is somewhere between a short and a long.

This flexibility boosted C’s portability, making it a favourite for various architectures, including some with non-standard bit sizes such as Unisys’s 36-bit and 48-bit ints. However, portable code could rely only on the minimum guaranteed ranges (signed int: -32767…32767, unsigned int: 0…65535).
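The minimum guarantees above can be checked at compile time with the <climits> macros. The sketch below is an illustration, not code from the original post, and `fits_portable_int` is a hypothetical helper name used only here.

```cpp
#include <climits>

// Minimums the standard guarantees; these hold on every conforming implementation.
static_assert(SHRT_MAX >= 32767, "short has at least 16 bits");
static_assert(INT_MAX >= 32767 && INT_MIN <= -32767, "int has at least 16 bits");
static_assert(UINT_MAX >= 65535u, "unsigned int has at least 16 bits");
static_assert(LONG_MAX >= 2147483647L, "long has at least 32 bits");

// Hypothetical helper: true if v lies inside the range every int must cover.
constexpr bool fits_portable_int(long v) {
    return v >= -32767L && v <= 32767L;
}

static_assert(fits_portable_int(-32767));
static_assert(!fits_portable_int(32768L));   // may fit in int on this host, but not portably
```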

Coding standards before C99 (like MISRA-C1 and MISRA-C2) navigated this by eschewing basic integral types (char, short, int, and long) in favour of specific-length typedefs, e.g.,

typedef signed char i8;
typedef unsigned char u8;
// etc.
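Such typedefs were usually paired with compile-time checks that the chosen base types really have the intended widths on the current target. The sketch below is written in C++ (using static_assert) and assumes a host where short is 16 bits and int is 32 bits. On a different target the typedefs would be edited, and the checks turn a wrong mapping into a build error rather than a silent bug.

```cpp
#include <climits>

// Project typedefs in the pre-C99 style. The base type chosen for each width
// is target-specific; this mapping assumes 16-bit short and 32-bit int.
typedef signed char    i8;
typedef unsigned char  u8;
typedef signed short   i16;
typedef unsigned short u16;
typedef signed int     i32;
typedef unsigned int   u32;

// Fail the build if the mapping is wrong for this target.
static_assert(sizeof(i8)  * CHAR_BIT == 8,  "i8 must be 8 bits");
static_assert(sizeof(u8)  * CHAR_BIT == 8,  "u8 must be 8 bits");
static_assert(sizeof(i16) * CHAR_BIT == 16, "i16 must be 16 bits");
static_assert(sizeof(u16) * CHAR_BIT == 16, "u16 must be 16 bits");
static_assert(sizeof(i32) * CHAR_BIT == 32, "i32 must be 32 bits");
static_assert(sizeof(u32) * CHAR_BIT == 32, "u32 must be 32 bits");
```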

C99 and <stdint.h>: A Turning Point

C99 introduced fixed-width integer types, offering a standardised approach across systems:

Integer Width   Signed Type   Unsigned Type
8 bits          int8_t        uint8_t
16 bits         int16_t       uint16_t
32 bits         int32_t       uint32_t
64 bits         int64_t       uint64_t

Modern coding standards such as MISRA-C:2012 now advise using these <stdint.h> types in place of the traditional integral types.
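The sketch below shows what <stdint.h> (<cstdint> in C++) guarantees. Exact-width types, where provided, have exactly the stated number of bits with no padding. The signed exact-width types use two's complement, so their limits are fixed. The least and fast variants named in the title only promise at least the stated width. `make_u16` is a hypothetical helper for illustration.

```cpp
#include <climits>
#include <cstdint>

// Exact-width types (when provided) have exactly N bits and no padding.
static_assert(sizeof(std::int8_t)   * CHAR_BIT == 8);
static_assert(sizeof(std::uint16_t) * CHAR_BIT == 16);
static_assert(sizeof(std::int32_t)  * CHAR_BIT == 32);
static_assert(sizeof(std::uint64_t) * CHAR_BIT == 64);

// Signed exact-width types are two's complement, so these limits are fixed.
static_assert(INT16_MIN == -32768 && INT16_MAX == 32767);
static_assert(UINT8_MAX == 255);

// Least and fast variants only promise at least the stated width.
static_assert(sizeof(std::int_least16_t) * CHAR_BIT >= 16);
static_assert(sizeof(std::int_fast16_t)  * CHAR_BIT >= 16);

// Hypothetical helper: combine two bytes into a 16-bit value.
// Converting to unsigned before shifting avoids signed overflow where int is 16 bits.
constexpr std::uint16_t make_u16(std::uint8_t hi, std::uint8_t lo) {
    return static_cast<std::uint16_t>((static_cast<unsigned>(hi) << 8) | lo);
}

static_assert(make_u16(0x12, 0x34) == 0x1234);
```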

Tuning for Size and Performance: A Simple Case Study


Posted in ARM, C/C++ Programming, Cortex, Toolchain

CMake Presets

Introduction

When we developed the CMake-based toolchain for our training projects, we used a shell script to simplify invoking the cmake command line. CMake 3.19 added a presets feature that lets us define command line parameters in a CMakePresets.json file, which can be used in place of multiple command line parameters.

In previous articles about CMake we have shown how we need to specify command line parameters to use CMake with an embedded target toolchain (see CMake Part 3). To generate the build configuration files we use a command line with several options:

$ cmake -S . -B build/debug --warn-uninitialized \
        -DCMAKE_BUILD_TYPE=DEBUG \
        -DCMAKE_TOOLCHAIN_FILE=toolchain-STM32F407.cmake

Each time we actually want to build the system we have to enter a second command line with multiple options:

$ cmake --build build/debug --config Debug -- --no-print-directory

These commands are long and complex so we wrote a Bash script to simplify invoking these commands.

With presets available we can now use a much shorter command line by defining a preset configuration so that the following generates the debug build configuration:

$ cmake --preset debug

And the following performs the build itself:

$ cmake --build --preset debug

If we want a release build we just define a release preset and invoke that from the command line in the same manner.

Presets definition file

The capabilities of presets have developed over time, from supporting the configuration and build stages to include testing (the ctest command) and other CMake features. This post concentrates on using presets for the configuration and build stages only. An example project with presets can be downloaded from the Feabhas GitHub repo CMake Presets Blog.

Presets are defined in a CMakePresets.json file stored alongside the CMakeLists.txt file. The file consists of the following outline JSON components:

{
  "version": 3,
  "cmakeMinimumRequired": {
    "major": 3,
    "minor": 21,
    "patch": 0
  },
  "configurePresets": [
  ],
  "buildPresets": [
  ]
}

The version number determines which level of preset support is required. Build presets need at least version 2 (CMake 3.20), and this example uses version 3 to match the CMake 3.21 minimum given in cmakeMinimumRequired.

After the version requirements there are optional arrays of definitions for each category of preset. It is these entries that define the parameters that previously had to be supplied on the command line.

Each preset has a name that is unique within its preset category, so the debug preset in the configuration section is different from the debug preset in the build section. The presets are hierarchical, allowing one preset to inherit definitions from another so that common settings can be reused.

Defining configure presets

We’ll start with a preset for the embedded toolchain used with our STM32F407 target board:

"configurePresets": [
  {
    "name": "stm32-base",
    "hidden": true,
    "generator": "Unix Makefiles",
    "binaryDir": "${sourceDir}/build/${presetName}",
    "cacheVariables": {
      "CMAKE_BUILD_TYPE": "Debug",
      "CMAKE_TOOLCHAIN_FILE": {
        "type": "FILEPATH",
        "value": "toolchain-STM32F407.cmake"
      },
      "EXCEPTIONS": "OFF"
    },
    "warnings": {
      "uninitialized": true,
      "dev": true,
      "deprecated": true
    },
    "architecture": {
      "value": "unknown",
      "strategy": "external"
    }
  }
]

This base configuration is not meant to be used directly but acts like an abstract base class in object-oriented programming. Setting the hidden attribute to true prevents it from appearing in the list of configure presets obtained using the command:

$ cmake --list-presets

A similar command is used to list the build presets:

$ cmake --build --list-presets

The stm32-base preset defines the common configuration parameters for the debug and release builds. The entries should be self-explanatory: they define the build system to use (generator), the build directory (binaryDir) and variables to add to the cache (cacheVariables). The CMakePresets.json file is described on the Presets manual page.

Presets have their own variables, which use a different naming convention to avoid confusion with other CMake variables. For example, ${sourceDir} in a preset is the project source directory and has the same value as the CMake variable ${CMAKE_SOURCE_DIR}.

In the cacheVariables attribute the CMAKE_TOOLCHAIN_FILE entry defines the toolchain file we introduced in the introductory CMake blog article.

At the end of each preset we can add an optional section for vendor-specific entries, such as this one to support the IntelliSense feature of Microsoft tools:

"vendor": {
  "microsoft.com/VisualStudioSettings/CMake/1.0": {
    "intelliSenseMode": "linux-gcc-arm"
  }
}

The debug configure preset used on the command line simply inherits from the stm32-base preset:

"configurePresets": [
  {
    ...
  },
  {
    "name": "debug",
    "displayName": "Debug",
    "inherits": "stm32-base"
  }
]

Similarly, a release preset inherits from stm32-base but overrides the CMAKE_BUILD_TYPE value:

"configurePresets": [
  {
    ...
  },
  {
    "name": "release",
    "displayName": "Release",
    "inherits": "stm32-base",
    "cacheVariables": {
      "CMAKE_BUILD_TYPE": "Release"
    }
  }
]

Note that cacheVariables are merged when a preset inherits. A value set in the inheriting preset overrides the parent’s value for the same variable, while the parent’s other cache variables (such as CMAKE_TOOLCHAIN_FILE) are kept.

Generating the project build files no longer requires the long list of cmake command line parameters:

$ cmake --preset debug

Defining build presets

The presets for using CMake to build the project follow the same idiom by starting with an abstract base definition:

"buildPresets": [
  {
    "name": "build-base",
    "hidden": true,
    "configurePreset": "debug",
    "nativeToolOptions": [
      "--no-print-directory"
    ]
  }
 ]

The build preset needs to know the target build directory location and the build type used in the configuration. The configurePreset entry ensures that we pick up the same values as those used in the named configure preset.

The debug build preset used on the command line just inherits from this build-base definition:

"buildPresets": [
  {
    ...
  },
  {
    "name": "debug",
    "inherits": "build-base"
  }
]

We can now build the system using:

$ cmake --build --preset debug

The release build uses the same build-base but overrides the configurePreset to use the release configuration:

    {
      "name": "release",
      "inherits": "build-base",
      "configurePreset": "release"
    }

We can also provide additional presets to invoke custom build tasks. Our CMakeLists.txt configuration has one target to run clang-tidy and a second to run any test cases we have supplied in the project. Presets for these tasks are:

{
  "name": "clang-tidy",
  "inherits": "debug",
  "targets": [ "clang-tidy" ]
},
{
  "name": "test",
  "inherits": "debug",
  "targets": [ "test" ],
  "nativeToolOptions": [
    "ARGS=--output-on-failure"
  ]
}

The test preset also shows how we can pass options through to the native build tool. Here ARGS=--output-on-failure sets the Makefile variable that the test target passes on to ctest. We can invoke these presets using:

$ cmake --build --preset clang-tidy
$ cmake --build --preset test

Conclusion

Presets let us define configure and build variables and other options in a JSON configuration file, and this has dramatically simplified invoking the cmake command. Before presets, the complex CMake command lines typically needed supporting Bash or PowerShell scripts to avoid typing long and complex commands.

With presets the cmake command line is noticeably simpler and can be entered directly without the need for a supporting script.

Posted in ARM, Build-systems, C/C++ Programming, Toolchain

Disassembling a Cortex-M raw binary file with Ghidra

BlackHat Europe 2022

During the first week of December, I had the pleasure of attending a training course at BlackHat Europe 2022 titled Assessing and Exploiting Control Systems and IIoT run by Justin Searle.

Part of the course involved Assessing and Exploiting Embedded Firmware by reading on-chip Flash using OpenOCD. Unfortunately, we ran out of time to finish the last labs during the training (we ran 9 am-6 pm each day). So I decided to follow along with the very comprehensive notes and finish the last lab.

Note that this is not meant as any criticism of an excellent course. Ideally, I would have liked an extra day (it was already four days). As an instructor, I know how often you run out of time based on student questions and lab work.

Once you’ve got a raw binary file, the challenge is to disassemble it. Many toolchains supply GNU Binutils tools, such as GCC’s arm-none-eabi-objdump. However, in my experience, these tend to have limited success with raw binary files (I’m sure people far more skilled than myself have greater success).

The most widely referenced tool for reverse engineering code is IDA Pro. IDA Pro is a powerful commercial tool, and I can see why it’s the tool of choice for many professionals. However, the free version doesn’t support Arm (Intel only), and the full version is out of the price range for the casual experimenter.

National Security Agency – Ghidra Software Reverse Engineering Framework

Yes, you’ve read that correctly, the NSA. At the 2019 RSA Conference, the NSA published a press release announcing the release of a tool for reverse engineering. They now have their very own GitHub account.

Taken from the Ghidra GitHub account

Ghidra is a software reverse engineering (SRE) framework created and maintained by the National Security Agency Research Directorate. This framework includes a suite of full-featured, high-end software analysis tools that enable users to analyze compiled code on various platforms, including Windows, macOS, and Linux.

So, let’s give it a go…


Posted in ARM, Cortex, Security, training

Using final in C++ to improve performance

Dynamic polymorphism (virtual functions) is central to Object-Oriented Programming (OOP). Used well, it provides hooks into an existing codebase where new functionality and behaviour can (relatively) easily be integrated into a proven, tested codebase.

Subtype inheritance can bring significant benefits, including easier integration, reduced regression test time and improved maintenance.

However, using virtual functions in C++ brings a runtime performance overhead. This overhead may appear inconsequential for individual calls, but in a non-trivial real-time embedded application, these overheads may build up and impact the system’s overall responsiveness.

Refactoring an existing codebase late in the project lifecycle to try and achieve performance goals is never a welcome task. Project deadline pressures mean any rework may introduce potential new bugs to existing well-tested code. And yet we don’t want to perform unnecessary premature optimization (as in avoiding virtual functions altogether) as this tends to create technical debt, which may come back to bite us (or some other poor soul) during maintenance.

The final specifier was introduced in C++11 to ensure that a class cannot be derived from, or that a virtual function cannot be further overridden. However, as we shall investigate, it also allows the compiler to perform an optimization known as devirtualization, improving runtime performance.
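The following minimal sketch makes the devirtualization point concrete. The Sensor and Thermistor types are hypothetical and do not come from the post. Whether a call is actually devirtualized depends on the compiler and optimization level, so the comments describe what final permits rather than what every compiler does.

```cpp
#include <cassert>

// Hypothetical interface for illustration.
class Sensor {
public:
    virtual ~Sensor() = default;
    virtual int read() const = 0;
};

// 'final': no class can derive from Thermistor, so Thermistor::read()
// can never be overridden again.
class Thermistor final : public Sensor {
public:
    int read() const override { return 42; }   // placeholder reading
};

// Through a base reference the dynamic type is unknown, so the call
// normally goes through the vtable.
int read_via_base(const Sensor& s) {
    return s.read();
}

// Through a reference to a final class the dynamic type must be exactly
// Thermistor, so the compiler may call Thermistor::read() directly
// (and even inline it) instead of dispatching through the vtable.
int read_via_final(const Thermistor& t) {
    return t.read();
}

// Usage: both paths give the same answer; only the dispatch cost can differ.
inline void check_sensor() {
    Thermistor t;
    assert(read_via_base(t) == read_via_final(t));
}
```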

Interfaces and subtyping

Unlike Java, C++ does not explicitly have the concept of Interfaces built into the language. Interfaces play a central role in Design Patterns and are the principal mechanism for implementing the ‘D’ in SOLID, the Dependency Inversion Principle.

Simple Interface Example

Let’s take a simplified example: we have a mechanism layer defining a class named PDO_protocol. To decouple the protocol from the underlying utility layer, we introduced an interface called Data_link. The concrete class CAN_bus then realizes the Interface.

This design would yield the following Interface class:

Side note: I’ll park the discussion about using pragma once, virtual-default-destructors and pass-by-copy for another day.

The client (in our case, PDO_protocol) is only dependent on the Interface, e.g.

Any class realizing the Interface, such as CAN_bus, must override the pure-virtual functions in the Interface:

Finally, in main, we can bind a CAN_bus object to a PDO_protocol object. The calls from PDO_protocol invoke the overridden functions in CAN_bus.

Using dynamic polymorphism

It then becomes very straightforward to swap out the CAN_bus for an alternative utility object, e.g. RS422 :

In main, we bind the PDO_protocol object to the alternative class.

Importantly, there are no changes to the PDO_protocol class. With appropriate unit testing, introducing the RS422 code into the existing codebase involves integration testing (rather than a blurred unit/integration test).

There are many ways we could create the new type (e.g. using factories), but, again, let’s park that for this post.
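The post’s own listings for these classes are not reproduced in this extract, so what follows is only a minimal sketch of the structure described above. The send() and transmit() member functions, their signatures, the bytes_sent counters and bind_and_send() are all assumptions made for illustration. Real CAN_bus and RS422 classes would drive the hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Data_link interface: send() and its signature are assumptions.
class Data_link {
public:
    virtual ~Data_link() = default;
    virtual void send(std::uint32_t id, const std::vector<std::uint8_t>& payload) = 0;
};

// The client depends only on the interface, never on a concrete bus.
class PDO_protocol {
public:
    explicit PDO_protocol(Data_link& link) : link{link} {}
    void transmit(std::uint32_t id, const std::vector<std::uint8_t>& data) {
        link.send(id, data);   // virtual call, resolved at run time
    }
private:
    Data_link& link;
};

// Concrete realizations; these only count bytes instead of driving hardware.
class CAN_bus : public Data_link {
public:
    void send(std::uint32_t, const std::vector<std::uint8_t>& payload) override {
        bytes_sent += payload.size();
    }
    std::size_t bytes_sent{};
};

class RS422 : public Data_link {
public:
    void send(std::uint32_t, const std::vector<std::uint8_t>& payload) override {
        bytes_sent += payload.size();
    }
    std::size_t bytes_sent{};
};

// Usage: the unchanged PDO_protocol code runs over either link.
inline void bind_and_send() {
    CAN_bus can;
    RS422 serial;
    PDO_protocol over_can{can};
    PDO_protocol over_rs422{serial};
    over_can.transmit(0x181, {1, 2, 3});
    over_rs422.transmit(0x181, {4, 5});
    assert(can.bytes_sent == 3);
    assert(serial.bytes_sent == 2);
}
```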

The cost of Dynamic Polymorphic behaviour

Using subtyping and polymorphic behaviour is an important tool when trying to manage change. But, like all things in life, it comes at a cost.


Posted in C/C++ Programming, Design Issues

Understanding Arm Cortex-M Intel-Hex (ihex) files

Creating a flash image

The primary purpose of the ihex file in the embedded space is to create a file that is used to program/reprogram a target system. There are various file formats around, with the Intel Hex (ihex) format being among the most widely used.

The output of the linker stage of a build process is typically a .elf file (Executable and Linkable Format). Many debuggers and programmers can work directly with the ELF file format. However, in many cases the availability of the significantly smaller ihex file can improve reprogramming times (especially for Over-The-Air updates).

The ihex file is generated from the .elf file by arm-none-eabi-objcopy, e.g.

$ arm-none-eabi-objcopy -O ihex <filename>.elf <filename>.hex

One other advantage of the ihex file format is that it is a plain text file. The details of ihex file format are covered, as expected, in Wikipedia.

ihex raw file

For our explanation, we are going to dissect an ihex file generated by the Arm GCC toolchain. Given a raw hex file, such as

:020000040800F2
:10000000000002208D0100089301000897010008FC
:10001000090200080D020008550200080000000057
:100020000000000000000000000000009D02000829
...
:100180008901000889010008FEE7FFFF08B500F0BB
:1001900093F800BEFEE71EF0040F0CBFEFF30880DB
:1001A000EFF309807146414A10472DE9F04182B0D2
:1001B00004468846124B5E6B9F6B9D6A1B6B13F067
...
:10769400000041534349490000000000000000007D
:1076A40000000000000000000000000000000000D6
:0876B40000000000325476983A
:04000005080002B934
:00000001FF

we can start to deconstruct it.
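Before looking at the individual record types, it may help to see the common record layout in code. Each line is ':' followed by hex pairs for:

  • the byte count
  • a 16-bit big-endian address
  • the record type
  • the data bytes
  • a checksum

All the bytes, including the checksum, sum to zero modulo 256. This is a minimal sketch, and `Ihex_record` and `parse_record` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical decoded form of one record: ":LLAAAATT<data>CC".
struct Ihex_record {
    std::uint8_t  count{};              // LL: number of data bytes
    std::uint16_t address{};            // AAAA: 16-bit offset, big-endian
    std::uint8_t  type{};               // TT: 00 data, 01 EOF, 04 ELAR, 05 start address
    std::vector<std::uint8_t> data;     // LL data bytes
};

// Value of one hex digit, or -1 if c is not a hex digit.
constexpr int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decode a record; returns std::nullopt if it is malformed or the checksum fails.
std::optional<Ihex_record> parse_record(std::string_view line) {
    // Shortest record is ":00000001FF" (11 characters); hex pairs follow the ':'.
    if (line.size() < 11 || line[0] != ':' || (line.size() - 1) % 2 != 0) {
        return std::nullopt;
    }
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 1; i + 1 < line.size(); i += 2) {
        const int hi = hex_digit(line[i]);
        const int lo = hex_digit(line[i + 1]);
        if (hi < 0 || lo < 0) return std::nullopt;
        bytes.push_back(static_cast<std::uint8_t>(hi * 16 + lo));
    }
    // count + address (2) + type + data + checksum
    if (bytes.size() != bytes[0] + 5u) return std::nullopt;

    // All bytes, including the checksum, must sum to 0 modulo 256.
    unsigned sum = 0;
    for (std::uint8_t b : bytes) sum += b;
    if ((sum & 0xFFu) != 0) return std::nullopt;

    Ihex_record rec;
    rec.count   = bytes[0];
    rec.address = static_cast<std::uint16_t>((bytes[1] << 8) | bytes[2]);
    rec.type    = bytes[3];
    rec.data.assign(bytes.begin() + 4, bytes.end() - 1);
    return rec;
}

// Usage: the first line of the listing above.
inline void check_elar_line() {
    const auto rec = parse_record(":020000040800F2");
    assert(rec && rec->type == 0x04 && rec->data.size() == 2);
    assert(rec->data[0] == 0x08 && rec->data[1] == 0x00);
}
```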

Extended Linear Address Records (ELAR)

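This first line is an extended linear address record (ELAR). This record contains the upper 16 bits of the Cortex-M 32-bit address range. The ELAR always has two data bytes and is broken down into the following fields:

  • ':' start code
  • 02 byte count (two data bytes)
  • 0000 address field (not used by this record type)
  • 04 record type (extended linear address)
  • 0800 data: the upper 16 bits of the address
  • F2 checksum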
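The upper 16 bits from the ELAR are combined with the 16-bit address field of each following data record to give the full 32-bit load address. A minimal sketch, where `absolute_address` is a hypothetical helper name:

```cpp
#include <cstdint>

// Combine the ELAR's upper 16 bits with a data record's 16-bit address field.
constexpr std::uint32_t absolute_address(std::uint16_t elar_upper,
                                         std::uint16_t record_address) {
    return (static_cast<std::uint32_t>(elar_upper) << 16) | record_address;
}

// ":020000040800F2" sets the upper half to 0x0800, so the data record
// ":10000000..." loads at 0x08000000 (on STM32 devices, for example,
// the start of on-chip Flash) and ":10018000..." loads at 0x08000180.
static_assert(absolute_address(0x0800, 0x0000) == 0x08000000);
static_assert(absolute_address(0x0800, 0x0180) == 0x08000180);
```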

Posted in ARM, Build-systems, C/C++ Programming, Cortex, Toolchain

Working with Strings in Embedded C++

In this post, by Embedded I’m generally referring to deeply embedded/bare-metal systems as opposed to Linux-based embedded systems.

Embedded systems and strings

Historically, the need for and thus the use of strings in embedded systems was fairly limited. However, this has changed with the advent of cheaper, full graphic displays and the growth of the ‘Internet of Things’ (IoT).

Many embedded systems sport full-colour graphics displays, supported by embedded-specific graphics libraries, including:

  • free open-source – e.g. LVGL
  • vendor-specific – e.g. TouchGFX from STMicroelectronics
  • fully specialised graphics environments – e.g. Qt for MCUs.

Naturally, these environments will use strings extensively for labels, message boxes, alerts, etc.

Many of the major IoT frameworks utilise web services built on top of HTTP, such as REST. In conjunction with the web services, embedded applications utilise data interchange formats such as XML, XCAP or JSON. Both XML and JSON require character encoding based on ISO/IEC 10646, such as UTF-8.

Character literals

Modern C++ extends the character literal model to support ISO 10646 character encoding. C++11 (and C11) added UTF-16 and UTF-32 support, with C++20 finally adding a dedicated UTF-8 character type, char8_t.

int main()
{
  char     c1{ 'a' };       // 'narrow' char
  char8_t  c2{ u8'a' };     // UTF-8  - (C++20)
  char16_t c3{ u'貓' };     // UTF-16 - (C11/C++11)
  char32_t c4{ U'\U0001F600' }; // UTF-32 - (C11/C++11)
  wchar_t  c5{ L'β' };      // wide char - wchar_t
}
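The choice of character type matters because a single character can need more than one code unit. The sketch below counts code units in string literals (`code_units` is a hypothetical helper). It uses universal character names so the result does not depend on the source file’s encoding. It also shows why a character outside the Basic Multilingual Plane needs a char32_t rather than a char16_t character literal.

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+03B2 (Greek small letter beta)
static_assert(code_units(u8"\u03B2") == 2);   // UTF-8:  bytes 0xCE 0xB2
static_assert(code_units(u"\u03B2")  == 1);   // UTF-16: one char16_t
static_assert(code_units(U"\u03B2")  == 1);   // UTF-32: one char32_t

// U+1F600, outside the Basic Multilingual Plane
static_assert(code_units(u8"\U0001F600") == 4);  // UTF-8:  four bytes
static_assert(code_units(u"\U0001F600")  == 2);  // UTF-16: a surrogate pair
static_assert(code_units(U"\U0001F600")  == 1);  // UTF-32: always one unit
```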


C Strings

Null-Terminated Byte Strings (NTBS)

A ‘C-Style’ string is any null-terminated byte string (NTBS), which is a sequence of nonzero bytes followed by a byte with zero (0) value (the terminating null character). The terminating null character is represented as the character literal '\0'.

The length of an NTBS is the number of elements that precede the terminating null character. An empty NTBS has a length of zero.
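The definition translates directly into a loop. This hand-written equivalent of std::strlen (`ntbs_length` is a hypothetical name, for illustration only) counts the elements before the first null character. As a result, a string with an embedded '\0' has a shorter length than its array.

```cpp
#include <cstddef>

// Length of an NTBS: the number of elements before the terminating '\0'.
constexpr std::size_t ntbs_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

static_assert(ntbs_length("") == 0);              // empty NTBS
static_assert(ntbs_length("Hello World") == 11);
static_assert(ntbs_length("ab\0cd") == 2);        // stops at the first '\0'
```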

A string literal (constant) is a sequence of characters surrounded by double quotes (" ").

#include <cstring>
#include <iostream>

int main()
{
  char message[] = "Hello World";

  std::cout << sizeof(message) << '\n';       // 12: includes the terminating '\0'
  std::cout << std::strlen(message) << '\n';  // 11: characters before the '\0'
}


In C/C++, single quotes (') are used to identify character literals. Single quotes (' ') cannot be used (unlike some other programming languages) to represent strings.

C-Strings and string literals

What is the difference in the memory model between the following two program object definitions?

#include <iostream>

int main()
{
   char message[] = "this is a string";
   std::cout << sizeof(message) << '\n';

   const char *msg_ptr = "this is a string";
   std::cout << sizeof(msg_ptr) << '\n';
}
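One way to see part of the answer is to compare what each definition actually stores. This sketch uses namespace-scope copies of the two definitions for illustration. The array holds its own writable copy of the characters. The pointer is just the address of the literal, which has static storage duration and must not be modified.

```cpp
#include <cassert>

// An array initialized from the literal: 16 characters plus '\0', writable.
char message[] = "this is a string";

// A pointer to the literal itself; string literals are const.
const char* msg_ptr = "this is a string";

static_assert(sizeof(message) == 17);                   // the whole array
static_assert(sizeof(msg_ptr) == sizeof(const char*));  // only the pointer

// Usage: only the array copy can be changed.
inline void capitalise() {
    message[0] = 'T';     // fine: message is a modifiable copy
    // msg_ptr[0] = 'T';  // would not compile: msg_ptr points to const char
    assert(message[0] == 'T' && msg_ptr[0] == 't');
}
```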


Posted in C/C++ Programming

TDD in C with Ceedling and WSL2 – performance issues

Ceedling is still probably the best Test-Driven Development (TDD) environment for C programmers out there. But, as with many Free Open-Source Software (FOSS) tools, getting it to work natively in a Windows environment involves the odd hoop-jumping exercise: either messing around with the likes of Cygwin or MinGW, or using a full Virtual Machine (VM) environment such as VirtualBox or VMware.

However, with the introduction of Windows-Subsystem-for-Linux (WSL) and the much-improved update to WSL2, running Linux centric FOSS on a Windows machine has become more straightforward.

Installing Ceedling on WSL2 follows the normal Linux process of installing Ruby and then using the Ruby-based gem to install Ceedling.

Unfortunately, the current WSL2 environment has one major performance issue. When working in WSL2 with a typical Linux shell (e.g., bash), there is, in effect, a local (Linux) filesystem (where the home directory ‘~’ is located) and a ‘mounted’ filesystem for the Windows files.

When running a Linux shell using WSL2, the Windows filesystem is accessed using a mount location, such as

/mnt/c/Users/<username>/<windows users filesystem>

For example, on my Win10 laptop, my local Documents folder (C:\Users\NiallCooling\Documents) is located at:

/mnt/c/Users/NiallCooling/Documents

For many organisations, there are good reasons to locate the working project files in the Windows filesystem. Significantly, many traditional embedded development tools are still (frustratingly) Windows-only. Also, many companies’ IT systems do not allow Linux machines on the company network (all they know is Windows, and it is easier to keep their lives simple by just not allowing anything but Windows!).

The update from WSL to WSL2 improves many aspects of working within a Linux environment (mainly due to it moving to a full Linux kernel implementation) but at the expense of Windows file access performance.

For example, building a small Ceedling-based project located on the Windows filesystem from WSL2 gives the following (using time):

real  0m16.505s
user  0m1.399s
sys   0m2.200s

Whereas building the same project mounted in the WSL2 file system gives

real  0m0.576s
user  0m0.511s
sys   0m0.050s

Note: both projects were clobbered (cleaned) before building

Obviously, your mileage may vary, but it will still likely be a significant performance difference.

Running on the WSL2 file system

This assumes the project should exist primarily in the Windows environment

There are three approaches to mirroring a Windows codebase in WSL2.

  • Copy the Windows-based project code to the WSL2 Linux file system
  • Remote cloning to both Windows and WSL2
  • Local cloning from Windows to WSL2

Copying from Windows to Linux

This may be the most obvious but is not recommended. The WSL2 filesystem can be accessed from Windows at the location

\\wsl$\<Linux-distro-name>\home\<user>\

For example, my WSL2 home (~) is at:

\\wsl$\Ubuntu\home\niall\

When using Windows File Explorer, the WSL2 filesystem is found in the navigation pane under Network. You can then copy the project files from the Windows drive (e.g. C:\Users\NiallCooling\Documents\project) to the WSL2 location.

However, working with a copy bypasses version control, and, ultimately, you’re probably going to end up with inconsistencies in the project codebase unless you are very careful.

Remote cloning to Linux

In many ways, cloning from the project’s remote repository (e.g. hosted on GitHub or BitBucket) is probably the best option. This way, all changes are managed through version control back to the remote repository.

The downside of this approach is that if you need to switch regularly between the Windows and Linux environments, the constant commit/push/pull cycle can feel cumbersome. It is easy to change environments and forget to pull (or even to have pushed) from the remote, thus getting the project code out of sync.

In addition, depending on your internet bandwidth (especially when working from home), the push/pull cycle can negate some of the time savings of working in the Linux environment in the first place.

Local cloning from Windows to Linux

A nice ‘halfway house’ between copying and remote-cloning is to locally clone. It is easy to forget that git has always supported simple file-based cloning using the Local Protocol capability.

A Windows-based git project can be locally cloned to the Linux filesystem rather than cloning from a remote-hosted repository using the Local Protocol. Any changes committed when working on the Linux codebase can simply be pushed back to the Windows clone, thus eliminating the requirement to ‘pull’ when returning to the Windows-based project code.

To clone a Windows-based git project:

git clone file:///mnt/c/Users/<username>/<project folder> 

Note the three slashes: the Local Protocol URL is file:// followed by the absolute path /mnt/…
For example:

git clone file:///mnt/c/Users/NiallCooling/Documents/projects/tddc-wsl

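You can create a local WSL branch (if needed) and add/commit/push as usual, but, importantly, the push goes back to the Windows filesystem, not the remote repo. By default, git rejects a push to the branch that is currently checked out in the Windows repository, so pushing a separate WSL branch avoids that problem.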

Summary

The addition of Linux support in Windows 10 through WSL2 is a helpful addition to an embedded programmer’s toolbox (especially combined with Docker and VS Code). Nevertheless, the current performance issues of using Windows-hosted projects directly in WSL2 may give a negative first impression of the overwhelming benefits it brings.

Hopefully, this little ‘hack’ means you can enjoy all the benefits of Linux on a Windows machine without a performance bottleneck.

Finally, a thank you to Robin for raising this issue.

Posted in Build-systems, C/C++ Programming, Linux, Testing

C++20 Coroutine Iterators

In my first blog post about C++20 Coroutines I introduced the concepts behind a synchronous or generator style coroutine and developed a template class to support coroutines for any data type.

In this post I’ll add an iterator to the template to support the range-for loop and iterative algorithms. You may want to review that post before reading this one but the following code should act as a reminder about how to write and use a coroutine to read two floating point values into a data structure for subsequent analysis.

struct DataPoint { 
    float timestamp; 
    float data; 
}; 

Future<DataPoint> read_data(std::istream& in) 
{ 
    std::optional<float> first{}; 
    auto raw_data = read_stream(in); 
    while (auto next = raw_data.next()) { 
        if (first) { 
            co_yield DataPoint{*first, *next}; 
            first = std::nullopt; 
        } 
        else { 
            first = next; 
        } 
    } 
}
static constexpr float threshold{25.0}; 

int main() 
{ 
    std::cout << std::fixed << std::setprecision(2); 
    std::cout << "Time (ms)   Data" << std::endl; 
    auto values = read_data(std::cin); 
    while (auto n = values.next()) { 
        std::cout << std::setw(8) << n->timestamp 
                  << std::setw(8) << n->data 
                  << (n->data > threshold ? " ***Threshold exceeded***" : "") 
                  << std::endl; 
    } 
    return 0; 
}

The full example of this code is in the files future.h and datapoint_demo.cpp in the accompanying GitHub repo coroutines-blog.

Using Iterative For Loops

To add support for the C++11 range-for loop in the Future template described in the coroutines blog post (the coroutine approach used by Python and C#) we need to add support for iterating over the coroutine sequence of values.

To differentiate this refactored version of the template from the one in the last post, the class has been renamed to Generator, which more accurately describes its purpose.

Firstly, to simplify the examples using the coroutines, we add operator << support for DataPoint objects:

static constexpr float threshold{21.0};

std::ostream& operator<<(std::ostream& out, const std::optional<DataPoint>& dp)
{
    if (dp) {
        // write to the stream we were given, not always to std::cout
        out << std::fixed << std::setprecision(2);
        out << std::setw(8) << dp->timestamp
            << std::setw(8) << dp->data
            << (dp->data > threshold ? " ***Threshold exceeded***" : "");
    }
    return out;
}

We now want to add support to the generator so that we can refactor our client code to use the range-for loop:

std::cout << "Time (ms)   Data" << std::endl;
for (auto&& dp: read_data(std::cin)) {
    std::cout << dp << std::endl;
}

This is equivalent to the traditional C++ iteration loop:

auto stream = read_data(std::cin);
std::cout << "Time (ms)   Data" << std::endl;
for (auto it = stream.begin(); it != stream.end(); ++it) {
    std::cout << *it << std::endl;
}

Support for range-for loops requires iterator support to be added to the Generator template. To do this we provide a class that encapsulates the ability to iterate (step through) each value in the coroutine and stop at the end of the sequence which requires an iterator type that supports:

  • accessing the current value in the sequence via operator*
  • moving the iterator forward one value using operator++
  • checking for the end of the sequence with operator==

In the Generator class we add a begin method to create an iterator object and an end method which returns an object used when testing for the end of the sequence (it is a little more complex than that but we want to keep this discussion short and to the point).

Following the style of the standard library containers, we define our iterator type as a nested class named iterator in the Generator class.

A traditional container such as std::vector stores all values in memory and can access them randomly. Our coroutine, in contrast, can only store the latest value in the sequence. In C++ terms our generator supports a simple input_iterator, for which we need only construct one iterator object (in the Generator::begin method) that points to the underlying Promise object. If you are interested you can read more about iterator styles and concept requirements on this iterators page.

To conform to C++20 iterator concepts our nested iterator class has to define a set of type traits and a default constructor:

class iterator
{
public:
    using value_type = Promise::value_type;
    using iterator_category = std::input_iterator_tag;
    using difference_type =  std::ptrdiff_t;
    
    iterator() = default;
    iterator(Generator& generator) : generator{&generator}
    {}

    // iterator methods below

private:
    Generator* generator{};
};

The value_type identifies the type of object that the iterator accesses. In this case it is a std::optional<DataPoint>, which we already have as a type alias in our Promise class. The other two type aliases are required to ensure C++ algorithms generate the correct code for an input iterator.

The iterator requires access to the underlying Generator object so it can retrieve the value from the Promise so we pass that as a constructor argument and store it in a private variable.

An iterator behaves like a pointer into the underlying sequence, so we supply an operator* method to retrieve the current iteration value. The recommended approach for an iterator is to return a reference to the original data (or a constant reference for read-only access). Instead, we simply return the value from the promise: it’s a simpler solution and is sufficient for our purposes.

value_type operator*() const { 
    if (generator) {
        return generator->handle.promise().get_value();
    }
    return {}; 
}

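We should also provide operator-> for accessing members of the current value:

value_type operator->() const {
    // std::optional has its own operator->, so it->timestamp chains
    // through this temporary optional to the DataPoint it holds.
    if (generator) {
        return generator->handle.promise().get_value();
    }
    return {};
}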

Next we require the ability to move the iterator forward using operator++:

iterator& operator++() {
    if (generator&& generator->handle) {
        generator->handle.resume();
    }
    return *this;
}

iterator& operator++(int) {
    return ++(*this);
}

The iterator concepts require both the prefix and postfix versions, even though we’re not going to use the postfix version. The increment operators simply resume the coroutine, which executes code up to the next yield statement (or the end of the coroutine). Note that a C++ input iterator does not need to allow access to previous data values, which is why we only need the one iterator object.

We can now write our Generator::begin method to create the iterator and step forward to the first sequence value:

template <typename T>
class Generator
{
    class Promise {...};
    class iterator {...};

public:
    iterator begin() {
        auto it = iterator{*this};
        return ++it;
    }

    // omitted code
};

The last requirement of the iterator is to identify the end of the sequence so that the for loop can terminate as shown in the traditional form of iteration:

for (auto it = stream.begin(); it != stream.end(); ++it)

This is a little more complex with our generator’s input iterator than it is with a container iterator (such as std::vector) because we don’t have all the values stored in memory locations. Instead we have to identify when we have reached the end of the coroutine and test for that condition.

Firstly we add a finished method to the Promise class to simplify testing for the end of the coroutine so that our revised Promise class is:

class Promise
{
public:
    using value_type = std::optional<T>;
    // coroutine lifecycle methods

    value_type get_value() {
       return std::move(value);
    }

    bool finished() {
        return !value.has_value();
    }

private:
    value_type value{};
};

Remember that the template supports movable types so we have to make sure that testing for the end of the coroutine does not read (move) and discard the value in the generator.
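To see why the end-of-coroutine test must not call get_value, here is a small sketch using a hypothetical ValueHolder class (its set, clear and peek methods are illustrative, not part of the article’s Promise) that mimics the Promise’s storage. A moved-from std::optional stays engaged; only its contained value is moved from:

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical stand-in for the Promise's value handling.
template <typename T>
class ValueHolder
{
public:
    using value_type = std::optional<T>;

    void set(T v) { value = std::move(v); }   // as in yield_value()
    void clear() { value = std::nullopt; }    // as in return_void()

    // Moves the payload out: the optional itself stays engaged,
    // but now holds a moved-from T.
    value_type get_value() { return std::move(value); }

    // Only inspects the optional, so it can be called any number of times.
    bool finished() const { return !value.has_value(); }

    // Read-only view, used here just to observe the moved-from state.
    const value_type& peek() const { return value; }

private:
    value_type value{};
};

// A move-only payload such as std::unique_ptr.
using MoveOnlyHolder = ValueHolder<std::unique_ptr<int>>;
```

If the end test called get_value instead of finished, the element would be moved out during the loop condition and operator* would then see an empty, moved-from std::unique_ptr.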

Prior to C++17, the range-for loop required begin and end to return the same type of object, which meant we had to provide the following comparison method:

// end of iteration comparison prior to C++17
bool operator== (const iterator&) const {
    return generator ? generator->handle.promise().finished() : true;
}

The comparison indicates end of iteration if the coroutine has finished or if there is no generator object because the default constructor was used.

This works but does not properly capture the concept of an input iterator as the following nonsensical test will return false if the coroutine has not ended:

stream.begin() == stream.begin()

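To resolve this problem, C++17 relaxed the range-for loop so that begin and end may return different types. This allows an iterator sentinel: a different type (class) used to mark the end of the iteration loop, a concept formalised by the C++20 iterator concepts and the std::ranges library. For our coroutine we define a data type to act as a marker (sentinel) to indicate that the coroutine has finished:

struct end_iterator {};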

We provide an operator== for this sentinel type, and this is the only operator== we should provide because our previous comparison between two iterator objects does not make sense for input iterators. The test uses the promise’s finished method rather than get_value so that checking for the end does not move the current value out of the promise:

bool operator== (const end_iterator&) const {
    return generator ? generator->handle.promise().finished() : true;
}

C++20 has made some sweeping changes to the way compilers must support comparison operators which affects our operator== method.

We no longer need to provide a corresponding operator!= as the compiler will use the inverted value of our operator==. Neither do we need to implement comparison as friend functions using two versions of each operator with the operands swapped around to support equivalent comparisons such as the following:

generator.begin() == generator.end()
generator.end() == generator.begin()

Under C++20 the compiler must also consider the rewritten and reversed (operands swapped) forms of our operator== when resolving == and != expressions. In other words, the single operator== method is all we now need to supply. You will notice that many C++20 standard library classes have had redundant comparison operator definitions removed.

Now that we have the complete iterator class, we can add the sentinel type and the begin and end methods to our Generator:

template <typename T>
class Generator
{
    class Promise {...};

public:
    struct end_iterator {};
    class iterator {...};

    iterator begin() {
        auto it = iterator{*this};
        return ++it;
    }

    end_iterator end() {
        return end_sentinel;
    }

private:
    end_iterator end_sentinel{};
};

The Generator class now supports the range-for loop we showed earlier:

std::cout << "Time (ms)   Data" << std::endl; 
for (auto&& dp: read_data(std::cin)) { 
    std::cout << dp << std::endl; 
}

The iterator also supports standard algorithms that use an input iterator such as copy or transform. There is only one complication with algorithms and that is our use of an end sentinel object. To maintain backward compatibility with existing code the end sentinel versions of the algorithms are defined in the std::ranges namespace so we can use std::ranges::copy and an ostream_iterator to display our data values:

auto stream = read_data(std::cin);
std::cout << "Time (ms)   Data" << std::endl;
std::ranges::copy(stream.begin(), stream.end(),
    std::ostream_iterator<std::optional<DataPoint>>(std::cout,"\n"));

Sometimes C++ can be elegantly simple like a duck floating on a river; but for coroutines we have to be aware of the frantic paddling under the surface just to hold position.

The full example of this code is in the files generator.h and iterator_demo.cpp in the accompanying GitHub repo coroutines-blog. The repo also contains an example of a generator using a movable type (std::unique_ptr) in the file iterator_move_demo.cpp.

Posted in C/C++ Programming

C++20 Coroutines

There seems to be a lot of confusion around the implementation of C++20 coroutines, which I think is due to the draft technical specification for C++20 stating that coroutines are a work in progress so we can’t expect full compiler and library support at this point in time.

A lot of the problems probably arise from the lack of official documentation about working with coroutines. We have been given C++ syntax support for coroutines (the co_yield and co_return keywords) but not what I consider full library support. The standard library has hooks and basic functionality for supporting coroutines, but we must incorporate this into our own classes. I anticipate that there will be full library support for generator style coroutines in C++23.

The C++20 specification is obviously looking to provide support for parallel (or asynchronous) coroutines using co_await, which makes implementing simpler generator-style synchronous coroutines more complex. The implementation requirements for our coroutines use a Future and Promise mechanism similar to the std::async mechanism for asynchronous threads.

If you are a Python or C# developer expecting a simple coroutine mechanism, you’ll be disappointed because the C++20 general purpose framework is incomplete. Having said that, there are many blogs and articles on the web that include a template class that supports generator style coroutines. This blog contains a usable coroutine template and example code after discussing what a coroutine is.

What is a Coroutine?

I first came across coroutines via the yield statement in CLU which, like generators in Python (and yield return in C#), are defined using function syntax and accessed using the for loop syntax. They were described as cooperating routines (not concurrent routines) which execute on a single thread. There are other styles of coroutines, and Wikipedia provides a good starting point for comparing functions, generators and threads.

For this blog, I’ll concentrate on coroutines that execute in the context of the caller and allow two separate blocks of code to interleave flow-of-control between them.

The new C++20 co_yield statement allows one routine to provide a piece of data and, at the same time, return control flow to the calling routine for processing. This is a long-winded way of saying they provide a single threaded implementation of the producer-consumer pattern.

We can see a classic coroutine’s producer-consumer interaction in the following UML Sequence Diagram:

The control bars on the diagram show the flow of control moving from one routine to another.

When the flow of control is transferred from one routine to another, the current state of the routine must be saved and then restored when the routine resumes. In the case of the consumer, this happens as part of the usual function call mechanism where the current stack frame holds the state of the routine. In the case of the producer (the coroutine), extra support from the compiler and runtime system is required to save the producer’s stack frame whenever a value is yielded up to the consumer.

The C++20 specification says that the coroutine state is saved on the heap which means they are not suitable for embedded systems that do not use dynamic memory. But, the specification does state that a given implementation can optimise away heap usage if:

  • the lifetime of the coroutine is strictly within the lifetime of the caller
  • the size of coroutine state can be determined at compile time

In practice, the simple generator coroutines we are considering in this blog meet these criteria and could save the coroutine state in the caller’s stack frame. Examining the heap usage for both examples in this blog shows that GCC-11 and Clang-12 (the latest at the time of writing) both use the heap to save the coroutine state. Given that compiler support for coroutines is relatively new and evolving, it is quite possible that later versions may optimise this code, or support compiler options to enable or disable saving coroutine state in dynamic memory.

To support saving and restoring the coroutine state, we must provide a supporting class that integrates with the coroutine library support provided in the <coroutine> header file. This is where the current complexity of implementing a coroutine lies.

C++20 Coroutine Support

To put C++20 coroutines into context, we can create a coroutine to yield up “Hello world!” as three separate objects as follows (note that we need to include the <coroutine> header file):

#include <coroutine>

X coroutine()
{
    co_yield "Hello ";
    co_yield "world";
    co_return "!";
}

The first thing to note is that this is not a function definition! We have just used the function syntax to define a block of code that can be passed arguments when instantiated. A function would have a return statement (or an implied return for void functions), whereas here, we yield three separate values. Note that you cannot put a return statement in a coroutine.

Secondly, we return some unknown (for now) object of type X. This is the object that implements the coroutine. The compiler will reorder our code block to implement the coroutine mechanism with its save and restore state code, but it currently needs a little help from us in writing this supporting class X.

Inside the coroutine code block, we use co_yield to yield a value and save the coroutine state, and co_return to yield a value and discard the state.

This is a naive example of a coroutine in that it must be used exactly three times as shown by this example:

auto x = coroutine();
std::cout << x.next();
std::cout << x.next();
std::cout << x.next();
std::cout << std::endl;

Once we have consumed all of the yield values, the coroutine terminates and releases all memory used to store its state.

In our example, the coroutine object has a next() method which will:

  1. suspend the current consumer code
  2. restore the state of the coroutine (producer)
  3. resume the coroutine code from the last yield statement (or the start of the code block)
  4. save the value of the next yield statement
  5. save the coroutine state
  6. restore the consumer state
  7. resume the consumer by returning the saved value from the yield statement

Currently, there is no standard library template for our class X. Hence, we have to look at the available library support for coroutines. An example templated class is shown later when we look at a practical application of coroutines but at the moment we’ll look at a basic hard-coded example.

To write our own coroutine support class X we need to support the lifecycle operations by providing implementations for specific methods. The C++20 standard defines these method requirements using concepts which we introduce in our blog posts for concepts part1 and part2. To keep this blog focused on coroutines we’ll adopt the traditional C++ approach of stating the implied methods we need to provide as part of our class.

Unfortunately, C++20 coroutines require us to provide two inter-related supporting classes:

  • a class to save the coroutine state and save the yield data – usually called the promise
  • a class to manage the coroutine (promise) object – this is our class X, traditionally called the future

In the promise object we will need to provide several lifecycle methods. For now, we’ll just look at supporting the yield statements and ignore the methods required for managing the coroutine state.

As our coroutine uses a co_yield statement with a const char* value we need a method with the following signature:

std::suspend_always yield_value(const char* value);

The argument is the yielded object, and the return type tells the runtime system whether to suspend the coroutine (saving its state). For a single-threaded coroutine we always want to suspend, so we return a std::suspend_always object. Other awaitable types, such as std::suspend_never, are used when building asynchronous coroutines, but that leads to a lot of complications about managing and resuming suspended coroutines which we don’t want to get involved with for our simple synchronous coroutine.

The yield_value method must save its argument so it can be returned to the calling routine (consumer). A typical implementation is:

std::suspend_always yield_value(const char* value) {
    this->value = value;
    return {};
}

If you haven’t already come across the modern C++ syntax of return {}, it simply means create a default constructed object of the return type for this method. We could have also used return std::suspend_always{}.

To support the co_return statement which yields a value but doesn’t save state we need a second lifecycle method:

void return_value(const char* value) {
    this->value = value;
}

As co_return terminates the coroutine, the lifecycle function has void return type because the coroutine state will be destroyed.

Without looking in detail at the implementation of class X we can show how the compiler might expand our consumer code into inline sequential operations to see the lifecycle methods. In the following code the method promise() provides access to the promise object which saves the coroutine state and the yield value. A next() method can retrieve the saved value from the promise:

auto x = coroutine();
x.promise().yield_value("Hello "); // save value and state
std::cout << x.next();
x.promise().yield_value("world");  // save value and state
std::cout << x.next();
x.promise().return_value("!");    // save value, discard state
std::cout << x.next();
std::cout << std::endl;

You can see that the compiler reorders our two code blocks into a single sequential set of interleaved method calls.

Just before we look at a complete example with all the templated code for the promise and future classes, we should look at an alternate style of writing our generator coroutine:

X coroutine()
{
    co_yield "Hello ";
    co_yield "world";
    co_yield "!";
//  implied co_return;
}

In this approach, we use co_yield for all our values and don’t identify the final yield with a separate co_return value; we simply allow the code block to terminate. The compiler will provide a co_return statement (with no return value) to terminate the coroutine.

For a co_return (no value) statement we need a separate lifecycle method void return_void():

void return_void() {
    this->value = nullptr;
}

A promise class cannot provide both a return_value and a return_void method, as the two are mutually exclusive.

For this trivial example, our consumer code does not change as it reads exactly three values. In a more realistic example where we loop reading values from our coroutine, we will have to mark the end of the data stream in some way. This example uses pointers, so a nullptr can be used to terminate a loop; otherwise a std::optional object is the most general approach.

Our new terminating consumer would look like:

auto x = coroutine();
while (const char* item = x.next()) {
    std::cout << item;
}
std::cout << std::endl;

We could have coded the loop as

while (auto item = x.next()) {

but have chosen to keep the explicit type declaration so it is clear how the generator is used.

The full example of this code is in the file char_demo.cpp in the accompanying GitHub repo called coroutines-blog.

Working with Coroutines

Coroutines are a convenient mechanism for implementing multiple algorithms in separate code blocks rather than combining those algorithms into a single block of convoluted code.

As an example consider an embedded device that monitors data values, such as temperature, and writes these values to a serial port (RS232) complete with a timestamp, which could be time from device boot or a network synchronised clock time.

The timestamp and value are stored as float values (4 bytes each) and simply dumped as binary to the serial byte stream to reduce code complexity and data size. Each 8-byte record in the data stream holds the timestamp followed by the data value, with each float sent most significant byte first (big-endian byte order).

In our data collector application, we want to read this stream into a struct with two float values and then print out those values to a logging device, such as a display screen, with an alarm message if the value exceeds a given threshold.

Our combined algorithm would involve:

  1. read 4 bytes to construct the time stamp
  2. read 4 bytes to construct the data value
  3. create the data structure with both float values
  4. print the data structure values
  5. print a warning message if the data value exceeds a threshold

Now consider what happens when the data stream ends part way through a float value. Our code has to handle the end of stream error condition for each of the separate read operations (that’s eight separate one byte read operations). Even with good use of functions this code will be a complex set of conditional tests and data reconstruction statements – challenging to write and maintain.

Using coroutines we can break this down into two steps:

  1. parse the data
  2. display the data and optional warning message

In practice we’ll go further and break the first step into:

  1. parse raw bytes into float values
  2. store a timestamp and data point in a structure

Coroutine Future Template

The first step is to create a template for the classes we’ve discussed representing the coroutine future and the data promise.

Promise data holder

Here’s the Promise as a nested structure in the Future class:

template <typename T>
class Future
{
    class Promise
    {
    public:
        using value_type = std::optional<T>;
 
        Promise() = default;
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void unhandled_exception() { 
            std::rethrow_exception(std::current_exception());
        }

        std::suspend_always yield_value(T value) {
            this->value = std::move(value);
            return {};
        }

        void return_void() {
            this->value = std::nullopt;
        }

        inline Future get_return_object();

        value_type get_value() {
            return std::move(value);
        }

    private:
        value_type value{};
    };
    …
};

The Promise structure (which we define as private to the enclosing Future class) saves a single data value as a private std::optional object with a get_value accessor method. By using a std::optional object we can use std::nullopt to test for the end of the coroutine after the return_void method has been called. We’ve followed the C++ template meta-programming style of defining the value_type type trait so we can interrogate the class to determine its underlying data type.

We provide a default constructor and the two lifecycle methods required for a coroutine promise object (initial_suspend and final_suspend), which always suspend the coroutine so that we work on a single thread. These lifecycle methods are required but are just standard implementations that don’t need to be examined further.

We also need to specify how the framework should handle uncaught exceptions. Rather than digress into exception handling and recovery mechanisms we’ll just simply rethrow the exception and let the caller deal with it.

The yield_value and return_void methods discussed earlier are defined to copy or move the yield value to the std::optional holder or use std::nullopt to indicate the end of the coroutine. Note the use of std::move to ensure we support move semantics for our pass by value function argument: this is necessary if we want to yield a std::unique_ptr for example.

The other method we have to provide is get_return_object() which must return the Future object for this promise. As we haven’t yet completed the class Future definition, we need to implement this method after completing the Future/Promise classes.

Future coroutine context manager

The Future class itself provides a constructor/destructor for managing the composite promise object and a mechanism to obtain the values yielded by the coroutine (our next method discussed previously):

template <typename T>
class Future
{
    struct Promise { … };

public:
    using value_type = T;
    using promise_type = Promise;

    explicit Future(std::coroutine_handle<Promise> handle)
    : handle (handle)
    {}

    ~Future() {
        if (handle) { handle.destroy(); }
    }

    // Promise::value_type next() { … }

private:
    std::coroutine_handle<Promise> handle;

};

We have standard library support for managing the promise object via the std::coroutine_handle template class passed as an argument to the future constructor. We need to store this coroutine_handle object and ensure we call its destroy() method when we destroy the future object.

We have one further library requirement for our class Future in that it must define a nested type called promise_type so the standard library templates can determine the underlying promise class type:

using promise_type = Promise;

Our implementation of the next method must check that the coroutine can still be resumed, or return an empty std::optional object:

Promise::value_type next() {
    if (handle && !handle.done()) {
        handle.resume();
        return handle.promise().get_value();
    }
    else {
        return {};
    }
}

To return the yield value from our coroutine:

  • we check that the coroutine handle is valid and that the coroutine has not already run to completion (resuming a finished coroutine is undefined behaviour)
  • call resume() on the coroutine_handle to execute code up to the next co_yield statement (or the end of the coroutine)
  • return the value saved by the promise yield_value() method: the library support will handle the restore and save of the coroutine state
  • if the coroutine cannot be resumed, we return an empty value (std::nullopt).

Now we have defined the Future class we can complete the Promise object with the required get_return_object() method:

template <typename T>
inline Future<T> Future<T>::Promise::get_return_object()
{
    return Future{ std::coroutine_handle<Promise>::from_promise(*this) };
}

We use the static std::coroutine_handle<Promise>::from_promise method to create the coroutine_handle that is passed to the Future constructor.

As you can see, this is just boilerplate code for creating the Future/Promise classes.

That’s the background work done, and this template can be reused with most data types and classes: I’d say all classes, but there will always be an edge case that this template cannot support.

Data Collection Coroutine

Now we can focus on solving our real data handling problem. The first step is to write a coroutine to read from an istream object and yield up float values:

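Future<float> read_stream(std::istream& in)
{
    std::uint32_t data{};
    int count{};
    char byte;
    while (in.get(byte)) {
        data = data << 8 | static_cast<unsigned char>(byte);
        if (++count == 4) {
            float value;
            std::memcpy(&value, &data, sizeof value);
            co_yield value;
            data = 0;
            count = 0;
        }
    }
}

We just read blocks of 4 bytes and shift them into a 32-bit word (most significant byte first), then copy that bit pattern into a float for the co_yield statement. Using std::memcpy (from <cstring>, with std::uint32_t from <cstdint>) rather than dereferencing a reinterpret_cast<float*> avoids breaking the strict aliasing rules; with C++20, std::bit_cast<float>(data) is an alternative. If the data stream ends part way through a 4-byte word we ignore any partial value and terminate the coroutine.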
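The byte assembly can be checked in isolation, without a coroutine. This sketch uses a hypothetical float_from_big_endian helper and assumes, as read_stream does, that the host uses 32-bit IEEE 754 floats:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assemble four bytes, most significant byte first (big-endian, matching the
// data << 8 | byte loop in read_stream), then copy the 32-bit pattern into a
// float. std::memcpy avoids the undefined behaviour of reading a uint32_t
// object through a float*.
float float_from_big_endian(const unsigned char (&bytes)[4])
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t data{};
    for (unsigned char byte : bytes) {
        data = data << 8 | byte;
    }
    float value;
    std::memcpy(&value, &data, sizeof value);
    return value;
}
```

For example, the bytes 0x3F 0x80 0x00 0x00 decode to 1.0f, which is what Python’s struct.pack('>f', 1.0) produces.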

We can prove this coroutine works by just printing out each float value that we read from standard input:

auto raw_data = read_stream(std::cin);
while (auto next = raw_data.next()) {
    std::cout << *next << std::endl;
}

As we want to save pairs of values into a data structure, we can use a second coroutine to encapsulate that algorithm:

struct DataPoint
{
    float timestamp;
    float data;
};

Future<DataPoint> read_data(std::istream& in)
{
    std::optional<float> first{};
    auto raw_data = read_stream(in);
    while (auto next = raw_data.next()) {
        if (first) {
            co_yield DataPoint{*first, *next};
            first = std::nullopt;
        }
        else {
            first = next;
        }
    }
}

Again, if the input stream terminates part way through a timestamp and data point, we discard the incomplete datum.

The last step in this example is to process our timestamp data values:

static constexpr float threshold{21.0};

int main()
{
    std::cout << std::fixed << std::setprecision(2);
    std::cout << "Time (ms)   Data" << std::endl;
    auto values = read_data(std::cin);
    while (auto n = values.next()) {
        std::cout << std::setw(8) << n->timestamp
                  << std::setw(8) << n->data
                  << (n->data > threshold ? " ***Threshold exceeded***" : "")
                  << std::endl;
    }
    return 0;
}

This code shows how we intend to process the data and all the nitty-gritty code for converting bytes to float values to a data structure is hidden inside the coroutines.

Hopefully you can now start to see the benefit of using coroutines to separate out different aspects of a complex algorithm into simpler code blocks. Currently, with C++20, we need to jump through the hoops of creating the Future/Promise classes. Still, I hope C++23 will provide something similar to this template so we can concentrate on writing our code rather than the supporting code.

You can test this code using a simple Python script such as the following to generate four hardcoded datapoints:

import struct
import sys

start = 0.0
for ms, value in enumerate([20.1, 20.9, 20.8, 21.1]):
    sys.stdout.buffer.write(struct.pack('>ff', start + ms*0.1, value))

If our compiled executable is called datapoint_demo, we can use the following Linux shell pipeline to show the coroutines working:

# Linux
python3 test_temp.py | ./datapoint_demo

With the following output:

Time (ms)   Data
    0.00   20.10
    0.10   20.90
    0.20   20.80
    0.30   21.10 ***Threshold exceeded***

The full example of this code is in the files future.h and datapoint_demo.cpp in the accompanying GitHub repo called coroutines-blog. In order to compile these examples using GCC (version 10 onwards) you will need to add the -std=c++20 and -fcoroutines options to the g++ command line.

In the next follow up post I’ll add iterator support to the Future template class so the coroutine can be used in a for loop or as an input iterator to a library algorithm.

Summary

Coroutines are a powerful programming technique for separating different aspects of a complex algorithm into discrete and simpler code blocks.

C++20, like Python and C#, uses the function syntax to define the coroutine code blocks, which many people initially find confusing as this is just syntax to provide the coroutine statements.

The current absence of a simple standard library template for the generator style coroutines we’ve shown here makes it harder for developers to start using coroutines. It’s a bit like a jigsaw puzzle where you haven’t been shown the picture for the completed puzzle – initially daunting but the puzzle can be solved. Hopefully the Future template shown here has provided a picture you can use for your own puzzles.

Posted in C/C++ Programming

CMake Part 4 – Windows 10 Host

Introduction

In previous blog posts in this series (Part 1, Part 2 and Part 3), I looked at using CMake on a Linux host to configure a build to cross compile to target hardware such as the STM32F4 Series.

In this post, we’ll work with the GNU Arm Embedded Toolchain on a Windows 10 Host.

The first part of this blog discusses running the Windows hosted versions of CMake, GNU Arm Embedded Toolchain and GNU Make. An alternative approach, briefly discussed at the end of the blog, is to use container technology such as Windows Subsystem for Linux (WSL2) or Docker, or use a full-blown Linux Virtual Machine hosted in VirtualBox or VMware.

CMake on Windows

The first point to make about CMake on Windows is that it defaults to generating build files for Visual Studio and assumes you will be using the Microsoft Visual Studio Toolset.

The second point to make is that the CMake command line is subtly different. Not by much, but enough to confuse us, as some options that work under Linux are not available under Windows.

The third point is around command and file naming conventions. Microsoft uses a .exe suffix to identify executable programs, but there is no requirement to include this suffix when invoking an executable from the command line. Running the C/C++ compiler is a matter of entering the command CL or CL.EXE (case is ignored but is usually shown as uppercase in documentation). CMake may require the full pathname to the compiler executable, including the .exe suffix. Linux executables have no such suffix, so this distinction between the command name (CL) and the executable file name (CL.EXE) does not arise under Linux.

The fourth and last point is that CMake generates build files for Microsoft NMake. Running an NMake build requires a custom environment to be set up by running the vcvarsall.bat supplied with the Microsoft VS Toolset. This isn’t a big problem, but any automated build script must include this.

We need to modify our Linux CMake configuration and supporting build script to address these portability problems. I’ll assume you’re familiar with the Linux based embedded system project that we’ve used in previous posts and just focus on changes required for Windows.

Toolchain Configuration

By default, the GNU Arm Toolchain for Windows is installed under C:\Program Files (x86)\GNU Arm Embedded Toolchain\ in a subfolder named after the release version. The Arm toolchain does not include the GNU Make command, so we must download this separately, either as a standalone program or as part of a suite of GNU development tools.

Installing GNU development tools on a Linux host is achieved using the system package management commands (apt for Debian/Ubuntu and dnf or yum for Fedora/CentOS/RHEL). However, it’s a little more complex on Windows as there is no official Microsoft package of GNU tools.

We, therefore, need to rely on third-party providers for the GNU development tools, of which MinGW and Cygwin are the most popular.

Additionally, there are various standalone versions of GNU Make ported to Windows, but none of the ones I’ve come across appear to be supported or updated on a regular basis, which makes me wary of using them.

I’m not going to digress into the details of installing GNU Make under Windows but assume that a Windows version of the make command is available. In my case, I use the MSYS2 installer which includes the Mingw-w64 development tools (these must be added to the base MinGW installation).

We already have a working CMake toolchain file for our embedded project (toolchain-STM32F407.cmake), but we will need to make some minor modifications to handle the .exe suffix on the toolchain filenames (not found on Linux). The GitHub project supporting this blog has all the configuration files for Windows.

For CMake, we assume that the Windows path variable (%PATH%) includes the directory containing the GNU Arm toolchain (as we did for the Linux build). This simplifies the toolchain file changes and avoids hard-coding filesystem paths into the configuration file. We use a PowerShell script (shown later) to configure the Windows program search path (%PATH%).

In our sample project’s toolchain file (toolchain-STM32F407.cmake) we just add conditional code for including executable suffixes when locating the toolchain executable files:

if (CMAKE_HOST_WIN32)
  set (SUFFIX .exe)
else()
  set (SUFFIX "")
endif()

find_program(CROSS_GCC_PATH arm-none-eabi-gcc${SUFFIX})
get_filename_component(TOOLCHAIN ${CROSS_GCC_PATH} PATH)

set(CMAKE_C_COMPILER ${TOOLCHAIN}/arm-none-eabi-gcc${SUFFIX})
set(CMAKE_CXX_COMPILER ${TOOLCHAIN}/arm-none-eabi-g++${SUFFIX})
set(TOOLCHAIN_AS ${TOOLCHAIN}/arm-none-eabi-as${SUFFIX} CACHE STRING "arm-none-eabi-as")
set(TOOLCHAIN_LD ${TOOLCHAIN}/arm-none-eabi-ld${SUFFIX} CACHE STRING "arm-none-eabi-ld")
set(TOOLCHAIN_SIZE ${TOOLCHAIN}/arm-none-eabi-size${SUFFIX} CACHE STRING "arm-none-eabi-size")

On a Windows host, the CMAKE_HOST_WIN32 variable is set to true, allowing the script to set a variable with the required host filename suffix. No other changes to the toolchain file toolchain-STM32F407.cmake are required.
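If the GNU Arm toolchain folder is missing from %PATH%, find_program() sets CROSS_GCC_PATH to CROSS_GCC_PATH-NOTFOUND, and the problem only shows up later, when CMake tries to use a compiler path built from an empty directory name. A minimal sketch, assuming the same variable names as our toolchain-STM32F407.cmake, that stops the configure step with an explicit message instead (the message text is illustrative; with CMake 3.18 or later, adding the REQUIRED keyword to find_program() has a similar effect):

```cmake
# Host-specific executable suffix, as in toolchain-STM32F407.cmake
if (CMAKE_HOST_WIN32)
  set (SUFFIX .exe)
else()
  set (SUFFIX "")
endif()

# Search the PATH for the cross compiler. If it is not found, the variable
# holds CROSS_GCC_PATH-NOTFOUND, which if() evaluates as false.
find_program(CROSS_GCC_PATH arm-none-eabi-gcc${SUFFIX})
if (NOT CROSS_GCC_PATH)
  message(FATAL_ERROR
    "arm-none-eabi-gcc${SUFFIX} not found: add the GNU Arm toolchain bin folder to PATH")
endif()

# DIRECTORY is the current name for the legacy PATH mode
get_filename_component(TOOLCHAIN "${CROSS_GCC_PATH}" DIRECTORY)
set(CMAKE_C_COMPILER ${TOOLCHAIN}/arm-none-eabi-gcc${SUFFIX})
set(CMAKE_CXX_COMPILER ${TOOLCHAIN}/arm-none-eabi-g++${SUFFIX})
```

Because find_program() searches again on the next run when the previous result was NOTFOUND, correcting %PATH% and re-running cmake should be enough to recover.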

As an aside, if we had adopted a simpler approach for our toolchain configuration, requiring only the compiler and linker without the other build tools, we could have just specified the compiler command names in the appropriate CMake variables:

set(CMAKE_C_COMPILER arm-none-eabi-gcc)
set(CMAKE_CXX_COMPILER arm-none-eabi-g++)

In this case, we are working with command names rather than filenames, so there is no need to include a .exe suffix. Configuring the toolchain this way means no changes are required to the toolchain file to work with Windows rather than Linux. Our changes are only required because we are working with filenames, not commands.

Project Configuration

No changes are required to the project file CMakeLists.txt, which shows that CMake can be configured to work with both Linux hosts (including macOS) and Windows with a single version of the configuration files.

But we do need to look at changes required to the cmake command line to generate the build files and run the build itself.

CMake Command Line

On Linux, we typically add our development commands to the standard search path (defined by the $PATH variable). In contrast, on Windows, we tend not to extend the Windows search path to include development tools mainly because we are less likely to be working at a Windows command line.

Without extending the Windows search path, we must use full path names for each toolchain command or temporarily extend the path to include the toolchain folder. As with Linux, a script to manage the build process is essential.

The first change to the cmake command is to use the GNU Make generator instead of the Visual Studio tools, by adding a -G "Unix Makefiles" option.

Not only do we need to use the -G option, but we also need to specify the location of the make command; otherwise we will default to using nmake, which we haven’t installed as we are not using the Microsoft host toolchain.

To make it easier to read and maintain a command script, we define a Windows variable for the location of the make program rather than include it on the search path. But we will extend the search path (%PATH%) to include the required GNU Arm toolchain folder.

A simple set of Windows commands to generate a debug build using MSYS2 Make and GNU Arm toolchain version 10 2020-q4-major looks like the following (the ^ symbol is the line continuation character for Windows commands):

set CMAKE="C:\Program Files\CMake\bin\cmake.exe"
set MAKE="C:\msys64\usr\bin\make.exe"
set ARMTOOLS=C:\Program Files (x86)\GNU Arm Embedded Toolchain\10 2020-q4-major\bin
set PATH=%PATH%;%ARMTOOLS%

%CMAKE% -S . -B build/debug   ^
  -G "Unix Makefiles"         ^
  -DCMAKE_MAKE_PROGRAM=%MAKE% ^
  -DCMAKE_TOOLCHAIN_FILE=toolchain-STM32F407.cmake
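Rather than passing -DCMAKE_MAKE_PROGRAM on every cmake command line, the cmake -C option can preload cache entries from a CMake script before the first pass through the project files. A minimal sketch, assuming a hypothetical file named windows-make.cmake in the project root and the MSYS2 make location used above:

```cmake
# windows-make.cmake (hypothetical name): preload the cache with the location
# of GNU Make for the "Unix Makefiles" generator.
# Use forward slashes: in a CMake string "\m" is an invalid escape sequence.
set(CMAKE_MAKE_PROGRAM "C:/msys64/usr/bin/make.exe"
    CACHE FILEPATH "GNU Make used by the Unix Makefiles generator")
```

This would be used as cmake -C windows-make.cmake -S . -B build/debug -G "Unix Makefiles" -DCMAKE_TOOLCHAIN_FILE=toolchain-STM32F407.cmake. The value is stored in the build tree's CMakeCache.txt, so later configure runs in that build directory reuse it; the trade-off is one more file to keep in step with the local MSYS2 installation.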

Currently, the Windows version of the cmake command does not support the --build option, so we have to invoke the make command directly. The -C option lets us run make from the project root folder without changing into the build directory. We can optionally add VERBOSE=1 to the end of the command to see the build commands as they are executed.

To build the debug version of our project we use:

set MAKE="C:\msys64\usr\bin\make.exe"
%MAKE% -C build/debug VERBOSE=1

A clean build requires adding the clean target to the make command:

set MAKE="C:\msys64\usr\bin\make.exe"
%MAKE% -C build/debug clean

That’s it. All the other CMake command options we used in the Linux build work under Windows. At the end of the blog is an example of this simple command script and a more functional PowerShell script that wraps up the CMake build commands.

An alternative to using the Windows-hosted Arm toolchain is to use a virtual environment or container to perform the build under Linux.

WSL and Docker

Both Windows Subsystem for Linux (WSL2) and Docker provide self-contained execution environments that run Linux and have access to the Windows filesystem, but they typically don’t provide a desktop environment (although they could). We discuss using Docker containers in our blog An Introduction to Docker for Embedded Developers – Part 1 Getting Started.

For both WSL2 and Docker, the host development toolchain (for the make command) and the GNU Arm Embedded Toolchain (for Linux) need to be installed in the container.

These days, both WSL2 and Docker can be accessed from Visual Studio Code running on the Windows host. Microsoft extensions (Remote – WSL and Remote – Containers) are required to access the virtual environments, and there are additional third-party VS Code extensions, mainly for Docker but also for WSL. This means you can store and edit the code on the Windows filesystem but run the build commands in the WSL2 or Docker container, all from within the Visual Studio Code IDE.

I have found using WSL2 from VS Code an effective way to develop and build our training projects. The resulting ELF file in the Windows filesystem can be downloaded to our target hardware (we use Segger Ozone) or run in our customised version of the xPack QEMU emulator from within Windows.

VirtualBox and VMware

At Feabhas, we use VirtualBox to build self-contained Linux VMs and distribute these for online training as Open Virtualization Format archive (OVA) files. Both VMware and VirtualBox can import OVA files and can be configured to access folders on the host through their shared folders settings. However, both products require additional software (VirtualBox Guest Additions or VMware Tools) to be installed in the Linux guest operating system to gain access to the Windows host filesystem.

We use VirtualBox (without shared folders) to build our training projects and run the compiled ELF image in our custom version of QEMU on the Linux guest. We can also map the J-Link USB port from the Windows host into the VM so that Ozone (in the VM) can download the ELF images to our target hardware.

VirtualBox and VMware both provide a good environment for developing embedded projects on a Windows host.

Note: WSL2 (and Docker on Windows 10 Pro) use the Microsoft Windows Hypervisor Platform (Hyper-V) feature, which until recently prevented VirtualBox and VMware VMs from running correctly (see the WSL2 FAQ). Recent versions of VMware and VirtualBox (as of July 2021) can now coexist with WSL2 and the Hyper-V platform, and hopefully will continue to do so. There does appear to be a noticeable drop in VirtualBox performance when the Hyper-V platform is enabled, so my personal preference is to work with WSL2.

Summary

While I prefer Linux and macOS, Windows is my primary development environment. I generally work directly with the Windows versions of the Arm and Segger tools for embedded system development. Recent improvements to WSL and VS Code mean that I now find this combination as good as Windows-hosted tools and easier to use than working with VirtualBox (or VMware).

Initially, getting CMake to build an embedded (cross compiler) project under Windows was painful: a term often used when discussing CMake, and Windows for that matter. Once I’d worked out that switching to the GNU Toolchain did not also default to using GNU Make rather than NMake, it turned out to be straightforward to create a portable configuration. If we hadn’t wanted to find the paths to the additional GNU Arm build commands (like as and ld) we would not have had to make any changes to the CMake configuration files whatsoever.

But, as with Linux, it is the complex and necessary command line options used to set up the toolchain and build configurations that are the main problem.

It’s a shame that the Windows version of CMake does not support the --build option, so we have to revert to the old-fashioned approach of running make directly.

It would have been easier to work on Windows if the GNU Arm Embedded Toolchain (for Windows) included a version of make so that we didn’t have to find and install it from elsewhere.

And finally, using a Windows command or PowerShell script to simplify using the cmake and make commands is essential.

A later article on CMake Presets describes how to use the presets feature added in CMake 3.19 in 2020.

Postscript – Simple Build Scripts

The GitHub project supporting this blog contains a command script (configure.bat) with the build commands shown in this blog. The repo also includes a more functional PowerShell script (build.ps1) with a supporting command script (build.bat) for building debug and release projects under Windows.

Windows Configure Script

A simple command script (configure.bat) based on the examples in the blog:

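set CMAKE="C:\Program Files\CMake\bin\cmake.exe"
set MAKE=C:\msys64\usr\bin\make.exe
set ARMTOOLS=C:\Program Files (x86)\GNU Arm Embedded Toolchain\10 2020-q4-major\bin
set PATH=%PATH%;%ARMTOOLS%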
%CMAKE% -S . -B build/debug   ^
  -G "Unix Makefiles"         ^
  -DCMAKE_MAKE_PROGRAM=%MAKE% ^
  -DCMAKE_TOOLCHAIN_FILE=toolchain-STM32F407.cmake
%MAKE% -C build/debug VERBOSE=1

Windows Build Scripts

A more complex PowerShell script (build.ps1) supports command line options, but this must be invoked via a command script (build.bat) to configure the security permissions.

Windows applies security restrictions to prevent running unsigned PowerShell scripts from the command prompt (or from the Visual Studio Code tasks). The supporting build.bat script is used to start the PowerShell build script without security checks:

powershell.exe -noprofile -executionpolicy bypass -file build.ps1 %*

The build.ps1 script is a port of the shell script (build.sh) for Linux:

Set-StrictMode -version latest

$SCRIPT = Split-Path $PSCommandPath -Leaf;
$USAGE = "Usage: $SCRIPT [-v | --verbose | --rtos] [ reset | clean | debug | release ]"

$CMAKE = 'C:\Program Files\CMake\bin\cmake.exe'
$MAKE = 'C:\msys64\usr\bin\make.exe'
$ARM_TOOLCHAIN = 'C:\Program Files (x86)\GNU Arm Embedded Toolchain\10 2020-q4-major\bin'

$env:PATH += ";$ARM_TOOLCHAIN"

$BUILD= 'build'
$BTYPE = 'DEBUG'
$BUILD_DIR = "$BUILD\debug"
$CLEAN = ''
$RESET = ''
# Optional extra arguments: an empty array adds no arguments to the command
$VERBOSE = @()
$RTOS = @()

switch -regex ($args)
{
  '^(--help|-h|)$'    { Write-Output "$USAGE"; exit 0 }
  '^(--verbose|-v)$'  { $VERBOSE = @('SHELL=/bin/sh -x') }
  '^--rtos$'          { $RTOS = @('-DUSE_RTOS=ON') }
  '^debug$'           { $BTYPE = 'DEBUG';   $BUILD_DIR = "$BUILD\debug" }
  '^release$'         { $BTYPE = 'RELEASE'; $BUILD_DIR = "$BUILD\release" }
  '^clean$'           { $CLEAN = '1' }
  '^reset$'           { $RESET = '1' }
  # $_ holds the argument currently being matched
  default             { Write-Error "Unknown option $_"; Write-Output "$USAGE"; exit 1 }
}

if ( $RESET -and (Test-Path $BUILD_DIR -PathType Container) ) {
  Remove-Item $BUILD_DIR -Recurse
}

$TOOLCHAIN = '-DCMAKE_TOOLCHAIN_FILE=toolchain-STM32F407.cmake'
$CMAKE_ARGS = '-G', 'Unix Makefiles', "-DCMAKE_MAKE_PROGRAM=$MAKE"
$BUILD_TYPE = "-DCMAKE_BUILD_TYPE=$BTYPE"

&$CMAKE -S . -B $BUILD_DIR $CMAKE_ARGS `
  --warn-uninitialized $BUILD_TYPE $TOOLCHAIN $RTOS

if ( $CLEAN  -ne '' ) {
  &$MAKE -C $BUILD_DIR clean
}

&$MAKE -C $BUILD_DIR $VERBOSE

 

Posted in ARM, Build-systems, C/C++ Programming, CMSIS, Cortex, Toolchain