Modern Embedded C++ – Deprecation of volatile

Compiling the following, straightforward code:

volatile int x;

int main() {
    x += 10;

Using g++ with the directive -std=c++17 builds without any warnings or errors. However, change the directive to -std=c++20, and the result is:

source>: In function 'int main()':
<source>:5:5: warning: compound assignment with 'volatile'-qualified left operand is deprecated [-Wvolatile]
    5 |   x += 10;
      |   ~~^~~~~
Compiler returned: 0

The new C++ standard, C++20, has deprecated volatile! So, what does this mean for the embedded programmer?

We covered the need for and use of, volatile in a previous posting. That post (written in April 2020) did state that:

In C++20 many general uses of volatile are being deprecated.

The key phrase here is general uses.

Volatile in embedded

Continue reading

Posted in ARM, C/C++ Programming | Tagged | Leave a comment

GitHub Codespaces and online development

In our previous posting, we discussed using VSCode’s Dev Container extension to allow running workspaces directly within a Docker container.

In December 2020, I was granted early access to a new feature developed by GitHub called Codespaces. Codespaces offers an online VSCode development environment, enabling you to develop entirely in the cloud.

The great news is that Codespaces uses the same core process, and file structure, as Dev Containers; meaning once we have our .devcontainer folder setup (if you are unfamiliar with Dev Containers it is worth reading the previous blog first) “it just works” online.

TDD in the cloud

Using our example from the previous blog (GoogleTest and meson) to run using Codespaces is Simples!.

When granted Codespaces access, open the GitHub project and under the Clone or download button you are offered the option Open with Codespaces

Continue reading

Posted in Agile, C/C++ Programming, General, Industry Analysis, Testing | Tagged , , , , , | Leave a comment

VSCode, Dev Containers and Docker: moving software development forward

Long term readers of this blog will know our devotion to using container-based technology, especially Docker, to significantly improve software quality through repeatable builds.

In the Autumn/fall of 2020, Microsoft introduced a Visual Studio Code (VSCode) extension Remote – Containers. With one quick stroke, this extension allows you to open a VSCode project within a Docker container.

Getting started with Dev Containers and Docker

There are several different approaches to using Dev Containers. In this post, we shall cover three options:

  1. Using an existing Docker image from Docker Hub
  2. Using a pre-build Microsoft container setup
  3. Using a custom Docker image based on a project specific Dockerfile

There are a couple of prerequisites:

Using an existing Docker image – TDD in C with Ceedling

Anyone using or experimenting with Test-Driven-Development in C will probably be aware of Ceedling, unity and CMock.

Whether or not you have Ceedling, or any dependents, such as Ruby, installed we can begin using Dev Container with an existing Dockerhub container image. Containerisation ensures we can quickly get up and running with Ceedling in a known environment. In true ‘Blue Peter‘ style, we happen to have a pre-built Ceedling based Docker image on Docker Hub.

  1. Create an empty folder, e.g.
$ mkdir ceedling_test
  1. In the new folder, create another folder called .devcontainer (note the preceding .)
$ cd ceedling_test
$ mkdir .devcontainer
  1. In that new folder, add a file called devcontainer.json with the contents
    "image": "feabhas/ceedling"
  1. Your project structure should now be:
    └── devcontainer.json
  1. Now open VSCode in the working directory
$ code .
  1. VScode will detect the Dev Container configuration file and ask if you want to reopen the folder in a container. Click Reopen in Container.
  2. Open a terminal window within VSCode, and you will be presented with a shell prompt #. We are now running within a Docker container based on the image feabhas/ceedling.
  3. Test the container, e.g.
# ceedling new test_project
Welcome to Ceedling!
      create  test_project/project.yml

Project 'test_project' created!
 - Execute 'ceedling help' from test_project to view available test & build tasks

# cd test_project

# ceedling module:create[widget]
File src/widget.c created
File src/widget.h created
File test/test_widget.c created
Generate Complete

# ceedling test

Test 'test_widget.c'
Generating runner for test_widget.c...
Compiling test_widget_runner.c...
Compiling test_widget.c...
Compiling unity.c...
Compiling widget.c...
Compiling cmock.c...
Linking test_widget.out...
Running test_widget.out...

  Test: test_widget_NeedToImplement
  At line (15): "Need to Implement widget"



After exiting VSCode, all files created will exist in your local file system. Reopening VSCode, you will once again be prompted to reopen in the container.

Continue reading

Posted in Agile, C/C++ Programming, Testing | Tagged , , , , , , , | 1 Comment

Introduction to the ARM® Cortex®-M7 Cache – Part 3 Optimising software to use cache

Caches – Why do we miss?

Cold Start

As stated, both data and instruction caches are required to be invalidated on system start. Therefore, the first load of any object (code or data) cannot be in cache (thus the cold start condition).

One available technique to help with cold-start conditions is the ability to pre-load data into the cache. The ARMv7-M instruction set adds the Preload Data (PLD) instruction. The PLD instruction signals to the memory system that data memory accesses from a specified address are likely shortly. If the address is cacheable, then the memory system responds by pre-loading the cache line containing the specified address into the cache. Unfortunately, there is currently no CMSIS intrinsic support for the PLD instruction.

It is worth noting that some processor data caches implement an automatic prefetcher (e.g. Cortex-A15). This monitors cache misses, and when a pattern is detected, the automatic prefetcher starts linefills in the background. Unfortunately, the Cortex-M7 data cache does not support automatic prefetch.


The other most obvious reason for misses is that of cache capacity. The larger the cache, the higher the probability of a cache hit and the lower the frequency of eviction. However, all this comes at a cost, not only financial but also power.

A larger cache is, naturally, going to contribute to the overall System-on-Chip (SoC) costs, making the end microprocessor more expensive. In high volume designs, this is always a significant factor in SoC choice.

Among all processor components, the cache and memory subsystem generally consume a large portion of the total microprocessor system power, commonly 30-50% of the total power [Zang13]. Caches, thus, add a further level of complexity to the poor-overworked engineer trying to calculate the design’s power model and has an impact on all battery-based designs.


Finally, misses will occur due to natural eviction followed by a reload. So a simple loop such as:

for(uint32_t i = 0; i < N; ++i) {
   dst[i] = src[i];

may result in multiple eviction/reload cycles depending on the memory addresses of dst and src. Also, any dst[i] eviction will result in a memory write as the line is marked dirty. The 4-way data cache goes a long way to help reduce the potential of dst[i] eviction, but because of the pseudo-random replacement policy, it may happen more often than we would expect or like.

Code Optimizations

There are a key number of areas where we, as a software developer, can potentially impact the performance of cache:

  • Algorithms
  • Data structures
  • Code structures

Continue reading

Posted in ARM, C/C++ Programming, CMSIS, Cortex, Design Issues | Tagged , | 3 Comments

Introduction to the ARM® Cortex®-M7 Cache – Part 2 Cache Replacement Policy

Instruction Cache Replacement Policy

Starting with the simpler instruction cache case; when we encounter a cache miss the normal policy is to evict the current cache line and replace it with the new cache line. This is known as a read-allocate policy and is the default on all instruction caches.

Cold start (first read)

It should also be noted that on system power-up the initial state of the cache is unknown. On the ARMv7-M all caches are disabled at reset. Before the cache is accessed, it needs invalidating. As well as each line having a tag associated with it, each line also has a valid flag (V-bit) which indicates whether the current cache line has been populated from main memory or not.

The cache must be invalidated before being enabled. The method for this is implementation-defined. In some cases, invalidation is performed by hardware, whereas in other cases it is a requirement of the boot code. CMSIS- has added a specific instruction cache API to support these operations for the Cortex-M7:

void SCB_InvalidateICache(void);
void SCB_EnableICache(void);    
void SCB_DisableICache(void);

Direct Mapped Cache

So far, we have assumed that the whole of the cache is mapped as one contiguous area, we call this a Direct Mapped cache. A lot of work in trying to improve cache performance has been done over the years and the key metric has been found to be that of cache hit/miss rate, i.e. what is our ratio of reads that result is a cache fetch against those requiring a main memory fetch and cache line eviction.

Studies have shown that the Direct Mapped Cache may not always achieve the best cache hit ratios. Take, for example, the following code:

// file1.c
int filter_algo(int p)
   return result;

// file2.c
void apply_filter(int* p, int N)
   for(int i = 0; i < N, ++i) {
     x[N] = filter_algo(*(p+i)) + k[N];

If, unluckily, apply_filter was in the address range of 0x00004000 and filter_algo was around 0x00005000 then each time apply_filter called on filter_algo, this would result in an eviction of the apply_filter code and the filling of the filter_algo instructions. But upon return, we would have to evict filter_algo code and refill with apply_filter instructions. As the algorithm executed it would cause cache thrashing.

Set-associative cache

Due to the principles of locality, research suggested that rather than a single direct-mapped cache, a better approach is to split the cache into an array of buffers, where two addresses with the same line index can reside in different array indices.

In cache terminology, each array index is known as a way, so we talk about N-Way caches. The number of ways can vary, typically ranging from 2 to 8. For the Cortex-M7 the instruction cache is a 2-way system. When we access an address, we now have ‘N’ possible lines to make a tag match against. The number of valid lines involved in the tag comparison is called the set. Continue reading

Posted in ARM, CMSIS, Cortex, Design Issues | Tagged , , , | 2 Comments

Introduction to the ARM® Cortex®-M7 Cache – Part 1 Cache Basics

For many years, the majority of smaller microprocessor-based systems have typically not used caches. With the launch of the ARMv7 architectures, caches were supported in the ARMv7-A family (e.g. Cortex-A8, etc.) but not supported in the core design of the ARMv7-M micro-controllers such as the Cortex-M3 and Cortex-M4. However, when the Cortex-M7 was announced, it broke that mould by offering cache support for the smaller embedded micro-controller.

This series is broken down in three parts:

  1. Basic principles of cache
  2. Cache replacement policies
  3. Optimising software to use cache

Why introduce caches into the architecture?

The purpose of a cache is to increase the average speed of memory access. The most immediate and obvious benefit is one of improved application performance, which in turn can lead to an enhanced power model. Caches have been used for many years (dating back as far as the 1960s) in high-end processor-based systems.

The driver behind the development and use of a cache is based on The Locality Principle.

Caches operate on two principles of locality:

  • Spatial locality
    • Access to one memory location is likely to be followed by accesses to adjacent locations.
  • Temporal locality
    • Access to an area of memory is likely to be repeated within a short period.

Also of note is Sequentiality – Given that a reference has been made to a particular location s it is likely that within the next several references, a reference to the location of s + 1 will be made. Sequentiality is a restricted type of spatial locality and can be regarded as a subset of it.

In high-end modern systems, there can be many forms of cache, including network and disk caches, but here we will focus on main memory caches. In addition, main memory caches can also be hierarchical, i.e. there are multiple caches between the processor and main memory, often referred too as L1, L2, L3, etc., with L1 being nearest to the processor core.

The Cache

The simplest way to think of a cache is as a small, high-speed buffer placed between the central processor unit (CPU) and main memory that stores blocks of recently referred to main memory.

Once we’re using a cache, each memory read will result in one of two outcomes:

  • A cache hit – the memory for the address is already in cache.
  • A cache miss – the memory access was not in cache, and therefore we have to go out to main memory to access it.

Continue reading

Posted in ARM, CMSIS, Cortex, Design Issues | Tagged , , , , | Leave a comment

TDD with Compiler Explorer

Compiler Explorer (CE) has been around for several years now. When it first appeared on the scene, it immediately became an invaluable tool. Its ability to show generated assembler from given source code across many different compilers and ISAs (Instruction Set Architectures) is “mind-blowing”. We use it extensively when teaching as it allows you to clarify the effect your code can have on both performance and memory usage. 

However, rather than limiting itself to only showing generated assembler, recent developments include the ability to execute the code and examine the program output. Having online support for this is nothing especially new (e.g. ColiruWandbox, etc.), but it’s helpful to have it within one tool.

For example, given a simple “hello, world!” program, we see the standard output in a new tab:

Test-Driven Development

One of the significant benefits to come out of the growth of Agile development is the acceptance that unit testing is just part of the development cycle, rather than a separate activity after coding.

Agile unit-testing, better known as Test-Driven Development, or TDD for short, has lead to a growth of unit-test frameworks, all based around the original xUnit model, typified by GoogleTest (gtest). 

As part of the continuing improvements and feature extensions, CE added support for various libraries to be included as part of the build. Included in this set is support for gtest, as well as two other, more modern, test frameworks; Catch2 and doctest.

Using Google Test with Compiler Explorer

Continue reading

Posted in Agile, C/C++ Programming, Testing | Tagged , , , , , , , | Leave a comment

Side effects and sequence points; why volatile matters


Most embedded programmers, and indeed anyone who has attended a Feabhas programming course, is familiar with using the volatile directive when accessing registers. But it is not always obvious the ‘whys and wherefores’ of the use of volatile.

In this article, we explore why using volatile works, but more importantly, why it is needed in the first place.

Peripheral register access

If we start with a simple, fictitious, example. Suppose we have a peripheral with the following register layout:

register width offset
control byte 0x00
configuration byte 0x01
data byte 0x02
status byte 0x03

with a base address of 0x40020000.

In a previous posting we covered using structures for register access. Let’s assume the following (incorrect) code has been written:

#include <stdint.h>

typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

Port_t* const port   = (Port_t*) 0x40020000;

void write(uint8_t data)
  port->ctrl = 1;         // Enter configuration mode
  port->cfg  = 3;         // Configure the device
  port->ctrl = 0;         // Enter operational mode.

  while(port->status == 0) 
    // Wait for data...
  port->data = data;

If we compile this code using GCC 8.2 for Arm using the flags:

  • -O3 – high optimaisation
  • -mcpu=cortex-m4

we get the following generated Arm Thumb-2 assembler:

 1. write:
 2.   ldr r3, .L5
 3.   ldrb r2, [r3, #3] @ zero_extendqisi2
 4.   mov r1, #768
 5.   strh r1, [r3] @ movhi
 6.   cbnz r2, .L2
 7. .L3:
 8.   b .L3
 9. .L2:
10.   strb r0, [r3, #2]
11.   bx lr
12. .L5:
13.   .word 1073872896

Complete example

Explaining the assembler

Continue reading

Posted in ARM, C/C++ Programming, CMSIS, Cortex | Tagged , , , , | 2 Comments

Practice makes perfect, part 3 – Idiomatic kata

Previously, we looked at some of the foundational C++ code kata – that is, elements of C++ coding that are absolutely key to master if you’re going to be programming in C++.
Practice makes perfect, part 1 – Code kata
Practice makes perfect, part 2 – foundation kata

In this article I want to introduce what I call ‘idiomatic’ kata.  These exercises have a bit more latitude (and variation) in how they can be implemented.  In that respect they are closer to traditional code kata.  The idea with these kata is to reinforce C++ constructs that aren’t encountered quite so often in typical C++ programming and so are more easily forgotten (or even avoided)

There’s no order to these kata.  None is more important than any other.

If you’re new to C++ there’s nothing wrong with practicing these exercises; but use them as a means to explore new language features.  It’s good to learn how these idiomatic patterns work.

Continue reading

Posted in C/C++ Programming, Design Issues, training | Tagged , , , | Leave a comment

Running the eclipse-mosquitto MQTT Broker in a docker container

I first wrote about MQTT and IoT back in 2012, when I developed a simple C based library to publish and subscribe Quality of Service (QoS) level 0 MQTT messages.

Subsequently, MQTT has grown to be one of the most widely used IoT connectivity protocols with direct support from service such as AWS. Back in 2010, the first open-source MQTT Broker was Mosquitto. Mosquitto is now part of the Eclipse Foundation, and an project, sponsored by

Another area that has grown during the interim period is the use of container technology, such as Docker, for both testing and deployment. We have, also, extensively covered Docker in previous blog posts.

For another internal dogfood project, I wanted to run a local MQTT Broker rather than a web-based broker, such as Mosquitto can be installed natively on Windows, Mac and Linux. Still, one of the significant benefits of Docker is not polluting your working machine with lots of different tools.

Running Mosquitto in a Docker container is, therefore, a perfect test environment. Rather than, as in the previous Docker blog articles, build our own Docker image containing Mosquitto, we can use the official Dockerhub image.

eclipse-mosquitto Docker image

Pull the latest image

I’m assuming you have Docker installed and configured for your local working environment.

First, pull the latest image from Dockerhub:

% docker pull eclipse-mosquitto

Note of caution: the instructions on the Dockerhub site are incorrect!

Run the docker image

Run the basic Docker image with default settings:

% docker run -it --name mosquitto -p 1883:1883 eclipse-mosquitto 
1582194844: mosquitto version 1.6.8 starting
1582194844: Config loaded from /mosquitto/config/mosquitto.conf.
1582194844: Opening ipv4 listen socket on port 1883.
1582194844: Opening ipv6 listen socket on port 1883.

The -p 1883:1883 argument maps the docker container’s default MQTT socket 1883 the localhost ( port 1883. Alternatively, we could map that onto another localhost port if it clashed with a locally running MQTT broker, e.g. -p 11883:1883.

Using the --name directive also allows the container to be stopped and restarted, using:

% docker stop mosquitto


% docker start mosquitto

Testing the eclipse-mosquitto Docker container

To test the setup of the running Mosquitto container, I used my original software, still available on github. To build this, you’ll need a C compiler (ideally gcc or clang) and CMake.

Alternatively, any MQTT client should work for test purposes.


Next, we must subscribe to a topic. In a command window invoke the subscribe client to a topic, the default for our project being hello\world on port, e.g.

% ./mqttsub
MQTT SUB Test Code
Connected to MQTT Server at
Subscribed to MQTT Service hello/world with QoS 0


To test publishing, open another command window and invoke the publisher-client. The publisher-client, by default, publishes 10 messages to the topic hello\world and then closes the connection, e.g.

% ./mqttpub 
MQTT PUB Test Code
Connected to MQTT Server at
Published to MQTT Service hello/world with QoS0
Sent 1 messages
Published to MQTT Service hello/world with QoS0
Sent 2 messages
Published to MQTT Service hello/world with QoS0
Sent 3 messages
Published to MQTT Service hello/world with QoS0
Sent 4 messages
Published to MQTT Service hello/world with QoS0
Sent 5 messages
Published to MQTT Service hello/world with QoS0
Sent 6 messages
Published to MQTT Service hello/world with QoS0
Sent 7 messages
Published to MQTT Service hello/world with QoS0
Sent 8 messages
Published to MQTT Service hello/world with QoS0
Sent 9 messages
Published to MQTT Service hello/world with QoS0
Sent 10 messages

Subscribe output

On returning to the subscriber window, we will see the received message displayed.

Message number 1
Message number 2
Message number 3
Message number 4
Message number 5
Message number 6
Message number 7
Message number 8
Message number 9
Message number 10

Mosquitto window output

Returning the window where the docker image was invoked, various log messages are shown:

1582194844: mosquitto version 1.6.8 starting
1582194844: Config loaded from /mosquitto/config/mosquitto.conf.
1582194844: Opening ipv4 listen socket on port 1883.
1582194844: Opening ipv6 listen socket on port 1883.
1582205221: New connection from on port 1883.
1582205221: New client connected from as default_sub (p1, c1, k30).
1582205225: New connection from on port 1883.
1582205225: New client connected from as default_pub (p1, c1, k30).
1582205235: Client default_pub disconnected.

Setting up persistent files

Mosquitto can be configured, for example, to change logging, password, listener-ports, etc. This is achieved using mosquitto.conf file.

To set up mosquitto.conf, first create a local working directory with a three sub-directories of config, data and log, e.g.

% cd
% mkdir docker-mosquitto
% cd docker-mosquitto
% mkdir mosquitto 
% mkdir mosquitto/config/ 
% mkdir mosquitto/data/
% mkdir mosquitto/log/

Create a config file

Next, create a test file called mosquitto.conf in the newly created subdirectory mosquitto/conf/:

% touch mosquitto/config/mosquitto.conf

Edit the config file

Using your favourite editor (okay vi isn’t my favourite, but it’s convenient):

% vi mosquitto/config/mosquitto.conf

And add the as a minimum set of conf directives.

# following two lines required for > v2.0
allow_anonymous true
listener 1883
persistence true
persistence_location /mosquitto/data/
log_dest file /mosquitto/log/mosquitto.log

The full list of configuration items can be found [here](].

Run the docker image with a mounted volume

Now, when invoking the docker image we use the -v flag mapping the local filesystem into the docker container. The running container will now pick up the locally defined mosquitto.conf. Invoke e.g:

% docker run -it --name mosquitto -p 1883:1883 -v $(pwd)/mosquitto:/mosquitto/ eclipse-mosquitto 


I hope this post gave you a useful overview of getting an MQTT Mosquitto Broker up and running using Docker.

Hopefully, in future posts, I will be able to share further details of the dogfood project.


Posted in General | Tagged , , | 25 Comments