Why We Need Build Systems

Build systems were developed to simplify and automate running the compiler and linker and are an essential part of modern software development. This blog post is a precursor to future posts discussing our experiences refactoring the training projects to use the CMake build generator.

Using Build Systems

Build systems can be standalone command line applications such as  Make, Scons and Ninja; or part of an (Integrated Development Environment IDE) like Visual Studio , XCode or IAR Workbench.

Configuring build systems for a project can be complex and there are a few applications around that will generate the required build files from a simpler project configuration file. The most popular of these tools are CMake, which generates files for several build systems, and Meson, which generates Ninja build files.

A 2021 survey by the Standard C++ Foundation showed that CMake was used by 4 out of 5 of the respondents, while Meson and  scons are each used by less than 1 in 20. The survey also showed that Make/nmake and MSBuild (Visual Studio) are used roughly equally by 2 out of 5 people and Ninja by 1 out of every 3. In many cases the respondents will be using more than one build system across multiple projects.

Interestingly, or worryingly, from our perspective of moving our training projects to CMake, about two thirds of the respondents found CMake to be a major or minor pain point. Managing CMake was the third most frustrating aspect of C++ development behind managing libraries and project build times.

We need to use build systems because compiling an application from source code is no longer as simple as running a single compilation command such as:

$ g++ -o Application main.cpp

While this works, it relies on default configuration options for the compiler and linker.

We should point out that referring to g++ as a compiler is misleading. It isn’t very obvious, but g++ is itself a very simplistic build system responsible for running a number of build phases:

  • the preprocessor
  • the compiler
  • the assembler (code generator)
  • the linker

Anyone who has worked with the Microsoft C++ tools will be aware that there is a separate compiler (cl.exe which includes the preprocessor and assembler) and linker (link.exe).

The build process is complex and involves many stages with different requirements, inputs and outputs and can be summarised in the following diagram.

In reality, we use a build system to manage some or all of the following aspects of software development:

  • source code organisation
  • source code inter-dependecies
  • managing third party libraries
  • compilation options
  • code generation options
  • program linker configuration
  • post build processing
  • managing testing

It’s worth looking closely at these steps in order to understand the requirements of a build system and the concept of a development Toolchain.

Source Code Organisation – In Source Builds

Our simple example above generates the intermediary object files and executable program in the current directory. When we store the output files in the same directory as the source files we call this an In-source build: this is generally considered a bad idea. Managing the source code will become problematical when our application gets more complex and requires multiple files.

The output files from a build process can be re-created from the source code so do not need to be saved using a backup regime, whereas source code must be saved which these days is usually achieved using a source code repository such as Git.

The lifecycle management of source code and build output files are independent and should be stored in different locations. Many build tools and tool generators (like Meson) do not even support in-source builds.

Source Code Organisation – Out of Source Builds

All projects should store generated artefacts (object files, executable applications, etc.) in a separate location to the source code.

A typical C/C++ application will separate out logical subsystems into different components and these will usually be stored in a hierarchical directories structure representing the application’s architecture. A build system should manage source code (including header files) stored in multiple directory locations.

There is no standard structure for C/C++ source code and a quick browse of open source projects in GitHub shows nearly as many different source code structures as there are projects (a bit of an exaggeration but it does show there is no single standard).

Many projects intermingle the header files with the source files, which can lead to complex compilation options when the source code is stored in hierarchical directories. Header files define the interface to a module’s functionality, and a best practice approach is to separate interface from implementation. Applying this approach to source code organisation implies that header files should be stored separately from the implementation files.

Back when I was developing code using Unix I used directory names src for source files and hdr for the header files, with both directories stored in the project’s root directory; but the use of src and inc is the usual practice in embedded system. Not everyone likes the traditional Unix style of shortening words (often by omitting vowels) preferring to use source and include instead.

What everyone does agree on is that the generated output files go in a separate folder: normally in the project workspace. Typical names are build or target as used by the Maven build system.

No matter how the source code is organised, the compiler must be told where the source and header files are located.

Another aspect of source code organisation is integrating the build process with a source code management system. The ability to download the latest committed version of source code before building means the same build system can be used for local development as well as the centralised pipelines of Continuous Integration commonly used in agile development methodologies.

Once the locations of source and headers files are determined, a definition of the component files is also required for a build process. These files can be supplied individually or by using wildcard patterns – a good build system will support both.

The benefit of listing each file individually is that the build system provides a definitive list of source dependencies for the build artefacts, which is helpful for general administration tasks and undertaking a risk analysis for business continuity purposes. The drawback is that this list has to be maintained and the extra administration required can nudge developers into including code in existing modules rather than creating a new module, especially if the build configuration is centrally administered in an overly restrictive manner.

A wildcard approach to filenames (e.g. src/*.cpp) superficially seems more straightforward as it doesn’t require the developer to list each file allowing new files to be easily added. The downside is that the build system does not have a definitive list of the source code files for a given artefact, making it harder to track dependencies and understand precisely what components are required. Wildcards also allow spurious files to be included in the build – maybe an older module that has been superseded but not removed from the source folder.

Best practice says to list all source modules individually despite the, hopefully minor, extra workload involved when first configuring the project or adding additional modules as the project evolves.

Source File Dependencies

Larger projects will use multiple source files to breakdown a large code base into smaller manageable units, probably using directories to group files in component subsystems. There will be interdependencies between these files. For C/C++ projects the dependencies can be identified through the occurrence of #include statements.

Large projects can take a while to compile and link all of the separate files from scratch (a clean build).  A large C project can take 20 or 30 minutes to build even using a fast multi-core server. For C++ projects making heavy use of templates the build times can be measured in hours rather than minutes.

Most build systems will optimise the build process by omitting stages that are already up to date. For C/C++ builds this means omitting the compilation of a source file if neither the source nor any of the files it depends have been changed since the last build.

But a build optimisation only works correctly if the build configuration correctly captures the dependencies between files. For simple build system like GNU Make the developer must specify and maintain these dependencies manually. A build system generator like CMake will scan the source files to maintain the dependencies automatically.

Compilation Options

Compilation options must specify:

  • the source language version, sometimes the source language
  • compilation options
  • include file locations
  • preprocessor symbols (or defines)

Typically, compilation options are provided as command-line parameters but there is no reason why a compiler couldn’t read a configuration from a specification file.

As an example of a compilation option the GNU g++ compiler uses the -std=c++17 for working with C++17 whereas the Microsoft cl compiler uses /std:c++17. It’s also worth pointing out that the GNU Compiler Collection uses different compiler for C (gcc) and C++ ( g++), but Microsoft supplies one compiler (cl) and uses the filename extension to determine the language (.c or .cpp).

A build file generator such as CMake or Meson must handle these different approaches.

An essential compilation option is setting the correct level of compiler warnings. A C/C++ compiler will attempt to generate code whenever possible; only if code cannot be generated will a compilation error be issued. This means that the compiler sometimes makes assumptions about what the programmer intended when writing the source code statements. The phrase “never assume because you make an ASS out of U and ME” has some significance here.

Using g++, we recommend, as a minimum, using the -Wall and -Wextra options to enable warnings for code use that is generally regarded as questionable (this doesn’t include implicit type conversion). The -pedantic and/or -ansi options enforce strict ISO (formerly ANSI) language compliance which is advisable as it will ensure you don’t make use of g++ specific features and pre-empt potential problems if you decide to use a different tool chain in the future.

Other compilation options are used to generate warnings when the compiler infers a programmer’s intentions when the source code is not explicit. A good example of the compiler inferring the programmer’s intentions where the source code is not explicit is the implicit conversion of a signed to an unsigned integer value. Implicit sign conversion can cause subtle problems when working with hardware device registers, so we usually add the -Wconversion and -Wsign-conversion warnings on g++ to identify these situations.

In general, set warnings to the highest level possible and remove all, or at least as many warnings as possible, from the compilation.

Your compilation phase should be augmented by static analysis tools such as clang-tidy, cppcheck or commercial tools such as Coverity which examine code structure and data flow without executing the code. These tools use heuristic rules to identify potential logic flaws and non compliance to coding guidelines such as MISRA widely used in embedded systems.

Include File Locations

We find that when teaching the use of #include preprocessor directive is a common source of confusion, even amongst experienced C/C++ programmers. We’re often asked what is the difference between using angle brackets < > and quotes ” “?

Originally angle brackets were used for header files in standard locations known to the compiler, whereas quotes were used to define a string literal specifying the path to the header file (relative to the project workspace). These days life isn’t quite that clear cut, but the general approach is to use < > for library headers and “” for user-defined headers.

The ISO C and C++ standards both say that “The named source file is searched for in an implementation-defined manner” for both < > and ” “ (see the “Source File Inclusion” section in the relevant standard).

The C++ standard header files are defined with logical module names like <iostream> whereas in C we use the header filenames like <stdio.h>. When using C header files with C++ the logical name is the base part of the filename prefixed by c so <stdio.h> becomes <cstdio>.

The compiler knows where standard header files are located. For example, the Linux host g++ compiler looks in /usr/include whereas the Arm g++ compiler (located at /opt/arm-toolchain/bin/arm-none-eabi-g++) looks in /opt/arm-toolchain/include. There is a full description for hosted GNU compilers in the Search Path section of the manual.

We can tell the compiler to look in other locations using the -I directive which specifies an additional directory to search for include headers. Note that this is just the top-level directory and not a recursive directory search.

We can use multiple include path locations on a compilation so we could include nested include directories as separate -I options. For example, when developing out embedded target code for an STM Discovery board (https://www.st.com/en/evaluation-tools/stm32f4discovery.html) we include standard header files using -Isystem/include/cmsis and -Isystem/include/stm32f4xx separately;  we cannot just use  -Isystem/include. Note that these are relative paths from the project workspace (not an absolute path like /usr/include).

When using #include directives with string literals we are specifying file pathnames. But this approach can be abused by using a directive such as:

#include “../hdr/mylib.h”

This approach has coupled the organisation of the files on the file system to the C++ program code structure. We could not rename the hdr directory to include without modifying every source file that uses this header file. A maintenance nightmare – so this should be avoided.

To solve the file system dependency, we would add -Ihdr to include the hdr folder in source include files search (some people prefer to use -i./hdr to make it explicit this is relative to the project workspace). Our include path now becomes “mylib.h” without any path information. It is common to organise headers files in sub-directories so it would still be acceptable to use “lib/mysublib.h” as the header file organisation is part of the project structure.

The compiler uses the include locations to resolve all include directives so in the previous example we could also have written #include <mylib.h> but this would not follow the accepted conventions of using string literals for user defined header files.

In resolving include file paths modern compilers will typically look for the header file relative to the location of the source files containing the #include statement. If the header file is not found that way the compiler will search the specified include locations, in the order given on the command line, until it finds an exact filename match (Note, both Windows and OSX are case insensitive when searching for included filenames).

Some compilers may adopt a different approach such as initially looking for include files relative to the current working directory rather than the location of the source file. Defining and using a compiler’s -I option (or equivalent) in a build system removes the dependency on a compiler’s include file lookup strategy.

The specified order of the include directories is therefore important and a possible source of problems if there are multiple header files with the same name: the first filename match is used and the same filename in subsequent include directories are ignored: duplicated include header filenames are not treated as an error. The best practice is to use unique header filenames or, failing that, use hierarchical directories to ensure unique paths. A good example of the directory approach is the standard Linux header types.h which is included as <sys/types.h>.

Code generation Options

Code generation or assembler options are specified on the compilation command line and include general concepts like optimisation levels and architecture-specific options. For example, when cross compiling to an Arm processor, we use -mcpu=cortex-m4 flag to set the correct architecture and -mfloat-abi=soft to use a software FPU library because the QEMU we use for online training does not include support for a hardware FPU.

Code optimisation is typically disabled during development by using the -Og option (the g is short for gdb the name of the Gnu debugger), but for a released version of the application (see build types described later) we may want to include some optimisation. For example, using -Ofast will optimise for speed at the expense of a potentially larger memory footprint whereas -Os will optimise for the smallest memory usage usually with slower code execution (the s means size).

Usually, we apply the same options to all compilation units (each source file is compiled independently) but sometimes it may be desirable to treat some source files differently. A good example would be targeting an embedded system with a limited amount of memory where we optimise for small size with -Os. However, if one module is a critical performance bottleneck, we may need that compilation unit to be optimised for speed with -Ofast.

Make sure whatever build system you use will support different compilation options for different source files: even if you don’t need it now, you might in the future.

Preprocessor Directives

We use preprocessor directives to configure our source code so that we can use a single code base (one project) to build potentially different applications.

Examples of standard preprocessor symbols are __STDC_VERSION_ and __cplusplus which can be used to verify the compiler options are set to the correct C or C++ language version.

To ensure we are using C++17 we would a check in our code such as:

#if __cplusplus < 201703L
#error “__FILE__ requires a C++17 compiler”

Here the symbol __FILE__ is the filename of the current compilation unit.

If we knew we were always using a Modern C++ compiler (C++11 or later) we could also have used:

static_assert(__cplusplus >= 201703L)

A good example of user-defined configuration options can be seen in our embedded training projects currently using an STM Discovery board (STM32F407VG) with hardware components configured at specific addresses. We know that we may need to change this to a different board in the future; perhaps STM will stop manufacturing this particular board.

If we need to move to a different board with similar hardware components in the future, these components could be mapped to different physical addresses. We can build this dependency into our source code using conditional compilation based on preprocessor symbol definitions.

To resolve our theoretical problem of differing physical addresses we use pre-processor directives to include the appropriate device header file:

#ifdef STM32F407xx
#include “stm32f407xx.h”
#elif defined(STM32F417xx)
#include "stm32f417xx.h"

Our build configuration would define the appropriate preprocessor symbol on the command line: in this case using -DSTM32F407xx to select the appropriate hardware configuration.

Similarly, we can use -DDEBUG to define a debug symbol which we use to set options applicable to developing and debugging code such a -Og to optimise for the debugger.

Compiler defines are used to support the concept of different Build Configurations which CMake refers to as build type, but an IDE usually calls a build configuration or build target.

Build Configuration

A build configuration can be described as the combination of all the build options that uniquely define the application being built. Many projects have the idea of a debug build optimised for development and a release build optimised for use.

Each build configuration uses a separate output folder for the build artefacts (object files, executable and additional supporting files). This prevents a one build configuration from overwriting the output from a different configuration.

Build configurations can be used for many purposes. Our previous example of using two different target hardware boards would be separate builds for each hardware target. Actually, including debug and release versions for the two different hardware targets, that’s four different configurations and for separate build output locations.

We can also use build configurations to install our application in a central location or a web server where end-users can access and use our build artefacts.

A good build system will allow us to automate this deployment and/or installation process.

Program Linker Configuration

Like the compiler, linker options are provided as command line parameters usually augmented by configuration files and run-time libraries. The linking stage for embedded systems is often more complex than the compilation stage because it as at this point that run time libraries, and the physical memory architecture have to be resolved to create a image suitable for the target board.

When using the Gnu toolchain, the use of linker options can be confusing because the same command (gcc or g++) is used for both compilation and linking. Linker options specified to a compilation are ignored, similarly compiler options are ignored when linking. Working with Arm g++ we might build a system using a compilation command and a link command:

$ g++ -c -o build/main.o -std=c++17 --specs=rdimon.specs src/main.cpp
$ g++ -o build/Application.elf -std=c++17 --specs=rdimon.specs main.o

While this works it isn’t clear that the std=c++17 option is probably not used by the linker. Similarly, the –specs option tells the linker to include a debug version of the embedded C standard library, which isn’t required by the compiler.

Microsoft developers have a separate link.exe program used to link the application so the two build steps are clearly differentiated.

A good build system will clearly differentiate between compiler and linker options.

On a host compilation the linker is specific to the host and simply needs to be given the list of optional run time libraries used by the project. In our C++ courses we discuss threading which requires the linker to include the POSIX threading library using the g++ option -lpthread.  The other common linking requirement is for handling libraries: should they be included in the executable image (static linkage), or use dynamic linking to a shared library (.so or .dll file) at application startup.

Using a development toolchain for an embedded (cross compiled) system has more complex linkage requirements. The physical board memory layout must be supplied to the linker: the GNU Arm Embedded Toolchain  uses .ld files for this purpose. There will be runtime kernel or executive code that executes at board reset to initialise the hardware, setup the stack, heap and static data sections before calling the main function to start the program.

An embedded system linker needs to include the C/C++ runtime libraries. In the case of the GNU Arm Embedded Toolchain these are provided by the  –specs options. For a debug version the rdimon.specs is linked to support semi-hosted debugging (I/O streams are mapped onto the serial port used to flash memory), while a release version uses nosys.specs which has stubs for standard I/O support and unsupported host system functions.

Post Build Processing

The last step in our build (if we ignore deployment and installation) is any additional processing required after a successful build.

For a hosted application, we might strip the generated executable to remove all embedded symbols and other redundant information to reduce the size of the executable image.

For a cross-compilation to target hardware, we usually have to generate additional binary (or hex) files containing the image to load into flash memory. If we have embedded debug support, we will need map files to support post-crash analysis it allows the programmer to understand and review the target memory layout.

Managing Testing and Debugging

Testing and debugging is too extensive a subject to cover in any detail in this post, but a build system should be capable of running automated tests at any point of the build cycle. Test management should include unit testing on a per-source code module, integration testing, and functional testing of the various build artefacts generated by the build process.

Using debugging tools such as Open OCD  and Segger Ozone while a necessary part of development may be too much for a build system to manage. Typically build systems like to create artefacts such as object files, executable images or other output files such as generated web pages for reports and build summaries. Interactive debugging does not fit very well with this approach.

Limitations of a  Build System

Tools such as CMake have a very extensive “programming language” used to configure the build process. While it may appear attractive to use this programmable capability to incorporate additional steps to the build system this can end up adding huge complexity to the build instructions. It may be better kept some aspect oft the development lifecycle decoupled from the build system.

Moving away from a Build System will introduce problems to do with portability and maintenance of the supporting scripts. Bash scripts are supported on Linux and OSX but on Windows are provided by third party libraries such as MinGW or Cygwin. Recently Microsoft have been integrating Windows Subsystem for Linux (WSL) more closely with Windows 10 and this provides another source for running shell scripts. Using bash scripts for Windows developers in not straightforward. Similarly PowerShell is standard on Windows 10, but has to be installed on Linux and OSX.

While Python is cross platform some of the standard libraries are only available on Linux like platforms (usually including OSX) while others are only available on Windows. Python’s popularity is partly down to the extensive range of third-party libraries available, which would have to be installed on the developer’s workstations.

Stepping away from the build tool to provide additional development lifecycle support but that loses the inherent portability of a build tool like CMake. Developers can find themselves deciding to use a build tool to configure the build environment before they can even start development work on a project.


Hopefully, you now know why we need build systems to support modern software development. Applications have complex configuration and build requirements that need to be captured and implemented by a build system. Applications are no longer single source files that can be compiled using a single command line.

In case it isn’t apparent, the build system configuration is part of the project’s source code and should be stored in the same code repository as the program source. Anyone checking out a project source code will then be able to build the project.

This blog post is a pre-cursor to posts discussing what we learned when configuring CMake for our cross compilation and hosted training projects.

Posted in Build-systems, C/C++ Programming, Toolchain | 4 Comments

Modern Embedded C++ – Deprecation of volatile

Compiling the following, straightforward code:

volatile int x;

int main() {
    x += 10;


Using g++ with the directive -std=c++17 builds without any warnings or errors. However, change the directive to -std=c++20, and the result is:

source>: In function 'int main()':
<source>:5:5: warning: compound assignment with 'volatile'-qualified left operand is deprecated [-Wvolatile]
    5 |   x += 10;
      |   ~~^~~~~
Compiler returned: 0

The new C++ standard, C++20, has deprecated volatile! So, what does this mean for the embedded programmer?

We covered the need for and use of, volatile in a previous posting. That post (written in April 2020) did state that:

In C++20 many general uses of volatile are being deprecated.

The key phrase here is general uses.

Volatile in embedded

Continue reading

Posted in ARM, C/C++ Programming | Tagged | Leave a comment

GitHub Codespaces and online development

In our previous posting, we discussed using VSCode’s Dev Container extension to allow running workspaces directly within a Docker container.

In December 2020, I was granted early access to a new feature developed by GitHub called Codespaces. Codespaces offers an online VSCode development environment, enabling you to develop entirely in the cloud.

The great news is that Codespaces uses the same core process, and file structure, as Dev Containers; meaning once we have our .devcontainer folder setup (if you are unfamiliar with Dev Containers it is worth reading the previous blog first) “it just works” online.

TDD in the cloud

Using our example from the previous blog (GoogleTest and meson) to run using Codespaces is Simples!.

When granted Codespaces access, open the GitHub project and under the Clone or download button you are offered the option Open with Codespaces

Continue reading

Posted in Agile, C/C++ Programming, General, Industry Analysis, Testing | Tagged , , , , , | Leave a comment

VSCode, Dev Containers and Docker: moving software development forward

Long term readers of this blog will know our devotion to using container-based technology, especially Docker, to significantly improve software quality through repeatable builds.

In the Autumn/fall of 2020, Microsoft introduced a Visual Studio Code (VSCode) extension Remote – Containers. With one quick stroke, this extension allows you to open a VSCode project within a Docker container.

Getting started with Dev Containers and Docker

There are several different approaches to using Dev Containers. In this post, we shall cover three options:

  1. Using an existing Docker image from Docker Hub
  2. Using a pre-build Microsoft container setup
  3. Using a custom Docker image based on a project specific Dockerfile

There are a couple of prerequisites:

Using an existing Docker image – TDD in C with Ceedling

Anyone using or experimenting with Test-Driven-Development in C will probably be aware of Ceedling, unity and CMock.

Whether or not you have Ceedling, or any dependents, such as Ruby, installed we can begin using Dev Container with an existing Dockerhub container image. Containerisation ensures we can quickly get up and running with Ceedling in a known environment. In true ‘Blue Peter‘ style, we happen to have a pre-built Ceedling based Docker image on Docker Hub.

  1. Create an empty folder, e.g.
$ mkdir ceedling_test
  1. In the new folder, create another folder called .devcontainer (note the preceding .)
$ cd ceedling_test
$ mkdir .devcontainer
  1. In that new folder, add a file called devcontainer.json with the contents
    "image": "feabhas/ceedling"
  1. Your project structure should now be:
    └── devcontainer.json
  1. Now open VSCode in the working directory
$ code .
  1. VScode will detect the Dev Container configuration file and ask if you want to reopen the folder in a container. Click Reopen in Container.
  2. Open a terminal window within VSCode, and you will be presented with a shell prompt #. We are now running within a Docker container based on the image feabhas/ceedling.
  3. Test the container, e.g.
# ceedling new test_project
Welcome to Ceedling!
      create  test_project/project.yml

Project 'test_project' created!
 - Execute 'ceedling help' from test_project to view available test & build tasks

# cd test_project

# ceedling module:create[widget]
File src/widget.c created
File src/widget.h created
File test/test_widget.c created
Generate Complete

# ceedling test

Test 'test_widget.c'
Generating runner for test_widget.c...
Compiling test_widget_runner.c...
Compiling test_widget.c...
Compiling unity.c...
Compiling widget.c...
Compiling cmock.c...
Linking test_widget.out...
Running test_widget.out...

  Test: test_widget_NeedToImplement
  At line (15): "Need to Implement widget"



After exiting VSCode, all files created will exist in your local file system. Reopening VSCode, you will once again be prompted to reopen in the container.

Continue reading

Posted in Agile, C/C++ Programming, Testing | Tagged , , , , , , , | Leave a comment

Introduction to the ARM® Cortex®-M7 Cache – Part 3 Optimising software to use cache

Caches – Why do we miss?

Cold Start

As stated, both data and instruction caches are required to be invalidated on system start. Therefore, the first load of any object (code or data) cannot be in cache (thus the cold start condition).

One available technique to help with cold-start conditions is the ability to pre-load data into the cache. The ARMv7-M instruction set adds the Preload Data (PLD) instruction. The PLD instruction signals to the memory system that data memory accesses from a specified address are likely shortly. If the address is cacheable, then the memory system responds by pre-loading the cache line containing the specified address into the cache. Unfortunately, there is currently no CMSIS intrinsic support for the PLD instruction.

It is worth noting that some processor data caches implement an automatic prefetcher (e.g. Cortex-A15). This monitors cache misses, and when a pattern is detected, the automatic prefetcher starts linefills in the background. Unfortunately, the Cortex-M7 data cache does not support automatic prefetch.


The other most obvious reason for misses is that of cache capacity. The larger the cache, the higher the probability of a cache hit and the lower the frequency of eviction. However, all this comes at a cost, not only financial but also power.

A larger cache is, naturally, going to contribute to the overall System-on-Chip (SoC) costs, making the end microprocessor more expensive. In high volume designs, this is always a significant factor in SoC choice.

Among all processor components, the cache and memory subsystem generally consume a large portion of the total microprocessor system power, commonly 30-50% of the total power [Zang13]. Caches, thus, add a further level of complexity to the poor-overworked engineer trying to calculate the design’s power model and has an impact on all battery-based designs.


Finally, misses will occur due to natural eviction followed by a reload. So a simple loop such as:

for(uint32_t i = 0; i < N; ++i) {
   dst[i] = src[i];

may result in multiple eviction/reload cycles depending on the memory addresses of dst and src. Also, any dst[i] eviction will result in a memory write as the line is marked dirty. The 4-way data cache goes a long way to help reduce the potential of dst[i] eviction, but because of the pseudo-random replacement policy, it may happen more often than we would expect or like.

Code Optimizations

There are a key number of areas where we, as a software developer, can potentially impact the performance of cache:

  • Algorithms
  • Data structures
  • Code structures

Continue reading

Posted in ARM, C/C++ Programming, CMSIS, Cortex, Design Issues | Tagged , | 2 Comments

Introduction to the ARM® Cortex®-M7 Cache – Part 2 Cache Replacement Policy

Instruction Cache Replacement Policy

Starting with the simpler instruction cache case; when we encounter a cache miss the normal policy is to evict the current cache line and replace it with the new cache line. This is known as a read-allocate policy and is the default on all instruction caches.

Cold start (first read)

It should also be noted that on system power-up the initial state of the cache is unknown. On the ARMv7-M all caches are disabled at reset. Before the cache is accessed, it needs invalidating. As well as each line having a tag associated with it, each line also has a valid flag (V-bit) which indicates whether the current cache line has been populated from main memory or not.

The cache must be invalidated before being enabled. The method for this is implementation-defined. In some cases, invalidation is performed by hardware, whereas in other cases it is a requirement of the boot code. CMSIS- has added a specific instruction cache API to support these operations for the Cortex-M7:

void SCB_InvalidateICache(void);
void SCB_EnableICache(void);    
void SCB_DisableICache(void);

Direct Mapped Cache

So far, we have assumed that the whole of the cache is mapped as one contiguous area, we call this a Direct Mapped cache. A lot of work in trying to improve cache performance has been done over the years and the key metric has been found to be that of cache hit/miss rate, i.e. what is our ratio of reads that result is a cache fetch against those requiring a main memory fetch and cache line eviction.

Studies have shown that the Direct Mapped Cache may not always achieve the best cache hit ratios. Take, for example, the following code:

// file1.c
int filter_algo(int p)
   return result;

// file2.c
void apply_filter(int* p, int N)
   for(int i = 0; i < N, ++i) {
     x[N] = filter_algo(*(p+i)) + k[N];

If, unluckily, apply_filter was in the address range of 0x00004000 and filter_algo was around 0x00005000 then each time apply_filter called on filter_algo, this would result in an eviction of the apply_filter code and the filling of the filter_algo instructions. But upon return, we would have to evict filter_algo code and refill with apply_filter instructions. As the algorithm executed it would cause cache thrashing.

Set-associative cache

Due to the principles of locality, research suggested that rather than a single direct-mapped cache, a better approach is to split the cache into an array of buffers, where two addresses with the same line index can reside in different array indices.

In cache terminology, each array index is known as a way, so we talk about N-Way caches. The number of ways can vary, typically ranging from 2 to 8. For the Cortex-M7 the instruction cache is a 2-way system. When we access an address, we now have ‘N’ possible lines to make a tag match against. The number of valid lines involved in the tag comparison is called the set. Continue reading

Posted in ARM, CMSIS, Cortex, Design Issues | Tagged , , , | 2 Comments

Introduction to the ARM® Cortex®-M7 Cache – Part 1 Cache Basics

For many years, the majority of smaller microprocessor-based systems have typically not used caches. With the launch of the ARMv7 architectures, caches were supported in the ARMv7-A family (e.g. Cortex-A8, etc.) but not supported in the core design of the ARMv7-M micro-controllers such as the Cortex-M3 and Cortex-M4. However, when the Cortex-M7 was announced, it broke that mould by offering cache support for the smaller embedded micro-controller.

This series is broken down in three parts:

  1. Basic principles of cache
  2. Cache replacement policies
  3. Optimising software to use cache

Why introduce caches into the architecture?

The purpose of a cache is to increase the average speed of memory access. The most immediate and obvious benefit is one of improved application performance, which in turn can lead to an enhanced power model. Caches have been used for many years (dating back as far as the 1960s) in high-end processor-based systems.

The driver behind the development and use of a cache is based on The Locality Principle.

Caches operate on two principles of locality:

  • Spatial locality
    • Access to one memory location is likely to be followed by accesses to adjacent locations.
  • Temporal locality
    • Access to an area of memory is likely to be repeated within a short period.

Also of note is Sequentiality – Given that a reference has been made to a particular location s it is likely that within the next several references, a reference to the location of s + 1 will be made. Sequentiality is a restricted type of spatial locality and can be regarded as a subset of it.

In high-end modern systems, there can be many forms of cache, including network and disk caches, but here we will focus on main memory caches. In addition, main memory caches can also be hierarchical, i.e. there are multiple caches between the processor and main memory, often referred too as L1, L2, L3, etc., with L1 being nearest to the processor core.

The Cache

The simplest way to think of a cache is as a small, high-speed buffer placed between the central processor unit (CPU) and main memory that stores blocks of recently referred to main memory.

Once we’re using a cache, each memory read will result in one of two outcomes:

  • A cache hit – the memory for the address is already in cache.
  • A cache miss – the memory access was not in cache, and therefore we have to go out to main memory to access it.

Continue reading

Posted in ARM, CMSIS, Cortex, Design Issues | Tagged , , , , | Leave a comment

TDD with Compiler Explorer

Compiler Explorer (CE) has been around for several years now. When it first appeared on the scene, it immediately became an invaluable tool. Its ability to show generated assembler from given source code across many different compilers and ISAs (Instruction Set Architectures) is “mind-blowing”. We use it extensively when teaching as it allows you to clarify the effect your code can have on both performance and memory usage. 

However, rather than limiting itself to only showing generated assembler, recent developments include the ability to execute the code and examine the program output. Having online support for this is nothing especially new (e.g. ColiruWandbox, etc.), but it’s helpful to have it within one tool.

For example, given a simple “hello, world!” program, we see the standard output in a new tab:

Test-Driven Development

One of the significant benefits to come out of the growth of Agile development is the acceptance that unit testing is just part of the development cycle, rather than a separate activity after coding.

Agile unit-testing, better known as Test-Driven Development, or TDD for short, has lead to a growth of unit-test frameworks, all based around the original xUnit model, typified by GoogleTest (gtest). 

As part of the continuing improvements and feature extensions, CE added support for various libraries to be included as part of the build. Included in this set is support for gtest, as well as two other, more modern, test frameworks; Catch2 and doctest.

Using Google Test with Compiler Explorer

Continue reading

Posted in Agile, C/C++ Programming, Testing | Tagged , , , , , , , | Leave a comment

Side effects and sequence points; why volatile matters


Most embedded programmers, and indeed anyone who has attended a Feabhas programming course, is familiar with using the volatile directive when accessing registers. But it is not always obvious the ‘whys and wherefores’ of the use of volatile.

In this article, we explore why using volatile works, but more importantly, why it is needed in the first place.

Peripheral register access

If we start with a simple, fictitious, example. Suppose we have a peripheral with the following register layout:

register width offset
control byte 0x00
configuration byte 0x01
data byte 0x02
status byte 0x03

with a base address of 0x40020000.

In a previous posting we covered using structures for register access. Let’s assume the following (incorrect) code has been written:

#include <stdint.h>

typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

Port_t* const port   = (Port_t*) 0x40020000;

void write(uint8_t data)
  port->ctrl = 1;         // Enter configuration mode
  port->cfg  = 3;         // Configure the device
  port->ctrl = 0;         // Enter operational mode.

  while(port->status == 0) 
    // Wait for data...
  port->data = data;

If we compile this code using GCC 8.2 for Arm using the flags:

  • -O3 – high optimaisation
  • -mcpu=cortex-m4

we get the following generated Arm Thumb-2 assembler:

 1. write:
 2.   ldr r3, .L5
 3.   ldrb r2, [r3, #3] @ zero_extendqisi2
 4.   mov r1, #768
 5.   strh r1, [r3] @ movhi
 6.   cbnz r2, .L2
 7. .L3:
 8.   b .L3
 9. .L2:
10.   strb r0, [r3, #2]
11.   bx lr
12. .L5:
13.   .word 1073872896

Complete example

Explaining the assembler

Continue reading

Posted in ARM, C/C++ Programming, CMSIS, Cortex | Tagged , , , , | 2 Comments

Practice makes perfect, part 3 – Idiomatic kata

Previously, we looked at some of the foundational C++ code kata – that is, elements of C++ coding that are absolutely key to master if you’re going to be programming in C++.
Practice makes perfect, part 1 – Code kata
Practice makes perfect, part 2 – foundation kata

In this article I want to introduce what I call ‘idiomatic’ kata.  These exercises have a bit more latitude (and variation) in how they can be implemented.  In that respect they are closer to traditional code kata.  The idea with these kata is to reinforce C++ constructs that aren’t encountered quite so often in typical C++ programming and so are more easily forgotten (or even avoided)

There’s no order to these kata.  None is more important than any other.

If you’re new to C++ there’s nothing wrong with practicing these exercises; but use them as a means to explore new language features.  It’s good to learn how these idiomatic patterns work.

Continue reading

Posted in C/C++ Programming, Design Issues, training | Tagged , , , | Leave a comment