Side effects and sequence points; why volatile matters

Introduction

Most embedded programmers, and indeed anyone who has attended a Feabhas programming course, are familiar with using the volatile qualifier when accessing registers. But the ‘whys and wherefores’ of using volatile are not always obvious.

In this article, we explore why using volatile works, but more importantly, why it is needed in the first place.

Peripheral register access

Let’s start with a simple, fictitious example. Suppose we have a peripheral with the following register layout:

register       width  offset
control        byte   0x00
configuration  byte   0x01
data           byte   0x02
status         byte   0x03

with a base address of 0x40020000.

In a previous posting we covered using structures for register access. Let’s assume the following (incorrect) code has been written:

#include <stdint.h>

typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

Port_t* const port   = (Port_t*) 0x40020000;

void write(uint8_t data)
{
  port->ctrl = 1;         // Enter configuration mode
  port->cfg  = 3;         // Configure the device
  port->ctrl = 0;         // Enter operational mode.

  while(port->status == 0) 
  {
    // Wait for data...
  }
  port->data = data;
}
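Overlaying a struct on the registers only works if every member lands at the offset given in the table. As a minimal sketch (assuming a C11 compiler; the assertion messages are illustrative and not part of the original code), the layout can be checked at compile time:

```c
#include <stddef.h>   // offsetof
#include <stdint.h>

typedef struct {
    uint8_t ctrl;     // control register,       offset 0x00
    uint8_t cfg;      // configuration register, offset 0x01
    uint8_t data;     // data register,          offset 0x02
    uint8_t status;   // status register,        offset 0x03
} Port_t;

// The build fails if the struct layout ever drifts away from the register table.
_Static_assert(offsetof(Port_t, ctrl)   == 0x00, "ctrl must be at offset 0x00");
_Static_assert(offsetof(Port_t, cfg)    == 0x01, "cfg must be at offset 0x01");
_Static_assert(offsetof(Port_t, data)   == 0x02, "data must be at offset 0x02");
_Static_assert(offsetof(Port_t, status) == 0x03, "status must be at offset 0x03");
_Static_assert(sizeof(Port_t) == 4, "Port_t must cover exactly four bytes");
```

These checks cost nothing at run time and say nothing about volatile; they only confirm that the overlay matches the hardware description.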

If we compile this code using GCC 8.2 for Arm using the flags:

  • -O3 – high optimisation
  • -mcpu=cortex-m4

we get the following generated Arm Thumb-2 assembler:


 1. write:
 2.   ldr r3, .L5
 3.   ldrb r2, [r3, #3] @ zero_extendqisi2
 4.   mov r1, #768
 5.   strh r1, [r3] @ movhi
 6.   cbnz r2, .L2
 7. .L3:
 8.   b .L3
 9. .L2:
10.   strb r0, [r3, #2]
11.   bx lr
12. .L5:
13.   .word 1073872896

Explaining the assembler

The key instructions are:

  • [2] Loads the registers’ base address into r3; 1073872896 is the decimal representation of 0x40020000
  • [3] Loads the contents of 0x40020003, the device’s status register, into r2
  • [4-5] Stores 3 in the config register and 0 in the control register. #768 is hex 0x0300, and strh writes a ‘half-word’ (16 bits) to the base address (i.e. 0x00 => 0x40020000 and 0x03 => 0x40020001)
  • [6] Compares the value of status to zero and branches accordingly – cbnz is the opcode compare branch on non-zero
  • [7] Label (L3) for the while loop
  • [8] Unconditional branch to label L3
  • [10] Stores the passed parameter (passed in r0) in the data register 0x40020002.
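The arithmetic behind items [4-5] can be checked on a desktop machine. The following is a small sketch, assuming the usual little-endian configuration of the Cortex-M4; the function name split_halfword_le is made up for this illustration:

```c
#include <stdint.h>

// Split a 16-bit value into the two bytes that a little-endian half-word
// store writes: the low byte goes to the lower address, the high byte to
// the next address up.
void split_halfword_le(uint16_t value, uint8_t out[2])
{
    out[0] = (uint8_t)(value & 0xFFu);   // byte written at base + 0 (ctrl)
    out[1] = (uint8_t)(value >> 8);      // byte written at base + 1 (cfg)
}
```

Calling split_halfword_le(768, bytes) leaves 0x00 in bytes[0] and 0x03 in bytes[1], which matches the single strh writing 0 to the control register and 3 to the config register.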

So what’s the problem?

With this example, the generated code is incorrect. Most notably the compiler optimises away the memory read from the while expression; the port->status register only gets loaded from memory once (line 3). As a consequence, the compiler generates an infinite loop L3 and doesn’t re-read the status register as a part of the while expression.

What’s not immediately apparent is that it has also optimised away the first write to the control register (port->ctrl = 1) so our device would never have entered configuration mode in the first place! You may see this referred to as ‘fusing’ memory accesses.

Adding volatile

If you’ve been programming embedded systems for any time (and certainly if you’ve attended any of the Feabhas training courses) you will know that adding volatile to the pointer definition fixes the problem. The volatile qualifier informs the compiler that the object could change outside the program’s flow of control, so it must not optimise away accesses to the object.

Updating our code so that port points to a volatile Port_t:

volatile Port_t* const port   = (Port_t*) 0x40020000;

The compiler now generates correct assembler:


 1. write:
 2.   ldr r3, .L7
 3.   push {r4}
 4.   movs r2, #1
 5.   movs r4, #3
 6.   movs r1, #0
 7.   strb r2, [r3]
 8.   strb r4, [r3, #1]
 9.   mov r2, r3
10.   strb r1, [r3]
11. .L2:
12.   ldrb r3, [r2, #3] @ zero_extendqisi2
13.   cmp r3, #0
14.   beq .L2
15.   strb r0, [r2, #2]
16.   pop {r4}
17.   bx lr
18. .L7:
19.  .word 1073872896

Examining the generated instructions we now see:

  • [2] Loads the registers’ base address into r3; 1073872896 is the decimal representation of 0x40020000
  • [7] Stores 1 at 0x40020000; the missing port->ctrl = 1 from before
  • [8] Stores 3 in the config register
  • [10] Stores 0 in the control register
  • [11] Label (L2) for the while loop
  • [12] Loads 0x40020003, the status register’s contents, into r3 as part of the loop
  • [13-14] Compares the value of status to zero and branches accordingly
  • [15] Stores the passed parameter (passed in r0) in the data register 0x40020002.

Importantly, both the initial control register write [7] and the re-read of the status register [12] now occur.
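The fixed address makes this code awkward to exercise away from the target. One possible approach, sketched here under the assumption that the port may be passed in as a parameter (the name port_write is hypothetical), lets a host test substitute an ordinary struct in RAM:

```c
#include <stdint.h>

typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

// Same sequence as write(), but the port is a parameter so that a host
// test can pass a struct in RAM instead of the real register address.
void port_write(volatile Port_t *port, uint8_t data)
{
    port->ctrl = 1;              // Enter configuration mode
    port->cfg  = 3;              // Configure the device
    port->ctrl = 0;              // Enter operational mode

    while (port->status == 0) {
        // Wait for the device; status is re-read on every iteration
    }
    port->data = data;
}
```

On the target the call would be port_write((volatile Port_t *)0x40020000, value). In a host test the fake status member must be non-zero before the call, otherwise the loop never exits.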

Great, but why in the first example is the specific code being optimised away?

Side Effects and Sequence Points

To properly understand why volatile fixes this, we have to understand the concepts of Side Effects and Sequence Points.

Side Effects

In its simplest form, writing to any object (variable) is a side effect, so the statement:

  x = y;

is producing a side effect on the object x. There is much jargon around here, but anytime we are modifying a memory location, we can consider that a side effect. More generally, a side effect is a result of an operator, expression, statement, or function that persists even after the operator, expression, statement, or function has finished being evaluated.

C allows for multiple side effects in a single expression, e.g.

  x = y++;

We now have side effects occurring on both x and y.
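Both side effects can be observed after the statement has run. A minimal sketch (the function name is made up for illustration, and x and y must point to different objects):

```c
// Evaluate x = y++ on caller-supplied objects so that both side effects
// can be inspected afterwards. x and y must not alias.
void assign_post_increment(int *x, int *y)
{
    *x = (*y)++;   // side effect on *y (incremented) and on *x (written)
}
```

Starting with y equal to 5, the call leaves x equal to 5 and y equal to 6.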

Every statement should have a side effect. A statement without one, e.g.

  x == y;

is nonsensical but legal C! A good compiler would generate a warning along the lines of:

<source>:18:14: warning: statement with no effect [-Wunused-value]
   x == y;

and there are multiple MISRA-C rules regarding side effects, fundamentally saying there should be one, and only one, side effect in any expression.

Other operations considered to be side effects are:

  • modifying a file
  • reading a volatile object
  • calling a function that does any of these operations

So, given:

  int x;
  volatile int y;
  //...
  x = y;

the assignment statement has two side effects.

Sequence Points

Sequence points are vital when considering code optimisation. They represent points in our code where we can guarantee certain conditions are true.

As a simple example, given the expression:

   x = y + z;

This expression is evaluated for its side effects, and there is a sequence point following this evaluation. So what we can rely on is that the sub-expression (y + z) must be evaluated before the temporary result (called an r-value) is assigned to x, generating a side effect. However, we cannot rely on the order in which y and z are read from memory because, in theory, it shouldn’t matter (the order of sub-expression evaluation is unspecified behaviour).
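The unspecified ordering becomes visible once the operands themselves have observable effects. In this sketch (read_y, read_z and sum are hypothetical names) the result is reliable, but the order of the two printed lines is not:

```c
#include <stdio.h>

static int read_y(void) { puts("read y"); return 2; }
static int read_z(void) { puts("read z"); return 3; }

int sum(void)
{
    // The result is always 5, but the compiler may call read_y() and
    // read_z() in either order, so either line may be printed first.
    return read_y() + read_z();
}
```

Code that depends on "read y" appearing before "read z" is relying on one compiler's choice rather than on the language.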

Rather than getting into the rabbit hole that is sequence points (I’ll leave that for another day), I want to return to why it’s affecting the initial code without the use of volatile.

The important paragraph in the C standard (5.1.2.3, Program execution) specifies:

In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

What this boils down to is that between sequence points the compiler is free to remove statements where it considers them unnecessary. The two optimisations that matter to us are:

  • If there are two or more writes to the same object (side effects) without an intervening read, then only the last write (side effect) needs to be evaluated
  • If there are two or more reads of an object without an intervening write (side effect), then only the first read needs evaluating (effectively caching the read value locally)

So going back to our original code:

  port->ctrl = 1;         // Enter configuration mode
  port->cfg  = 3;         // Configure the device
  port->ctrl = 0;         // Enter operational mode.

Hopefully, you can now appreciate that, as there isn’t a read from port->ctrl between the two writes to port->ctrl, the compiler recognises this and so only performs the second write.

Further on:

  while(port->status == 0) 
  {
    // Wait for data...
  }

Hopefully, it now makes sense why the compiler only evaluates port->status == 0 once and not for each iteration of the loop.
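Putting the two optimisations together, the non-volatile write() behaves as if it had been written as follows. This is a hand-written reading of the first assembler listing, not compiler output; the name write_as_optimised is hypothetical, and the port is passed as a parameter only so that the sketch can be run against RAM:

```c
#include <stdint.h>

typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

// Hand-written equivalent of the optimised, non-volatile write().
void write_as_optimised(Port_t *port, uint8_t data)
{
    uint8_t status = port->status;  // the only read of status (line 3)

    port->ctrl = 0;                 // port->ctrl = 1 has gone; in the listing
    port->cfg  = 3;                 // these two stores are fused into one strh (line 5)

    if (status == 0) {
        for (;;) {
            // status is never re-read, so this loop can never exit
        }
    }
    port->data = data;
}
```

Seen this way, the device never enters configuration mode, and a status of zero on entry hangs the function forever.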

By adding volatile to the pointer port, all reads and writes via the pointer are considered side effects, so the compiler does not optimise away any of the accesses.

Note that had we written:

extern void f(void);

Port_t* const port   = (Port_t*) 0x40020000;

int main(void)
{
  port->ctrl = 1;         // Enter configuration mode
  f();                    // force a sequence point
  port->cfg  = 3;         // Configure the device
  port->ctrl = 0;         // Enter operational mode.

  while(port->status == 0) 
  {
    // Wait for data...
    f();  // force a sequence point
  }
  port->data = 0x01;
}

Then we would also get the ‘correct’ assembler as the function call to f() acts as a sequence point. This is why sometimes ‘incorrect’ code can appear to work, but then a minor modification (e.g. removing the call to f()) will suddenly cause the code to stop functioning correctly.

Also, low optimisation settings, used quite often during debugging sessions, typically don’t trigger the sequence point/side effect optimisation, so again the code initially appears to function correctly.

The C99 standard lists the following sequence points (Annex C):

  • The call to a function, after the arguments have been evaluated (6.5.2.2).
  • The end of the first operand of the following operators: logical AND && (6.5.13); logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17).
  • The end of a full declarator: declarators (6.7.5);
  • The end of a full expression: an initializer (6.7.8); the expression in an expression statement (6.8.3); the controlling expression of a selection statement (if or switch) (6.8.4); the controlling expression of a while or do statement (6.8.5); each of the expressions of a for statement (6.8.5.3); the expression in a return statement (6.8.6.4).
  • Immediately before a library function returns (7.1.4).
  • After the actions associated with each formatted input/output function conversion specifier (7.19.6, 7.24.2).
  • Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any movement of the objects passed as arguments to that call (7.20.5).
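The second item in the list is the one most often relied upon in everyday code. As a small illustration (is_positive is a made-up name), the && operator evaluates its left operand first, with a sequence point after it, and skips the right operand when the left is false:

```c
#include <stddef.h>

// Returns 1 when p is non-null and *p is positive, otherwise 0.
// p is always tested before *p is read, and *p is not read at all
// when p is NULL.
int is_positive(const int *p)
{
    return (p != NULL) && (*p > 0);
}
```

Without that guarantee the dereference could happen before the null check, and the idiom would not be safe.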

A matter of style – volatile structures

In the examples so far, we have made the pointer definition volatile, e.g.

volatile Port_t* const port   = (Port_t*) 0x40020000;

An alternative style is to apply the volatile qualifier to the structure elements instead of the pointer definition.

typedef struct {
    volatile uint8_t ctrl;
    volatile uint8_t cfg;
    volatile uint8_t data;
    volatile uint8_t status;
} Port_t;

Port_t* const port   = (Port_t*) 0x40020000; // non-volatile pointer

This is the style used by CMSIS.
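Qualifying each member separately also allows different access rules per register. The sketch below assumes, purely for illustration, that status is a read-only register; REG_RW, REG_RO and port_ready are hypothetical names (CMSIS provides its own macros, __IO, __I and __O, for the same purpose):

```c
#include <stdint.h>

#define REG_RW volatile          // hypothetical name: read/write register
#define REG_RO volatile const    // hypothetical name: read-only register

typedef struct {
    REG_RW uint8_t ctrl;
    REG_RW uint8_t cfg;
    REG_RW uint8_t data;
    REG_RO uint8_t status;       // assumed read-only: writes are rejected at compile time
} Port_t;

// Returns non-zero once the device reports that it is ready.
int port_ready(const Port_t *port)
{
    return port->status != 0;    // still a volatile read, because the member is volatile
}
```

An accidental port->status = 0 now fails to compile, while every read of status is still performed as written.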

Using #define with volatile struct elements

Historically C programmers have used #define in preference to using constant-pointers, so given the following:

typedef struct {
    volatile uint8_t ctrl;
    volatile uint8_t cfg;
    volatile uint8_t data;
    volatile uint8_t status;
} Port_t;

The #define would be:

#define port    ((Port_t*)0x40020000)

Using #define with volatile pointer cast

Finally, rather than the volatile qualifier being part of the structure definition, it can be placed as part of the pointer cast in the #define, e.g.

typedef struct {
     uint8_t ctrl;
     uint8_t cfg;
     uint8_t data;
     uint8_t status;
} Port_t;

#define port    ((volatile Port_t*)0x40020000)

Coding Style and volatile

When using any volatile objects, the recommended practice is to avoid using

  • ++ or --
  • any op= operator (e.g. |= )

Instead, prefer to use the volatile object only for read (load) and write (store) operations, e.g.

prefer

typedef struct {
     uint8_t ctrl;
     uint8_t cfg;
     uint8_t data;
     uint8_t status;
} Port_t;

#define port    ((volatile Port_t*)0x40020000)

void f(void)
{
  ...
  uint8_t value = port->ctrl;  // load
  value |= 0x3;                // modify local
  port->ctrl = value;          // store
  ...
}

over

typedef struct {
     uint8_t ctrl;
     uint8_t cfg;
     uint8_t data;
     uint8_t status;
} Port_t;

#define port    ((volatile Port_t*)0x40020000)

void f(void)
{
  ...
  port->ctrl |= 0x3;   // hides load, modify, store 
  ...
}

In some cases, this can also help reduce the number of bus accesses, which in turn can help reduce power consumption.
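The preferred load, modify, store pattern can be wrapped in a small helper so that it is written once. This is only a sketch, and reg_set_bits is a hypothetical name rather than part of any library:

```c
#include <stdint.h>

// Set the bits in mask using one explicit load and one explicit store.
static inline void reg_set_bits(volatile uint8_t *reg, uint8_t mask)
{
    uint8_t value = *reg;              // load: the only read of the register
    value = (uint8_t)(value | mask);   // modify the local copy
    *reg = value;                      // store: the only write to the register
}
```

With the pointer-cast style above, the call would be reg_set_bits(&port->ctrl, 0x3). Spelling out the load and the store makes it visible that exactly one read and one write reach the bus.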

C11 Atomics

The C11 standard introduces the concept of atomics (<stdatomic.h>). These don’t change the semantics of volatile but do have an impact on memory access. Rather than going into them here, I refer you to a previous posting by Glennan: The three ‘No’s of sequential consistency.

Glennan’s post is written for a C++ audience, but the concepts are identical. In C++20 many general uses of volatile are being deprecated.

Summary

Understanding sequence points and side effects is very important to understanding C. The way we write code and the optimisation settings we use can affect the physical memory access model. All register access must be through volatile objects.

It does not appear to make any difference where volatile is placed (object definition or type definition), so it is a matter of style. Adding volatile to the struct definition has the benefit of simplifying the pointer syntax and eliminates any potential for it being missed.

The same principles apply to C++; see Making things do stuff – Part 4.

Niall Cooling

Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.

2 Responses to Side effects and sequence points; why volatile matters

  1. Ed Baker says:

    A very interesting blog...

    If "The end of a full expression..." is a sequence point, is there not a sequence point at the end of "port->ctrl = 1;"?

    I once (only a few years back in fact) consulted the C90 (ISO/IEC 9899:1990) standard and concluded that:

    port->ctrl = 1; // Enter configuration mode
    port->cfg = 3; // Configure the device
    port->ctrl = 0; // Enter operational mode.

    Would (should) work fine because the semicolon at the end of each line constitutes the end of a full expression, and hence a sequence point.

    This appears not to be the case, and examination of the generated machine code revealed the problem, and was indeed fixed by "volatile". Cue phone call to compiler vendor to report the "compiler bug" 😉

    My misunderstanding was apparently with sequence points. I find it difficult to determine from the standard where a sequence point really is. Is the semicolon at the end of each statement not a sequence point?

    I very much look forward to your future post on sequence points another day... "Rather than getting into the rabbit hole that is sequence points (I’ll leave that for another day),"

    Thanks again for the great blog.

  2. denis says:

    The volatile keyword is frequently an unknown subject.
    Very good explanation.
