Introduction
Most embedded programmers, and indeed anyone who has attended a Feabhas programming course, are familiar with using the volatile qualifier when accessing registers. But the ‘whys and wherefores’ of using volatile are not always obvious.
In this article, we explore why using volatile works but, more importantly, why it is needed in the first place.
Peripheral register access
Let’s start with a simple, fictitious example. Suppose we have a peripheral with the following register layout:
register | width | offset |
---|---|---|
control | byte | 0x00 |
configuration | byte | 0x01 |
data | byte | 0x02 |
status | byte | 0x03 |
with a base address of 0x40020000.
In a previous posting we covered using structures for register access. Let’s assume the following (incorrect) code has been written:
#include <stdint.h>

typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

Port_t* const port = (Port_t*) 0x40020000;

void write(uint8_t data)
{
    port->ctrl = 1;  // Enter configuration mode
    port->cfg = 3;   // Configure the device
    port->ctrl = 0;  // Enter operational mode.
    while(port->status == 0)
    {
        // Wait for data...
    }
    port->data = data;
}
If we compile this code using GCC 8.2 for Arm with the flags:
- -O3 (high optimisation)
- -mcpu=cortex-m4
we get the following generated Arm Thumb-2 assembler:
1. write:
2. ldr r3, .L5
3. ldrb r2, [r3, #3] @ zero_extendqisi2
4. mov r1, #768
5. strh r1, [r3] @ movhi
6. cbnz r2, .L2
7. .L3:
8. b .L3
9. .L2:
10. strb r0, [r3, #2]
11. bx lr
12. .L5:
13. .word 1073872896
Explaining the assembler
The key instructions are:
- [2] Loads the register block’s base address into r3; 1073872896 is the decimal representation of 0x40020000
- [3] Loads the contents of 0x40020003, the device’s status register, into r2
- [4-5] Stores 3 in the config register and 0 in the control register with a single write. #768 is 0x0300 in hex, and strh writes a ‘half-word’ (16 bits) to the base address (i.e. 0x00 => 0x40020000 and 0x03 => 0x40020001)
- [6] Compares the value of status to zero and branches accordingly; cbnz is the ‘compare and branch on non-zero’ opcode
- [7] Label (L3) for the while loop
- [8] Unconditional branch to label L3
- [10] Stores the passed parameter (passed in r0) in the data register, 0x40020002
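To check the arithmetic in the walkthrough above, here is a small host-side sketch in C. It is not part of the original code: apply_fused_store is a hypothetical helper, the register block is an ordinary RAM copy rather than the real peripheral, and a little-endian host is assumed (matching the Cortex-M4). It models the single strh and shows that one 16-bit store of 0x0300 leaves 0 in ctrl and 3 in cfg.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

/* The four byte-wide registers must pack into four consecutive bytes. */
static_assert(sizeof(Port_t) == 4, "unexpected padding in Port_t");

/* The literal-pool word is just the base address written in decimal. */
static_assert(1073872896u == 0x40020000u, "decimal form of the base address");

/* Hypothetical helper: models `strh r1, [r3]` on a RAM copy of the registers.
   Assumes a little-endian host, so the low byte lands at offset 0. */
Port_t apply_fused_store(void)
{
    Port_t regs = { 0xFF, 0xFF, 0xFF, 0xFF };  /* marker values */
    uint16_t half_word = 768;                  /* the #768 immediate, 0x0300 */

    memcpy(&regs, &half_word, sizeof half_word);
    return regs;
}
```

After the call, ctrl holds 0 and cfg holds 3, while data and status keep their marker values: one store has covered two registers.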
So what’s the problem?
With this example, the generated code is incorrect. Most notably, the compiler optimises away the memory read from the while expression; the port->status register only gets loaded from memory once (line 3). As a consequence, the compiler generates an infinite loop at L3 and doesn’t re-read the status register as part of the while expression.
What’s not immediately apparent is that it has also optimised away the first write to the control register (port->ctrl = 1), so our device would never have entered configuration mode in the first place! You may see this referred to as ‘fusing’ memory accesses.
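Written back as C, the optimised assembler behaves roughly like the sketch below. This is a reading of the listing, not compiler output: write_as_optimised is a hypothetical name, and the register block is passed in as a parameter (an assumption) so the function can be exercised on a host against ordinary RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

/* Hypothetical C equivalent of the optimised listing (not compiler output). */
void write_as_optimised(Port_t *port, uint8_t data)
{
    assert(port != NULL);

    uint8_t status = port->status;  /* status is read exactly once */

    /* port->ctrl = 1 has disappeared; only these two stores survive,
       and on the target they are merged into one 16-bit write. */
    port->cfg  = 3;
    port->ctrl = 0;

    if (status == 0) {
        for (;;) {
            /* spins forever: status is never read again */
        }
    }
    port->data = data;
}
```

If status is zero on entry, the function never returns, whatever the hardware does afterwards.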
Adding volatile
If you’ve been programming embedded systems for any time (and certainly if you’ve attended any of the Feabhas training courses) you will know that adding volatile to the pointer definition fixes the problem. The volatile qualifier informs the compiler that the object could change outside the program’s flow-of-control, so it must not optimise accesses to the object.
Updating our code so that the pointer refers to a volatile object:
volatile Port_t* const port = (Port_t*) 0x40020000;
The compiler now generates correct assembler:
1. write:
2. ldr r3, .L7
3. push {r4}
4. movs r2, #1
5. movs r4, #3
6. movs r1, #0
7. strb r2, [r3]
8. strb r4, [r3, #1]
9. mov r2, r3
10. strb r1, [r3]
11. .L2:
12. ldrb r3, [r2, #3] @ zero_extendqisi2
13. cmp r3, #0
14. beq .L2
15. strb r0, [r2, #2]
16. pop {r4}
17. bx lr
18. .L7:
19. .word 1073872896
Examining the generated instructions we now see:
- [2] Loads the register block’s base address into r3; 1073872896 is the decimal representation of 0x40020000
- [7] Stores 1 at 0x40020000; the port->ctrl = 1 that was missing before
- [8] Stores 3 in the config register
- [10] Stores 0 in the control register
- [11] Label (L2) for the while loop
- [12] Loads the contents of 0x40020003, the status register, into r3 as part of the loop
- [13-14] Compares the value of status to zero and branches accordingly
- [15] Stores the passed parameter (passed in r0) in the data register, 0x40020002
Importantly, both the initial control register write [7] and the re-read of the status register [12] now occur.
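For experimenting away from the target, the same sequence can be run against a simulated register block. The sketch below is a variant made under stated assumptions, not the article's code: write_port is a hypothetical name, and it takes the register block as a parameter where the original uses the fixed address 0x40020000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

/* Same sequence as write(), but the register block is a parameter so that a
   RAM-backed stand-in can be used on a host (an assumption for testing). */
void write_port(volatile Port_t *port, uint8_t data)
{
    assert(port != NULL);

    port->ctrl = 1;             /* enter configuration mode */
    port->cfg  = 3;             /* configure the device */
    port->ctrl = 0;             /* enter operational mode */

    while (port->status == 0) {
        /* status is re-read on every iteration */
    }
    port->data = data;
}
```

On a host there is no hardware to set status, so the stand-in must have a non-zero status before the call; otherwise the loop spins, just as it would on a device that never becomes ready.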
Great, but why in the first example is the specific code being optimised away?
Side Effects and Sequence Points
To properly understand why volatile fixes this, we have to understand the concepts of Side Effects and Sequence Points.
Side Effects
In its simplest form, writing to any object (variable) is a side effect, so the statement:
x = y;
produces a side effect on the object x. There is much jargon around here, but any time we modify a memory location, we can consider that a side effect. More generally, a side effect is a result of an operator, expression, statement, or function that persists even after the operator, expression, statement, or function has finished being evaluated.
C allows for multiple side effects in a single expression, e.g.
x = y++;
We now have side effects occurring on both x and y.
Every statement should have a side effect. One without, e.g.
x == y;
is non-sensical but legal C! A good compiler would generate a warning along the lines of:
<source>:18:14: warning: statement with no effect [-Wunused-value]
x == y;
and there are multiple MISRA-C rules regarding side effects, fundamentally saying there should be one, and only one, side effect in any expression.
Other operations considered to be side effects are:
- modifying a file
- reading a volatile object
- calling a function that does any of these operations
So, given:
int x;
volatile int y;
//...
x = y;
the assignment statement has two side effects: the read of the volatile object y and the write to x.
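The side effects described in this section can be made concrete with a short sketch. The function names are hypothetical and the code runs on any host; it only illustrates which operations count as side effects.

```c
#include <assert.h>
#include <stddef.h>

/* x = y++ has two side effects: y is modified and x is modified. */
int post_increment_demo(int *y)
{
    assert(y != NULL);

    int x;
    x = (*y)++;
    return x;
}

/* Reading a volatile object is itself a side effect, so both reads below
   must be performed even though the first value is discarded. */
int read_twice(volatile int *src)
{
    assert(src != NULL);

    (void)*src;     /* value unused, but the read must still happen */
    return *src;
}
```

If src were a pointer to a plain int, the compiler would be free to drop the first read entirely.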
Sequence Points
Sequence points are vital when considering code optimisation. They represent points in our code where we can guarantee certain conditions are true.
As a simple example, given the expression:
x = y + z;
This expression is evaluated for its side effects and there is a sequence point following this evaluation. So what we can rely on is that the sub-expression (y + z) must be evaluated before the temporary result (called an r-value) is assigned to x, generating a side effect. However, we cannot rely on the order in which y and z are read from memory because, of course, in theory it shouldn’t matter (the order of sub-expression evaluation is unspecified behaviour).
Rather than getting into the rabbit hole that is sequence points (I’ll leave that for another day), I want to return to why it’s affecting the initial code without the use of volatile.
The important paragraph in the C standard states:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
What this boils down to is that between sequence points the compiler is free to remove evaluations it considers unnecessary. The two optimisations that matter to us are:
- If there are two or more writes to the same object (side effects) without an interleaving read, then only the last write (side effect) needs to be evaluated
- If there are two or more reads to an object without an interleaving write (side effect), then only the first read needs evaluating (effectively caching the read value locally)
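As a minimal sketch of these two rules on an ordinary (non-volatile) object, the pair of functions below shows code as written and a form an optimiser may treat as equivalent. Both function names are hypothetical, and the second function is an illustration rather than actual compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* As written: two writes with no read between them, then two reads with
   no write between them. */
int as_written(int *obj)
{
    assert(obj != NULL);

    *obj = 1;          /* first write: never observed */
    *obj = 0;          /* last write */
    int a = *obj;      /* first read */
    int b = *obj;      /* second read: nothing can have changed *obj */
    return a + b;
}

/* What an optimiser may treat it as: only the last write and the first
   read remain, with the read value reused. */
int as_optimised(int *obj)
{
    assert(obj != NULL);

    *obj = 0;
    int a = *obj;
    return a + a;
}
```

Both functions return the same value and leave the object in the same final state, which is why the transformation is allowed for non-volatile objects.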
So going back to our original code:
port->ctrl = 1; // Enter configuration mode
port->cfg = 3; // Configure the device
port->ctrl = 0; // Enter operational mode.
Hopefully, you can now appreciate that, as there isn’t a read from port->ctrl between the two writes to port->ctrl, the compiler spots this and so only performs the second write.
Further on:
while(port->status == 0)
{
// Wait for data...
}
It, hopefully, now makes sense why the compiler only evaluates port->status == 0 once and not for each iteration of the loop.
By adding volatile to the pointer port, all reads and writes via the pointer are considered side effects, so the compiler does not optimise away any of the accesses.
Note that had we written:
extern void f(void);

Port_t* const port = (Port_t*) 0x40020000;

int main(void)
{
    port->ctrl = 1;  // Enter configuration mode
    f();             // force a sequence point
    port->cfg = 3;   // Configure the device
    port->ctrl = 0;  // Enter operational mode.
    while(port->status == 0)
    {
        // Wait for data...
        f();         // force a sequence point
    }
    port->data = 0x01;
}
Then we would also get the ‘correct’ assembler, as the function call to f() acts as a sequence point. This is why ‘incorrect’ code can sometimes appear to work, but then a minor modification (e.g. removing the call to f()) will suddenly cause the code to stop functioning correctly.
Also, low optimisation settings, used quite often during debugging sessions, typically don’t trigger the sequence point/side effect optimisation, so the code again appears to function correctly at first.
The C99 standard lists the following sequence points (Annex C):
- The call to a function, after the arguments have been evaluated (6.5.2.2).
- The end of the first operand of the following operators: logical AND && (6.5.13); logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17).
- The end of a full declarator: declarators (6.7.5);
- The end of a full expression: an initializer (6.7.8); the expression in an expression statement (6.8.3); the controlling expression of a selection statement (if or switch) (6.8.4); the controlling expression of a while or do statement (6.8.5); each of the expressions of a for statement (6.8.5.3); the expression in a return statement (6.8.6.4).
- Immediately before a library function returns (7.1.4).
- After the actions associated with each formatted input/output function conversion specifier (7.19.6, 7.24.2).
- Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any movement of the objects passed as arguments to that call (7.20.5).
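Two of the operator sequence points in this list are easy to demonstrate. The sketch below uses hypothetical function names: the first relies on the sequence point after the left operand of &&, the second on the one after the left operand of the comma operator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* &&: the left operand is fully evaluated first, so the pointer is never
   dereferenced when it is NULL. */
int is_ready(const uint8_t *status)
{
    return status != NULL && *status != 0;
}

/* Comma: the increment is complete before the right operand is evaluated. */
int increment_then_double(int *counter)
{
    assert(counter != NULL);

    return (++*counter, *counter * 2);
}
```

Without these sequence points, neither the NULL guard nor the ordering of the increment could be relied upon.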
A matter of style – volatile structures
In the examples so far, we have made the pointer definition volatile, e.g.
volatile Port_t* const port = (Port_t*) 0x40020000;
An alternative style is to apply the volatile qualifier to the structure elements instead of the pointer definition.
typedef struct {
    volatile uint8_t ctrl;
    volatile uint8_t cfg;
    volatile uint8_t data;
    volatile uint8_t status;
} Port_t;

Port_t* const port = (Port_t*) 0x40020000; // non-volatile pointer
This is the style used by CMSIS.
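With member-level volatile, any access through any Port_t pointer is a volatile access, even when the pointer itself carries no qualifier. The sketch below shows this with a hypothetical helper, read_status, and adds compile-time checks (C11 static_assert, an addition not in the original) that the struct matches the register table from the start of the article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint8_t ctrl;
    volatile uint8_t cfg;
    volatile uint8_t data;
    volatile uint8_t status;
} Port_t;

/* Compile-time checks against the register table (offsets 0x00 to 0x03). */
static_assert(offsetof(Port_t, ctrl)   == 0x00, "ctrl offset");
static_assert(offsetof(Port_t, cfg)    == 0x01, "cfg offset");
static_assert(offsetof(Port_t, data)   == 0x02, "data offset");
static_assert(offsetof(Port_t, status) == 0x03, "status offset");
static_assert(sizeof(Port_t) == 4, "unexpected padding in Port_t");

/* Hypothetical helper: takes a plain (non-volatile) pointer. */
uint8_t read_status(Port_t *port)
{
    assert(port != NULL);

    return port->status;  /* still a volatile read: the member is volatile */
}
```

Because the qualifier travels with the type, a helper like this cannot accidentally drop it.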
Using #define with volatile struct elements
Historically, C programmers have used #define in preference to constant pointers, so given the following:
typedef struct {
    volatile uint8_t ctrl;
    volatile uint8_t cfg;
    volatile uint8_t data;
    volatile uint8_t status;
} Port_t;
The #define would be:
#define port ((Port_t*)0x40020000)
Using #define with volatile pointer cast
Finally, rather than the volatile qualifier being part of the structure definition, it can be placed in the pointer cast in the #define, e.g.
typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

#define port ((volatile Port_t*)0x40020000)
Coding Style and volatile
When using any volatile objects, the recommended practice is to avoid using:
- ++ or --
- any op= operator (e.g. |=)
Prefer instead to use the volatile object only for read (load) and write (store) operations, e.g. prefer
#include <stdint.h>

typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

#define port ((volatile Port_t*)0x40020000)

void f(void)
{
    // ...
    uint8_t value = port->ctrl;  // load
    value |= 0x3;                // modify local
    port->ctrl = value;          // store
    // ...
}
over
typedef struct {
    uint8_t ctrl;
    uint8_t cfg;
    uint8_t data;
    uint8_t status;
} Port_t;

#define port ((volatile Port_t*)0x40020000)

void f(void)
{
    // ...
    port->ctrl |= 0x3;  // hides load, modify, store
    // ...
}
This can also help reduce the number of bus accesses, which in turn can help reduce power consumption.
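One way to keep the explicit load, modify and store steps without repeating them at every call site is to wrap them in small helpers. The sketch below is only a suggestion: reg_set_bits and reg_clear_bits are hypothetical names, not part of any library mentioned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sets bits using an explicit load, modify, store. */
void reg_set_bits(volatile uint8_t *reg, uint8_t mask)
{
    assert(reg != NULL);

    uint8_t value = *reg;               /* load: one read access */
    value = (uint8_t)(value | mask);    /* modify the local copy */
    *reg = value;                       /* store: one write access */
}

/* Hypothetical helper: clears bits in the same three visible steps. */
void reg_clear_bits(volatile uint8_t *reg, uint8_t mask)
{
    assert(reg != NULL);

    uint8_t value = *reg;               /* load */
    value = (uint8_t)(value & ~mask);   /* modify the local copy */
    *reg = value;                       /* store */
}
```

With the definitions above, the earlier example becomes reg_set_bits(&port->ctrl, 0x3). As with |=, the sequence is not atomic, so an interrupt can still occur between the load and the store.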
C11 Atomics
The C11 standard introduces the concept of atomics (<stdatomic.h>). These don’t change the semantics of volatile but do have an impact on memory access. Rather than going into them here, I refer you to a previous posting by Glennan, The three ‘No’s of sequential consistency.
Glennan’s post is written for a C++ audience, but the concepts are identical. In C++20 many general uses of volatile are being deprecated.
Summary
Understanding sequence points and side effects is very important to understanding C. The way we write code and the optimisation settings we use can affect the physical memory access model. All register access must be through volatile objects.
It does not appear to make any difference where volatile is placed (object definition or type definition), so the choice is a matter of style. Adding volatile to the struct definition has the benefit of simplifying the pointer syntax and eliminates any chance of it being missed.
The same principles apply to C++; see Making things do stuff – Part 4.
Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.
A very interesting blog...
If "The end of a full expression..." is a sequence point, is there not a sequence point at the end of "port->ctrl = 1;"?
I once (only a few years back in fact) consulted the C90 (ISO/IEC 9899:1990) standard and concluded that:
port->ctrl = 1; // Enter configuration mode
port->cfg = 3; // Configure the device
port->ctrl = 0; // Enter operational mode.
Would (should) work fine because the semicolon at the end of each line constitutes the end of a full expression, and hence a sequence point.
This appears not to be the case, and examination of the generated machine code revealed the problem, and was indeed fixed by "volatile". Cue phone call to compiler vendor to report the "compiler bug" 😉
My misunderstanding was apparently with sequence points. I find it difficult to determine from the standard where a sequence point really is. Is the semicolon at the end of each statement not a sequence point?
I very much look forward to your future post on sequence points another day... "Rather than getting into the rabbit hole that is sequence points (I’ll leave that for another day),"
Thanks again for the great blog.
Frequently volatile keyword is a unknown subject.
Very good explanation.