In the last article we explored the design of a class to encapsulate a physical hardware device. In that article I deliberately ignored how the class would actually interact with the hardware.
In this article we explore the options available to us for accessing hardware and the consequences of those choices.
The story so far…
We’ve been designing a class to encapsulate access to a simple GPIO hardware device. We’ve made some design choices already and have the following class declaration. I won’t go over these choices again here; have a read of the previous article for more detail.
```cpp
#include <cstdint>

namespace STM32F407 {

  enum device {
    GPIO_A, GPIO_B, GPIO_C,
    GPIO_D, GPIO_E, GPIO_F,
    GPIO_G, GPIO_H, GPIO_I
  };

  constexpr std::uint32_t peripheral_base { 0x40020000 };

  inline void enable_device(device dev)  { ... }
  inline void disable_device(device dev) { ... }

} // namespace STM32F407


class GPIO {
public:
  enum Pin {
    Pin00, Pin01, Pin02, Pin03,
    Pin04, Pin05, Pin06, Pin07,
    Pin08, Pin09, Pin10, Pin11,
    Pin12, Pin13, Pin14, Pin15,
  };

  // Construction / destruction
  //
  explicit GPIO(STM32F407::device dev);
  ~GPIO();

  // Copy and move policy
  //
  GPIO(const GPIO&)            = delete;
  GPIO(GPIO&&)                 = delete;
  GPIO& operator=(const GPIO&) = delete;
  GPIO& operator=(GPIO&&)      = delete;

  // Behavioural API
  //
  void set_as_output(Pin pin);
  void set_as_input (Pin pin);
  void set_pin      (Pin pin);
  void clear_pin    (Pin pin);
  bool is_set       (Pin pin);

private:
  STM32F407::device ID;
};
```
Implementation options
There are three mainstream approaches to hardware access available to us:
- Nested pointers / references
- Pointer offsets
- Structure overlay
Nested pointers are the simplest, and probably the most common, approach, but they carry a per-object memory overhead. Pointer offsets and structure overlay are more memory-efficient, but they have some shortcomings if not implemented carefully.
Nested pointers / references
As the name suggests, the nested-pointer approach stores a pointer to each hardware register as a private member of the class.
```cpp
class GPIO {
public:
  explicit GPIO(STM32F407::device dev);
  ~GPIO();

  // Copy and move policy...

  // Behavioural API...

private:
  STM32F407::device ID;

  volatile std::uint32_t* const mode;
  volatile std::uint32_t* const type;
  volatile std::uint32_t* const speed;
  volatile std::uint32_t* const pull_up_down;
  volatile std::uint32_t* const input_data;
  volatile std::uint32_t* const output_data;
  volatile std::uint32_t* const set_reset;
  volatile std::uint32_t* const lock;
  volatile std::uint32_t* const alt_fn_low;
  volatile std::uint32_t* const alt_fn_high;
};
```
Note that the pointers are declared const. A const member cannot be reassigned, so the compiler-generated copy and move assignment operators are defined as deleted. Copy construction would still work, however. We have already explicitly deleted all copy and move operations, so this does not affect the design of this class.
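The following compile-time check confirms that claim. It uses a hypothetical one-member struct; ConstRegHolder is an illustrative name and is not part of the GPIO design.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical holder with a single const pointer member,
// mirroring one of the GPIO register pointers.
struct ConstRegHolder {
  volatile std::uint32_t* const reg;
};

// Construction from another object still works: the const pointer
// is simply initialised from the source object's value.
static_assert(std::is_copy_constructible<ConstRegHolder>::value,
              "copy construction is still available");
static_assert(std::is_move_constructible<ConstRegHolder>::value,
              "move construction is still available");

// Assignment is implicitly deleted: a const member cannot be reassigned.
static_assert(!std::is_copy_assignable<ConstRegHolder>::value,
              "copy assignment is deleted");
static_assert(!std::is_move_assignable<ConstRegHolder>::value,
              "move assignment is deleted");
```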
Because the pointers are constants, they must be initialised in the GPIO constructor's member initialiser list. At the moment, though, we don’t have an address to ‘force’ into the pointers. We only have an enumeration identifying the device.
Luckily, in our case there is a direct mapping between the device’s ID and its address in memory. We can construct a simple conversion function to do the mapping.
```cpp
namespace STM32F407 {

  inline constexpr std::uint32_t device_address(device dev)
  {
    return peripheral_base + (dev << 10);
  }

}
```
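The sketch below checks the mapping at compile time. It assumes each port occupies its own 0x400-byte (1 KiB) block above peripheral_base, which is what the shift by 10 encodes. The static_cast is my addition; it makes the unsigned shift explicit.

```cpp
#include <cstdint>

namespace STM32F407 {
  enum device { GPIO_A, GPIO_B, GPIO_C, GPIO_D, GPIO_E,
                GPIO_F, GPIO_G, GPIO_H, GPIO_I };

  constexpr std::uint32_t peripheral_base { 0x40020000 };

  // Each port sits in a 0x400-byte block, so shifting the
  // enumerator left by 10 bits gives its offset from the base.
  constexpr std::uint32_t device_address(device dev)
  {
    return peripheral_base + (static_cast<std::uint32_t>(dev) << 10);
  }
}

// Compile-time spot checks against the expected port addresses.
static_assert(STM32F407::device_address(STM32F407::GPIO_A) == 0x40020000, "GPIO A");
static_assert(STM32F407::device_address(STM32F407::GPIO_D) == 0x40020C00, "GPIO D");
static_assert(STM32F407::device_address(STM32F407::GPIO_I) == 0x40022000, "GPIO I");
```

So GPIO_D (enumerator 3) lands at 0x40020C00: 3 << 10 is 0xC00.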
We can use this function when constructing the class.
(Note: the ‘volatile’ in the reg32_ptr function below is not strictly necessary, because a plain uint32_t* would convert implicitly to the volatile-qualified member pointers. I’ve kept it for consistency with the earlier code.)
```cpp
using std::uint32_t;
using STM32F407::device_address;


// Inline function to remove
// code clutter
//
inline volatile uint32_t* reg32_ptr(uint32_t addr)
{
  return reinterpret_cast<volatile uint32_t*>(addr);
}


GPIO::GPIO(STM32F407::device dev) :
  ID           { dev },
  // NOTE: This is NOT pointer
  // arithmetic!
  //
  mode         { reg32_ptr(device_address(dev) + 0x00) },
  type         { reg32_ptr(device_address(dev) + 0x04) },
  speed        { reg32_ptr(device_address(dev) + 0x08) },
  pull_up_down { reg32_ptr(device_address(dev) + 0x0C) },
  input_data   { reg32_ptr(device_address(dev) + 0x10) },
  output_data  { reg32_ptr(device_address(dev) + 0x14) },
  set_reset    { reg32_ptr(device_address(dev) + 0x18) },
  lock         { reg32_ptr(device_address(dev) + 0x1C) },
  alt_fn_low   { reg32_ptr(device_address(dev) + 0x20) },
  alt_fn_high  { reg32_ptr(device_address(dev) + 0x24) }
{
  STM32F407::enable_device(ID);
}
```
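A small point to note here: programmers unused to the above notation often mistake the addition in the pointer initialisers for pointer arithmetic. It isn’t. The addition is done on (unsigned) integers, and only the result is cast to a pointer type. Had it been pointer arithmetic on a uint32_t*, adding 0x04 would have advanced the address by four registers (16 bytes) rather than four bytes.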
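The distinction is easier to see in running code. This host-runnable sketch points both forms at an ordinary array standing in for a register block. fake_block and compare_offsets() are hypothetical names, and reg32_ptr is changed to take std::uintptr_t so that 64-bit host addresses fit.

```cpp
#include <cassert>
#include <cstdint>

// Variant of reg32_ptr taking std::uintptr_t so that host (64-bit)
// addresses fit; on the 32-bit target std::uint32_t is enough.
inline volatile std::uint32_t* reg32_ptr(std::uintptr_t addr)
{
  return reinterpret_cast<volatile std::uint32_t*>(addr);
}

// Hypothetical stand-in for a block of 32 registers.
std::uint32_t fake_block[32] {};

void compare_offsets()
{
  const auto base = reinterpret_cast<std::uintptr_t>(fake_block);

  // Integer addition: 0x14 is a byte offset, landing on element 5.
  assert(reg32_ptr(base + 0x14) == &fake_block[5]);

  // Pointer arithmetic: 0x14 (20) counts whole uint32_t elements,
  // i.e. 0x50 bytes, landing on element 20 instead.
  assert(reg32_ptr(base) + 0x14 == &fake_block[20]);
}
```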
The behavioural member functions of the class can now be implemented in much the same way as we have done previously; for example:
```cpp
void GPIO::set_pin(GPIO::Pin pin)
{
  *output_data |= (1 << pin);
}


void GPIO::clear_pin(GPIO::Pin pin)
{
  *output_data &= ~(1 << pin);
}
```
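Only set_pin() and clear_pin() are shown above. The sketch below illustrates one way the rest of the behavioural API might look in the nested-pointer version. It makes some assumptions. GpioSketch and its base-address constructor are hypothetical; they let the code run on a host against an ordinary array standing in for the register block. Only the three registers these functions touch are kept. The two-bit mode encoding (00 input, 01 general-purpose output) follows the STM32F4 mode register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, host-testable cut-down of the nested-pointer GPIO.
class GpioSketch {
public:
  enum Pin {
    Pin00, Pin01, Pin02, Pin03, Pin04, Pin05, Pin06, Pin07,
    Pin08, Pin09, Pin10, Pin11, Pin12, Pin13, Pin14, Pin15
  };

  // Assumption: takes a base address rather than a device ID.
  explicit GpioSketch(std::uintptr_t base) :
    mode        { reinterpret_cast<volatile std::uint32_t*>(base + 0x00) },
    input_data  { reinterpret_cast<volatile std::uint32_t*>(base + 0x10) },
    output_data { reinterpret_cast<volatile std::uint32_t*>(base + 0x14) }
  {}

  // Two mode bits per pin: clear the pin's field, then write 01 (output).
  void set_as_output(Pin pin)
  {
    *mode = (*mode & ~(0x3u << shift(pin))) | (0x1u << shift(pin));
  }

  // Writing 00 to the pin's field selects input.
  void set_as_input(Pin pin)
  {
    *mode = *mode & ~(0x3u << shift(pin));
  }

  // Explicit read-modify-write on the output data register.
  void set_pin(Pin pin)   { *output_data = *output_data | (1u << pin); }
  void clear_pin(Pin pin) { *output_data = *output_data & ~(1u << pin); }

  // The pin's current level is read from the input data register.
  bool is_set(Pin pin) const { return (*input_data & (1u << pin)) != 0; }

private:
  static unsigned shift(Pin pin)
  {
    assert(pin <= Pin15);   // sixteen two-bit fields fill 32 bits
    return static_cast<unsigned>(pin) * 2;
  }

  volatile std::uint32_t* const mode;
  volatile std::uint32_t* const input_data;
  volatile std::uint32_t* const output_data;
};
```

Unlike a plain OR, set_as_output() clears the two-bit field first. A pin previously set to another mode (for example analogue, 11) therefore really becomes an output. For Pin15 on a zeroed register, the result is the 0x40000000 seen in the disassembly below.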
The client code is very clean.
```cpp
int main()
{
  GPIO port_d { STM32F407::GPIO_D };

  port_d.set_as_output(GPIO::Pin15);

  while(true) {
    port_d.set_pin(GPIO::Pin15);
    sleep(1000);                    // delay function, not shown here
    port_d.clear_pin(GPIO::Pin15);
    sleep(1000);
  }
}
```
From a performance perspective, the generated code looks very similar to the minimal example we created in a previous article. The only addition is the overhead of the constructor call, which is not shown here.
```asm
; main() {
;
08000d78:   push   {lr}
08000d7a:   sub    sp, #52                ; Allocate memory for GPIO

;   GPIO port_d { STM32F407::GPIO_D };
;
08000d7c:   add    r0, sp, #4             ; r0 = &port_d
08000d7e:   movs   r1, #3                 ; r1 = STM32F407::GPIO_D
08000d80:   bl     0x8000cf8              ; GPIO::GPIO()

;   port_d.set_as_output(GPIO::Pin15);
;
08000d84:   ldr    r2, [sp, #8]           ; r2 = mode
08000d86:   ldr    r3, [r2, #0]           ; r3 = *r2;
08000d88:   orr.w  r3, r3, #1073741824    ; r3 = r3 | 0x40000000
08000d8c:   str    r3, [r2, #0]           ; *r2 = r3

;   while (true) {
;     port_d.set_pin(GPIO::Pin15);
;
loop:
08000d8e:   ldr    r2, [sp, #28]          ; r2 = output_data
08000d90:   ldr    r3, [r2, #0]           ; r3 = *r2
08000d92:   orr.w  r3, r3, #32768         ; r3 = r3 | 0x8000
08000d96:   str    r3, [r2, #0]           ; *r2 = r3

;     port_d.clear_pin(GPIO::Pin15);
;
08000d98:   ldr    r2, [sp, #28]          ; r2 = output_data
08000d9a:   ldr    r3, [r2, #0]           ; r3 = *r2
08000d9c:   bic.w  r3, r3, #32768         ; r3 = r3 & ~0x8000
08000da0:   str    r3, [r2, #0]           ; *r2 = r3

;   }
;
08000da2:   b.n    0x8000d8e              ; goto loop
; }
```
One drawback of the nested-pointer approach is the amount of memory each GPIO object requires. In our example, each 32-bit hardware register has a software analogue in the form of a 32-bit pointer. That seems reasonable. If our hardware consisted of 8-bit registers, however, our software object would be four times the size of the hardware device it accesses.
Of course, we could omit some of these pointers if we are not exposing the behaviour they allow (for example, only using the default output type and speed).
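To put numbers on the 8-bit case, here is a compile-time comparison for a hypothetical device with ten 8-bit registers. ByteRegisters and ByteRegisterPointers are illustrative names only. The pointers omit const because it does not affect size. With 32-bit pointers, the pointer version is 40 bytes against 10 bytes of hardware, four times larger. On a 64-bit host it is eight times larger.

```cpp
#include <cstdint>

// Hypothetical device: ten 8-bit registers, i.e. ten bytes of hardware.
struct ByteRegisters {
  std::uint8_t r0, r1, r2, r3, r4, r5, r6, r7, r8, r9;
};

// Nested-pointer representation of the same device:
// one pointer per register.
struct ByteRegisterPointers {
  volatile std::uint8_t *r0, *r1, *r2, *r3, *r4, *r5, *r6, *r7, *r8, *r9;
};

static_assert(sizeof(ByteRegisters) == 10, "ten bytes of hardware");
static_assert(sizeof(ByteRegisterPointers) == 10 * sizeof(volatile std::uint8_t*),
              "one pointer per register");
```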
Pointer offsets
An alternative, and more memory-efficient, implementation can exploit the fact that the hardware registers are always at fixed offsets from some base address. We can store a single pointer and use pointer arithmetic to access individual registers. The code can be made more readable by using an enumeration.
```cpp
class GPIO {
public:
  explicit GPIO(STM32F407::device dev);
  ~GPIO();

  // Copy and move policy...

  // Behavioural API...

private:
  STM32F407::device ID;

  volatile uint32_t* const registers;   // Base address

  enum Offset                           // Offsets
  {
    mode,
    type,
    speed,
    pull_up_down,
    input_data,
    output_data,
    set_reset,
    lock,
    alt_fn_low,
    alt_fn_high
  };
};
```
The constructor is correspondingly simpler, too.
```cpp
GPIO::GPIO(STM32F407::device dev) :
  ID        { dev },
  registers { reg32_ptr(device_address(dev)) }
{
  STM32F407::enable_device(ID);
}
```
In order to access individual registers we now have to perform pointer arithmetic on the base address.
```cpp
void GPIO::set_pin(GPIO::Pin pin)
{
  *(registers + output_data) |= (1 << pin);
}


void GPIO::clear_pin(GPIO::Pin pin)
{
  *(registers + output_data) &= ~(1 << pin);
}
```
The readability of the code is starting to suffer now, to say the least. We can improve it by exploiting the relationship between pointer arithmetic and the index operator: by definition, p[i] means *(p + i).
```cpp
void GPIO::set_pin(GPIO::Pin pin)
{
  registers[output_data] |= (1 << pin);
}


void GPIO::clear_pin(GPIO::Pin pin)
{
  registers[output_data] &= ~(1 << pin);
}
```
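Here is a hedged, host-runnable sketch of the pointer-offset version, trimmed to set_pin() and clear_pin(). GpioOffsetSketch, its base-address constructor and the plain unsigned pin parameter are assumptions. They let the code run against an ordinary array.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, host-testable cut-down of the pointer-offset GPIO.
class GpioOffsetSketch {
public:
  explicit GpioOffsetSketch(std::uintptr_t base) :
    registers { reinterpret_cast<volatile std::uint32_t*>(base) }
  {}

  void set_pin(unsigned pin)
  {
    assert(pin < 16);
    // registers[output_data] means *(registers + 5): element 5 sits
    // 5 * sizeof(std::uint32_t) = 0x14 bytes from the base.
    registers[output_data] = registers[output_data] | (1u << pin);
  }

  void clear_pin(unsigned pin)
  {
    assert(pin < 16);
    registers[output_data] = registers[output_data] & ~(1u << pin);
  }

private:
  volatile std::uint32_t* const registers;   // base address

  enum Offset {                              // indices, one per 32-bit register
    mode, type, speed, pull_up_down, input_data,
    output_data, set_reset, lock, alt_fn_low, alt_fn_high
  };
};
```

The class holds nothing but the base pointer. sizeof(GpioOffsetSketch) therefore equals the size of one pointer, however many registers the enumeration names.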
We’ve now made a potentially significant improvement in memory efficiency. Regardless of the number of registers, the object holds a single pointer, plus any additional management data such as the device ID.
Run-time performance is not affected, either:
```asm
; main() {
;
08000d44:   push   {lr}
08000d46:   sub    sp, #12                ; Allocate memory for port_d

;   GPIO port_d { STM32F407::GPIO_D };
;
08000d48:   mov    r0, sp                 ; r0 = &port_d
08000d4a:   movs   r1, #3                 ; r1 = STM32F407::GPIO_D
08000d4c:   bl     0x8000cf8              ; GPIO::GPIO()

;   port_d.set_as_output(GPIO::Pin15);
;
08000d50:   ldr    r2, [sp, #4]           ; r2 = registers
08000d52:   ldr    r3, [r2, #0]           ; r3 = registers[mode]
08000d54:   orr.w  r3, r3, #1073741824    ; r3 = r3 | 0x40000000
08000d58:   str    r3, [r2, #0]           ; registers[mode] = r3

;   while(true) {
;     port_d.set_pin(GPIO::Pin15);
;
loop:
08000d5a:   ldr    r3, [sp, #4]           ; r3 = registers
08000d5c:   ldr    r2, [r3, #20]          ; r2 = registers[output_data]
08000d5e:   orr.w  r2, r2, #32768         ; r2 = r2 | 0x8000
08000d62:   str    r2, [r3, #20]          ; registers[output_data] = r2

;     port_d.clear_pin(GPIO::Pin15);
;
08000d64:   ldr    r2, [r3, #20]          ; r2 = registers[output_data]
08000d66:   bic.w  r2, r2, #32768         ; r2 = r2 & ~0x8000
08000d6a:   str    r2, [r3, #20]          ; registers[output_data] = r2
08000d6c:   b.n    0x8000d5a              ; goto loop
08000d6e:   nop
; }
```
Structure overlay
The main limitation of the pointer-offset implementation is that all registers must be the same size. Any gaps between registers (if they are not contiguous) must also be a multiple of the register size. This is because the pointer-offset implementation effectively treats the hardware memory as an array.
In our example the pointer offset implementation is a viable option; but that is not always the case. It is possible (although less likely these days) that you have different-sized registers, or registers at odd offsets. For these situations a structure overlay is a good option.
Structure overlay makes use of the fact that the type of a pointer defines not only how much memory to read but also how to interpret it. Until now we have been using pointers to scalar types (unsigned integers). There is nothing to stop us declaring a pointer to a user-defined type with multiple members; in other words, a class or structure.
If we can define a structure that matches our hardware register layout we can ‘overlay’ this structure on memory by declaring a pointer to the struct type and ‘forcing’ an address into the pointer.
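Consider a hypothetical peripheral (not part of the STM32F407) that mixes 8-, 16- and 32-bit registers. No single element type lets array indexing reach all four registers. A structure, however, describes the layout directly, and offsetof lets the compiler confirm each offset. The register names and the mixed_device() helper are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical peripheral with registers of different sizes.
struct MixedRegisters {
  std::uint8_t  control;  // offset 0x00
  std::uint8_t  status;   // offset 0x01
  std::uint16_t count;    // offset 0x02
  std::uint32_t data;     // offset 0x04
};

// Every member already sits on its natural alignment, so no padding
// is expected; offsetof lets the compiler confirm the layout.
static_assert(offsetof(MixedRegisters, control) == 0x00, "control");
static_assert(offsetof(MixedRegisters, status)  == 0x01, "status");
static_assert(offsetof(MixedRegisters, count)   == 0x02, "count");
static_assert(offsetof(MixedRegisters, data)    == 0x04, "data");
static_assert(sizeof(MixedRegisters) == 8, "total size");

// Overlay: force a (hypothetical) base address into a pointer-to-struct.
inline volatile MixedRegisters* mixed_device(std::uintptr_t base)
{
  return reinterpret_cast<volatile MixedRegisters*>(base);
}
```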
A big word of warning here:
By default, the compiler is free to insert padding into a structure’s layout to align its members (typically to their natural size) for more efficient access. The actual structure can then have a different size, and different member offsets, from the structure you declared. This is invisible in the source code, and a pain to debug.
static_assert can provide some basic checking. It compares the size of the structure as generated by the compiler against the size your hardware layout leads you to expect.
When using structures for hardware overlay, ALWAYS pack the structures (that is, force the compiler to remove any padding), unless you can guarantee that your structure will never be padded.
Unfortunately, structure packing is not part of the C++ standard so the packing instruction is always compiler-specific.
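Here is a small sketch of the padding problem and one way to remove it. The field names are illustrative. `#pragma pack(push, 1)` is accepted by GCC, Clang and MSVC, and GCC and Clang also offer `__attribute__((packed))`. One caveat is worth checking in your own toolchain: packing also lowers the structure's alignment to one byte, which may change how the compiler emits member accesses.

```cpp
#include <cstddef>
#include <cstdint>

// Natural layout: on most targets the compiler inserts three padding
// bytes after 'flags' so that 'value' is 4-byte aligned.
struct Unpacked {
  std::uint8_t  flags;
  std::uint32_t value;
};

// Packed layout: padding removed (compiler-specific directive).
#pragma pack(push, 1)
struct Packed {
  std::uint8_t  flags;
  std::uint32_t value;
};
#pragma pack(pop)

static_assert(sizeof(Packed) == 5, "no padding in packed struct");
static_assert(offsetof(Packed, value) == 1, "value follows flags directly");
```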
In our example the registers are all 32-bit and contiguous in memory, so we can be confident there will be no padding issues.
Here’s the class declaration:
```cpp
class GPIO {
public:
  explicit GPIO(STM32F407::device dev);
  ~GPIO();

  // Copy and move policy...

  // Behavioural API...

private:
  STM32F407::device ID;

  // Overlay structure
  //
  struct Registers {
    uint32_t mode;
    uint32_t type;
    uint32_t speed;
    uint32_t pull_up_down;
    uint32_t input_data;
    uint32_t output_data;
    uint32_t set_reset;
    uint32_t lock;
    uint32_t alt_fn_low;
    uint32_t alt_fn_high;
  };

  // Compile-time check for padding
  // in structure overlay (just in case)
  //
  static_assert(
    sizeof(Registers) == (sizeof(uint32_t) * 10),
    "Unexpected padding in structure overlay"
  );

  volatile Registers* const registers;
};
```
Notice that the struct is a private declaration within the GPIO class. It only defines a nested type and declares no data member of that type, so it does not add to the size of a GPIO object.
The constructor has to change. Note we’re now casting our address to a pointer-to-structure.
```cpp
GPIO::GPIO(STM32F407::device dev) :
  ID        { dev },
  registers { reinterpret_cast<Registers*>(device_address(dev)) }
{
  STM32F407::enable_device(ID);
}
```
In the member functions we can use the member-access operator (->) on the pointer. The compiler automatically calculates each member’s offset from the base address.
```cpp
void GPIO::set_pin(GPIO::Pin pin)
{
  registers->output_data |= (1 << pin);
}


void GPIO::clear_pin(GPIO::Pin pin)
{
  registers->output_data &= ~(1 << pin);
}
```
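Here is a host-runnable sketch of the overlay version, again trimmed to set_pin() and clear_pin(). GpioOverlaySketch and its base-address constructor are hypothetical. Unlike the private declaration above, Registers is made public here so that offsetof can be checked from outside the class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, host-testable cut-down of the structure-overlay GPIO.
class GpioOverlaySketch {
public:
  struct Registers {
    std::uint32_t mode;
    std::uint32_t type;
    std::uint32_t speed;
    std::uint32_t pull_up_down;
    std::uint32_t input_data;
    std::uint32_t output_data;
    std::uint32_t set_reset;
    std::uint32_t lock;
    std::uint32_t alt_fn_low;
    std::uint32_t alt_fn_high;
  };

  explicit GpioOverlaySketch(std::uintptr_t base) :
    registers { reinterpret_cast<volatile Registers*>(base) }
  {}

  void set_pin(unsigned pin)
  {
    assert(pin < 16);
    // ->output_data is compiled as base + offsetof(Registers, output_data).
    registers->output_data = registers->output_data | (1u << pin);
  }

  void clear_pin(unsigned pin)
  {
    assert(pin < 16);
    registers->output_data = registers->output_data & ~(1u << pin);
  }

private:
  volatile Registers* const registers;
};

// The compiler-computed offsets match the hand-written ones
// (output_data at 0x14, i.e. the #20 in the disassembly).
static_assert(offsetof(GpioOverlaySketch::Registers, output_data) == 0x14, "output_data");
static_assert(offsetof(GpioOverlaySketch::Registers, alt_fn_high) == 0x24, "alt_fn_high");
```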
From a memory perspective, the structure overlay implementation has the same footprint as the pointer-offset implementation.
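The footprints can be compared directly using data-only mirrors of the three implementations. Member functions do not contribute to object size, so they are left out. These mirror structs are illustrative, with std::uint32_t standing in for the device ID. On a 32-bit target they come to 44 bytes for nested pointers and 8 bytes for each of the other two. On a 64-bit host the figures are larger, but the relationship holds.

```cpp
#include <cstdint>

// Nested pointers: one pointer per register.
struct NestedPointerLayout {
  std::uint32_t ID;
  volatile std::uint32_t *mode, *type, *speed, *pull_up_down, *input_data,
                         *output_data, *set_reset, *lock, *alt_fn_low, *alt_fn_high;
};

// Pointer offsets: a single base pointer.
struct PointerOffsetLayout {
  std::uint32_t ID;
  volatile std::uint32_t* registers;
};

// Structure overlay: a single pointer-to-struct (the struct's
// definition is not needed to size the holder).
struct Registers;

struct StructureOverlayLayout {
  std::uint32_t ID;
  volatile Registers* registers;
};

static_assert(sizeof(StructureOverlayLayout) == sizeof(PointerOffsetLayout),
              "overlay and pointer-offset objects are the same size");
static_assert(sizeof(NestedPointerLayout) > sizeof(PointerOffsetLayout),
              "one pointer per register costs more");
```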
From a code-performance perspective?
```asm
; main() {
;
08000d44:   push   {lr}
08000d46:   sub    sp, #12                ; Allocate memory for port_d

;   GPIO port_d { STM32F407::GPIO_D };
;
08000d48:   mov    r0, sp                 ; r0 = &port_d
08000d4a:   movs   r1, #3                 ; r1 = STM32F407::GPIO_D
08000d4c:   bl     0x8000cf8              ; GPIO::GPIO()

;   port_d.set_as_output(GPIO::Pin15);
;
08000d50:   ldr    r2, [sp, #4]           ; r2 = registers
08000d52:   ldr    r3, [r2, #0]           ; r3 = registers->mode
08000d54:   orr.w  r3, r3, #1073741824    ; r3 = r3 | 0x40000000
08000d58:   str    r3, [r2, #0]           ; registers->mode = r3

;   while(true) {
;     port_d.set_pin(GPIO::Pin15);
;
loop:
08000d5a:   ldr    r3, [sp, #4]           ; r3 = registers
08000d5c:   ldr    r2, [r3, #20]          ; r2 = registers->output_data
08000d5e:   orr.w  r2, r2, #32768         ; r2 = r2 | 0x8000
08000d62:   str    r2, [r3, #20]          ; registers->output_data = r2

;     port_d.clear_pin(GPIO::Pin15);
;
08000d64:   ldr    r2, [r3, #20]          ; r2 = registers->output_data
08000d66:   bic.w  r2, r2, #32768         ; r2 = r2 & ~0x8000
08000d6a:   str    r2, [r3, #20]          ; registers->output_data = r2
08000d6c:   b.n    0x8000d5a              ; goto loop
08000d6e:   nop
; }
```
Looks pretty familiar, doesn’t it?
One final option with the structure overlay implementation: we can hide our implementation using the Pointer-to-Implementation (pImpl) idiom. We are simply declaring a pointer to a structure inside the class declaration, so the compiler is happy for it to be a pointer to an incomplete (that is, not-yet-defined) structure.
The structure itself must be defined inside the implementation file.
```cpp
class GPIO {
public:
  explicit GPIO(STM32F407::device dev);
  ~GPIO();

  // Copy and move policy...

  // Behavioural API...

private:
  STM32F407::device ID;

  // Pointer to incomplete type;
  // struct Registers must be defined in
  // the .cpp file
  //
  volatile struct Registers* const registers;
};
```
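The sketch below shows the header/implementation split in a single file, with comments marking which part would live in each file. GpioHidden and its base-address constructor are hypothetical stand-ins for the GPIO class; they let the code run on a host.

```cpp
#include <cassert>
#include <cstdint>

// ---- gpio.h (sketch) ----
class GpioHidden {
public:
  explicit GpioHidden(std::uintptr_t base);   // hypothetical: base address, not device ID
  void set_pin(unsigned pin);
  void clear_pin(unsigned pin);

private:
  // Pointer to an incomplete type. Written this way, the elaborated
  // type specifier also declares struct Registers at namespace scope.
  volatile struct Registers* const registers;
};

// ---- gpio.cpp (sketch) ----
// Only this file knows the register layout.
struct Registers {
  std::uint32_t mode;
  std::uint32_t type;
  std::uint32_t speed;
  std::uint32_t pull_up_down;
  std::uint32_t input_data;
  std::uint32_t output_data;
  std::uint32_t set_reset;
  std::uint32_t lock;
  std::uint32_t alt_fn_low;
  std::uint32_t alt_fn_high;
};

GpioHidden::GpioHidden(std::uintptr_t base) :
  registers { reinterpret_cast<volatile Registers*>(base) }
{}

void GpioHidden::set_pin(unsigned pin)
{
  assert(pin < 16);
  registers->output_data = registers->output_data | (1u << pin);
}

void GpioHidden::clear_pin(unsigned pin)
{
  assert(pin < 16);
  registers->output_data = registers->output_data & ~(1u << pin);
}
```

Client code that includes only the header sees the pointer, never the register layout.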
There is a price to pay for this, though. Our member functions are currently inlined for speed, and hence defined in the header file. The Registers structure is no longer defined in the header, so those definitions cannot refer to any of its members. As a result, the member functions can no longer be inlined.
Summary
We’ve now got a basic, but working, GPIO class. I’ve (deliberately) kept the functionality limited and ignored all the basic error-checking and validation code needed to make this class production-ready.
There are three basic implementations:
- Nested pointers are simple but not especially memory-efficient
- Pointer offsets are efficient but limited to same-size registers and (effectively) contiguous register alignment
- Structure overlay has the same footprint as pointer offsets but allows a more flexible register layout, provided padding is controlled.
In the next article I want to take a short aside to look at placement new, a feature of C++ that can be used for hardware access, although it comes with some issues.
Glennan is an embedded systems and software engineer with over 20 years experience, mostly in high-integrity systems for the defence and aerospace industry.
He specialises in C++, UML, software modelling, Systems Engineering and process development.