Making things do stuff – Part 4

In the last article we explored the design of a class to encapsulate a physical hardware device.  In that article I deliberately ignored how the class would actually interact with the hardware.

In this article we explore the options available to us for accessing hardware and the consequences of those choices.

The story so far…

We’ve been designing a class to encapsulate access to a simple GPIO hardware device.  We’ve made some design choices already and have the following class declaration.  I won’t go over these choices again here; have a read of the previous article for more detail.

namespace STM32F407
{
  enum device
  {
    GPIO_A, GPIO_B, GPIO_C,
    GPIO_D, GPIO_E, GPIO_F,
    GPIO_G, GPIO_H, GPIO_I
  };

  constexpr std::uint32_t peripheral_base { 0x40020000 };
  inline void enable_device(device dev)  { ... }
  inline void disable_device(device dev) { ... }
} // namespace STM32F407 

class GPIO
{
public:
  enum Pin
  {
    Pin00, Pin01, Pin02, Pin03,
    Pin04, Pin05, Pin06, Pin07,
    Pin08, Pin09, Pin10, Pin11,
    Pin12, Pin13, Pin14, Pin15,
  };

  // Construction / destruction
  //
  explicit GPIO(STM32F407::device dev);
  ~GPIO();

  // Copy and move policy
  //
  GPIO(const GPIO&)            = delete;
  GPIO(GPIO&&)                 = delete;
  GPIO& operator=(const GPIO&) = delete;
  GPIO& operator=(GPIO&&)      = delete;

  // Behavioural API
  //
  void set_as_output(Pin pin);
  void set_as_input (Pin pin);

  void set_pin  (Pin pin);
  void clear_pin(Pin pin);
  bool is_set    (Pin pin);

private:
  STM32F407::device ID;
};

Implementation options

There are three mainstream approaches to hardware access available to us:

  • Nested pointers / references
  • Pointer offsets
  • Structure overlay

Nested pointers are the simplest, and probably most common, approach but can have some memory overhead costs.  Pointer offsets and structure overlay are more memory-efficient but can have some shortcomings if not implemented carefully.

Nested pointers / references

The nested pointers approach, as the name suggests, involves storing a pointer to each hardware register as a private member within the class.

class GPIO
{
public:
  explicit GPIO(STM32F407::device dev);
  ~GPIO();

  // Copy and move policy...

  // Behavioural API...

private:
  STM32F407::device ID;

  volatile std::uint32_t* const mode;
  volatile std::uint32_t* const type;
  volatile std::uint32_t* const speed;
  volatile std::uint32_t* const pull_up_down;
  volatile std::uint32_t* const input_data;
  volatile std::uint32_t* const output_data;
  volatile std::uint32_t* const set_reset;
  volatile std::uint32_t* const lock;
  volatile std::uint32_t* const alt_fn_low;
  volatile std::uint32_t* const alt_fn_high;
};

Note that the pointers are declared as const.  This means the compiler-generated copy and move assignment operators are not available (copy construction would still be possible); but since we declared all the copy and move operations as deleted this does not affect the design of this class.

Since the pointers are constants they must be initialised in the GPIO constructor.  At the moment, though, we don’t have an address to ‘force’ into the pointers; only an enumeration identifying the device.

Luckily, in our case there is a direct mapping between the device’s ID and its address in memory.  We can construct a simple conversion function to do the mapping.

namespace STM32F407
{
  constexpr std::uint32_t device_address(device dev)
  {
    return peripheral_base + (dev << 10);
  }
}
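
As a quick sanity check of the mapping, the sketch below repeats the declarations from above and verifies a couple of addresses at compile time.  It assumes each GPIO port occupies a 0x400-byte block starting at peripheral_base, which is what the shift by 10 implies; the expected values are derived purely from that arithmetic.

```cpp
#include <cstdint>

namespace STM32F407
{
  enum device
  {
    GPIO_A, GPIO_B, GPIO_C,
    GPIO_D, GPIO_E, GPIO_F,
    GPIO_G, GPIO_H, GPIO_I
  };

  constexpr std::uint32_t peripheral_base { 0x40020000 };

  // Each device occupies a 0x400 (1 << 10) byte block
  constexpr std::uint32_t device_address(device dev)
  {
    return peripheral_base + (static_cast<std::uint32_t>(dev) << 10);
  }
}

// Compile-time checks; no code is generated for these
static_assert(STM32F407::device_address(STM32F407::GPIO_A) == 0x40020000,
              "GPIO_A should sit at the peripheral base");
static_assert(STM32F407::device_address(STM32F407::GPIO_D) == 0x40020C00,
              "GPIO_D should be three blocks further on");
```

Because the function is constexpr, a mistake in the shift shows up as a compile error rather than as a pin that mysteriously fails to toggle.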

We can use this function when constructing the class.

(Note the ‘volatile’ in the reg32_ptr function below is not necessary; I’ve kept it in there for consistency with the earlier code)

using std::uint32_t;
using STM32F407::device_address;

// Inline function to remove
// code clutter
//
inline
volatile uint32_t* reg32_ptr(uint32_t addr)
{
  return reinterpret_cast<volatile uint32_t*>(addr);
}

GPIO::GPIO(STM32F407::device dev) :
  ID           { dev },
  //                                NOTE: This is NOT pointer
  //                                          arithmetic!
  //                                              |
  mode         { reg32_ptr(device_address(dev) + 0x00) },
  type         { reg32_ptr(device_address(dev) + 0x04) },
  speed        { reg32_ptr(device_address(dev) + 0x08) },
  pull_up_down { reg32_ptr(device_address(dev) + 0x0C) },
  input_data   { reg32_ptr(device_address(dev) + 0x10) },
  output_data  { reg32_ptr(device_address(dev) + 0x14) },
  set_reset    { reg32_ptr(device_address(dev) + 0x18) },
  lock         { reg32_ptr(device_address(dev) + 0x1C) },
  alt_fn_low   { reg32_ptr(device_address(dev) + 0x20) },
  alt_fn_high  { reg32_ptr(device_address(dev) + 0x24) }
{
  STM32F407::enable_device(ID);
}

A small point to note here:  Programmers unused to the above notation often make the mistake of thinking the addition in the pointer initialisers is actually pointer arithmetic.  The addition is done as (unsigned) integers, then cast to a pointer type.
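
To make the distinction concrete, here is a small host-side sketch comparing the two kinds of addition.  The fake_registers array and both helper functions are hypothetical names of my own, and the sketch assumes a flat address space in which a std::uintptr_t value is a plain byte address (true of mainstream platforms, but not guaranteed by the standard).

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a block of 32-bit registers (a hypothetical host-side
// array; a real device would sit at a fixed hardware address)
std::uint32_t fake_registers[10];

// Integer addition: the result is exactly 'bytes' further on
std::uintptr_t add_as_integer(std::uintptr_t base, std::uintptr_t bytes)
{
  return base + bytes;
}

// Pointer arithmetic: the result is 'elements' whole uint32_t objects further on
std::uintptr_t add_as_pointer(std::uint32_t* base, std::uintptr_t elements)
{
  return reinterpret_cast<std::uintptr_t>(base + elements);
}

void compare_additions()
{
  const auto base = reinterpret_cast<std::uintptr_t>(fake_registers);

  // "+ 4" as an integer moves 4 bytes (one register)...
  assert(add_as_integer(base, 4) == base + 4);

  // ...but "+ 4" on a uint32_t* moves four registers
  assert(add_as_pointer(fake_registers, 4) == base + 4 * sizeof(std::uint32_t));
}
```

In the constructor above the offsets are byte offsets, which is why the addition must happen before the cast.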

The behavioural member functions of the class can now be implemented in much the same way as we have done previously; for example:

void GPIO::set_pin(GPIO::Pin pin)
{
  *output_data |= (1 << pin);
}

void GPIO::clear_pin(GPIO::Pin pin)
{
  *output_data &= ~(1 << pin);
}
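
The class also declares is_set(), which isn’t shown above.  A possible sketch follows, written as free functions that take the register pointer as a parameter so they can be tried out on an ordinary variable.  The free-function form and the fake register are my own assumptions for illustration, as is the choice to read the pin state back from the input data register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical free-function versions of the member functions
void set_pin(volatile std::uint32_t* output_data, unsigned pin)
{
  *output_data |= (1u << pin);
}

void clear_pin(volatile std::uint32_t* output_data, unsigned pin)
{
  *output_data &= ~(1u << pin);
}

// Assumption: a pin's state is read back from the input data register
bool is_set(const volatile std::uint32_t* input_data, unsigned pin)
{
  return (*input_data & (1u << pin)) != 0;
}

// Exercise the functions on an ordinary variable.  Here one variable
// plays the part of both registers; on the device they are separate.
void exercise_pin_functions()
{
  volatile std::uint32_t fake_register { 0 };

  set_pin(&fake_register, 15);
  assert(is_set(&fake_register, 15));

  clear_pin(&fake_register, 15);
  assert(!is_set(&fake_register, 15));
}
```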

The client code is very clean.

int main()
{
  GPIO port_d { STM32F407::GPIO_D };

  port_d.set_as_output(GPIO::Pin15);

  while(true)
  {
    port_d.set_pin(GPIO::Pin15);
    sleep(1000);
    port_d.clear_pin(GPIO::Pin15);
    sleep(1000);
  }
}

From a performance perspective the code looks very similar to the minimal example we created in a previous article (there is the additional overhead of the constructor call – not shown here).

; main() {
;
08000d78:   push    {lr}
08000d7a:   sub     sp, #52              ; Allocate memory for GPIO

; GPIO port_d { STM32F407::GPIO_D };
;
08000d7c:   add     r0, sp, #4           ; r0 = &port_d
08000d7e:   movs    r1, #3               ; r1 = STM32F407::GPIO_D
08000d80:   bl      0x8000cf8            ; GPIO::GPIO()

; port_d.set_as_output(GPIO::Pin15);
;
08000d84:   ldr     r2, [sp, #8]         ; r2  = mode
08000d86:   ldr     r3, [r2, #0]         ; r3  = *r2;
08000d88:   orr.w   r3, r3, #1073741824  ; r3  = r3 | 0x40000000
08000d8c:   str     r3, [r2, #0]         ; *r2 = r3

; while (true) {
; port_d.set_pin(GPIO::Pin15);
;
loop:
08000d8e:   ldr     r2, [sp, #28]        ; r2  = output_data
08000d90:   ldr     r3, [r2, #0]         ; r3  = *r2
08000d92:   orr.w   r3, r3, #32768       ; r3  = r3 | 0x8000
08000d96:   str     r3, [r2, #0]         ; *r2 = r3

; port_d.clear_pin(GPIO::Pin15);
;
08000d98:   ldr     r2, [sp, #28]        ; r2  = output_data
08000d9a:   ldr     r3, [r2, #0]         ; r3  = *r2
08000d9c:   bic.w   r3, r3, #32768       ; r3  = r3 & ~0x8000
08000da0:   str     r3, [r2, #0]         ; *r2 = r3

; }
;
08000da2:   b.n     0x8000d8e            ; goto loop
; }

One of the drawbacks of the nested pointer approach is the amount of memory required for each GPIO object.  In the case of our example each 32-bit hardware register has a software analogue in the form of a 32-bit pointer.  This seems reasonable.  However, if our hardware consisted of 8-bit registers our software object would be four times the size of the hardware device it accessed.

Of course, we could omit some of these pointers if we are not exposing the behaviour they allow (for example, only using the default output type and speed).

Pointer offsets

An alternative, and more memory-efficient, implementation can exploit the fact that the hardware registers are always at fixed offsets from some base address.  We can store a single pointer and use pointer arithmetic to access individual registers.  The code can be made more readable by using an enumeration.

class GPIO
{
public:
  explicit GPIO(STM32F407::device dev);
  ~GPIO();

  // Copy and move policy...

  // Behavioural API...

private:
  STM32F407::device ID;

  volatile uint32_t* const registers;  // Base address

  enum Offset                          // Offsets
  {
    mode,
    type,
    speed,
    pull_up_down,
    input_data,
    output_data,
    set_reset,
    lock,
    alt_fn_low,
    alt_fn_high
  };
};

The constructor is commensurately simpler, also.

GPIO::GPIO(STM32F407::device dev) :
  ID        { dev },
  registers { reg32_ptr(device_address(dev)) }
{
  STM32F407::enable_device(ID);
}

In order to access individual registers we now have to perform pointer arithmetic on the base address.

void GPIO::set_pin(GPIO::Pin pin)
{
  *(registers + output_data) |= (1 << pin);
}

void GPIO::clear_pin(GPIO::Pin pin)
{
  *(registers + output_data) &= ~(1 << pin);
}

The readability of the code is starting to suffer now (to say the least).  We can make an improvement by exploiting the relationship between pointer arithmetic and the index operator:

void GPIO::set_pin(GPIO::Pin pin)
{
  registers[output_data] |= (1 << pin);
}

void GPIO::clear_pin(GPIO::Pin pin)
{
  registers[output_data] &= ~(1 << pin);
}
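
If you want to convince yourself the two forms are equivalent, the following host-side sketch runs both against an ordinary array standing in for the hardware.  The sketch namespace, the fake_port array and the function names are all hypothetical; on the target the pointer would hold the device’s base address instead.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch
{
  enum Offset
  {
    mode, type, speed, pull_up_down, input_data,
    output_data, set_reset, lock, alt_fn_low, alt_fn_high
  };

  // Host-side stand-in for the ten GPIO registers (hypothetical)
  volatile std::uint32_t fake_port[10];

  void set_pin_by_arithmetic(volatile std::uint32_t* registers, unsigned pin)
  {
    *(registers + output_data) |= (1u << pin);
  }

  void set_pin_by_index(volatile std::uint32_t* registers, unsigned pin)
  {
    registers[output_data] |= (1u << pin);  // Same access, easier to read
  }

  void exercise_fake_port()
  {
    fake_port[output_data] = 0;

    set_pin_by_arithmetic(fake_port, 15);
    assert(fake_port[output_data] == 0x8000u);

    set_pin_by_index(fake_port, 0);
    assert(fake_port[output_data] == 0x8001u);
  }
}
```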

We’ve made a potentially significant improvement in our memory-efficiency now:  irrespective of the number of registers the size of the object remains the same – a single pointer (plus any additional management data).
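
The difference in footprint can be checked with sizeof.  The two structs below are cut-down stand-ins for the private data of each implementation (the ID member is left out of both, and the struct names are mine); the figures scale with the pointer size of whatever platform compiles them.

```cpp
#include <cstdint>

// Nested pointers: one pointer per register
struct NestedPointers
{
  volatile std::uint32_t* const mode;
  volatile std::uint32_t* const type;
  volatile std::uint32_t* const speed;
  volatile std::uint32_t* const pull_up_down;
  volatile std::uint32_t* const input_data;
  volatile std::uint32_t* const output_data;
  volatile std::uint32_t* const set_reset;
  volatile std::uint32_t* const lock;
  volatile std::uint32_t* const alt_fn_low;
  volatile std::uint32_t* const alt_fn_high;
};

// Pointer offsets: a single base pointer, however many registers there are
struct PointerOffset
{
  volatile std::uint32_t* const registers;
};

static_assert(sizeof(NestedPointers) == 10 * sizeof(volatile std::uint32_t*),
              "ten registers cost ten pointers");
static_assert(sizeof(PointerOffset) == sizeof(volatile std::uint32_t*),
              "one pointer regardless of register count");
```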

Run-time performance is not affected, either:

; main() {
;
08000d44:   push    {lr}
08000d46:   sub     sp, #12              ; Allocate memory for port_d 

; GPIO port_d { STM32F407::GPIO_D };
;
08000d48:   mov     r0, sp               ; r0 = &port_d
08000d4a:   movs    r1, #3               ; r1 = STM32F407::GPIO_D
08000d4c:   bl      0x8000cf8            ; GPIO::GPIO()

; port_d.set_as_output(GPIO::Pin15);
;
08000d50:   ldr     r2, [sp, #4]         ; r2 = registers 
08000d52:   ldr     r3, [r2, #0]         ; r3 = registers->mode
08000d54:   orr.w   r3, r3, #1073741824  ; r3 = r3 | 0x40000000
08000d58:   str     r3, [r2, #0]         ; registers->mode = r3

; while(true) {
; port_d.set_pin(GPIO::Pin15);
;
loop:
08000d5a:   ldr     r3, [sp, #4]         ; r3 = registers
08000d5c:   ldr     r2, [r3, #20]        ; r2 = registers->output_data
08000d5e:   orr.w   r2, r2, #32768       ; r2 = r2 | 0x8000
08000d62:   str     r2, [r3, #20]        ; registers->output_data = r2

; port_d.clear_pin(GPIO::Pin15);
;
08000d64:   ldr     r2, [r3, #20]        ; r2 = registers->output_data
08000d66:   bic.w   r2, r2, #32768       ; r2 = r2 & ~0x8000
08000d6a:   str     r2, [r3, #20]        ; registers->output_data = r2
08000d6c:   b.n     0x8000d5a            ; goto loop
08000d6e:   nop
; }

Structure overlay

The main limitation of the pointer offset implementation is that all registers must be the same size, and/or any offsets between registers (if they are not contiguous) must be a multiple of the register size.  This is because the pointer offset implementation basically treats the hardware memory as an array.
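
Gaps between registers can still be handled by giving the enumerators explicit values, as long as every byte offset divides exactly by the register size.  The sketch below uses a made-up device (not the STM32F407 GPIO) with 32-bit registers at byte offsets 0x00, 0x04 and 0x10; the names are hypothetical.

```cpp
#include <cstdint>

namespace sketch
{
  // Hypothetical device: three 32-bit registers at byte offsets
  // 0x00, 0x04 and 0x10, expressed as array indices
  enum Offset
  {
    control = 0x00 / sizeof(std::uint32_t),
    status  = 0x04 / sizeof(std::uint32_t),
    data    = 0x10 / sizeof(std::uint32_t)
  };

  // Byte offset that registers[reg] would access
  constexpr std::uint32_t byte_offset(Offset reg)
  {
    return static_cast<std::uint32_t>(reg * sizeof(std::uint32_t));
  }

  static_assert(data == 4, "0x10 bytes is four 32-bit words");
  static_assert(byte_offset(data) == 0x10, "index maps back to the byte offset");
}
```

A register at byte offset 0x02 has no whole-number index in this scheme, which is exactly the case the array view cannot express.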

In our example the pointer offset implementation is a viable option; but that is not always the case.  It is possible (although less likely these days) that you have different-sized registers, or registers at odd offsets.  For these situations a structure overlay is a good option.

Structure overlay makes use of the fact that the type of a pointer defines not only how much memory to read but also how to interpret it.  Until now we have been using pointers to scalar types – unsigned integers.  There is nothing to stop us declaring a pointer to a user-defined type, with multiple members (in other words, a class or structure).

If we can define a structure that matches our hardware register layout we can ‘overlay’ this structure on memory by declaring a pointer to the struct type and ‘forcing’ an address into the pointer.

A big word of warning here:

By default, the compiler is free to insert padding into a structure’s layout to word-align the members for more efficient access.  This can mean that the actual structure has a different size and member offsets to the structure you declared.  This will be invisible from the code; and a pain to debug.

Using static_assert can provide some basic checking by comparing the size of the structure (as generated by the compiler) against the size you expect it to be (according to your hardware layout).

When using structures for hardware overlay ALWAYS pack the structures (that is, force the compiler to remove any padding), unless you can guarantee that your structure will never be padded.

Unfortunately, structure packing is not part of the C++ standard so the packing instruction is always compiler-specific.
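
As an illustration, the sketch below uses #pragma pack, one such compiler-specific mechanism that GCC, Clang and MSVC all accept.  The device is hypothetical (an 8-bit register followed directly by a 32-bit one), and the size comparison assumes a typical 32- or 64-bit target where a uint32_t is normally aligned to four bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical device: an 8-bit status register immediately
// followed by a 32-bit data register
struct Unpacked
{
  std::uint8_t  status;
  std::uint32_t data;     // Compiler will normally insert padding before this
};

#pragma pack(push, 1)     // Compiler-specific: remove all padding
struct Packed
{
  std::uint8_t  status;
  std::uint32_t data;
};
#pragma pack(pop)         // Restore the previous packing

static_assert(sizeof(Packed) == 5,
              "packed layout should match the hardware exactly");
static_assert(offsetof(Packed, data) == 1,
              "data should follow status directly");
static_assert(sizeof(Unpacked) > sizeof(Packed),
              "the default layout contains padding");
```

Bear in mind that a packed member may be misaligned, and some processors handle misaligned accesses slowly or not at all; check what your target does before relying on this.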

In our example our registers are all 32-bit and contiguous in memory; therefore we can be comfortable that we will have no padding issues.

Here’s the class declaration:

class GPIO
{
public:
  explicit GPIO(STM32F407::device dev);
  ~GPIO();

  // Copy and move policy...

  // Behavioural API...

private:
  STM32F407::device ID;

  // Overlay structure
  //
  struct Registers
  {
    uint32_t mode;
    uint32_t type;
    uint32_t speed;
    uint32_t pull_up_down;
    uint32_t input_data;
    uint32_t output_data;
    uint32_t set_reset;
    uint32_t lock;
    uint32_t alt_fn_low;
    uint32_t alt_fn_high;
  };

  // Compile-time check for padding
  // in structure overlay (just in case)
  //
  static_assert(sizeof(Registers) == (sizeof(uint32_t) * 10), 
                "Unexpected padding in structure overlay");

  volatile Registers* const registers;
};

Notice the struct declaration is a private declaration within the GPIO class.  Since this is a declaration it does not add to the size of a GPIO object.
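
A size check catches padding but not members declared in the wrong order.  As a possible extra safeguard, offsetof can pin individual members to their hardware offsets.  The sketch below declares the structure at namespace scope to keep it self-contained, and takes the expected offsets from the nested-pointer constructor earlier in this article.

```cpp
#include <cstddef>
#include <cstdint>

struct Registers
{
  std::uint32_t mode;
  std::uint32_t type;
  std::uint32_t speed;
  std::uint32_t pull_up_down;
  std::uint32_t input_data;
  std::uint32_t output_data;
  std::uint32_t set_reset;
  std::uint32_t lock;
  std::uint32_t alt_fn_low;
  std::uint32_t alt_fn_high;
};

// Overall size, as before
static_assert(sizeof(Registers) == (sizeof(std::uint32_t) * 10),
              "Unexpected padding in structure overlay");

// Individual member offsets, checked against the hardware layout
static_assert(offsetof(Registers, mode)        == 0x00, "mode offset");
static_assert(offsetof(Registers, input_data)  == 0x10, "input_data offset");
static_assert(offsetof(Registers, output_data) == 0x14, "output_data offset");
static_assert(offsetof(Registers, alt_fn_high) == 0x24, "alt_fn_high offset");
```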

The constructor has to change.  Note we’re now casting our address to a pointer-to-structure.

GPIO::GPIO(STM32F407::device dev) :
  ID        { dev },
  registers { reinterpret_cast<Registers*>(device_address(dev)) }
{
  STM32F407::enable_device(ID);
}

In the member functions we can make use of the member access (arrow) operator.  The compiler will automatically calculate the member offsets from the base address.

void GPIO::set_pin(GPIO::Pin pin)
{
  registers->output_data |= (1 << pin);
}

void GPIO::clear_pin(GPIO::Pin pin)
{
  registers->output_data &= ~(1 << pin);
}

From a memory perspective, the structure overlay implementation has the same footprint as the pointer-offset implementation.

From a code-performance perspective?

; main() {
;
08000d44:   push    {lr}
08000d46:   sub     sp, #12              ; Allocate memory for port_d

; GPIO port_d { STM32F407::GPIO_D };
;
08000d48:   mov     r0, sp               ; r0 = &port_d
08000d4a:   movs    r1, #3               ; r1 = STM32F407::GPIO_D
08000d4c:   bl      0x8000cf8            ; GPIO::GPIO()

; port_d.set_as_output(GPIO::Pin15);
;
08000d50:   ldr     r2, [sp, #4]         ; r2 = registers
08000d52:   ldr     r3, [r2, #0]         ; r3 = registers->mode
08000d54:   orr.w   r3, r3, #1073741824  ; r3 = r3 | 0x40000000
08000d58:   str     r3, [r2, #0]         ; registers->mode = r3

; while(true) {
; port_d.set_pin(GPIO::Pin15);
;
loop:
08000d5a:   ldr     r3, [sp, #4]         ; r3 = registers
08000d5c:   ldr     r2, [r3, #20]        ; r2 = registers->output_data
08000d5e:   orr.w   r2, r2, #32768       ; r2 = r2 | 0x8000
08000d62:   str     r2, [r3, #20]        ; registers->output_data = r2

; port_d.clear_pin(GPIO::Pin15);
;
08000d64:   ldr     r2, [r3, #20]        ; r2 = registers->output_data
08000d66:   bic.w   r2, r2, #32768       ; r2 = r2 & ~0x8000
08000d6a:   str     r2, [r3, #20]        ; registers->output_data = r2 
08000d6c:   b.n     0x8000d5a            ; goto loop
08000d6e:   nop
;  }

Looks pretty familiar, doesn’t it?

One final option with the structure overlay implementation:  we can hide our implementation using the Pointer-to-Implementation (pImpl) idiom.  Since we are simply declaring a pointer to a structure inside the class declaration the compiler is happy for that to be a pointer to an incomplete (that is, not-yet-defined) structure.

The structure itself must be defined inside the implementation file.

class GPIO
{
public:
  explicit GPIO(STM32F407::device dev);
  ~GPIO();
  
  // Copy and move policy...

  // Behavioural API...

private:
  STM32F407::device ID;

  // Pointer to incomplete type;
  // struct Registers must be defined in
  // the .cpp file
  //
  volatile struct Registers* const registers;
};

There is a price to pay for this, though.  Our member functions are currently inlined for speed, and hence in the header file.  Since the Registers structure is not defined in the header file any more we cannot refer to any of its members.  Therefore we cannot inline our member functions.
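
To show how the pieces fit together, here is a single-file sketch of the header / implementation split.  To keep it free of the STM32F407 namespace the constructor takes a raw address, which is my own simplification, and the final function only works because everything lives in one file; it overlays the structure on an ordinary object rather than on real hardware.

```cpp
#include <cassert>
#include <cstdint>

// ---- Header file (sketch) ----
class GPIO
{
public:
  // Hypothetical constructor taking a raw address
  explicit GPIO(std::uintptr_t address);

  // Declared only; the definitions cannot be inline here
  void set_pin  (unsigned pin);
  void clear_pin(unsigned pin);

private:
  // Pointer to incomplete type
  volatile struct Registers* const registers;
};

// ---- Implementation file (sketch) ----
struct Registers
{
  std::uint32_t mode;
  std::uint32_t type;
  std::uint32_t speed;
  std::uint32_t pull_up_down;
  std::uint32_t input_data;
  std::uint32_t output_data;
  std::uint32_t set_reset;
  std::uint32_t lock;
  std::uint32_t alt_fn_low;
  std::uint32_t alt_fn_high;
};

GPIO::GPIO(std::uintptr_t address) :
  registers { reinterpret_cast<volatile Registers*>(address) }
{
}

void GPIO::set_pin(unsigned pin)   { registers->output_data |= (1u << pin); }
void GPIO::clear_pin(unsigned pin) { registers->output_data &= ~(1u << pin); }

// ---- Host-side check (real clients never see struct Registers) ----
void exercise_pimpl_port()
{
  Registers fake {};   // Zero-initialised stand-in for the device
  GPIO port { reinterpret_cast<std::uintptr_t>(&fake) };

  port.set_pin(15);
  assert(fake.output_data == 0x8000u);

  port.clear_pin(15);
  assert(fake.output_data == 0u);
}
```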

Summary

We’ve now got a basic, but working, GPIO class.  I’ve (deliberately) kept the functionality limited and ignored all the basic error-checking and validation code needed to make this class production-ready.

There are three basic implementations:

  • Nested pointers are simple but not especially memory-efficient
  • Pointer offsets are efficient but limited to same-size registers and (effectively) contiguous register alignment
  • Structure overlay allows a more flexible arrangement than pointer offsets.

In the next article I want to take a short aside to look at a feature of C++ that can be used for hardware access, although it comes with some issues – placement new.

Glennan Carnie

Glennan is an embedded systems and software engineer with over 20 years experience, mostly in high-integrity systems for the defence and aerospace industry.

He specialises in C++, UML, software modelling, Systems Engineering and process development.
