When working with peripherals, we need to be able to read and write to the device’s internal registers. How we achieve this in C depends on whether we’re working with memory-mapped IO or port-mapped IO. Port-mapped IO typically requires compiler/language extensions, whereas memory-mapped IO can be accommodated with the standard C syntax.
Embedded “Hello, World!”
We all know the embedded equivalent of the “Hello, world!” program is flashing the LED, so true to form I’m going to use that as an example.
The examples are based on an STM32F407 chip using the GNU Arm Embedded Toolchain.
The STM32F4 uses a port-based GPIO (General Purpose Input Output) model, where each port can manage 16 physical pins. The LEDs are connected to external pins 55-58, which map internally onto GPIO Port D pins 8-11.
Flashing the LEDs
Flashing the LEDs is fairly straightforward: at the port level there are only two registers we are interested in.
- Mode Register – this defines, on a pin-by-pin basis, each pin's function; e.g. we want this pin to behave as an output pin.
- Output Data Register – writing a ‘1’ to the appropriate bit will drive the pin high (output a voltage) and writing a ‘0’ will drive it low (ground the pin).
Mode Register (MODER)
Each port pin has four modes of operation, thus requiring two configuration bits per pin (pin 0 is configured using mode bits 0-1, pin 1 uses mode bits 2-3, and so on):
- 00 – Input
- 01 – Output
- 10 – Alternative function (details configured via other registers)
- 11 – Analogue
So, for example, to configure pin 8 for output, we must write the value 01 into bits 16 and 17 in the MODER register (that is, bit 16 => 1, bit 17 => 0).
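The same two-bit read-modify-write generalises to any pin. As a minimal sketch, here is a hypothetical helper (`moder_with_mode` and the `GPIO_MODE_*` names are my own, not from any ST header) that computes the new MODER value as a pure function, so the bit arithmetic can be checked off-target:

```c
#include <assert.h>
#include <stdint.h>

/* Two-bit MODER field values, as listed in the table above (hypothetical names). */
#define GPIO_MODE_INPUT     0x0u
#define GPIO_MODE_OUTPUT    0x1u
#define GPIO_MODE_ALTERNATE 0x2u
#define GPIO_MODE_ANALOGUE  0x3u

/* Return a copy of `moder` with the two mode bits for `pin`
 * (bits 2*pin and 2*pin+1) replaced by `mode`; other pins are unchanged. */
static uint32_t moder_with_mode(uint32_t moder, unsigned pin, uint32_t mode)
{
    assert(pin < 16u);          /* a port has 16 pins */
    assert(mode <= 0x3u);       /* a mode is only two bits wide */

    const unsigned shift = 2u * pin;
    moder &= ~(0x3u << shift);  /* clear both bits for this pin */
    moder |= (mode << shift);   /* then write the new mode */
    return moder;
}
```

Used as `*portd_moder = moder_with_mode(*portd_moder, 8, GPIO_MODE_OUTPUT);` it yields the same value as the `|=` / `&=` pair in the examples that follow.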
Output Data Register (ODR)
In the Output Data Register (ODR) each bit represents an I/O pin on the port. The bit number matches the pin number.
If a pin is set to output (in the MODER register) then writing a 1 into the appropriate bit will drive the I/O pin high. Writing 0 into the appropriate bit will drive the I/O pin low.
There are 16 IO pins, but the register is 32 bits wide. Reserved bits (16-31) are read as ‘0’.
Port D Addresses
The absolute addresses for the MODER and ODR of Port D are:
- MODER – 0x40020C00
- ODR – 0x40020C14
Pointer access to registers
Typically, when we access memory-mapped registers in C, we use pointer notation to ‘trick’ the compiler into generating the correct load/store operations at the absolute address needed.
So for Port D we might see something along the lines of the following (I’ll keep the code brief and use magic numbers for simplicity):
#include <stdint.h>
volatile uint32_t* const portd_moder = (uint32_t*) 0x40020C00;
volatile uint32_t* const portd_odr = (uint32_t*) 0x40020C14;
extern void sleep(uint32_t ms); // use systick to busy-wait
int main(void)
{
uint32_t moder = *portd_moder;
moder |= (1 << 16);
moder &= ~(1 << 17);
*portd_moder = moder;
while(1) {
*portd_odr |= (1 << 8); // led-on
sleep(500);
*portd_odr &= ~(1 << 8); // led-off
sleep(500);
}
}
Alternatively, we may see the registers defined using the preprocessor, e.g.
#include <stdint.h>
#define PORTD_MODER (*((volatile uint32_t*) 0x40020C00))
#define PORTD_ODR (*((volatile uint32_t*) 0x40020C14))
extern void sleep(uint32_t ms); // use systick to busy-wait
int main(void)
{
uint32_t moder = PORTD_MODER;
moder |= (1 << 16);
moder &= ~(1 << 17);
PORTD_MODER = moder;
while(1) {
PORTD_ODR |= (1 << 8); // led-on
sleep(500);
PORTD_ODR &= ~(1 << 8); // led-off
sleep(500);
}
}
There is a misconception among many C programmers that the pointer model is less efficient than the #define model. With C99 and modern compilers this is not the case; they will generate identical code (C99 allows for the compiler to optimise away const objects).
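Because the only thing tying this code to the hardware is the register address, one option (my assumption, not something the examples above rely on) is to pass the register pointers in, so the same read-modify-write logic can run in a host unit test against ordinary variables. The function names below are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical LED helpers: the register addresses are parameters, so on
 * target we pass portd_moder / portd_odr, and on a host we can pass the
 * address of a plain variable instead. */
static void led_init(volatile uint32_t *moder, unsigned pin)
{
    assert(pin < 16u);
    uint32_t value = *moder;            /* read once */
    value &= ~(0x3u << (2u * pin));     /* clear the pin's two mode bits */
    value |= (0x1u << (2u * pin));      /* 01 = general-purpose output */
    *moder = value;                     /* write back once */
}

static void led_on(volatile uint32_t *odr, unsigned pin)
{
    *odr |= (1u << pin);                /* drive the pin high */
}

static void led_off(volatile uint32_t *odr, unsigned pin)
{
    *odr &= ~(1u << pin);               /* drive the pin low */
}
```

On target the call would be `led_init(portd_moder, 8);`; the cost is one extra parameter, which the optimiser will usually fold away when the argument is a constant address.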
Enabling Port D
We are missing one final step: each peripheral on the STM32F407 is clock gated. The clock signal does not reach the peripheral until we enable it by setting a bit in a specific register. By default, clock signals never reach peripherals that are not in use, thus saving power.
To enable the clock to reach the GPIO port D the GPIODEN (GPIO D Enable) bit (bit 3) of the AHB1ENR (AMBA High-performance Bus 1 Enable) register in the RCC (Reset and Clock Control) peripheral needs setting.
#include <stdint.h>
volatile uint32_t* const portd_moder = (uint32_t*) 0x40020C00;
volatile uint32_t* const portd_odr = (uint32_t*) 0x40020C14;
volatile uint32_t* const rcc_ahb1enr = (uint32_t*) 0x40023830;
extern void sleep(uint32_t ms); // use systick to busy-wait
int main(void)
{
*rcc_ahb1enr |= (1 << 3); // enable PortD's clock
uint32_t moder = *portd_moder;
moder |= (1 << 16);
moder &= ~(1 << 17);
*portd_moder = moder;
while(1) {
*portd_odr |= (1 << 8); // led-on
sleep(500);
*portd_odr &= ~(1 << 8); // led-off
sleep(500);
}
}
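The 0x40023830 and (1 << 3) in the RCC line above are magic numbers. One way to make their origin visible, sketched here with macro names of my own choosing and assuming a C11 compiler for static_assert, is to derive them from the RCC base address and register offset given in the STM32F407 reference manual:

```c
#include <assert.h>
#include <stdint.h>

/* STM32F407 values from the reference manual; macro names are hypothetical. */
#define RCC_BASE            0x40023800u   /* Reset and Clock Control */
#define RCC_AHB1ENR_OFFSET  0x30u         /* AHB1 peripheral clock enable register */
#define RCC_AHB1ENR_ADDR    (RCC_BASE + RCC_AHB1ENR_OFFSET)

#define GPIODEN_BIT         3u            /* GPIO Port D clock enable */
#define GPIODEN_MASK        (1u << GPIODEN_BIT)

/* Compile-time cross-check against the literal used in the example. */
static_assert(RCC_AHB1ENR_ADDR == 0x40023830u, "unexpected AHB1ENR address");

/* On target (not compiled on a host):
 *   volatile uint32_t* const rcc_ahb1enr = (volatile uint32_t*) RCC_AHB1ENR_ADDR;
 *   *rcc_ahb1enr |= GPIODEN_MASK;   // enable Port D, other enables untouched
 */
```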
Using structs
The code so far works just fine, but has a number of shortcomings.
First, to support multiple IO ports we would have to define a set of pointers for each set of registers for each port, e.g.:
volatile uint32_t* const porta_moder = (uint32_t*) 0x40020000;
volatile uint32_t* const porta_odr = (uint32_t*) 0x40020014;
volatile uint32_t* const portb_moder = (uint32_t*) 0x40020400;
volatile uint32_t* const portb_odr = (uint32_t*) 0x40020414;
volatile uint32_t* const portc_moder = (uint32_t*) 0x40020800;
volatile uint32_t* const portc_odr = (uint32_t*) 0x40020014;
volatile uint32_t* const portd_moder = (uint32_t*) 0x40020C00;
volatile uint32_t* const portd_odr = (uint32_t*) 0x40020C14;
volatile uint32_t* const porte_moder = (uint32_t*) 0x40021000;
volatile uint32_t* const porte_odr = (uint32_t*) 0x40021014;
Considering the port actually has 10 different registers we may want to access, this involves a lot of repetition. Where there is repetition, bugs that are simple to make but difficult to track down can creep in (did you spot the deliberate mistake?).
In addition, and more significantly, we can see that the port’s ODR is always 0x14 bytes offset from the MODER. The MODER is always at offset 0x00 from the port address (thus the MODER address is also the port’s base address).
In Software Engineering terms, we’d view this separate declaration of related pointers as a lack of cohesion in the code. One of our goals is to strive for high cohesion, grouping together things that naturally belong together (as a change affects them all).
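A halfway house, before reaching for a struct, is to capture the "port base + fixed offset" observation in macros so the compiler does the addition. This is a sketch with hypothetical names, assuming C11 for static_assert:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; STM32F407 values as given in the text. */
#define GPIOA_BASE    0x40020000u
#define GPIOD_BASE    0x40020C00u
#define MODER_OFFSET  0x00u
#define ODR_OFFSET    0x14u

/* A port register lives at "port base + fixed offset". */
#define GPIO_REG_ADDR(base, offset)  ((uint32_t)((base) + (offset)))

/* On target a register pointer would then be formed as, e.g.
 *   volatile uint32_t* const portd_odr =
 *       (volatile uint32_t*) GPIO_REG_ADDR(GPIOD_BASE, ODR_OFFSET);
 */

/* Compile-time cross-check against the absolute addresses quoted earlier. */
static_assert(GPIO_REG_ADDR(GPIOD_BASE, MODER_OFFSET) == 0x40020C00u, "MODER");
static_assert(GPIO_REG_ADDR(GPIOD_BASE, ODR_OFFSET)   == 0x40020C14u, "ODR");
```

This still needs one name per register per port, which is the repetition a struct overlay removes.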
struct Overlay
The full register layout for the STM32F4 GPIO port is shown below:
By using a struct to define the relative memory offsets, we can get the compiler to generate all the correct address accesses relative to the base address.
#include <stdint.h>
typedef struct
{
uint32_t MODER; // mode register, offset: 0x00
uint32_t OTYPER; // output type register, offset: 0x04
uint32_t OSPEEDR; // output speed register, offset: 0x08
uint32_t PUPDR; // pull-up/pull-down register, offset: 0x0C
uint32_t IDR; // input data register, offset: 0x10
uint32_t ODR; // output data register, offset: 0x14
uint32_t BSRR; // bit set/reset register, offset: 0x18
uint32_t LCKR; // configuration lock register, offset: 0x1C
uint32_t AFRL; // GPIO alternate function registers, offset: 0x20
uint32_t AFRH; // GPIO alternate function registers, offset: 0x24
} GPIO_t;
Now we define the pointer as before, but this time using the struct type rather than a uint32_t:
volatile GPIO_t* const portd = (GPIO_t*)0x40020C00;
Finally we can use it as before, but this time use struct-pointer dereferencing to access the individual registers:
int main(void)
{
*rcc_ahb1enr |= (1 << 3); // enable PortD's clock
uint32_t moder = portd->MODER;
moder |= (1 << 16);
moder &= ~(1 << 17);
portd->MODER = moder;
while (1) {
portd->ODR |= (1 << 8); // led-on
sleep(500);
portd->ODR &= ~(1 << 8); // led-off
sleep(500);
}
}
Now when we access the ODR via the statement:
portd->ODR |= (1 << 8); // led-on
the compiler can calculate the offset (0x14) of the ODR member relative to the base address held in the pointer (0x40020C00).
This means that we only need one pointer per port rather than 10, e.g.
volatile GPIO_t* const porta = (GPIO_t*)0x40020000;
volatile GPIO_t* const portb = (GPIO_t*)0x40020400;
volatile GPIO_t* const portc = (GPIO_t*)0x40020800;
volatile GPIO_t* const portd = (GPIO_t*)0x40020C00;
volatile GPIO_t* const porte = (GPIO_t*)0x40021000;
Alternatively, we could do the same with #defines:
#define PORTA ((volatile GPIO_t*) 0x40020000)
#define PORTB ((volatile GPIO_t*) 0x40020400)
#define PORTC ((volatile GPIO_t*) 0x40020800)
#define PORTD ((volatile GPIO_t*) 0x40020C00)
#define PORTE ((volatile GPIO_t*) 0x40021000)
Note that in the #defines the leading ‘*’ dereference has been dropped, so access to the register is coded thus:
PORTD->ODR |= (1 << 8); // led-on
If we left the dereference in:
#define PORTD (*((volatile GPIO_t*) 0x40020C00))
the code would be:
PORTD.ODR |= (1 << 8); // led-on
It’s a matter of style; the generated instructions are the same.
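To see that the compiler, not the programmer, supplies the 0x14 offset, the overlay can be pointed at an ordinary RAM object on a host machine. In this sketch `fake_port_d`, `PORTD_PTR` and `PORTD_OBJ` are hypothetical stand-ins for the real address 0x40020C00; the pointer style and the dereferenced style update the same member:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint32_t MODER;   /* 0x00 */
    uint32_t OTYPER;  /* 0x04 */
    uint32_t OSPEEDR; /* 0x08 */
    uint32_t PUPDR;   /* 0x0C */
    uint32_t IDR;     /* 0x10 */
    uint32_t ODR;     /* 0x14 */
    uint32_t BSRR;    /* 0x18 */
    uint32_t LCKR;    /* 0x1C */
    uint32_t AFRL;    /* 0x20 */
    uint32_t AFRH;    /* 0x24 */
} GPIO_t;

/* Host-only stand-in for the hardware: an ordinary object in RAM. */
static GPIO_t fake_port_d;

/* Pointer style (PORTD->ODR) ... */
#define PORTD_PTR ((volatile GPIO_t *) &fake_port_d)
/* ... and dereferenced style (PORTD.ODR); both name the same object. */
#define PORTD_OBJ (*PORTD_PTR)

static void led_on_ptr(void)  { PORTD_PTR->ODR |= (1u << 8); }
static void led_off_obj(void) { PORTD_OBJ.ODR &= ~(1u << 8); }
```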
Code Comparison
So how does the struct code compare with our original pointer code (compiled with the optimisation flag -Og):
Original code
$ arm-none-eabi-objdump -d -S main.o
...
*portd_odr |= (1 << 8); // led-on
1a: 4c0b ldr r4, [pc, #44] ; (48 <main+0x48>)
1c: 6823 ldr r3, [r4, #0]
1e: f443 7380 orr.w r3, r3, #256 ; 0x100
22: 6023 str r3, [r4, #0]
...
The assembler code does the following:
- Load the value 0x40020C14 into r4
- Read the contents of 0x40020C14 [r4 + 0] as a 32-bit value into r3
- Or 0x100 with the contents of r3 (set bit 8)
- Store r3 as a 32-bit value at address 0x40020C14
Comparing this to the struct access:
$ arm-none-eabi-objdump -d -S main.o
...
portd->ODR |= (1 << 8); // led-on
1a: 4c0a ldr r4, [pc, #40] ; (44 <main+0x44>)
1c: 6963 ldr r3, [r4, #20]
1e: f443 7380 orr.w r3, r3, #256 ; 0x100
22: 6163 str r3, [r4, #20]
...
So how does this differ? Only in the use of an offset load:
- Load the value 0x40020C00 into r4
- Read the contents of 0x40020C14 [r4 + 20] as a 32-bit value into r3
- Or the value 0x100 with the contents of r3
- Store r3 as a 32-bit value at address 0x40020C14 – [r4 + 0x14]
This code demonstrates that, from a size and performance perspective, there is no difference between the two approaches (at least on Arm).
Note: an Arm load (ldr) instruction takes 2 cycles with or without a secondary offset.
Caveats
Before you rush off and refactor legacy code to use structs, note that there are a couple of factors we are relying on, which may vary from compiler to compiler.
First, what can we be sure of?
- The offset of the first struct member is always 0x0 from the object’s address (in C++ this is guaranteed only for standard-layout types, which a plain struct like this is).
- The compiler cannot reorder the members, so OTYPER will always come at a higher address in memory than MODER and at a lower address than OSPEEDR.
However, we cannot guarantee that the compiler will not introduce padding between members, as the standard states:
There may be unnamed padding within a structure object, but not at its beginning.
So we cannot guarantee that the address of OTYPER is equal to the address of MODER + 4 bytes.
That said, in practical terms, with modern compilers, it is unlikely to be a problem (for this code). Padding tends to occur when a data member would otherwise not sit on its natural boundary (e.g. a 32-bit type that is not word aligned), for example:
typedef struct
{
int a;
char b;
int c;
} Padding_t;
would likely return a result of 12 from sizeof(Padding_t), because 3 padding bytes are added after char b to align int c.
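offsetof (from <stddef.h>) makes such padding visible rather than inferred from sizeof. The helper below is a hypothetical sketch; on a typical target with a 4-byte, 4-byte-aligned int it reports 3 bytes, but the exact figure is implementation-defined:

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    int a;
    char b;
    int c;
} Padding_t;

/* Number of unnamed padding bytes between the end of b and the start of c. */
static size_t padding_before_c(void)
{
    return offsetof(Padding_t, c) - (offsetof(Padding_t, b) + sizeof(char));
}
```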
Mitigating the risk
The obvious, and most straightforward, approach is to ensure you have a unit test that checks the size of the generated structure, e.g.
void test_GPIO_t_struct_size(void)
{
TEST_ASSERT_EQUAL(40, sizeof(GPIO_t));
}
Alternatively, one of the compelling reasons to use C11 is the introduction of static_assert (a macro in <assert.h> that expands to the _Static_assert keyword), e.g.
#include <assert.h> // C11: provides the static_assert macro
int main(void)
{
static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
}
This is a compile-time check; if padding were present, the following compiler error would be generated:
src/main.c: In function 'main':
src/main.c:87:3: error: static assertion failed: "padding in GPIO_t present"
static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
^
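The size check above catches padding anywhere in the struct but does not say where. Combining static_assert with offsetof pins down every member; the CHECK_OFFSET macro here is a hypothetical convenience, not part of any standard or vendor header:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

typedef struct
{
    uint32_t MODER;   /* 0x00 */
    uint32_t OTYPER;  /* 0x04 */
    uint32_t OSPEEDR; /* 0x08 */
    uint32_t PUPDR;   /* 0x0C */
    uint32_t IDR;     /* 0x10 */
    uint32_t ODR;     /* 0x14 */
    uint32_t BSRR;    /* 0x18 */
    uint32_t LCKR;    /* 0x1C */
    uint32_t AFRL;    /* 0x20 */
    uint32_t AFRH;    /* 0x24 */
} GPIO_t;

/* Fail the build if a member is not at its documented offset. */
#define CHECK_OFFSET(type, member, expected) \
    static_assert(offsetof(type, member) == (expected), \
                  #type "." #member " is not at offset " #expected)

CHECK_OFFSET(GPIO_t, MODER,   0x00);
CHECK_OFFSET(GPIO_t, OTYPER,  0x04);
CHECK_OFFSET(GPIO_t, OSPEEDR, 0x08);
CHECK_OFFSET(GPIO_t, PUPDR,   0x0C);
CHECK_OFFSET(GPIO_t, IDR,     0x10);
CHECK_OFFSET(GPIO_t, ODR,     0x14);
CHECK_OFFSET(GPIO_t, BSRR,    0x18);
CHECK_OFFSET(GPIO_t, LCKR,    0x1C);
CHECK_OFFSET(GPIO_t, AFRL,    0x20);
CHECK_OFFSET(GPIO_t, AFRH,    0x24);
static_assert(sizeof(GPIO_t) == 0x28, "GPIO_t has trailing padding");
```

A failing check names the offending member in the diagnostic, which narrows the search compared with a single sizeof test.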
If you’re not using C11 (I’ve yet to come across an embedded C project using it), then a final approach is to try to ensure no padding is present by requesting that the compiler ‘pack’ the struct, i.e. lay out the members with no padding between them.
This is always a compiler-specific request, which may be done through #pragmas. GCC accepts #pragma pack, but more commonly uses its own ‘attribute’ approach.
Defining the structure with the attribute ‘packed’ will normally remove any potential padding, e.g.
#include <assert.h> // static_assert (C11)
#include <stdint.h>
typedef struct
{
uint32_t MODER; // mode register, offset: 0x00
uint32_t OTYPER; // output type register, offset: 0x04
uint32_t OSPEEDR; // output speed register, offset: 0x08
uint32_t PUPDR; // pull-up/pull-down register, offset: 0x0C
uint32_t IDR; // input data register, offset: 0x10
uint32_t ODR; // output data register, offset: 0x14
uint32_t BSRR; // bit set/reset register, offset: 0x18
uint32_t LCKR; // configuration lock register, offset: 0x1C
uint32_t AFRL; // alternate function registers, offset: 0x20
uint32_t AFRH; // alternate function registers, offset: 0x24
} __attribute__((packed)) GPIO_t;
typedef struct
{
int a;
char b;
int c;
} __attribute__((packed)) Padding_t;
int main(void)
{
static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
static_assert(sizeof(Padding_t) == 9, "padding in Padding_t present");
}
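Unaligned access can cause a whole host of problems and performance issues, so be extremely careful using packing. Packing also lowers the alignment the compiler may assume for the struct, so on some targets it may split a 32-bit member access into several narrower accesses, which a peripheral register will not expect.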
Vendor Supplied Headers
On most modern microcontrollers you are likely to find headers provided with register definitions already supplied. Many years ago Arm introduced the Cortex Microcontroller Software Interface Standard (CMSIS). As part of the standard, it is expected that Arm and the vendor will between them supply the register definitions.
For example, ST supply a series of headers for their STM32 family of microcontrollers. Searching out the ST-provided file stm32f407xx.h, you will find definitions for all the peripherals included in the 407 variant.
On line 544 of this header file (based on version V2.1.0) you will find the following definition:
typedef struct
{
__IO uint32_t MODER; /*!< GPIO port mode register, Address offset: 0x00 */
__IO uint32_t OTYPER; /*!< GPIO port output type register, Address offset: 0x04 */
__IO uint32_t OSPEEDR; /*!< GPIO port output speed register, Address offset: 0x08 */
__IO uint32_t PUPDR; /*!< GPIO port pull-up/pull-down register, Address offset: 0x0C */
__IO uint32_t IDR; /*!< GPIO port input data register, Address offset: 0x10 */
__IO uint32_t ODR; /*!< GPIO port output data register, Address offset: 0x14 */
__IO uint16_t BSRRL; /*!< GPIO port bit set/reset low register, Address offset: 0x18 */
__IO uint16_t BSRRH; /*!< GPIO port bit set/reset high register, Address offset: 0x1A */
__IO uint32_t LCKR; /*!< GPIO port configuration lock register, Address offset: 0x1C */
__IO uint32_t AFR[2]; /*!< GPIO alternate function registers, Address offset: 0x20-0x24 */
} GPIO_TypeDef;
This is a slightly different interpretation of the register layout from earlier, notably:
- The BSRR has been split into two 16-bit registers (BSRRL and BSRRH)
- The AFR has been combined into an array of two elements (rather than a High and Low).
There could be a risk of padding between BSRRL and BSRRH, but it is unlikely, and it does not occur here.
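The split reflects what BSRR does: on the STM32F4, writing 1 to one of bits 0-15 sets the corresponding pin, writing 1 to one of bits 16-31 resets it, and writing 0 has no effect, so no read-modify-write of ODR is needed. A minimal sketch of the values involved (helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Value to write to the 32-bit BSRR to drive `pin` high. */
static uint32_t bsrr_set(unsigned pin)
{
    assert(pin < 16u);
    return 1u << pin;            /* bits 0-15: set */
}

/* Value to write to the 32-bit BSRR to drive `pin` low. */
static uint32_t bsrr_reset(unsigned pin)
{
    assert(pin < 16u);
    return 1u << (pin + 16u);    /* bits 16-31: reset */
}

/* With the split 16-bit view in GPIO_TypeDef the same bits are written as
 *   GPIOD->BSRRL = (uint16_t)(1u << 8);   // set pin 8   (offset 0x18)
 *   GPIOD->BSRRH = (uint16_t)(1u << 8);   // reset pin 8 (offset 0x1A)
 * because, on a little-endian Cortex-M, BSRRH is the upper half of BSRR. */
```

Since each write affects only the pins whose bits are 1, this avoids the race that `ODR |= ...` has if an interrupt handler also modifies the same port.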
The __IO macro simply maps onto volatile. There is also an __I macro (volatile const) to define ‘read only’ access, and an __O macro (volatile) to indicate ‘write only’ access – but write-only access can’t be enforced in C.
Further down in the file (line 1130):
#define GPIOD ((GPIO_TypeDef *) GPIOD_BASE)
Another slight difference is the choice to put the volatile qualifier on the struct members rather than on the pointer definition.
The RCC struct definition is on line 615, with the #define on line 1137.
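The access macros themselves are trivial. CMSIS defines them in its core header (for example core_cm4.h), with __I as plain volatile when compiling C++. The sketch below uses stand-in names (REG_I, REG_O, REG_IO, DEMO_TypeDef are hypothetical) to avoid reserved double-underscore identifiers, and shows the CMSIS style of qualifying the members rather than the pointer:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the CMSIS __I / __O / __IO macros (C definitions). */
#define REG_I   volatile const   /* read only */
#define REG_O   volatile         /* write only (not enforceable in C) */
#define REG_IO  volatile         /* read/write */

/* A made-up two-register peripheral laid out the CMSIS way. */
typedef struct
{
    REG_IO uint32_t CTRL;    /* offset 0x00 */
    REG_I  uint32_t STATUS;  /* offset 0x04 */
} DEMO_TypeDef;

static uint32_t demo_read_status(const DEMO_TypeDef *dev)
{
    /* dev->STATUS = 0; would not compile: the member is const. */
    return dev->STATUS;      /* still a volatile read */
}
```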
The CMSIS code to drive the LED is:
#include "stm32f407xx.h"
#include "timer.h"
int main(void)
{
RCC->AHB1ENR |= (1 << 3); // enable PortD's clock, leaving other enables untouched
uint32_t moder = GPIOD->MODER;
moder |= (1 << 16);
moder &= ~(1 << 17);
GPIOD->MODER = moder;
while (1) {
GPIOD->ODR |= (1 << 8); // led-on
sleep(500);
GPIOD->ODR &= ~(1 << 8); // led-off
sleep(500);
}
}
In summary
A program can be decomposed into modules in several ways, one of which is chosen during the design process (assuming design happens!). The choice of decomposition has a critical effect on the architecture and thus on the product’s quality attributes, such as the maintainability, reliability, modifiability, and testability of the final system.
Cohesion is one of the most important concepts in software decomposition. High cohesion is central to good design principles and patterns, guiding separation of concerns and maintainability.
Using a struct-based model for device access improves cohesion through good abstraction models, making code easier to understand and maintain.
In the next article I shall start to compare the relative merits and consequences of using the #define model versus the pointer model.
Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.
Interesting, especially on the comparison point vs #define.
Personally, I find the use of registers and magic numbers too dangerous (but that's not the point, I know).
On STM32, there are now low-level drivers that do the same thing but with more control and security.
So now on STM32, IMHO, there is no reason to develop this kind of register access.
Any decent coding style guideline will advise against the use of 'Magic Numbers' (note that there isn't a MISRA-C:2012 rule deprecating their use).
Regarding your comment that on the STM32 there is no reason to develop drivers like this:
First, outside of a context it's difficult to respond, but I strongly disagree:
1. Writing drivers is a major education tool for better code understanding - especially in embedded
2. If you are going to use existing drivers it helps in understanding their function
3. Read the header comments of your drivers; they'll all have the "AS IS" clause - any problems are yours and you get to go to prison.
4. As they are generic there can be a lot of code bloat.
Where they are useful is for rapid prototyping; but in production code, proof of correctness is so important that I would avoid them unless there is a well-defined test harness and static analysis results for all the code.
And anyway, where's the fun in that 😉
1/2- Yes it's the best way to understand how a µc works, especially if you're a beginner (that was part of my first job ...)
3 - Of course it's your responsibility, but the problem is the same with your own code.
If proof of correctness is so important for you, then you will have some tests to check it. So whether it's your own code or code provided by your manufacturer, you will test it. But in one case you will have to develop the drivers and then find someone to test them.
In the second case the driver is already developed and you will just add more tests to the ones done by the manufacturer.
So developing your own driver is a "good" way to add bugs. Of course, if the driver is full of crap, it's best to redevelop it 🙂
I would add that using the standard library helps to make the code more readable and universal.
A very detailed and in-depth explanation of this topic that I found on the internet. It was really interesting to read the entire article, and it helped me clear up my long-standing confusion about using structs in modern embedded C when defining ports.
It's also notable that the author provides the necessary examples and resources, so that each concept is clear. He explains the topic starting from the very basics and ending at an advanced level. Thanks a lot.
While both of your uint32_t * and GPIO_t * techniques work (whether by #define or explicit declaration), they're missing a key point. I use a third model, with structs (similar to your GPIO_t). Instead of "volatile GPIO_t* const portd = (GPIO_t*)0x40020C00;", I use "extern volatile GPIO_t portd;".
"So what?" you say. "You still need to define it somewhere!" And I do - using the linker, which is the repository of all things with explicit addresses. All you need is to add the following line to your .ld file outside of any section: "portd = 0x40020C00;"
From there, in your code you just write to portd.ODR as normal. The advantage is, you can pre-compile it into a library, since it's the link step that moves it to the correct address. You can thus use the same code on a huge array of different processors, all with their peripherals at different addresses, without recompiling the source. And since the linker has the INCLUDE statement, you can provide a wide variety of *.ld.inc include files for each of the different processors, and simply choose the correct processor for your project.
Yes, that's also a very neat way of doing it and I like the pre-compile thinking!
The only downside is portability and people being knowledgeable about their specific toolchain's linker configuration file. For example, IAR and Keil have their own formats that differ significantly from, say, GCC.
In our experience, of the people working in teams, very few have any real appreciation of the format and layout of the LCF (linker configuration file). For many, it's just hidden behind a build menu option!
But thanks for the comment, maybe worth writing up as a follow-on.
Niall
Although I like this approach, I would be interested how the bit-arrays deal with endianness?
Unfortunately, you are into implementation-defined behaviour, so it will depend on a number of factors. Especially as we have the two forms of big-endian!
Thanks for this clear and concise post.
Thank you Niall, for this educational article!
Thanks for this interesting article. One thing surprises me, though:
>With C99 and modern compilers this is not the case, they will generate identical code
>(C99 allows for the compiler to optimise away const objects).
In my opinion this is not a feature of C99. Const means something different in C than in C++.
Simple sample:
#define PORTA ((volatile GPIO_t*) 0x40020000)
static_assert((uintptr_t)PORTA == 0x40020000U, "Test1");
// --> no error
volatile GPIO_t* const porta = (GPIO_t*)0x40020000;
static_assert((uintptr_t)porta == 0x40020000U, "Test2");
// --> error: expression in static assertion is not constant
The reason, in my opinion, is that modern processors are very effective with memory accesses and therefore the same code is generated.
Am I on the wrong track?
As always it's a bit of "yes and no". Strictly speaking, no it's not a "feature" of C99. Nevertheless, there were some wording changes in C99 that "implied" that if you didn't follow certain rules, then it was undefined behaviour. This allows compilers to be more aggressive regarding optimisations of const object definitions.
It's definitely a grey area.
In C23 we will get constexpr (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3018.htm) which, like Modern C++, is much clearer around this (although no support for consteval or constinit).
This article provides a comprehensive overview of working with peripherals in embedded systems using C. The explanation of memory-mapped IO and port-mapped IO, as well as the use of structs for better code organization, is particularly helpful. It's a valuable resource for embedded system developers. Great job!
Thank you for your kind words.
We put a lot of time and effort into the blogs and feedback is very appreciated.
Hi Niall,
regarding mitigating the risk and using a unit test. Normally I would think unit tests are compiled and run on your development platform (in contrast to being run on target).
That's why I would say checking the size could result in different outcomes if your platforms are different memory wise.
Static assert of C11 looks great though, as it should be checked by the compiler for the actual target.
That's at least my understanding.
Cheers
Andreas
Hi Andreas,
You certainly develop unit tests on the host platform, but there is an argument for verifying the test results in the target environment (ideally using an emulator such as QEMU). This is to help catch potential toolchain issues between host and target, as you've correctly alluded to.
If we're using the correct types, such as those from stdint, then we shouldn't see many issues (a major reason to use portable types). But of course your host compiler might be GCC or MSVC, whereas your target compiler might be Keil/IAR etc.
We had a host test fail recently, where someone needed to save an address to Flash in the 32-bit target. The target code was casting it to a uint32_t from a pointer, but on the host this test failed (as the pointer was 64-bits). But changing it to the type 'intptr_t' solved the issue, and identified code written for a 32-bit target.
Thanks for the comment, and check out C23 as there are some really interesting new features coming.