Embedded Expertise: Beyond Fixed-Size Integers; Exploring Fast and Least Types

The Challenge of Fixed-Size Integers Before C99

In embedded programming, before adopting the C99 standard (ISO/IEC 9899:1999), a significant challenge was ensuring the consistent sizing of key data objects. This complexity stemmed from the C standard’s (ISO/IEC 9899) non-committal stance on the size of an int. We knew:

  • A short is a minimum of 16 bits.
  • A long is a minimum of 32 bits.
  • An int is at least as wide as a short and no wider than a long.

This flexibility boosted C’s portability, making it a favourite for various architectures, including some with non-standard bit sizes like Unisys’s 36-bit and 48-bit ints. However, from a portability perspective, we could rely only on the minimum guaranteed ranges (signed int: -32767…32767, unsigned int: 0…65535).

Coding standards before C99 (like MISRA-C1 and MISRA-C2) navigated this by eschewing basic integral types (char, short, int, and long) in favour of specific-length typedefs, e.g.,

typedef signed char i8;
typedef unsigned char u8;
// etc.
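
As a rough sketch of how such a project header was often put together, the fragment below picks the underlying types from <limits.h> and adds a compile-time size check that works without any C99 features. The names i8, u8, u16, u32 and SIZE_CHECK are illustrative assumptions, not taken from MISRA or any particular codebase.

```c
#include <limits.h>

/* Hypothetical pre-C99 project header: all names here are illustrative. */
typedef signed char   i8;
typedef unsigned char u8;

/* Pick a 16-bit unsigned type from what <limits.h> reports. */
#if USHRT_MAX == 0xFFFFu
typedef unsigned short u16;
#elif UINT_MAX == 0xFFFFu
typedef unsigned int u16;
#else
#error "no 16-bit unsigned type found"
#endif

/* Pick a 32-bit unsigned type the same way. */
#if UINT_MAX == 0xFFFFFFFFUL
typedef unsigned int u32;
#elif ULONG_MAX == 0xFFFFFFFFUL
typedef unsigned long u32;
#else
#error "no 32-bit unsigned type found"
#endif

/* C89-compatible check: the array size becomes -1, a compile error,
   when the condition is false. */
#define SIZE_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

SIZE_CHECK(u8_is_8_bits,   sizeof(u8)  * CHAR_BIT == 8);
SIZE_CHECK(u16_is_16_bits, sizeof(u16) * CHAR_BIT == 16);
SIZE_CHECK(u32_is_32_bits, sizeof(u32) * CHAR_BIT == 32);
```

Every project ended up maintaining its own variant of this header, which is the duplication that <stdint.h> later removed.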

C99 and <stdint.h>: A Turning Point

C99 introduced fixed-width integer types, offering a standardised approach across systems:

Integer Width   Signed Type   Unsigned Type
8 bits          int8_t        uint8_t
16 bits         int16_t       uint16_t
32 bits         int32_t       uint32_t
64 bits         int64_t       uint64_t

Modern coding standards like MISRA-C:2012 now require these types from <stdint.h> to be used instead of traditional integral types.

Tuning for Size and Performance: A Simple Case Study

Consider the function to square a number (ignoring macro definitions):

unsigned square(unsigned num) {
  return num * num;
}

This implementation poses portability challenges across different architectures. For instance:

  • On 8- or 16-bit systems, sizeof(unsigned) is typically ‘2’, limiting the parameter range to 0…65535.
  • On 32- or 64-bit systems, sizeof(unsigned) is likely ‘4’, extending the range to 0…4294967295.
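
To make the consequence concrete, here is a small sketch that works out the largest argument whose square still fits in an unsigned on whatever target it is compiled for. The helper name max_square_arg is my own invention for illustration; the square function is the one from above.

```c
#include <limits.h>

unsigned square(unsigned num) {
    return num * num;   /* wraps modulo UINT_MAX + 1 when the result is too big */
}

/* Hypothetical helper: largest n for which n * n does not wrap.
   Uses division so the search itself can never overflow. */
unsigned max_square_arg(void) {
    unsigned n = 1;
    while (n + 1 <= UINT_MAX / (n + 1)) {
        n++;
    }
    return n;
}
```

With a 16-bit unsigned this returns 255, and with a 32-bit unsigned it returns 65535, so the usable argument range is far narrower than the range of the type itself.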

One could argue that for maximum portability, we should use:

unsigned long long square(unsigned long long num) {
  return num * num;
}

This gives a much greater range of parameter values before the result wraps.

However, for a 16-bit TI MSP430, the assembly generated (msp430-gcc -O3) for unsigned long long is considerably more complex than for unsigned int:

Assembly for unsigned int:

square:
  MOV.W  R12, R13
  CALL  #__mspabi_mpyi
  RET

Assembly for unsigned long long:

square:
  PUSHM.W #3, R10
  MOV.W  R12, R8
  MOV.W  R13, R9
  MOV.W  R14, R10
  MOV.W  R15, R11
  CALL  #__mspabi_mpyll
  POPM.W #3, R10
  RET

The unsigned int version results in reduced code size and better performance. This leads to a preference for uint16_t if we know the argument never exceeds UINT16_MAX (65535) and we accept that the result wraps modulo 65536 for arguments above 255:

uint16_t square(uint16_t num) {
  return num * num;
}
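
One detail worth a hedged note: on a target where int is 32 bits, both uint16_t operands are promoted to signed int before the multiplication, so 65535 * 65535 overflows int, which is undefined behaviour. The variant below is a minimal sketch that forces the arithmetic to be unsigned on every target; the name square_u16 is an assumption for illustration, and I have not compared the code it generates against the listings in this article.

```c
#include <stdint.h>

/* Hypothetical variant of square() for uint16_t. */
uint16_t square_u16(uint16_t num) {
    /* Casting one operand to unsigned makes the multiplication unsigned
       whatever the width of int, so it wraps instead of overflowing. */
    unsigned product = (unsigned)num * num;

    /* Conversion to uint16_t reduces the result modulo 65536. */
    return (uint16_t)product;
}
```

With this version square_u16(255) is 65025, square_u16(256) wraps to 0 and square_u16(65535) wraps to 1.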

The Surprising Efficiency of Larger Types

Optimising for specific ranges can be counterintuitive. For instance, using uint8_t for squaring numbers 0-100 seems optimal but can be less efficient on non-8-bit processors. The assembly code for MSP430 illustrates this:

Assembly for uint8_t:

square:
  MOV.B  R12, R13
  MOV.W  R13, R12
  CALL  #__mspabi_mpyi
  RET

Here, uint16_t would be the better choice, emphasising that the most efficient integer type often matches the width of int.

Portability Challenges in Architecture Transitions

Moving from a 16-bit system to a 32-bit Cortex-M0 can unexpectedly increase code size. The optimised uint16_t code now generates less efficient assembly:

Assembly for uint16_t on Cortex-M0:

square:
  mul   r0, r0, r0
  uxth  r0, r0
  bx   lr

Whereas:
Assembly for uint32_t on Cortex-M0:

square:
  mul   r0, r0, r0
  bx   lr

This one extra opcode (uxth – Unsigned Extend Halfword) might seem insignificant. Still, over a whole codebase, it can waste a considerable amount of Flash memory and slow down processing speeds (remember, we are also working with slower clocks for power considerations).

And this leaves us with a bit of a headache. If we’d coded to unsigned, then the code would have ported efficiently, but as we work to modern coding standards, we are forced to use a fixed-size type from <stdint.h>.

The Role of Fast and Least Integers in <stdint.h>

There are five categories of integer types defined within <stdint.h>, but for this discussion, we are limiting ourselves to three:

  • integer types having specific exact widths, e.g. uint16_t / int16_t
  • integer types having at least a specified width, e.g. uint_least16_t / int_least16_t
  • fastest integer types having at least a specified width, e.g. uint_fast16_t / int_fast16_t
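
A quick way to see what these three categories mean on a given toolchain is to print their storage widths. The sketch below assumes a hosted environment with printf available; BITS and report_widths are names I have made up for the example.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Storage width of a type in bits (includes any padding bits). */
#define BITS(T) ((unsigned)(sizeof(T) * CHAR_BIT))

/* Hypothetical helper: print the widths of the three 16-bit flavours. */
void report_widths(void) {
    printf("uint16_t:       %u bits\n", BITS(uint16_t));
    printf("uint_least16_t: %u bits\n", BITS(uint_least16_t));
    printf("uint_fast16_t:  %u bits\n", BITS(uint_fast16_t));
}
```

Only the first line is fixed by the standard; the other two may report 16 or anything larger, depending on what the implementation considers smallest or fastest.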

It has been over 20 years since the introduction of <stdint.h> and the aliased integer types. But in all that time I cannot recall working with any projects (C or C++) that use either the least or fast integer types (please let me know if you use them regularly).

Fixed, Least or Fast?

Returning to our headache, do we use conditionals to select between a 16-bit build (for current systems) and 32-bit builds for future systems, or do we end up with two separate codebases (neither is ideal)?
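
For reference, the conditional approach might look something like the sketch below. The alias square_arg_t is hypothetical, and keying the choice on UINT_MAX is just one of several ways a project might do it.

```c
#include <limits.h>
#include <stdint.h>

/* square_arg_t is a hypothetical project alias, not part of <stdint.h>. */
#if UINT_MAX == 0xFFFFu
typedef uint16_t square_arg_t;   /* int is 16 bits: keep the 16-bit type */
#else
typedef uint32_t square_arg_t;   /* int is wider: use the 32-bit type */
#endif

square_arg_t square(square_arg_t num) {
    return num * num;
}
```

It works, but every such alias is another piece of build configuration to document, review and keep in step with each new target.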

This is where the other integer types become helpful, specifically, uint_fastN_t.

Across the compilers we’ve had experience with, it appears that:

uintN_t == uint_leastN_t

We have not seen a case where the code generated for a least integer differs from using the fixed type across 8-, 16-, 32- and 64-bit architectures. Note that we’re focused on the deeply embedded space, so there may be cases where this is untrue.
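
As I read the C99 wording, this is what we should expect: a least type must be the smallest type with at least the requested width, so wherever uintN_t exists the two should have the same size. If a project wants that assumption checked rather than trusted, a build-time guard along these lines would do it; it is a sketch that needs C11 for _Static_assert.

```c
#include <stdint.h>

/* C11 build-time checks: compilation stops if a least type is not the
   same size as the exact-width type of the same nominal width. */
_Static_assert(sizeof(uint_least8_t)  == sizeof(uint8_t),
               "uint_least8_t differs from uint8_t");
_Static_assert(sizeof(uint_least16_t) == sizeof(uint16_t),
               "uint_least16_t differs from uint16_t");
_Static_assert(sizeof(uint_least32_t) == sizeof(uint32_t),
               "uint_least32_t differs from uint32_t");
```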

First, if we recode to:

uint_fast8_t square(uint_fast8_t num) {
  return num * num;
}

On an 8-bit processor (e.g. AVR ATtiny13A), the generated code uses an unsigned char as the type of the parameter (AVR GCC -O3 -mmcu=attiny13a):

u8squaref:
  mov r22,r24
  rcall __mulqi3

For the 16-bit MSP430 code, we get the more efficient (16-bit integer parameter) code generated:

square:
  ; start of prologue
  ; end of prologue
  MOV.W  R12, R13
  CALL  #__mspabi_mpyi
  ; start of epilogue
  RET

Importantly, if we then compile this code for the Cortex-M0, we now also get 32-bit efficient code (ARM GCC 12.3.1 (none) -O3):

square:
  muls  r0, r0
  bx   lr

So, given a parameter range 0…100, using uint_fast8_t generates the smallest, fastest code for each architecture.
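
One caveat on the return type: where uint_fast8_t is 8 bits wide, as in the AVR case above, the returned square only fits for arguments 0…15, and square(100) would come back as 16 (10000 modulo 256). If the full result is needed, a sketch like the following widens only the return type. The name square_fast is an assumption, and I have not measured the code this generates on the three targets, so treat it as a starting point for your own comparison.

```c
#include <stdint.h>

/* Hypothetical variant: narrow fast type in, wider fast type out. */
uint_fast16_t square_fast(uint_fast8_t num) {
    /* Converting one operand first means the multiplication is carried
       out in at least 16 unsigned bits, so 255 * 255 = 65025 fits. */
    return (uint_fast16_t)num * num;
}
```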

C++ and Fixed-Size Integer Types

For some bizarre reason, Modern C++ has chosen to make compiler support for the fixed-size types (e.g. std::uint32_t) optional! But fast and least are mandatory (exploding head emoji) – that said, I do not know of any compiler that doesn’t support the fixed-size type alias.
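
If you ever do need to guard against a missing exact-width type, the limit macros give a way to detect it, since UINT32_MAX is only defined when uint32_t is. The sketch below is written in C to match the rest of the article (the same macros are available through <cstdint> in C++); the names u32_exact_or_least and HAVE_EXACT_U32 are hypothetical.

```c
#include <stdint.h>

/* Fall back to the mandatory least type when the optional exact-width
   type is not provided. Both names below are illustrative only. */
#if defined(UINT32_MAX)
typedef uint32_t u32_exact_or_least;
#define HAVE_EXACT_U32 1
#else
typedef uint_least32_t u32_exact_or_least;
#define HAVE_EXACT_U32 0
#endif
```

Code that relies on wrap-around at exactly 32 bits would still need to check HAVE_EXACT_U32 and mask explicitly in the fallback case.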

Summary and Reflections

Embedded programming involves intricate decisions around integer types, balancing size, performance, and portability. Before C99, programmers relied on specific-length typedefs to ensure size consistency. C99’s <stdint.h> introduced fixed-width integer types, simplifying this process. However, choosing the right integer type – whether fixed, least, or fast – depends on the specific requirements of the target architecture and the application.

For instance, while uint16_t might be optimal for 16-bit systems, transitioning to a 32-bit system like the Cortex-M0 could necessitate a shift to uint32_t for efficiency. Additionally, in scenarios where parameter values are within a specific range, the uint_fastN_t types offer an adaptable solution to generate efficient code across various architectures.

In conclusion, the journey from fixed-size integers to exploring fast and least types in C and C++ highlights the evolving landscape of embedded programming. Understanding these nuances is vital to writing optimised, portable, and maintainable code in this domain.

As a summary, based on GCC, these are the expected sizes of the basic types and the underlying types behind the <stdint.h> aliases:

MCU            ATtiny13A       MSP430          Cortex-M0
Size           8-bit           16-bit          32-bit
sizeof(char)   1               1               1
sizeof(short)  2               2               2
sizeof(int)    2               2               4
sizeof(long)   4               4               4
uint8_t        unsigned char   unsigned char   unsigned char
uint_fast8_t   unsigned char   unsigned int    unsigned int
uint16_t       unsigned int    unsigned short  unsigned short
uint_fast16_t  unsigned int    unsigned int    unsigned int
uint32_t       unsigned long   unsigned long   unsigned int
uint_fast32_t  unsigned long   unsigned long   unsigned int
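
If you want to reproduce the alias rows of this table for your own toolchain, C11's _Generic can report which basic type sits behind each alias. This is a sketch assuming a C11 compiler and a working printf; TYPE_NAME and print_underlying_types are names I have invented for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Map an expression's type to the name of the basic unsigned type. */
#define TYPE_NAME(x) _Generic((x),            \
    unsigned char:      "unsigned char",      \
    unsigned short:     "unsigned short",     \
    unsigned int:       "unsigned int",       \
    unsigned long:      "unsigned long",      \
    unsigned long long: "unsigned long long", \
    default:            "other")

/* Hypothetical helper: print one line per alias for the current target. */
void print_underlying_types(void) {
    printf("uint8_t        %s\n", TYPE_NAME((uint8_t)0));
    printf("uint_fast8_t   %s\n", TYPE_NAME((uint_fast8_t)0));
    printf("uint16_t       %s\n", TYPE_NAME((uint16_t)0));
    printf("uint_fast16_t  %s\n", TYPE_NAME((uint_fast16_t)0));
    printf("uint32_t       %s\n", TYPE_NAME((uint32_t)0));
    printf("uint_fast32_t  %s\n", TYPE_NAME((uint_fast32_t)0));
}
```

The output depends on the compiler and its C library as well as the MCU, so expect some rows to differ from the table above on other toolchains.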

 

Niall Cooling

Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.