Working with Strings in Embedded C++

In this post, by Embedded I’m generally referring to deeply embedded/bare-metal systems as opposed to Linux-based embedded systems.

Embedded systems and strings

Historically, the need for and thus the use of strings in embedded systems was fairly limited. However, this has changed with the advent of cheaper, full graphic displays and the growth of the ‘Internet of Things’ (IoT).

Many embedded systems sport full-colour graphics displays, supported by embedded-specific graphics libraries, including:

  • free open-source – e.g. LVGL
  • vendor-specific – e.g. TouchGFX from STMicroelectronics
  • fully specialised graphics environments – e.g. Qt for MCUs.

Naturally, these environments will use strings extensively for labels, message boxes, alerts, etc.

Many of the major IoT frameworks utilise web services built on top of HTTP, such as REST. In conjunction with the web services, embedded applications utilise data interchange formats such as XML XCAP or JSON. Both XML and JSON require character encoding based on ISO/IEC 10646 such as UTF-8.

Character literals

Modern C++ extends the character literal model to support ISO/IEC 10646 character encoding. C++11 (and C11) added UTF-16 and UTF-32 support. UTF-8 character literals (u8'a') arrived in C++17, with C++20 finally adding a distinct UTF-8 character type, char8_t.

int main()
{
  char     c1{ 'a' };       // 'narrow' char
  char8_t  c2{ u8'a' };     // UTF-8  - (C++20)
  char16_t c3{ u'貓' };     // UTF-16 - (C11/C++11)
  char32_t c4{ U'🍌' };     // UTF-32 - (C11/C++11)
  wchar_t  c5{ L'β' };      // wide char - wchar_t
}
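
To make the difference between the encodings concrete, the following sketch counts how many code units each prefix produces for the same character. The helper name code_units is hypothetical (it is not a standard facility), and universal-character-names are used so the result does not depend on the encoding of the source file.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N])
{
    return N - 1;
}

// U+03B2 (Greek beta) needs two UTF-8 code units but only one in UTF-16/UTF-32.
static_assert(code_units(u8"\u03B2") == 2);
static_assert(code_units(u"\u03B2")  == 1);
static_assert(code_units(U"\u03B2")  == 1);

// U+1F34C lies outside the Basic Multilingual Plane,
// so UTF-16 has to use a surrogate pair.
static_assert(code_units(u8"\U0001F34C") == 4);
static_assert(code_units(u"\U0001F34C")  == 2);
static_assert(code_units(U"\U0001F34C")  == 1);
```

The point to take away is that the number of elements in a UTF-8 or UTF-16 string is not, in general, the number of characters a user would see on a display.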

C Strings

Null-Terminated Byte Strings (NTBS)

A ‘C-Style’ string is a null-terminated byte string (NTBS): a sequence of nonzero bytes followed by a byte with the value zero (the terminating null character). The terminating null character is written as the character literal '\0'.

The length of an NTBS is the number of elements that precede the terminating null character. An empty NTBS has a length of zero.

A string literal (constant) is a sequence of characters surrounded by double quotes (" ").

#include <cstring>
#include <iostream>

int main()
{
  char message[] = "Hello World";

  std::cout << sizeof(message) << '\n';       // 12: includes the terminating null
  std::cout << std::strlen(message) << '\n';  // 11: excludes the terminating null
}

In C and C++, single quotes (') identify character literals. Unlike in some other programming languages, they cannot be used to delimit strings.

C-Strings and string literals

What is the difference in the memory model between the following two program object definitions?

#include <iostream>

int main()
{
   char message[] = "this is a string";
   std::cout << sizeof(message) << '\n';

   const char *msg_ptr = "this is a string";
   std::cout << sizeof(msg_ptr) << '\n';
}

The first output will be 17, the number of characters in the string including the terminating null character. The second output will be the size of a pointer (e.g. 4 on 32-bit Armv7-M).

Although the two definitions look very similar, there are significant semantic differences:

  • For message, the memory for the array is allocated on the stack, and the array is initialised from the string literal. At runtime, the program copies the string literal into the array (exactly how depends on compiler optimisations and the ISA).
  • For msg_ptr, only the address of the string literal is held on the stack; the string literal itself is not copied.

String literals are stored in your program image, usually in a read-only section (.rodata), normally mapped to NVM such as Flash.

As the memory for message is Stack-based, at runtime, it is allowable to manipulate the contents of message[] using indexing, e.g.
message[0] = 'T';

Unlike C, C++ does not allow the dangerous code of having a non-const pointer pointing at constant memory, e.g.:
char *msg_ptr = "this is a string";
as this would allow the statement:
msg_ptr[0] = 'T';
which on most systems will cause a program failure by trying to write to read-only memory.

Finding strings

Most modern C/C++ toolchains (e.g. GNU Arm Embedded Toolchain) supply a collection of useful utilities, usually referred to as Binutils.

The strings utility lists printable strings from a file. Running strings on an application image will list all embedded strings. This is a common tool for bad actors to look for embedded clear-text passwords, etc., in firmware (e.g. Intro To Hardware Hacking – Dumping Your First Firmware).

$ arm-none-eabi-strings -d Application.elf | grep "this is"
this is a string

Strings can also be run against an individual object file, e.g.

$ arm-none-eabi-strings -d -n 12 main.o
this is a string

It can be enlightening to run strings against your own application image.

Another utility, objdump, can help us understand the actual memory image. Running objdump against the object file can identify whether the code uses read-only data, e.g.

$ arm-none-eabi-objdump -h main.o | grep rodata
 77 .rodata.main.str1.4 00000011  00000000  00000000  000003d0  2**2

Without getting into the details of objdump, this output tells us that the main object file has read-only data of size 0x11 (17 bytes). The section name is compiler generated.

At the link stage, we generate a .map file (if you are not already generating one, you should, as it contains a wealth of target information). Searching the map file for the name .rodata.main.str1.4 reported by objdump, we can see it is located at the address 0x08023b70. On our Cortex-M4 target, this is on-chip Flash.

.rodata.main.str1.4
                0x0000000008023b70       0x11 main.o

C String Standard Library

Standard C supplies a library to help with common string manipulation issues (<string.h> in C or <cstring> in C++). It has several ‘helper’ functions to manipulate NTBS, e.g.

copying strings         strcpy, strncpy
concatenating strings   strcat, strncat
comparing strings       strcmp, strncmp
parsing strings         strtok, strcspn
length                  strlen

Note that all these functions rely on the supplied pointer pointing at a well-formed NTBS. The behaviour is undefined if it is not a pointer to a null-terminated byte string.

Safety and Security Issues

Unfortunately, using strings in embedded systems potentially introduces several safety and security weaknesses. SEI CERT C has some rules regarding the misuse of strings, and the Common Weakness Enumeration website covers many potential types of string-related weaknesses. Many of these arise from aspects such as incorrect string termination, memory buffer errors and incorrect data validation (string parsing).

C++ Strings

C++11 Raw String Literals

C++11 introduced the raw string literal. In a raw string literal, escape sequences are not processed, so the resulting string is exactly what appears in the source code. Raw string literals are useful for storing file paths or regular expressions, which use characters that C++ would otherwise interpret as escape sequences. They can be combined with the encoding prefixes (e.g. u8R"(...)") as needed.

For example, given the following code:

#include <string>
#include <iostream>

const char* raw_str  {   // could use constexpr auto raw_str   
R"(<!DOCTYPE html>
<html>
<body>
    <p>hello, world!</p>
</body>
</html>
)" 
};

int main()
{
  std::cout << raw_str;
}

The standard output is:

<!DOCTYPE html>
<html>
<body>
    <p>hello, world!</p>
</body>
</html>

The program image will contain the full string (the exact directives depend on the toolchain and ISA), e.g. Intel:

.string "<!DOCTYPE html>\n<html>\n<body>\n    <p>hello, world!</p>\n</body>\n</html>\n"

Arm:

.LC0:
.ascii "<!DOCTYPE html>\012<html>\012<body>\012    <p>hello"
.ascii ", world!</p>\012</body>\012</html>\012\000"
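
The file path use case mentioned earlier can be sketched in a few lines. The names and the path below are purely illustrative; the second definition uses a custom delimiter (json), which is needed whenever the text itself contains the sequence )".

```cpp
#include <string_view>

// The same Windows-style path, with and without escape processing.
constexpr std::string_view escaped_path{ "C:\\logs\\boot.txt" };
constexpr std::string_view raw_path{ R"(C:\logs\boot.txt)" };
static_assert(escaped_path == raw_path);

// A custom delimiter (here "json") lets the text itself contain )" sequences.
constexpr std::string_view json_text{ R"json({"msg": "(ok)"})json" };
```

Both forms produce identical bytes in the program image; the raw form is simply easier to read and to keep in step with the original text.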

C++ Strings Library

The C++ Standard Library provides the header <string>. The basic std::string can be thought of as a variable-sized character array (a vector of characters). std::string supports operator overloads for the most common string behaviour (converting from string literals, concatenation, etc.), making string handling more straightforward than with NTBSs. std::string is also supported by <iostream> (e.g. std::cout / std::cin), giving one significant benefit over C: an input string will grow to the size of the input (i.e. there is no need to pre-allocate a local array of chars).

#include <iostream>
#include <string>

const std::string salutation { "Hello: " };         // initialise

int main()
{
  std::cout << "Enter your name: ";

  std::string name{};
  std::cin >> name;                                 // reads to whitespace
  // std::getline(std::cin, name);                  // reads to '\n'

  std::string greeting =  salutation + name + '\n'; // concatenation

  std::cout << greeting;

  if(not salutation.empty()) {
    std::cout << salutation.length() << '\n';
    std::cout << salutation[0] << ' ' << salutation.front() << '\n';
    std::cout << salutation[salutation.length()-1] << ' ' << salutation.back() << '\n';
  }
}

Numeric to std::to_string

C++11 introduced the overloaded function std::to_string. This, unsurprisingly, converts a numeric value to std::string. E.g.

#include <cstdio>
#include <iostream>
#include <string>

int main()
{
    int i = 42;
    unsigned long ul = 42UL;
    auto i_str = std::to_string(i);
    auto ul_str = std::to_string(ul);
    std::cout << "std::cout: " << i << '\n'
              << "to_string: " << i_str << '\n'
              << "std::cout: " << ul << '\n'
              << "to_string: " << ul_str << '\n';
    // %lu (not %li) is the matching conversion for unsigned long
    std::printf("%i\n%s\n%lu\n%s\n", i, i_str.c_str(), ul, ul_str.c_str());

    double f = 23.43;
    double f2 = 1e-9;
    auto f_str = std::to_string(f);
    auto f_str2 = std::to_string(f2); // Note: returns "0.000000"

    std::cout << "std::cout: " << f << '\n'
              << "to_string: " << f_str << '\n'
              << "std::cout: " << f2 << '\n'
              << "to_string: " << f_str2 << '\n';
    std::printf("%g\n%s\n%g\n%s\n", f, f_str.c_str(), f2, f_str2.c_str());
}

Expected output

std::cout: 42
to_string: 42
std::cout: 42
to_string: 42
42
42
42
42
std::cout: 23.43
to_string: 23.430000
std::cout: 1e-09
to_string: 0.000000
23.43
23.430000
1e-09
0.000000

std::string and NTBS

std::string cannot be used directly where a const char* is required. Many embedded libraries favour C-based APIs, requiring support for converting a std::string to const char*.
std::string has a member function .c_str(), which returns a pointer to the underlying C-style string (const char*). In addition, std::string, as with most container types, can get at the underlying raw data via a member function, .data(), or the standard library function std::data().

#include <iostream>
#include <string>
#include <cstdio>

int main()
{
  const char *str = "Hello World!";

  std::string s { str };

  std::cout << "Using cout: " << s << '\n';

  std::printf("C-style string: %s \n", str);
  std::printf("std::string as C-style string: %s \n", s.c_str() );
  std::printf("std::string as raw data: %s \n", s.data() );
}

For both member functions, the pointer returned is such that the range [data(), data() + size()] is valid and the values in it correspond to the values stored in the string. Therefore data() + i == std::addressof(operator[](i)) is guaranteed for every i in [0, size()].

Note, there was a minor change to .data() in C++17. Prior to C++17, .data(), like .c_str(), only returned const char*. Since C++17, calling .data() on a non-const string returns char*; the const overload still returns const char*.

When writing library code, it is recommended never to let C++ strings propagate outside your component. Different compilers (and their standard libraries) have different models for strings; for portability, prefer const char* parameters.

typedefs for string types

As the character literal model was extended to support ISO 10646 character encoding, the standard library has string support for each character type, i.e.

Type                    Definition
std::string             std::basic_string<char>
std::wstring            std::basic_string<wchar_t>
std::u8string (C++20)   std::basic_string<char8_t>
std::u16string (C++11)  std::basic_string<char16_t>
std::u32string (C++11)  std::basic_string<char32_t>

These are all based on a common underlying template class of:

template<class CharT, 
         class Traits = std::char_traits<CharT>, 
         class Allocator = std::allocator<CharT>
> 
class basic_string;

String memory management

Strings are, by default, heap-allocated

By default, std::string uses std::allocator, which, in turn, uses ::new/::delete to allocate dynamic memory for storing the actual NTBS.

#include <string>

int main() {
    const char* s1 = "literal string";  // characters stored in .rodata
    std::string s2 = "Literal String";  // characters stored in .heap 
}

std::strings have three main parts:

  • data -> stored on the heap; accessed via .data()
  • length -> based on strlen; accessed via .length() or .size()
  • capacity -> allocated size on the heap ( >= length); accessed via .capacity()

Capacity can be extended using .reserve() or reduced using .shrink_to_fit() (the latter is a non-binding request, so the implementation is free to ignore it).

Note, looking at the definition of basic_string:

template<class CharT, ..., class Allocator = std::allocator<CharT> > class basic_string;

you may notice you can replace the use of ::new/::delete with a different allocator (more on this later).

Copying Strings

A string’s handle lifetime defines the lifetime of the underlying string, e.g.

#include <string>
#include <iostream>

std::string s1 { "initial contents" };

int main() {
    std::cout << s1 << '\n';
    {
        auto s2{ s1 };        // copy-construction
        s2[0] ='I';           // unchecked assignment
        std::cout << s2 << '\n';
    }                         // s2 heap memory deleted
    std::cout << s1 << '\n';
    {
        std::string s3{};    // empty string
        s3 = s1;             // copy-assignment
        s3.at(0) = 'I';      // bound-safe assignment
        std::cout << s3 << '\n';
    }                        // s3 heap memory deleted
    std::cout << s1 << '\n';
}

Expected Output

initial contents
Initial contents
initial contents
Initial contents
initial contents

For s1, the lifetime is static, but the memory for the NTBS is still dynamically allocated (i.e. ::new is called before main).
For s2 and s3, ::new is called when the copy takes place, and ::delete is called when reaching the end of enclosing blocks.

Example Code

Deep copy of std::string

std::string implements “deep-copy” semantics: when a copy is created, new memory is allocated and the contents are copied (as with strcpy).

#include <string>

int main() {
    std::string s1 = "literal string";
    auto s2 { s1 };  // ::new called and characters copied
}

Moving std::string

Move semantics were added to Modern C++ largely to reduce the deep-copying of complex objects, such as strings.

#include <string>
#include <utility>   // std::move

int main() {
    std::string s1 = "literal string";
    auto s2 = std::move(s1);   // only the address of the .heap memory is copied
}

When s1 is initially constructed, its handle lives on the stack and holds a pointer to the heap-based NTBS, along with the length and capacity.

As before, the memory for the new string handle, s2, is allocated on the stack. When a string is moved, rather than copied, only the pointer to the heap-based NTBS is copied to the new object (s2); the NTBS itself is not copied.

In current implementations, the length of the moved-from string (s1) becomes zero (0). Note, however, that they don’t necessarily reset the moved-from pointer to nullptr (0), as would be safe. The standard only specifies that the moved-from string “… is left in a valid state with an unspecified value.”
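
Because the moved-from value is unspecified, code that wants to reuse the source string should put it into a known state first. A minimal sketch of that idea follows; take_contents is a hypothetical helper name, not a library function.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: moves the contents out of 'from' and then puts 'from'
// into a known (empty) state, rather than relying on what the move left behind.
std::string take_contents(std::string& from)
{
    std::string to{ std::move(from) };  // typically just copies the handle's pointer
    from.clear();                       // valid on a moved-from string; now definitely empty
    return to;
}
```

After the call, the source string can safely be assigned to or appended to again, whatever the particular library implementation did during the move.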

Short String Optimisation (SSO)

Many modern compilers (e.g. GCC and clang) support an implementation-specific optimisation generally referred to as Short-String Optimisation (SSO). As we’ve seen, there is the handle part for every string. This stores the three essential data items:

  • data
  • size
  • capacity

For small strings, this overhead, together with the cost of a heap allocation, will typically outweigh the benefits of using std::string, from both a performance and a memory perspective.

So, to significantly improve performance, when the string has fewer characters than an implementation-defined threshold, the implementation stores the characters within the stack space allocated for the handle, rather than allocating memory from the heap.

Constructing one short and one long string while tracing calls to ::new shows that dynamic memory allocation only occurs for the larger of the two strings.
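
One way to observe SSO for yourself is sketched below, assuming a hosted toolchain where replacing the global allocation functions is acceptable. The counter and the helper allocations_for are hypothetical names introduced for illustration; the results depend on the standard library's SSO threshold and, potentially, on optimisation settings.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Illustration only: count every call to the global operator new.
static std::size_t new_calls = 0;

void* operator new(std::size_t size)
{
    ++new_calls;
    if (void* p = std::malloc(size != 0 ? size : 1)) {
        return p;
    }
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical helper: how many times ::new was called while constructing
// a std::string from 'text'.
std::size_t allocations_for(const char* text)
{
    const std::size_t before = new_calls;
    std::string s{ text };
    volatile char first = s.empty() ? '\0' : s[0];  // touch the string so it is used
    (void)first;
    return new_calls - before;
}
```

With the GCC and Clang standard libraries, a call such as allocations_for("short") would be expected to report no allocations, whereas a string of several dozen characters would be expected to report at least one.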

SSO – GCC

Different compilers have their own specific implementation of SSO. For example, compiled for GCC/Linux on an x86-64 architecture, the stack-based part of the string is 32-bytes (where the stack is 8-bytes wide). When using SSO, the upper 16-bytes are used to store the NTBS (thus placing a threshold of 16-bytes, including the terminating null character, for SSO).

Implementation based on GCC v10.2.

On a 32-bit platform, where the stack is 4-bytes wide, GCC uses a 24-bytes structure to implement SSO (sizeof(std::string) == 24). Again, the upper 16-bytes can store characters when utilising SSO in this model.

Based on: GCC version 9.2.1 (GNU Tools for Arm Embedded Processors 9-2019-q4-major).

Clang (on a 64-bit platform) uses a different implementation for SSO. When SSO is employed, the first byte stores the calculated string size. The last double-word always stores the value 33. This allows up to 22 characters to be held on the stack before the heap usage is required.

String APIs

An NTBS can be passed as an argument for the function parameter of type const std::string& (const lvalue reference). However, as part of the call, a temporary std::string object is created to bind to the parameter, and ::new is called to generate the rvalue expression std::string{s1}.

#include <string>

void c_str(const char* str) {           // pass-by-pointer-to-const
  // ...
}

void cpp_str(const std::string& str) {  // pass-by-const-lvalue-ref
  // ...
}

int main() {
  const char* s1 { "godbolt compiler explorer" };   // NTBS
  std::string s2 { "godbolt compiler explorer" };   // Heap-based C++ String
  c_str(s1);
  // c_str(s2);             // FAILs to compile: no conversion to const char*
  c_str(s2.c_str());

  cpp_str(s1);              // cpp_str(std::string{s1});
  cpp_str(s2);
}

As we have both NTBS and C++ Strings, writing portable code to handle both can be challenging. For example, a C++ string cannot be passed to const char * (the .c_str() member function must be used).

C++17 std::string_view

C++17 introduced significant library support for string management in the form of std::string_view. std::string_view describes an object that can refer to a constant contiguous sequence of char-like objects, with the first element of the sequence at position zero.

This means std::string_view can handle both NTBS and std::string, e.g.

#include <string>
#include <string_view>

void cpp_sv(std::string_view str) {
  // ...
}

int main()
{
  const char* s1 { "godbolt compiler explorer" }; // NTBS
  std::string s2 { "godbolt compiler explorer" }; // Heap-based C++ String

  cpp_sv(s1);
  cpp_sv(s2);
}

A typical implementation holds only two members: a pointer to constant character type (CharT) and a size.

The header <string_view> defines several typedefs for common character types, e.g.

Type Definition

std::string_view            std::basic_string_view<char>
std::wstring_view           std::basic_string_view<wchar_t>
std::u8string_view (C++20)  std::basic_string_view<char8_t>
std::u16string_view         std::basic_string_view<char16_t>
std::u32string_view         std::basic_string_view<char32_t>

Constant pointer + size

The common implementation of std::string_view is a class that holds a pointer to constant characters and a size, e.g.

template<class _CharT, class _Traits = char_traits<_CharT> >
class basic_string_view {
public:
    typedef _CharT value_type;
    typedef size_t size_type;
    ...
private:
    const value_type* __data;
    size_type __size;
};

Note that __data is a pointer to const: a string_view cannot modify the characters it refers to.

So the effective calling code is:

int main()
{
  const char* s1 { "godbolt compiler explorer" };
  std::string s2 { "godbolt compiler explorer" };

  cpp_sv(s1); // string_view(s1, strlen(s1))
  cpp_sv(s2); // string_view(s2.data(), s2.length())
}

std::string_view as a NTBS replacement

With the introduction of std::string_view in C++17, we now have a safer, lightweight alternative to using C-style NTBS e.g.

#include <iostream>
#include <string_view>

int main()
{
    const char*      s1 { "compiler explorer" };   // NTBS in .rodata
    std::string_view s2 { "Compiler Explorer" };   // also stored in .rodata

    std::cout << sizeof(s1) << '\n';               // 4/8 - 32/64bit
    std::cout << sizeof(s2) << '\n';               // 8/16

    std::cout << s2.length() << '\n';           // could also use .size()
    if(s2.empty()) std::cout << "empty string\n";
    if(s2.compare(s1) == 0) std::cout << "strings are equal\n";

    for(auto c : s2) {
        std::cout << c << '\n';
    }

    std::string_view s3 { "ALL CAPITALS" };    
    std::swap(s2, s3);                          // also s2.swap(s3)
}

The sizeof(std::string_view) is typically sizeof(void*) + sizeof(std::size_t), where std::size_t is an unsigned integer type (commonly unsigned int on 32-bit Arm and unsigned long on 64-bit Linux).

It is helpful that we can quickly wrap NTBSs in string_view objects to allow processing on them using standard library elements, e.g.
if (argc > 1) { std::string_view run_count_str{ argv[1] }; /* parse run_count_str here */ }
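
Once an NTBS is wrapped in a string_view, it can be handed to library facilities that work on character ranges. As a sketch, the C++17 function std::from_chars can convert such a view to an integer without allocating or throwing. The helper name parse_count is hypothetical, and rejecting trailing characters is a design choice made here, not something from_chars requires.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: converts the whole of 'text' to an int,
// or returns std::nullopt if any part of it is not a valid number.
std::optional<int> parse_count(std::string_view text)
{
    int value{};
    const char* first = text.data();
    const char* last  = text.data() + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;            // not a number, out of range, or trailing junk
    }
    return value;
}
```

In a main(int argc, char** argv), this could be used as parse_count(argv[1]) after checking that argc > 1.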

Common APIs with string-like syntax

The primary benefit of using std::string_view is that many string parsing functions are available through a common interface, e.g.

#include <iostream>
#include <string>
#include <string_view>

void cpp_sv(std::string_view str) {
   std::cout << str.length() << ' ';
   std::cout << reinterpret_cast<const void*>(str.data()) << '\n';
   if(str == "godbolt compiler explorer"){
      std::cout << "strings equal\n";
   }
   if(not str.empty()){
    constexpr std::string_view compiler{"compiler"};
    std::cout << str.substr(str.find(compiler),compiler.length()) << '\n';
   }
}  

int main() {
  const char* s1 { "godbolt compiler explorer" };   // NTBS
  std::string s2 { "godbolt compiler explorer" };   // C++ Heap string
  cpp_sv(s1);   // string_view(s1, strlen(s1))
  cpp_sv(s2);   // string_view(s2.data(), s2.length())
}

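Expected Output (the [allocating ...] line comes from a tracing replacement of ::new that is not shown in the listing above; the addresses will vary)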

[allocating 26 bytes]
25 0x40201a
strings equal
compiler
25 0x1cd3ec0
strings equal
compiler

Most std::string_view functions work identically to std::string but without the potential of generating temporary (rvalue expression) strings (e.g. == ).
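
As a further sketch of allocation-free parsing, find and substr can be combined to split a key=value setting into two views of the original characters. The helper name split_once and the example text are assumptions made for illustration.

```cpp
#include <string_view>
#include <utility>

// Hypothetical helper: splits 'text' at the first occurrence of 'sep'.
// Both halves are views into the original characters; nothing is copied.
// If 'sep' is not found, the second view is empty.
std::pair<std::string_view, std::string_view>
split_once(std::string_view text, char sep)
{
    const auto pos = text.find(sep);
    if (pos == std::string_view::npos) {
        return { text, std::string_view{} };
    }
    return { text.substr(0, pos), text.substr(pos + 1) };
}
```

For example, split_once("baud=115200", '=') yields the views "baud" and "115200". As with any string_view, the results are only valid while the original characters are alive.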

auto type deduction

When using auto type deduction, a quoted string defaults to const char*. The standard library supplies literals for both std::string and std::string_view. Appending 's' to a quoted-string converts a character array literal to basic_string, whereas appending 'sv' to a quoted-string creates a string view of a character array literal.

As well as auto type deduction, this also applies to template deduction, e.g.

#include <string>
#include <string_view>
#include <typeinfo>
#include <cassert>

using namespace std::string_literals; // operator""s
using namespace std::literals;        // operator""sv

int main() {
    const char*      ntbs_1 { "godbolt compiler explorer" };    // NTBS
    std::string      str_1  { "godbolt compiler explorer" };    // C++ Heap string
    std::string_view sv_1   { "godbolt compiler explorer" };    // std::string_view

    auto ntbs_2 { "godbolt compiler explorer"   };              // const char *
    auto str_2  { "godbolt compiler explorer"s  };              // std::string
    auto sv_2   { "godbolt compiler explorer"sv };              // std::string_view

    assert(typeid(ntbs_1) == typeid(ntbs_2));
    assert(typeid(str_1)  == typeid(str_2));
    assert(typeid(sv_1)   == typeid(sv_2));
}

auto vs decltype

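One area of modern C++ that may still trip you up is the subtle difference between auto and decltype for NTBSs. As previously mentioned, with auto a string literal is deduced as const char*, whereas decltype, which inspects the declared type of an entity or the type of an expression, preserves the array type: a string literal yields a reference to a const character array of strlen+1 elements (const char(&)[6] for "Hello"). typeid ignores the reference and the const, so it reports char[6], e.g.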

#include <typeinfo>
#include <iostream>

int main()
{
    const char*         str { "hello" };
    auto                s1 { "Hello" };                    
    decltype( "Hello" ) s2 {};                
    auto                s3 { str };                        
    decltype( str )     s4 {};                    

    std::cout << typeid(str).name() << '\n';       // const char*
    std::cout << typeid(s1).name() << '\n';        // const char*
    std::cout << typeid(s2).name() << '\n';        // char[6]
    std::cout << typeid(s3).name() << '\n';        // const char*
    std::cout << typeid(s4).name() << '\n';        // const char*
    std::cout << typeid("hello").name() << '\n';   // char[6]
}
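
The same distinction shows up in template argument deduction, and it can be checked at compile time rather than by printing type names. The two function templates below are hypothetical helpers written for this sketch.

```cpp
#include <type_traits>

// By-value deduction (the rule auto uses): the array decays to a pointer.
template <typename T>
constexpr bool deduces_pointer(T)
{
    return std::is_same_v<T, const char*>;
}

// Forwarding-reference deduction keeps the array type, much as decltype does.
template <typename T>
constexpr bool deduces_array(T&&)
{
    return std::is_array_v<std::remove_reference_t<T>>;
}

static_assert(deduces_pointer("Hello"));
static_assert(deduces_array("Hello"));
static_assert(std::is_same_v<decltype("Hello"), const char (&)[6]>);
```

Using static_assert avoids relying on typeid(...).name(), whose output is implementation-defined and is a mangled name on some toolchains.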

std::string_view Caveats

So we’ve seen that std::string_view is an ideal candidate for possibly refactoring legacy code (where appropriate) by replacing both const char* and const std::string& parameters with std::string_view. However, as with most things in life, there are always a couple of gotchas.

There are two significant situations where std::string_view can fail:

  • lifetime management
  • non-null-terminated strings

string lifetime management

First, it is the programmer’s responsibility to ensure that std::string_view does not outlive the pointed-to character array. To get this wrong does take some effort, and in well-crafted code should never happen, but it is still worth being aware of, e.g.

#include <iostream>
#include <string>
#include <string_view>

using namespace std::string_literals; // operator""s
using namespace std::literals;        // operator""sv

int main()
{   
  std::string_view ntbs{ "a string literal" };      // OK: points to a static array   
  std::string_view heap_string{ "a temporary string"s }; // rvalue string using new
  // rvalue string memory deallocated...
  std::cout << "Address of heap_string: " << (void*)heap_string.data() << '\n'; 
  std::cout << "Data at heap_string: " << heap_string.data() << '\n';
}

In this example, the string used to construct heap_string is an rvalue expression (a temporary); this will call ::new during construction. However, a temporary’s lifetime lasts only until the end of the statement that creates it. Before we execute the following statement, ::delete will be called, as the temporary’s lifetime has ended.

Accessing an object after its lifetime has ended is undefined behaviour. On a host system (e.g. Linux), this error can be caught using one of the modern compiler sanitisers (e.g. ASan). On a target system, it is unlikely to be caught at runtime and could lead to difficult-to-find bugs.
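
One common way to avoid the problem is to make sure something owns the characters for at least as long as any view of them exists. The sketch below uses a hypothetical Label class that copies its text on construction; it uses std::string for brevity, so the earlier caveats about dynamic memory still apply.

```cpp
#include <string>
#include <string_view>

// Hypothetical class: owns its characters, and only hands out views that
// remain valid for as long as the Label object itself is alive.
class Label {
public:
    explicit Label(std::string_view text) : text_{ text } {}   // copies the characters
    std::string_view view() const noexcept { return text_; }   // view into owned storage
private:
    std::string text_;
};
```

A Label can be constructed from a temporary string safely, because the characters are copied before the temporary is destroyed; a std::string_view constructed from the same temporary would be left dangling.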

non null-terminated strings

Secondly, unlike string::data() and string literals, string_view::data() may return a pointer to a buffer that is not null-terminated (e.g. a substring). Therefore it is typically a mistake to pass data() to a function that takes just a const charT* and expects a null-terminated string. std::string_view is not guaranteed to be pointing at an NTBS, e.g.

#include <iostream>
#include <string>
#include <string_view>
#include <cstdio>

using namespace std::string_literals; // operator""s
using namespace std::literals;        // operator""sv

void sv_print(std::string_view str) {
  std::cout << str.length() << ' '<< reinterpret_cast<const void*>(str.data()) << '\n';
  std::cout << "cout: " << str << '\n';     // based on str.length()
  printf("stdout: %s\n",str.data());        // based on NUL
}

int main() {
    std::string      str_s  {"godbolt compiler explorer"}; 
    std::string_view str_sv {"godbolt compiler explorer"}; 
    char char_arr2[] = {
        'a',' ','c','h','a','r',' ','a','r','r','a','y'
        }; // Not null character terminated
   sv_print(str_s.substr(8,8));
   sv_print(str_sv.substr(8,8));
   sv_print(char_arr2);
}

Example Output

[allocating at 0x1454eb0 size: 26 bytes]
8 0x7fffc14078d0
cout: compiler
stdout: compiler
8 0x402052
cout: compiler
stdout: compiler explorer
16 0x7fffc14078e4
cout: a char array�NE
stdout: a char array�NE
[deallocating at 0x1454eb0]

Again, your mileage may vary depending on running this on a host versus a target system. The address sanitizer, ASan, will typically detect and report on this error.

Polymorphic Memory Resource (PMR) Strings

I want to finish by touching on an area that probably deserves its own article.

C++17 introduced library support for Memory Resources. Memory resources implement memory allocation strategies that can be used by std::pmr::polymorphic_allocator, covered in detail in a previous blog post.

Put simply, PMR allows you to specify a chunk of allocated memory (typically a simple stack-based array) as the memory to be used instead of the heap. C++17 supports PMR-based strings, e.g.

#include <array>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <new>
#include <string>

#include <memory_resource>    // C++17 header

// print if ::new called
void* operator new(std::size_t sz) { ... }
// dump array contents in hex 
auto print_mem = [](auto& b) { ... };
// print capacity size string
auto print_string = [](auto& s) { ... };

int main() {
    std::array<uint8_t, 32> buff{};   // stack based array

    // create PMR buffer
    std::pmr::monotonic_buffer_resource buffer_mem_res(
        buff.data(), buff.size(), std::pmr::null_memory_resource());       
    print_mem(buff);

    std::pmr::string str("hello", &buffer_mem_res);  // SSO still used
    print_string(str);
    print_mem(buff);
    str = "012345678901234567890";   // replace ::new with pmr::
    print_string(str);
    print_mem(buff);
    str = "01234567890123456789012";
    print_string(str);
    print_mem(buff);
}

Expected Output

 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 5 hello
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30 21 012345678901234567890
303132333435363738393031323334353637383930 0 0 0 0 0 0 0 0 0 0 0
30 23 01234567890123456789012
3031323334353637383930313233343536373839303132 0 0 0 0 0 0 0 0 0

The use of std::pmr::null_memory_resource() means that if the buffer request exceeds the buffer size, the program will terminate (either through raising the std::bad_alloc exception or calling terminate). You can also cascade PMR buffers (i.e. we could have a bigger array to use if the 32-byte limit is exceeded) or revert to using the standard heap.
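
A compact way to check where a PMR string has put its characters is sketched below, assuming a toolchain that ships <memory_resource>. The helper name lives_in_buffer and the 64-byte buffer size are assumptions made for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <memory_resource>
#include <string>

// Hypothetical helper: builds a pmr::string backed by a local buffer and
// reports whether its characters ended up inside that buffer.
// Throws std::bad_alloc if 'text' does not fit (null_memory_resource upstream).
bool lives_in_buffer(const char* text)
{
    std::array<std::byte, 64> buff{};   // stack-based storage
    std::pmr::monotonic_buffer_resource pool(
        buff.data(), buff.size(), std::pmr::null_memory_resource());

    std::pmr::string str(text, &pool);

    const char* first = reinterpret_cast<const char*>(buff.data());
    const char* last  = first + buff.size();
    const char* chars = str.data();
    // std::less and friends give a well-defined ordering for unrelated pointers
    return std::greater_equal<>{}(chars, first) && std::less<>{}(chars, last);
}
```

With SSO, a short string such as "hello" would be expected to stay inside the string handle and never touch the buffer, while a string of around 40 characters would be expected to land in the buffer.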

Unfortunately, as of today, there is still minimal support for PMR across compiler toolchains.

Summary

With the growth of IoT and embedded graphics, the need to use strings has become more commonplace in modern, deeply embedded systems. C-style string management (NTBS) with the associated library is challenging compared to modern programming languages and opens up the potential introduction of program flaws.

The C++ standard library has always supported a more flexible and rich programming class for strings. Unfortunately, std::string is not appropriate for many deeply embedded systems due to its requirement for dynamic memory management.

C++17’s introduction of std::string_view allows us to replace many uses of NTBS parsing with the more user-friendly string_view parsing functions.

SSO means, with care, that we can be safely use string management for small strings (e.g. fewer than 16 characters for GCC) in an embedded environment. However, implementing strategies to not spill onto the heap may outweigh the benefits.

std::pmr::string is a real potential to allow complete, modern string management to be used in embedded C++, bringing with it potential safety and security benefits over using NTBS. Unfortunately, the lack of PMR support in current compiler technology (both host and freestanding) is frustrating and is not currently a realistic option.

Hopefully, we will see better support for string management in the future, but at the moment, unless you have support for C++17 stick to using NTBS. If you do have support for C++17, then it is probably best to still avoid std::string but look to utilise std::string_view where appropriate.

For dynamically constructed arrays, until PMR is better supported, std::array and snprintf are probably still the best options. However, the C++20 facility std::format_to_n looks to be a modern substitute for snprintf.

Niall Cooling

Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.

This entry was posted in C/C++ Programming.

9 Responses to Working with Strings in Embedded C++

  1. kobica says:

    Great post!

  2. Thanks

  3. Sergey says:

    The article is great, thank you.
    But I still think it's not worth dragging the standard library from C++ to embedded systems, you either need to create your own class to work with strings for a specific project, or use strings from C.
    But I'm not experienced enough yet and I could be wrong.

  4. Hi Sergey,
    I think if you completely omit the C++ Std Lib, then potentially there are many useful areas you may be missing out on (e.g. string_view). A good linker will only pull in the object code for the parts you use, not the full library. But as of today, I would avoid std::string in deeply embedded.
    Regards,
    Niall.

  5. Sergey says:

    Thanks Niall, I also think that std::string is unnecessary in embedded systems. I am waiting for the release of new articles from you.
    Regards.

  6. Shawn says:

    With bare-metal C++, do you have to deal with exceptions or not? I use C instead of C++ for deeply embedded systems because I think exceptions are not a good fit.

  7. Exceptions are not necessarily a good fit (though Ada has always supported them). All modern embedded C++ compilers allow you to work with exceptions turned off (see https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_exceptions.html). With exceptions turned off, library code that would throw calls abort() instead (similar to Rust's approach to error handling with std::panic).

  8. Reto Felix says:

    This article is great. Thanks
    At the beginning ("C-Strings and string literals") you explain two different memory models.

    I often use a variant.

    static const char message2[] = "this is a string";
    std::cout << sizeof(message2) << '\n';

    No pointer on the stack and no copy to the stack, only an object in .rodata.
    For me, this is the preferred C definition.

  9. Thanks for the comment.
    I know this is a subjective area, but I think you've got to be careful with code such as that.
    You're perfectly correct in what you've said, but (architecture-dependent) you won't actually save anything from a code perspective. You're second-guessing the compiler and creating non-idiomatic code (which can add to maintenance issues). The `sizeof` is purely a compile-time evaluation. Getting sizeof(void*) from a local pointer is correct but misleading (the pointer will be optimised away).
    A modern RISC-based architecture such as Arm will not generate any stack usage for the code. When the memory is used, it still has to be read into a register.
    See https://godbolt.org/z/dbWs9TdT9
    I agree your version will limit any opportunity of a compiler generating a local, but that's highly unlikely with modern compilers.
    That's why I like the C++11 addition of constexpr.

