Recently IAR have finally released full support for C++ (adding exceptions and RTTI) to their family of cross compilers. Initially the Kickstart (free) version did not have exceptions and RTTI enabled; with the release of version 6.10.2 this has now been rectified.
We currently use the IAR compilers on our training courses, targeting an NXP LPC2129 (ARM7TDMI) based system. As part of verifying that the previous version’s (v5.41) projects still work with v6.10, I decided to investigate the potential overheads of full C++ in this environment (I’m pleased to say all projects worked under v6.10 without modification – phew).
Here are my preliminary findings and I’ll add to them as I investigate further:
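First off, I created a project based on the C++ main project template, which gives, in essence, an int main() that does nothing but return 0.
[Yes I know, return 0 isn’t necessary. And although standard C++ requires main to return int in a hosted environment, a freestanding (embedded) implementation is free to accept “void main()”.]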
Note that the default language selection is still “Extended Embedded C++”, so I needed to change this to full C++. All build numbers are based on the Debug configuration with I/O set to semi-hosting (I/O via the debugger terminal window).
The build then gave (in bytes):
- code – 296
- const – 1
- data – 8704
The data size surprised me, but on checking the linker configuration file I found that the default stack size was set to 0x2000 (8192 bytes). Changing this to 0x400 (1024 bytes) dropped the data requirement to 1536 bytes. I haven’t, as yet, dug down to where the other 512 bytes are coming from (for another day).
Next step was to try the “mandatory” hello world program:
std::cout << "hello world\n";
Building this (with the stack back at 0x2000) gave the following:
- code – 17357
- const – 2571
- data – 42196
Whoa!!! What the… almost 17k of code and 42k of data for hello world; what’s going on here?
Again, the default linker settings are skewing the values. This time the culprit was the default for the heap (free store), set at 0x8000 (32768 bytes). Stack and heap combined account for 40960 bytes. Again I checked this by setting the heap to 0x2000, and sure enough the data requirement dropped to 17620 bytes (stack + heap = 16384).
From these values two things can be deduced:
- The heap memory is only allocated if the fully linked program requires it
- The run-time library (RTL) I/O stream support (<iostream>) makes use of the heap
To confirm the first point above, I proceeded to build the following program:
volatile int* p = new int;
Sure enough I got the results I expected.
| Stack | Heap | code | const | data |
|---|---|---|---|---|
| 0x2000 | 0x8000 | 1 083 | 17 | 41 512 |
| 0x2000 | 0x2000 | 1 083 | 17 | 16 936 |
| 0x400 | 0x400 | 1 083 | 17 | 2 600 |
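The figures quoted so far can be cross-checked with a little arithmetic: subtracting the configured stack and heap from the reported data size should leave the same residue for a given program, whatever the linker settings. The sketch below does exactly that; `residual_data` is a hypothetical helper name, and passing a heap of 0 is my way of modelling a build in which the heap is not linked in at all.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes of "data" left once the linker-reserved stack
// and heap are subtracted. A heap of 0 models a build with no heap linked in.
constexpr std::uint32_t residual_data(std::uint32_t data,
                                      std::uint32_t stack,
                                      std::uint32_t heap) {
    return data - (stack + heap);
}

// Empty main(): only the stack is reserved.
static_assert(residual_data(8704, 0x2000, 0) == 512, "empty main, default stack");
static_assert(residual_data(1536, 0x400, 0) == 512, "empty main, small stack");

// Hello world via std::cout: stack and heap are both reserved.
static_assert(residual_data(42196, 0x2000, 0x8000) == 1236, "cout, default heap");
static_assert(residual_data(17620, 0x2000, 0x2000) == 1236, "cout, reduced heap");

// The `new int` program.
static_assert(residual_data(41512, 0x2000, 0x8000) == 552, "new int, default heap");
static_assert(residual_data(16936, 0x2000, 0x2000) == 552, "new int, reduced heap");
static_assert(residual_data(2600, 0x400, 0x400) == 552, "new int, small stack and heap");
```

The residue stays constant for each program (512, 1236 and 552 bytes respectively), which supports the reading that changing the stack and heap settings moves the data figure by exactly the amount reserved.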
Back to the hello world program. With full C++ enabled and both stack and heap set at 0x2000, the baseline numbers are:
- code – 17357
- const – 2571
- data – 17620
Disabling exceptions, RTTI and static object destruction (pretty much extended embedded C++) gave:
- code – 7600
- const – 432
- data – 17589
So, the basic overhead for full C++ in this <iostream>-based program is just under 10k of code space (17 357 - 7 600 = 9 757 bytes), plus about 2k of const data. Breaking this down further:
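| Exceptions | RTTI | Static destruction | code | const | data |
|---|---|---|---|---|---|
| y | y | y | 17 357 | 2 571 | 17 620 |
| n | n | n | 7 600 | 432 | 17 589 |
| y | n | n | 16 604 | 2 471 | 17 604 |
| n | y | n | 7 600 | 432 | 17 589 |
| n | n | y | 7 600 | 432 | 17 589 |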
As expected, the bulk of the overhead comes from exception support (in a future post I plan to explain the C++ exception model on the ARM). You wouldn’t expect any overhead from just RTTI (as we have no classes) or static object destruction (as we have no static objects). Again, the overhead for these will be looked at in a future post.
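To put numbers on "the bulk of the overhead", the rows of the table can be subtracted from one another. This is only a sketch of that arithmetic; `BuildSize` and `delta` are illustrative names of my own, and the values are copied straight from the table.

```cpp
#include <cassert>

// Illustrative record of one build's reported sizes, in bytes.
struct BuildSize {
    int code;
    int constant;  // the "const" column
    int data;
};

// Component-wise difference a - b.
constexpr BuildSize delta(BuildSize a, BuildSize b) {
    return BuildSize{a.code - b.code, a.constant - b.constant, a.data - b.data};
}

constexpr BuildSize all_on{17357, 2571, 17620};           // exceptions, RTTI, static destruction
constexpr BuildSize all_off{7600, 432, 17589};            // none of the three
constexpr BuildSize exceptions_only{16604, 2471, 17604};  // exceptions alone

// Cost of switching on exceptions alone.
static_assert(delta(exceptions_only, all_off).code == 9004, "code");
static_assert(delta(exceptions_only, all_off).constant == 2039, "const");
static_assert(delta(exceptions_only, all_off).data == 15, "data");

// Extra cost of RTTI and static destruction once exceptions are on.
static_assert(delta(all_on, exceptions_only).code == 753, "code");
static_assert(delta(all_on, exceptions_only).constant == 100, "const");
static_assert(delta(all_on, exceptions_only).data == 16, "data");
```

So of the 9 757 bytes of extra code, 9 004 appear with exceptions alone, and the remaining 753 only show up when RTTI and static destruction are enabled alongside them.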
Notice, however, that enabling exceptions alone (without RTTI and ~static) gives different figures from enabling exceptions together with RTTI and ~static. This is expected as, of course, the C++ exception model is class based (e.g. ios_base::failure) and cin, cout, cerr and clog are static objects.
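As a small illustration of what "class based" means here, the following host-side sketch makes a stream throw std::ios_base::failure and catches it by its class type. `parse_throws` is a hypothetical helper of mine, not something from the IAR library or the original projects, and it says nothing about how the ARM run-time implements the handler lookup.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if extracting an int from `text`
// raises std::ios_base::failure.
bool parse_throws(const std::string& text) {
    std::istringstream in(text);
    in.exceptions(std::ios_base::failbit);  // ask the stream to throw on failbit
    int value = 0;
    try {
        in >> value;
    } catch (const std::ios_base::failure&) {
        // The handler is selected by the class type of the thrown object.
        return true;
    }
    return false;
}
```

Calling `parse_throws("abc")` takes the catch path, while `parse_throws("42")` completes normally. Because the handler is matched against the type of the thrown object, exception support needs some type information of its own, which is one plausible reason the figures shift when RTTI is enabled alongside it.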
Finally, I compared using cout against good old printf (stack & heap at 0x2000).
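| I/O | Exceptions | RTTI | Static destruction | code | const | data |
|---|---|---|---|---|---|---|
| cout | y | y | y | 17 357 | 2 571 | 17 620 |
| printf | y | y | y | 8 846 | 70 | 8 712 |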
So using printf in preference to cout roughly halves the code and data requirements (the data saving is consistent with the heap no longer being linked in: 8 712 bytes is the 8 192-byte stack plus 520 bytes). In addition, IAR allow you to adjust the size of the RTL by selecting a different printf formatter.
The options are: Full, Large, Small and Tiny. When using cout, changing the formatter had no effect on code or data size. However, when only using printf the following values were observed:
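| Formatter | Exceptions | RTTI | Static destruction | code | const | data |
|---|---|---|---|---|---|---|
| Full | y | y | y | 8 846 | 70 | 8 712 |
| Large | y | y | y | 7 942 | 54 | 8 712 |
| Small | y | y | y | 2 949 | 47 | 8 712 |
| Tiny | y | y | y | 1 902 | 46 | 8 712 |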
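For reference, the two output styles being compared look roughly like this. It is a host-side sketch only: I have used std::snprintf and std::ostringstream as stand-ins for printf and cout so that the results can be compared as strings, and both function names are my own.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// printf-family route: a format string interpreted at run time.
std::string format_with_snprintf(int value) {
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "value=%d\n", value);
    return std::string(buffer);
}

// iostream route: overloaded operator<< selected at compile time.
std::string format_with_stream(int value) {
    std::ostringstream out;
    out << "value=" << value << '\n';
    return out.str();
}
```

Both produce the same text, but the printf-family route funnels everything through a single run-time formatter. That is presumably why a cut-down formatter can shrink the code so much; it is worth checking the IAR documentation for which conversions each formatter level supports before relying on the smaller ones.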
So, as a first pass, there we have it. There are still many unanswered questions about why the memory is being allocated, and RTTI and ~static need further investigation.