Dynamic polymorphism (virtual functions) is central to Object-Oriented Programming (OOP). Used well, it provides hooks through which new functionality and behaviour can (relatively) easily be integrated into a proven, tested codebase.
Subtype inheritance can bring significant benefits, including easier integration, reduced regression test time and improved maintenance.
However, using virtual functions in C++ brings a runtime performance overhead. This overhead may appear inconsequential for individual calls, but in a non-trivial real-time embedded application, these overheads may build up and impact the system’s overall responsiveness.
Refactoring an existing codebase late in the project lifecycle to try and achieve performance goals is never a welcome task. Project deadline pressures mean any rework may introduce potential new bugs to existing well-tested code. And yet we don’t want to perform unnecessary premature optimization (as in avoiding virtual functions altogether) as this tends to create technical debt, which may come back to bite us (or some other poor soul) during maintenance.
The final specifier was introduced in C++11 to ensure that a class cannot be derived from further or that a virtual function cannot be overridden further. However, as we shall investigate, it also allows compilers to perform an optimization known as devirtualization, improving runtime performance.
Interfaces and subtyping
Unlike Java, C++ does not explicitly have the concept of Interfaces built into the language. Interfaces play a central role in Design Patterns and are the principal mechanism to implement the ‘D’ in SOLID: the Dependency Inversion Principle (DIP).
Simple Interface Example
Let’s take a simplified example; we have a mechanism layer defining a class named PDO_protocol. To decouple the protocol from the underlying utility layer, we introduce an interface called Data_link. The concrete class CAN_bus then realizes the Interface.
This design would yield the following Interface class:
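A minimal sketch of what such an interface class might look like. The member names send and receive and the std::uint8_t payload type are assumptions made for illustration:

```cpp
// #pragma once            // include guard when this lives in a real header file
#include <cstdint>
#include <type_traits>

// Interface: only pure-virtual functions and no data members
class Data_link {
public:
    virtual void send(std::uint8_t data) = 0;   // pass-by-copy (assumed signature)
    virtual std::uint8_t receive() = 0;         // assumed signature
    virtual ~Data_link() = default;             // virtual default destructor
};

// An interface cannot be instantiated directly, only realized by a derived class
static_assert(std::is_abstract<Data_link>::value, "Data_link should be abstract");
```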
Side note: I’ll park the discussion about using #pragma once, virtual default destructors and pass-by-copy for another day.
The client (in our case, PDO_protocol) is only dependent on the Interface. In main, we can bind a CAN_bus object to a PDO_protocol object. The calls from PDO_protocol invoke the overridden functions in CAN_bus.
Using dynamic polymorphism
To use an alternative link class, such as RS422, we bind the PDO_protocol object to the alternative class.
Importantly, there are no changes to the PDO_protocol class. With appropriate unit testing, introducing the RS422 code into the existing codebase involves integration testing (rather than a blurred unit/integration test).
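The substitution can be sketched roughly as follows. The class names come from the text above; the member functions, their bodies, the returned values and the bind_and_poll helper are placeholders assumed for illustration only:

```cpp
#include <cassert>
#include <cstdint>

class Data_link {
public:
    virtual void send(std::uint8_t data) = 0;
    virtual std::uint8_t receive() = 0;
    virtual ~Data_link() = default;
};

// Two interchangeable utility-layer classes (placeholder behaviour)
class CAN_bus : public Data_link {
public:
    void send(std::uint8_t data) override { last = data; }
    std::uint8_t receive() override { return 0xCA; }
    std::uint8_t last{};                     // records the last byte sent
};

class RS422 : public Data_link {
public:
    void send(std::uint8_t data) override { last = data; }
    std::uint8_t receive() override { return 0x42; }
    std::uint8_t last{};
};

// The client depends only on the interface, never on a concrete link class
class PDO_protocol {
public:
    explicit PDO_protocol(Data_link& transport) : link(transport) {}
    std::uint8_t poll() { link.send(0x01); return link.receive(); }
private:
    Data_link& link;
};

// Hypothetical helper standing in for the binding that main would perform
std::uint8_t bind_and_poll(Data_link& transport)
{
    PDO_protocol pdo(transport);
    return pdo.poll();
}

void example()
{
    CAN_bus can;
    RS422 serial;
    assert(bind_and_poll(can) == 0xCA);      // dispatches to CAN_bus
    assert(bind_and_poll(serial) == 0x42);   // same client code, RS422 behaviour
}
```

Swapping the link type touches only the line that creates the object; PDO_protocol itself is compiled once and left unchanged.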
There are many ways we could create the new type (e.g. using factories), but, again, let’s park that for this post.
The cost of Dynamic Polymorphic behaviour
Using subtyping and polymorphic behaviour is an important tool when trying to manage change. But, like all things in life, it comes at a cost.
The code in the examples was generated using the Arm GNU Toolchain v11.2.1.
For a direct (non-virtual) call to a member function, the compiler emits a single branch instruction. Branch with Link (bl) is the AArch32 function-call opcode (under the calling convention, r0 contains the object’s address).
The generated assembler for a virtual call, naturally, depends on the specific ABI (Application Binary Interface). But, for all C++ compilers, it will involve a similar set of steps. Examining the generated assembler, we can deduce the following behaviour:
- r0 contains the address of the object (passed as the function’s parameter)
- the contents at this address are loaded into r3
- r3 now contains the vtable-pointer (vtptr)
- the vtptr points to the vtable, which is, in effect, an array of function pointers
- the first entry in the vtable is loaded back into r3
- r3 now contains the address of the virtual function to call
- the current program counter (pc) is moved into the link register (lr) before the function call
- the branch-with-exchange opcode is executed, so the instruction branches to the address held in r3
If, for example, we were calling sensor.set_ID(), then the second memory load would be LDR r3,[r3,#4] to load the address of that function from the second entry in the vtable. Most ABIs structure the vtable based on the order of virtual function declaration.
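The dispatch steps above can be modelled by hand in C++. This is a conceptual sketch only, not what any particular compiler emits: the Object layout, the Slot alias, call_slot and the two implementation functions are all hypothetical names, and real vtables usually carry additional entries.

```cpp
#include <cassert>

struct Object;                          // forward declaration
using Slot = int (*)(Object*);          // each slot receives the object's address

struct Object {
    const Slot* vtptr;                  // first word of the object: the vtable-pointer
    int value;
    int id;
};

int get_value_impl(Object* self) { return self->value; }
int get_ID_impl(Object* self)    { return self->id; }

// The "vtable": a constant array of function pointers, in declaration order
const Slot vtable[] = { get_value_impl, get_ID_impl };

int call_slot(Object* obj, unsigned index)
{
    const Slot* table = obj->vtptr;     // first load:  like LDR r3,[r0]
    Slot target = table[index];         // second load: like LDR r3,[r3] or LDR r3,[r3,#4]
    return target(obj);                 // indirect branch through the loaded address
}

void example()
{
    Object sensor{vtable, 42, 7};
    assert(call_slot(&sensor, 0) == 42);   // first entry
    assert(call_slot(&sensor, 1) == 7);    // second entry, offset #4 on a 32-bit target
}
```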
We can deduce that the overhead of using a virtual function (for Armv7-M) is two additional memory loads, plus an indirect branch in place of the direct bl.
However, what is significant is the second memory load (LDR r3,[r3]), as this memory read requires Flash access. A read from Flash is typically slower than an equivalent read from SRAM. A lot of design effort goes into improving Flash read performance, so your “mileage may vary” regarding the actual timing overhead.
Using polymorphic functions
If we create a class that derives from the base class, visualizing the memory model makes it clear how the same calling code invokes the derived function.
The derived class has its own vtable populated at link-time. Any overridden functions replace the vtable entry with the address of the new function. The constructors are responsible for storing the address of the vtable in the class’s vtptr.
Any virtual functions in the base class that are not overridden still point at the base class implementation. Pure-virtual functions (as used in the interface pattern) have no entry populated in the vtable, so they must be overridden.
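The behaviour described above can be sketched as follows. The names Sensor, Rotary_encoder, get_value and read_sensor follow the article; get_ID and the returned values are placeholders assumed for illustration.

```cpp
#include <cassert>

class Sensor {
public:
    virtual int get_value() { return 1; }    // placeholder reading
    virtual int get_ID()    { return 10; }   // not overridden below
    virtual ~Sensor() = default;
};

class Rotary_encoder : public Sensor {
public:
    int get_value() override { return 2; }   // replaces the entry in the derived vtable
};

// The calling code is identical whichever type is bound to the reference
int read_sensor(Sensor& sensor) { return sensor.get_value(); }

void example()
{
    Sensor base;
    Rotary_encoder encoder;
    assert(read_sensor(base) == 1);          // base class implementation
    assert(read_sensor(encoder) == 2);       // derived implementation, same call site
    Sensor& ref = encoder;
    assert(ref.get_ID() == 10);              // non-overridden entry still points at Sensor
}
```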
As previously noted, the final specifier was introduced alongside override in C++11. It ensures that a derived class cannot override a virtual function or that a class cannot be derived from any further.
For example, currently, we could derive further from the Rotary_encoder class. When defining the Rotary_encoder class, this may not have been our intended design. Adding the final specifier stops any further derivation.
A class may be specified as final, or an individual function can be tagged as final.
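Both forms can be sketched as below. Sensor and Rotary_encoder follow the article; Thermistor, Calibrated_thermistor, Absolute_encoder, get_ID and the returned values are hypothetical names and placeholders added for illustration.

```cpp
#include <cassert>
#include <type_traits>

class Sensor {
public:
    virtual int get_value() { return 1; }
    virtual int get_ID()    { return 10; }
    virtual ~Sensor() = default;
};

// Whole class marked final: no class may derive from it
class Rotary_encoder final : public Sensor {
public:
    int get_value() override { return 2; }
};
// class Absolute_encoder : public Rotary_encoder {};   // error: base class is final

// Single function marked final: the class can still be derived from,
// but get_value cannot be overridden again
class Thermistor : public Sensor {
public:
    int get_value() final { return 3; }
};

class Calibrated_thermistor : public Thermistor {
public:
    // int get_value() override { return 4; }   // error: overriding a final function
    int get_ID() override { return 11; }        // other virtuals remain overridable
};

void example()
{
    static_assert(std::is_final<Rotary_encoder>::value, "class-level final");
    static_assert(!std::is_final<Thermistor>::value, "only the function is final");
    Calibrated_thermistor probe;
    Sensor& ref = probe;
    assert(ref.get_value() == 3);
    assert(ref.get_ID() == 11);
}
```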
Okay, so how can this help with compiler optimization?
When calling a function, such as read_sensor, and the parameter is a pointer/reference to the Base class, which in turn calls a virtual member function, the call must be polymorphic.
Suppose we overload read_sensor to take a Rotary_encoder object by reference.
If the compiler can prove exactly which actual method is called at compile time, it can change a virtual method call into a direct method call.
Without the final specifier, the compiler cannot prove that the reference, sensor, isn’t bound to a further derived class instance. So the generated assembler for both read_sensor functions is identical.
However, if we apply the final specifier to the Rotary_encoder class, the compiler can prove that the only matching call must be Rotary_encoder::get_value. It can then apply devirtualization and generate a direct call to that function.
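A minimal sketch of the two overloads, with placeholder return values assumed for illustration. Whether the second overload is actually devirtualized depends on the compiler and optimization settings, so it is worth checking the generated assembler for your own toolchain.

```cpp
#include <cassert>

class Sensor {
public:
    virtual int get_value() { return 1; }
    virtual ~Sensor() = default;
};

class Rotary_encoder final : public Sensor {
public:
    int get_value() override { return 2; }
};

// Base reference: the dynamic type is unknown here, so the call stays virtual
int read_sensor(Sensor& sensor) { return sensor.get_value(); }

// Reference to a final class: the dynamic type can only be Rotary_encoder,
// so an optimizing compiler is free to call Rotary_encoder::get_value directly
int read_sensor(Rotary_encoder& sensor) { return sensor.get_value(); }

void example()
{
    Sensor base;
    Rotary_encoder encoder;
    Sensor& ref = encoder;
    assert(read_sensor(base) == 1);      // Sensor& overload, base implementation
    assert(read_sensor(ref) == 2);       // Sensor& overload, dynamic dispatch
    assert(read_sensor(encoder) == 2);   // Rotary_encoder& overload
}
```

The observable behaviour is the same for both overloads; only the generated call sequence may differ.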
Templates and final
If read_sensor is written as a function template, the code generator will bind dynamically or statically as appropriate, depending on whether we call with a Sensor object or a Rotary_encoder object.
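As a rough sketch, assuming read_sensor is a function template over the sensor type (the parameter name Sensor_ty and the returned values are placeholders):

```cpp
#include <cassert>

class Sensor {
public:
    virtual int get_value() { return 1; }
    virtual ~Sensor() = default;
};

class Rotary_encoder final : public Sensor {
public:
    int get_value() override { return 2; }
};

// One definition; a separate function is instantiated for each argument type
template <typename Sensor_ty>
int read_sensor(Sensor_ty& sensor) { return sensor.get_value(); }

void example()
{
    Sensor base;
    Rotary_encoder encoder;
    Sensor& ref = encoder;
    assert(read_sensor(base) == 1);      // read_sensor<Sensor>: virtual call
    assert(read_sensor(ref) == 2);       // read_sensor<Sensor>: still dynamic dispatch
    assert(read_sensor(encoder) == 2);   // read_sensor<Rotary_encoder>: final, may bind statically
}
```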
Revising the Interface
Given the potential for devirtualization, can we utilize this in our Interface design?
Unfortunately, for the compiler to be able to prove the actual method call, we must use final in conjunction with a pointer/reference to the derived type. In the original code, the compiler cannot perform devirtualization because the client holds a reference to the interface (base) class and not the derived class. This leaves us with two potential refactoring solutions:
- Modify the link type to the derived type
- Make the client a template class
Devirtualization using a direct link
Using a direct link is a “quick and dirty” fix.
It does change the PDO_protocol header, but otherwise, it “does the job”. The generated code now calls CAN_bus::recieve directly rather than through a vtable lookup. However, using this approach, we reintroduce the coupling between the “Mechanism layer” and the “Utility layer”, breaking the DIP.
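A minimal sketch of the direct link, assuming CAN_bus is marked final and using placeholder send, receive and echo members that are not taken from the original code:

```cpp
#include <cassert>
#include <cstdint>

class Data_link {
public:
    virtual void send(std::uint8_t data) = 0;
    virtual std::uint8_t receive() = 0;
    virtual ~Data_link() = default;
};

class CAN_bus final : public Data_link {
public:
    void send(std::uint8_t data) override { last = data; }
    std::uint8_t receive() override { return last; }   // placeholder loopback
private:
    std::uint8_t last{};
};

class PDO_protocol {
public:
    explicit PDO_protocol(CAN_bus& transport) : link(transport) {}
    std::uint8_t echo(std::uint8_t data) { link.send(data); return link.receive(); }
private:
    CAN_bus& link;   // was Data_link&: the client now names the concrete type
};

void example()
{
    CAN_bus can;
    PDO_protocol pdo(can);
    assert(pdo.echo(0x5A) == 0x5A);
}
```

The only change to the client is the type of the reference, which is exactly what ties it to the utility layer.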
Devirtualization using templates
Alternatively, we can rework the client code as a template class, where the template parameter specifies the link class. Templates bring their own complications, but this approach does ensure we get static binding to any classes specified as final.
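One possible shape for the template client is sketched below. The parameter name Link_ty and the send, receive and echo members are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstdint>

class Data_link {
public:
    virtual void send(std::uint8_t data) = 0;
    virtual std::uint8_t receive() = 0;
    virtual ~Data_link() = default;
};

class CAN_bus final : public Data_link {
public:
    void send(std::uint8_t data) override { last = data; }
    std::uint8_t receive() override { return last; }   // placeholder loopback
private:
    std::uint8_t last{};
};

// The link type is a template parameter rather than a fixed interface reference
template <typename Link_ty>
class PDO_protocol {
public:
    explicit PDO_protocol(Link_ty& transport) : link(transport) {}
    std::uint8_t echo(std::uint8_t data) { link.send(data); return link.receive(); }
private:
    Link_ty& link;
};

void example()
{
    CAN_bus can;
    PDO_protocol<CAN_bus> fast(can);        // final type: calls may bind statically
    PDO_protocol<Data_link> generic(can);   // interface type: calls stay virtual
    assert(fast.echo(0x11) == 0x11);
    assert(generic.echo(0x22) == 0x22);
}
```

Instantiating with Data_link keeps the original dynamic behaviour, so the choice between static and dynamic binding is made at the point of use rather than in the client.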
The final specifier offers an opportunity to refactor existing interface code to alter the binding from dynamic to static polymorphism, typically improving runtime performance. The actual gains will depend significantly on the underlying ABI and machine architecture (start throwing in pipelining and caching, and the waters get even muddier).
Ideally, when using virtual functions in embedded applications, whether a class should be specified as final should be decided at design time rather than late in the project’s timeline.
- Using final in C++ to improve performance - November 14, 2022
Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.