Death and (virtual) destruction*

This time, we’ll have a more detailed look at one of those everybody-knows-that elements of C++ – virtual destructors.

More specifically, I want to reinforce under what circumstances you should make your destructor virtual, and when you don’t need to (despite what your compiler might say).

(*there’s no death)

Continue reading

Posted in C/C++ Programming, Design Issues | Tagged , , , , , , , , | 9 Comments

Getting your head around auto’s type-deduction rules

Automatic type-deduction is perhaps one of the more divisive features of Modern C++.  At its core it’s a straightforward concept:  let the compiler deduce the type of an object from its initialiser.   Used in the right way this can improve the readability and maintainability of your code.

However, because auto is based on template type-deduction rules, there are some subtleties that can catch the unwary programmer.

In this article we’ll have a look at auto in the context of the template type-deduction rules to see where all these subtleties come from.

Continue reading

Posted in C/C++ Programming | Tagged , , , , , , , , , , | 1 Comment

An Introduction to Hypervisors

Hypervisors are becoming commonplace in the embedded world, especially in high-end multi-core systems. If you’d asked me about virtualisation or hypervisors two years ago, I, like most people, wouldn’t have known much about them. A hypervisor, that’s a super-supervisor, right? Virtualisation, you mean Virtual Machines, right? Running Linux on Windows using VMware, right? Not any more!


Here at Feabhas we’ve noticed a lot of our clients and contacts are starting to look at designs using hypervisors in embedded systems for a number of reasons. The main drivers are connected devices being used in the Internet of Things, multi-core split RTOS/Linux systems offering real-time Linux solutions, and the need for security and trusted execution environments, where critical processes are isolated from the outside world.

There is a lot of potential for using hypervisors on projects where designers haven’t yet considered them. Hypervisors can run on bare metal or on top of an operating system, and they offer good task protection along with isolation of hardware and memory resources. If you haven’t considered using a hypervisor on your system, or if you are considering one and are just getting started, our Introduction to Hypervisors webinar is worth tuning in to.

Posted in ARM, Design Issues, General, Industry Analysis, Linux | Tagged , , , , , | Leave a comment

Off to the Embedded Linux Conference Europe and Open IoT Summit, Berlin 11th-13th October 2016

It’s hard to believe another year has passed and it’s time once again for the Embedded Linux Conference; next week I’ll be off to Berlin to join a couple of thousand other Linux enthusiasts for our annual bash.

A lot has happened in the past 12 months, especially in the fields of security and the Internet of Things (IoT). A lot of people were talking about the IoT a year ago; now we’re seeing many more projects being completed, especially in industrial IoT. There have also been a number of high-profile hacking cases in the past 12 months (baby monitors, automotive (Tesla this time) and industrial (the Ukraine national grid)) to give food for thought about security.

Continue reading

Posted in General | Leave a comment

Great Expectations

Previously, we’ve looked at the basic concepts of function parameter passing, and we’ve looked at the mechanics of how parameters are passed at the Application Binary Interface (ABI) level.

Far too often we focus on the mechanisms and efficiency of parameter passing, with the goal: if it’s efficient then it’s good; that’s all there is to it.  In this article I want to move past simple mechanics and start to explore function parameter design intent – that is, what can I expect (to change) about the objects I use as function arguments; and what can I expect to be able to do with an object as a function implementer.

To that end, we’ll take a look at parameter passing from the perspective of the mutability (ability to be modified) of the parameters from both a caller’s and a function’s (callee’s) point of view.

Continue reading

Posted in C/C++ Programming, Design Issues | Tagged , , , , , , , , | 2 Comments

The three ‘No’s of sequential consistency

In the previous article we looked at the memory consistency problem that occurs when writing multi-threaded code for modern multi-processor systems.

In this article we’ll have a look at how we can solve the sequential consistency problem and restore some sanity to our programming world.

Continue reading

Posted in ARM, C/C++ Programming, Cortex | Tagged , , , , , , | Leave a comment

Memory consistency made simple(ish)

The C++11 memory consistency model is probably one of the most significant aspects of Modern C++; and yet probably one of the least well-understood.  I think the reason is simple:  it’s really difficult to understand what the problem actually is.

The memory consistency problem is a concurrency problem.  That is, it’s a problem that occurs when we start writing multi-threaded code.  More specifically, it’s a parallelism problem – the real subtleties occur when you have two or more processors executing code.

In the first part of this two-part article we’ll have a look at the causes of the memory consistency problem.

Continue reading

Posted in C/C++ Programming, Design Issues, General | Tagged , , | 1 Comment

Using your Feabhas USB stick on a Mac

Nearly all our Feabhas courses now have their tools/lab exercises on a bootable Linux USB stick, either Fedora or Ubuntu. These USB sticks were designed to boot laptop PCs, but MacBook Pros are becoming increasingly popular in the laptop market, with 10% of the market in 2015.

Our USB sticks won’t boot a MacBook Pro, but we can run them in a virtual machine on a Mac.

Here I’ll talk you through what needs to be done, in nine easy steps, to get one of our EL-503 or EL-504 Fedora 19 USB sticks working on a MacBook Pro with VirtualBox, so that you can do all the lab exercises in our courses, including firing rockets from the USB rocket launchers!

My thanks to Niall Cooling – evangelist of all things Mac – for taking the plunge and getting a USB stick booting on his MacBook Pro!

Continue reading

Posted in General, Linux, training | Tagged , , , , | Leave a comment

Boot Times in Linux/Android

There’s a vast amount of material out there on boot times and people showcasing boot times of as little as one second [1]. But the reality is often different for many devices in the field: some devices boot in 10s or less, others take over 3 minutes. Here’s a handful of devices I measured:

  • Raspberry Pi 2 Model B with Raspbian GNU/Linux 8: 11s to shell prompt
  • Garmin Nüvi 42 Sat Nav: 14s (detects power off after 9s)
  • Beaglebone Black with Angstrom Distribution: 17s to shell prompt
  • PC booting Ubuntu 14.04 with KDE UI (no login): 37s
  • Android 5.1 Moto X smartphone: 42s
  • PC booting Fedora 19 with Gnome UI from a USB stick: 43s
  • PC booting Mint 17.2 KDE from a USB stick: 90s
  • Pace Linux set-top box with secure boot, middleware + UI: 180s
  • Virgin Media TiVo box with secure boot, middleware + UI: 190s

There are a number of reasons why these boot times vary so drastically, and a number of things we can do to optimise boot time, but there is always a trade-off with functionality, and with the development time and effort expended to make reductions.

Continue reading

Posted in Linux | Tagged , , | 2 Comments

Function Parameters and Arguments on 32-bit ARM

Function call basics

When teaching classes about embedded C or embedded C++ programming, one of the topics we always address is “Where does the memory come from for function arguments?”

Take the following simple C function:

void test_function(int a, int b, int c, int d);

when we invoke the function, where are the function arguments stored?

int main(void)
{
  //...
  test_function(1,2,3,4);
  //...
}

Unsurprisingly, the most common answer after “I don’t know” is “the stack“; and of course if you were compiling for x86 this would be true. This can be seen from the following x86 assembler for main setting up the call to test_function (note: your mileage will vary if compiled for a 64-bit processor):

  ...
  subl $16, %esp
  movl $4, 12(%esp)
  movl $3, 8(%esp)
  movl $2, 4(%esp)
  movl $1, (%esp)
  call _test_function
  ...

The stack is decremented by 16 bytes, then the four ints are moved onto the stack prior to the call to test_function.

In addition to the function arguments being pushed, the call will also push onto the stack the return address (i.e. the program counter of the next instruction after the call) and what, in x86 terms, is often referred to as the saved frame pointer. The frame pointer is used to reference local variables also stored on the stack.

This stack frame format is quite widely understood and has historically been the target of malicious buffer overflow attacks that modify the return address.

But, of course, we’re not here to discuss x86, it’s the ARM architecture we’re interested in.

The AAPCS

ARM is a RISC architecture; whereas the x86 is CISC. Since 2003 ARM have published a document detailing how separately compiled and linked code units work together. Over the years it has gone through a couple of name changes, but is now officially referred to as the “Procedure Call Standard for the ARM Architecture” or the AAPCS (I know, don’t ask!).

If we recompile main.c for ARM using the armcc compiler:

> armcc -S main.c

we get the following:

     ...
     MOV      r3,#4
     MOV      r2,#3
     MOV      r1,#2
     MOV      r0,#1
     BL       test_function
     ...

Here we can see that the four arguments have been placed in registers r0-r3. This is followed by the “Relative branch with link” instruction. So how much stack has been used for this call? The short answer is none, as the BL instruction moves the return address into the link register (lr/r14) rather than pushing it onto the stack, as per the x86 model.

Note: around a function call there may be other stack operations, but that’s not the focus of this post.

The Register Set

I’d imagine many readers are familiar with the ARM register set, but just to review:

  • There are 16 data/core registers r0-r15
  • Of these 16, three are special purpose registers
    • Register r13 acts as the stack pointer (SP)
    • Register r14 acts as the link register (LR)
    • Register r15 acts as the program counter (PC)

Basic Model

So the basic function call model is that, if there are four or fewer 32-bit parameters, r0 through r3 are used to pass the arguments and the call’s return address is stored in the link register.

If we add a fifth parameter, as in:

void test_function2(int a, int b, int c, int d, int e);
int main(void)
{
  //...
  test_function2(1,2,3,4,5);
  //...;
}

We get the following:

        ...
        MOV      r0,#5
        MOV      r3,#4
        MOV      r2,#3
        STR      r0,[sp,#0]
        MOV      r1,#2
        MOV      r0,#1
        BL       test_function2
        ...

Here, the fifth argument (5) is being stored on the stack prior to the call. 

Note, however, that in a larger code base you are likely to see at least one extra stack “push” here (quite often r4) which is never accessed in the called function. This is because the stack alignment requirements defined by the AAPCS differ between functions called within the same translation unit and those called across translation units. The basic requirement of the stack is that:

SP % 4 == 0

However, if the call is classed as a public interface, then the stack must also adhere to:

SP % 8 == 0

Return values

Given the following code:

int test_function(int a, int b, int c, int d);
int val;
int main(void)
{
  //...
  val = test_function(1,2,3,4);
  //...
}

By analyzing the assembler we can see the return value is placed in r0:

        ...
        MOV      r3,#4
        MOV      r2,#3
        MOV      r1,#2
        MOV      r0,#1
        BL       test_function
        LDR      r1,|L0.40|  ; load address of extern val into r1
        STR      r0,[r1,#0]  ; store function return value in val
        ...

C99 long long Arguments

The AAPCS defines the size and alignment of the C base types. The C99 long long is 8 bytes in size and alignment. So how does this change our model?

Given:

long long test_ll(long long a, long long b);

long long ll_val;
extern long long ll_p1;
extern long long ll_p2;

int main(void)
{
  //...
  ll_val = test_ll(ll_p1, ll_p2);
  //...
}

We get:

   ...
   LDR      r0,|L0.40|
   LDR      r1,|L0.44|
   LDRD     r2,r3,[r0,#0]
   LDRD     r0,r1,[r1,#0]
   BL       test_ll
   LDR      r2,|L0.48|
   STRD     r0,r1,[r2,#0]
   ...
|L0.40|
   DCD      ll_p2
|L0.44|
   DCD      ll_p1

This code demonstrates that a 64-bit long long uses two registers (r0-r1 for the first parameter and r2-r3 for the second). In addition, the 64-bit return value has come back in r0-r1.

Doubles

As with the long long, a double type (based on the IEEE 754 standard) is also 8 bytes in size and alignment on ARM. However, the code generated will depend on the actual core. For example, given the code:

double test_dbl(double a, double b);

double dval;
extern double dbl_p1;
extern double dbl_p2;

int main(void)
{
  //...
  dval = test_dbl(dbl_p1, dbl_p2);
  //...
}

When compiled for a Cortex-M3 (armcc --cpu=Cortex-M3 --c99 -S main.c) the output is almost identical to the long long example:

        ...
        LDR      r0,|L0.28|
        LDR      r1,|L0.32|
        LDRD     r2,r3,[r0,#0]
        LDRD     r0,r1,[r1,#0]
        BL       test_dbl
        LDR      r2,|L0.36|
        STRD     r0,r1,[r2,#0]
        ...
|L0.28|
        DCD      dbl_p2
|L0.32|
        DCD      dbl_p1

However, if we recompile this for a Cortex-A9 (armcc --cpu=Cortex-A9 --c99 -S main.c), note that we get quite different generated instructions:

        ...
        LDR r0,|L0.40|
        VLDR d1,[r0,#0]
        LDR r0,|L0.44|
        VLDR d0,[r0,#0]
        BL test_dbl
        LDR r0,|L0.48|
        VSTR d0,[r0,#0]
        ...
|L0.40|
        DCD dbl_p2
|L0.44|
        DCD dbl_p1

The VLDR and VSTR instructions are generated as the Cortex-A9 has Vector Floating Point (VFP) technology.

Mixing 32-bit and 64-bit parameters

Assuming we change our function to accept a mixture of 32-bit and 64-bit parameters, e.g.

void test_iil(int a, int b, long long c);
extern long long ll_p1;

int main(void)
{
   //...
   test_iil(1, 2, ll_p1);
   //...
}

As expected, we get a in r0, b in r1 and ll_p1 in r2-r3.

       ...
       LDR r0,|L0.32|
       MOV r1,#2
       LDRD r2,r3,[r0,#0]
       MOV r0,#1
       BL test_iil
       ...
|L0.32|
       DCD ll_p1

However, if we subtly change the order to:

void test_ili(int a, long long c, int b);
extern long long ll_p1;
int main(void)
{
   //...
   test_ili(1,ll_p1,2);
   //...
}

We get a different result: a is in r0, c is in r2-r3, but now b is stored on the stack (remember, this may also involve extra stack alignment operations).

      ...
      MOV r0,#2
      STR r0,[sp,#0] ; store parameter b on the stack
      LDR r0,|L0.36|
      LDRD r2,r3,[r0,#0]
      MOV r0,#1
      BL test_ili
      ...
|L0.36|
      DCD ll_p1

So why doesn’t parameter ‘c’ use r1-r2? Because the AAPCS states:

“A double-word sized type is passed in two consecutive registers (e.g., r0 and r1, or r2 and r3). The content of the registers is as if the value had been loaded from memory representation with a single LDM instruction”

As the compiler is not allowed to rearrange parameter ordering, parameter ‘b’ has to come in order after ‘c’; it therefore cannot use the unused register r1 and ends up on the stack.

C++

For all you C++ programmers out there, it is important to realize that for class member functions the implicit ‘this’ argument is passed as a 32-bit value in r0. So, hopefully, you can see the implications, when targeting ARM, of:

class Ex
{
public:
    void mf(long long d, int i);
};

vs.

class Ex
{
public:
    void mf(int i, long long d);
};

Summary

Even though keeping arguments in registers may be seen as “marginal gains“, for large code bases I have seen, first-hand, significant performance and power improvements simply from rearranging parameter ordering.

And finally…

I’ll leave you with one more bit of code to puzzle over. An often-quoted guideline when programming in C is not to pass structs by value, but rather to pass by pointer.

So given the following code:

typedef struct
{
   int a;
   int b;
   int c;
   int d;
} Example;

void pass_by_copy(Example p);
void pass_by_ptr(const Example* const p);

Example ex = {1,2,3,4};

int main(void)
{
   //...
   pass_by_copy(ex);
   pass_by_ptr(&ex);
   //...
}

Can you guess/predict the difference in performance and memory implications of each option?

Feabhas embedded programming training courses

This post originally appeared on the ARM Connected Community site.

Posted in ARM, C/C++ Programming, Cortex | Tagged , , | 5 Comments