Boot Times in Linux/Android

There’s a vast amount of material out there on boot times and people showcasing boot times of as little as one second [1]. But the reality is often different for many devices in the field: some devices boot in 10s or less, others take over 3 minutes. Here’s a handful of devices I measured:

  • Raspberry Pi 2 Model B with Raspbian GNU/Linux 8 – 11s to shell prompt
  • Garmin Nüvi 42 sat nav – 14s (detects power off after 9s)
  • Beaglebone Black with Angstrom distribution – 17s to shell prompt
  • PC booting Ubuntu 14.04 with KDE UI (no login) – 37s
  • Android 5.1 Moto X smartphone – 42s
  • PC booting Fedora 19 with Gnome UI from a USB stick – 43s
  • PC booting Mint 17.2 KDE from a USB stick – 90s
  • Pace Linux set-top box with secure boot, middleware + UI – 180s
  • Virgin Media TiVo box with secure boot, middleware + UI – 190s

There are a number of reasons why these boot times vary so drastically, and a number of things we can do to optimise boot time, but there is always a trade-off against functionality, and against the development time and effort expended to make the reductions.



Function Parameters and Arguments on 32-bit ARM

Function call basics

When teaching classes about embedded C or embedded C++ programming, one of the topics we always address is “Where does the memory come from for function arguments?”

Take the following simple C function:

void test_function(int a, int b, int c, int d);

When we invoke the function, where are the function arguments stored?

int main(void)
{
  //...
  test_function(1,2,3,4);
  //...
}

Unsurprisingly, the most common answer after “I don’t know” is “the stack”; and of course, if you were compiling for x86, this would be true. This can be seen from the following x86 assembler for main setting up the call to test_function (note: your mileage will vary if compiling for a 64-bit processor):

  ...
  subl $16, %esp
  movl $4, 12(%esp)
  movl $3, 8(%esp)
  movl $2, 4(%esp)
  movl $1, (%esp)
  call _test_function
  ...

The stack is decremented by 16 bytes, then the four ints are moved onto the stack prior to the call to test_function.

In addition to the function arguments being pushed, the call will also push the return address (i.e. the program counter of the next instruction after the call) and what, in x86 terms, is often referred to as the saved frame pointer onto the stack. The frame pointer is used to reference local variables stored further down the stack.

This stack frame format is quite widely understood and has historically been the target of malicious buffer-overflow attacks that modify the return address.

But, of course, we’re not here to discuss x86, it’s the ARM architecture we’re interested in.

The AAPCS

ARM is a RISC architecture; whereas the x86 is CISC. Since 2003 ARM have published a document detailing how separately compiled and linked code units work together. Over the years it has gone through a couple of name changes, but is now officially referred to as the “Procedure Call Standard for the ARM Architecture” or the AAPCS (I know, don’t ask!).

If we recompile main.c for ARM using the armcc compiler:

> armcc -S main.c

we get the following:

     ...
     MOV      r3,#4
     MOV      r2,#3
     MOV      r1,#2
     MOV      r0,#1
     BL       test_function
     ...

Here we can see that the four arguments have been placed in registers r0-r3. This is followed by the “Relative branch with link” (BL) instruction. So how much stack has been used for this call? The short answer is none, as the BL instruction moves the return address into the link register (lr/r14) rather than pushing it onto the stack, as per the x86 model.

Note: around a function call there may be other stack operations, but that’s not the focus of this post.

The Register Set

I’d imagine many readers are familiar with the ARM register set, but just to review:

  • There are 16 data/core registers r0-r15
  • Of these 16, three are special purpose registers
    • Register r13 acts as the stack pointer (SP)
    • Register r14 acts as the link register (LR)
    • Register r15 acts as the program counter (PC)

Basic Model

So the base function call model is that if there are four or fewer 32-bit parameters, r0 through r3 are used to pass the arguments and the call return address is stored in the link register.

If we add a fifth parameter, as in:

void test_function2(int a, int b, int c, int d, int e);
int main(void)
{
  //...
  test_function2(1,2,3,4,5);
  //...;
}

We get the following:

        ...
        MOV      r0,#5
        MOV      r3,#4
        MOV      r2,#3
        STR      r0,[sp,#0]
        MOV      r1,#2
        MOV      r0,#1
        BL       test_function2
        ...

Here, the fifth argument (5) is being stored on the stack prior to the call. 

Note, however, that in a larger code base you are likely to see at least one extra stack “push” here (quite often r4) which is never accessed in the called function. This is because the stack alignment requirements defined by the AAPCS differ for functions called within the same translation unit and those called across translation units. The basic requirement of the stack is that:

SP % 4 == 0

However, if the call is classed as a public interface, then the stack must adhere to:

SP % 8 == 0

Return values

Given the following code:

int test_function(int a, int b, int c, int d);
int val;
int main(void)
{
  //...
  val = test_function(1,2,3,4);
  //...
}

By analyzing the assembler we can see the return value is placed in r0:

        ...
        MOV      r3,#4
        MOV      r2,#3
        MOV      r1,#2
        MOV      r0,#1
        BL       test_function
        LDR      r1,|L0.40|  ; load address of extern val into r1
        STR      r0,[r1,#0]  ; store function return value in val
        ...

C99 long long Arguments

The AAPCS defines the size and alignment of the C base types. The C99 long long is 8 bytes in size and alignment. So how does this change our model?

Given:

long long test_ll(long long a, long long b);

long long ll_val;
extern long long ll_p1;
extern long long ll_p2;

int main(void)
{
  //...
  ll_val = test_ll(ll_p1, ll_p2);
  //...
}

We get:

   ...
   LDR      r0,|L0.40|
   LDR      r1,|L0.44|
   LDRD     r2,r3,[r0,#0]
   LDRD     r0,r1,[r1,#0]
   BL       test_ll
   LDR      r2,|L0.48|
   STRD     r0,r1,[r2,#0]
   ...
|L0.40|
   DCD      ll_p2
|L0.44|
   DCD      ll_p1

This code demonstrates that a 64-bit long long uses two registers (r0-r1 for the first parameter and r2-r3 for the second). In addition, the 64-bit return value has come back in r0-r1.

Doubles

As with the long long, a double type (based on the IEEE 754 standard) is also 8 bytes in size and alignment on ARM. However, the code generated will depend on the actual core. For example, given the code:

double test_dbl(double a, double b);

double dval;
extern double dbl_p1;
extern double dbl_p2;

int main(void)
{
  //...
  dval = test_dbl(dbl_p1, dbl_p2);
  //...
}

When compiled for a Cortex-M3 (armcc --cpu=Cortex-M3 --c99 -S main.c) the output is almost identical to the long long example:

        ...
        LDR      r0,|L0.28|
        LDR      r1,|L0.32|
        LDRD     r2,r3,[r0,#0]
        LDRD     r0,r1,[r1,#0]
        BL       test_dbl
        LDR      r2,|L0.36|
        STRD     r0,r1,[r2,#0]
        ...
|L0.28|
        DCD      dbl_p2
|L0.32|
        DCD      dbl_p1

However, if we recompile this for a Cortex-A9 (armcc --cpu=Cortex-A9 --c99 -S main.c), we get quite different generated instructions:

        ...
        LDR r0,|L0.40|
        VLDR d1,[r0,#0]
        LDR r0,|L0.44|
        VLDR d0,[r0,#0]
        BL test_dbl
        LDR r0,|L0.48|
        VSTR d0,[r0,#0]
        ...
|L0.40|
        DCD dbl_p2
|L0.44|
        DCD dbl_p1

The VLDR and VSTR instructions are generated as the Cortex-A9 has Vector Floating Point (VFP) technology.

Mixing 32-bit and 64-bit parameters

Assuming we change our function to accept a mixture of 32-bit and 64-bit parameters, e.g.

void test_iil(int a, int b, long long c);
extern long long ll_p1;

int main(void)
{
   //...
   test_iil(1, 2, ll_p1);
   //...
}

As expected we get: a in r0, b in r1 and ll_p1 in r2-r3.

       ...
       LDR r0,|L0.32|
       MOV r1,#2
       LDRD r2,r3,[r0,#0]
       MOV r0,#1
       BL test_iil
       ...
|L0.32|
       DCD ll_p1

However, if we subtly change the order to:

void test_ili(int a, long long c, int b);
extern long long ll_p1;
int main(void)
{
   //...
   test_ili(1,ll_p1,2);
   //...
}

We get a different result: a is in r0, c is in r2-r3, but now b is stored on the stack (remember this may also include extra stack alignment operations).

      ...
      MOV r0,#2
      STR r0,[sp,#0] ; store parameter b on the stack
      LDR r0,|L0.36|
      LDRD r2,r3,[r0,#0]
      MOV r0,#1
      BL test_ili
      ...
|L0.36|
      DCD ll_p1

So why doesn’t parameter ‘c’ use r1-r2? Because the AAPCS states:

“A double-word sized type is passed in two consecutive registers (e.g., r0 and r1, or r2 and r3). The content of the registers is as if the value had been loaded from memory representation with a single LDM instruction”

As the compiler is not allowed to rearrange parameter ordering, parameter ‘b’ unfortunately has to come in order after ‘c’; it therefore cannot use the unused register r1 and ends up on the stack.

C++

For all you C++ programmers out there, it is important to realize that for class member functions the implicit ‘this’ argument is passed as a 32-bit value in r0. So, hopefully, you can see the implications, when targeting ARM, of:

class Ex
{
public:
    void mf(long long d, int i);
};

vs.

class Ex
{
public:
    void mf(int i, long long d);
};

Summary

Even though keeping arguments in registers may be seen as “marginal gains”, for large code bases I have seen, first-hand, significant performance and power improvements simply from rearranging the parameter ordering.

And finally…

I’ll leave you with one more bit of code to puzzle over. An often-quoted guideline when programming in C is not to pass structs by value, but rather to pass by pointer.

So given the following code:

typedef struct
{
   int a;
   int b;
   int c;
   int d;
} Example;

void pass_by_copy(Example p);
void pass_by_ptr(const Example* const p);

Example ex = {1,2,3,4};

int main(void)
{
   //...
   pass_by_copy(ex);
   pass_by_ptr(&ex);
   //...
}

Can you guess/predict the difference in performance and memory implications of each option?

Feabhas embedded programming training courses

This post originally appeared on the ARM Connected Community site.


My Top 5 Podcasts

For the final blog post of the year I’ve decided to do something a little different; I hope that’s okay?

Due to the nature of the job, the technical team at Feabhas spend a lot of time travelling. This means many an hour spent in the car driving to and from client sites; often involving navigating the wonderful M25 London orbital [car park!]. We all while away this time in different ways, some prefer music, others radio (which, being in the UK means we’re very well served with the array of BBC stations).

I, however, have been a long-term fan of podcasts. From having my first fruit-based music device I have listened to podcasts, especially on long drives. Over the years many people have asked me what I listen to, so I thought it might be useful to finally share my listening habits. So here are my ‘current’ top 5.

Kermode and Mayo’s Film Review (better known as ‘Wittertainment’)

My first podcast is not a technology show, but a film podcast. Another “benefit” of spending lots of time on long-haul flights and in hotel rooms is that I get to watch a lot of films (which drives my family to despair). The live broadcast of the BBC Radio 5 Live Friday afternoon show is wrapped in a weekly podcast.

It is one of those shows that has many ‘in-jokes’; the longer you listen the funnier it gets. I have had many a time where I have been in tears of laughter on my own in the car. Just excellent.

Typical show length ~2 hours

Security Now

Ever since first reading “The Cuckoo’s Egg” back in 1990 (on my honeymoon of all things!) I have developed a real interest in computer security. From an embedded guy’s perspective many aspects of security and their breaches have been of passing interest (e.g. SQL injection attacks, cross-site scripting, etc.) as they’re “not part of my world”. However, with our headlong rush into IoT and connected systems, I predict this is going to have the largest impact on the embedded software community for many a year.

SN has been running for over 10 years now (an amazing achievement) and is hosted by Steve Gibson and Leo Laporte. They make a great double-act, and as someone who spends their time training software engineers, I appreciate Steve’s ability to take complex subjects and make them simple (but not dumb). Leo is also a great technologist, having a good, broad understanding of technology trends through his running of the TWiT.tv Netcast Network.

Each show now is typically around two hours long. Every other week (when not overwhelmed by security news) there is a Q&A show. I must admit if I’m getting behind I will skip the questions part of the Q&A show (but not the preamble before the questions, which covers this week’s security news). I find the show very pragmatic and suitable for people who are not security professionals. In addition, Steve keeps both show notes and a textual transcription of the show on his site. I have found this useful where there is something I have heard while driving (or walking the dog) and want to look into further. For example, this year I’ve spent a lot of time looking at SSL/TLS and DTLS for IoT, where I found Steve’s coverage of SSL/TLS a useful starting point for further research.

Typical show length ~2 hours

TED Radio Hour

Hopefully you’re already aware of TED (Technology, Entertainment and Design) Talks; if so, then this show takes a common theme (e.g. the Open Source World, Playing with Perceptions) and brings together a number of different TEDTalks by interviewing the presenters, interlaced with clips from their talks. The host, Guy Raz, does an incredible job of bringing out the story behind the talk and making you want to immediately get online and watch the full, original, talk. I really like this as it exposes me to aspects of our world I wouldn’t necessarily consider. For example, a recent show focused on “Quiet” and explored ways to find quiet in our busy lives.

Typical show length ~50 mins

Freakonomics Radio

I was drawn to this podcast as I’d previously read the excellent book Freakonomics by Steven D. Levitt and Stephen J. Dubner. The book was published back in 2005, with the podcast starting in 2010. Fundamentally it’s a show (following on from the premise of the book) about how economics affects decisions in our everyday lives. The subject areas vary widely: from crack gangs through to real-estate agents. The shows always make me consider my own viewpoint (e.g. show 210, “Is It Okay for Restaurants to Racially Profile Their Employees?”) even when I didn’t appreciate I had one! I also really liked that when the show needed to ask for donations to continue (I do do a monthly donation) they did an episode exploring the issues and economics: show 141, “How to Raise Money Without Killing a Kitten”.

Being on NPR, the show is, naturally, U.S. centric; but still a very worthwhile listen.

Typical show length 30-45 mins

Infinite Monkey Cage

Another offering from the BBC; this show is a bizarre mix of hard science and comedy. People who know me well might be surprised by this choice as I’m not a great fan of the TV programmes (e.g. the Wonders of… series) of one of the hosts, the acclaimed physicist Brian Cox. However, on this show his very witty co-host, Robin Ince, brings a nice balance. There are also typically two or three additional guests helping move the show along.

Following on with a common theme from the previous two podcasts (TED and Freakonomics), I like this show as it attempts to explore subject areas that wouldn’t always be foremost in your mind (e.g. the apocalypse and space travel). A great intellectual show that doesn’t dumb down the subject and makes me think.

Typical show length ~30 mins

Honourable mentions

If you’re doing any form of longish commute I recommend trying these different podcasts out. Personally I use the Downcast app to listen to my podcasts, in preference to the native apps.

Are there any really good ones out there I’ve missed? If so, please leave me a comment or let me know on Twitter (@feabhas).

So that’s pretty much it for 2015 from the team at Feabhas. We hope you’ve appreciated the blogs that we’ve put out (with Glennan doing such a sterling job on Modern C++ subjects). We’re already lining up blogs for 2016, but if there are areas you feel we could address better please let us know.

Otherwise, I wish you all a Merry Christmas from all at Feabhas and look forward to 2016 where we’ll have some new major announcements!

Thank you all for your support.

Niall.


Seeing stars. And dots. And arrows.

This time I want to look at a seemingly trivial concept in C++ programming: accessing class members, either directly or via a pointer.  More than anything it’s an excuse to talk about two of C++’s more obscure operators – .* and ->*



Becoming a Rule of Zero Hero

“Do, or do not; there is no ‘try’.”

Previously, we’ve looked at The Rule of Zero which, in essence, says: avoid doing your own resource management; use a pre-defined resource-managing type instead.

This is an excellent guideline and can significantly improve the quality of your application code. However, there are some circumstances where you might not get exactly what you were expecting. It’s not that the code will fail; it just might not be as efficient as you thought.

Luckily, the solution is easy to implement and has the additional side-effect of making your code even more explicit.



Bitesize Modern C++ : Smart pointers

The dynamic creation and destruction of objects was always one of the bugbears of C. It required the programmer to (manually) control the allocation of memory for the object, handle the object’s initialisation then ensure that the object was safely cleaned-up after use and its memory returned to the heap. Because many C programmers weren’t educated in the potential problems (or were just plain lazy or delinquent in their programming) C got a reputation in some quarters for being an unsafe, memory-leaking language.

Things didn’t significantly improve in C++. We replaced malloc and free with new and delete; but the memory management issue remained.

[image: code listing]

I concede – the code above is trivial and stupid but I suspect if I looked around I could find similar (or even worse!) examples in actual production code.

Languages such as Java and C# solved this problem by taking memory management out of the hands of the programmer and using a garbage collector mechanism to ensure memory is cleaned up when not in use.

In Modern C++ they have chosen not to go down this route but instead make use of C++’s Resource Acquisition Is Initialisation (RAII) mechanism to encapsulate dynamic object creation / destruction within smart pointers.

A smart pointer is basically a class that has the API of a ‘raw’ pointer. In Modern C++ we have four classes for dynamic object management:

  • std::auto_ptr : Single-owner managed pointer, from C++98; now deprecated
  • std::shared_ptr : A reference-counted pointer, introduced in C++98 TR1
  • std::unique_ptr : Single-owner managed pointer which replaces (the now deprecated) auto_ptr
  • std::weak_ptr : Works with shared_ptr in situations where circular references could be a problem

 

Avoid using std::auto_ptr

std::auto_ptr was introduced in C++98 as a single-owner resource-managed smart pointer. That is, only one auto_ptr can ever be pointing at the resource.

auto_ptr objects have the peculiarity of taking ownership of the pointers assigned (or copied) to them: an auto_ptr object that has ownership over an element is in charge of destroying the element it points to and deallocating its memory when it is itself destroyed. The destructor does this by calling delete automatically.

[image: code listing]

When an assignment operation takes place between two auto_ptr objects, ownership is transferred, which means that the object losing ownership is set to no longer point to the element (it is set to nullptr).   This also happens if you copy from one auto_ptr to another – either explicitly, or by passing an auto_ptr to a function by value.

This could lead to unexpected null pointer dereferences – an unacceptable consequence for most (if not all) systems. Therefore, we recommend avoiding the use of auto_ptr. It has now been deprecated in C++11 (and replaced with the much more consistent std::unique_ptr).

 

Use std::unique_ptr for single ownership

std::unique_ptr allows single ownership of a resource. A std::unique_ptr is an RAII wrapper around a ‘raw’ pointer, and therefore occupies no more memory than (and is generally as fast as) a raw pointer. Unless you need more complex semantics, unique_ptr is your go-to smart pointer.

unique_ptr does not allow copying (by definition); but it does support move semantics, so you can explicitly transfer ownership of the resource to another unique_ptr.

 

The utility function make_unique<T>() hides away the memory allocation and is the preferred mechanism for dynamically creating objects. make_unique<T>() is not officially supported in C++11; but it is part of C++14 and is supported by many C++11-compliant compilers. (A quick search will turn up an implementation if your compiler doesn’t currently support it)

[image: code listing]

For sharing a resource, use std::shared_ptr

std::shared_ptr is a reference-counted smart pointer.

Creating a new dynamic object also creates a new associated management structure that holds (amongst other things) a reference count of the number of shared_ptrs currently ‘pointing’ at the object.

Each time a shared_ptr is copied the reference count is incremented. Each time one of the pointers goes out of scope the reference count on the resource is decremented. When the reference count is zero (that is, the last shared_ptr referencing the resource goes out of scope) the resource is deleted.

std::shared_ptrs have a higher overhead (in memory and code) than std::unique_ptr but they come with more sophisticated behaviours (like the ability to be copied at relatively low cost).

[image: code listing]

Once again, the standard library provides a utility function make_shared<T>() for creating shared dynamic objects; and, once again, this is the preferred mechanism.

 

Use std::weak_ptr for tracking std::shared_ptrs

A std::weak_ptr is related to a std::shared_ptr. Think of a weak_ptr as a ‘placeholder’ for a shared_ptr. std::weak_ptrs are useful if you want to track the existence of a resource without the overhead of a shared_ptr; or you need to break cyclic dependencies between shared_ptrs (A topic that is outside the scope of this article; but have a look here if you’re interested)

When you create a weak_ptr it must be constructed with an extant shared_ptr. It then becomes a ‘placeholder’ for that shared_ptr. You can store weak_ptrs, copy and move them, but doing so has no effect on the reference count of the resource.

[image: code listing]

Note that you cannot use a weak_ptr directly; you must convert it back to a shared_ptr first. weak_ptrs have a method, lock(), that creates (in effect) a copy of the original shared_ptr, which can then be accessed.

[image: code listing]

Since weak_ptrs can have a different lifetime to their associated shared_ptr there is a chance the original shared_ptr could go out of scope (and conceptually delete its resource) before the weak_ptr is destroyed.  (Strictly speaking, the resource is deleted when the last referencing shared_ptr and/or weak_ptr have gone out of scope)

A weak_ptr can therefore be invalid – that is, referencing a resource that is no longer viable. You should use the expired() method on the weak_ptr to see if it is still valid, before attempting to access it (alternatively, calling lock() on an expired weak_ptr will return nullptr).

 

That’s all for now.

We’ve got to the end of the Bitesize series for Modern C++.  You should now be in a much stronger position to explore the new features of C++ in more detail.

If you missed an article, or just want the complete set in a single document, you can download the full set of articles as a PDF, here.

To learn more about Feabhas’ Modern C++ training courses, click here.


Bitesize Modern C++ : std::array

C++98 inherited C’s only built-in container, the array. Arrays of non-class types behave in exactly the same way as they do in C. For class types, when an array is constructed the default constructor is called on each element in the array.

[image: code listing]

Explicitly initialising objects in an array is one of the few times you can explicitly invoke a class’s constructor.

[image: code listing]

For track[], the non-default constructor is called for the first three elements, followed by the default (no-parameter) constructor for the last two elements; hence they are 0.0.

(Note the performance implications of this – five constructor calls will be made whether you explicitly initialise the objects or not.)

Arrays are referred to as ‘degenerate’ containers; or, put more antagonistically: they are a lie.

Arrays are basically a contiguous sequence of memory, pointers, and some syntactic sugar. This can lead to some disturbing self-delusion on the part of the programmer.

[image: code listing]

Despite the fact that the declaration of process() appears to specify an array of five Position objects, it is in fact a simple Position* that is passed. This explains why the array_sizeof macro fails (since the size of a Position is greater than the size of a pointer!). It also explains why we can increment the array name (which should be a constant), as we do in main().

In C++11, use of ‘raw’ arrays is undesirable; and there are more effective alternatives.

std::array is a fixed-size contiguous container. The class is a template with two parameters – the type held in the container and the size.

[image: code listing]

std::array does not perform any dynamic memory allocation. Basically, it’s a thin wrapper around C-style arrays. Memory is allocated – as with built-in arrays – on the stack or in static memory. Because of this, and unlike std::vector, std::arrays cannot be resized.

If C-style notation is used there is no bounds-checking on the std::array; however, if the at() function is used an exception (std::out_of_range) will be thrown if an attempt is made to access outside the range of the array.

std::arrays also have the advantage that they support all the facilities required by the STL algorithms so they can be used wherever a vector or list (etc.) could be used; without the overhead of dynamic memory management.

[image: code listing]

Finally, because container types are classes (not syntactic sugar) they can be passed around the system like ‘proper’ objects.

[image: code listing]

 

More information

Can’t wait? Download the full set of articles as a PDF, here.

To learn more about Feabhas’ Modern C++ training courses, click here.


Bitesize Modern C++ : noexcept

We have some basic problems when trying to define error management in C:

  • There is no “standard” way of reporting errors. Each company / project / programmer has a different approach
  • Even given a basic approach, you cannot guarantee the error will be acted upon.
  • There are difficulties with error propagation; particularly with nested calls.

The C++ exception mechanism gives us a facility to deal with run-time errors or fault conditions that make further execution of a program meaningless.

In C++98 it is possible to specify in a function declaration which exceptions a function may throw.

[image: code listing]

The above function declarations state:

  • get_value() can throw any exception. This is the default.
  • display() will not throw any exceptions.
  • set_value() can throw exceptions only of type char* and Sensor_Failed; it cannot throw exceptions of any other type.

This looks wonderful, but compilers can only partially check exception specifications for compliance at compile time.

[image: code listing]

If process() throws an exception of any type other than std::out_of_range this will cause the exception handling mechanism – at run-time – to call the function std::unexpected() which, by default, calls std::terminate() (although its behaviour can – and probably should – be replaced).

Because of the limitations of compile-time checking, for C++11 the exception specification was simplified to two cases:

  • A function may propagate any exception; as before, the default case
  • A function may not throw any exceptions.

Marking a function as throwing no exceptions is done with the exception specifier, noexcept.

(If you read the noexcept documentation you’ll see it can take a boolean constant-expression parameter. This parameter allows (for example) template code to conditionally restrict the exception signature of a function based on the properties of its parameter type. noexcept on its own is equivalent to noexcept(true). The use of this mechanism is beyond the scope of this article.)

[image: code listing]

On the face of it, the following function specifications look semantically identical – both state that the function will not throw any exceptions:

[image: code listing]

The difference is in the run-time behaviour and its consequences for optimisation.

With the throw() specification, if the function (or one of its subordinates) throws an exception, the exception handling mechanism must unwind the stack looking for a ‘propagation barrier’ – a (set of) catch clauses. Here, the exception specification is checked and, if the exception being thrown doesn’t match the provided specification, std::unexpected() is called.

However, std::unexpected() can itself throw an exception. If the exception thrown by std::unexpected() is valid for the current exception specification, exception propagation and stack unwinding continue as before.

This means that there is little opportunity for optimisation by the compiler for code using a throw() specification; in fact, the compiler may even introduce pessimisations to the code:

  • The stack must be maintained in an unwindable state.
  • Destructor order must be maintained to ensure objects going out of scope as a result of the exception are destroyed in the opposite order to their construction.
  • The compiler may introduce new propagation barriers to the code, introducing new exception table entries, thus making the exception handling code bigger.
  • Inlining may be disabled for the function.

In contrast, in the case of a noexcept function specification std::terminate() is called immediately, rather than std::unexpected(). Because of this, the compiler has the opportunity to not have to unwind the stack during an exception, allowing it a much wider range of optimisations.

In general, then, if you know your function will never throw an exception, prefer to specify it as noexcept, rather than throw().

 

More information

Can’t wait? Download the full set of articles as a PDF, here.

To learn more about Feabhas’ Modern C++ training courses, click here.


Bitesize Modern C++ : Override and Final

Override specifier

In C++98 using polymorphic types can sometimes lead to head-scratching results:

[image: code listing]

On the face of it this code looks sound; indeed it will compile with no errors or warnings. However, when it runs the Base version of op() will be executed!

The reason? Derived’s version of op() is not actually an override of Base::op(), since int and long are considered different types (going from int to long is a conversion, not a promotion).

The compiler is more than happy to let you overload functions in the Derived class interface; but in order to call the overload you would need to dynamic_cast the Base class object in usePolymorphicObject().

In C++11 the override specifier is a compile-time check to ensure you are, in fact, overriding a base class method, rather than simply overloading it.


Final specifier

In some cases you want to make a virtual function a ‘leaf’ function – that is, no derived class can override the method. The final specifier provides a compile-time check for this:


More information

Can’t wait? Download the full set of articles as a PDF, here.

To learn more about Feabhas’ Modern C++ training courses, click here.

Posted in C/C++ Programming | Tagged , , , , , , | Leave a comment

Security and Connected Devices

With the Internet of Things, we are seeing more and more devices that were traditionally “deep embedded” and isolated from the outside world becoming connected devices. Security needs to be designed into connected products from the outset as the risk of outside attacks is very real. This is especially true if you’re migrating from embedded RTOS systems to Linux and encountering a smorgasbord of “free” connectivity functionality for the first time.

Here we list 10 top tips to help make your connected device as secure as possible. Remember, in many cases, it may not be a question of ‘if’ but ‘when’ an attack occurs.

1. Keep your subsystems separate.

The Jeep Cherokee was chosen as a target for hacking by Charlie Miller and Chris Valasek following an assessment of the vulnerabilities of 24 models of vehicle to see if the internet-connected devices used primarily for communication and entertainment were properly isolated from the driving systems [1].

Most car driving systems are controlled using a CAN bus. You could access them via a diagnostic port – this is what happens when they are serviced in a garage. You would have to have physical access to the vehicle to do this. But if you are connecting to the comms/entertainment systems via the internet, and they’re connected to the driving systems, you could potentially access the driving systems from the internet.

With the explosion of devices being connected, consideration needs to be made to the criticality of functions and how to prevent remote access to a car’s brakes, steering, accelerator, power train and engine management controls. While it might be permissible to grant remote read access for instruments (e.g. mileage and fuel consumption), any control systems should only be accessible by the driver at the controls. And with things like heart monitors starting to become connected devices, the criticality of separation is likely to increase.

2. Secure Your Boot Code

One of the most effective ways of hijacking a system is via the boot code. Some of the earliest computer viruses (e.g. Elk Cloner for the Apple II [2], and the Brain and Stoned viruses for PCs) infected the boot sectors of removable media. Later viruses corrupted the operating system or even loaded their own. The same possibilities exist with computers and embedded devices today if the bootloader is well known, e.g. GRUB, U-Boot or RedBoot.

Most devices designed with security in mind have a secure bootloader and a chain of trust. The bootloader will boot from a secure part of the processor and will have a digital signature, so that only a trusted version of it will run. The bootloader will then boot a signed main runtime image.

In many cases the bootloader will boot a signed second stage bootloader, which will only boot a signed main runtime. That way, the keys or encryption algorithms in the main runtime can be changed by changing the second stage bootloader.

3. Use Serialisation and Control Your Upgrade Path

When it comes to upgrading images in the field (to support new features, or to fix bugs or security flaws), this can be done using serialisation to target specific units in the field at particular times to reduce the risk of large numbers of units failing simultaneously after an upgrade.

Each runtime image should be signed and carry a version number, so that only higher-numbered versions will be accepted. Upgrades can be controlled by a combination of different keys held in the unit’s FLASH.

4. Design for Disaster Recovery

Your box no longer boots in the field because the runtime image has become corrupted. What then? Truck rolls or recalls are very expensive and they deprive the user of their product. There are alternatives:

(i) Keep a copy of the runtime for disaster recovery. This can be stored in onboard FLASH as a mirror of the runtime itself, or in a recovery medium, e.g. a USB stick, which is favoured these days by PC manufacturers.

(ii) Alternatively, the bootloader can automatically try for an over-the-air download – this is often favoured with things like set top boxes where the connection is assumed good (it wouldn’t be much of a set top box if it couldn’t tune or access the internet). This saves on FLASH but deprives the user of their product while the new runtime image is being downloaded.

5. Switch off debug code

Don’t give out any information that might be of use to the outside world. The Jeep Cherokee hack was made possible by an IP address being passed back to the user. It’s hard to see what use this would be to a typical non-tech user.

6. Harden the Kernel

The Linux kernel contains thousands of configuration options, including various ports, shells and communication protocols. It almost goes without saying that any production build needs everything switched off except the features you need. But implementing this isn’t always straightforward, due to the inter-dependencies of some kernel features. Don’t use bash unless it’s unavoidable; use ash instead. The disclosure of Shellshock, a 25-year-old bash vulnerability [3], in September 2014 triggered a tidal wave of attacks, mainly distributed denial-of-service attacks and vulnerability scanning.

Disable telnet. Disable SSH unless you have an essential usage requirement. Disable HTTP. If there is any way a user might form a connection with the box, especially using a method well-used on other boxes, that’s a door into the box that needs locking.

With the growing capabilities and connected nature of embedded RTOS systems approaching that of embedded Linux in Machine to Machine communications and the Internet of Things, similar “hardening” processes need to be followed.

7. Use a Trusted Execution Environment

Most of the main processors used in connected devices (smart phones, tablets, smart TVs, set top boxes) now contain a secure area known as a Trusted Execution Environment (TEE).

A TEE provides an isolated execution environment where confidential assets (e.g. video content, banking information) can be accessed in isolation. Four popular uses are: (i) premium content protection, especially 4K UHD content; (ii) mobile financial services; (iii) authentication (facial recognition, fingerprints and voice); (iv) secure handling of commercially sensitive or government-classified information on devices.

TEEs have two security levels: Profile 1 is intended to prevent software attacks. Profile 2 is intended to prevent hardware and software attacks.

8. Use a Container Architecture

If you are designing a system with a processor that doesn’t use a TEE, you can still design a reasonably safe solution using a container architecture to isolate your key processes.

Linux Containers have been around since August 2008 and rely on Kernel cgroups functionality that first appeared in Kernel version 2.6.24. LXC 1.0, which appeared in February 2014, is considerably more secure than earlier implementations, allowing regular users to run “unprivileged containers”.

Alternatives to LXC are virtualization technologies such as OpenVZ and Linux-Vserver. Other operating systems contain similar technologies such as FreeBSD jails, Solaris Containers, AIX Workload Partitions. Apple’s iOS also uses containers.

9. Lock your JTAG port

Qihoo 360 Unicorn Team’s hack of ZigBee [4] was made possible by dumping the contents of the FLASH from the board of the IoT gateway. This enabled them to identify the keys used on the network. The fact that the keys themselves were stored in a format that enabled them to be decoded made the hack easier.

If your JTAG port is unlocked, and hackers have access to the development tools used for the target processor, then they could potentially overwrite any insecure boot code with their own, allowing them to take control of the system and its upgrades.

10. Encrypt Communications Channels and any Key Data

If all the above steps are taken, a device can still be vulnerable to a man-in-the-middle attack if the payload is sent unencrypted.

If you have a phone, tablet, smart TV or set top box accessing video on demand (VOD), the user commands need to be encrypted; otherwise it is possible to get free access to the VOD server by spoofing the server to capture box commands, and then spoofing the box to capture the server responses. It might even be possible to hack the server to grant access to multiple devices in the field, or mount a denial of service attack.

GPS spoofing by Qihoo 360 was demonstrated at DEF CON 23, where signals were recorded and re-broadcast [5]. It’s not the first time GPS spoofing has happened. Spoofing and man-in-the-middle attacks on any user-connected system are commonplace.

Bonus Extra Tip: Get a Third Party to Break It

This is probably the most useful advice of all. As with software testing in general, engineers shouldn’t rely on marking their own homework: the same blind spots missed in a design will be missed in testing. Engineers designing systems won’t have the same mentality as those trying to hack them. An extra pair of eyes going over the system trying to break it will expose vulnerabilities you never thought existed.

Conclusion

Security is a vast subject and we’ve only scratched the surface in this blog. Feabhas offer a course EL-402 in Secure Linux Programming, for more information click here.

References

  1. Fiat Chrysler Jeep Cherokee hack http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/

  2. Elk Cloner http://www.theregister.co.uk/2012/12/14/first_virus_elk_cloner_creator_interviewed/

  3. Shellshock https://shellshocker.net

  4. Zigbee hack Def Con 23

  5. GPS Spoofing Def Con 23

Posted in General, Industry Analysis, Linux, RTOS, training | Tagged | 1 Comment