You are currently browsing the archives for the C/C++ Programming category.

Templates and polymorphism

July 10th, 2014

Introduction

Template functions and classes tend to cause consternation amongst programmers. The conversation tends to go something like this:

  • I understand the syntax of templates (although it’s ugly)
  • I get the idea of replacing function-like macros with template functions
  • I can see the application of template classes for containers
  • Most containers and generic functions are library code
  • I don’t write libraries
  • What’s the point of me using templates?

In this article we’re going to look at an application of templates beyond writing library code – replacing run-time polymorphism (interfaces) with compile-time polymorphism. This idea is known as a Policy. The idea is reminiscent of the Strategy Pattern, but uses templates rather than interfaces.

Read more »

Template inheritance

June 19th, 2014

Introduction

Previously we looked at template class syntax and semantics. In this article we’ll extend this to look at inheritance of template classes.

Read more »

Template classes

June 12th, 2014

Introduction

Last time we looked at template functions, which introduced the concept of generic programming in C++.

This time let’s extend the idea of generic programming to classes.

Read more »

An introduction to C++ templates

May 29th, 2014

Introduction

Templates are a very powerful – but often very confusing – mechanism within C++. However, approached in stages, templates can be readily understood (despite their heinous syntax).

The aim of this series of articles is to guide beginners through the syntax and semantics of the foundation concepts in C++ template programming.

Read more »

Demystifying C++ lambdas

March 7th, 2014

A new (but not so welcome?) addition

Lambdas are one of the new features added to C++ and seem to cause considerable consternation amongst many programmers. In this article we’ll have a look at the syntax and underlying implementation of lambdas to try and put them into some sort of context.

Read more »

goto fail and embedded C Compilers

February 27th, 2014

I can’t imagine anyone reading this posting hasn’t already read about the Apple “goto fail” bug in SSL. My reaction was one of incredulity; I really couldn’t believe this code could have got into the wild on so many levels.

First we’ve got to consider the testing (or lack thereof) for this codebase. The side effect of the bug was that all SSL certificates passed, even malformed ones. This implies positive testing (i.e. we can demonstrate it works), but no negative testing (i.e. a malformed SSL certificate), or even no dynamic SSL certificate testing at all?

What I haven’t established* is whether the bug came about through code removal (e.g. there was another ‘if’ statement before the second goto) or, due to trial-and-error programming, the extra goto got added (with other code) that then didn’t get removed on a clean-up. There are, of course, some schools of thought that believe it was deliberately put in as part of prism!

Then you have to query regression testing; have they never tested for malformed SSL certificates (I can’t believe that; mind you I didn’t believe Lance was doping!) or did they use a regression-subset for this release which happened to miss this bug? Regression testing vs. product release is always a massive pressure. Automation of regression testing through continuous integration is key, but even so, for very large code bases it is simplistic to say “rerun all tests”; we live in a world of compromises.

Next, if we actually analyse the code then I can imagine the MISRA-C group jumping around saying “look, look, if only they’d followed MISRA-C this couldn’t of happened” (yes Chris, it’s you I’m envisaging) and of course they’re correct. This code breaks a number of MISRA-c:2012 rules, but most notably:

15.6 (Required) The body of an iteration-statement or selection-statement shall be a compound-statement

Which boils down to all if-statements must use a block structure, so the code would go from (ignoring the glaring coding error of two gotos):
Read more »

The hokey-cokey* of function calls

January 20th, 2014

Functions are the lifeblood of a C program. The program flow is altered by passing parameters to functions, which are then manipulated. Conceptually function parameters are defined as being either:

  • Inputs (Read-only) – client-supplied objects manipulated within the function only
  • Outputs (Write-only) – objects generated by the function for use by the client.
  • Input-Outputs (Read-Write) – client objects that can be manipulated by the function.

Defining the use of a parameter gives vital information not only to the implementer, but (perhaps more importantly) to the user of the function, by more-explicitly specifying the ‘contract’ of the function.

Many programming languages (for example, Ada) support these concepts explicitly. C, however, does not. One has to remember that when Kernighan and Ritchie developed C structured programming was very much in its infancy and many of these ideas were still being formulated (also remember that one of the C design goals was parsimony).

Even today, though, these concepts are rarely taught to C programmers and that has often led to clumsy, insecure or even downright dangerous APIs.

If C doesn’t support these concepts explicitly, can we simulate them? The answer is (of course) yes, by using some basic language constructs and forming some idioms.

Let’s look at each parameter type in turn.

Input parameters

C specifies a call-by-value or call-by-copy paradigm. That is, when a C function is called the compiler sets up a call frame that holds copies of the function parameters. Therefore, when you pass parameters by value you are – in effect – creating a parameter for the function to use that in no way affects the caller’s data

image

This is fine for simple types, but what about user-defined types – structs? What’s the problem with passing them by value?

image

Passing a structure by value means allocating enough memory for the parameter and then copying the contents of the original object into the parameter. In many embedded systems, where memory is at a premium, this could easily overflow the stack – at run-time, where its consequences could be difficult to track.

Strictly, to be explicit you should specify the type of the parameter as a const:

image

For simple types this is unlikely to add much value; however it may provide some benefit with structures.

If a parameter is passed as a const struct the compiler has the opportunity to perform a lazy evaluation – it passes the address of the structure instead of making a copy.

image

Note that this optimisation may not be supported by all compilers; or might not occur at all levels of optimisation.

Input-Output parameters

The resolution to the above problem is to explicitly pass a pointer to the structure:

image

This is clearly more efficient than copying the whole structure. OK, the syntax has got a little messier, but we can live with that.

But hang on: do we still have an Input parameter? Actually, no.

What we’ve got here is an input-output parameter. By passing a ‘raw’ pointer the function can manipulate the caller’s object. To fix this we need to prevent manipulation of the pointed-to object:

image

Still not quite there, though. What happens below?

image

Strictly we should make the pointer itself const to prevent (either accidently or maliciously) the function manipulating the caller’s object:

image

This is a very good general rule-of-thumb for functions: make all pointers const

Output parameters

An output parameter is one that the function can write to, but never read (i.e. write-only). In C the only real mechanism we have for that is the function return value.

Most programmers are happy to return simple types from functions but what about the following code?

image

Since C performs pass (and return!) by value this would appear very inefficient:

image

The original object (biiig) is constructed. Then, when makeBigStruct is called space for the return value is allocated. Inside makeBigStruct, temp is allocated. On return temp is copied into the return value then, finally, copied into biiig.

Knowing this, most programmers never return structures from functions; preferring instead to supply them as input-output parameters. However, most modern compilers provide an optimisation which does just this.

Below is the same code but showing the optimisation. Instead of returning the structure the address of the receiving object is (implicitly) passed to the function. At the end of the function the return value is copied into the receiver, negating the need for a temporary return object.

image

In general, then, it is OK to return a struct from a function by value (unless you’re using an ancient C compiler). If you’re not certain (or your compiler doesn’t support this optimisation) it’s probably safer for you to use input-output parameters instead.

Finally, it’s worth noting the small detail that, unlike other languages, a C function can only have one output parameter. You’ll need to use input-output parameters for the rest.

Making the world a better place.

Using these idioms consistently is a very good way to improve the quality of your code. Firstly, it allows the compiler to provide stronger checking on your code. Second, it gives the reader extra information about how to use your functions and what guarantee (or promise) they can expect from them.

You may have noticed I’ve ignored arrays in this article. Check out this blog post for passing arrays to functions.

In summary:

image

 

* Or, hokey-pokey if you prefer.

Shock horror! I learned something about arrays in C

November 28th, 2013

Every so often you pick up a snippet of information that completely changes the way you view things. This week, it’s the use of arrays as function parameters.

At first glance the code horrified me (as I’m sure it will horrify some of you out there!) but as I’ve played with it I can see real merit in the technique.

Arrays, pointers and syntactic sugar

In C there is a close (if somewhat messy!) relationship between arrays and pointers. As far as the C compiler is concerned an array is merely a contiguous sequence of objects (all of the same type). Pointer arithmetic semantics ensure that elements can be accessed as offsets from the array’s base address. The array (‘[]’) notation is syntactic sugar to hide these details from the programmer:

image

Arrays as function parameters

When passing arrays to functions (as parameters) things can get a little confusing. It is not possible to pass an array by-value to a function, so the function process_array() below does not make a copy of the array:

image

The array parameter degrades to a pointer – the address of the first element; so we could (and many C programmers do) just as legitimately write the following and get the same result:

image

In fact, all these declarations for process_array() are semantically identical; the code generated is the same in each case:

image

A word of warning here: as we discussed above the array name yields a constant value that is the address of the first element. In the case of a function parameter, though, we could easily delude ourselves:

image

What looks like an array name is (of course) just a (non-const) pointer; which can be modified, either deliberately or accidently.

Once inside our function, very commonly we wish to know the number of elements in the array. The sizeof() operator yields the amount of memory an object occupies; so for an array this is the number of elements times the size of the element. A simple piece of macro programming can yield the number of elements:

image

However, within our function we may not get the answer we expect:

image

In a 32-bit architecture we’ll always get an answer of 1, irrespective of the actual number of elements in the array!

If we re-write the function to the (semantically-identical, remember!) equivalent and expand the macro:

image

We’re dividing the size of a pointer by the size of an int. Oops.

Because of this it is normal practice to pass an additional parameter, the number of elements in the (supplied) array:

image

Is there any other way?

An alternative (I’m loathe to use the word ‘better’ here) approach is to use the mechanism preferred for all large data types – pass-by-pointer.

The syntax for passing an array by pointer is a little unwieldy due to C’s precedence rules:

image

The function signature declares that process_array() is expecting a pointer to an array of (exactly!) 10 uint32_t objects.

To call the function, you must pass the address of the array (just as you would with a struct):

image

This may cause confusion for some readers – They’re thinking “Hang on! The array name yields the address of the array!  Why isn’t he just calling the function with the name of the array?”.  Remember: the array name yields the address of the first element (and will be, in our case, of type uint32_t*).  We need a pointer to an array (of 10 uint32_t) so we must use the address-of operator (which will yield a pointer of type uint32_t ( * )[10]). 

The array-pointer is strongly typed, so the following code will fail to compile:

image

In case you were wondering, the following code will now work as expected (although, since you are specifying the expected size of the array it is a little redundant):

image

Although unusual for arrays (that is, not used often) this approach has a number of benefits:

  • The function declaration explicitly states the size of the array it is expecting.
  • It allows compile-time type checking
  • It is consistent with passing structs
  • The ARRAY_SIZE macro can be used on the function parameter (correctly)

The above code is actually the preferred mechanism for passing arrays in MISRA-C++ 2008 (Rule 5-2-12) but is not included in the MISRA-C 2012 rules (for some reason).

Casting – what could possibly go wrong?

September 27th, 2013

Type casting in C++ is a form of what is known in computer science as type punning – that is, circumventing the type system of a programming language.

C++ inherits its conversion and casting mechanism from C, but supplements it (although sensibly we should say, replaces it) with four, more explicit cast operations:

  • static_cast
  • reinterpret_cast
  • const_cast
  • dynamic_cast

In C and C++ – and particularly in embedded systems – casting is a necessary evil; so much so that many programmers just accept it as part of everyday programming practice.

So then: why is casting ‘evil’? Basically, because every time you do a type cast you are opening up your program to potentially unpredictable or unexpected behaviour. Let’s have a look at the four type-cast operators and the fun and games they can unleash on the unsuspecting.

 

static_cast<>

The static_cast operator converts between different object types; for example between a double and an int. So what you are effectively saying is

“I’m about to squeeze a big object into a smaller one, so you should probably make sure the receiving object is big enough to hold the values it’s going to get.”

Or:

I’m about to force a floating point number into an integer and all those decimal places (that are probably quite important) are going to be lost”

Of course, you could also be saying:

“I’m about to put the contents of a small object into an object capable of holding much larger values (or with greater precision)”

(which is emphasising a bit of a non-problem, really)

Thankfully, C++ doesn’t let you type-cast between different class types unless you’ve defined your own explicit conversion functions (which – hopefully – should do a sensible conversion). But that’s for another time.

 

reinterpret_cast<>

reinterpret_cast is used in two ways:

  • To convert a pointer-to-type into a pointer-to-different-type
  • To convert an integer type to a pointer type; or vice versa.

When reinterpret_cast appears in code it tells the reader:

“I’m going to take the object address you gave me and treat it as a completely different type, with different memory layout and different behaviour(s). You should make sure it’s capable of supporting what I want to use it for.”

Or, in another usage:

“That (random) number you gave me? I’m going to use it as the address of an object. You’d probably better make sure it’s a valid address, in a reachable region of memory; unless you’re a big fan of segmentation faults.”

 

const_cast<>

The const_cast operator removes the const-ness of an object; that is, it makes a read-only object writeable.

Significantly for embedded programmers, const_cast removes any cv (const-volatile) qualification the original object may have. This means a volatile object – for example, one used to represent a hardware register in an embedded system – can have that qualification removed with const_cast.

Using const_cast says:

“The object you didn’t want me to change? I might (accidently) change it without your consent.”

Or, perhaps in an embedded system:

“The compiler might now optimise away any reads or writes to that object you gave me. Be prepared for behaviour NOT to happen as you expect!”

 

dynamic_cast<>

The dynamic_cast operator is a special case here, in that it is used for ‘safe down-casting’ – that is, casting a pointer-to-base-type to a pointer-to-derived-type, whilst checking whether this is, in fact, a valid cast. dynamic_cast uses Run-Time Type Identification (RTTI) to ensure the types of the pointers are valid. Thus, unlike the other cast operators, dynamic_cast is a run-time check and has associated overheads. If the pointer types are not compatible dynamic_cast returns 0 (for pointers) or throws an exception (for references).

dynamic_cast also has a role in multiple inheritance, where a class has two or more base classes. The dynamic_cast operator allows you to cast a pointer of one base class type to another. Although this is basically a variation on safe down-casting we tend to use the term ‘cross-casting’. Cross-casting is commonly encountered when a class realises (inherits from) two or more interface (pure virtual) classes.

In your code this means:

“I need to access the extended interface of a particular derived type. You’d better be prepared to deal with the consequences of the derived type NOT being what I want.”

Or:

“I need to know if the object you’ve supplied supports some other – possibly completely different – set of characteristics. ”

 

So – don’t use type casts?

Obviously, it’s impracticable (if not impossible) to write code with no type casting; especially in embedded systems. I leave you with the following guidance:

  • Don’t cast if you don’t need to.
  • Think about the consequences of what the cast is (potentially) doing.
  • Leave a big, obvious comment documenting why we’re doing something so potentially dangerous.

The Rule of the Big Five

September 13th, 2013

The dynamic creation and destruction of objects was always one of the bugbears of C. It requires the programmer to manually control the allocation, initialisation and deallocation of memory for the object. Because many C programmers weren’t educated in the potential problems (or were just plain lazy or delinquent in their programming) C got a reputation in some quarters for being an unsafe, memory-leaking language.

C++ improved matters significantly with an idiom known as RAII/RRID; more generically referred to as resource management. Resource management frees the client from having to worry about the lifetime of the managed object, potentially eliminating memory leaks and other problems in C++ code.

However, introducing resource management can lead to potential problems, particularly if the ‘manager’ objects are passed around the system. These problems led to the need for establishing a ‘copy policy’ for each of your types, sometimes referred to as ‘The Rule of the Big Three’. C++11 further complicated this by introducing move semantics.

This whitepaper explores the copy and move semantics of C++ and introduces a policy we call ‘The Rule of The Big Five’.

The whitepaper can be downloaded from here

Example source code for Visual Studio 2012 and GCC can be downloaded from GitHub.

%d bloggers like this: