
Importing IAR EW 5.4 Projects into Parasoft C++test

November 17th, 2010

Background

Recently I have been experimenting with Parasoft’s C++test tool for static analysis of C and C++ code. As part of this I went through the process of importing an existing C project developed in IAR’s Embedded Workbench toolset. Even though importing a project and checking it against MISRA-C isn’t too taxing, I thought I would share my notes for doing this.


EMBEDDED PROGRAMMERS’ GUIDE TO THE ARM CORTEX-M ARCHITECTURE

October 13th, 2010

At Embedded Live 2010 I shall be presenting a half-day tutorial entitled “EMBEDDED PROGRAMMERS’ GUIDE TO THE ARM CORTEX-M ARCHITECTURE”.

Feabhas have been training embedded software engineers in languages and architectures for the last 15 years. For the last decade we have been using ARM-based target systems for all our programming courses (C, C++ and testing, on the ARM7TDMI) and for our embedded Linux courses (ARM926). However, with the development and release of the new generation of Cortex micros, we are moving our training over to Cortex-M for the languages and Cortex-A for Linux.

As part of this exercise we have to spend lots of time getting to know the Cortex microprocessors in detail, looking at different implementations and various support tools and environments.

The majority of supporting material around the new generation of ARM Cortex-M architectures (M0, M3 & M4) unsurprisingly focuses heavily on the key hardware specifics of the microcontroller core, with most coding examples written in Thumb-2 assembler. However, the majority of programming for the Cortex will be done in C (a recent VDC report showed that C is still head-and-shoulders above other languages for embedded programming).

Core Features

This class looks at all the really useful features added to the Cortex-M that make it a truly excellent target environment for the embedded software engineer. As a simple example, many embedded processors (e.g. the ARM7) do not support integer division in hardware, so division is typically handled by an intrinsic library function call or by compiler ‘tricks’.

The Cortex-M3 adds signed and unsigned integer division instructions (SDIV and UDIV), which the compiler can also use to implement the modulo operation (x % y) by following the division with a multiply-subtract.
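The remainder can be derived from the quotient because C guarantees that (x / y) * y + x % y equals x whenever the quotient is representable. The sketch below makes that explicit. `mod_from_div` is a hypothetical name, and the comments only indicate the kind of instructions a Cortex-M3 compiler might emit (a divide followed by a multiply-subtract); the exact code generated depends on the compiler and its options.

```c
#include <stdint.h>

/* Hypothetical helper: computes x % y using only a division and a
 * multiply-subtract, the identity a compiler can rely on when the
 * target has a hardware divide. y must be non-zero. */
int32_t mod_from_div(int32_t x, int32_t y)
{
    int32_t q = x / y;      /* hardware divide (e.g. SDIV on a Cortex-M3) */
    return x - q * y;       /* multiply-subtract leaves the remainder */
}
```

For example, mod_from_div(17, 5) returns 2, the same as 17 % 5.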

There are many other features that I shall cover, including unaligned transfers, bit-banding and the new, improved interrupt architecture, the Nested Vectored Interrupt Controller (NVIC).

However, there are three other significant supporting technologies that really help the software engineer.

  1. Cortex Microcontroller Software Interface Standard (CMSIS)
  2. Debug Support
  3. RTOS Support

CMSIS

Simply put, CMSIS is a collection of source files (.c, .h and assembler) that provides a minimal board support package (BSP) for Cortex-M series processors. Very usefully, it defines a common way to access peripheral registers and to define exception vectors. It also defines the register names of the Core Peripherals and the names of the Core Exception Vectors. So, instead of having to spend time and effort defining structs for the registers of on-board devices (or hoping your development environment has already done this for you), you can be assured that they already exist. For example, the NXP LPC17xx family of microcontrollers supports a watchdog timer. Because the LPC17xx is CMSIS compliant, the supplied header LPC17xx.h defines its register layout and the necessary #defines.
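The general CMSIS style can be sketched as a struct whose members mirror a peripheral's registers, overlaid on the peripheral's base address. Everything named here (`EXAMPLE_WDT_TypeDef`, its register names and layout, `EXAMPLE_WDT_BASE`, `example_wdt_start`) is hypothetical and not taken from LPC17xx.h. So that the sketch runs on a host PC, a static array stands in for the real peripheral.

```c
#include <stdint.h>

/* Hypothetical watchdog register block in the CMSIS style. The names and
 * layout are illustrative only; real ones come from the vendor's header. */
typedef struct {
    volatile uint32_t MOD;    /* mode register          (offset 0x00) */
    volatile uint32_t TC;     /* timeout constant       (offset 0x04) */
    volatile uint32_t FEED;   /* feed sequence register (offset 0x08) */
    volatile uint32_t TV;     /* current timer value    (offset 0x0C) */
} EXAMPLE_WDT_TypeDef;

/* On a real part this would be a fixed address from the datasheet, e.g.
 *   #define EXAMPLE_WDT ((EXAMPLE_WDT_TypeDef *) EXAMPLE_WDT_BASE)
 * Here a static array simulates the peripheral so the code runs on a host. */
static uint32_t simulated_wdt[4];
#define EXAMPLE_WDT ((EXAMPLE_WDT_TypeDef *) (void *) simulated_wdt)

/* Hypothetical driver call: set the timeout, then enable the watchdog. */
void example_wdt_start(uint32_t timeout_ticks)
{
    EXAMPLE_WDT->TC  = timeout_ticks;
    EXAMPLE_WDT->MOD = 0x1u;          /* hypothetical "enable" bit */
}
```

On real hardware the only change would be the macro pointing at the datasheet's base address instead of the simulated array; application code such as example_wdt_start() stays the same, which is the point of a common register-access convention.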

Debug Support

JTAG units, such as the Keil ULINK, have made target programming and source-level debug very affordable. However, for small pin-count micros, 4-wire JTAG is seen as quite an expensive option (in terms of pure pin count). The Cortex-M core also supports a new Serial Wire Debug (SWD) interface. Its advantage is that it requires only 2 wires, which makes it very easy and affordable to support debug (and power) over a simple USB connection.

At the other end of the spectrum, ARM have added the option of an Embedded Trace Macrocell (ETM) unit, which enables features such as debugging events in real-time systems where the target cannot be halted, software profiling and code coverage.

RTOS Support

As someone with a long background in Real-Time Operating Systems, I was very interested to discover how ARM has made it simpler and easier for an RTOS vendor to support the Cortex-M. As you can guess, CMSIS is a huge step forward, as it means that once an RTOS has been ported using CMSIS, the core aspects will work on, say, all Cortex-M3 implementations.

As a simple example, pretty much every RTOS requires a time-frame reference (the “tick” timer) for timeouts, delays, etc. ARM has integrated this directly into the core (called SysTick) rather than each silicon vendor having to implement their own count-up or count-down variant. There are already 20+ RTOSes running on the Cortex-M.
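SysTick is a 24-bit down-counter that reloads from its reload register each time it reaches zero, so the reload value for a given tick rate is the core clock divided by the tick rate, minus one, and it must fit in 24 bits. CMSIS provides SysTick_Config() to program the timer itself; the host-runnable sketch below only does the arithmetic. The function name `systick_reload_for` and the 100 MHz clock in the usage note are assumptions for illustration.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFUL   /* the SysTick reload register is 24 bits wide */

/* Hypothetical helper: compute the SysTick reload value for a tick rate.
 * Returns 0 on success, or -1 if the rate cannot be met with a 24-bit reload. */
int systick_reload_for(uint32_t core_clock_hz, uint32_t tick_hz, uint32_t *reload)
{
    uint32_t ticks;

    if (tick_hz == 0u || tick_hz > core_clock_hz) {
        return -1;
    }
    ticks = core_clock_hz / tick_hz;      /* core clock cycles per RTOS tick */
    if (ticks - 1u > SYSTICK_MAX_RELOAD) {
        return -1;                        /* tick too slow for this clock */
    }
    *reload = ticks - 1u;                 /* counts reload..0, i.e. 'ticks' cycles */
    return 0;
}
```

With an assumed 100 MHz core clock and a 1 kHz tick, the reload value is 99999; a 1 Hz tick at the same clock would need 99999999, which does not fit in 24 bits.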

Also, an optional part of the Cortex-M3/M4 core is a Memory Protection Unit (MPU). An RTOS can make use of this to create a safer multitasking platform without the expense of a full-blown MMU.

Finally, what makes the Cortex-M so attractive from an embedded software engineer’s perspective is the abundance of low-cost evaluation kits, such as mbed, LPCXpresso, STM32 Value line Discovery, Energy Micro Gecko Starter Kit, and Actel’s SmartFusion, to name just a small selection.

I hope to see you at Embedded Live 2010. If so, please come and say hello.

Scope and Lifetime of Variables in C

September 27th, 2010

In a previous posting we looked at the principles (and peculiarities) of declarations and definitions. Here I would like to address the concepts of scope and lifetime of variables (program objects to be precise).

In the general case:

  • The placement of the declaration affects scope
  • The placement of the definition affects lifetime

Lifetime

The lifetime of an object is the period during which memory is reserved for it while the program is executing. There are three object lifetimes:

  • static
  • automatic
  • dynamic

Given the following piece of code:

#include <stdlib.h>

int global_a;         /* tentative defn; becomes actual defn, initialised to 0 */
int global_b = 20;    /* defn and implicit-decl */

int f(int* param_c)
{
   int local_d = 10;
   /* ... */
   return local_d;
}

int main(void)
{
   int *ptr = malloc(sizeof(int)*100);
   if (ptr == NULL) {
      return 1;       /* allocation failed */
   }
   /* ... */
   global_a = f(ptr);
   /* ... */
   free(ptr);
   return 0;
}

  • global_a and global_b are static.
  • The memory allocated by the call to malloc is dynamic.
  • All others (including param_c, ptr and the return value from function f) are automatic.

Static Objects

The memory for static objects is allocated at compile/link time. Their address is fixed by the linker based on the linker control file (LCF).  You may know this file by another name such as linker-script file, linker configuration file or even scatter-loading description file. The LCF file defines the physical memory layout (Flash/SRAM) and placement of the different program regions.

The static region is actually subdivided into two further sections, one for initialised-definitions (int global_b = 20;) and one for uninitialised-definitions (int global_a;). So it should be no surprise if global_a and global_b are not adjacent to each other in SRAM. The uninitialised-definitions’ section is commonly known as the .bss or ZI section. The initialised-definitions’ section is commonly known as the .data or RW section.
Finally, the initial value of global_a will be zero (0) and that of global_b will be 20.
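How do these objects get their values before main() runs? On a typical embedded target, the C start-up code copies the initial values of the .data section from their image in Flash into SRAM, and clears the .bss section to zero. The sketch below simulates that on a host. The arrays are stand-ins for the linker-defined section boundaries (whose real names vary between toolchains), and `simulated_startup` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for the linker-placed sections (real symbols are toolchain specific). */
static const uint32_t flash_data_image[] = { 20u }; /* initial values kept in Flash  */
static uint32_t sram_data[1];                       /* .data / RW, e.g. global_b     */
static uint32_t sram_bss[1];                        /* .bss / ZI, e.g. global_a      */

/* Hypothetical start-up routine: roughly what happens before main(). */
void simulated_startup(void)
{
    /* Copy the initialised-definitions from their Flash image into SRAM. */
    memcpy(sram_data, flash_data_image, sizeof sram_data);
    /* Zero the uninitialised-definitions section. */
    memset(sram_bss, 0, sizeof sram_bss);
}
```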

Automatic objects

The majority of variables are defined within functions and classed as automatic variables. This also includes parameters and any temporary-returned-object (TRO) from a non-void function, e.g.

int f(int* param_c)  /* tro(int) and parameter(param_c) */
{
   int local_d = 10; /* local variable */
   /* ... */
   return local_d;   /* copy local_d to tro */
}

The default model in general programming is that the memory for these program objects is allocated from the stack. For parameters and TROs the memory is normally allocated by the calling function (by pushing values onto the stack), whereas for local objects memory is allocated once the function is called. This key feature enables a function to call itself, i.e. recursion (though recursion is generally a bad idea in embedded programming as it may cause stack-overflow problems).
In this model, automatic memory is reclaimed by popping the stack on function exit.
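Because each call gets its own automatic storage, every active level of a recursive call has its own separate copy of its local variables. The sketch below records the address of a local at each level and compares them at the deepest level, while all the frames are still alive; `distinct_locals` and the fixed `DEPTH` are assumptions for the example, and the depth is kept small and bounded in keeping with the caution about stack use.

```c
#define DEPTH 4

static const int *addrs[DEPTH];   /* address of 'local' at each level */

/* Hypothetical example: returns 1 if every level of the recursion had its
 * own, separate 'local' object, 0 otherwise. Call with level 0. */
int distinct_locals(int level)
{
    int local = level;            /* a fresh automatic object for each call */
    int result;

    addrs[level] = &local;
    if (level == DEPTH - 1) {
        int i, j;
        /* All DEPTH frames are still alive here, so their locals coexist. */
        for (i = 0; i < DEPTH; ++i)
            for (j = i + 1; j < DEPTH; ++j)
                if (addrs[i] == addrs[j])
                    return 0;
        return 1;
    }
    result = distinct_locals(level + 1);
    return result && local == level;  /* this level's copy was not disturbed */
}
```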

Within a function, variables may be localised to a block associated with a control structure, e.g.

for(x = 0; x < N; ++x) {
   int block_y = 0;   /* nested local variable */
   /* ... */
}

Here the memory is allocated on entry to the block and reclaimed on exit.
However, on most modern microcontrollers, especially 32-bit RISC architectures, automatics are held in CPU registers where possible rather than on the stack. For example, the ARM Architecture Procedure Call Standard (AAPCS) defines which core registers carry the arguments into a function (r0 to r3) and its results (r0 and r1), and which registers must be preserved across a call, which in turn determines where the compiler can keep local variables.

Importantly, if an automatic is not explicitly initialised, then its initial value is indeterminate (i.e. garbage), and therefore it should never be read before being set. If the automatic is explicitly initialised, then it is reinitialised each time the function (or block) is entered.
The location and size of the stack are typically defined using the LCF.
Finally, there still are the (historic) keywords auto and register that can be applied to automatics. Both are pretty much redundant in modern programming.

Dynamic Objects

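Strictly speaking, the C90 standard names only two storage durations (static and automatic) and describes dynamically allocated memory separately, in its library section; C99 calls this third category allocated storage duration. Either way, it is important to differentiate between this type of object and automatics for two reasons: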

  1. The memory is allocated from a different memory area (the heap not the stack)
  2. The lifetime is under the control of the programmer rather than the C run-time system.

The functions malloc, calloc and realloc return an address (void*) of a block of dynamically allocated memory, or a null pointer if the request cannot be satisfied. The lifetime of this memory runs from allocation until it is released by a call to free or realloc.

The realloc function takes an allocated memory block and expands (or contracts) it to a bigger (or smaller) size. This may involve moving the block of memory and copying over the old contents, in which case the old block is automatically freed.

The contents of the memory returned from malloc are indeterminate, whereas for calloc the memory is initialised to all zeros. If realloc expands the allocated memory area, then the contents of the extra, expanded area are indeterminate.
The size and location of the heap are also usually defined in the LCF.
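The following sketch walks through calloc, realloc and free. One detail worth adding: if realloc cannot satisfy the request it returns a null pointer and leaves the original block allocated and unchanged, so assigning its result straight back to the only pointer you hold would leak that block; the usual idiom keeps the result in a temporary first. The function name and the sizes are arbitrary choices for the example.

```c
#include <stdlib.h>

/* Hypothetical example: returns 1 if every step behaved as described. */
int allocation_demo(void)
{
    size_t i;
    int ok = 1;
    int *tmp;
    int *p = calloc(4, sizeof *p);      /* calloc: all four ints start at zero */

    if (p == NULL) {
        return 0;
    }
    for (i = 0; i < 4; ++i) {
        if (p[i] != 0) ok = 0;
        p[i] = (int)i + 1;              /* now 1, 2, 3, 4 */
    }

    tmp = realloc(p, 8 * sizeof *p);    /* grow the block: it may move */
    if (tmp == NULL) {
        free(p);                        /* original block is still valid: release it */
        return 0;
    }
    p = tmp;                            /* if it moved, the old block is already freed */
    for (i = 0; i < 4; ++i) {
        if (p[i] != (int)i + 1) ok = 0; /* old contents were carried over */
    }
    /* p[4]..p[7] are indeterminate and must be written before being read. */

    free(p);                            /* end of the object's lifetime */
    return ok;
}
```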

Programming errors involving not releasing dynamically allocated memory (memory leaks) have been, and still are, a major source of run-time errors. This is why most modern languages use garbage collection (which limits their applicability to many real-time embedded applications) and why many coding standards, such as MISRA-C, ban dynamic memory allocation.

Static local variables

Before we leave lifetimes, there is one further anomaly. The keyword static can be applied to a local variable, e.g.

#include <stdio.h>
void f1(void)
{
   static int slocal = 10;        /* static local */
   int alocal = 10;               /* automatic local */
   printf("In f1: slocal = %d, alocal = %d\n", slocal, alocal);
   ++slocal;
}

int main(void)
{
   f1();
   f1();
   f1();
   return 0;
}

Applying static to a local variable changes the object’s lifetime from automatic to static. This means that the memory is allocated at compile/link time and its address in memory is fixed. Because the memory is static, these local variables retain their value from function call to function call. The local static is initialised only once (in C, before program start-up, like any other static object) rather than on each call of the function. So, given the example above, the output is:
In f1: slocal = 10, alocal = 10
In f1: slocal = 11, alocal = 10
In f1: slocal = 12, alocal = 10

Local statics may look useful; however, they cause major problems when porting code to a multitasking/multithreaded environment, and should generally be avoided where possible.
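The problem does not even need threads to show up. A function that returns a pointer to its own static buffer (a pattern also found in some standard library functions, such as asctime) hands every caller the same storage, so a second call overwrites the first result. The function below is hypothetical; in a multitasking system the same overwrite can happen when two tasks call it concurrently, with no guarantee about which result either task sees.

```c
#include <stdio.h>

/* Hypothetical helper: formats a value into a static local buffer. */
const char *format_reading(int value)
{
    static char buf[32];                    /* one buffer shared by every call */
    sprintf(buf, "reading=%d", value);
    return buf;
}
```

After `first = format_reading(1); second = format_reading(2);` both pointers refer to the same buffer, which now holds "reading=2".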

Scope

The scope of an identifier is the part of the program where it can be accessed (i.e. where it is visible). For variables, scope generally falls into one of two categories:

  • File scope
  • Block scope

As explained in the posting on declarations and definitions, a variable must be declared before it is accessed. Hence the scope of a variable is determined by the placement of its declaration. Returning to the previous example (slightly modified):

#include <stdlib.h>

int global_a;       /* Decln and Defn */

int f(int* param_c)
{
   int local_d = *param_c;      /* automatic local */
   static int local_s = 10;     /* static local    */
   /* ... */
   local_s = global_a;
   /* ... */
   return local_d;
}

int main(void)
{
   int *ptr = malloc(sizeof(int)*100);
   if (ptr == NULL) {
      return 1;
   }
   ptr[0] = 0;                  /* give f a defined value to read */
   global_a = f(ptr);
   /* ... */
   free(ptr);
   return 0;
}

In the example given, identifier global_a has file scope, whereas all other variables have block scope.

File Scope

Any variable declared with file scope can be accessed by any function defined after the declaration (in our example, both f and main can access global_a). If global_a were declared after the function f but before main, it would only be accessible within main.

Block Scope

Block scope is defined by the pairing of the curly braces { and }. A variable declared within a block can only be accessed within that block. For example, local_d has block scope determined by the function-block for f and cannot be accessed outside that function. The variable ptr also has function-block scope limited to the main function. Note also that the local static, local_s, has block scope even though it has static lifetime.
Interestingly, the parameter of function f, param_c, is also classed as having block scope. It can be accessed anywhere within the function of which it is a parameter. Personally I would prefer to call this “function” scope, but that would be incorrect according to the standard, which reserves function scope for goto labels!

Within a function further localised (inner) scopes can be introduced, e.g.

for(x = 0; x < N; ++x) {
   int block_y = 0;
   /* ... */
}

Here, block_y is scoped to within the for-loop (i.e. it cannot be accessed in the for-expression region or outside of the for-block).

In a file and/or function we can have overlapping scopes, e.g.

#include <stdio.h>

int k = 20;
int main(void)
{
   int k = 10;
   printf( "In main, k is %d\n", k);
   return 0;
}

The rule is that an inner scope identifier always hides an outer scope identifier. Hence, the block-scoped identifier k hides the file-scoped identifier k (and thus the value displayed will be ten). Note that the file-scoped k is still in scope but is rendered invisible. It is generally bad practice to have variables with overlapping scopes.
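The hiding rule can be seen directly: inside the inner block the name k refers to the local object, and once that block ends the file-scope k is visible again. The function `which_k` is an assumption for the example.

```c
int k = 20;                     /* file scope */

/* Hypothetical example: reports which k is seen at each point. */
void which_k(int *inner, int *outer)
{
    {
        int k = 10;             /* block scope: hides the file-scope k */
        *inner = k;             /* the local k, i.e. 10 */
    }                           /* the local k's scope (and lifetime) ends here */
    *outer = k;                 /* the file-scope k is visible again, i.e. 20 */
}
```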

Good programming practice limits scope as much as possible. By localising scope, the potential for programming errors to creep in is significantly reduced.

Scope of Dynamic Objects

So it can be seen that, in the general case, file-scope objects have static lifetime and block-scope objects have automatic lifetime (local statics being the exception). But what about the scope of dynamic objects?
A dynamic object doesn’t actually have scope, as such. In effect, its scope is dictated by the scope of any pointer holding the address of the dynamically allocated memory. As long as the pointer is in scope it can be dereferenced and the memory accessed.
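A short illustration: the pointer local to the allocating function goes out of scope when that function returns, but the block it points to lives on and remains reachable through whatever pointer the caller keeps. The function name `make_counter` is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: the local pointer 'p' has block scope, but the
 * memory it points to has dynamic lifetime and outlives this call. */
int *make_counter(int start)
{
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = start;
    }
    return p;       /* the address escapes; 'p' itself ceases to exist */
}
```

The caller now owns the memory and is responsible for calling free on it; if the last pointer holding the address goes out of scope before that happens, the memory leaks.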

External and Internal Linkage

Before leaving scope there is one final item to address. By default, a variable with file scope can be accessed by any function in the whole program (i.e. also from files other than the one in which it is defined), as long as a declaration of it is in scope for that function.
If a variable is defined with file scope in one file, but is required in another, then it can be brought into scope using the “extern” storage-class specifier, e.g.

/* file a.c */
int global_a = 10;       /* definition of global_a */

int f(int* param_c)
{
   int local_d = *param_c;
   static int local_s = 10;
   /* ... */
   local_s = global_a;
   /* ... */
   return local_d;
}

/* file main.c */
#include <stdlib.h>

extern int global_a;    /* declaration of global_a, now visible */
int f(int*);

int main(void)
{
   int *ptr = malloc(sizeof(int)*100);
   if (ptr == NULL) {
      return 1;
   }
   ptr[0] = 0;
   global_a = f(ptr);  /* global_a is visible so can be accessed */
   /* ... */
   free(ptr);
   return 0;
}

Quite often we need a variable with static lifetime that should not be globally accessible (i.e. its use should be limited to functions in the current file), but that we don’t want to define as a local static because several functions in the file need it.
To achieve this we can use the keyword static again, but this time applied to a file-scope variable, where it affects linkage (and so visibility from other files) rather than lifetime. A file-scope variable tagged as static has what is called internal linkage, e.g.

/* file a.c */
int global_a = 10;         /* external linkage – visible program-wide */
static int internal_b;     /* internal linkage – visible only in this file */

int f(int* param_c)
{
   int local_d = *param_c;  /* block scope, automatic */
   static int local_s = 10; /* block scope, static    */
   /* ... */
   local_s = global_a;
   ++internal_b;            /* only code in a.c can use internal_b */
   /* ... */
   return local_d;
}

If another file declared internal_b as extern and then used it, linking would fail with an unresolved reference, because the internal_b in a.c is not visible outside that file.
Note that internal linkage can also be applied to functions. All functions have external linkage by default, so it is very good practice to declare a function as static if it is only used within the current file.
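A sketch of that pattern as it might appear in a single file: the counter and the helper are static, so they have internal linkage, and other files can reach them only through the externally visible functions. All the names here are hypothetical.

```c
/* file counter.c (hypothetical) */

static unsigned int event_count;     /* internal linkage, static lifetime, starts at 0 */

/* Internal linkage: a helper that only this file can call. */
static unsigned int clamp_step(unsigned int step)
{
    return (step > 10u) ? 10u : step;
}

/* External linkage: callable from other files via a prototype. */
void counter_add(unsigned int step)
{
    event_count += clamp_step(step);
}

/* External linkage: the only way other files can read the counter. */
unsigned int counter_value(void)
{
    return event_count;
}
```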

Next time: Why understanding Scope and Lifetime is important to embedded programming

Polymorphism in C++

May 21st, 2010

The term polymorphism is central to most discussions in and around object oriented design and programming. However I find that many people are still confused or don’t have a complete understanding of the advantages and disadvantages of using polymorphism.

I have heard many different simplified definitions of the root term for polymorphism, usually relating to chemistry or biology. Rather than trying to justify the name, I’ll give you my very simplistic definition from a software perspective. Simply put, polymorphism means:

Multiple functions with the same name.

Yep, as simple as that.

Most C programmers don’t realise they have been using polymorphic operations since they started programming. Take, for example, the following code:

b + c

That’s a polymorphic expression. Why? Well, we know nothing about the types of b and c. If b and c are of type int, then the code generated is significantly different from the code generated if they are double.
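The same is true of the other arithmetic operators. With /, for example, the compiler chooses integer or floating-point division purely from the operand types, at compile time, which makes it a neat illustration of static polymorphism in plain C. The function names are just for the example.

```c
/* Same operator, different operation, chosen by the compiler from the types. */
int    int_divide(int b, int c)          { return b / c; }  /* integer division, truncates */
double double_divide(double b, double c) { return b / c; }  /* floating-point division */
```

So int_divide(7, 2) gives 3, while double_divide(7.0, 2.0) gives 3.5.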

But, I hear you shout, what about virtual functions and all that?

So herein lies one of the main problems. When most people use the term polymorphism they are actually referring to Dynamic Polymorphism. The expression b + c is related to Static Polymorphism.

With static polymorphism, the actual code to run (or the function to call) is known at compile time. C++ overloading is a form of static polymorphism, e.g.

void swap(int* a, int* b)       { int t = *a; *a = *b; *b = t; }
void swap(double* a, double* b) { double t = *a; *a = *b; *b = t; }

int main()
{
   int x = 10, y = 20;
   double a = 1.2, b = 3.4;
   swap(&x, &y);            // calls swap(int*, int*)
   swap(&a, &b);            // calls swap(double*, double*)
   return 0;
}

Here the compiler, based on the number and types of the arguments, can determine which function to call.

Dynamic polymorphism, which in C++ is achieved through overriding, allows us to determine the actual function (method) to be executed at run-time rather than at compile time.

For example, suppose we are using the uC/OS-II RTOS and have developed a Mutex class, e.g.

class uCMutex
{
public:
   uCMutex();
   void lock();
   void unlock();
private:
   OS_EVENT* hSem;
   INT8U err;
   // not implemented
   uCMutex( const uCMutex& copyMe );
   uCMutex& operator=( const uCMutex& rhs );
};

We have also implemented a very simple stack class (note this code is just for explanation purposes and has many shortcomings) that requires mutual exclusion. It may look something along the following lines:

#include <cstring>   // for std::memset

class myStack
{
public:
   myStack();
   bool push(int val);
   int pop();
private:
   static const int sz = 10;
   int m_stack[sz];
   unsigned int count;
   uCMutex tm;   // uCMutex Object
};
myStack::myStack() : count(0)    // tm is default-constructed
{
   std::memset(m_stack, 0, sizeof(m_stack));
}
bool myStack::push(int val)
{
   bool retval = false;
   tm.lock();    // LOCK
   if (count < sz) {
      m_stack[count++] = val;
      retval = true;
   }
   tm.unlock();     // UNLOCK
   return retval;
}
int myStack::pop()
{
   int val = -1;
   tm.lock();       // LOCK
   if (count != 0) {
      val = m_stack[--count];
   }
   tm.unlock();     // UNLOCK
   return val;
}

If, then, in our new design we are going to use VxWorks rather than uC/OS-II, our stack class would require reworking, thus:

class VxMutex
{
public:
   VxMutex();
   void lock();
   void unlock();
private:
   …
};
class myStack
{
private:
   ...
   VxMutex tm;
};

Even though the change from the uC/OS-II mutex to the VxWorks mutex class is within the private part of the stack class, this still has many detrimental knock-on effects. Significantly, we have changed the stack class’s definition, so all files that use the stack now need recompiling. This, in turn, has a knock-on effect on the amount of regression testing that is required.

An alternative strategy is to use dynamic polymorphism and interfaces to make our code more testable and reusable. So, by defining an interface class for the mutex abstraction:

class iMutex
{
public:
   iMutex(){}
   virtual ~iMutex(){}
   virtual void lock() = 0;      // pure virtual function
   virtual void unlock() = 0;    // pure virtual function
private:
   // not implemented
   iMutex( const iMutex&);
   iMutex& operator=( const iMutex&);
};

We can alter the stack code so the mutex object is passed in as a constructor parameter (also the mutex classes require changes to inherit from the iMutex interface):

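class uCMutex : public iMutex
{
public:
   uCMutex();
   virtual void lock();      // overrides iMutex::lock()
   virtual void unlock();    // overrides iMutex::unlock()
   // private members as before
};
class VxMutex : public iMutex
{
   // as for uCMutex, but implemented using VxWorks calls
};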
class myStack
{
public:
   explicit myStack(iMutex& m);
   bool push(int val);
   int pop();
private:
   static const int sz = 10;
   int m_stack[sz];
   unsigned int count;
   iMutex& tm;   // Mutex Reference
};

Our main code now becomes:

uCMutex ucm;
myStack ms(ucm);

or

VxMutex vxm;
myStack ms(vxm);

This is dynamic polymorphism in operation. Depending on the actual object passed (vxm or ucm), the actual code called when

tm.lock();

is executed, will either be VxMutex::lock() or uCMutex::lock().

Dynamic polymorphism is an incredibly powerful construct and, used well, creates code that can easily be adapted in the face of changing requirements with minimal impact.

However it all comes at a cost. The run-time lookup for virtual functions requires additional code and data. Each dynamic polymorphic class requires a virtual table (v-table), and each object of that type a v-table pointer (vtptr). To call the polymorphic function the run-time system requires indexing into the v-table via the object’s vtptr to actually call the function. In certain environments this can be twice as slow as a normal function call.
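To make that cost concrete, here is a rough C analogue of the mechanism: one table of function pointers per "class" and a pointer to that table in each object. This is only a sketch of the general idea; real C++ compilers choose their own layouts, and all the type and function names below are hypothetical. The extra work per call is visible in call_lock(): load the table pointer from the object, load the function pointer from the table, then make an indirect call.

```c
/* Hypothetical hand-written v-table for a mutex "interface". */
struct mutex;                                  /* forward declaration */

struct mutex_vtable {
    void (*lock)(struct mutex *self);
    void (*unlock)(struct mutex *self);
};

struct mutex {
    const struct mutex_vtable *vtptr;          /* one per object */
    int lock_count;                            /* stand-in for real mutex state */
};

static void fake_lock(struct mutex *self)   { ++self->lock_count; }
static void fake_unlock(struct mutex *self) { --self->lock_count; }

/* One table shared by every object of this "class". */
static const struct mutex_vtable fake_mutex_vtable = { fake_lock, fake_unlock };

void fake_mutex_init(struct mutex *m)
{
    m->vtptr = &fake_mutex_vtable;
    m->lock_count = 0;
}

/* The dynamic dispatch: object -> v-table -> function pointer -> call. */
void call_lock(struct mutex *m)   { m->vtptr->lock(m); }
void call_unlock(struct mutex *m) { m->vtptr->unlock(m); }
```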

So how can we get the benefits of dynamic polymorphism, abstracting the code away from how we’re doing it (e.g. VxWorks’ lock call) to what we’re doing (a Mutex lock call), but without the extra overhead of virtual functions?

Well, we have C++ templates. So, modifying the stack class to become:

template <typename mutex_t>
class myStack
{
public:
   myStack();
   bool push(int val);
   int pop();
private:
   static const int sz = 10;
   int m_stack[sz];
   unsigned int count;
   mutex_t tm;   // Mutex Template Instance
};

and main becomes

 myStack<uCMutex> ms;
 ms.push(10);

With template-based code we revert to static polymorphism, as the actual call to tm.lock() will be resolved at compile time, at the possible expense of code readability and increased complexity.

Finally, I have found that the two forms of polymorphism go by a number of different names, e.g.

Dynamic Polymorphism

  • Subtype polymorphism
  • Overriding
  • Late binding

Static Polymorphism

  • Parametric polymorphism
  • Overloading
  • Early binding

Declarations and Definitions in C

January 18th, 2010

Please Note: This post focuses on pre-C99 C. This is because it is aimed at the embedded C programmer, who tends to be working with pre-C99 based cross-compilers. Also, I have split it into two parts as, due to feedback, it became much larger than first anticipated.

On the surface, declarations and definitions in C are pretty straightforward; but once we start introducing the concepts of scope, storage-duration, linkage and namespace, life is not so simple.

Program Objects (Variables)

Let’s start with a general rule for variables:
  1. if the statement has an “=” it’s a definition;
  2. otherwise, if it has “extern” (and no “=”) it’s a declaration;
  3. otherwise (at file scope) it’s a tentative-definition that may become a declaration or an actual-definition.

Object Definitions

Simply put, a definition allocates storage (memory) e.g.
int ev = 20; /* definition – reserves enough memory to hold an int */
Let’s assume from here on that an int occupies 32 bits.

Object Declaration

A declaration gives meaning to an identifier; that is, it defines the type information of the identifier. This allows the compiler to generate correct object code to access the variable based on its size (i.e. the number of bytes to read or write).

Usage

When compiling a source file, a variable must be declared before it is used or it will result in a compiler error.

int main(void)
{
   ev = 10; /* fails to compile as ev has not been declared */
   return 0;
}
int ev = 20; /* definition – allocates 32 bits */

Importantly, an object declaration does not reserve memory, e.g.

extern int ev; /* declaration – no memory reserved but defines sizeof(ev) */
int main(void)
{
   ev = 10; /* okay to use ev as declared; the compiler knows to write (say) 32 bits */
   return 0;
}
int ev = 20; /* definition – memory reserved here and initialised */
Key point 1:
If no declaration is encountered before the definition, then the definition acts as an implicit declaration.

int ev = 20; /* definition and implicit-declaration: reserves memory */
int main(void)
{
   ev = 10; /* okay to use ev as declared (implicitly) */
   return 0;
}

Key point 2:
In a compiled source file there may be only one definition for an identifier, but there may be multiple declarations (as long as they agree).

extern int ev; /* 1st declaration */
extern int ev; /* 2nd declaration */
int main(void)
{
   ev = 10; /* okay to use ev as declared */
   return 0;
}
int ev = 20; /* definition */
In the examples so far, all definitions have included an initialisation and all declarations have used the “extern” keyword. But there is one further concept we need to examine and that is the concept of a tentative definition (this only applies to variables defined outside of functions – more on that later). Take, for example, the following program snippet:

int ev = 20; /* actual definition    */
int td;      /* tentative definition */
int main(void) { ... return 0; }
With a tentative definition, the following rule applies:

If an actual definition is found later in the source file, then the tentative definition just acts as a declaration. If the end of the source file is reached and no actual definition is found, then the tentative definition acts as an actual definition (and implicit declaration) with an initialisation of 0 (zero).

int ev; /* tentative definition becomes declaration */
int td; /* tentative definition becomes actual definition initialised to 0 */
int main(void)
{
   ...
   return 0;
}

int ev = 20; /* actual definition */
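The rule can be checked with a complete translation unit. This is C only: C++ has no tentative definitions and would reject the second definition of ev. The accessor functions are hypothetical, added so the values can be inspected.

```c
int ev;          /* tentative definition: acts only as a declaration...        */
int td;          /* tentative definition: becomes a definition, initialised to 0 */

int read_ev(void) { return ev; }
int read_td(void) { return td; }

int ev = 20;     /* ...because the actual definition of ev appears here */
```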
I’d like to address two more syntactical items before we move on. First, it is perfectly legal to write:
 extern int ev = 20; /* actual-definition */

I’m sure someone can (and will) tell me why this is useful, but in my 25 years of doing C I’ve never had need to use it. In my view anyone found doing this should be made to sit in the corner wearing a hat with a big ‘D’ on it!

Second, it is highly unusual (so unusual that I’ve never seen it used), but the following is also legal syntax:
 extern int(ev);
 int(ev);
 int(ev) = 20;
Before we start looking at such items as scope and linkage, let’s address function declarations and definitions.

Functions

Function declarations and definitions are in many ways simpler than variables. A function definition includes the function’s body. e.g.

void f(int p) /* definition and implicit-declaration */
{
   ...
}

int main(void)
{
   f(10); /* okay to call f as declared */
   return 0;
}

A function’s declaration (typically called its prototype) makes the compiler aware there is a valid function with this identifier. e.g.

void f(int p); /* declaration */

int main(void)
{
   f(10); /* okay to call f as declared */
   return 0;
}

void f(int p) /* definition */
{
   // ...
}
On the call to the function “f” in main, the declaration enables the compiler to construct the correct call frame based on three things:
  1. the validity of the identifier
  2. the storage required to pass any parameters (by stack or register)
  3. the storage required for any return information
At the call, the names of the function’s parameters, if any, are irrelevant (to the compiler), and so can be omitted from the declaration, e.g. void f(int); /* declaration */
It is also legal for the parameter names to differ between the declaration and the definition (though obviously very bad practice).

Before we move on, there are two problem areas we need to cover. First, let’s look at the following snippet:

int main()
{
   f(20); /* call f with no declaration */
   return 0;
}

void f(int i) /* definition and implicit-declaration */
{
   // ...
}

Here we are trying to call a function that hasn’t been declared. As you might expect, this code fails to compile, but not for the reason you probably assume. Earlier I stated that an identifier must be declared before being used, otherwise you get a compiler error. Unfortunately, this only applies to variables and not functions!

With functions, if no declaration is found before its first call, the compiler creates an implicit declaration. As it cannot determine the return type, it assumes an int return type. So for the call
f(20);
the compiler assumes a declaration of
int f();
The compiler error will actually occur at the definition of function “f”, because the implicit declaration and the definition do not agree (the definition is void f(int)). Here I’ll refer to the parts being compared as the function designator; strictly, the standard requires the two declarations to have compatible function types. As the two designators don’t match, the compiler will generate an error of the form:

error: ‘f’ : redefinition; different basic types

If we change f’s return type to int, then this code will compile quite happily.

int main(void)
{
   f(20); /* call f implicit-designator of int f() */
   return 0;
}

int f(int i) /* definition’s designator matches implicit-designator */
{
   // ...
}
Why int as the return type? This is historical baggage. The original specification of C by Kernighan & Ritchie states, regarding function return types:
If the return type is omitted, int is assumed.

This baggage is still evident today, as the following code should compile successfully with a pre-C99 compiler:

int main()
{
   f(20); /* call f implicit-designator of int f() */
   return 0;
}

f(int i) /* definition’s designator has implicit return type of int */
{
   // ...
}
Horrible? Yes (and it’s going to get worse), but all is not lost: any modern compiler worth its salt will issue a warning similar to:

warning: 'f' undefined; assuming extern returning int

Never ignore this warning. Some compilers (such as IAR) provide a non-standard option that requires function prototypes. Note that C++ requires a function to be declared before it is called, thus closing this loophole.

Can it get worse? Oh yes, much worse.

A very common mistake is for C programmers to assume that an empty parameter list means the same as void in the parameter list. Unfortunately, in some cases it does and in others it doesn’t.

With a function definition, an empty parameter list is the same as void.

void f()       /* definition and implicit-decln of void f(void) */
{
   // ...
}

int main()
{
   f(20);       /* error as call doesn’t match decln */
   return 0;
}
However (and here it comes) for declarations this isn’t the case.
void f();      /* declaration */
void f(void);  /* prototype-declaration – not the same as above */
If a declaration has a parameter list (including void) then it becomes a prototype-declaration. The empty list in a function declarator specifies that no information about the number or types of the parameters is supplied. This has a horrible implication; take for example the following code:

void f(); /* declaration */
int main(void)
{
   f(20); /* okay to call f as declared */
   return 0;
}
void f(int i) /* definition */
{
   // ...
}
This is perfectly legal C code, which will compile and run quite happily. The standard states that the number and types of arguments are not compared with those of the parameters in a function definition that does not include a function prototype (I know, I know, but please don’t shoot the messenger). Simply put, if there is an empty parameter list the compiler assumes that arguments to the call are correct, e.g.

void f(); /* declaration */
int main(void)
{
   f(20); /* okay to call f as declared!!! */
   return 0;
}
void f(void) /* definition */
{
   // ...
}
So what happens above? The standard states that if the number of arguments does not agree with the number of parameters, the behaviour is undefined. In many embedded systems this actually won’t cause a major problem, because on many modern microcontroller architectures (e.g. ARM) arguments are passed in registers. Problems only ensue once the compiler starts using the stack to pass arguments.

Guideline: For all functions, always supply a function prototype.
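A prototype does more than let the compiler check the number of arguments: with one in scope, each argument is also converted to the declared parameter type. In this sketch (the names `half`, `get_count` and `use_prototypes` are hypothetical), the `int` argument 3 arrives in `half` as the `double` 3.0. Without a prototype, `half(3)` would pass an `int` where a `double` is expected, which is undefined behaviour. A function taking no arguments is declared with `(void)`, not `()`.

```c
/* Prototype-declarations: number and types of parameters are supplied. */
double half(double x);
int    get_count(void);   /* (void): calling get_count(20) is a constraint
                             violation that the compiler must diagnose */

double use_prototypes(void)
{
    /* The int argument 3 is converted to double because a prototype
     * for half is in scope. */
    return half(3) + get_count();
}

double half(double x)
{
    return x / 2.0;
}

int get_count(void)
{
    return 1;   /* hypothetical value */
}
```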

Hopefully that lays the groundwork for declarations and definitions, so we can now start addressing the concepts of scope, storage duration, linkage and namespace.

Afternote:


void f()      /* definition and implicit-decln of void f(void) */
{
   // ...
}

int main(void)
{
   f(20);       /* error as call doesn’t match decln */
   return 0;
}
Microsoft compiler bug: this code should fail to compile. Microsoft’s compiler accepts it, whereas both IAR and Keil reject it.

Unscrambling C Declarations

December 9th, 2009
Note: Based on some feedback I should clarify that this does not cover C99 syntax

Even though the C programming language has been around since the early 1970s, many programmers still have trouble understanding how C declarations are formed. This is not surprising, given the complexity that can arise when mixing pointer, array and function-pointer declarations.

In this posting we shall look at some complex declarations and try to understand them by considering how they are formed. The intent is not that you go off and write wonderfully complex declarations, but rather that you may actually be able to understand someone else’s code. Finally, we shall look at how most complex declarations can be easily simplified.
Here I’m going to focus on object declarations/definitions rather than functions. Also, in this posting I’m not going to examine structure, union or enumeration specifiers. They’ll keep for another day.
How to read a declaration
Very simple ones (specifically those not involving “[]” or “()“) can be read from right-to-left, e.g.
int x
where ‘x’ is an (identifier for an) integer. However, this approach starts to break down very quickly, e.g.
int a[10]
A more sophisticated approach is therefore needed for complex declarations, because of the precedence and associativity rules that apply to the different symbols in a declaration.

Before building a rule-set there are a number of things we can exclude:
  1. A function cannot return a function – () foo()
  2. A function cannot return an array – [] foo ()
  3. An array cannot hold functions – foo[]()
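Each of the three exclusions above has a legal counterpart that goes through a pointer instead. A short sketch, with all names (`one`, `two`, `pick`, `table`, `data`, `get_data`) hypothetical: a function can return a pointer to a function, an array can hold pointers to functions, and a function can return a pointer to an array.

```c
/* Hypothetical target functions. */
static int one(void) { return 1; }
static int two(void) { return 2; }

/* Not allowed: a function returning a function.
 * Allowed:     a function returning a POINTER to a function. */
int (*pick(int n))(void)
{
    return (n == 1) ? one : two;
}

/* Not allowed: an array of functions.
 * Allowed:     an array of POINTERS to functions. */
int (*table[2])(void) = { one, two };

/* Not allowed: a function returning an array.
 * Allowed:     a function returning a POINTER to an array of 3 ints. */
static int data[3] = { 10, 20, 30 };

int (*get_data(void))[3]
{
    return &data;
}
```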
Let’s start with some simple examples:
int x         x is an integer
This can give us:
Rule 1: Read from left to right looking for an identifier.
So ignore types (int, char, etc.), qualifiers (e.g. const, volatile) and the symbols ‘()’, ‘[]’ and ‘*’ until you find the first unique identifier. This is the identifier for the declaration.
Building on this, once the identifier is found we look for either array or function notation, e.g.
int a[10]            a is an array of (ten) integers
void x(int y)    x is a function that takes an integer parameter (y) and returns nothing (void)

Rule 2:    look right from the identifier for postfix operators () or []. If [] then it is an array, else if () then it is a function.

Next we introduce pointer notation:
int * x      x is a pointer to an integer

Rule 3:    look left for prefix pointer asterisk ‘*’. If found the identifier is a pointer.

Finally we can introduce type qualifiers (const / volatile), e.g.
const int x     x is an integer constant

Rule 4:    If a const and/or volatile is next to a type specifier (int, long, etc.) it applies to that specifier

So that gives us a preliminary set of 4 rules.
These hold for the following declarations:
int const x      x is a constant integer (This is identical to the previous declaration. This is part of the confusing syntax of the C programming language, but Rule 4 still applies).
const int * x      x is a pointer to a constant integer. Rule 3 followed by Rule 4
int const * x       x is a pointer to a constant integer (as above – still confused?)
int * x[10]          x is an array of pointers to integers ( Rule 2, Rule 3)
int * x(void)     x is a function that returns a pointer to an integer (Rule 2, Rule 3)
int **x                 x is a pointer to a pointer to an integer (Rule 3, Rule 3)
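To confirm that these declarations behave as the rules say, here is a small sketch that puts several of them to work. All the names (`value`, `pc`, `pc2`, `arr`, `pp`, `get_ptr`, `exercise`) are hypothetical and exist only for illustration.

```c
static int value = 5;

const int *pc  = &value;   /* pointer to a constant integer (Rule 3, Rule 4) */
int const *pc2 = &value;   /* exactly the same type as pc */
int *arr[10];              /* array of ten pointers to integers (Rule 2, Rule 3) */
int **pp;                  /* pointer to a pointer to an integer (Rule 3, Rule 3) */

int *get_ptr(void)         /* function returning a pointer to an integer */
{
    return &value;
}

int exercise(void)
{
    arr[0] = get_ptr();    /* store an int* in the array */
    pp = &arr[0];          /* pp points at the first array element */
    **pp += 1;             /* two dereferences reach value, now 6 */
    /* *pc = 7;  would not compile: *pc is a constant integer */
    return *pc + *pc2;     /* both read value through a const view */
}
```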

So far so good? Pretty straightforward? Maybe not the pointer-to-a-pointer, but we still need to add two further rules. The first affects Rule 4. What if we have a const that is not next to the type, as in:
int * const x
Here we need a new rule, which we’ll call Rule 4b (with our previous Rule 4 becoming 4a):   

Rule 4b: if a const and/or volatile is not next to a type then it applies to the pointer asterisk on its immediate left

int * const x      x is a constant pointer to an integer (this means the pointer address is constant)
Combining 4a and 4b gives us:
int const * const x     x is a constant pointer to a constant integer
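The difference between Rule 4a and Rule 4b shows up as soon as you try to assign through, or to, each pointer. A minimal sketch (the names `a`, `b`, `p1`, `p2`, `p3` and `const_demo` are hypothetical); the lines that would not compile are left in as comments:

```c
static int a = 1;
static int b = 2;

int const_demo(void)
{
    const int *p1 = &a;           /* pointer to a constant integer */
    int * const p2 = &a;          /* constant pointer to an integer */
    int const * const p3 = &b;    /* constant pointer to a constant integer */

    p1 = &b;      /* OK: the pointer itself may change */
    /* *p1 = 3;      error: the integer is const when accessed via p1 */

    *p2 = 10;     /* OK: the integer may change, so a is now 10 */
    /* p2 = &b;      error: the pointer (address) is const */

    /* *p3 = 3; p3 = &a;   both errors: everything is const */

    return *p1 + *p2 + *p3;       /* 2 + 10 + 2 */
}
```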
We need one final rule to force precedence. For example, we’ve already seen that int * x(void) declares x as a function that returns a pointer to an integer (Rule 2, Rule 3). But what if I wanted to declare a pointer to a function that returns an integer?
The syntax is as follows:
int (*x)(void)    x is a pointer to a function that returns an integer
This gives our final rule, which becomes a new Rule 2 and pushes everything down by one:

Rule 2: If the identifier is within parentheses, then evaluate inside the parentheses first

This rule is required because in *x() the postfix function parentheses always bind more tightly than the prefix *. Thus:
void (*x)(int y)     x is a pointer to a function that takes an integer (y) as a parameter and returns void
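A short sketch of both function-pointer declarations in use. The functions `answer` and `record` are hypothetical, and the second pointer is named `w` rather than `x` only so both can live in the same function.

```c
/* Hypothetical functions matching the two pointer types above. */
static int answer(void) { return 42; }

static int last_seen = 0;
static void record(int y) { last_seen = y; }

int fp_demo(void)
{
    int  (*x)(void) = answer;   /* pointer to a function returning int */
    void (*w)(int)  = record;   /* pointer to a function taking int, returning void */

    w(7);                       /* call through the pointer; same as (*w)(7) */
    return x() + last_seen;     /* 42 + 7 */
}
```

Without the parentheses, int *x(void) would instead declare a function returning int*, which is exactly why the extra rule is needed.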
Rule Summary
  • Rule 1: Read from left to right looking for an identifier.
  • Rule 2: If the identifier is within parentheses, then evaluate inside the parentheses first.
  • Rule 3: Look right for postfix operators ( ) or [ ]. If [] then it is an array, else if () then it is a function.
  • Rule 4: Look left for prefix pointer asterisk ‘*’. If found, the identifier is a pointer.
  • Rule 5a: If a const and/or volatile is next to a type specifier (int, long, etc.), it applies to that specifier.
  • Rule 5b: If a const and/or volatile is not next to a type, then it applies to the pointer asterisk on its immediate left.
Complex Declarations
This core set should decode C object declarations. Let’s put it to the test on a couple of horrible declarations. First, can you work out:
void (*fpa[10])(int)
Have a go before I break it down…
Okay, let’s decompose this:
Rule 1: From left to right find identifier, this gives us fpa
Rule 2: (*fpa[]) parentheses win, so evaluate inside the parentheses     
Rule 3: fpa[10]  postfix [] wins; fpa is a ten element array ($ now represents fpa[10])
Rule 4:    *$    prefix * wins; fpa is an array of pointers. Now we’ve evaluated inside the parentheses we step outside.
Rule 3: $()   postfix () wins; fpa is an array of pointers to functions
Rule 2: void $(int)   parentheses; fpa is an array of pointers to functions each taking an integer parameter and returning void
So the identifier fpa represents an array of ten pointers to functions each of which takes an integer as a parameter and returns void. Phew…
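Here is that declaration in action as a minimal jump-table sketch. The handlers `add` and `sub`, the `total` variable and `fpa_demo` are hypothetical; unused slots stay null because fpa has static storage duration.

```c
#include <stddef.h>   /* NULL */

static int total = 0;

/* Hypothetical handlers, each taking an int and returning void. */
static void add(int n) { total += n; }
static void sub(int n) { total -= n; }

/* Array of ten pointers to functions taking int and returning void. */
void (*fpa[10])(int);

int fpa_demo(void)
{
    int i;

    fpa[0] = add;
    fpa[1] = sub;
    fpa[2] = add;             /* fpa[3]..fpa[9] remain null pointers */

    for (i = 0; i < 10; i++) {
        if (fpa[i] != NULL)
            fpa[i](5);        /* call through each non-null element */
    }
    return total;             /* +5 -5 +5 */
}
```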
Okay, one last one to try: go to the C standard library, look at the declarations in <signal.h> and you should see:
 void (*signal(int sig, void(*func)(int)))(int);
If you can decode this then I’m really impressed!

Let’s apply our rule-set to this:
First, as always, is Rule 1: signal is the identifier. signal is within parentheses, so based on Rule 2 we must evaluate inside them first. Matching the parentheses gives us:
(*signal(int sig, void(*func)(int)))
Which we can temporarily simplify (by ignoring the function parameters) to:
(*signal())       
Based on Rules 3 and 4, signal is a function that returns a pointer. The question is, a pointer to what? Using the simplification we can work out the return type as:
void (* signal() )(int)
which becomes
void (*$)(int)
which means the function signal returns a pointer to a function that has an integer parameter and returns void.
So let’s return to the parameters, this gives us:
signal(int sig, void(*func)(int))
So signal takes two parameters:
int sig – sig is an integer
void(*func)(int) – func is a pointer to a function that has an integer parameter and returns void.
To summarise:
  • signal is a function
  • that returns a pointer to a function that has an integer parameter and returns void
  • and takes two parameters of
  • an integer, and
  • a pointer to a function that has an integer parameter and returns void
It doesn’t get much worse than this (and remember this example comes from the standard library, which is shameful!).
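To tie the decoded declaration back to real use, here is a sketch with the standard <signal.h> functions. The handler matches the second parameter type, void (*func)(int), and the value returned by signal (the previous handler, or SIG_ERR on failure) has the same type as the decoded return type. raise() sends the signal to the program itself and does not return until the handler has. The names `got_signal`, `on_signal` and `signal_demo` are hypothetical.

```c
#include <signal.h>

static volatile sig_atomic_t got_signal = 0;

/* Matches the parameter type void (*func)(int). */
static void on_signal(int sig)
{
    got_signal = sig;
}

int signal_demo(void)
{
    /* The return value is a pointer to a function taking an int and
     * returning void: the previously installed handler. */
    void (*previous)(int) = signal(SIGINT, on_signal);

    if (previous == SIG_ERR)
        return -1;

    raise(SIGINT);                 /* handler has run when raise returns */
    signal(SIGINT, previous);      /* restore the earlier handler */
    return got_signal == SIGINT;   /* 1 if on_signal was called */
}
```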
How to avoid complexity in declarations
Avoid by design, as far as possible. If this fails, divide and conquer remembering that typedef is your friend.  A typedef declaration does not introduce a new type, only a synonym for the type specified. For example:
typedef  int  MILES;
MILES  m;   /* m is of type int */
typedef int*  int_ptr;
int_ptr  ip;  /* ip is of type integer pointer int* */
Used well, typedefs make life easier. For example:
typedef void (*FuncPtr)(int);
FuncPtr is a typedef for a pointer to any function which takes an integer parameter and returns void.
In the “signal” example, both function pointers are of this type, so using the typedef, the declaration
void (*signal(int sig, void(*func)(int)))(int)
becomes
FuncPtr signal(int sig, FuncPtr)
and our previous declaration of:
void (*fpa[10])(int)
becomes
FuncPtr  fpa[10]
After that I need to find a dark room to lie down in.
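A sketch of the typedef in use for both examples. A real program would simply include <signal.h> rather than redeclare signal, so the signal-shaped function here is a hypothetical `set_handler` that takes an int and a FuncPtr and returns the previous FuncPtr. `store`, `last`, `current` and `typedef_demo` are also hypothetical.

```c
#include <stddef.h>   /* NULL */

/* Pointer to a function taking an int and returning void. */
typedef void (*FuncPtr)(int);

static int last = 0;
static void store(int n) { last = n; }

/* Same type as: void (*fpa[10])(int) */
FuncPtr fpa[10];

/* Same shape as: void (*set_handler(int id, void (*func)(int)))(int) */
static FuncPtr current = NULL;

FuncPtr set_handler(int id, FuncPtr func)
{
    FuncPtr old = current;
    (void)id;          /* unused in this sketch */
    current = func;
    return old;        /* previous handler, as signal() does */
}

int typedef_demo(void)
{
    FuncPtr first  = set_handler(1, store);   /* NULL: nothing installed yet */
    FuncPtr second = set_handler(2, store);   /* store: the previous handler */

    fpa[0] = second;
    fpa[0](3);                                /* calls store(3) */
    return first == NULL && last == 3;
}
```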
Decoding Rule-set
Rule 1:  Read from left to right looking for an identifier
Rule 2:  If the identifier is within parentheses, then evaluate inside the parentheses first
Rule 3:  Look right for postfix operators ( ) or [ ]. If [] then it is an array, else if () then it is a function.
Rule 4:  Look left for prefix pointer asterisk ‘*’. If found, the identifier is a pointer.
Rule 5a: If a const and/or volatile is next to a type specifier (int, long, etc.), it applies to that specifier
Rule 5b: If a const and/or volatile is not next to a type, then it applies to the pointer asterisk on its immediate left

Also check out http://www.cdecl.org/ (thanks @FrankSansC)