Static and Dynamic Libraries on Linux

April 4th, 2014

A Quickstart Guide

We’re going to look at how to create and use libraries on Linux and try to gain some insight on how libraries work behind the scenes.

Decisions Decisions!

Often when working with 3rd party code you may be limited in the options available. Some well-known open-source projects have dual-licensed binaries that dictate different terms for static or dynamic linking.

Writing a library is a good way to provide an interface to customers and get code reuse – and it can be a major source of headaches! To understand what’s best for your use case it’s worth looking at what each type provides.
Static libraries (.a files) are precompiled object code which is linked into an executable at link time and becomes part of that final application. These libraries load quickly, have less indirection and don’t run the risk of dependency hell which can beset their dynamic peers.

Read more »

Rapid Application Development with Python

March 11th, 2014

Following on from my previous post on Python and our new course on Python for Test Engineers (which takes an elementary approach), I felt it was time to pay homage to that wonderful language once again, this time focusing on its applicability to Rapid Application Development.

The Higher Level the Language, the More Productive the Programmer

I love writing Python. I’ll be honest, it’s the closest I’ll get to writing executable pseudo-code which best mirrors how my mind works, and I’m sure I can’t be alone in this. When coding in Python, its dynamic nature means you can sculpt and remodel your code as you express your ideas; this process gives a better understanding of the code’s impact as well as providing more accurate estimates of how long a given task will take, as there are fewer unknowns in the language itself.

“But Python is for high level applications!” – sorry, but the world of embedded is evolving. Embedded Linux is everywhere, with its “everything is a file” interface for things that were predominantly the domain of device drivers – I can manipulate framebuffers, SPI and I2C buses and GPIO using Python. Even microcontrollers, that last bastion of embedded, are getting in on the action with the recently Kickstarted MicroPython, which brings Python further into traditional “deeply embedded” territory.

Python as a System Programming Language

An argument I get from other developers is that Python isn’t a real systems programming language – yes, you can script in it but that’s not how “Real coders” write software. Well, when those real Scotsmen, uh, coders take time out from etching ones and zeroes directly into their hard disc platters and explain what they mean, it typically comes down to a misunderstanding of Python’s capabilities.

“I need threading”

Cool, we’ve got you covered but was that plain threading, process-based threading or maybe just a few recurrent timers?

“Uh, swell, but I also need IPC/Networking”

Sockets, Signals etc? Pick one, pick all of them? Maybe you just need to serialise some data?

“I also need serial comms”

OK, not built in but how about the cross-platform PySerial?

“And what about my existing programs?”

Nice, did you want to extend Python by wrapping your code in a library? Interacting with your program? Or why not embed Python into your existing program?

Python can handle all of those things with aplomb and also affords you ready-to-roll web services, file handling, and parsing of data in XML/JSON form or plain text using regular expressions. Maybe you want to interact with the latest pluggable devices using PyUSB – and there are so many ways to show a Graphical User Interface that choosing one is more akin to judging a beauty pageant.

As we move forward, multimedia is increasingly a key part of product development; many devices need to push and manipulate video and sound and, once again, Python provides a huge leg up – whether via simple graphics manipulation using Pillow, image processing using scikit-image, or live and post-processing of video using GStreamer and MoviePy respectively.

Mmm, a delicious cinemagraph that’s generatable in ~3 lines of Python with MoviePy

Reducing Time to Market

Time to market is the phrase that pays in the world of product development – it defines our coding deadlines, the features we can ship with and the very nature of our products, and we’re always under pressure to reduce it.
Research indicates [1] that designing and writing a program in Python (or similar scripting language) typically takes half as long and requires half as much code as an equivalent C or C++ program.

Given that the number of lines we, as programmers, can write in a given time is constant and not dependent on the language or its type, we can infer that with a higher-level language such as Python we get a much higher instructions-per-line metric, and developers only have to spend half as long coding as our C++-wielding brethren. I can be more productive than if I were using C++, safe in the knowledge that the same task will require less code in Python.

This is how we get our minimum viable product, this is how we crush our time to market – we code smarter, not harder.

Of Data Types and Paradigms

Data types are important as they define the information we can hold in a variable. Built-in types are a godsend in Python – we have lists, dictionaries, sets and many more, and we don’t need to track down and incorporate third-party libraries to provide them, nor the functions to manipulate them.

This ability to develop as you prototype means that you can build supporting routines as you go, and the types of data you need to handle can evolve alongside them, which means you can pivot and re-architect as you develop your software.

Python is a multi-paradigm language (urgh, sorry!) that allows us as programmers to use procedural, object-oriented and functional styles where each makes sense, without shoe-horning one into the other.

Python is also increasingly entering areas traditionally aligned with domain-specific tools such as Matlab. As the power of Python becomes apparent, modelling in one language and then porting it to something like C is losing out to the decision to model, develop and code in just one language.

Add-on libraries such as NumPy, SciPy and many others allow modelling and deployment to be combined, which affords huge benefits in a multi-disciplinary team as everyone speaks the same language: Python.

Good enough is good enough

Python isn’t just a prototyping language – you don’t need to throw away any of your code. Companies are readily using Python for their deployed products as they evolve from RAD ideas to fully-fledged, production-ready programs.

Python may not be everyone’s cup of tea and it has its limitations – it’s never going to offer the performance per watt you’d get from hand-coded assembly – but more often than not it’s good enough to get to market and, as companies are finding out, it’s good enough to keep you in the market.

Inquire today about how we can help you kick start your development process with Python.

[1] http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprtTR.pdf

Demystifying C++ lambdas

March 7th, 2014

A new (but not so welcome?) addition

Lambdas are one of the new features added to C++ and seem to cause considerable consternation amongst many programmers. In this article we’ll have a look at the syntax and underlying implementation of lambdas to try and put them into some sort of context.

Read more »

goto fail and embedded C Compilers

February 27th, 2014

I can’t imagine anyone reading this posting hasn’t already read about the Apple “goto fail” bug in SSL. My reaction was one of incredulity; I really couldn’t believe this code could have got into the wild on so many levels.

First we’ve got to consider the testing (or lack thereof) for this codebase. The side effect of the bug was that all SSL certificates passed, even malformed ones. This implies positive testing (i.e. we can demonstrate it works) but no negative testing (i.e. trying a malformed SSL certificate) – or perhaps no dynamic SSL certificate testing at all.

What I haven’t established* is whether the bug came about through code removal (e.g. there was another ‘if’ statement before the second goto) or whether, due to trial-and-error programming, the extra goto got added (with other code) that then didn’t get removed in a clean-up. There are, of course, some schools of thought that believe it was deliberately put in as part of PRISM!

Then you have to query regression testing; have they never tested for malformed SSL certificates (I can’t believe that; mind you I didn’t believe Lance was doping!) or did they use a regression-subset for this release which happened to miss this bug? Regression testing vs. product release is always a massive pressure. Automation of regression testing through continuous integration is key, but even so, for very large code bases it is simplistic to say “rerun all tests”; we live in a world of compromises.

Next, if we actually analyse the code then I can imagine the MISRA-C group jumping around saying “look, look, if only they’d followed MISRA-C this couldn’t have happened” (yes Chris, it’s you I’m envisaging) and of course they’re correct. This code breaks a number of MISRA-C:2012 rules, but most notably:

15.6 (Required) The body of an iteration-statement or selection-statement shall be a compound-statement

Which boils down to: all if-statements must use a block structure. So the code would go from (ignoring the glaring coding error of the two gotos):
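For context, the offending fragment (paraphrased from Apple’s published sslKeyExchange.c – treat it as illustrative rather than verbatim) looked like this; the second goto is not controlled by the if, so it always executes and the remaining checks are skipped:

    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;   /* always executed - the signature verification below is skipped */
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;

    /* ... */

    fail:
        SSLFreeBuffer(&signedHashes);
        SSLFreeBuffer(&hashCtx);
        return err;

With the rule applied, every controlled statement is wrapped in braces, so a stray duplicated line can no longer silently change the control flow:

    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    {
        goto fail;
    }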
Read more »

The Top 5 Things I’ve Learnt about Git

January 31st, 2014

During the last couple of years, internally we’ve moved over to using Git as our Revision Control System (RCS). It’s been an interesting exercise, especially when, like me, you’ve come from a traditional model (such as Subversion or even back to good old SCCS). I’m sure you’ve all got your own “top 5” and I don’t necessarily expect you to agree with me, but here are my key learning points:

#1 “Branch always, branch often”
At the outset this was certainly the biggest mindset change*. In many RCSs, branching is particularly “heavyweight” and merging branches (and branches of branches, and branches… you get the gist) can be, to say the least, challenging. So you could end up being counter-productive by trying to avoid branching unless absolutely necessary.

It couldn’t be more different using Git, and once I’d “got it”, it became instinctive to always work on a branch. I guess it all fits in with the aim to keep work cycles to much shorter timeframes, so branching and merging must be trivial. There is an excellent tutorial to help learn branching on Git.

#2 “Don’t ignore your .gitignore”
You know you’re going to need it, so do it before you start any real work. It’ll just save you a whole load of tidying up later on (ok so maybe this is an age thing? don’t procrastinate!). There are even language-specific .gitignore templates at https://github.com/github/gitignore, so no excuses.

It’s also worth getting to know the basics of the pattern matching that can be used in the .gitignore file; the rules are well documented.

#3 “GitHub is not Git”
Maybe it’s my onset of senility, but initially it wasn’t exactly clear where Git ended and GitHub started. Most of the early stuff I looked at to help understand and learn Git almost always used GitHub to host your project. In hindsight it’s all obvious (well of course it’s just a web-based hosting service for software development projects that use Git), but at the time I can say my understanding was a little “fuzzy” around the edges.

Two things really helped; first, the excellent book Pro Git written by Scott Chacon, which you can find online at http://git-scm.com/book. I found it so useful that I also bought the eBook (Kindle) version.

Second, moving to BitBucket away from GitHub. In fact we use both, GitHub for our publicly hosted projects and BitBucket for internal work. This just made it plainly obvious where Git ended and GitHub/BitBucket started.

#4 “Quickly learn how to undo”
There are many ways to mess it up, if you really try. Committing without adding, modifying files without branching (see #1), committing unwanted files (see #2), not pushing changes (see #3) and the list goes on (and I’ve done them all).

Be assured, you can always fix it, from a simple commit-amend to, probably one of the coolest things, being able to create a branch from a stash. But I’d highly recommend spending a bit of time understanding unstaging as opposed to unmodifying a file.

Using Git is very much about understanding workflow and typically messing up means you’ve not quite got your workflow model running smoothly. The nice folks at Atlassian have put together a set of guides on workflow.

#5 “Initially do all your work on the command line”
For Linux people (and hopefully most programmers on a Mac), this doesn’t need saying as why would you do it any other way? However for all those Windows-based people (and in the embedded world this is the majority) there is a tendency to use graphical tools. Both GitHub and BitBucket offer graphical frontends for managing local and remote Git repos, but without understanding the underlying Git model (which you only really get from using it at the command line) it’s easy to get lost.

I didn’t find the GitHub Windows client tool especially friendly, whereas SourceTree from Atlassian I found a more intuitive client. Nevertheless I’d still rather work at the shell.

So that’s my top 5; what have I missed (remember you’ve got to drop one to add one)?

What would I do differently? Certainly I’d read “Pro Git” cover-to-cover much sooner than I did. Otherwise, as usual, just throw yourself in and have a go.

Now what was that someone said about Mercurial being better…

* Actually the initial challenge was getting used to say “Git” outside of its usual context in the UK.

The hokey-cokey* of function calls

January 20th, 2014

Functions are the lifeblood of a C program. The program flow is altered by passing parameters to functions, which are then manipulated. Conceptually function parameters are defined as being either:

  • Inputs (Read-only) – client-supplied objects manipulated within the function only
  • Outputs (Write-only) – objects generated by the function for use by the client.
  • Input-Outputs (Read-Write) – client objects that can be manipulated by the function.

Defining the use of a parameter gives vital information not only to the implementer, but (perhaps more importantly) to the user of the function, by more-explicitly specifying the ‘contract’ of the function.

Many programming languages (for example, Ada) support these concepts explicitly. C, however, does not. One has to remember that when Kernighan and Ritchie developed C, structured programming was very much in its infancy and many of these ideas were still being formulated (also remember that one of the C design goals was parsimony).

Even today, though, these concepts are rarely taught to C programmers and that has often led to clumsy, insecure or even downright dangerous APIs.

If C doesn’t support these concepts explicitly, can we simulate them? The answer is (of course) yes, by using some basic language constructs and forming some idioms.

Let’s look at each parameter type in turn.

Input parameters

C specifies a call-by-value or call-by-copy paradigm. That is, when a C function is called the compiler sets up a call frame that holds copies of the function parameters. Therefore, when you pass parameters by value you are – in effect – creating a parameter for the function to use that in no way affects the caller’s data.


This is fine for simple types, but what about user-defined types – structs? What’s the problem with passing them by value?

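The original listings are images; a minimal sketch of the idea, using made-up names and types, might be:

    #include <stdint.h>

    void increment(int value)     /* 'value' is a copy held in the call frame */
    {
        value++;                  /* modifies the copy only - the caller never sees this */
    }

    struct Sensor_block
    {
        uint32_t samples[256];
        uint32_t count;
    };

    void process(struct Sensor_block block);   /* the entire structure (over 1KB here)
                                                  is copied onto the stack at each call */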

Passing a structure by value means allocating enough memory for the parameter and then copying the contents of the original object into the parameter. In many embedded systems, where memory is at a premium, this could easily overflow the stack – at run-time, where its consequences could be difficult to track.

Strictly, to be explicit you should specify the type of the parameter as a const:

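Again, the original listing is an image; the const-qualified forms presumably look something like this (same hypothetical types as above):

    void increment(const int value);                /* still a copy; the function
                                                       promises not to modify it   */
    void process(const struct Sensor_block block);  /* a copy of the struct, read-only
                                                       inside the function           */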

For simple types this is unlikely to add much value; however it may provide some benefit with structures.

If a parameter is passed as a const struct the compiler has the opportunity to perform a lazy evaluation – it passes the address of the structure instead of making a copy.


Note that this optimisation may not be supported by all compilers; or might not occur at all levels of optimisation.

Input-Output parameters

The resolution to the above problem is to explicitly pass a pointer to the structure:

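A sketch of the pointer version (hypothetical names again):

    void process(struct Sensor_block *block);

    struct Sensor_block readings;

    void client(void)
    {
        process(&readings);    /* only an address is passed, not a copy of the data */
    }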

This is clearly more efficient than copying the whole structure. OK, the syntax has got a little messier, but we can live with that.

But hang on: do we still have an Input parameter? Actually, no.

What we’ve got here is an input-output parameter. By passing a ‘raw’ pointer the function can manipulate the caller’s object. To fix this we need to prevent manipulation of the pointed-to object:

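That is, a pointer-to-const parameter, along the lines of:

    void process(const struct Sensor_block *block);   /* *block cannot be written to
                                                          through this pointer        */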

Still not quite there, though. What happens below?

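The original example is an image; the pitfall it illustrates is presumably something like this – the pointer itself is still modifiable, so the function can quietly end up referring to something other than the caller’s object:

    void process(const struct Sensor_block *block)
    {
        static struct Sensor_block scratch;

        block = &scratch;     /* legal: only *block is const, not 'block' itself, */
                              /* so the caller's object is silently abandoned     */
    }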

Strictly we should make the pointer itself const to prevent (either accidentally or maliciously) the function manipulating the caller’s object:

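Making the pointer const as well gives:

    void process(const struct Sensor_block * const block);
    /* 'block' cannot be re-assigned and '*block' cannot be modified */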

This is a very good general rule-of-thumb for functions: make all pointers const.

Output parameters

An output parameter is one that the function can write to, but never read (i.e. write-only). In C the only real mechanism we have for that is the function return value.

Most programmers are happy to return simple types from functions but what about the following code?

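The listing is an image; the shape of the code, reusing the names from the text and a hypothetical struct, is presumably:

    struct Big
    {
        uint32_t payload[1024];    /* hypothetical contents */
    };

    struct Big makeBigStruct(void)
    {
        struct Big temp;
        /* ... fill in temp ... */
        return temp;               /* returned by value */
    }

    struct Big biiig;

    void client(void)
    {
        biiig = makeBigStruct();
    }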

Since C performs pass (and return!) by value this would appear very inefficient:


The original object (biiig) is constructed. Then, when makeBigStruct is called space for the return value is allocated. Inside makeBigStruct, temp is allocated. On return temp is copied into the return value then, finally, copied into biiig.

Knowing this, most programmers never return structures from functions; preferring instead to supply them as input-output parameters. However, most modern compilers provide an optimisation which does just this.

Below is the same code but showing the optimisation. Instead of returning the structure the address of the receiving object is (implicitly) passed to the function. At the end of the function the return value is copied into the receiver, negating the need for a temporary return object.

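Conceptually (this is what the compiler generates, not code you write yourself), the optimised form of the sketch above behaves as if it were:

    void makeBigStruct(struct Big *result)    /* the address of 'biiig' is passed implicitly */
    {
        struct Big temp;
        /* ... fill in temp ... */
        *result = temp;                       /* copied straight into the receiver */
    }

    void client(void)
    {
        makeBigStruct(&biiig);
    }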

In general, then, it is OK to return a struct from a function by value (unless you’re using an ancient C compiler). If you’re not certain (or your compiler doesn’t support this optimisation) it’s probably safer for you to use input-output parameters instead.

Finally, it’s worth noting the small detail that, unlike other languages, a C function can only have one output parameter. You’ll need to use input-output parameters for the rest.

Making the world a better place.

Using these idioms consistently is a very good way to improve the quality of your code. First, it allows the compiler to provide stronger checking of your code. Second, it gives the reader extra information about how to use your functions and what guarantees (or promises) they can expect from them.

You may have noticed I’ve ignored arrays in this article. Check out this blog post for passing arrays to functions.

In summary:

  • Input (built-in type) – pass by value (optionally const)
  • Input (structure) – pass a pointer-to-const, with the pointer itself also const
  • Input-Output – pass a const pointer to a (non-const) object
  • Output – return by value

 

* Or, hokey-pokey if you prefer.

Shock horror! I learned something about arrays in C

November 28th, 2013

Every so often you pick up a snippet of information that completely changes the way you view things. This week, it’s the use of arrays as function parameters.

At first glance the code horrified me (as I’m sure it will horrify some of you out there!) but as I’ve played with it I can see real merit in the technique.

Arrays, pointers and syntactic sugar

In C there is a close (if somewhat messy!) relationship between arrays and pointers. As far as the C compiler is concerned an array is merely a contiguous sequence of objects (all of the same type). Pointer arithmetic semantics ensure that elements can be accessed as offsets from the array’s base address. The array (‘[]’) notation is syntactic sugar to hide these details from the programmer:

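A short sketch of the equivalence (names invented for illustration):

    #include <stdint.h>

    void example(void)
    {
        uint32_t data[8];

        data[3] = 0xFF;        /* subscript notation...                  */
        *(data + 3) = 0xFF;    /* ...is just pointer arithmetic from the */
                               /* array's base address                   */
    }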

Arrays as function parameters

When passing arrays to functions (as parameters) things can get a little confusing. It is not possible to pass an array by-value to a function, so the function process_array() below does not make a copy of the array:

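The listing is an image; a representative version (process_array is the name used in the post, the element type and size are my assumption) is:

    void process_array(uint32_t array[10]);    /* looks like it takes an array... */

    uint32_t readings[10];

    void client(void)
    {
        process_array(readings);    /* ...but no copy of 'readings' is made */
    }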

The array parameter decays to a pointer – the address of the first element – so we could (and many C programmers do) just as legitimately declare the parameter as a pointer and get the same result.


In fact, all these declarations for process_array() are semantically identical; the code generated is the same in each case:

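That is, something along these lines – all three declare exactly the same function:

    void process_array(uint32_t array[10]);
    void process_array(uint32_t array[]);
    void process_array(uint32_t *array);     /* each one receives a plain uint32_t* */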

A word of warning here: as we discussed above the array name yields a constant value that is the address of the first element. In the case of a function parameter, though, we could easily delude ourselves:

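For example (sketch):

    void process_array(uint32_t array[10])
    {
        array++;    /* perfectly legal: 'array' is just a modifiable pointer */
                    /* parameter, not a real (constant) array name           */
    }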

What looks like an array name is (of course) just a (non-const) pointer, which can be modified either deliberately or accidentally.

Once inside our function, very commonly we wish to know the number of elements in the array. The sizeof() operator yields the amount of memory an object occupies; so for an array this is the number of elements times the size of the element. A simple piece of macro programming can yield the number of elements:

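The usual formulation is something like:

    #define ARRAY_SIZE(a)  (sizeof(a) / sizeof((a)[0]))

    uint32_t readings[10];
    /* ARRAY_SIZE(readings) == 10 */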

However, within our function we may not get the answer we expect:

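For example, using the ARRAY_SIZE macro above on the parameter:

    void process_array(uint32_t array[10])
    {
        uint32_t n = ARRAY_SIZE(array);    /* not 10 - see below */
        /* ... */
    }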

In a 32-bit architecture we’ll always get an answer of 1, irrespective of the actual number of elements in the array!

If we re-write the function to the (semantically-identical, remember!) equivalent and expand the macro:

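Which is, in effect:

    void process_array(uint32_t *array)
    {
        uint32_t n = sizeof(array) / sizeof(array[0]);
        /* sizeof(uint32_t*) / sizeof(uint32_t) == 4 / 4 == 1 on a 32-bit target */
    }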

We’re dividing the size of a pointer by the size of an int. Oops.

Because of this it is normal practice to pass an additional parameter, the number of elements in the (supplied) array:

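That is, along the lines of (reusing the names from the sketches above; size_t comes from <stddef.h>):

    void process_array(uint32_t *array, size_t num_elements);

    void client(void)
    {
        process_array(readings, ARRAY_SIZE(readings));   /* caller supplies the size */
    }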

Is there any other way?

An alternative (I’m loath to use the word ‘better’ here) approach is to use the mechanism preferred for all large data types – pass-by-pointer.

The syntax for passing an array by pointer is a little unwieldy due to C’s precedence rules:

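The declaration looks like this; the parentheses around *array are what the precedence rules demand:

    void process_array(uint32_t (*array)[10]);   /* pointer to an array of exactly 10 uint32_t */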

The function signature declares that process_array() is expecting a pointer to an array of (exactly!) 10 uint32_t objects.

To call the function, you must pass the address of the array (just as you would with a struct):

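For example:

    uint32_t readings[10];

    void client(void)
    {
        process_array(&readings);    /* &readings has type uint32_t (*)[10] */
    }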

This may cause confusion for some readers – they’re thinking “Hang on! The array name yields the address of the array! Why isn’t he just calling the function with the name of the array?”. Remember: the array name yields the address of the first element (and will be, in our case, of type uint32_t*). We need a pointer to an array (of 10 uint32_t) so we must use the address-of operator (which will yield a pointer of type uint32_t (*)[10]).

The array-pointer is strongly typed, so the following code will fail to compile:

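For instance:

    uint32_t wrong_size[5];

    void client(void)
    {
        process_array(&wrong_size);   /* compile error: uint32_t (*)[5] is not uint32_t (*)[10] */
    }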

In case you were wondering, the following code will now work as expected (although, since you are specifying the expected size of the array it is a little redundant):

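That is, applying the earlier ARRAY_SIZE macro to the pointed-to array:

    void process_array(uint32_t (*array)[10])
    {
        for (uint32_t i = 0; i < ARRAY_SIZE(*array); i++)   /* correctly yields 10 */
        {
            (*array)[i] = 0;
        }
    }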

Although unusual for arrays (that is, not used often) this approach has a number of benefits:

  • The function declaration explicitly states the size of the array it is expecting.
  • It allows compile-time type checking
  • It is consistent with passing structs
  • The ARRAY_SIZE macro can be used on the function parameter (correctly)

The above code is actually the preferred mechanism for passing arrays in MISRA-C++ 2008 (Rule 5-2-12) but is not included in the MISRA-C 2012 rules (for some reason).

Casting – what could possibly go wrong?

September 27th, 2013

Type casting in C++ is a form of what is known in computer science as type punning – that is, circumventing the type system of a programming language.

C++ inherits its conversion and casting mechanism from C, but supplements it (although sensibly we should say, replaces it) with four more explicit cast operations:

  • static_cast
  • reinterpret_cast
  • const_cast
  • dynamic_cast

In C and C++ – and particularly in embedded systems – casting is a necessary evil; so much so that many programmers just accept it as part of everyday programming practice.

So then: why is casting ‘evil’? Basically, because every time you do a type cast you are opening up your program to potentially unpredictable or unexpected behaviour. Let’s have a look at the four type-cast operators and the fun and games they can unleash on the unsuspecting.

 

static_cast<>

The static_cast operator converts between different object types; for example between a double and an int. So what you are effectively saying is

“I’m about to squeeze a big object into a smaller one, so you should probably make sure the receiving object is big enough to hold the values it’s going to get.”

Or:

“I’m about to force a floating point number into an integer and all those decimal places (that are probably quite important) are going to be lost”

Of course, you could also be saying:

“I’m about to put the contents of a small object into an object capable of holding much larger values (or with greater precision)”

(which is emphasising a bit of a non-problem, really)

Thankfully, C++ doesn’t let you type-cast between different class types unless you’ve defined your own explicit conversion functions (which – hopefully – should do a sensible conversion). But that’s for another time.
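To make the first two warnings concrete, a couple of (made-up) examples:

    void examples()
    {
        double reading = 1023.75;
        int whole = static_cast<int>(reading);    // 1023 - the decimal places are lost

        long big = 100000L;
        short small = static_cast<short>(big);    // may not fit in a short; the result
                                                  // is implementation-defined
    }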

 

reinterpret_cast<>

reinterpret_cast is used in two ways:

  • To convert a pointer-to-type into a pointer-to-different-type
  • To convert an integer type to a pointer type; or vice versa.

When reinterpret_cast appears in code it tells the reader:

“I’m going to take the object address you gave me and treat it as a completely different type, with different memory layout and different behaviour(s). You should make sure it’s capable of supporting what I want to use it for.”

Or, in another usage:

“That (random) number you gave me? I’m going to use it as the address of an object. You’d probably better make sure it’s a valid address, in a reachable region of memory; unless you’re a big fan of segmentation faults.”
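Two sketches of those two usages; the register layout and address below are invented purely for illustration:

    #include <cstdint>

    struct UartRegisters
    {
        volatile std::uint32_t data;
        volatile std::uint32_t status;
    };

    // An integer treated as an object address (a hypothetical peripheral base address)
    UartRegisters* const uart = reinterpret_cast<UartRegisters*>(0x4000C000);

    // A pointer-to-type treated as a pointer-to-different-type
    void dump(const std::uint32_t* word)
    {
        const std::uint8_t* bytes = reinterpret_cast<const std::uint8_t*>(word);
        // 'bytes' now aliases the same memory with a completely different type
    }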

 

const_cast<>

The const_cast operator removes the const-ness of an object; that is, it makes a read-only object writeable.

Significantly for embedded programmers, const_cast removes any cv (const-volatile) qualification the original object may have. This means a volatile object – for example, one used to represent a hardware register in an embedded system – can have that qualification removed with const_cast.

Using const_cast says:

“The object you didn’t want me to change? I might (accidentally) change it without your consent.”

Or, perhaps in an embedded system:

“The compiler might now optimise away any reads or writes to that object you gave me. Be prepared for behaviour NOT to happen as you expect!”
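A sketch of each of those situations (the names are invented):

    #include <cstdint>

    void legacy_print(char* s);    // an old, non-const-correct API that doesn't modify s

    void show(const char* message)
    {
        legacy_print(const_cast<char*>(message));    // only safe if legacy_print really
                                                     // doesn't write through the pointer
    }

    volatile std::uint32_t status_reg;    // stands in for a hardware status register

    void poll()
    {
        std::uint32_t& plain = const_cast<std::uint32_t&>(status_reg);
        // volatile has been stripped: the compiler is free to cache, reorder or
        // optimise away reads and writes made through 'plain'
    }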

 

dynamic_cast<>

The dynamic_cast operator is a special case here, in that it is used for ‘safe down-casting’ – that is, casting a pointer-to-base-type to a pointer-to-derived-type, whilst checking whether this is, in fact, a valid cast. dynamic_cast uses Run-Time Type Identification (RTTI) to ensure the types of the pointers are valid. Thus, unlike the other cast operators, dynamic_cast is a run-time check and has associated overheads. If the pointer types are not compatible dynamic_cast returns 0 (for pointers) or throws an exception (for references).

dynamic_cast also has a role in multiple inheritance, where a class has two or more base classes. The dynamic_cast operator allows you to cast a pointer of one base class type to another. Although this is basically a variation on safe down-casting we tend to use the term ‘cross-casting’. Cross-casting is commonly encountered when a class realises (inherits from) two or more interface (pure virtual) classes.

In your code this means:

“I need to access the extended interface of a particular derived type. You’d better be prepared to deal with the consequences of the derived type NOT being what I want.”

Or:

“I need to know if the object you’ve supplied supports some other – possibly completely different – set of characteristics. ”
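A (contrived) sketch of both down-casting and cross-casting:

    #include <iostream>

    class Shape    { public: virtual ~Shape() {} };
    class Drawable { public: virtual void draw() = 0; virtual ~Drawable() {} };

    class Circle : public Shape, public Drawable
    {
    public:
        void draw() override { std::cout << "circle\n"; }
    };

    void render(Shape* s)
    {
        if (Circle* c = dynamic_cast<Circle*>(s))       // down-cast: is it really a Circle?
        {
            c->draw();
        }

        if (Drawable* d = dynamic_cast<Drawable*>(s))   // cross-cast: does it also realise
        {                                               // the Drawable interface?
            d->draw();
        }
    }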

 

So – don’t use type casts?

Obviously, it’s impracticable (if not impossible) to write code with no type casting; especially in embedded systems. I leave you with the following guidance:

  • Don’t cast if you don’t need to.
  • Think about the consequences of what the cast is (potentially) doing.
  • Leave a big, obvious comment documenting why you’re doing something so potentially dangerous.

The Rule of the Big Five

September 13th, 2013

The dynamic creation and destruction of objects was always one of the bugbears of C. It requires the programmer to manually control the allocation, initialisation and deallocation of memory for the object. Because many C programmers weren’t educated in the potential problems (or were just plain lazy or delinquent in their programming) C got a reputation in some quarters for being an unsafe, memory-leaking language.

C++ improved matters significantly with an idiom known as RAII/RRID; more generically referred to as resource management. Resource management frees the client from having to worry about the lifetime of the managed object, potentially eliminating memory leaks and other problems in C++ code.

However, introducing resource management can lead to potential problems, particularly if the ‘manager’ objects are passed around the system. These problems led to the need for establishing a ‘copy policy’ for each of your types, sometimes referred to as ‘The Rule of the Big Three’. C++11 further complicated this by introducing move semantics.

This whitepaper explores the copy and move semantics of C++ and introduces a policy we call ‘The Rule of The Big Five’.
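For reference, the ‘Big Five’ are the five special member functions whose copy and move policy the whitepaper covers; for a resource-managing class (a made-up example) they are:

    #include <cstddef>

    class Buffer
    {
    public:
        explicit Buffer(std::size_t n) : data(new int[n]), size(n) {}

        ~Buffer();                              // 1. destructor
        Buffer(const Buffer& other);            // 2. copy constructor
        Buffer& operator=(const Buffer& rhs);   // 3. copy assignment operator
        Buffer(Buffer&& other);                 // 4. move constructor
        Buffer& operator=(Buffer&& rhs);        // 5. move assignment operator

    private:
        int*        data;
        std::size_t size;
    };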

The whitepaper can be downloaded from here

Example source code for Visual Studio 2012 and GCC can be downloaded from GitHub.

UK based One-day ARM User Conference (and it’s free!)

September 11th, 2013

For those of you who are not on our company hit list – sorry, I mean mailing list – you may not have heard about next week’s ARM User Conference run by the good folks at Hitex UK.

The event is titled “ARM – Continually Raising the Standard” and is being held at Stoneleigh Park near Coventry on the 19th September 2013. This year there are two streams running to allow a wider choice of presentation.


The event is also preceded by a number of (paid) workshops on the 17th  and 18th.

I shall be presenting a paper on “Developing a Generic Hard-fault Handler for the ARMv7-M Architecture”. Feabhas shall also have a table-top there, so if you’re attending please stop by and say hello.

Full details of the event can be found here
