goto fail and embedded C Compilers

I can’t imagine anyone reading this posting hasn’t already read about the Apple “goto fail” bug in SSL. My reaction was one of incredulity; I really couldn’t believe this code could have got into the wild on so many levels.

First we’ve got to consider the testing (or lack thereof) for this codebase. The side effect of the bug was that all SSL certificates passed, even malformed ones. This implies positive testing (i.e. we can demonstrate it works), but no negative testing (i.e. a malformed SSL certificate), or even no dynamic SSL certificate testing at all?

What I haven’t established* is whether the bug came about through code removal (e.g. there was another ‘if’ statement before the second goto) or, due to trial-and-error programming, the extra goto got added (with other code) that then didn’t get removed on a clean-up. There are, of course, some schools of thought that believe it was deliberately put in as part of prism!

Then you have to query regression testing; have they never tested for malformed SSL certificates (I can’t believe that; mind you I didn’t believe Lance was doping!) or did they use a regression-subset for this release which happened to miss this bug? Regression testing vs. product release is always a massive pressure. Automation of regression testing through continuous integration is key, but even so, for very large code bases it is simplistic to say “rerun all tests”; we live in a world of compromises.

Next, if we actually analyse the code then I can imagine the MISRA-C group jumping around saying “look, look, if only they’d followed MISRA-C this couldn’t of happened” (yes Chris, it’s you I’m envisaging) and of course they’re correct. This code breaks a number of MISRA-c:2012 rules, but most notably:

15.6 (Required) The body of an iteration-statement or selection-statement shall be a compound-statement

Which boils down to all if-statements must use a block structure, so the code would go from (ignoring the glaring coding error of two gotos):

       if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
              goto fail;
       if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
              goto fail;
              goto fail;
       if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
              goto fail;

to

       if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0) {
              goto fail;
       }
       if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) {
              goto fail;
              goto fail;
       }
       if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) {
              goto fail;
       }

This would then stop the unconditional goto be executed. But would cause a further rule violation:

Rule 2.1 (Required) A project shall not contain unreachable code

Nevertheless, what might surprise you (unless you follow MISRA-C closely) is that the use of the goto statement IS allowed:

Rule 15.2 (Required) The goto statement shall jump to a label declared in the same function.

but discouraged:

Rule 15.1 (Advisory) The goto statement should not be used.

In addition, I would expect all static analysis tools to flag this error; indicating even rudimentary static analysis is not being applied to this codebase.

But that’s not really what struck me when I first saw the code. My reaction was

They must have ignored compiler warnings, as any compiler worth its salt  would warn about unreachable code

Now I do most of my work, here at Feabhas using either the ARM/Keil compiler (armcc) or the IAR Compiler (iccarm) and am very used to seeing this warning as it is common to have infinite loops in multi-tasking code.

Sure enough, give the following code:

int calculate_F(void);
int simpleTest(int p) 
{
   int ret_val = 1;
   if(p & 0x000F)
      goto out;
   if(p & 0x00F0)
      goto out;
      goto out;
   if(p & 0x0F00)
      goto out;
      ret_val = calculate_F();
out:
       return ret_val;
}

As expected, both ARM/Keil and IAR report warnings regarding unreachable code.

goto fail

So how could this happen? To my utter amazement, compiling this with GCC and -Wall (Warnings all) doesn’t report any warnings. Also apparently neither does Clang (the default compiler on OSX) with -Wall, but does if you specify -Wunreachable-code (which isn’t part of all ???).

So what’s my takeaway from this? I’ve always advocated the use of Static Analysis tools as an integral part of the build cycle (i.e. not trying to apply it retrospectively to 100,000’s of lines of code) rather than relying on the compiler to generate appropriate warnings. And if you weren’t convinced before, this is just another, now well documented, example of why you should.

* Please let me know if this hasnow been established

Niall Cooling
Dislike (0)
Website | + posts

Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interest lie in IoT Security and Agile for Embedded Systems.

About Niall Cooling

Co-Founder and Director of Feabhas since 1995. Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking. His current interest lie in IoT Security and Agile for Embedded Systems.
This entry was posted in ARM, C/C++ Programming, Industry Analysis, Testing and tagged , , , , , . Bookmark the permalink.

5 Responses to goto fail and embedded C Compilers

  1. Pingback: the Apple “goto fail” bug in SSL | Roman's knowledgebase

  2. Dan says:

    "Please let me know if this has now been established". I've been assured by people who would know that there are some smart people working on understanding exactly what happened.

    That said - this code is crap. The goto (never used one) and the lack of braces (never omit them) to me screams, "I'm not a hardcore, mission-critical C programmer." Sad. Even more sad is what must be an utterly deficient lack of thoroughness in testing, particularly security testings, at Apple.

    But if you'd shown the whole code snippet, the reader would see something that induces even more head-scratching: the code at the "fail:" label executes (and is intended to execute) even in the non-failure case. (It should really be called "done" or "cleanup_and_exit" or whatever).

    I work on safety-critical and life-critical systems. I'm not perfect, but my coding style, my peers' review thoroughness, when combined with our testing methodologies, development processes (including strictest compiler warning settings & static analysis) would have helped prevent this.

    P.S. You can see a condensed version of the code here:

    https://www.wired.com/threatlevel/2014/02/gotofail/

    or the actual code (it's quite a large source file) here:

    https://opensource.apple.com/source/Security/Security-55471/libsecurity_ssl/lib/sslKeyExchange.c

    Like (0)
    Dislike (0)
  3. Nick Glynn says:

    So one thing to note is that in the latest GCC release, -Wunreachable-code was removed (with the option still valid but silently ignored so it doesn't break existing Makefiles).
    The justification was that its output varied so much between releases (depending on what got optimized away) that it wasn't helpful... whether this bit of publicity to it will encourage the opposite.

    Interestingly, clang tends to follow gcc on this.

    Like (0)
    Dislike (0)
  4. Niall says:

    Hi Dan, it would be useful to see the commit-history and watch a "time-lapse" evolution of this code. Looking at it it is clear a number of people have worked on it. In many places the style is quite different (good use of block structure and separating out the evaluation from the if statement).
    I've seen the same style regually in commercial software (by that I mean unregulated; no standard such as 61508 to adhere to) with no SA tool.

    Like (0)
    Dislike (0)
  5. Pingback: Embedded Link Roundup | UpEndian

Leave a Reply