Vulnerabilities in C : When integers go bad!

Insecure C?

We are at the dawn of a new era of connected embedded devices, broadly being marketed as the “Internet of Things” (IoT). The majority of these systems are likely to be programmed using C/C++. To date, much of the embedded world has been connected to proprietary networks; however, with the gold rush into IoT we are not going to be able to rely on “Security through Obscurity”. This is the first in a series of articles looking at some of the vulnerabilities at the programming language level.

This and many other issues are covered in the Feabhas Training course Secure Linux Programming

Integral data types in C

Due mainly to history, the integer types in C can be a little confusing, but for simplicity and brevity I’ll consider the core integral types to be:

  • char
  • short
  • int
  • long
  • long long

In reality, of course, a short is a short int, but for this discussion I’ll keep to the generally accepted model of referencing them as they’re shown above.

Next we can apply signedness to the types:

  • unsigned
  • signed

Again for simplicity I’m going to assume that a signed int is using 2’s complement representation. Even though the standard allows for “Sign and Magnitude” and “1’s complement”, I don’t know of any (mainstream) modern compiler not using 2’s complement[1].

Next we have to look at the underlying data models. The actual sizes of the data types are implementation defined in <limits.h>, but the implementation values must be greater than or equal to:

  • A char is a minimum of 8 bits
  • A short is a minimum of 16 bits
  • An int is a minimum of 16 bits
  • A long is a minimum of 32 bits
  • A long long is a minimum of 64 bits

Note the emphasis on the word “minimum”. However, it is also accepted that plain ints “have the natural size suggested by the architecture of the execution environment”; thus on a 16-bit architecture a plain int would most likely be 16 bits, whereas on a 32-bit architecture it would be 32 bits.

For the remainder of this discussion I will base my examples around an “ILP32LL” architecture, meaning that the int, long and pointer are 32 bits, char is 8, short is 16 and long long is 64 (e.g. commonly found on the ARMv7 architecture).

Ideally, to help reduce some of this confusion we should be using the C99 platform independent types from <stdint.h> and <inttypes.h>, but for now I’ll still reference the base types.

What are the potential underlying problems?

The problems with integers occur in a number of ways, significantly:

  • Overflow
  • Underflow
  • Promotion/extension
  • Demotion/narrowing
  • Sign conversion

with the behaviour of each issue being dependent on the underlying types.

A char is neither signed nor unsigned

Unlike int, a char is not signed by default; there are actually three different char types: char, signed char and unsigned char. A char should only be used to store ASCII characters (0..127) and should never be used as a small integer. For a particular compiler, the char will use either an underlying signed or unsigned representation, but we should never build programs based on chars being integers. Also, basing code on particular compiler flags, such as gcc’s flag -funsigned-char, is poor practice as the root issue is not being addressed.

Overflow

As each type has a fixed number of bits, there is a range of valid numbers for each type. There are macros defined in <limits.h> for these values.

For example, with the ARM compiler[2] the limits are

  • #define SHRT_MAX  0x7fff  /* maximum value for an object of type short int */
  • #define USHRT_MAX 65535   /* maximum value for an object of type unsigned short int */

Overflow occurs when we require a number that exceeds these limits. For example the simple expression of:

a + b

may cause overflow.

So what happens if we overflow?

Unsigned Overflow

For unsigned types the result is well defined; there is no concept of overflow[3], as the result is reduced modulo Utype_MAX+1, resulting in conceptual wraparound (Utype_MAX + 1 = 0).

For example, adding the unsigned short values 65000 and 540 will always display the output of:

Result is 65000 + 540 = 4

Examining this using hex/binary representation the result becomes even more apparent:

65000 => 0xfde8 => b’1111 1101 1110 1000
  540 => 0x021c => b’0000 0010 0001 1100
                 b’1 0000 0000 0000 0100

So, of course, the actual result is 65540, but using an unsigned short and applying modulo-arithmetic the final result is 4.

Signed Overflow

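Unfortunately, signed overflow is not as well defined. Arithmetic that overflows an int (or a wider signed type) is undefined behaviour, and when an out-of-range value is converted to a signed type, either the result is implementation-defined or an implementation-defined signal is raised.[4]

However, on a modern ILP32LL machine using 2’s complement representation the result appears to be a predictable one. Given: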

  • SHRT_MAX   => 0x7fff
  • 0x7fff + 1 => 0x8000
  • 0x8000     => -32768

As you’d expect, for speed of computation, basic integer arithmetic is performed and the result is simply interpreted using 2’s complement representation.

Underflow

The results for signed and unsigned integer underflow follow the overflow model. Unsigned will wrap around from 0 to Utype_MAX and signed will (likely) go from type_MIN to type_MAX, e.g. decrementing an unsigned short holding 0 and a short holding SHRT_MIN will, most likely, result in the output:

65535 32767

Type Promotion/Extension

Type promotion occurs when we convert from a smaller integer type to a larger one, e.g. from short to int.

As you may guess, there are no issues with type promotion between types of the same signedness, as the larger integer can hold a superset of the values of the smaller type.

Importantly, negative numbers are correctly promoted, e.g. assigning a short holding SHRT_MIN to an int and printing both, first in decimal and then in hex, will result in the output:

-32768 -32768
8000 ffff8000

Type Demotion/Narrowing

Keeping with the spirit of C, narrowing follows the idiom of “Make it fast, even if it is not guaranteed to be portable.”

The simplest way of performing type narrowing is through truncating the bits to the target type’s size, e.g. going from int to short will result in the bottom 16-bits of the 32-bit int being copied to the short.

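For unsigned numbers, this may result in a loss of information (i.e. large numbers being truncated to small numbers). For signed numbers, narrowing can result in an unexpected change of signedness, as shown in the following example. Narrowing the unsigned int value 4294934415 to an unsigned short and an unsigned char, and the int value -32881 to a short and a char, with each row printed in decimal and then in hex, results in the following output: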

4294934415 32655  143
  ffff7f8f  7f8f   8f
    -32881 32655 -113    <= Note the change in sign for short
  ffff7f8f  7f8f   8f

As you can see, for both signed and unsigned numbers, narrowing is achieved through simple truncation. Note, however, that the narrowing from int to short, in this case, has resulted in a change of sign.

Sign conversion

This is where a signed integer is converted to an unsigned number or vice versa. Again, for performance reasons, conversion is most commonly achieved by simply reinterpreting the bit pattern in the context of the target object’s type:

  • If the most-significant-bit (MSB) is a zero (0) then there are no issues with the conversion in either direction.
  • If, however, the MSB is a 1 then a change in sign and value will occur.

Converting the unsigned short value 0x8080 to a short and printing both, first in decimal and then in hex, results in:

 32896 -32640
  8080   8080

Arithmetic Conversion/Promotion

So far we have mostly focused on types of the same size (e.g. short and unsigned short), but in arithmetic or logic operations a set of rules called the usual arithmetic conversions[5] is applied.

This means that, for arithmetic and logic operations, integer types shorter than an int are promoted to an int for the operation. The promotions can sometimes lead to unexpected consequences, such as signed values being interpreted as unsigned and vice versa. A good example of the unexpected is a program that complements an unsigned char, uc1, holding 0xff, compares the result with zero and prints both values, which gives the following output:

0 != 0

 Huh?

It is reasonable to expect the result of ~0xff to be 0x00; however, this is not the case due to promotion. If we change the printf statements to print both values as 32-bit hex, we now get the output:

ffffff00 != 00000000

As uc1 has been promoted to the int 0x000000ff, when complemented it results in 0xffffff00, as shown, and is thus not equal to zero.

INT_MIN

There is one other anomaly to be aware of based around INT_MIN. When using 2’s complement the number range of an integer is not symmetrical, i.e. the range is:

  • -2147483648..2147483647

All negative values, apart from INT_MIN, have a positive representation. Unfortunately we cannot represent -2147483648 as a positive signed number. This leads to the strange behaviour that the absolute value of INT_MIN and -INT_MIN are both likely to yield INT_MIN[6]. For example, printing INT_MIN, its negation and its absolute value typically outputs:

-2147483648 -2147483648 -2147483648

Bitshifting << and >>

Bitshifting is often used as a fast replacement for multiplication and division. For example, multiplying 16 by 16 and shifting 16 left by 4 places both give the same result:

16 * 16 = 256
16 << 4 = 256

Left and right shifting unsigned numbers is pretty safe as, in both cases, it will zero fill. There are two cases that are unsafe (and both will typically generate compiler warnings):

  • shifting by a negative amount, e.g. i << -4
  • shifting by an amount greater than or equal to the width of the type (i.e. > 31 for a 32-bit int), e.g. i << 32

In both cases the result is undefined.

Shifting signed numbers has greater problems.

  • When left shifting the number has the potential to change from negative to positive and vice versa.  
  • When right shifting, and the original number is negative, the standard does not define whether the shift is arithmetic or logical (i.e. will it preserve the sign or not).

The most common right-shifting model, however, is sign preserving, i.e. INT_MIN >> 1 will behave as INT_MIN / 2.

size_t and ptrdiff_t

Finally, before looking at exploits we have two further types from <stddef.h> of interest:

  • size_t – an unsigned integer type of the result of the sizeof operator
  • ptrdiff_t – a signed integer type of the result of subtracting two pointers

In most ILP32LL compilers, size_t is typedef’ed to an unsigned long and ptrdiff_t is a long.

Exploiting these weaknesses

The most common root problem exploited by integer-based attacks is where the implementation of an algorithm has mixed signed and unsigned values. Good targets are where standard library functions, such as malloc or memcpy, have been used, as in both cases they take parameters of type size_t. For example:

int copySize;
// do work, copySize calculated…
if (copySize > MAX_BUF_SZ) {
    return -1;
}
memcpy(&d, &s, copySize*sizeof(type));

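If the attacker can craft copySize so that it is a negative number, then the test fails to reject it, because a negative value is never greater than MAX_BUF_SZ. The negative value is then converted to size_t for the multiplication and wraps around to a positive byte count. Executing the example program with an input value of -2147482047 will result in a buffer overflow:

$ ./a.out -2147482047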
s[1024] 1712400 c[0] 0
About to copy 6404 bytes
s[1024] 1712400 c[0] 1712400

The output shows that by crafting the value of copySize we have caused the memcpy to overflow the destination buffer (d) into the following memory (buffer c).

Over the years there have been numerous reported vulnerabilities due to integer manipulation. For example, the following code is taken from the SSHD Casting Vulnerability; in this case port was defined as a signed integer but sin_port was an unsigned short. With the check as shown, a value above 65535 (for example 65558, which is 65536 + 22) passes the error check but is truncated on assignment to a port below 1024:

/*
  * Check that an unprivileged user is not trying to forward a
  * privileged port. IPPORT_RESERVED is 1024
  */
if (port < IPPORT_RESERVED && !is_root)
          packet_disconnect("Requested forwarding of port %d but user is not root.",
                                           port);
…
sockadd.sin_port = port;

 Defense Against the Dark Arts

In short, it can be very difficult to protect ourselves against building programs which accidentally or deliberately use the undefined or implementation defined integer behaviour. Nevertheless, there are a number of things we can do:

Education

Assuming you’ve made it this far without skipping the content then you already, hopefully, have a better understanding of the potential issues and vulnerabilities associated with using integers; spread the word. Further reading includes:

  • Secure Coding in C and C++ / Robert C. Seacord — 2nd ed. (cert.org/books/secure-coding)
  • Hacking : the art of exploitation / Jon Erickson. — 2nd ed. (www.nostarch.com/hacking2.htm)
  • Anything by John Regehr (www.cs.utah.edu/~regehr/)

Use your compiler flags

Some compilers support compiler flags that affect the behaviour of integers. For example, it is not uncommon for gcc programmers to utilize the flag:

-fwrapv: This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation.

This means consistent behaviour on gcc, but of course could lead to security vulnerabilities if ported to a different compiler. An alternative flag is -ftrapv which will generate traps for signed overflow on addition, subtraction, multiplication operations. For example, executing the bufferOverflow.c program when built with -ftrapv will, by default, generate a core dump.

clang has some additional flags, for example compiling bufferOverflow.c with the flags -fsanitize=undefined -fno-sanitize-recover leads to the following useful output:

bufferOverflow.c:39:18: runtime error: signed integer overflow: 1804289383 * 100 cannot be represented in type ‘int’

Unfortunately, these flags are uncommon on cross-compilers.

Follow a Security based coding standard

One of the best examples is the CERT C Secure Coding Standard.

Enforce the Coding Standard using a Static Analysis (SA) Tool

It is so important that any coding standard is enforced through automation; ideally it is a natural part of a Continuous Integration (CI) strategy (i.e. SA checked after a clean build but before tests are executed). Importantly for embedded systems we want consistency of checking across compilers, so you’ll need to seek out analysers that understand your compiler’s dialect. The CERT weblink has a list of analysers supporting its standard.

Summary

On the surface integers appear very simple; however, as you have hopefully seen, there are a number of subtle issues any C/C++ programmer should be aware of.

Appropriate use of a good SA tool will eliminate pretty much all these issues. Unfortunately, most people aren’t working on green-field projects and therefore have a huge amount of legacy code, making applying SA retrospectively pretty much a non-starter (until of course it all goes wrong!).

Pragmatically, at least focus on new code or any code you’re refactoring and make that “integer secure”, then maybe, just maybe you can chip away at the codebase before those flaws are exploited…

The example code is available here: bitbucket.org/nscooling/intsecurity


 [1] But, of course, basing code on this assumption is a potential security flaw

[2] armcc  5.04

[3] C Rationale §6.2.5-25

[4] ISO/IEC 9899:1999 §6.3.1.3

[5] ISO/IEC 9899:1999 §6.3.1.8

[6] This, officially, is undefined behaviour

Niall Cooling

Director at Feabhas Limited
Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.