A convenient untruth

Array notation in C is a lie!

Sorry, dear reader*, but I cannot participate in this conspiracy any longer.  You have been lied to, manipulated and coerced into thinking arrays are a construct of the C language.  I feel it is my solemn duty to blow the whistle on this charade and expose the dirty secrets of C’s so-called arrays.

(* It is statistically possible that more than one person might read this, of course)

Lie #1 – Array names don’t name arrays

Here’s an array declaration in C:

int main(void)
{
  int arr[5];
}

(Fatuous side note:  when demonstrating arrays, always call them ‘arr’ – it allows you to talk like a pirate  🙂 )

Our compiler has allocated a contiguous sequence of integers.  We can tell by looking at the size of arr:

#include <stdio.h>

int main(void)
{
  int arr[5];

  printf("%zu", sizeof(arr));  // => 20 (with 4-byte ints)
}

That’s consistent with sizeof for scalar types, but perhaps not as useful as it could be.  Wouldn’t you rather know how many elements were in the array?  Preprocessor to the rescue!

#include <stdio.h>

#define ARRAY_SIZEOF(a) (sizeof(a) / sizeof(a[0]))

int main(void)
{
  int arr[5];

  printf("%zu", ARRAY_SIZEOF(arr));  // => 5 (*much* more useful!)
}

So, a dumb question:  what’s the type of arr?  If you said int [5] you’d be right; but wait:

int main(void)
{
  int arr[5];

  int another_arr[5] = arr;  // Nope.
  another_arr = arr;         // Nope.
}

Even though arr and another_arr are declared as the same type I can’t use one to initialise the other; nor can I assign one array to another.
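
Since neither initialisation nor assignment will do the job, the copy has to be made some other way: element by element, or with memcpy() from <string.h>.  Here is a minimal sketch of the latter; the function name copy_demo is made up purely for illustration.

```c
#include <string.h>

/* copy_demo is a hypothetical name, used only for this illustration.
   It copies one int[5] into another and returns the last copied element. */
int copy_demo(void)
{
  int arr[5] = { 1, 2, 3, 4, 5 };
  int another_arr[5];

  /* arr is a genuine array in this scope, so sizeof(arr) is the size
     of all five elements, not the size of a pointer. */
  memcpy(another_arr, arr, sizeof(arr));

  return another_arr[4];
}
```

Called from main(), copy_demo() returns 5: the whole array has been copied, just not with the = operator.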

Why is this failing?  Because the array’s name is a lie!  Using a variable as an expression normally yields its value, but an array name used as an expression yields a pointer to the array’s first element (which is at least reasonable).  The exceptions are when the name is the operand of sizeof, as above, or of the address-of operator (&).

int main(void)
{
  int arr[5];

  int *ptr = arr;  // ptr holds the address of first element
}

The array’s name yields a non-modifiable l-value expression.  This (in effect) means the pointer is constant; hence why you can’t assign arrays to each other.

It might be tempting to believe at this point that arrays and pointers are pretty much the same thing.  That would be crazy thinking though – arrays are arrays; pointers are pointers.
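
One quick way to convince yourself is to ask sizeof.  The sketch below uses two hypothetical helper functions (the names are mine, not part of any library): one reports the size of an array, the other the size of a pointer initialised from that same array.

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

size_t size_of_array(void)
{
  int arr[5];

  return sizeof(arr);   /* the whole array: 5 * sizeof(int) */
}

size_t size_of_pointer(void)
{
  int arr[5];
  int *ptr = arr;       /* arr yields a pointer to its first element */

  return sizeof(ptr);   /* just one pointer: sizeof(int *) */
}
```

On a typical desktop machine the first gives 20 and the second 4 or 8.  Same name on the right-hand side, two quite different objects.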

Being a reasonable human being you should want to give the readers of your code a bit more of a clue as to what’s going on:

int main(void)
{
  int arr[5];

  int *int_ptr = &arr[0];  // The same as before; but now explicit
}

Just to mess with you some more:  taking the address of an array yields a pointer-to-array; and not, as many believe, a pointer to the first element:

int main(void)
{
  int arr[5];
  int *int_ptr;
  
  int_ptr = arr;       // OK.
  int_ptr = &arr;      // Warning - actually the wrong type.


  int (*arr_ptr)[5];

  arr_ptr = arr;       // Nope.
  arr_ptr = &arr;      // Yup.
  int_ptr = *arr_ptr;  // OK.  Confused now?...
}
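
The difference between the two pointer types shows up as soon as you do arithmetic on them.  In this sketch (the function names are invented for the example) both pointers start at the same address, but adding one moves them by very different amounts.

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* Do arr and &arr refer to the same address?  Returns 1 if so. */
int same_address(void)
{
  int arr[5];

  return (void *)arr == (void *)&arr;
}

/* Bytes covered by adding 1 to a pointer-to-int. */
size_t step_of_element_pointer(void)
{
  int arr[5];
  int *first = arr;                /* points at arr[0] */

  return (size_t)((char *)(first + 1) - (char *)first);
}

/* Bytes covered by adding 1 to a pointer-to-array-of-5-int. */
size_t step_of_array_pointer(void)
{
  int arr[5];
  int (*whole)[5] = &arr;          /* points at the array as a whole */

  return (size_t)((char *)(whole + 1) - (char *)whole);
}
```

The element pointer steps by sizeof(int); the array pointer steps over all five elements in one go.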

At this point we have to conclude the following about arrays:

  • Arrays are a contiguous sequence of objects
  • Arrays don’t behave the same as the types in the array.

Lie #2 – Array access is just syntactic sugar

Accessing array elements is done with the index operator ([]).  If it were only that simple.  The index operator is merely a smokescreen hiding the insidious truth:  Array access is pointer arithmetic.

Pointer arithmetic, as you will no doubt know, modifies the address stored in the pointer object by multiples of the size of the type being pointed to; that is:

int main(void)
{
  int a;
  int *int_ptr = &a;

  ++int_ptr;  // Address held in int_ptr increases by sizeof(int) bytes
}
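
To see the scaling at work, here is a small sketch using doubles rather than ints (the function names are hypothetical).  The distance between two pointers is measured once in elements and once in bytes.

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* Distance between p and p + 3, counted in elements. */
ptrdiff_t elements_between(void)
{
  double values[4];
  double *p = &values[0];
  double *q = p + 3;

  return q - p;
}

/* The same distance, counted in bytes via char pointers. */
size_t bytes_between(void)
{
  double values[4];
  double *p = &values[0];
  double *q = p + 3;

  return (size_t)((char *)q - (char *)p);
}
```

The first function returns 3; the second returns 3 * sizeof(double).  The compiler does the multiplication for you, whether you asked it to or not.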

Ever wondered why pointer arithmetic is the way it is?  The answer is that it’s all part of the Great Array Conspiracy. (Which is not a real thing.  Yet)

Using pointer arithmetic, I could access array elements like this:

int main(void)
{
  int arr[5];
  int *int_ptr = &arr[0];

  *(int_ptr + 3) = 100;  // Modify the 4th array element 
                         // via the pointer.
}

This is sneaky, underhand and just plain difficult-to-read code, I hope you would agree.  It’s the sort of code written by programmers who believe Code Obscurity == Job Security.

But this is exactly what the index operator is doing.  When you write (for example)

arr[0]

The compiler is re-writing your code as:

*(arr + 0)

We saw previously that the array name, as an expression, yields a pointer; so this is exactly the pointer arithmetic code we just mentioned.
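
If you don’t believe me, try it.  The two functions in this sketch (names invented for the purpose) walk the same data, one with the index operator and one with explicit pointer arithmetic.

```c
/* Hypothetical helpers, for illustration only. */

/* Sum the elements using the index operator. */
int sum_indexed(void)
{
  int arr[5] = { 10, 20, 30, 40, 50 };
  int total = 0;

  for (int i = 0; i < 5; ++i) {
    total += arr[i];
  }
  return total;
}

/* Sum the same elements using explicit pointer arithmetic. */
int sum_dereferenced(void)
{
  int arr[5] = { 10, 20, 30, 40, 50 };
  int total = 0;

  for (int i = 0; i < 5; ++i) {
    total += *(arr + i);   /* what arr[i] means underneath */
  }
  return total;
}
```

Both return 150, because as far as the language is concerned they are the same expression written two ways.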

The index operator is not a particularly fussy operator: any pointer of the correct type will do:

int main(void)
{
  int arr[5];
  int *int_ptr = &arr[0];  // Oooh, look – pointer arithmetic!


  int_ptr[3] = 100;        // Same as *(int_ptr + 3)  = 100;
                           // Also the same as arr[3] = 100;
}

Once again, many naïve programmers are left to reason that arrays and pointers must be the same, since they appear to work in the same way.

In C (as in mathematics) addition is commutative; so a + b is the same as b + a.  Rather surprisingly this is also true for pointer arithmetic; and this can lead to some truly bizarre-looking code:

int main(void)
{
  int arr[5];
  int *int_ptr = &arr[0];

 
  *(int_ptr + 3) = 100;  // Same as int_ptr[3] = 100
  *(3 + int_ptr) = 100;  // Same as above.
  3[int_ptr]     = 100;  // Exactly the same as the others!
}

If you’re ever tempted to write code like the last line above, you really need to sit down and have a good stiff word with yourself.

Lie #3 – There are no multi-dimensional arrays

If there were multi-dimensional arrays in C I’d be able to write something like:

int main(void)
{
  int one_D_arr[10];
  int two_D_arr[10, 10];
  int three_D_arr[10, 10, 10];

  ...
}

Of course, I can’t.  C only allows one-dimensional arrays.  However, it is pretty lackadaisical about the type of object in the array.  It has no objections to having arrays as array elements:

int main(void)
{
  int arr_arr[4][2];  // Array of arrays.  But which is which?
}

Although this looks like a two-dimensional array, it isn’t; in memory it’s a contiguous sequence of eight integers.  The compiler, though, sees it as a contiguous sequence of four elements, where each element is a contiguous sequence of two integers.
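
Both views can be checked with sizeof and a little address arithmetic.  A minimal sketch, with helper names that are entirely my own invention:

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* Size of the whole array of arrays. */
size_t total_bytes(void)
{
  int arr_arr[4][2];

  return sizeof(arr_arr);
}

/* Size of one element of the outer array, i.e. one int[2]. */
size_t row_bytes(void)
{
  int arr_arr[4][2];

  return sizeof(arr_arr[0]);
}

/* Byte offset of the second inner array from the first. */
size_t row_offset(void)
{
  int arr_arr[4][2];

  return (size_t)((char *)arr_arr[1] - (char *)arr_arr[0]);
}
```

The whole thing occupies 8 * sizeof(int) bytes, each inner array occupies 2 * sizeof(int), and the second inner array starts exactly where the first one ends.  No gaps, no row pointers, no hidden bookkeeping.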

(A special prize of a year’s supply of Brownie Points if you can work out the type of the expression arr_arr; (and no, it’s not int**))

Accessing array-of-array elements has simple syntax (once you’ve worked out which index represents which axis: the right-most index represents the ‘minor’ array; the left-most is the ‘major’ array)

int main(void)
{
  int arr_arr[4][2]; 

  arr_arr[3][1] = 100;  // Looks easy, but what’s really going on?
}

We know that the index operator is merely syntactic sugar to fool us (but we’re wise to that, now).  However, navigating your way through the collusion and misdirection to work out how array-of-array elements are accessed requires a degree of mental intrepidity.

We know how to deal with arr_arr[3].  It is pointer arithmetic that yields (in this case) an array of two integers (as an l-value expression).  We know that an array used in an expression yields the address of its first element, so we apply the pointer arithmetic again to get an integer (again as an l-value expression).
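
Written out longhand, the two steps look like the sketch below (the function name is hypothetical).  The element is written with the index operators and read back with nothing but pointer arithmetic.

```c
/* nested_access_demo is a hypothetical name, for illustration only. */
int nested_access_demo(void)
{
  int arr_arr[4][2] = { { 0 } };

  arr_arr[3][1] = 100;

  /* Step 1: arr_arr + 3 points at the fourth int[2];
             dereferencing it yields that int[2].
     Step 2: the int[2] yields a pointer to its first int;
             add 1 and dereference to reach the int itself. */
  return *(*(arr_arr + 3) + 1);
}
```

The function returns 100, so both spellings reach the same integer.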

Of course, if you really want to send people off the edge into insanity you could always write the above code as:

int main(void)
{
  int arr_arr[4][2];

  1[3[arr_arr]] = 100;  // <= How to lose friends, fast.
}

Lie #4 – You can’t pass arrays to functions by value

The code below appears to refute this lie:

void process_array(int arr_param[10])
{
  ...
}


int main(void)
{
  int arr[10];

  process_array(arr);
}

Everything points to this being a pass-by-value call; a copy of arr is made on the call to process_array().

Let’s chip away at this façade.  Our ARRAY_SIZEOF macro from earlier should still work:

#define ARRAY_SIZEOF(a) (sizeof(a) / sizeof(a[0]))


void process_array(int arr_param[10])
{
  int sz = ARRAY_SIZEOF(arr_param);  // sz => 1. Or 2. Ummm?...
}


int main(void)
{
  int arr[10];

  int sz = ARRAY_SIZEOF(arr);       // sz => 10, as expected.
  process_array(arr);
}

The call should give us a clue why this is a lie:  an array name is being used as an expression.  We know this really means ‘give a pointer to the first element’.  We also know a pointer is not the same as an array of integers.  So what’s happening?

The signature of the process_array() function is a lie:  You cannot pass an array to a function by value.

The parameter signature degenerates to a pointer.  To demonstrate this I could have written the function signature as

void process_array(int arr_param[])  // The index is ignored,
                                     // because it has no meaning

or perhaps even more accurately

void process_array(int *arr_param) // Exactly the same as the above.

The ARRAY_SIZEOF macro is simply compounding all the lies we covered previously to give us an answer we weren’t expecting.  Expanding the macro shows us why.

void process_array(int *arr_param)
{
  int sz = ARRAY_SIZEOF(arr_param);
  //     => sizeof(arr_param) / sizeof(arr_param[0])
  //     => sizeof(arr_param) / sizeof(*(arr_param + 0))
  //     => sizeof(int*)      / sizeof(int)
  //     => (4/8 bytes)       / (4 bytes)
}

Inside the function we are relying on the syntactic sugar of the index operator.  Insidious!

The actual value you get depends on your architecture.  On a 32-bit machine you’ll typically get 1; on a 64-bit machine you’ll typically get 2.
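
Since the callee cannot recover the number of elements, the usual workaround is to send the count along with the pointer, calculated where the array is still an array.  A minimal sketch, assuming a summing function; sum_array and sum_demo are names I have made up.

```c
#include <stddef.h>

#define ARRAY_SIZEOF(a) (sizeof(a) / sizeof(a[0]))

/* Hypothetical helper: the count travels with the pointer,
   because it cannot be worked out from the pointer alone. */
int sum_array(const int *values, size_t count)
{
  int total = 0;

  for (size_t i = 0; i < count; ++i) {
    total += values[i];
  }
  return total;
}

/* Hypothetical caller: ARRAY_SIZEOF is applied here, in the scope
   where arr really is an int[10]. */
int sum_demo(void)
{
  int arr[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

  return sum_array(arr, ARRAY_SIZEOF(arr));
}
```

sum_demo() returns 55.  The function signature now tells the truth: it takes a pointer and a count, which is all it ever had.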

Lie #5 – String literals aren’t constant

Literals are fixed values.  You can’t change the value of 26.7, or 136.  They are constants.  C supports string literals, defined as arrays of chars.

int main(void)
{
  puts("An array of 20 chars");  // 21, strictly: the NUL terminator counts too
}

C lets you initialise arrays of characters with literals; and helpfully adds a NUL character terminator to the array.

int main(void)
{
  char string[] = "Hello world";
}
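
The terminator is easy to spot if you compare sizeof with strlen().  A small sketch, using two hypothetical helper functions:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, for illustration only. */

/* Size of the array, including the NUL the compiler appended. */
size_t array_size(void)
{
  char string[] = "Hello world";

  return sizeof(string);
}

/* Length of the text, which stops short of the NUL. */
size_t text_length(void)
{
  char string[] = "Hello world";

  return strlen(string);
}
```

The text is 11 characters long, but the array holding it has 12 elements.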

It’s worth paying close attention to the type of the array:  char.  Not const char.  This means I can perform some subtle abuse:

void mod_string(char string[])  // Remember, it’s a lie!
{
  string[0] = 'h';              // Seems legit...
  puts(string);
}


int main(void)
{
  mod_string("Hello world");
}

What’s going to happen here?  We know the string literal won’t be copied (that’s a lie).  We also know the index operator will just operate on the address of the string literal.  So where’s the string literal being stored?  Most likely in a read-only part of the program image: the Code section, or a read-only data section alongside it.  Modifying a string literal is undefined behaviour; in practice the write will typically either do nothing (if that memory is in Flash / ROM) or cause a segmentation fault.  Joy.

The conclusion to all this?  Be careful with string literals.  The compiler may not stop you doing dumb and dangerous things.  Make life a little safer by only ever referring to string literals through a pointer to const char.
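
In practice that advice comes down to two habits, sketched below with made-up function names: point at literals through const char *, and when you really do need to change the text, initialise a char array from the literal so you are working on your own copy.

```c
/* Hypothetical helpers, for illustration only. */

/* Read-only access: with the const in place, the compiler will
   reject any attempt to write through this pointer. */
const char *get_greeting(void)
{
  const char *greeting = "Hello world";

  return greeting;   /* literals have static storage, so this is safe */
}

/* Writable access: the array is a private copy of the literal. */
char lowered_initial(void)
{
  char copy[] = "Hello world";

  copy[0] = 'h';     /* modifies the copy, not the literal */
  return copy[0];
}
```

get_greeting()[0] is still 'H', and lowered_initial() returns 'h' without going anywhere near read-only memory.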

Summary

Of course, if you’ve read this far you’ll (hopefully) realise that this post should have been taken in jest.  Arrays aren’t really a lie (any more than any of C’s constructs are).  Despite all the ‘trickery’ C’s arrays work well for many, many programming tasks.  They are – as the title of this article suggests – a very convenient set of untruths.

However, by exploring the C semantics we’ve highlighted some traps that can befuddle the unwary C neophyte.


Glennan Carnie


Technical Consultant at Feabhas Ltd
Glennan is an embedded systems and software engineer with over 20 years experience, mostly in high-integrity systems for the defence and aerospace industry.

He specialises in C++, UML, software modelling, Systems Engineering and process development.

This entry was posted in C/C++ Programming.

9 Responses to A convenient untruth

  1. Alex says:

    "If there were multi-dimensional arrays in C I’d be able to write something like: int 1D_arr[10];"

    You wouldn't get very far, trying to use an identifier that starts with a digit. 😉

  2. Jacob says:

    I'm guessing I would fail your interview tests. Nonetheless the subtleties are always amusing.

  3. Yup. That's pretty dumb, isn't it? :facepalm:

    I've updated the text to make those array names more... palatable for a compiler.

    Thanks for the spot!

  4. Carlo says:

    Hi Glennan, not to be negative, but I think the title is kinda off the subject. I think these are great examples of misconceptions that coders have around pointers when learning C, which (if learned properly) are corrected after some practice or some time playing with the language. At the same time I understand your point: lies (or misconceptions) can indeed go far enough to make some people (sometimes) defend them.

  5. Trevor says:

    Doesn't make a huge difference, but it's worth noting that accessing array elements by pointer (i.e. *arr = x; arr++) is slightly faster than by subscripting (i.e. arr[]) in loops, even though it can look a lot more confusing. Check out the assembly! Without optimization there's no contest, but even with -O3 or -O4 (using GCC) subscripting requires a few slower computations.

  6. Alius Umbra says:

    Oh my goodness 😥 I'm so glad I left C behind and embraced C# 😄👍

  7. John A Lauro says:

    It is a lie that C cannot do arrays, or even your lie #4 that you can't pass by value. If you want to have an array type, then make it so. Pass arrays around (by value, or copying, etc...). Hopefully we all know that generally isn't a good idea, but sometimes, especially for small arrays, it can be... All you have to do is define it as such.

    typedef struct { int arr[5]; } array5;

    int main(void)
    {
    array5 arr, another_arr;

    another_arr = arr; // Yup.
    }

    You can access individual elements such as arr.arr[4], or another_arr.arr[3], etc... and now you can pass arr by value (watch your stack for big arrays).

  8. R. J. Smith says:

    Really nice article! Don't mean to spoil anything... but the type of arr_arr as an l-value is "pointer-to-array-of-2-int", correct?

    I've been asked why the index doesn't matter in a function prototype, and you explain that very well. It doesn't matter if you use any of these, because the argument decays to a pointer-to-int (the first element in the array):

    int arr[ ]
    int arr[5]
    int * arr
    int arr[10]

    An interesting question is why the dimensions of a multi-dimension array *do* matter with function prototypes. If you have 'int arr3d[5][10][20]', then the array name will yield a 'pointer-to-array-of-10-arrays-of-20-ints'. So any of these are valid in a function prototype:

    int [5][10][20]
    int (*arr)[10][20]
    int [ ][10][20]
    int [9999][10][20] // but seriously, don't...!

    Again, the first dimension is "ignored", but the other dimensions must be correct. This means that functions that handle multi-dimensional arrays in this fashion aren't terribly useful or general... this one would only process arrays whose elements are arrays-of-10-arrays-of-20-ints. With some casting trickery, a more general function can be written with a parameter of this type:

    int *** arr

    However, after the ugly casting you "lose" the information about the other dimensions and must pass in those dimensions separately (in reality the array is linear and contiguous, so it becomes up to you to slice it properly). Some clever macros can help with the calculation of the arguments.

    Basically the index operator has to know the size of the object it's working on for the pointer arithmetic to work properly. This (bad) image I put together quickly gave a couple of my friends a "light bulb" moment:

    http://i.imgur.com/DF3xDJO.jpg

    When "passing a 1D array" to a function, the function will know the size of the elements in the array, but without more information, can't know how many. With a 2D array, it will know the size of the "row", but not how many rows (and the rows are statically sized - hence the 2nd dimension is required). With a 3D array, it knows the size of each "grid", but not how many grids (and grids are statically sized - hence the 2nd *and* 3rd dimensions are required).

    Of course this is recursive for all higher dimension arrays, so the first rule is really the _only_ rule... the function will know the size of the elements in the array, but not how many.

  9. Silvio Fonseca says:

    See? More than one reader indeed. Coding "1[3[multid_array_pointer]]" is the most evil C code I have seen in a while...

