# Code Quality – Cyclomatic Complexity

In the standard ISO 26262-6:2011 [1] the term “complexity” appears a number of times, generally in the context of reducing or lowering said complexity.

There are many different ways of defining “complexity”. For example, Fred Brooks, in his 1986 landmark paper “No Silver Bullet — Essence and Accidents of Software Engineering”, asserts that there are two types of complexity: essential and accidental [2].

Rather than getting into esoteric discussion about design complexity, I’d like to focus on code complexity.

Over the years, I have found one metric to be the simplest and most consistent indicator of code quality – Cyclomatic Complexity. This is often also referred to as the “McCabe Metric”, after Tom McCabe’s original paper in 1976 “A Complexity Measure” [3].

It’s not perfect: it can be fooled, and the measured value can differ slightly from tool to tool (the original work analysed Fortran on the PDP-10). Given that caveat, it still has major value in raising a red flag when code goes beyond certain thresholds. In addition, it is easily incorporated as a “quality gate” into a modern CI/CD (Continuous Integration/Continuous Delivery) build system.


# Cyclomatic Complexity (CC)

Very simply, the metric is calculated by building a directed graph from the source code. It is done on a function-by-function basis and is not cumulative (i.e. the reported value is for that function alone).

Given the following code:

```c
void ef1(void);
void ef2(void);
void ef3(void);

void func(int a, int b)
{
    ef1();
    if (a < 0) {
        ef2();
    }
    ef3();
}
```

A directed graph of the code would look thus:

The Complexity is measured using the very simple formula (based on graph theory) of:

v(G) = e – n + 2p

where:

e = the number of edges of the graph.
n = the number of nodes of the graph.
p = the number of connected components.

However, as we are analysing individual functions rather than a collection of connected graphs, the formula can be simplified to

v(G) = e – n + 2

since each function is a single connected component, giving p = 1. (One possible drawing of the graph above has four nodes and four edges.)

In this simple example v(G) is equal to two. Running Lizard, an open-source Cyclomatic Complexity Analyzer, the result (shown as CCN) is as expected:

```
$ docker run --rm -v $(pwd):/usr/project feabhas/alpine-lizard lizard
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
8      2     30      2       8 func@6-13@./main.c
```

In these examples I’m running Lizard from a Docker container (as this fits with our Jenkins build system). However, Lizard can be installed locally using `pip install lizard` if you have Python and pip installed.

One very useful side-effect of the metric is that it is also a quantitative measure of the number of linearly independent paths through a program’s source code, which, in turn, defines the ideal number of unit tests required to achieve branch coverage (e.g. the true and false paths of the if-statement in the example above).

Importantly, if we change the if-statement to:

```c
void func(int a, int b)
{
    ef1();
    if ((a < 10) && (b > 100)) {
        ef2();
    }
    ef3();
}
```

and run the complexity analyser,

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
8      3     38      2       8 func@7-14@./main.c
```

we see that the complexity has increased by one. You may find that your tool of choice does not show this change, being insensitive to sub-expressions, as McCabe’s original work did not consider them (i.e. McCabe would calculate two rather than three).

If we add a second function, e.g.

```c
void func2(int a, int b)
{
    ef1();
    if (a < 10) {
        ef2();
    } else {
        ef3();
    }
    ef4();
}
```

and run the complexity analyser again:

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
10      2     37      2      10 func2@16-25@./main.c
```

we see that, with the else path, we still get, as expected, a complexity of two (a true and a false path). It is worth noting that the conditional expression (?:) likewise results in a complexity of two.
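For instance, a function built around a single conditional expression (this sketch is my own, not from the article):

```c
/* One ?: decision point -- giving a CCN of two, the same as a
 * single if-else */
int clamp_to_zero(int a)
{
    return (a < 0) ? 0 : a;
}
```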

However, if we add an else-if branch:

```c
void func3(int a, int b)
{
    ef1();
    if (a < 10) {
        ef2();
    } else if (b > 100) {
        ef3();
    } else {
        ef4();
    }
    ef5();
}
```

we can see, again as expected, the CCN of func3 is three:

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
12      3     50      2      12 func3@28-39@./main.c
```

## Control Structures

Each control structure affects the complexity metric. As we have seen with the if-else-if construct, each else-if adds one to the complexity.

### Iteration – While loops

Both the while-loop and the do-while-loop result in a complexity of two:

```c
void func_while(int a, int b)
{
    ef1();
    while (a > 0) {
        ef2();
        --a;
    }
    ef3();
}

void func_do_while(int a, int b)
{
    ef1();
    do {
        ef2();
        --a;
    } while (a > 0);
    ef3();
}
```

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
9      2     33      2       9 func_while@41-49@./main.c
9      2     35      2       9 func_do_while@51-59@./main.c
```

### Iteration – For Statement

There are cases where the calculated value can generate a “false positive”. For example, the following function demonstrates a typical for-statement:

```c
void func_for_N(void)
{
    ef1();
    for (int i = 0; i < 100; ++i) {
        ef2();
    }
    ef3();
}
```

However, the complexity analyser generates a, possibly unexpected, CCN of two!

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
8      2     38      2       8 func_for_N@61-68@./main.c
```

As discussed earlier, the cyclomatic complexity is

> a quantitative measure of the number of linearly independent paths through a program’s source code

But in this example, there are not two paths through the code and thus, it is not possible to define two unit tests.

Unfortunately this is a consequence of the C for-statement really being a while-statement in disguise (and vice-versa). However, if we change the for-statement to:

```c
void func_for(int a, int b)
{
    ef1();
    for (int i = a; i > 0; --i) {
        ef2();
    }
    ef3();
}
```

We now have a valid case of requiring at least two unit tests.
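The equivalence the analysers rely on can be seen by desugaring func_for by hand into its while-loop form (my sketch, with counting stubs added so it is self-contained):

```c
static int ef2_calls;
static void ef1(void) {}
static void ef2(void) { ++ef2_calls; }
static void ef3(void) {}

/* func_for rewritten as its while-loop equivalent: the directed graph,
 * and therefore the CCN of two, is identical */
void func_for_desugared(int a, int b)
{
    (void)b;
    ef1();
    int i = a;          /* the for-statement's init clause */
    while (i > 0) {     /* its condition */
        ef2();
        --i;            /* its increment clause */
    }
    ef3();
}
```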

### Break and Continue

Adding either a break-statement or a continue-statement to any looping construct will increase the count by one for each use, e.g.

```c
void func_while_cont(int a, int b)
{
    ef1();
    while (a > 0) {
        ef2();
        --a;
        if ((a % b) == 0) continue;
    }
    ef3();
}

void func_while_break(int a, int b)
{
    ef1();
    while (a > 0) {
        ef2();
        --a;
        if ((a % b) == 0) break;
    }
    ef3();
}
```

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
10      3     45      2      10 func_while_cont@59-68@./main.c
10      3     45      2      10 func_while_break@70-79@./main.c
```

But this can be a misleading value when used to break out of an infinite loop, e.g.

```c
void func_forever(int a, int b)
{
    ef1();
    while (1) {
        if (--a <= 0) break;
        ef2();
    }
    ef3();
}
```

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
9      3     37      2       9 func_forever@91-99@./main.c
```

where we get a reported value of three, which doesn’t match the true path count.
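One way of bringing the reported value back in line with the true path count is to fold the exit condition into the loop condition itself. The restructuring below is my own sketch; I believe it preserves func_forever’s behaviour while reducing the CCN to two:

```c
static int ef2_calls;
static void ef1(void) {}
static void ef2(void) { ++ef2_calls; }
static void ef3(void) {}

/* Behaves as func_forever: decrement a on each pass and stop once it
 * reaches zero, calling ef2() on every pass where it hasn't */
void func_counted(int a, int b)
{
    (void)b;
    ef1();
    while (--a > 0) {
        ef2();
    }
    ef3();
}
```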

## The Switch-statement

The switch-statement is typically the construct that creates high complexity, for example:

```c
typedef enum { START, STOP, PAUSE, RESUME } Command;

void func_switch(Command cmd)
{
    switch (cmd)
    {
    case START:
        puts("Starting the system");
        break;
    case STOP:
        puts("Stopping the system");
        break;
    case PAUSE:
        puts("Paused");
        break;
    case RESUME:
        puts("Resuming");
        break;
    default:
        puts("Unknown message type");
        break;
    }
}
```

Our complexity measure is five:

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
21      5     62      1      21 func_switch@84-104@./main.c
```

as there is a separate path per case label.

Note that if we drop the default label, the complexity measure will still be five, as the implicit path past the switch (when no case matches) still exists.

### Differing Metric Values

In the switch-statement, multiple labels can be grouped together, e.g.

```c
typedef enum { CMD_START, CMD_STOP, STATUS_ON, STATUS_OFF, STATUS_READY,
               SIGNAL_VOLTAGE, SIGNAL_CURRENT } Message;

void func_switch2(Message msg)
{
    switch (msg)
    {
    case CMD_START: case CMD_STOP:
        puts("Processing command message");
        break;
    case STATUS_ON: case STATUS_OFF: case STATUS_READY:
        puts("Processing status message");
        break;
    case SIGNAL_VOLTAGE: case SIGNAL_CURRENT:
        puts("Processing signal message");
        break;
    default:
        puts("Unknown message type");
        break;
    }
}
```

So do we have four paths or eight?

The original metric was not intended to be language specific, so the value calculated will depend on the algorithm used to generate the directed graph. In most cases (certainly where your tool of choice is language aware) you’ll get a value of eight, e.g.

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
18      8     64      1      21 func_switch2@107-127@./main.c
```

There is a variant of CC called Modified Cyclomatic Complexity (also referred to as CCN3). When using the modified count, a switch-statement is treated as a single decision point (i.e. giving a value of two for a function containing one switch), independently of whether a default is present or not. Lizard can generate the modified metric using the -m flag, e.g.

```
$ docker run --rm -v $(pwd):/usr/project feabhas/alpine-lizard lizard -m main.c
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
18      2     53      1      21 func_switch@122-142@main.c
18      2     64      1      21 func_switch2@154-174@main.c
```

On the surface, this may not seem very useful, but, as we’ll discuss later, it can help when initially analysing legacy code.

## Multiple Function Return Paths

Referring back to ISO 26262-6, Table 8 (Design principles for software unit design and implementation) principle 1a is

> One entry and one exit point in subprograms and functions

which indicates that the approach is highly recommended for all levels of ASIL (Automotive Safety Integrity Level).

Of course C allows multiple return-statements within a function, e.g.

```c
int func_early_return(int a, int b)
{
    ef1();
    while (a > 0) {
        ef2();
        --a;
        if ((a % b) == 0) return -1;
    }
    ef3();
    return 0;
}
```

An early (unstructured) return adds one to the complexity:

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
11      3     50      2      11 func_early_return@167-177@./main.c
```
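A single-exit rework of func_early_return might look like the following sketch (mine, not from the standard); note that a status flag trades the early return for extra conditions, so the complexity does not necessarily fall:

```c
static int ef2_calls;
static void ef1(void) {}
static void ef2(void) { ++ef2_calls; }
static void ef3(void) {}

/* Single-exit equivalent of func_early_return */
int func_single_return(int a, int b)
{
    int status = 0;
    ef1();
    while ((a > 0) && (status == 0)) {
        ef2();
        --a;
        if ((a % b) == 0) {
            status = -1;    /* remember the failure... */
        }
    }
    if (status == 0) {
        ef3();              /* ...and skip ef3(), as the early return did */
    }
    return status;
}
```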

## The goto

Finally, we can’t ignore the goto-statement. As you may well predict, using a goto in code will add one to the complexity value.

# McCabe’s Threshold

Returning to McCabe’s original work, he stated:

> These results have been used in an operational environment by advising project members to limit their software modules by cyclomatic complexity instead of physical size. The particular upper bound that has been used for cyclomatic complexity is 10 which seems like a reasonable, but not magical, upper limit.

As part of his work, McCabe saw:

> a close correlation was found between the ranking of subroutines by complexity and a ranking by reliability (performed by the project members).

He then goes on to state:

> When the complexity exceeded 10 they had to either recognize and modularize subfunctions or redo the software.

Let’s not forget this is 1976.

In all of his early work, McCabe only ever referred to an upper threshold of 10, and remember this was a “reasonable, but not magical, upper limit”.

Subsequently, further guidelines have been generated. A commonly cited one correlates CC and risk:

| Cyclomatic Complexity | Risk Evaluation |
|---|---|
| 1–10 | a simple function, little risk |
| 11–20 | more complex, moderate risk |
| 21–50 | complex, high risk |
| > 50 | untestable, very high risk |

In his excellent book, Code Complete, McConnell takes a more stringent view (p458): from 0 to 5 decision points a routine is probably fine, from 6 to 10 you should start to think about ways to simplify it, and above 10 you should consider moving part of the routine into a separate routine.

Tools typically have their own default threshold (for example Lizard has it set at 15, whereas OCLint has it set at 10). As you would expect, these thresholds can be configured.

This enables these tools to be used as part of a modern CI build system (e.g. Jenkins).
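As a sketch of such a quality gate (the threshold is illustrative; this relies on Lizard returning a non-zero exit status when functions exceed the threshold, which is my understanding of its default behaviour):

```shell
# Fail the CI stage if any function's CCN exceeds 10
docker run --rm -v $(pwd):/usr/project feabhas/alpine-lizard lizard --CCN 10 || exit 1
```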

## Legacy Code Bases

For a legacy codebase, using a threshold of 10 may prove challenging. For example, analysing the Eclipse Mosquitto open-source message broker by running Lizard (using the modified cyclomatic complexity) against the source directory (/src) gives a summary showing an average CCN of 11.3 over 210 functions containing over 10,000 lines of code.

Pragmatically, I normally recommend setting an upper threshold of 50 and using the modified CC for legacy code. This should identify the key problematic functions that are worth reviewing and considering for rework (you can only refactor if you have a good set of unit tests to begin with, and I’d suggest it is unlikely these high-CCN functions have comprehensive unit tests).

Applying this to the Mosquitto code base, we get the following report:

```
$ docker run --rm -v $(pwd):/usr/project feabhas/alpine-lizard lizard --CCN 50 -s cyclomatic_complexity
...
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
1127    481   8856      8    1311 config__read_file_core@711-2021@./conf.c
500    124   2911      2     578 handle__connect@109-686@./handle_connect.c
377    121   2735      4     473 mosquitto_main_loop@100-572@./loop.c
178     54   1250      2     246 main@193-438@./mosquitto.c
```

The results show that the function config__read_file_core has 1127 lines of code, a modified cyclomatic complexity of 481 and takes 8 parameters. Interestingly, the unmodified CCN is 492, as the code mainly uses if-else-if-statements rather than switch-statements.

Importantly, especially with this example, it is worth noting that this function contains a lot of conditional-compilation directives. Lizard doesn’t support defining which conditionals should be used for a build (note: using the -Ecpre flag will ignore code in the #else branch). This, obviously, will push up the overall CCN as all code is being considered.

Some tools will allow the setting of specific conditions. For example, when using OCLint, the compiler flags can be passed as part of the tool invocation:

```
$ OCLint -rc=CYCLOMATIC_COMPLEXITY=50 conf.c -- -c -I../ -I../lib -DWITH_BRIDGE -DVERSION=1 | grep cyclomatic
```

This results in a reported CCN of 361 rather than 492:

```
conf.c:711:1: high cyclomatic complexity [size|P2] Cyclomatic Complexity Number 361 exceeds limit of 50
```

# Summary

There are as many empirical studies [4] refuting the models based on McCabe as there are studies “validating” them. For example, Dr. Benjamin Hummel, in his Software Quality Blog post “McCabe’s Cyclomatic Complexity and Why We Don’t Use It”, uses the following example to argue that high complexity does not necessarily mean low readability (I’ve rewritten the original Java in C):

```c
#include <assert.h>

const char* get_month_name(int month) {
    switch (month) {
    case 0:  return "January";
    case 1:  return "February";
    case 2:  return "March";
    case 3:  return "April";
    case 4:  return "May";
    case 5:  return "June";
    case 6:  return "July";
    case 7:  return "August";
    case 8:  return "September";
    case 9:  return "October";
    case 10: return "November";
    case 11: return "December";
    default:
        assert(month < 12);
        return "";  /* not reached for a valid month */
    }
}
```

This gives a complexity of 13 (in his blog he claims 14):

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
17     13     94      1      17 get_month_name@27-43@cc12.c
```

Thirteen, fourteen, worth flagging for review? Maybe, maybe not. But what about 30+ or 50+ or 100+? Surely there is value in having an upper limit?

And of course, you could argue it could/should be written like this:

```c
static const char* months[] = {
    "January",
    "February",
    "March",
    "April",
    "May",
    "June",
    "July",
    "August",
    "September",
    "October",
    "November",
    "December"
};

const char* get_month_name_lookup(unsigned month) {
    assert(month < 12);
    return months[month];
}
```

which gives a CCN of one:

```
================================================
NLOC    CCN   token  PARAM  length  location
------------------------------------------------
4      1     20      1       4 get_month_name_lookup@60-63@
```

Which is “better”? Well, of course, there will be a degree of subjectivity, and I don’t want to open that can of worms here! Hummel argues, quite rightly, that a low CCN does not guarantee well-written code. However, I would argue that with a low CCN you at least have a greater chance of deciphering the intent of the function’s behaviour, and possibly of being able to write a set of unit tests before refactoring it.

The takeaway is that there is a strong correlation between size and complexity (I know, who would have thought?) and that highly complex code has a greater probability of containing latent errors.

When doing consultancy, I usually ask my client to run CC analysis against their codebase (if they are not already doing so). This gives me a heads-up of what to expect in terms of the company’s software maturity. In the last couple of years, I have personally seen two different examples of CCNs greater than 1000. The current record is approximately 1328 (not a record to be proud of), though I have been told of one exceeding 2500!

Given modern FOSS (Free and Open Source Software) such as Lizard and OCLint, not to mention a profusion of commercial tools that can calculate cyclomatic complexity, I do not understand why many companies undertaking professional software development don’t use cyclomatic complexity as one of their primary quality gates. Add to this modern build systems, such as Jenkins Blue Ocean, and Docker, and there really are no excuses.

# References

[1] ISO 26262-6:2011, “Road vehicles – Functional safety – Part 6: Product development at the software level”. International Standard, 2011.

[2] Brooks, Fred P. (1986). “No Silver Bullet — Essence and Accidents of Software Engineering”. Proceedings of the IFIP Tenth World Computing Conference: 1069–1076.

[3] McCabe, Thomas J. (December 1976). “A Complexity Measure”. IEEE Transactions on Software Engineering: 308–320.

[4] Norman E Fenton; Martin Neil (1999). “A Critique of Software Defect Prediction Models”. IEEE Transactions on Software Engineering. 25 (3): 675–689.

##### Niall Cooling

Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.