enum ; past, present and future

The enumerated type (enum) is probably one of the simplest and most underused features of C and C++. Used well, it can make code safer and more readable without compromising performance.

In this posting we shall look at the basic enum from C, how C++ improved on C’s enum, and how C++0x will make it a first-class type.

Often I see headers filled with lists of #defines where an enum would be a much better choice. Here is a classic example:

/* adc.h */
#define ADC_Channel_0                               (0x00) 
#define ADC_Channel_1                               (0x01) 
#define ADC_Channel_2                               (0x02) 
#define ADC_Channel_3                               (0x03) 
#define ADC_Channel_4                               (0x04) 
#define ADC_Channel_5                               (0x05) 
#define ADC_Channel_6                               (0x06) 
#define ADC_Channel_7                               (0x07) 
#define ADC_Channel_8                               (0x08) 
#define ADC_Channel_9                               (0x09) 
#define ADC_Channel_10                              (0x0A) 
#define ADC_Channel_11                              (0x0B) 
#define ADC_Channel_12                              (0x0C) 
#define ADC_Channel_13                              (0x0D) 
#define ADC_Channel_14                              (0x0E) 
#define ADC_Channel_15                              (0x0F) 

which probably would be better re-written as:

enum ADC_Channel_no {
	ADC_Channel_0,
	ADC_Channel_1,
	ADC_Channel_2,
	ADC_Channel_3,
	ADC_Channel_4,
	ADC_Channel_5,
	ADC_Channel_6,
	ADC_Channel_7,
	ADC_Channel_8,
	ADC_Channel_9,
	ADC_Channel_10,
	ADC_Channel_11,
	ADC_Channel_12,
	ADC_Channel_13,
	ADC_Channel_14,
	ADC_Channel_15
};

Before getting onto the advantages and disadvantages of enums, let’s have a quick review.

enums in C

When defining an enumerated type, as in:

enum State {OFF, STANDBY, ON};

The first named member of the enumerator list (OFF) has the value zero (0). It can then be used wherever an integer constant (or #define) can be used:

int initial_state = OFF;  /* better than initial_state = 0; */

The first benefit we get with enums is that the following values are calculated automatically: each member without an explicit value is one more than its predecessor, so STANDBY takes the value one (1) and ON the value two (2):

initial_state = ON;      /* initial_state is now 2 */

It is possible to override the default values with other constants:

enum State {OFF = 1, STANDBY, ON};  /* STANDBY is 2, ON is 3 */
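The numbering rules above can be checked at compile time. The following is a minimal sketch written in C++ (the rules are the same in C). The enumeration and member names are hypothetical, prefixed only so that all three enums can live in one file, and static_assert assumes a compiler with C++0x support.

```cpp
// Hypothetical enums illustrating how enumerator values are assigned.
enum Plain  { P_OFF, P_STANDBY, P_ON };            // 0, 1, 2
enum Offset { O_OFF = 1, O_STANDBY, O_ON };        // 1, 2, 3
enum Gap    { G_OFF = 5, G_STANDBY = 15, G_ON };   // 5, 15, 16

// A member without an initialiser is its predecessor plus one.
static_assert(P_OFF == 0 && P_STANDBY == 1 && P_ON == 2,
              "default numbering starts at 0");
static_assert(O_STANDBY == 2 && O_ON == 3,
              "numbering continues from an override");
static_assert(G_ON == 16,
              "numbering continues from the most recent override");
```

If any of these assumptions were wrong, the file would simply fail to compile, which is a cheap way to pin down values that other code depends on.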

The assigned values need not be unique:

enum State {OFF = 5, STANDBY = 15, ON = 15};

and the enum identifier is optional:

enum {OFF = 5, STANDBY = 15, ON = 15};

So given the identifier is optional, what is its benefit?

First, a good identifier should help regarding program understanding and maintenance, as it defines intent. Second, it also allows us to define variables of that pseudo-type:

enum State initial_state = OFF;

This definition is a bit unwieldy, so appropriate use of typedef sorts this out:

typedef enum {OFF, STANDBY, ON} State;

int main(void) {
	State s = OFF;
	return 0;
}

This makes them very useful for managing selection criteria:

void run_SM(State s){
	switch(s){
	case OFF:
		/* do something */
		break;
	case STANDBY:
		/* do something */
		break;
	case ON:
		/* do something */
		break;
	default:
		/* error */
		break;
	}
}

enum problems in C

Unfortunately, there is one glaring hole in this model;  nothing prevents us from assigning nonsensical values to the enum variable, e.g.:

initial_state = 5;      /* Not in range 0..2 */

This is probably the major reason why enums aren’t seen as useful in C programs. However, good static analysis tools can be configured to report any assignment to the variable (initial_state) that does not use one of the enum members.
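Where no static analyser is in the loop, one defensive option is a run-time check before a raw integer is treated as a State. This is a minimal sketch rather than anything from the code above: is_valid_state and to_state are hypothetical helpers, falling back to OFF is just one possible policy, and the example is written in C++ (the same check works in C without the cast).

```cpp
#include <cassert>

enum State { OFF, STANDBY, ON };

// Hypothetical helper: true only if raw matches one of the named members.
bool is_valid_state(int raw) {
    switch (raw) {
    case OFF:
    case STANDBY:
    case ON:
        return true;
    default:
        return false;
    }
}

// Hypothetical helper: converts a raw value, falling back to OFF when it
// does not correspond to any member.
State to_state(int raw) {
    return is_valid_state(raw) ? static_cast<State>(raw) : OFF;
}

// Small usage example.
void state_check_demo() {
    assert(is_valid_state(2));     // ON
    assert(!is_valid_state(5));    // not in range 0..2
    assert(to_state(5) == OFF);    // out-of-range input falls back to OFF
}
```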

There is also another, fairly big, issue: each enumeration member name must be unique within its scope, which for enums declared in headers means the whole translation unit. For example:

typedef enum {OFF, STANDBY, ON} State;
typedef enum {OFF = 0x1A, ACTIVE = 0x2A, IDLE = 0x4A} Position;

This will cause a compile-time error due to the redeclaration of enumerator ‘OFF’. Our bigger problem is that the enum definitions are likely to be spread across header files and can be problematic at integration time. But I would still argue that this is better than a redefinition of a #define which may only be reported as a warning (and I know how many projects ignore warnings!).
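A common workaround in C (a naming convention, not something the language enforces) is to prefix each member with the name of its type. The prefixed names below are hypothetical; the declarations themselves are valid in both C and C++.

```cpp
/* Prefixing each member with its type name avoids the clash on OFF. */
typedef enum { STATE_OFF, STATE_STANDBY, STATE_ON } State;
typedef enum { POSITION_OFF = 0x1A, POSITION_ACTIVE = 0x2A, POSITION_IDLE = 0x4A } Position;

/* Both "off" values now coexist in one translation unit. */
State    initial_state    = STATE_OFF;
Position initial_position = POSITION_OFF;
```

The cost is longer names, but the clash is removed at the point of declaration instead of being discovered at integration time.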

It’s worth noting one subtle change in C99 (6.7.2.2): when defining a list of enum members, a trailing comma is permitted:

typedef enum {OFF, STANDBY, ON, } State;

This minor but useful change aids the automatic creation of enum member lists from external design tools (e.g. a state modelling package).

enums in C++

The basic C++ enum is almost identical to the C enum, with two significant differences. First, as with structs, there is no need for typedefs:

// C++
enum State {OFF, STANDBY, ON};

int main() {
	State s = OFF;
}

Second, assigning an integral value to an enum object is illegal:

initial_state = 5;      // Compile time error

This minor semantic change makes enums incredibly powerful for compile-time error checking (which for real-time systems is part of our holy grail). For example, we could define the interface to a UART and constrain the configuration parameters using a set of enums:

enum baudRate { b9600 = 9600, b38400 = 38400, b115k = 115200};
enum dataBits {five = 0, six = 1, seven = 2, eight = 3};
enum stopBits { none = 0, one = (1<<2) };
enum parity { off = 0, odd = (1<<3), even = (3<<3)};

class UART
{
public:
  explicit UART (unsigned long address,
                 baudRate baud = b9600,
                 dataBits db = eight,
                 stopBits sb = none,
                 parity pb = off);
  // ... rest of the interface
};

This then eliminates the need for code to check whether, for example, the supplied baud rate is a valid integral number. Of course this isn’t fool-proof (we know how ingenious fools are), as you can cast an integer to an enum; but if you do that you deserve a good slap with a wet fish.
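To make both the check and the loophole concrete, here is a minimal sketch using the same four enums. lineControl is a hypothetical helper, not part of the UART class above, and the bit layout it produces is simply whatever the enum values imply.

```cpp
#include <cassert>

enum baudRate { b9600 = 9600, b38400 = 38400, b115k = 115200 };
enum dataBits { five = 0, six = 1, seven = 2, eight = 3 };
enum stopBits { none = 0, one = (1 << 2) };
enum parity   { off = 0, odd = (1 << 3), even = (3 << 3) };

// Hypothetical helper: ORs the three settings into one word.
unsigned lineControl(dataBits db, stopBits sb, parity pb) {
    return static_cast<unsigned>(db) | static_cast<unsigned>(sb) |
           static_cast<unsigned>(pb);
}

void lineControlDemo() {
    assert(lineControl(eight, one, even) == 0x1Fu);   // 3 | 4 | 24

    // lineControl(3, 4, 24);          // rejected: no implicit int -> enum conversion
    // lineControl(one, eight, even);  // rejected: arguments in the wrong order

    // The loophole: a cast silences the compiler, so 1200 is accepted as a
    // baudRate even though no member has that value.
    baudRate bogus = static_cast<baudRate>(1200);
    assert(bogus != b9600 && bogus != b38400 && bogus != b115k);
}
```

The two commented-out calls are the payoff: mistakes that would need run-time validation with plain integers are caught by the compiler instead.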

We can also overcome the problem of member-name clashes through the use of namespace or class scoping. For example, given the following class:

class Valve
{
public:
	explicit Valve(uint32_t valveID);
	enum ValveState {CLOSED, OPEN, UNKNOWN};
	void open();
	void close();
	ValveState getStatus() const;
private:
};

the class’s enum would not clash with another header containing:

namespace NS1 {
	enum State {OPEN, CLOSED};
}

and we use the class enum thus:

void checkValve(const Valve& theValve){
	if (theValve.getStatus() == Valve::CLOSED){
		cout << "Valve is closed" << endl;
	}
	else{
		cout << "Valve is not closed" << endl;
	}
}

If, for example, we accidentally wrote:

	if (theValve.getStatus() == CLOSED)

then we would get a compile-time error of the form “error: ‘CLOSED’ was not declared in this scope”.

C++ enum weaknesses

So, enums in C++ are a significant improvement over C and should be widely used in good C++ code (combined with C++ const, #defines can all but be eliminated for constant values). Unfortunately there are still a number of weaknesses in the current C++ model.

First, the fact that enums implicitly convert to int can cause subtle errors. Suppose that, in the previous example, we had “used” the namespace in our main code:

using namespace NS1;

then the code

	if (theValve.getStatus() == CLOSED)

would no longer fail at compile-time (I would, however, expect a warning along the lines of “warning: comparison between ‘enum Valve::ValveState’ and ‘enum NS1::State’”). The code is legal because both enum members are implicitly converted to an integer type, which of course we can check for equality! In this example, Valve::CLOSED is 0, whereas NS1::CLOSED is 1, oops…
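The effect is easier to see with the converted values written out. In this sketch the two enums are reduced to the bare minimum (Valve keeps only its ValveState), and the explicit casts stand in for the conversion the compiler applies implicitly. static_assert assumes a compiler with C++0x support.

```cpp
namespace NS1 {
    enum State { OPEN, CLOSED };                 // OPEN = 0, CLOSED = 1
}

struct Valve {
    enum ValveState { CLOSED, OPEN, UNKNOWN };   // CLOSED = 0, OPEN = 1, UNKNOWN = 2
};

// The two CLOSED members have different integer values...
static_assert(static_cast<int>(Valve::CLOSED) != static_cast<int>(NS1::CLOSED),
              "Valve::CLOSED is 0 but NS1::CLOSED is 1");

// ...so once compared as integers, a closed valve matches NS1::OPEN,
// not NS1::CLOSED.
static_assert(static_cast<int>(Valve::CLOSED) == static_cast<int>(NS1::OPEN),
              "both are 0");
```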

This problem can be addressed within the current language by using operator overloading. If we extend the valve’s interface to be:

class Valve
{
public:
	explicit Valve(uint32_t valveID);
	enum ValveState {CLOSED, OPEN, UNKNOWN};
	bool operator==(ValveState rhs) const;
private:
};

we can rewrite the buggy code to

void checkValve(const Valve& theValve){
	if (theValve == Valve::CLOSED){
		cout << "Valve is closed" << endl;
	}
	else{
		cout << "Valve is not closed" << endl;
	}
}

and

	if (theValve == CLOSED)

would fail at compile-time.
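The class declaration above leaves the body of operator== open; one possible definition is sketched below. The state_ and id_ members, the getID accessor, the UNKNOWN initial state and the inline bodies are all assumptions made for illustration, not the original Valve implementation.

```cpp
#include <cassert>
#include <cstdint>

class Valve
{
public:
    enum ValveState {CLOSED, OPEN, UNKNOWN};

    // Assumption: a new valve reports UNKNOWN until it is commanded.
    explicit Valve(std::uint32_t valveID) : id_(valveID), state_(UNKNOWN) {}

    void open()  { state_ = OPEN; }
    void close() { state_ = CLOSED; }
    ValveState getStatus() const { return state_; }
    std::uint32_t getID() const { return id_; }   // hypothetical accessor

    // Accepts only a ValveState, so an NS1::State (or a plain int) on the
    // right-hand side has no viable conversion and is rejected.
    bool operator==(ValveState rhs) const { return state_ == rhs; }

private:
    std::uint32_t id_;
    ValveState state_;
};

void valveDemo() {
    Valve v(1);
    assert(v == Valve::UNKNOWN);
    v.close();
    assert(v == Valve::CLOSED);
    assert(!(v == Valve::OPEN));
}
```

The design point is that the comparison now goes through a function with a typed parameter, so the implicit enum-to-int conversion never gets a chance to apply.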

The second problem is that the underlying type of an enum cannot be specified; the choice of type is implementation-defined. This pretty much eliminates the use of enums as members of POD structs where packing is required, for example in network message packets.

It is also worth noting that the current C++ standard (C++03) does not support the trailing comma syntax supported in C99, although you may find your compiler doesn’t complain unless you compile with strict language standard settings.

enums in C++0x

To address C++98’s weaknesses, the soon-to-be ratified new C++ standard has added some additional syntax. Your compiler may already support many of the new C++0x features (in g++, add the -std=c++0x option).

To eliminate the implicit conversion to integers, C++0x introduces the concept of an “enum class”:

namespace NS1 {
	enum class State {OPEN, CLOSED};
}

Members of an enum class are named using the same scoping syntax as members of a regular class, e.g. State::OPEN and State::CLOSED. More importantly, any attempt at an implicit conversion to an integer fails at compile time:

	State s = State::CLOSED;
	int i = s; // error: cannot convert 'NS1::State' to 'int' 
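When the integer value is genuinely needed, for example when writing a register or logging, an explicit static_cast still works. A minimal sketch follows, assuming a compiler with C++0x (C++11) support including constexpr. toInt and the second enumeration NS1::Door are hypothetical names, added to show that scoped members no longer clash.

```cpp
namespace NS1 {
    enum class State {OPEN, CLOSED};
    enum class Door  {OPEN, CLOSED, LOCKED};   // hypothetical; no clash with State::OPEN

    // Hypothetical helper: the conversion must now be spelled out.
    constexpr int toInt(State s) { return static_cast<int>(s); }
}

static_assert(NS1::toInt(NS1::State::CLOSED) == 1, "members still count from 0");

// int  i = NS1::State::CLOSED;                      // error: no implicit conversion to int
// bool b = (NS1::State::OPEN == NS1::Door::OPEN);   // error: different enum class types
```

The second commented-out line is the Valve/NS1 bug from earlier: with enum class it is rejected outright instead of producing a warning.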

Finally, in C++0x you can also specify the underlying type:

namespace NS1 {
   enum class State: unsigned char {OPEN, CLOSED};
}

which, in turn, allows for forward declaration of enums (though currently I cannot get this to compile under g++ 4.5.2):

namespace NS1 {
   enum class State: unsigned char;

   void setSwitchState(State p);
   enum class State: unsigned char {OPEN, CLOSED};
}
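With a compiler that implements the final C++11 rules, both features can be checked directly. The sketch below is an illustration under that assumption. isOpen and the Packet struct are hypothetical, and the 4-byte packet size is what mainstream compilers produce, not something the standard guarantees.

```cpp
#include <type_traits>

namespace NS1 {
    enum class State : unsigned char;              // forward declaration

    bool isOpen(State s);                          // usable before the members are known

    enum class State : unsigned char {OPEN, CLOSED};

    bool isOpen(State s) { return s == State::OPEN; }
}

// The enum now occupies exactly one byte, whatever the compiler would
// otherwise have picked.
static_assert(sizeof(NS1::State) == 1, "State is stored as an unsigned char");
static_assert(std::is_same<std::underlying_type<NS1::State>::type,
                           unsigned char>::value,
              "underlying type is the one requested");

// Hypothetical message layout: with a one-byte enum this is typically
// 4 bytes with no padding.
struct Packet {
    NS1::State    state;
    unsigned char payload[3];
};
```

This is what makes enums practical inside packed message structures: the size is chosen by the programmer and verified at compile time.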

Summary

Enums are a really useful tool in your programming toolbox for creating high quality, safe and reliable code whilst not affecting code size or performance. Even the weaker C enum can still massively improve code readability and maintenance, especially when combined with a good static analysis tool. Go out and enumerate…

Posted on June 15th, 2011

3 Comments to “enum ; past, present and future”

  1. Dave Banham says:

    Niall,
    Thanks for the update on the evolution of the enum facility.

    I’ve made extensive use of the c90 enum for many years, with what I consider good effect; namely as symbolic constants and as strong types. The strong typing enforces type separation, which means that any muddling of enumerate values between enumerated types will be found by either (a modern) compiler or a static analyser (e.g. PC Lint).

    The only real difficulties that I’ve encountered with enums have been with some static analysers (but not PC Lint). For example, the checker complains that a switch default is unreachable because there is a case clause for each of the enumerates of the controlling variable. This flies in the face of easy defensive programming since the range of the underlying implementation of the enum is typically far larger than the range of the enum itself. Other problems arise from the understanding of the implementation type of an enum. It is overly simplistic to say that an enum is implemented as type (signed) int. An enum is implemented as a C integer, but never as one of the small integer types. The best way of understanding this is simply to apply the integer promotion rules to the underlying values of each enumerate and select the worst case for the overall implementation type. For example, an enum designed to allow bit flags to be stored in a 32-bit bit string would have each of its enumerates set to an ascending power of 2. This means that the final enumerate on a 32 bit machine has its most significant bit set and as a consequence an unsigned int type is (notionally) used by the compiler to implement the enumerate type. However, one particular static analysis tool that I tried complained that the final enumerate in the enum is being assigned a value greater than max-int! Admittedly, the c90 language standard is ambiguous, so it is great to hear that we will finally gain control over the implementation type and even be able to use the small integer types for this too.

  2. Frank says:

    Hello Niall,

    in the ADC example, I always add another enum value like ADC_Channel_MAX. This last value is useful in loops or array declarations. And if you add another channel in between, it is automatically adjusted.

  3. Patrick says:

    “The second problem, is that the underlying type of an enum cannot be specified”

    True, but you can be clever:

    typedef enum { A, B, C, STATE_MAKEINT = 0x7FFFFFFFu } State;

    In order to fully hold the “range” of values, you’d need at least 32 bits. It doesn’t stop the compiler from using a 64-bit integer or other silliness, but this solution can be used in non-C++0x. Honestly though, I’ve never used sizeof(enum), but I’d bet it is == sizeof(int) on pretty much every platform.
