C++20 modules with GCC11

Introduction

One of the headline changes of the C++20 standard is the inclusion of modules. Modules promise to significantly change the structure of C++ codebases and possibly signal headers’ ultimate demise (but probably not in my lifetime). They also open the door to a unified build system and package manager, similar to Rust’s Cargo; though I imagine standardising a unified build system would be one bloody battle.

Pre-C++20 builds

If you want to start a heated debate on any C++ forum/channel, just state that one particular build system (e.g. Meson, CMake, Bazel, etc.) is better than the others; or that your way of using that build system is the “one, and only one, correct way”. If you are unfamiliar with build systems, I’d recommend reading this post first to understand the challenges.

Start with the Why

There have already been several articles written about modules (significantly by Microsoft). But my experience in reading these is that they focus on ‘how’ modules work in C++20 and seem to miss the ‘why’. Maybe the authors consider it obvious, but I think it depends on your background. In addition, all those I have read use Microsoft’s MSVC, as it has the fullest support for modules among the mainstream toolchains.

First and foremost, when discussing modules, surely, we should be discussing modularity. We already have one form of modularity in C++ with the object/class model. But this is modularity ‘in the small’; modules are addressing ‘modularity in the large’, i.e., program-wide modularity.

So what problem are we trying to solve by adding modules?

Let’s be honest, “Headers are a mess” – they can be (and have been for many decades) used effectively, but so often I see very poorly constructed headers (IMHO). A well-crafted application will, typically, have pairs of files to “mimic” a module, e.g., file.h and file.cpp. But there is no enforcement of this approach; we also need to understand external- and internal-linkage rules to build a modular architecture safely.

The root of the problem with headers is they only exist up to and including pre-processing:

Headers do not exist during the compilation phase

We could happily (okay, maybe not happily) write a complete C++ application without any headers. There would be a lot of code duplication (declarations), and it would be a maintenance nightmare, but that’s how the current build model works (it all stems from the definition of a translation unit).

Modularity (in the large) is typically closely related to application architecture and construction – the files that make up our build.

Before we go on, I need to stress one important aspect – Modules and Namespaces are entirely orthogonal and co-exist as independent aspects (more on this later).

Other languages

Many modern languages tend to build the semantics of a module around all the code for the module existing within a single file, e.g. Java and Python.

They have the concepts of exporting and importing types and behaviours from other modules. It is very similar to the public/private semantics of the class but at the file scope.

Interestingly, older languages, such as Ada and Modula-2, designed in the 1980s, around the same time as the original C++, use a two-file structure for defining modules (or packages in Ada’s case). These designs separate the module interface from the implementation.

The significant benefits of the interface/implementation file structure can be:

  • Improved build times
  • Simplified integration and testing

Though, of course, this is another hotly debated subject.

C++20 Module File Structure

Dare I say it, but C++ being C++, rather than a straightforward way of structuring modules (à la Java), we have been given the Swiss-army-knife approach to module construction. There are numerous ways of doing the same thing and lots of special cases. This, initially, caused me a lot of problems as my mental model (based on other language paradigms) wasn’t aligning with what I was being introduced to.

There is no one way of correctly using C++20 modules

I’m sure, over time, we will come up with new idioms regarding the use of modules, but for now, I can see three obvious uses of modules (think 80:20 rule)

  1. Single file module – the Java/Python model or a complete module
  2. A separate Interface file and Implementation file for a module – the Ada model
  3. Multiple separate files (partitions) combining to define a single module concept – the C++20 model

In C++20, any file containing the module syntax is referred to as a Module Unit. Therefore a Named Module may be made up of one or more Module Units.

Single-file (Complete) Module

Pre-C++20 code

Let’s start with the obligatory “hello, world!” example, splitting the behaviour across two files.

Remember

At compilation headers don’t exist

We have two files, func.cpp and main.cpp

// func.cpp
#include <iostream>

void func() {  // definition
    std::cout << "hello, world!\n";
}

// main.cpp
void func();  // declaration

int main(){
    func();
}

We can go ahead and build and run the application:

$ g++ -c func.cpp 
$ g++ -c main.cpp 
$ g++ -o App main.o func.o
$ ./App 
hello, world!

This, of course, builds successfully as the function func has, by default, external linkage (often referred to as global scope). So as long as main has a valid declaration, the main.cpp file can be compiled, and the linker resolves the exported/imported symbols.
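For contrast, here is a minimal sketch of the pre-C++20 linkage rules in a single translation unit. The file name and the function names are made up for illustration; they are not part of the example project. Only the names with external linkage could be called from another file such as main.cpp; the `static` function and the one in the unnamed namespace have internal linkage and are invisible to the linker outside this file.

```cpp
// linkage.cpp: hypothetical single translation unit
#include <string>

// External linkage (the default at namespace scope): another translation
// unit that declares `std::string greeting();` can call this.
std::string greeting() { return "hello, world!"; }

// Internal linkage via `static`: the name is private to this file.
static std::string open_bracket() { return "["; }

// Internal linkage via an unnamed namespace.
namespace {
    std::string close_bracket() { return "]"; }
}

// External linkage again; it may freely use the file-private helpers.
std::string decorated_greeting() {
    return open_bracket() + greeting() + close_bracket();
}
```

Notice that hiding a name is something we have to opt into here; the default is to expose everything.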

C++20 Module

One of the earlier proposals for module support, P1103R3, uses the term complete module, where

A complete module can be defined in a single source file.

I quite like this term, and therefore I’m going to use it for single-file modules (until something better comes along).

First, we need to create our Named module. A complete module file will, typically, have two, possibly, three sections (called fragments)

  • A global module fragment – this is where we include things we need (optional)
  • The main module purview – Where we can export types and behaviour
  • A private fragment – this ends the portion of the module interface that can affect the behaviour of other translation units (optional).

The private module fragment can only appear in single-file modules. The current version of GCC (gcc version 11.1.0) does not support private fragments, so I’m going to ignore them for this post.

The C++ standard does not define file extensions; this is toolchain-specific. With GCC, the file name suffix determines how a given input file is treated.

GCC interprets the following file extensions as C++ source code which must be preprocessed:

  • file.cc
  • file.cp
  • file.cxx
  • file.cpp
  • file.c++
  • file.C

We have always preferred the .cpp extension for C++ source files in our projects and when teaching C++. So to that end, in the following examples, I will use .cpp for regular C++ source files and .cxx for module files. This is nothing more than a personal preference.

Notably, Microsoft has chosen to use the extension .ixx for module interfaces. We could use file.ixx but, with GCC, we would need to pass -x c++ before the file name (i.e. -x c++ file.ixx) to specify that the file should be treated as C++ source. Rather than add this complication, using .cxx means GCC treats the file as a standard C++ file.

To make the original file func.cpp into a module (func.cxx), we add the line

export module MODULE-NAME;

e.g.

// func.cxx
#include <iostream>

export module mod;

void func() {
    std::cout << "hello, world!\n";
}

However, this will not yet compile. The include statement needs to be in the global fragment. The global fragment must precede the main purview and is simply introduced using the keyword module, e.g.

// func.cxx
module;

#include <iostream>

export module mod;

void func() {
    std::cout << "hello, world!\n";
}

We can now import the module mod into main:

// main.cpp
import mod;

int main(){
    func();
}

Next we can compile func.cxx

$ g++ -c -std=c++20 -fmodules-ts func.cxx 

Note that with GCC, modules are currently not enabled by just specifying -std=c++20; you must also supply the option -fmodules-ts.

As expected, the compilation generates an object file func.o. However, you will also notice that a subdirectory, gcm.cache, is created, containing the file mod.gcm. This is the generated module interface file used by the compiler.

If we go ahead and compile main.cpp

$ g++ -c -std=c++20 -fmodules-ts  main.cpp 
main.cpp: In function 'int main()':
main.cpp:5:5: error: 'func' was not declared in this scope
    5 |     func();
      |     ^~~~

We get the error that func was not declared. If we tried to declare it in main.cpp (as before) it would build but fail to link.

So this gives us our first significant change:

In modules, all declarations and definitions are private unless exported

Officially they have Module Linkage, which differs from internal linkage. This only becomes apparent when using a multi-file module and partitions (covered in the follow-on post).

To fix this, we export the function, e.g.

// func.cxx
module;

#include <iostream>

export module mod;

export void func() {
    std::cout << "hello, world!\n";
}

The project now successfully compiles and links:

$ g++ -c -std=c++20 -fmodules-ts func.cxx
$ g++ -c -std=c++20 -fmodules-ts main.cpp 
$ g++ main.o func.o -o App
$ ./App 
hello, world!

One final detail, we still can separate out a declaration from a definition, e.g.

// func.cxx
module;

#include <iostream>

export module mod;

export void func();

void func() {
    std::cout << "hello, world!\n";
}

I’m not sure this is of much benefit, but I guess it comes down to style.

Separate Interface and Implementation files

At some point, or because of personal preference, we may choose to split our module into multiple files to make it more manageable and help rebuild times.

Each file pertaining to a module is called a Module Unit. We are going to create two units:

  • A Primary Module Interface Unit (PMIU)
  • A Module Implementation Unit

Primary Module Interface Unit

Each named module must have one, and only one, Primary Module Interface Unit. This is the revised file func.cxx that contains the statement:

// func.cxx

export module MODULE-NAME;

and our other export statements, e.g.


export module mod;

export void func();

That’s all we need; we have named a module mod that exports a single function func. We can go ahead and compile this unit:

$ g++ -c -std=c++20 -fmodules-ts func.cxx

As before, this generates func.o and gcm.cache/mod.gcm.

Module Implementation Unit

There is currently no idiomatic naming convention, so I have gone with func_impl.cxx, but it could be any filename/extension you prefer. You cannot use func.ixx, as it would also generate an object file func.o, overwriting the object file generated from func.cxx.

An implementation unit contains the line:

module MODULE-NAME;

Note it does not have the export keyword. This implicitly makes anything declared/defined in the PMIU available in the implementation unit (the opposite is not true). Note, implementation units cannot have any export statements.

// func_impl.cxx
module;

#include <iostream>

module mod;

void func() {
    std::cout << "hello, world!\n";
}

And there we have it. This implementation unit can now be compiled:

$ g++ -c -std=c++20 -fmodules-ts func_impl.cxx 

This generates func_impl.o. We now have two module object files as part of our link, but this also means that changes to the module implementation do not require clients to be recompiled (mimicking the familiar header/source split).

$ g++ main.o func.o func_impl.o -o App
$ ./App 
hello, world!

With GCC, the interface unit must be compiled before the implementation unit.
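To illustrate the point that the implementation unit implicitly sees everything in the PMIU, here is a small sketch that adds a non-exported helper to the interface. The helper greeting() is my own hypothetical addition, not part of the example project. Because it is declared without export, it has module linkage: the implementation unit can define and call it, but a client that does import mod; cannot name it.

```cpp
// func.cxx (primary module interface unit)
export module mod;

export void func();        // part of the module's interface

const char* greeting();    // hypothetical helper: NOT exported, module linkage


// func_impl.cxx (module implementation unit)
module;

#include <iostream>

module mod;                // implicitly sees func() and greeting()

const char* greeting() {
    return "hello, world!\n";
}

void func() {
    std::cout << greeting();
}
```

This should build with the same commands as above (interface unit first), and main.cpp stays unchanged; adding a call to greeting() in main.cpp would be rejected as an undeclared name.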

export

So far, we have only exported a single function. If we have multiple functions we want to export, the standard allows for several options.

Export per function

For each function we want to export, we simply prepend the export keyword to add it to the interface.

// func.cxx

export module mod;

export void func();
export void func(int);

Export block

Alternatively, we can group many declarations into an export block, e.g.

// func.cxx

export module mod;

export {
    void func();
    void func(int);
}

Namespace

As mentioned earlier, C++ namespaces are orthogonal to modules. Again, I will admit this initially caused me some confusion; not so much the syntax as the general philosophy of using namespaces and modules, and where each one fits in an architectural structure. This is possibly skewed by my early background in Ada, where the two very much align (in the package).

From a practical perspective, namespaces behave as before, so there is no real change to their use, e.g.

// func.cxx

export module mod;

namespace X {
    export void func();
    export void func(int);
}

// func_impl.cxx
module;

#include <iostream>

module mod;

namespace X {
    void func() {
        std::cout << "hello, world!\n";
    }

    void func(int p) {
        std::cout << "hello, " << p << '\n';
    }
}

// main.cpp
import mod;

int main(){
    X::func();
    X::func(42);
}

Export namespace

Alternatively, if we export a namespace, all declarations within that namespace are automatically included in the module’s interface, e.g.

// func.cxx

export module mod;

export namespace X {
    void func();
    void func(int);
}

Exporting Types, etc.

Anything we require as part of the module interface must be exported. For example, if a function takes an object reference as a parameter, the normal type definition visibility rules apply, e.g.

// func.cxx
export module mod;

export class S {
public:
    S() = default;
    explicit S(int p):val{p}{}
    int get_val() const;
 private:
    int val{};
};

export void func(const S&);

Or

// func.cxx
export module mod;

export  {
    class S {
    public:
        S() = default;
        explicit S(int p):val{p}{}
        int get_val() const;
    private:
        int val{};
    };

    void func(const S&);
}

// func_impl.cxx
module;

#include <iostream>

module mod;  // implicitly import everything in PMIU

void func(const S& s) {  // s is a reference, not a pointer
    std::cout << "hello, " << s.get_val() << '\n';
}

int S::get_val() const {
    return val;
}

// main.cpp
import mod;

int main(){
    S s{10};
    func(s);
}

There are a whole host of rules and exceptions regarding exporting items, such as templates and ADL evaluation. I’m not going to get into those here as it becomes very use-specific.

Includes

In the example, we can see we’ve used the traditional preprocessor directive #include to include the standard library header iostream. The standard permits the following:

import <iostream>;
import "header.h";

This is supported by GCC11, but there are some hoops you have to jump through to first make a user-defined header importable (see the -fmodule-header option).

If you have a header-only file, e.g.

// header.h
#ifndef HEADER_H  // names starting with an underscore and a capital are reserved
#define HEADER_H

constexpr int life = 42;

#endif

and you want to import it, you first need to compile it, e.g.

$ g++ -c -std=c++20 -fmodule-header header.h 

This generates a header.h.gcm file. The header can now be imported using the directive

import "header.h";

Note the trailing semicolon.
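Putting those pieces together, a client of the header unit might look like the sketch below. This is only an illustration: it assumes header.h from above has already been compiled with -fmodule-header, and that main.cpp is then compiled with -std=c++20 -fmodules-ts as in the earlier examples.

```cpp
// main.cpp
import "header.h";   // header unit, built beforehand with -fmodule-header

int main() {
    // `life` comes from the imported header unit
    static_assert(life == 42);
    return life == 42 ? 0 : 1;
}
```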

In addition, Microsoft has already wrapped the standard library up in a module structure, so you may see the following:

import std.core;

in Microsoft specific examples.

Summary

Hopefully, this will give you a feel for the foundations of C++20 modules and enough to go and experiment. I believe that the single-file and interface/implementation models will accommodate most people’s initial uses of modules.

In the follow-up post, I will cover the more complex ability to split a module’s interface into multiple files (called partitions). Using partitions looks straightforward on the surface, but appears to open a Pandora’s box of fun and games.

My initial reaction to modules is that they are overly complex, but I put that down to my background and mental model having used module concepts in other languages. In addition, much of the coverage of modules does delve down into the complications of using partitions without laying out the basics.

I’m sure over the next couple of years, as support for modules improves, we’ll find an idiomatic approach to using modules; even if we can just get a consistent file naming convention.

In the deeply embedded space, we have only recently seen the release of GCC10 for Arm, so I can imagine it may be some time before GCC11 can be used in our target project. Until then, I will continue to experiment with modules and partitions on the host.

The example code can be found here

Next post: C++20 Module Partitions

 


Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.


6 Responses to C++20 modules with GCC11

  1. _rf says:

    Just a heads-up while I've only read the introduction:

    > I’d recommend reading this post first to understand the challenges.

    seems to be missing a link 😉
    [plus I kind of expected the "this" to be clickable too, I suppose?]

  2. Thanks, added link

  3. Paul Topping says:

    When I read that you were going to give us the "why?" of modules, I was overjoyed, but you never really got to it. I guess you suggested that it would reduce build times but many people do projects where build time is not that big an issue. Perhaps modules don't give much of an advantage except on very large projects. You mention that headers can be a mess but fail to mention how modules fixes this. Perhaps I just missed it.

    To me, a module system implies that one can reduce dependencies among various parts of a larger program. If so, what dependencies are eliminated? I expect to see sentences of the form, "Now that we've broken our example program into several modules, I can change X, rebuild, and all the build system has to do is Y, which is much less than our example without modules." Am I expecting too much?

    If this sounds harsh, I don't mean it to be. I imagine people working with modules are too close to them to see how hard it is for people like me to wrap their heads around them.

  4. Hi Paul,
    Thanks for the feedback. In retrospect you're right (constructive criticism is always welcome) - I didn't bring out the "why" clearly enough.
    In terms of Software Engineering, the most important change is the defaulting of module linkage to "private" over "public". This should help reduce module coupling (see our webinar for further coverage). It may seem trivial, but it ideally forces the module designer to actually think about "why" an artifact is being exported. This, in itself, won't stop poor software (or, for that matter, really affect existing good software), but think "marginal gains". It will also reduce the prospect of link-time errors based on the ODR (One Definition Rule).
    We should, hopefully, see some build time improvements - mostly through the elimination of repeated large header parsing across translation units (there is a paper from Google correlating post-processed translation unit sizes to build times, but I can't find a reference at the moment - most enlightening).

    - Niall

  5. Paul Topping says:

    Thanks for the reply. So far I have two potential advantages of using modules in a project:

    * Faster build times.

    * Better interface control through hiding of symbols. If I understand this correctly, if I have some public symbols in my module, they still can't be accessed by code outside the module if they aren't placed in the module interface. If I have this right, it would be good to see examples. Are there practical cases where a symbol needs to be public within the module but it is undesirable to expose it outside the module, where errant code could make use of it or the symbol's name clashes with one outside the module?

    Are there any others? Perhaps avoiding symbol clashes should be its own item. Of course, the code using the module might be developed by someone different than the one that developed the module. Are there cases where packaging some code as a module makes it more useful to others who might want to access the functionality it packages? Are there recommendations for using modules to distribute code? As I understand it, modules are NOT a way to distribute binaries except in very controlled environments. It would be nice to see what the rules are for such controlled environments.

    I realize I'm outlining the article that I'm looking for and not necessarily the one you want to write. I'm still surprised no one has written such an article. While I understand the initial focus is on understanding the C++ standard, it would seem that answering such questions would be part of the motivation for adding modules to the standard. Perhaps it was, but the standards people are too close to the subject to believe it needs to be stated.

    Sorry for such a long comment. Thanks for the article.

  6. Hi Paul,

    I am doing a 2nd part to cover multi-file modules (partitions) which shall cover some of this.

    Symbol clashes, not really - this is still driven by namespaces and name-encoding/mangling rules.

    I might add a 3rd article discussing these items. However, we are still in the early days of modules and I'm not sure any of us are clear about how build systems, such as CMake, are going to handle modules as components. Ideally, yes, a module should act as a distributable item/artifact; but how this will happen looks like it will be *very* toolchain specific.

    - Niall.
