Alibaba Cloud Linux or Alinux is currently the most widely used operating system on Alibaba Cloud. In 2021, OpenAnolis released the official version of Anolis OS 8 based on the Alinux product. In this article, Zezheng Li, a development engineer of Alibaba Cloud Intelligence Group, uses Alinux as the operating environment to explain the advantages of modules over traditional header files. He also uses several examples to show how to organize a C++ module project and use modules to encapsulate third-party libraries or transform existing projects. In addition, he will introduce the application of modules in internal projects at Alibaba Cloud, the chairman unit of the OpenAnolis community. The C++20 modules code has been running stably on the Alibaba Hologres mainline for more than one and a half years, reducing the compilation time by 42%.
Modules are one of the four major features of C++20, alongside coroutines, ranges, and concepts. They allow users to organize projects by importing modules instead of including header files, which greatly improves compilation speed and encapsulation.
Alibaba Cloud Compiler is a C++ compiler developed by the Alibaba Cloud Compiler team. It is developed based on the open-source version from the Clang/LLVM community. It provides robust support for coroutines and modules and actively contributes the code back to the upstream community. The compiler has made substantial contributions to enhancing Clang's support for modules.
Let's look at a basic comparison first. The following is a C++ hello world program that simply prints Hello world. With traditional header files, it is written using the #include preprocessor syntax:
#include <iostream>
int main() {
    std::cout << "Hello world!" << std::endl;
}
In the debug mode, we compile a similar demo code:
The test results are shown above, taking approximately 1.2 seconds.
As shown, when compiling similar demo code, a significant amount of time is spent processing header files. In this example, almost all of that time is consumed by the front-end tasks of the compiler, such as preprocessing and performing lexical, syntactic, and semantic analysis.
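With modules, we can use the import syntax to bring in the standard library instead. (The module syntax was introduced in C++20; the std named module itself was standardized in C++23.)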
import std;
int main() {
std::cout<<"Hello world!"<<std::endl;
}
As shown, the entire standard library can be imported with just import std, without the need to include the specific iostream header. Although we import the entire standard library, compilation remains very fast. This is because the standard library module has already been precompiled, so there is no need to compile it again when importing it.
The test takes about 0.03 seconds, which is more than an order of magnitude faster than with header files. Using modules greatly reduces compilation time. The proportion of time spent on the front end drops because we no longer need to preprocess the contents of the standard library or perform lexical, syntactic, and semantic analysis on them. We only need to deserialize the compiled module artifacts.
C++ header files have many drawbacks, and modules address these issues with significant improvements:
C++ users often complain about the slow speed of C++ compilation, one reason for which is the repeated compilation of header files.
Next, we will briefly introduce the issue of repeated compilation in C++:
#include<string>
void split(std::string& str)
{
//...
}
As shown in the above figure, suppose there are M source code files similar to src.cpp, each of which includes the string header file. This is a common scenario in projects. During the entire project build, we then have to preprocess and compile this code M times, generating the string-related code M times, which is very time-consuming.
Let's assume a worst-case scenario. There are N header files and M source code files in the project, and each source code file contains all N header files, so the compilation time complexity of the entire project becomes O(N*M). This is one of the reasons why large-scale C++ projects build slowly.
After modules are introduced, module files, as compilation units themselves, are compiled independently without being preprocessed into each source code file for repeated compilation. If there are M source code files and N modules, the time complexity of compilation is only O(N+M).
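To make the difference concrete, the following sketch counts front-end passes under each model. It is a deliberately simplified cost model, not a measurement: the function names are hypothetical, every file is assumed to cost one unit of work, and the (much smaller) cost of loading a precompiled module is ignored.

```cpp
#include <cstdint>

// Simplified cost model (an assumption for illustration): one "unit" is one
// pass of the compiler front end over one file.

// Header model: each of the M source files textually includes all N headers,
// so every header is re-parsed M times, plus one pass per source file.
constexpr std::uint64_t header_front_end_passes(std::uint64_t n_headers,
                                                std::uint64_t m_sources) {
    return n_headers * m_sources + m_sources;
}

// Module model: each of the N modules is compiled once and each of the M
// source files is compiled once; imports only load precompiled artifacts.
constexpr std::uint64_t module_front_end_passes(std::uint64_t n_modules,
                                                std::uint64_t m_sources) {
    return n_modules + m_sources;
}

// Worst case with 100 headers (or modules) and 1000 source files.
static_assert(header_front_end_passes(100, 1000) == 101000);
static_assert(module_front_end_passes(100, 1000) == 1100);
```

Under these assumptions the header-based build grows with the product of N and M, while the module-based build grows with their sum.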
Previously, we discussed a full build of the entire project and analyzed why header files make it slow. Another common scenario is incremental recompilation during development: a developer modifies a small amount of code and then recompiles it (reusing cached results where possible). In this scenario, modules still have a significant advantage in compilation speed.
Consider the following code:
#include <mylib.h>
int main() {
mylib::cout<<"Heloo!";
}
After writing this, you notice that Hello is misspelled. To fix it, you need to modify the string and then recompile.
#include <mylib.h>
int main() {
mylib::cout<<"Hello!";
}
Ideally, since only the string content was modified, the second compilation should be much faster than the first one. This is because the second compilation should be able to leverage the results from the previous compilation and only need to recompile the main function.
However, since a header file is not a compilation unit (source code file) that can be compiled independently, the compiler will not compile mylib.h on its own. This means that we cannot reuse the results of the first compilation, making the second compilation time roughly the same as the first.
Modules, in contrast, can be compiled independently. In the example above, if import mylib; is used instead of a header file, the second compilation will be much faster. This is because mylib is a code unit that can be compiled on its own. When we modify the main function and compile again, the build system finds that the content of mylib has not changed, so there is no need to recompile it.
In that case, we only need to recompile the main function, which is the fundamental reason why recompilation with modules is much faster than with header files. Header files force all the included code to be recompiled, while modules can simply reuse the intermediate results from the previous independent compilation of mylib.
Another drawback of header files is that they are not "hygienic" and can be affected by external code.
As shown in the above figure, we #include the three header files A, B, and C in sequence in src.cpp. Since #include only copies and pastes the contents, the contents of B will be affected by A, and the contents of C may be affected by A and B, following the order in which they are included.
This leads to several issues:
• Order dependency: It may be necessary to include header files in a specific order; otherwise, errors can occur, increasing the complexity of usage.
• Hindrance to parallel compilation and precompilation: Because they are affected by other code, A, B, and C cannot be compiled in parallel. Additionally, it is challenging to precompile header files independently because their context can only be determined after they are included. This greatly limits the compilation speed.
• Potential conflicts: For example, if header file A defines a macro that happens to have the same name as one used in B, then the contents of B could be unexpectedly corrupted. A classic example is the max macro on Windows platforms interfering with the std::max function.
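The macro conflict can be reproduced in a single file. The sketch below is only an illustration: the macro stands in for what "header A" might define, the largest function (a hypothetical name) stands in for code from "header B", and the parenthesized call is one common workaround rather than a fix for the underlying problem.

```cpp
#include <algorithm>

// Stand-in for "header A": a function-like macro named max, similar in spirit
// to the one found on Windows platforms.
#define max(a, b) (((a) > (b)) ? (a) : (b))

// Stand-in for "header B", which is textually pasted after A.
// Writing std::max(x, y) here would be rewritten by the macro from A into
// std::(((x) > (y)) ? (x) : (y)) and fail to compile.
inline int largest(int x, int y) {
    // Wrapping the name in parentheses prevents function-like macro
    // expansion, so this calls the real std::max.
    return (std::max)(x, y);
}

#undef max  // Stop the macro from leaking any further.
```

With a module, the macro defined by the importing code never reaches the module's contents, so no such workaround is needed inside it.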
Modules do not mechanically copy and paste code through preprocessing. Instead, each module is compiled independently, thus ensuring that the contents of the module are not affected by external code. That is why modules are "hygienic".
Header files have another significant drawback, that is, all declarations within them are exposed to the outside. For example, consider a mylib header file that contains the header files of a third-party library and some internal implementation details. We may not want to expose these details to users. However, in header files, these symbols are visible to users, and we have no control over which symbols are exposed. As a result, they may affect users' code, and users may mistakenly use an interface that is not intended to be exposed.
For modules, we can use the export keyword to specify which declarations should be exposed externally. Declarations not marked with export remain hidden from external users by default, thus solving the above problem.
This section briefly describes several new declaration syntaxes introduced by modules and demonstrates how to use these syntaxes through examples.
Modules repurpose the export keyword: declarations marked with export are exported for use by code outside the module. Declarations without the export keyword are invisible to external users and thus cannot be used by them.
// Hello.cppm
export module Hello; // Module declaration (see the module declaration part below).
import std; // Provides std::max and std::vector.
export inline int a; // Export a variable.
export void foo(); // Export a function declaration.
void bar(); // The declaration is not exported.
export void foo() { /* ... */ } // Export a function implementation.
export class A {}; // Export a class.
export enum B {}; // Export an enumeration.
export namespace my_lib {} // Export a namespace.
export template<typename T> class C {}; // Export a template.
export using std::max; // Export using declaration.
export using D = std::vector<int>; // Export an alias.
// main.cpp
import Hello;
int main() {
foo(); // Use the exported declaration from the module.
// bar(); compilation error! Cannot use the non-exported declaration!
}
Note that certain items cannot be exported:
1. Macros: Macros only exist in the preprocessing stage, so a module cannot export a macro. Similarly, macros in external code do not affect the code inside the module.
2. using namespace declarations: Exporting a declaration such as export using namespace std; is not advisable and can easily disrupt the expected behavior of external code. As a result, C++ modules simply prohibit exporting such declarations. (Of course, you can still use them internally without the export keyword, which will not affect the external code.)
Interestingly, as shown in the code below, in some cases we can still use a non-exported declaration indirectly, without ever naming it:
// Hello.cppm
export module Hello;
struct my_string {
//...
};
export my_string hello();
export void hi(const my_string&);
// main.cpp
import Hello;
int main() {
// my_string str = hello(); // Error: my_string is not visible here, so it cannot be named.
auto str = hello(); // OK: the type is deduced by auto and never named.
hi(str);
}
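The same distinction between naming a type and using it can be tried without a module-capable toolchain. The following is only an analogy, not module code: a private nested type plays the role of the non-exported my_string, and the greeter class and greet function are hypothetical names introduced for this sketch.

```cpp
#include <cstddef>
#include <string>

class greeter {
    // Private nested type: code outside the class cannot name it, much like
    // a non-exported type cannot be named outside its module.
    struct my_string {
        std::string text;
    };

public:
    static my_string hello() { return my_string{"Hello"}; }
    static std::size_t hi(const my_string& s) { return s.text.size(); }
};

inline std::size_t greet() {
    // greeter::my_string str = greeter::hello(); // Error: my_string is private.
    auto str = greeter::hello();  // OK: the type is deduced, never named.
    return greeter::hi(str);
}
```

In both cases the restriction applies to the name, not to the type itself, which is why auto still works.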
A module can be composed of multiple module units, with each module unit corresponding to a code file. The file usually uses a special suffix like .cppm or .ixx.
In a module unit, there is only one module declaration, indicating which module this unit belongs to.
module Foo; // Declare a module named Foo.
export module Foo.Bar; // Declare a module named Foo.Bar.
module Foo.Bar.Gua; // Declare a module named Foo.Bar.Gua.
Note that the . symbol in module names has no special semantic meaning from a language perspective. It does not establish a hierarchical relationship between Foo.Bar and Foo. You need to manage these relationships yourself.
Depending on the form of the module declaration, module units can be classified into:
1. Interface unit: A module can only have one interface unit, and export declarations can only be used in this unit to expose interfaces externally.
2. Implementation unit: A module can have multiple implementation units, in which export declarations cannot be used. For example, in the following code, one interface unit corresponds to two implementation units, so we can split the implementation across multiple files.
// Foo.cppm
export module Foo; // interface unit
// Foo_impl1.cpp
module Foo; // implementation unit
// Foo_impl2.cpp
module Foo; // implementation unit
Generally speaking, you do not need to separate the declaration from the implementation in modules as you would in traditional header-based projects. However, there are still some special cases. For example, for platform-specific code such as assembly, we may want to separate the declaration from the implementation and let the build tool choose a different implementation for each platform.
We can still simulate the separation of declaration and implementation in traditional C++ projects through interface units and implementation units:
// Interface.cppm
// Interface Unit
export module thread;
export class thread_context;
export void switch_in(thread_context* to);
export void switch_out(thread_context* from);
// Impl.cpp
// Implementation Unit
module thread;
class thread_context
{
//define something
};
void switch_in(thread_context* to)
{
//do something
}
void switch_out(thread_context* from)
{
//do something
}
For example, the above code simulates the context switching of a thread.
The interface unit of the module declares the thread_context class and the switch_in and switch_out functions. The corresponding implementation unit defines the thread_context class and implements the functions.
Since the corresponding implementation details may vary across different platforms, we can prepare multiple different implementation units and select the appropriate one for compilation based on the target platform.
We can split a module into multiple partitions. Note that a partition is not an independent module and cannot be used alone.
export module Foo.Bar:part1; // Declare a partition belonging to the Foo.Bar module.
export module Foo.Bar:part1:part2; // Invalid! Partitions cannot be nested!
module Foo.Bar:part1; // Partitions also have the distinction between interface units and implementation units.
Next, we will introduce the import declaration, through which we can import other modules.
// main.cpp
import std; // Import the standard library module.
import foo; // Import the foo module.
import foo.bar:part1; // Invalid! Cannot import a partition of another module.
// foo.cppm
export module foo;
export import foo.bar; // Import the foo.bar module and re-export it to external users.
import std; // Import the std module, but do not expose it to external users.
// foo.bar.cppm
export module foo.bar;
export import :part1; // Import the partition :part1 of this module and re-export it.
After introducing Export Declaration, Module Declaration, and Import Declaration, we can finally present a complete example of module code and analyze its basic structure.
/*------------Global Module Fragment ------------*/
module;
#include "util.hpp" // Include the header file in module code
/*-------------Module Declaration----------------*/
export module http.client; // The name of this module is http.client.
/*-------------Import Declaration----------------*/
import std; // Import other modules.
import asio;
export import cppjson;
import openssl;
/*------------------ User code ----------------------*/
namespace http
{
namespace detail
{
class helper // Not prefixed with export, so it will not be exposed to external users.
{
//………
};
}
// Export Declaration
export enum class status
{
OK,
NotFound,
//………
};
export class client
{
tcp::socket soc;
//………
};
export int foo();
}
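As the comments in the preceding code indicate, module code can be divided into four sections: the global module fragment, the module declaration, the import declarations, and the user code.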
In addition to named modules, a Header Unit can also be imported. It behaves similarly to a module, but it is fundamentally different because it is merely a serializable header file.
The module syntax described above mainly comes from Microsoft's proposal in 2014. However, Google identified a critical flaw in this design: it could not export macros. Therefore, Google put forward a solution: the Header Unit.
For pure compatibility purposes, importing a Header Unit is meant to have the same effect as including the header with #include (the code introduces macros and can be affected by macros). However, this means the compiler can only perform limited preprocessing on the Header Unit in advance, significantly restricting potential improvements in compilation speed.
import <iostream>;
// The effect is exactly the same as that of #include <iostream>.
// Macros are introduced, and code in iostream may also be affected by macros.
int main()
{
std::cout<<"Hello world"<<std::endl;
}
Here is a summary of the types of module units:
1. Interface Unit: A module can only have one interface unit, where symbols are declared to be exposed to external users.
2. Implementation Unit: A module can have multiple implementation units, where the code can be implemented.
3. Partition Unit: A main module can contain multiple partition units, which are not independent and cannot be exported for external use. They are part of the main module. Note that the partition unit can also be divided into interface units and implementation units, which are orthogonal (so there are 2*2=4 possible scenarios).
4. Header Unit: A special unit that is essentially not a module but a serializable header file.
Modularization brings many changes:
The figure above shows a traditional C++ project architecture, where each translation unit (*.cpp) is compiled independently without affecting others. The header file is included and compiled multiple times.
Since main.cpp uses the contents of foo and bar, it includes the header files of foo and bar. However, the cpp files still do not interfere with each other during compilation. Units can be compiled in parallel without any interdependencies between them.
However, there are dependencies between module code, as one translation unit may depend on the compilation results of other translation units.
The figure above illustrates the transition from a header-based project to a module-based project.
Since main.cpp uses the content from foo and bar, and a module must be compiled in advance rather than textually included, we need to compile the foo and bar modules first, completing the compiler's front-end processing, before we start compiling main.cpp. After modules are introduced, there are dependencies between code units, which is the biggest change in the C++ project.
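The ordering constraint amounts to a topological sort of the import graph. The sketch below shows one way a build tool could derive a valid compilation order. It is a minimal illustration under stated assumptions: import_graph and build_order are hypothetical names, the graph is assumed to contain no import cycles, and real build systems additionally have to scan sources to discover the imports in the first place.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each translation unit to the modules it imports.
using import_graph = std::map<std::string, std::vector<std::string>>;

namespace detail {
// Depth-first visit: emit every imported module before the unit itself.
inline void visit(const std::string& unit, const import_graph& graph,
                  std::set<std::string>& done,
                  std::vector<std::string>& order) {
    if (!done.insert(unit).second) return;  // Already scheduled.
    auto it = graph.find(unit);
    if (it != graph.end()) {
        for (const auto& dep : it->second) visit(dep, graph, done, order);
    }
    order.push_back(unit);
}
}  // namespace detail

// Returns one valid compilation order, assuming the graph has no cycles.
inline std::vector<std::string> build_order(const import_graph& graph) {
    std::set<std::string> done;
    std::vector<std::string> order;
    for (const auto& entry : graph) {
        detail::visit(entry.first, graph, done, order);
    }
    return order;
}
```

For a graph in which main.cpp imports foo and bar, the returned order places both modules before main.cpp. Units that do not depend on each other, such as foo and bar here, can still be compiled in parallel.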
Module wrapper is a special technique that aims to convert old-style header files into standard C++ module files through a simple module encapsulation layer, thus achieving compatibility between the two.
One approach to this technique is to use export using declarations. We can quickly encapsulate existing header files into modules without affecting old code.
// iostream.cppm
module;
#include<iostream>
export module iostream;
namespace std {
export using std::cin;
export using std::cout;
export using std::endl;
}
// main.cpp
import iostream;
int main() {
std::cout<<"Hello Module Wrapper"<<std::endl;
}
In the code above, we made a simple standard library module by including standard library header files in the global module fragment and then using export using declarations to export these declarations.
Through this technique, you can encapsulate the existing header file code into the module by introducing a simple intermediate layer without modifying the header file. Many modular standard libraries are implemented in this way. For example, the async_simple library uses this technique to simply encapsulate a standard library module implementation.
Here is another approach to encapsulation. By using export extern "C++" and module control macros, this approach exports all the symbols in the header files at once, without manually enumerating the declarations. It remains compatible with both header-based and module-based code. For example, the fmt library uses this approach to support both headers and modules.
// hello.hpp
#ifdef HELLO_USE_MODULE // Control this macro to determine whether to use modules.
import std;
#else
#include<iostream>
#include<vector>
#endif
void hello() {
// ...
}
// hello.cppm
module;
export module hello;
export extern "C++" {
#define HELLO_USE_MODULE
#include<hello.hpp>
}
First, we modify the original header file. In hello.hpp, the control macro HELLO_USE_MODULE determines whether to use a module or a header file.
Next, in the interface file of the hello module, we use the export extern "C++" syntax, include the corresponding header file inside the block, and define the control macro. In this way, the entire hello.hpp code is included within the module and exported for external use.
In the module refactoring process introduced earlier, there are repetitive tasks involved. To reduce the workload, we have experimentally provided some automation tools. These tools may be integrated into the Clang mainline in the future.
The tools can perform the following tasks:
• Automatically insert macros that control whether the project uses headers or modules.
• Automatically scan dependencies between header files (though not perfectly accurate).
• Automatically generate module wrapper files.
For more information, see clang-modules-wrapper
async_simple is an asynchronous component library provided by Alibaba Cloud. It supports C++20 stackless coroutines and C++ stackful coroutines, along with numerous asynchronous tools.
The modularization of the async_simple library started at the end of 2021, resulting in significant improvements in compilation speed.
The development teams of Alibaba Cloud computing platform Hologres and Alibaba Cloud Compiler have been actively collaborating and experimenting with modules since 2022.
Now, Hologres uses modules as the default development mode and has launched images compiled based on modules, which have been running stably for five months.
Hologres might be the first commercial project in the world to use modules on a large scale, with the following transformation results:
Alibaba Cloud Compiler is a C++ compiler that is developed by Alibaba Cloud and can be used on the Alibaba Cloud Linux system. Alibaba Cloud Compiler inherits all options and parameters from Clang/LLVM-13, is deeply optimized for Alibaba Cloud infrastructure, and provides additional features to deliver a better user experience.
Alibaba Cloud Compiler provides strong support for coroutines and modules, and actively integrates this code into the upstream community, making significant contributions to Clang's support for C++ modules.
After installing the Alibaba Cloud Linux 3 system on Alibaba Cloud ECS, you can install the ACC compiler through the package manager:
sudo yum install -y alibaba-cloud-compiler
For more information, see https://www.alibabacloud.com/help/en/alinux/getting-started/install-and-use-alibaba-cloud-compiler
This article includes five programming tasks. If you are interested in the practice, you can attempt these exercises after reading this article. Specifically, WorkShop 1 and WorkShop 2 can be completed right now. WorkShop 3 and WorkShop 4 can be completed after reading the section Introduction to Module Syntax. WorkShop 5 can be completed after reading the section Modularization.
Prepare the compilation environment:
yum install -y alibaba-cloud-compiler
WorkShop 1: GoodBye Head File.
The link is provided below:
https://github.com/poor-circle/workshop/blob/master/work1/任务说明.md
WorkShop 2: Hello world, C++modules.
The link is provided below:
https://github.com/poor-circle/workshop/blob/master/work2/任务说明.md
WorkShop 3: Write a single module.
The link is provided below:
https://github.com/poor-circle/workshop/blob/master/work3/任务说明.md
WorkShop 4: Write multiple module units.
The link is provided below:
https://github.com/poor-circle/workshop/blob/master/work4/任务说明.md
WorkShop 5: Convert a traditional header project to a module.
The link is provided below:
https://github.com/poor-circle/workshop/blob/master/work5/任务说明.md
At the 2024 OpenAnolis Conference, the author was invited to participate in a technical sharing workshop. In the workshop, he guided participants through hands-on exercises where they used the Alibaba Cloud Compiler on Anolis OS to transform a traditional C++ project into a module-based project, experiencing the benefits and conveniences of using modules. Modules not only eliminate the need to include header files but also improve compilation speed.