"Confidential Computing" for Everyone: Getting Started with Occlum and Its Related Technologies

By Hongliang Tian, senior technical expert at Ant Group and head of the Occlum open-source project.

Driven by cloud computing, big data, and artificial intelligence, we live in an era of data explosion. How can we enjoy and use the value generated by massive data while ensuring data security and user privacy? This is a common concern for users, enterprises, and regulatory authorities.

Confidential computing has emerged in recent years to solve this problem. It uses trusted execution environment (TEE) technology to keep data encrypted and strongly isolated at all times, thus ensuring the security and privacy of users' data. Confidential computing can solve the trust problem in many scenarios, including data fusion and joint analysis between multiple mutually untrusted organizations, confidentiality protection of smart contracts on the blockchain, defense of public cloud platforms against external or internal attacks, and protection of highly sensitive information (such as cryptographic materials and medical files).

Confidential computing relies on TEE technology such as Intel SGX, the most mature TEE technology available in the cloud. However, SGX brings additional functional limitations and compatibility problems. These are a huge obstacle for developers of confidential computing applications and make application development difficult.

This article analyzes the challenges and pain points that SGX application developers currently face and explains how Occlum, an open-source TEE OS developed in-house by Ant Group, lowers the barrier to SGX application development so that everyone can take advantage of confidential computing.

1. Why Is SGX Application Development Difficult?

SGX applications follow a partition-based model. The area protected by the SGX TEE (the green part above) can be embedded in an (untrusted) user-mode application (the red part above). This protected area is called an enclave. An Intel CPU that supports SGX guarantees that the protected content in the enclave is encrypted in memory and is strongly isolated from the outside world. If external code wants to enter the enclave to execute its trusted code, it must pass through a specified entry point, which can perform access control and security checks to ensure that the enclave cannot be abused by the outside world.

Since SGX applications are based on this partitioned architecture, application developers usually need to use one of the SGX SDKs, such as Intel SGX SDK, Open Enclave SDK, Google Asylo, or Apache Rust SGX SDK. However, no matter which SDK is used, developers will encounter the following difficulties:

Developers have to partition the target application: They need to decide which components should be placed inside the enclave, which should be placed outside the enclave, and how the two groups communicate. Determining an efficient, reasonable, and safe partitioning plan is a challenging task for complex applications, not to mention the effort required to implement it.
Restricted to a certain programming language: No matter which SDK is used for development, a developer is limited to the languages supported by that SDK. This usually means C/C++ when using Intel SGX SDK, Open Enclave SDK, or Google Asylo. Developers cannot use friendlier programming languages such as Java, Python, or Go.
Only very limited functions can be obtained: Due to hardware limitations and security considerations, code in the enclave cannot directly access the (untrusted) OS outside the enclave. Because there is no OS support inside the enclave, the SDKs can provide only a small subset of the functions available in an ordinary, untrusted environment. As a result, many existing software libraries and tools cannot run in an enclave.

These difficulties make developing applications for SGX quite tricky, which restricts the popularity and acceptance of both SGX and confidential computing.

2. Learn Occlum's Three Commands

Occlum is an open-source TEE OS from Ant Group that lowers the barrier to developing SGX applications. To get started, we need to learn only three Occlum commands: new, build, and run. This section uses Occlum to run a Hello World program in SGX.

Here is a very simple Hello World program:

$ cat hello_world.c
#include <stdio.h>
int main() {
    printf("Hello World!\n");
    return 0;
}

First, we compile the program with the GCC toolchain (occlum-gcc) provided by Occlum and verify that it works properly on Linux:

$ occlum-gcc hello_world.c -o hello_world
$ ./hello_world
Hello World!

Then, we use the occlum new command to create an Occlum instance directory for this program:

$ occlum new occlum_hello
$ cd occlum_hello

The command creates a directory named occlum_hello and prepares the necessary files (such as the configuration file Occlum.json) and subdirectories (such as image/) in it.

Next, we copy the newly compiled hello_world into the image and use the occlum build command to generate an Occlum enclave file and a trusted image:

$ cp ../hello_world image/bin
$ occlum build

Finally, we use the occlum run command to run hello_world in SGX:

$ occlum run /bin/hello_world
Hello World!

More complex programs can also be ported to SGX through Occlum using a process similar to the one above. Users can freely choose their programming language, such as Java, Python, or Go, and they do not need to modify the application code (or need to modify only a small amount of it) or understand the partitioned programming model of SGX. Occlum allows application developers to focus their efforts on writing applications rather than porting them to SGX.

3. TEE OS - Works Similarly to Docker

After understanding Occlum's basic usage, readers will naturally be curious about the technical principle of Occlum. Why is Occlum's user interface designed like this? What is the technical architecture behind the simple interfaces? This section tries to answer these questions.

One of Occlum's design concepts is Enclave-as-a-Container. In the cloud-native era, containers are of paramount importance and are everywhere. The most common container implementations are based on Linux cgroups and namespaces (such as Docker), but there are also virtualization-based implementations (such as Kata). We have observed that a TEE, or enclave, can also serve as a way to implement containers. Therefore, we purposefully designed Occlum's user interface to be close to Docker and the OCI standards to provide a consistent user experience. In addition to the aforementioned new, build, and run commands, Occlum provides commands such as start, exec, stop, and kill, which have meanings similar to the Docker commands of the same names.

Behind this simple user interface lie complex implementation details. To describe the technical principles of Occlum at a high level, we discuss them from two perspectives: the trusted development environment and the untrusted deployment environment.

In the trusted development environment (the upper part in the figure above), users run occlum build to package and produce a trusted image. A Merkle tree is used to ensure that any tampering with the trusted image after it is uploaded to an untrusted deployment environment can be detected. The content of the trusted image is the rootfs loaded when Occlum starts. Its layout is similar to that of a typical Unix system, and its content is determined by the user.

In the untrusted deployment environment (the lower part in the figure above), users run occlum run to start a new Occlum enclave. The Occlum TEE OS in the enclave loads and executes the corresponding applications from the trusted image. Occlum provides Linux-compatible system calls to applications, so applications can run in an enclave without modification (or with only a few modifications). The memory state of applications is protected by the enclave, and the file I/O of applications is automatically encrypted and decrypted by Occlum. This way, the confidentiality and integrity of application data are protected both in memory and in external storage.

4. More Efficient, More Powerful, More Secure, and More Content

In addition to providing container-like, user-friendly interfaces, Occlum has three main features:

Efficient Multi-Process Support: Occlum implements lightweight processes. Compared with Graphene-SGX, the most advanced open-source TEE OS, process startup is 10-1000 times faster and the throughput of inter-process communication is three times higher (please see our paper for details).
Powerful File Systems: Occlum supports a variety of file systems, including file systems that protect integrity and confidentiality, in-memory file systems, and host file systems, to meet the various file I/O requirements of applications.
Memory Safety Guarantee: As the world's first TEE OS developed in the Rust language, Occlum greatly reduces the likelihood of memory safety problems. (According to statistics, 50% of Linux's security vulnerabilities are related to memory safety.) Therefore, Occlum is more trustworthy.

