The rise and development of confidential containers
Confidential Containers is a CNCF sandbox project that addresses data security in cloud-native scenarios. It meets compliance and data privacy requirements, protects innovative IP such as algorithms and models, supports use cases where data must remain usable but not visible, and reduces the trust that tenants must place in cloud vendors.
Confidential containers have the following characteristics:
1. Security. Confidential containers rely on a hardware trusted execution environment (TEE) to protect the data inside the container; cloud vendors and highly privileged third parties cannot directly steal or tamper with that data.
2. Ease of use. User applications can be migrated from a traditional container environment to a confidential container environment without modification.
3. Reduced trust in cloud vendors. Tenant data is no longer transparent to the cloud vendor, resolving the trust dependency between tenants and vendors.
4. Attestability. Users can verify the authenticity of the current container environment through remote attestation and similar means.
The security of confidential containers depends largely on the hardware trusted execution environment, which protects the confidentiality and integrity of data at runtime.
In recent years, many hardware manufacturers have launched their own TEE technologies, such as Intel® SGX and Intel® TDX, which means confidential container technology can be built on a variety of hardware platforms.
Alibaba Cloud is a core participant in the Confidential Containers project. While contributing to the open source project, Alibaba Cloud has also been building commercial confidential container solutions and has so far completed two: a Pod-level confidential container and a process-level confidential container. A Pod-level confidential container places the entire contents of a container Pod into the TEE for protection, while a process-level confidential container places only the container process running the sensitive workload into the TEE.
While using the CPU TEE to protect data at runtime, we also combine a series of security technologies, such as image security, storage security, remote attestation, and network security, to provide users with full-link security from application deployment to execution.
At the same time, we brought confidential containers to the OpenAnolis community and built an open source, out-of-the-box solution on the OpenAnolis ecosystem. So far we have adapted ANCK (the Anolis Cloud Kernel), KVM, and the Rund secure container runtime for confidential containers. The goal of building an open source solution is to find more adoption scenarios for confidential containers through more convenient and deeper cooperation between the open source community and partners.
Intel and Alibaba Cloud are fully aware that, beyond the foundational software, applications and the surrounding ecosystem are also critical to the development and adoption of confidential container technology. The core value of confidential computing is protecting high-value workloads and sensitive data, and BigDL PPML is a typical application.
BigDL is Intel's open source AI platform, which helps data scientists and data engineers easily develop end-to-end distributed AI applications. On top of it, BigDL offers PPML (Privacy Preserving Machine Learning) specifically for confidential computing, providing end-to-end, full-link protection for distributed AI applications.
The PPML architecture is shown in the figure above. At the bottom, Intel provides Intel® TDX and Intel® SGX trusted execution environments in the Kubernetes cluster. Through a series of underlying hardware and software security technologies, users can develop their applications with familiar, standard AI and big data tools such as Apache Spark, Apache Flink, TensorFlow, and PyTorch, without exposing private data.
On top of this, PPML provides two distributed pipelines: Orca and DLlib. Orca extends AI framework APIs with distributed big data processing capabilities, while DLlib helps programmers turn distributed deep learning applications into Spark applications. In addition, BigDL provides trusted big data analytics, trusted machine learning, deep learning, and federated learning applications.
As shown in the figure above, BigDL PPML builds a TDX-based distributed execution environment on top of a trusted Kubernetes cluster through confidential container technology, ensuring that workloads, data, and models remain both invisible and tamper-proof throughout use and computation.
From the perspective of data flow, all data is stored encrypted in the data lake and data warehouse. BigDL PPML loads this confidential data, obtains the data keys through remote attestation and the key management system, and decrypts the data inside the trusted execution environment. It then uses big data and AI computing frameworks to perform distributed preprocessing, model training, and model inference. Finally, the results, data, or models are written back to distributed storage in a secure or encrypted form. In addition, data transmitted between nodes and containers is protected by TLS, achieving privacy protection and data security across the whole link.
Running a BigDL PPML workload with TDX confidential containers takes only two simple steps. First, build and encrypt the PPML image, then push the encrypted image to the image registry. Second, to deploy the PPML workload on Kubernetes, developers only need to specify the required confidential container runtime and the configured high-performance storage volume in a standard YAML file, then launch it with standard Kubernetes commands.
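The second step above can be sketched as a standard Pod spec. This is an illustrative example only: the runtime class name `kata-tdx`, the image reference, and the volume claim name are assumptions that depend on how a particular cluster is set up, not values defined by this article.

```yaml
# Hypothetical Pod spec for a confidential (TDX) PPML workload.
# runtimeClassName, image, and claimName are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: bigdl-ppml-driver
spec:
  runtimeClassName: kata-tdx        # selects the confidential container runtime
  containers:
  - name: ppml
    image: registry.example.com/bigdl-ppml:encrypted   # the encrypted image pushed in step one
    volumeMounts:
    - name: data
      mountPath: /ppml/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: ppml-data          # pre-configured high-performance storage volume
```

The workload is then launched with the usual tooling, e.g. `kubectl apply -f pod.yaml`.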
Looking deeper, Kubernetes schedules the workload to a target host capable of running confidential containers:
First, the TDX TEE is launched when the confidential container runtime starts on the host.
Second, inside the TDX trusted execution environment, remote attestation is performed to obtain the keys needed to verify and decrypt the container image; the image service then downloads the container image and uses those keys to verify and decrypt it. On the data side, users mount high-performance local LVM volumes for the container with standard Kubernetes CSI drivers such as open-local, and the confidential container automatically applies transparent storage encryption to protect user input and output data.
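As a sketch, requesting a local LVM volume from open-local uses an ordinary PersistentVolumeClaim. The storage class name `open-local-lvm` matches open-local's commonly documented default, but the claim name and size here are illustrative assumptions:

```yaml
# Illustrative PVC requesting a local LVM volume from the open-local CSI driver.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ppml-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: open-local-lvm   # LVM storage class provided by open-local
  resources:
    requests:
      storage: 100Gi
```

The confidential container runtime then layers transparent encryption on top of this volume, so data the workload writes reaches the disk encrypted.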
Finally, the BigDL PPML workload containers are started. A BigDL PPML driver and multiple workers run in a distributed manner on the Kubernetes cluster, enabling trusted cloud-native big data analytics and AI applications on TDX.
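The scheduling step described above relies on Kubernetes RuntimeClass. A hedged sketch of how a cluster administrator might expose the confidential runtime and pin it to TDX-capable hosts follows; the handler name and node label are assumptions, not values from this article:

```yaml
# Hypothetical RuntimeClass mapping pods to the confidential runtime.
# The handler name and node label depend on the cluster's containerd/Kata setup.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-tdx
handler: kata-tdx                    # containerd runtime handler for the TDX-backed runtime
scheduling:
  nodeSelector:
    tdx.enabled: "true"              # only schedule onto TDX-capable hosts
```

Pods that set `runtimeClassName: kata-tdx` are then automatically constrained to nodes carrying the matching label.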
Intel and Alibaba Cloud have always maintained close cooperation. Both companies are initiators of the CoCo upstream community, and we have jointly defined, designed, and implemented many key features of the CoCo software stack, such as image download inside the TEE, image signature verification and decryption, trusted ephemeral storage, and a measurable runtime environment, all of which underpin the strong security properties of the CoCo project.
In addition, we cooperate closely in the OpenAnolis community, including jointly implementing TDX-based end-to-end confidential container solutions covering remote attestation and reference applications. For example, building on the OpenAnolis community's open-local driver, one side added support for trusted storage while the other added support for Kata Containers' new direct volume feature.
Finally, Intel is also working closely with Alibaba Cloud's partners to develop confidential container solutions on Alibaba Cloud.