AnalyticDB:Overview

Last Updated:Jan 20, 2026

The AnalyticDB for MySQL confidential engine powered by Apache Spark is one of the first products certified by the China Academy of Information and Communications Technology (CAICT) for its performance and security in trusted execution environments (TEEs). The engine encrypts sensitive data to prevent data breaches and is ideal for privacy-preserving computing. This topic describes the scenarios and benefits of the Apache Spark based confidential engine and compares the features of the Basic and High-performance editions.

Scenarios

The confidential engine powered by Apache Spark is available for AnalyticDB for MySQL Enterprise Edition, Basic Edition, and Data Lakehouse Edition. It encrypts sensitive data to prevent data breaches and meet compliance requirements, and is typically used to resolve data security issues in the following scenarios:

  • Secure data storage and computing: In untrusted environments, such as third-party platforms, the Apache Spark based confidential engine provides data protection for key data analytics applications, such as investment and financial analysis. This ensures that data is secure during storage and computation and reduces the risk of plaintext data breaches.

  • Sensitive data compliance: In untrusted environments, such as third-party platforms, the Apache Spark based confidential engine provides security protection for application services to protect end-user sensitive data. For example, private data such as personally identifiable information (PII) and genetic data must meet end-to-end encryption compliance requirements when managed by a third party.

  • Secure data sharing: You can control key ownership to manage data usage rights and access frequency. This enables secure data sharing and prevents data breaches. The following figure shows this scenario.

    [Figure: secure data sharing scenario]

Editions

The Apache Spark based confidential engine is available in two editions: Basic Edition and High-performance Edition. The differences are as follows:

  • Basic Edition: The Basic Edition of the Apache Spark based confidential engine transmits and stores sensitive data as ciphertext. Only key owners can decrypt the data, which prevents data breaches. You must use client tools for encryption and decryption to convert data between plaintext and ciphertext.

  • High-performance Edition (Recommended): Building on the data encryption capabilities of the Basic Edition, the High-performance Edition of the Apache Spark based confidential engine integrates Apache Gluten and Velox to provide vectorization. This ensures secure data transmission and storage while improving data processing efficiency.

The following table compares the Basic Edition and the High-performance Edition of the Apache Spark based confidential engine:

| Edition | Confidential data format | Performance (compared to open source Apache Spark) | Compatibility (compared to open source Apache Spark) | Tool dependency | Key mechanism |
| --- | --- | --- | --- | --- | --- |
| Basic Edition | EncBlocksSource format | 0.5 times | SQL syntax compatible; data format incompatible | Depends on client tools provided by Apache Spark to encrypt and decrypt data. | Supports two types of keys: master encryption key (MEK) and data key (DK). For more information, see Keys and Encryption. |
| High-performance Edition | Parquet modular encryption format | 1.9 times | SQL syntax compatible; data format compatible | No dependencies. You can use any tool that supports Parquet modular encryption to encrypt and decrypt data. | Supports three types of keys: master encryption key (MEK), key encryption key (KEK), and data key (DK). For more information, see Keys and Encryption. |

Benefits

Rich features and ease of use

  • Supports all standard SQL operators. Confidential computing applications can be used with simple configurations and do not require SQL modifications.

  • Usage is consistent with open source Apache Spark.

  • The High-performance Edition supports hybrid processing of data at different privacy levels, including mixed-join computations on plaintext tables, plaintext and ciphertext tables, and ciphertext tables.

  • Computation results can be encrypted for output to enhance data security.

Data control

  • Key management supports Bring-Your-Own-Key (BYOK), giving you full control over your keys. The High-performance Edition introduces the Parquet modular encryption format, which lets you use your own keys to encrypt and decrypt data for complete data control. For more information, see Keys.

  • In the High-performance Edition, encryption keys are managed by the application. During computation, the keys are held by the InMemoryKMS class in the default application and are destroyed after the computation is complete.
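As a rough illustration of application-managed keys, the sketch below assembles the settings that open source Apache Spark documents for Parquet columnar encryption with the mock InMemoryKMS class. The property names come from the open source Apache Spark and Apache Parquet documentation. Whether the confidential engine accepts exactly the same properties is an assumption, and the helper name, key IDs, and column names are hypothetical.

```python
def build_parquet_encryption_settings(master_keys, column_keys, footer_key):
    """Return (hadoop_conf, write_options) for Parquet columnar encryption.

    master_keys: {key_id: base64_encoded_master_key}
    column_keys: {key_id: [names of columns protected by that key]}
    footer_key:  key_id of the master key that protects file footers

    Property names follow open source Apache Spark; their use with the
    confidential engine is an assumption of this sketch.
    """
    unknown = (set(column_keys) | {footer_key}) - set(master_keys)
    if unknown:
        raise ValueError(f"no master key supplied for: {sorted(unknown)}")

    hadoop_conf = {
        # Activates Parquet encryption driven by configuration properties.
        "parquet.crypto.factory.class":
            "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory",
        # Mock KMS that keeps the master keys in application memory.
        "parquet.encryption.kms.client.class":
            "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS",
        # Explicit master keys, formatted as "id:base64key, id:base64key".
        "parquet.encryption.key.list":
            ", ".join(f"{kid}:{key}" for kid, key in master_keys.items()),
    }
    write_options = {
        # Formatted as "keyId:col,col;keyId:col".
        "parquet.encryption.column.keys":
            ";".join(f"{kid}:{','.join(cols)}" for kid, cols in column_keys.items()),
        "parquet.encryption.footer.key": footer_key,
    }
    return hadoop_conf, write_options


# Placeholder keys for illustration only; never reuse published key material.
hadoop_conf, write_options = build_parquet_encryption_settings(
    master_keys={"keyA": "AAECAwQFBgcICQoLDA0ODw==",
                 "keyB": "AAECAAECAAECAAECAAECAA=="},
    column_keys={"keyA": ["id_number", "phone"]},
    footer_key="keyB",
)
```

In open source Apache Spark, the first dictionary is applied as Hadoop configuration properties and the second as DataFrame write options. Only the listed columns are encrypted, which is what keeps the I/O overhead of unprotected columns unchanged.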

High performance

  • The High-performance Edition of the Apache Spark based confidential engine delivers 4 times the performance of the Basic Edition and 1.9 times the performance of open source Apache Spark 3.2.0.

  • Flexible encryption methods support encrypting individual data columns in data files, which reduces unnecessary data I/O overhead.

Keys and Encryption

Keys

Master encryption key (MEK)

A master encryption key (MEK) is a high-level encryption key used to encrypt and protect other keys in a system or dataset. In a hierarchical key management structure, the MEK is at the top level. It is not used for daily encryption operations. Instead, it is used to encrypt and decrypt lower-level keys, such as key encryption keys (KEKs) and data keys (DKs). This layered approach simplifies key management and improves the security of key storage.

You can generate an MEK randomly with common tools such as OpenSSL. The Basic Edition of the Apache Spark based confidential engine supports only a 16-byte master key, provided as a hex-encoded string. The High-performance Edition supports a 16-, 24-, or 32-byte master key, provided as a Base64-encoded string. Use the following OpenSSL commands to randomly generate a master key:

# Randomly generate a 16-byte, hex-encoded key
$ openssl rand -hex 16
# Randomly generate a 24-byte, Base64-encoded key
$ openssl rand -base64 24

Warning

The MEK is the root credential for accessing encrypted data. If the key is lost, you can no longer access your existing data. Keep your MEK secure.
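If OpenSSL is not at hand, the same key shapes can be produced programmatically. The following is a minimal sketch using only the Python standard library. The function names are hypothetical and are not part of any AnalyticDB tooling. It also shows how to check that an encoded key decodes to one of the supported lengths.

```python
import base64
import secrets


def generate_basic_mek() -> str:
    """16 random bytes as a hex string (same shape as `openssl rand -hex 16`)."""
    return secrets.token_hex(16)


def generate_high_performance_mek(num_bytes: int = 24) -> str:
    """16, 24, or 32 random bytes as a Base64 string (like `openssl rand -base64 N`)."""
    if num_bytes not in (16, 24, 32):
        raise ValueError("master key must be 16, 24, or 32 bytes long")
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def decoded_key_length(encoded: str, encoding: str) -> int:
    """Return the number of raw key bytes behind an encoded master key."""
    if encoding == "hex":
        return len(bytes.fromhex(encoded))
    if encoding == "base64":
        return len(base64.b64decode(encoded, validate=True))
    raise ValueError("encoding must be 'hex' or 'base64'")


basic_mek = generate_basic_mek()
hp_mek = generate_high_performance_mek(24)
assert decoded_key_length(basic_mek, "hex") == 16
assert decoded_key_length(hp_mek, "base64") == 24
```

Note that a 16-byte key is 32 characters long once hex-encoded, so checking the decoded length rather than the string length avoids a common mix-up.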

Key encryption key (KEK)

A key encryption key (KEK) is randomly generated by the system or derived from an MEK and is protected by the higher-level key. A KEK can encrypt or decrypt a data key (DK) to improve key security.

Data key (DK)

A data key (DK) is typically generated by the system or derived from a master encryption key (MEK) and is protected by a higher-level key. A DK is used to encrypt or decrypt data in files.

Encryption and decryption

Basic Edition

Data encryption

During data encryption, the Basic Edition of the Apache Spark based confidential engine retrieves the MEK from the application configuration and automatically generates a random DK for the dataset. The DK is used to encrypt the data in the file, and the MEK is used to encrypt the DK.

Data decryption

During data decryption, the Basic Edition of the Apache Spark based confidential engine retrieves the MEK from the application configuration and then extracts the encrypted DK from the file's metadata. The DK is decrypted using the MEK, and the decrypted DK is then used to decrypt the data in the file. If the MEK is managed by the application, the decryption occurs locally.
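The two-level flow described for the Basic Edition can be sketched as follows. This is an illustration of the key hierarchy only, not the engine's implementation: the function names and the dictionary layout are hypothetical, and a SHA-256-based XOR keystream stands in for the real cipher purely to keep the example dependency-free. Do not use the stand-in cipher to protect real data.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in keystream (NOT a real cipher, NOT what the engine uses)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(12)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:12], blob[12:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def encrypt_dataset(mek: bytes, data: bytes) -> dict:
    """A random DK encrypts the data; the MEK encrypts (wraps) the DK."""
    dk = secrets.token_bytes(16)
    return {
        "metadata": {"wrapped_dk": toy_encrypt(mek, dk)},
        "ciphertext": toy_encrypt(dk, data),
    }


def decrypt_dataset(mek: bytes, stored: dict) -> bytes:
    """Unwrap the DK from the metadata with the MEK, then decrypt the data."""
    dk = toy_decrypt(mek, stored["metadata"]["wrapped_dk"])
    return toy_decrypt(dk, stored["ciphertext"])


mek = bytes.fromhex(secrets.token_hex(16))  # 16-byte MEK supplied as hex
stored = encrypt_dataset(mek, b"account=42,balance=1000")
assert decrypt_dataset(mek, stored) == b"account=42,balance=1000"
```

Because only the wrapped DK travels with the file, whoever holds the MEK controls access to the data, which is why losing the MEK makes existing data unreadable.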

High-performance Edition

Data encryption

During data encryption, the High-performance Edition of the Apache Spark based confidential engine retrieves the MEK from the application configuration and randomly generates a KEK and a DK for each column or file. The DK is used to encrypt the data in the file, the KEK is used to encrypt the DK, and the MEK is used to encrypt the KEK. Each KEK has a unique 16-byte identifier. After encryption, both the encrypted KEK and the encrypted DK are stored in the file's metadata.

Data decryption

During data decryption, the High-performance Edition of the Apache Spark based confidential engine retrieves the MEK from the application configuration and then extracts the encrypted KEK and DK from the file's metadata. The KEK is decrypted using the MEK, the DK is decrypted using the KEK, and the DK is then used to decrypt the data in the file. If the MEK is managed by the application, the decryption occurs locally.
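The three-level hierarchy of the High-performance Edition can be sketched in the same spirit, here with one KEK and one DK per column. The layout of the metadata dictionary, the function names, and the column names are hypothetical, and the XOR keystream is again only a dependency-free stand-in for the real cipher used by Parquet modular encryption.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in keystream (NOT a real cipher, NOT what the engine uses)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(12)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:12], blob[12:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def encrypt_columns(mek: bytes, columns: dict) -> dict:
    """Per column: DK encrypts data, KEK wraps DK, MEK wraps KEK."""
    stored = {"metadata": {}, "columns": {}}
    for name, values in columns.items():
        kek = secrets.token_bytes(16)
        dk = secrets.token_bytes(16)
        stored["metadata"][name] = {
            "kek_id": secrets.token_bytes(16),      # unique 16-byte identifier
            "wrapped_kek": toy_encrypt(mek, kek),
            "wrapped_dk": toy_encrypt(kek, dk),
        }
        stored["columns"][name] = toy_encrypt(dk, values)
    return stored


def decrypt_column(mek: bytes, stored: dict, name: str) -> bytes:
    """Unwrap KEK with the MEK, unwrap DK with the KEK, then decrypt the column."""
    meta = stored["metadata"][name]
    kek = toy_decrypt(mek, meta["wrapped_kek"])
    dk = toy_decrypt(kek, meta["wrapped_dk"])
    return toy_decrypt(dk, stored["columns"][name])


mek = secrets.token_bytes(24)  # 16-, 24-, or 32-byte MEK
stored = encrypt_columns(mek, {"name": b"alice,bob", "salary": b"100,200"})
assert decrypt_column(mek, stored, "salary") == b"100,200"
```

Keeping a separate DK per column means a reader only needs to unwrap keys for the columns it actually touches.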

Notes

  • When you use the BYOK key management method, you must keep your keys secure. If a key is lost, the data cannot be decrypted.

  • Different compute engines may process data with different precision. If you encounter problems when using the Apache Spark based confidential engine, submit a ticket.

References