
AnalyticDB:Use the High-performance Edition of the Apache Spark-based confidential compute engine

Last Updated:Jan 20, 2026

The High-performance Edition of the Apache Spark-based confidential compute engine on AnalyticDB for MySQL builds on the capabilities of the Basic Edition. It supports the Parquet modular encryption feature and is compatible with compute engines such as the community editions of Spark, Hadoop, and Hive. This edition ensures secure data transmission and storage and improves data processing efficiency. This topic describes how to use the High-performance Edition of the Apache Spark-based confidential compute engine to encrypt data and perform SQL queries on ciphertext tables.

Prerequisites

Data preparation

The data file that you want to encrypt must be in the Parquet format. To complete the following steps, you can download the Spark confidential computing sample data.

Procedure

You can encrypt plaintext data in the AnalyticDB for MySQL console or by using an encryption tool. If the data that you want to encrypt is stored on an on-premises device, use an encryption tool to encrypt the data. If the data is stored in a cloud database, encrypt the data in the AnalyticDB for MySQL console. The two methods involve different operations:

  • Encrypt data in the console: Upload plaintext data to OSS, and then encrypt it.

  • Encrypt data using an encryption tool: Encrypt data locally, and then upload the ciphertext to OSS.

Encrypt data and create a ciphertext table in the console

  1. Upload the plaintext data from the Data preparation section to an OSS bucket. This example uses oss://testBucketName/adb/Spark/customer. For more information, see Simple upload.

  2. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

  3. In the navigation pane on the left, choose Job Development > SQL Development.

  4. In the SQLConsole window, select the Spark engine and a job resource group.

  5. Execute the following statements to create a ciphertext table.

    1. Enable confidential computing, set the customer master keys, and create a database.

      -- Enable native computing.
      SET spark.adb.native.enabled=true;
      -- Configure resources.
      SET spark.driver.resourceSpec=medium;
      SET spark.executor.instances=2;
      SET spark.executor.resourceSpec=medium;
      SET spark.app.name=Spark SQL Encryption Test;
      -- Enable ciphertext read and write, and set the list of master keys, KMS Client, and CryptoFactory. After this is enabled, the engine supports both plaintext and ciphertext.
      SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
      SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
      SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
      -- Create a database.
      CREATE database IF NOT EXISTS adb_external_db;

      Parameter description:

      • spark.hadoop.parquet.encryption.key.list: The list of customer master keys (CMKs). Each CMK corresponds to a key ID. Separate a key ID from its CMK with a colon (:), and separate multiple pairs with commas (,). Format: <Key ID 1>:<Base64-encoded CMK 1>,<Key ID 2>:<Base64-encoded CMK 2>. For more information, see Introduction to keys. This example uses kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****.

        Warning: You can use a general-purpose tool, such as OpenSSL, to randomly generate a customer master key. A customer master key is the root credential for accessing encrypted data. If a key is lost, the data that it protects can no longer be accessed. Keep your customer master keys secure.

      • spark.hadoop.parquet.encryption.kms.client.class: The KMS client class name. Set this to io.glutenproject.encryption.InMemoryKMS.

      • spark.hadoop.parquet.crypto.factory.class: The CryptoFactory class name. Set this to org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory.
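As the warning above notes, you can generate master keys with a general-purpose tool. The following Python sketch is one illustrative way to do this with the standard library, assuming 128-bit keys (one of the AES key lengths that Parquet modular encryption supports); `openssl rand -base64 16` produces an equivalent single key. The key IDs mirror the example above.

```python
import base64
import secrets

# Generate a random 128-bit master key for each key ID and Base64-encode it.
# (Parquet modular encryption uses AES keys; 128-bit is one supported length.)
keys = {kid: base64.b64encode(secrets.token_bytes(16)).decode()
        for kid in ("kf", "kc1", "kc2")}

# Assemble the value for spark.hadoop.parquet.encryption.key.list:
# <Key ID 1>:<Base64-encoded CMK 1>,<Key ID 2>:<Base64-encoded CMK 2>,...
key_list = ",".join(f"{kid}:{cmk}" for kid, cmk in keys.items())
print(key_list)
```

Store the generated keys securely before using them; the SET statement above only passes them to the engine and does not persist them for you.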

    2. Create an external table named customer to store plaintext data. The LOCATION parameter specifies the OSS path of the plaintext data. In this example, oss://testBucketName/adb/Spark/customer is used.

      SET spark.adb.native.enabled=true;
      SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
      SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
      SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
      CREATE TABLE IF NOT EXISTS adb_external_db.customer 
       (
          c_custkey long,
          c_name       string,
          c_address    string,
          c_nationkey long,
          c_phone      string,
          c_acctbal    decimal(12, 2),
          c_mktsegment string,
          c_comment    string
      )
      USING parquet 
      LOCATION 'oss://testBucketName/adb/Spark/customer';
      Note
      • If a plaintext table already exists in the adb_external_db database, you can skip this step.

      • If the data is stored in another cloud database, create a corresponding external table. For the syntax to create an external table, see CREATE EXTERNAL TABLE.

    3. Create an external table named enc_customer to store ciphertext data. In this example, the enc_customer external table is stored in oss://testBucketName/adb/Spark/enc_customer.

      SET spark.adb.native.enabled=true;
      SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
      SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
      SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
      CREATE TABLE IF NOT EXISTS adb_external_db.enc_customer
      USING Parquet
      OPTIONS (
       'parquet.encryption.column.keys'='kc1:c_name;kc2:c_phone',
       'parquet.encryption.footer.key'='kf'
      )
      LOCATION 'oss://testBucketName/adb/Spark/enc_customer'
      AS
      SELECT *
      FROM adb_external_db.customer;

      Parameter description:

      • parquet.encryption.column.keys (required): Specifies the columns to encrypt with the CMK that corresponds to each key ID. One CMK can encrypt multiple columns. Separate a key ID from its column names with a colon (:), separate multiple columns with commas (,), and separate different CMKs with semicolons (;).

      • parquet.encryption.footer.key (required): The footer key, which encrypts information such as the metadata of the Parquet file.

        Note: The footer is a data structure at the end of a Parquet file. It typically stores file metadata, such as the version number, row group metadata, column metadata, and key metadata.

      Important

      The parquet.encryption.column.keys and parquet.encryption.footer.key parameters must be set at the same time. Otherwise, the file is not encrypted.
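The separator rules for parquet.encryption.column.keys can be sketched as follows. The column-to-key mapping here is illustrative only; choose your own assignment of columns to CMKs.

```python
# Illustrative assignment of columns to the CMKs (by key ID) that encrypt them.
column_keys = {
    "kc1": ["c_name", "c_address"],
    "kc2": ["c_phone"],
}

# Colon between a key ID and its columns, commas between columns,
# semicolons between different CMKs.
value = ";".join(f"{kid}:{','.join(cols)}" for kid, cols in column_keys.items())
print(value)  # kc1:c_name,c_address;kc2:c_phone
```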

    4. (Optional) Delete the external table customer.

      DROP TABLE IF EXISTS adb_external_db.customer;
      Important

      The DROP TABLE statement deletes only the metadata of the customer external table. To prevent plaintext data leakage, you must manually delete the corresponding data files from OSS.

  6. Create an external table named enc_customer_output and write the SQL computation results from the enc_customer table to the enc_customer_output external table. The data for the enc_customer_output external table is stored in oss://testBucketName/adb/Spark/enc_customer_output.

    SET spark.adb.native.enabled=true;
    SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
    SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
    SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
    CREATE TABLE IF NOT EXISTS adb_external_db.enc_customer_output
    USING Parquet
    OPTIONS (
     'parquet.encryption.column.keys'='kc1:c_name;kc2:c_phone',
     'parquet.encryption.footer.key'='kf'
    )
    LOCATION 'oss://testBucketName/adb/Spark/enc_customer_output'
    AS
    SELECT *
    FROM adb_external_db.enc_customer
    WHERE 
    c_custkey < 15;
  7. Decrypt the computation results.

    1. Create an external table named customer_output. Decrypt the data from the enc_customer_output table and write the decrypted data to the customer_output external table. The data for the customer_output external table is stored at oss://testBucketName/adb/Spark/customer_output.

      SET spark.adb.native.enabled=true;
      SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
      SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
      SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
      CREATE TABLE IF NOT EXISTS adb_external_db.customer_output
      USING Parquet
      LOCATION 'oss://testBucketName/adb/Spark/customer_output'
      AS
      SELECT *
      FROM adb_external_db.enc_customer_output;
    2. Query the data in the customer_output table.

      SELECT * FROM adb_external_db.customer_output;

Encrypt data using an encryption tool and create a ciphertext table

  1. Use an encryption tool to encrypt the locally stored plaintext data into a ciphertext dataset. For more information about the encryption tool, see Spark encryption tool.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.SparkConf
    
    // Initialize SparkSession and enter the encryption and decryption parameters.
    val conf = new SparkConf()
    .set("spark.hadoop.parquet.encryption.kms.client.class", "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
    .set("spark.hadoop.parquet.encryption.key.list", "kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****")
    .set("spark.hadoop.parquet.crypto.factory.class", "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
    
    val spark = SparkSession.builder().appName("SquareDataFrame").config(conf).getOrCreate()
    
    // Read the plaintext customer file.
    val df = spark.read.parquet("customer")
    // Encrypt the plaintext customer data. The c_name column is encrypted with kc1, and the footer is encrypted with kf. The resulting ciphertext dataset is enc_customer.
    df.write
    .option("parquet.encryption.column.keys" , "kc1:c_name")
    .option("parquet.encryption.footer.key" , "kf")
    // The local path where the ciphertext dataset is stored.
    .parquet("enc_customer")

    Parameter description:

    • spark.hadoop.parquet.encryption.kms.client.class (required): The KMS client class name.

      • When encrypting data locally, set this to org.apache.parquet.crypto.keytools.mocks.InMemoryKMS.

      • When creating a ciphertext table in the console, set this to io.glutenproject.encryption.InMemoryKMS.

    • spark.hadoop.parquet.encryption.key.list (required): The list of CMKs. Each CMK corresponds to a key ID. Separate a key ID from its CMK with a colon (:), and separate multiple pairs with commas (,). Format: <Key ID 1>:<Base64-encoded CMK 1>,<Key ID 2>:<Base64-encoded CMK 2>. For more information, see Introduction to keys. This example uses kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****.

      Warning: You can use a general-purpose tool, such as OpenSSL, to randomly generate a customer master key. A customer master key is the root credential for accessing encrypted data. If a key is lost, the data that it protects can no longer be accessed. Keep your customer master keys secure.

    • spark.hadoop.parquet.crypto.factory.class (required): The CryptoFactory class name. Set this to org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory.

    • parquet.encryption.column.keys (required): Specifies the columns to encrypt with the CMK that corresponds to each key ID. One CMK can encrypt multiple columns. Separate a key ID from its column names with a colon (:), separate multiple columns with commas (,), and separate different CMKs with semicolons (;).

    • parquet.encryption.footer.key (required): The footer key, which encrypts information such as the metadata of the Parquet file.

      Note: The footer is a data structure at the end of a Parquet file. It typically stores file metadata, such as the version number, row group metadata, column metadata, and key metadata.
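Before uploading, you can sanity-check that the tool actually produced encrypted output. Per the Parquet format, a file written in the default encrypted-footer mode ends with the magic bytes PARE instead of PAR1. The following Python sketch checks this; the local directory name is illustrative.

```python
import pathlib

def parquet_footer_mode(path):
    """Classify a Parquet file by its trailing 4-byte magic: PAR1 means a
    plaintext footer, PARE means an encrypted footer."""
    magic = pathlib.Path(path).read_bytes()[-4:]
    return {b"PAR1": "plaintext footer", b"PARE": "encrypted footer"}.get(magic, "unknown")

# Example: check every part file in the local enc_customer output directory.
for part in pathlib.Path("enc_customer").glob("*.parquet"):
    print(part.name, parquet_footer_mode(part))
```

Note that if you set parquet.encryption.plaintext.footer=true, column data is still encrypted but the file keeps the PAR1 magic, so this check only distinguishes footer modes.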

  2. Upload the ciphertext dataset enc_customer to OSS. This example uses oss://testBucketName/adb/Spark/enc_customer. For more information, see Simple upload.

  3. Create a ciphertext table.

    1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

    2. In the navigation pane on the left, choose Job Development > SQL Development.

    3. On the SQLConsole tab, select the Spark engine and a job resource group.

    4. Execute the following statements to create a ciphertext table.

      1. Enable native computing, set the customer master keys, and create a database.

        -- Enable native computing.
        SET spark.adb.native.enabled=true;
        -- Configure resources.
        SET spark.driver.resourceSpec=medium;
        SET spark.executor.instances=2;
        SET spark.executor.resourceSpec=medium;
        -- Enable ciphertext read and write, and set the list of master keys, KMS Client, and CryptoFactory. After this is enabled, the engine supports both plaintext and ciphertext.
        SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
        SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
        SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
        -- Create a database.
        CREATE DATABASE IF NOT EXISTS adb_external_db;
      2. Create the external table enc_customer. The LOCATION parameter specifies the OSS path of the enc_customer ciphertext dataset. This example uses oss://testBucketName/adb/Spark/enc_customer.

        SET spark.adb.native.enabled=true;
        SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
        SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
        SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
        CREATE TABLE IF NOT EXISTS adb_external_db.enc_customer
        USING parquet
        LOCATION 'oss://testBucketName/adb/Spark/enc_customer';
  4. Create an external table named enc_customer_output, and write the SQL query results from the enc_customer table to the enc_customer_output external table. The data for the enc_customer_output table is stored in oss://testBucketName/adb/Spark/enc_customer_output.

    SET spark.adb.native.enabled=true;
    SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
    SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
    SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
    CREATE TABLE IF NOT EXISTS adb_external_db.enc_customer_output
    USING Parquet
    OPTIONS (
     'parquet.encryption.column.keys'='kc1:c_name;kc2:c_phone',
     'parquet.encryption.footer.key'='kf'
    )
    LOCATION 'oss://testBucketName/adb/Spark/enc_customer_output'
    AS
    SELECT *
    FROM adb_external_db.enc_customer
    WHERE 
    c_custkey < 15;
  5. Download and decrypt the ciphertext results.

    1. Download the ciphertext computation results from the OSS path oss://testBucketName/adb/Spark/enc_customer_output to your local device. For more information, see Download files.

    2. Decrypt the ciphertext result dataset and save the decrypted file as customer_output.

      // Decrypt the ciphertext dataset.
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.SparkConf

      val conf = new SparkConf()
      .set("spark.hadoop.parquet.encryption.kms.client.class", "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
      .set("spark.hadoop.parquet.encryption.key.list", "kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****")
      .set("spark.hadoop.parquet.crypto.factory.class", "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
      val spark = SparkSession.builder().appName("SquareDataFrame").config(conf).getOrCreate()
      val df2 = spark.read.parquet("enc_customer_output")
      // Write the decrypted data to the local customer_output directory.
      df2.write
      .parquet("customer_output")