The High-performance Edition of the Apache Spark-based confidential compute engine on AnalyticDB for MySQL builds on the capabilities of the Basic Edition. It supports the Parquet modular encryption feature and is compatible with compute engines such as the community editions of Spark, Hadoop, and Hive. This edition ensures secure data transmission and storage and improves data processing efficiency. This topic describes how to use the High-performance Edition of the Apache Spark-based confidential compute engine to encrypt data and perform SQL queries on ciphertext tables.
Prerequisites
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster is created.
An Object Storage Service (OSS) bucket is created in the same region as the AnalyticDB for MySQL cluster.
A database account is created for the AnalyticDB for MySQL cluster.
If you use an Alibaba Cloud account, you need only create a privileged account.
If you use a Resource Access Management (RAM) user, you must create a privileged account and a standard account and associate the standard account with the RAM user.
The AliyunADBFullAccess policy, the AliyunADBSparkProcessingDataRole role, and read and write permissions on AnalyticDB for MySQL databases and tables are granted to the RAM user. For more information, see Account authorization.
Data preparation
The data file that you want to encrypt must be in the Parquet format. To complete the following steps, you can download the Spark confidential computing sample data.
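Before uploading, you can sanity-check that a local file is actually in the Parquet format by inspecting its 4-byte magic markers: a Parquet file begins and ends with the bytes PAR1. This is an illustrative sketch, not part of the official tooling, and the file name used is a placeholder.

```python
def is_parquet_file(path):
    """Return True if the file carries the Parquet magic bytes at both ends."""
    with open(path, "rb") as f:
        header = f.read(4)
        f.seek(-4, 2)  # seek to the last 4 bytes of the file
        footer = f.read(4)
    return header == b"PAR1" and footer == b"PAR1"

# Example (placeholder path): is_parquet_file("customer/part-00000.parquet")
```

Note that this only checks the file framing, not that the internal metadata is valid; a full validation would require a Parquet reader.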
Procedure
You can encrypt plaintext data in the AnalyticDB for MySQL console or using an encryption tool. If the data that you want to encrypt is stored on an on-premises device, you can use an encryption tool to encrypt the data. If the data is stored in a cloud database, you can encrypt the data in the AnalyticDB for MySQL console. The encryption methods involve different operations:
Encrypt data in the console: Upload plaintext data to OSS, and then encrypt it.
Encrypt data using an encryption tool: Encrypt data locally, and then upload the ciphertext to OSS.
Encrypt data and create a ciphertext table in the console
Upload the plaintext data from the Data preparation section to an OSS bucket. This example uses oss://testBucketName/adb/Spark/customer. For more information, see Simple upload.

Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the navigation pane on the left, choose .
In the SQLConsole window, select the Spark engine and a job resource group.
Execute the following statements to create a ciphertext table.
Enable confidential computing, set the customer master keys, and create a database.
-- Enable native computing.
SET spark.adb.native.enabled=true;
-- Configure resources.
SET spark.driver.resourceSpec=medium;
SET spark.executor.instances=2;
SET spark.executor.resourceSpec=medium;
SET spark.app.name=Spark SQL Encryption Test;
-- Enable ciphertext read and write, and set the master key list, the KMS client, and the CryptoFactory. After this is enabled, the engine supports both plaintext and ciphertext.
SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
-- Create a database.
CREATE DATABASE IF NOT EXISTS adb_external_db;

Parameter description:
Parameter: spark.hadoop.parquet.encryption.key.list
Description: The list of customer master keys (CMKs). Each CMK corresponds to a key ID. Separate multiple CMKs with commas (,) and separate each key ID from its CMK with a colon (:). Format: <Key ID 1>:<Base64-encoded CMK 1>,<Key ID 2>:<Base64-encoded CMK 2>. For more information, see Introduction to keys. This example uses kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****.
Warning: You can use a general-purpose tool, such as OpenSSL, to randomly generate a customer master key. A customer master key is the root credential for accessing encrypted data. If the key is lost, you can no longer access the existing data. Keep your customer master key secure.
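As the warning notes, any general-purpose tool can generate the master key. A minimal sketch of that step in Python, assuming a 128-bit key is acceptable; the helper names make_cmk and build_key_list are hypothetical, not part of any SDK:

```python
import base64
import secrets

def make_cmk(num_bytes=16):
    """Generate a random master key and Base64-encode it,
    roughly equivalent to `openssl rand -base64 16`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")

def build_key_list(keys):
    """Render a dict of {key ID: Base64 CMK} in the documented
    <id1>:<cmk1>,<id2>:<cmk2> format."""
    return ",".join(f"{kid}:{cmk}" for kid, cmk in keys.items())

# Key IDs follow the example in this topic (kf for the footer, kc1/kc2 for columns).
key_list = build_key_list({"kf": make_cmk(), "kc1": make_cmk(), "kc2": make_cmk()})
```

The resulting string can be passed directly as the value of spark.hadoop.parquet.encryption.key.list. Store the generated keys securely; losing them makes the encrypted data unreadable.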
Parameter: spark.hadoop.parquet.encryption.kms.client.class
Description: The KMS client class name. Set this parameter to io.glutenproject.encryption.InMemoryKMS.

Parameter: spark.hadoop.parquet.crypto.factory.class
Description: The CryptoFactory class name. Set this parameter to org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory.

Create an external table named customer to store the plaintext data. The LOCATION parameter specifies the OSS path of the plaintext data. In this example, oss://testBucketName/adb/Spark/customer is used.

SET spark.adb.native.enabled=true;
SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
CREATE TABLE IF NOT EXISTS adb_external_db.customer (
    c_custkey long,
    c_name string,
    c_address string,
    c_nationkey long,
    c_phone string,
    c_acctbal decimal(12, 2),
    c_mktsegment string,
    c_comment string
) USING parquet
LOCATION 'oss://testBucketName/adb/Spark/customer';

Note: If a plaintext table already exists in the adb_external_db database, you can skip this step. If the data is stored in another cloud database, create a corresponding external table instead. For the syntax, see CREATE EXTERNAL TABLE.
Create an external table named enc_customer to store the ciphertext data. In this example, the enc_customer external table is stored in oss://testBucketName/adb/Spark/enc_customer.

SET spark.adb.native.enabled=true;
SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
CREATE TABLE IF NOT EXISTS adb_external_db.enc_customer
USING parquet
OPTIONS (
    'parquet.encryption.column.keys'='kc1:c_name;kc2:c_phone',
    'parquet.encryption.footer.key'='kf'
)
LOCATION 'oss://testBucketName/adb/Spark/enc_customer'
AS SELECT * FROM adb_external_db.customer;

Parameter description:
Parameter: parquet.encryption.column.keys (required)
Description: Encrypts columns with the CMK that corresponds to the specified key ID. One CMK can encrypt multiple columns. Separate a key ID from its column names with a colon (:), separate multiple columns with commas (,), and separate different CMKs with semicolons (;).

Parameter: parquet.encryption.footer.key (required)
Description: The footer key, which is used to encrypt information such as the metadata of the Parquet file.
Note: The footer is a data structure at the end of a Parquet file. It typically stores file metadata, such as the version number, row group metadata, column metadata, and key metadata.

Important: The parquet.encryption.column.keys and parquet.encryption.footer.key parameters must be set at the same time. Otherwise, the file is not encrypted.
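The separator rules for parquet.encryption.column.keys can be made concrete with a small parser. This is an illustrative sketch only; the function name parse_column_keys is hypothetical and mirrors, rather than comes from, the Parquet tooling:

```python
def parse_column_keys(spec):
    """Parse a parquet.encryption.column.keys string into {key ID: [columns]}.

    Semicolons separate CMK entries, a colon separates the key ID from its
    columns, and commas separate multiple columns under one key ID.
    """
    mapping = {}
    for entry in spec.split(";"):
        kid, cols = entry.split(":", 1)
        mapping[kid] = cols.split(",")
    return mapping
```

For example, "kc1:c_name,c_address;kc2:c_phone" means kc1 encrypts both c_name and c_address while kc2 encrypts c_phone.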
(Optional) Delete the customer external table.

DROP TABLE IF EXISTS adb_external_db.customer;

Important: The DROP TABLE statement deletes only the metadata of the customer external table. To prevent plaintext data leakage, you must manually delete the corresponding data files from OSS.
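A hedged sketch of that manual cleanup, assuming the oss2 Python SDK (or any client exposing list_objects and delete_object in this shape) is available; the helper name delete_plaintext_objects, the bucket object, and the prefix are all placeholders you would supply:

```python
def delete_plaintext_objects(bucket, prefix):
    """Delete every object under the given prefix and return the deleted keys.

    Assumes bucket.list_objects(prefix=...) returns a result whose object_list
    items carry a .key attribute, as in the oss2 SDK. Pagination is omitted
    for brevity; large prefixes would need repeated listing.
    """
    deleted = []
    for obj in bucket.list_objects(prefix=prefix).object_list:
        bucket.delete_object(obj.key)
        deleted.append(obj.key)
    return deleted
```

With oss2, bucket would be an oss2.Bucket instance and prefix something like the path used for the plaintext data in this example. You can also delete the files in the OSS console.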
Create an external table named enc_customer_output and write the SQL computation results from the enc_customer table to it. The data for the enc_customer_output external table is stored in oss://testBucketName/adb/Spark/enc_customer_output.

SET spark.adb.native.enabled=true;
SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
CREATE TABLE IF NOT EXISTS adb_external_db.enc_customer_output
USING parquet
OPTIONS (
    'parquet.encryption.column.keys'='kc1:c_name;kc2:c_phone',
    'parquet.encryption.footer.key'='kf'
)
LOCATION 'oss://testBucketName/adb/Spark/enc_customer_output'
AS SELECT * FROM adb_external_db.enc_customer WHERE c_custkey < 15;

Decrypt the computation results.
Create an external table named customer_output. Decrypt the data from the enc_customer_output table and write the decrypted data to the customer_output external table. The data for the customer_output external table is stored at oss://testBucketName/adb/Spark/customer_output.

SET spark.adb.native.enabled=true;
SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
CREATE TABLE IF NOT EXISTS adb_external_db.customer_output
USING parquet
LOCATION 'oss://testBucketName/adb/Spark/customer_output'
AS SELECT * FROM adb_external_db.enc_customer_output;

Query the data in the customer_output table.

SELECT * FROM adb_external_db.customer_output;
Encrypt data using an encryption tool and create a ciphertext table
Use an encryption tool to encrypt the locally stored plaintext data into a ciphertext dataset. For more information about the encryption tool, see Spark encryption tool.
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkConf

// Initialize the SparkSession with the encryption and decryption parameters.
val conf = new SparkConf()
  .set("spark.hadoop.parquet.encryption.kms.client.class", "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
  .set("spark.hadoop.parquet.encryption.key.list", "kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****")
  .set("spark.hadoop.parquet.crypto.factory.class", "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
val spark = SparkSession.builder().appName("SquareDataFrame").config(conf).getOrCreate()

// Read the plaintext customer file.
val df = spark.read.parquet("customer")

// Encrypt the plaintext customer file. The c_name column is encrypted with kc1,
// and the footer is encrypted with kf. The resulting ciphertext dataset is enc_customer.
df.write
  .option("parquet.encryption.column.keys", "kc1:c_name")
  .option("parquet.encryption.footer.key", "kf")
  // The local path where the ciphertext dataset is stored.
  .parquet("enc_customer")

Parameter description:
Parameter: spark.hadoop.parquet.encryption.kms.client.class (required)
Description: The KMS client class name. When you encrypt data locally, set this parameter to org.apache.parquet.crypto.keytools.mocks.InMemoryKMS. When you create a ciphertext table in the console, set this parameter to io.glutenproject.encryption.InMemoryKMS.

Parameter: spark.hadoop.parquet.encryption.key.list (required)
Description: The list of CMKs. Each CMK corresponds to a key ID. Separate multiple CMKs with commas (,) and separate each key ID from its CMK with a colon (:). Format: <Key ID 1>:<Base64-encoded CMK 1>,<Key ID 2>:<Base64-encoded CMK 2>. For more information, see Introduction to keys. This example uses kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****.
Warning: You can use a general-purpose tool, such as OpenSSL, to randomly generate a customer master key. A customer master key is the root credential for accessing encrypted data. If the key is lost, you can no longer access the existing data. Keep your customer master key secure.

Parameter: spark.hadoop.parquet.crypto.factory.class (required)
Description: The CryptoFactory class name. Set this parameter to org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory.

Parameter: parquet.encryption.column.keys (required)
Description: Encrypts columns with the CMK that corresponds to the specified key ID. One CMK can encrypt multiple columns. Separate a key ID from its column names with a colon (:), separate multiple columns with commas (,), and separate different CMKs with semicolons (;).

Parameter: parquet.encryption.footer.key (required)
Description: The footer key, which is used to encrypt information such as the metadata of the Parquet file.
Note: The footer is a data structure at the end of a Parquet file. It typically stores file metadata, such as the version number, row group metadata, column metadata, and key metadata.
Upload the ciphertext dataset enc_customer.parquet to OSS. This example uses oss://testBucketName/adb/Spark/enc_customer.parquet. For more information, see Simple upload.

Create a ciphertext table.
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the navigation pane on the left, choose .
On the SQLConsole tab, select the Spark engine and a job resource group.
Execute the following statements to create a ciphertext table.
Enable native computing and create a database.

-- Enable native computing.
SET spark.adb.native.enabled=true;
-- Configure resources.
SET spark.driver.resourceSpec=medium;
SET spark.executor.instances=2;
SET spark.executor.resourceSpec=medium;
-- Enable ciphertext read and write, and set the master key list, the KMS client, and the CryptoFactory. After this is enabled, the engine supports both plaintext and ciphertext.
SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
-- Create a database.
CREATE DATABASE IF NOT EXISTS adb_external_db;

Create the external table enc_customer. The LOCATION parameter specifies the OSS path of the enc_customer ciphertext dataset. This example uses oss://testBucketName/adb/Spark/enc_customer.parquet.

SET spark.adb.native.enabled=true;
SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
CREATE TABLE IF NOT EXISTS adb_external_db.enc_customer
USING parquet
LOCATION 'oss://testBucketName/adb/Spark/enc_customer.parquet';
Create an external table named enc_customer_output, and write the SQL query results from the enc_customer table to it. The data for the enc_customer_output table is stored in oss://testBucketName/adb/Spark/enc_customer_output.

SET spark.adb.native.enabled=true;
SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
CREATE TABLE IF NOT EXISTS adb_external_db.enc_customer_output
USING parquet
OPTIONS (
    'parquet.encryption.column.keys'='kc1:c_name;kc2:c_phone',
    'parquet.encryption.footer.key'='kf'
)
LOCATION 'oss://testBucketName/adb/Spark/enc_customer_output'
AS SELECT * FROM adb_external_db.enc_customer WHERE c_custkey < 15;

Download and decrypt the ciphertext results.
Download the ciphertext computation results from the OSS path oss://testBucketName/adb/Spark/enc_customer_output to your local device. For more information, see Download files.

Decrypt the ciphertext result dataset and save the decrypted data as customer_output.

// Decrypt the ciphertext dataset.
val conf = new SparkConf()
  .set("spark.hadoop.parquet.encryption.kms.client.class", "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
  .set("spark.hadoop.parquet.encryption.key.list", "kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****")
  .set("spark.hadoop.parquet.crypto.factory.class", "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
val spark = SparkSession.builder().appName("SquareDataFrame").config(conf).getOrCreate()
val df2 = spark.read.parquet("enc_customer_output")
// Save the decrypted data to the local path customer_output.
df2.write
  .parquet("customer_output")