All Products
Search
Document Center

AnalyticDB:Performance test of the High-performance Edition Spark confidential engine

Last Updated:Jun 21, 2024

This topic describes the performance test processes and results of the High-performance Edition Spark confidential engine and AnalyticDB for MySQL Spark. In scenarios that involve large amounts of data, the performance of the High-performance Edition Spark confidential engine is 1.9 times the performance of AnalyticDB for MySQL Spark 3.2.0.

Important

The test follows only parts of the TPC-H benchmark test specifications. The results of the test are not equivalent to and cannot be compared with the results that are obtained from tests in which the TPC-H benchmark test specifications are completely followed.

Prerequisites

  • An AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster is created in the same region as an Object Storage Service (OSS) bucket.

Test environment

Configuration item

Description

Remarks

Region and zone

Hangzhou Zone K

None

Network type

Virtual Private Cloud (VPC)

None

Edition

Data Lakehouse Edition (V3.0)

None

Spark resource specifications

2xlarge (16 cores and 64 GB of memory, 8 hosts)

None

Execution engine

  • AnalyticDB for MySQL Spark 3.2.0

  • High-performance Edition Spark confidential engine

Compare the performance between AnalyticDB for MySQL Spark and the High-performance Edition Spark confidential engine.

Test data

Plaintext data: 1 TB

Ciphertext data: 1 TB

The plaintext data and ciphertext data are in the Parquet format.

Storage

OSS bucket

None

Test data

In this example, a TPC-H test dataset is used.

  1. Download the DBGEN tool that is used to generate TPC-H test data from the TPC official website.

  2. Construct test data. For more information, see Construct data.

    ./dbgen -s $scale -C $chunks -S $i -f

    A TPC-H test dataset contains eight tables: customer, lineitem, nation, orders, part, partsupp, region, and supplier.

  3. Convert the format of the tables into the Parquet format. In this example, the customer table is used.

      val df = spark.read
      .format("csv")
      .option("header", "false")
      .option("delimiter", "|")
      .schema(StructType {
        StructField("c_custkey", IntegerType, true) ::
          StructField("c_name", StringType, true) ::
          StructField("c_address", StringType, true) ::
          StructField("c_nationkey", IntegerType, true) ::
          StructField("c_phone", StringType, true) ::
          StructField("c_acctbal", FloatType, true) ::
          StructField("c_mktsegment", StringType, true) ::
          StructField("c_comment", StringType, true) ::
          Nil
      })
      .load("customer.tbl")
      df.write.parquet("parquet/customer")
    Note

    The preceding sample code provides an example on how to convert the format of the customer table into the Parquet format. You can convert the format of other tables in the same manner.

  4. Upload the TPC-H test dataset to an OSS bucket. In this example, oss://testBucketName/adb/Spark is used.

Test case 1: Use AnalyticDB for MySQL Spark 3.2.0 to process plaintext data

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development.

  3. On the SQLConsole tab, select the Spark engine and the job resource group.

  4. Create a database.

    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    CREATE DATABASE adb_plain_db;
  5. Create an OSS external table to store plaintext data. The LOCATION parameter specifies the OSS path of the TPC-H test dataset. In this example, the customer table is used, and the OSS path is oss://testBucketName/adb/Spark/customer.

    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    USE adb_plain_db; 
    CREATE TABLE IF NOT EXISTS customer 
     (
     c_custkey long,
     c_name string,
     c_address string,
     c_nationkey long,
     c_phone string,
     c_acctbal decimal(12, 2),
     c_mktsegment string,
     c_comment string
    )
    USING parquet 
    LOCATION 'oss://testBucketName/adb/Spark/customer';
  6. Perform SQL queries and record the total execution duration.

    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    SELECT
     l_returnflag,
     l_linestatus,
     sum(l_quantity) AS sum_qty,
     sum(l_extendedprice) AS sum_base_price,
     sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
     sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
     avg(l_quantity) AS avg_qty,
     avg(l_extendedprice) AS avg_price,
     avg(l_discount) AS avg_disc,
     count(*) AS count_order
    FROM
     lineitem
    WHERE
     l_shipdate <= date'1998-09-02' - interval 1 day
    GROUP BY
     l_returnflag,
     l_linestatus
    ORDER BY
     l_returnflag,
     l_linestatus;
    
    SELECT
     s_acctbal,
     s_name,
     n_name,
     p_partkey,
     p_mfgr,
     s_address,
     s_phone,
     s_comment
    FROM
     part,
     supplier,
     partsupp,
     nation,
     region
    WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND p_size = 15
     AND p_type LIKE '%BRASS'
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE'
     AND ps_supplycost = (
     SELECT
     min(ps_supplycost)
     FROM
     partsupp,
     supplier,
     nation,
     region
     WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE')
    ORDER BY
     s_acctbal DESC,
     n_name,
     s_name,
     p_partkey
    LIMIT 100;
    
    
    SELECT
     l_orderkey,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     o_orderdate,
     o_shippriority
    FROM
     customer,
     orders,
     lineitem
    WHERE
     c_mktsegment = 'BUILDING'
     AND c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate < date'1995-03-15'
     AND l_shipdate > date'1995-03-15'
    GROUP BY
     l_orderkey,
     o_orderdate,
     o_shippriority
    ORDER BY
     revenue DESC,
     o_orderdate
    LIMIT 10;
    
    SELECT
     o_orderpriority,
     count(*) AS order_count
    FROM
     orders
    WHERE
     o_orderdate >= date'1993-07-01'
     AND o_orderdate < date'1993-07-01' + interval 3 month
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem
     WHERE
     l_orderkey = o_orderkey
     AND l_commitdate < l_receiptdate)
    GROUP BY
     o_orderpriority
    ORDER BY
     o_orderpriority;
    
    SELECT
     n_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     customer,
     orders,
     lineitem,
     supplier,
     nation,
     region
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND l_suppkey = s_suppkey
     AND c_nationkey = s_nationkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'ASIA'
     AND o_orderdate >= date'1994-01-01'
     AND o_orderdate < date'1994-01-01' + interval 1 year
    GROUP BY
     n_name
    ORDER BY
     revenue DESC;
    
    SELECT
     sum(l_extendedprice * l_discount) AS revenue
    FROM
     lineitem
    WHERE
     l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year
     AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
     AND l_quantity < 24;
    
    SELECT
     supp_nation,
     cust_nation,
     l_year,
     sum(volume) AS revenue
    FROM (
     SELECT
     n1.n_name AS supp_nation,
     n2.n_name AS cust_nation,
     extract(year FROM l_shipdate) AS l_year,
     l_extendedprice * (1 - l_discount) AS volume
     FROM
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2
     WHERE
     s_suppkey = l_suppkey
     AND o_orderkey = l_orderkey
     AND c_custkey = o_custkey
     AND s_nationkey = n1.n_nationkey
     AND c_nationkey = n2.n_nationkey
     AND ((n1.n_name = 'FRANCE'
     AND n2.n_name = 'GERMANY')
     OR (n1.n_name = 'GERMANY'
     AND n2.n_name = 'FRANCE'))
     AND l_shipdate BETWEEN date'1995-01-01' AND date'1996-12-31') AS shipping
    GROUP BY
     supp_nation,
     cust_nation,
     l_year
    ORDER BY
     supp_nation,
     cust_nation,
     l_year;
    
    SELECT
     o_year,
     sum(
     CASE WHEN nation = 'BRAZIL' THEN
     volume
     ELSE
     0
     END) / sum(volume) AS mkt_share
    FROM (
     SELECT
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) AS volume,
     n2.n_name AS nation
     FROM
     part,
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2,
     region
     WHERE
     p_partkey = l_partkey
     AND s_suppkey = l_suppkey
     AND l_orderkey = o_orderkey
     AND o_custkey = c_custkey
     AND c_nationkey = n1.n_nationkey
     AND n1.n_regionkey = r_regionkey
     AND r_name = 'AMERICA'
     AND s_nationkey = n2.n_nationkey
     AND o_orderdate BETWEEN date'1995-01-01' AND date'1996-12-31'
     AND p_type = 'ECONOMY ANODIZED STEEL') AS all_nations
    GROUP BY
     o_year
    ORDER BY
     o_year;
    
    SELECT
     nation,
     o_year,
     sum(amount) AS sum_profit
    FROM (
     SELECT
     n_name AS nation,
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
     FROM
     part,
     supplier,
     lineitem,
     partsupp,
     orders,
     nation
     WHERE
     s_suppkey = l_suppkey
     AND ps_suppkey = l_suppkey
     AND ps_partkey = l_partkey
     AND p_partkey = l_partkey
     AND o_orderkey = l_orderkey
     AND s_nationkey = n_nationkey
     AND p_name LIKE '%green%') AS profit
    GROUP BY
     nation,
     o_year
    ORDER BY
     nation,
     o_year DESC;
    
    SELECT
     c_custkey,
     c_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     c_acctbal,
     n_name,
     c_address,
     c_phone,
     c_comment
    FROM
     customer,
     orders,
     lineitem,
     nation
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate >= date'1993-10-01' AND o_orderdate < date'1993-10-01' + interval 3 month
     AND l_returnflag = 'R'
     AND c_nationkey = n_nationkey
    GROUP BY
     c_custkey,
     c_name,
     c_acctbal,
     c_phone,
     n_name,
     c_address,
     c_comment
    ORDER BY
     revenue DESC
    LIMIT 20;
    
    SELECT
     ps_partkey,
     sum(ps_supplycost * ps_availqty) AS value
    FROM
     partsupp,
     supplier,
     nation
    WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY'
    GROUP BY
     ps_partkey
    HAVING
     sum(ps_supplycost * ps_availqty) > (
     SELECT
     sum(ps_supplycost * ps_availqty) * 0.0001000000
     FROM
     partsupp,
     supplier,
     nation
     WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY')
    ORDER BY
     value DESC;
    
    SELECT
     l_shipmode,
     sum(
     CASE WHEN o_orderpriority = '1-URGENT'
     OR o_orderpriority = '2-HIGH' THEN
     1
     ELSE
     0
     END) AS high_line_count,
     sum(
     CASE WHEN o_orderpriority <> '1-URGENT'
     AND o_orderpriority <> '2-HIGH' THEN
     1
     ELSE
     0
     END) AS low_line_count
    FROM
     orders,
     lineitem
    WHERE
     o_orderkey = l_orderkey
     AND l_shipmode IN ('MAIL', 'SHIP')
     AND l_commitdate < l_receiptdate
     AND l_shipdate < l_commitdate
     AND l_receiptdate >= date'1994-01-01' AND l_receiptdate < date'1994-01-01' + interval 1 year
    GROUP BY
     l_shipmode
    ORDER BY
     l_shipmode;
    
    SELECT
     c_count,
     count(*) AS custdist
    FROM (
     SELECT
     c_custkey,
     count(o_orderkey)
     FROM
     customer
     LEFT OUTER JOIN orders ON c_custkey = o_custkey
     AND o_comment NOT LIKE '%special%requests%'
    GROUP BY
     c_custkey) AS c_orders (c_custkey,
     c_count)
    GROUP BY
     c_count
    ORDER BY
     custdist DESC,
     c_count DESC;
    
    SELECT
     100.00 * sum(
     CASE WHEN p_type LIKE 'PROMO%' THEN
     l_extendedprice * (1 - l_discount)
     ELSE
     0
     END) / sum(l_extendedprice * (1 - l_discount)) AS promo_revenue
    FROM
     lineitem,
     part
    WHERE
     l_partkey = p_partkey
     AND l_shipdate >= date'1995-09-01'
     AND l_shipdate < date'1995-09-01' + interval 1 month;
    
    
    SELECT
     s_suppkey,
     s_name,
     s_address,
     s_phone,
     total_revenue
    FROM
     supplier,
     (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue0
    WHERE
     s_suppkey = supplier_no
     AND total_revenue = (
     SELECT
     max(total_revenue)
     FROM (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue1)
    ORDER BY
     s_suppkey;
    
    SELECT
     p_brand,
     p_type,
     p_size,
     count(DISTINCT ps_suppkey) AS supplier_cnt
    FROM
     partsupp,
     part
    WHERE
     p_partkey = ps_partkey
     AND p_brand <> 'Brand#45'
     AND p_type NOT LIKE 'MEDIUM POLISHED%'
     AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
     AND ps_suppkey NOT IN (
     SELECT
     s_suppkey
     FROM
     supplier
     WHERE
     s_comment LIKE '%Customer%Complaints%')
    GROUP BY
     p_brand,
     p_type,
     p_size
    ORDER BY
     supplier_cnt DESC,
     p_brand,
     p_type,
     p_size;
    
    SELECT
     sum(l_extendedprice) / 7.0 AS avg_yearly
    FROM
     lineitem,
     part
    WHERE
     p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container = 'MED BOX'
     AND l_quantity < (
     SELECT
     0.2 * avg(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = p_partkey);
    
    SELECT
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice,
     sum(l_quantity)
    FROM
     customer,
     orders,
     lineitem
    WHERE
     o_orderkey IN (
     SELECT
     l_orderkey
     FROM
     lineitem
     GROUP BY
     l_orderkey
     HAVING
     sum(l_quantity) > 300)
     AND c_custkey = o_custkey
     AND o_orderkey = l_orderkey
    GROUP BY
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice
    ORDER BY
     o_totalprice DESC,
     o_orderdate
    LIMIT 100;
    
    SELECT
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     lineitem,
     part
    WHERE (p_partkey = l_partkey
     AND p_brand = 'Brand#12'
     AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
     AND l_quantity >= 1
     AND l_quantity <= 1 + 10
     AND p_size BETWEEN 1 AND 5
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
     AND l_quantity >= 10
     AND l_quantity <= 10 + 10
     AND p_size BETWEEN 1 AND 10
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#34'
     AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
     AND l_quantity >= 20
     AND l_quantity <= 20 + 10
     AND p_size BETWEEN 1 AND 15
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON');
    
    SELECT
     s_name,
     s_address
    FROM
     supplier,
     nation
    WHERE
     s_suppkey IN (
     SELECT
     ps_suppkey
     FROM
     partsupp
     WHERE
     ps_partkey IN (
     SELECT
     p_partkey
     FROM
     part
     WHERE
     p_name LIKE 'forest%')
     AND ps_availqty > (
     SELECT
     0.5 * sum(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = ps_partkey
     AND l_suppkey = ps_suppkey
     AND l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year))
     AND s_nationkey = n_nationkey
     AND n_name = 'CANADA'
     ORDER BY
     s_name;
    
    SELECT
     s_name,
     count(*) AS numwait
    FROM
     supplier,
     lineitem l1,
     orders,
     nation
    WHERE
     s_suppkey = l1.l_suppkey
     AND o_orderkey = l1.l_orderkey
     AND o_orderstatus = 'F'
     AND l1.l_receiptdate > l1.l_commitdate
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem l2
     WHERE
     l2.l_orderkey = l1.l_orderkey
     AND l2.l_suppkey <> l1.l_suppkey)
     AND NOT EXISTS (
     SELECT
     *
     FROM
     lineitem l3
     WHERE
     l3.l_orderkey = l1.l_orderkey
     AND l3.l_suppkey <> l1.l_suppkey
     AND l3.l_receiptdate > l3.l_commitdate)
     AND s_nationkey = n_nationkey
     AND n_name = 'SAUDI ARABIA'
    GROUP BY
     s_name
    ORDER BY
     numwait DESC,
     s_name
    LIMIT 100;
    
    SELECT
     cntrycode,
     count(*) AS numcust,
     sum(c_acctbal) AS totacctbal
    FROM (
     SELECT
     substring(c_phone FROM 1 FOR 2) AS cntrycode,
     c_acctbal
     FROM
     customer
     WHERE
     substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17')
     AND c_acctbal > (
     SELECT
     avg(c_acctbal)
     FROM
     customer
     WHERE
     c_acctbal > 0.00
     AND substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17'))
     AND NOT EXISTS (
     SELECT
     *
     FROM
     orders
     WHERE
     o_custkey = c_custkey)) AS custsale
    GROUP BY
     cntrycode
    ORDER BY
     cntrycode;
    

Test case 2: Use the High-performance Edition Spark confidential engine to process plaintext data

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development.

  3. On the SQLConsole tab, select the Spark engine and the job resource group.

  4. Enable the Spark confidential engine and create a database.

    -- Enable native computing.
    SET spark.adb.native.enabled=true;
    -- Configure Spark resource specifications.
    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    -- Create a database.
    CREATE DATABASE adb_external_db;
  5. Create an OSS external table to store plaintext data. The LOCATION parameter specifies the OSS path of the TPC-H test dataset. In this example, the customer table is used, and the OSS path is oss://testBucketName/adb/Spark/customer.

    SET spark.adb.native.enabled=true;
    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    USE adb_external_db;
    CREATE TABLE IF NOT EXISTS customer 
     (
     c_custkey long,
     c_name string,
     c_address string,
     c_nationkey long,
     c_phone string,
     c_acctbal decimal(12, 2),
     c_mktsegment string,
     c_comment string
    )
    USING parquet 
    LOCATION 'oss://testBucketName/adb/Spark/customer';
  6. Perform SQL queries and record the total execution duration.

    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    SELECT
     l_returnflag,
     l_linestatus,
     sum(l_quantity) AS sum_qty,
     sum(l_extendedprice) AS sum_base_price,
     sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
     sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
     avg(l_quantity) AS avg_qty,
     avg(l_extendedprice) AS avg_price,
     avg(l_discount) AS avg_disc,
     count(*) AS count_order
    FROM
     lineitem
    WHERE
     l_shipdate <= date'1998-09-02' - interval 1 day
    GROUP BY
     l_returnflag,
     l_linestatus
    ORDER BY
     l_returnflag,
     l_linestatus;
    
    SELECT
     s_acctbal,
     s_name,
     n_name,
     p_partkey,
     p_mfgr,
     s_address,
     s_phone,
     s_comment
    FROM
     part,
     supplier,
     partsupp,
     nation,
     region
    WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND p_size = 15
     AND p_type LIKE '%BRASS'
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE'
     AND ps_supplycost = (
     SELECT
     min(ps_supplycost)
     FROM
     partsupp,
     supplier,
     nation,
     region
     WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE')
    ORDER BY
     s_acctbal DESC,
     n_name,
     s_name,
     p_partkey
    LIMIT 100;
    
    
    SELECT
     l_orderkey,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     o_orderdate,
     o_shippriority
    FROM
     customer,
     orders,
     lineitem
    WHERE
     c_mktsegment = 'BUILDING'
     AND c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate < date'1995-03-15'
     AND l_shipdate > date'1995-03-15'
    GROUP BY
     l_orderkey,
     o_orderdate,
     o_shippriority
    ORDER BY
     revenue DESC,
     o_orderdate
    LIMIT 10;
    
    SELECT
     o_orderpriority,
     count(*) AS order_count
    FROM
     orders
    WHERE
     o_orderdate >= date'1993-07-01'
     AND o_orderdate < date'1993-07-01' + interval 3 month
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem
     WHERE
     l_orderkey = o_orderkey
     AND l_commitdate < l_receiptdate)
    GROUP BY
     o_orderpriority
    ORDER BY
     o_orderpriority;
    
    SELECT
     n_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     customer,
     orders,
     lineitem,
     supplier,
     nation,
     region
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND l_suppkey = s_suppkey
     AND c_nationkey = s_nationkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'ASIA'
     AND o_orderdate >= date'1994-01-01'
     AND o_orderdate < date'1994-01-01' + interval 1 year
    GROUP BY
     n_name
    ORDER BY
     revenue DESC;
    
    SELECT
     sum(l_extendedprice * l_discount) AS revenue
    FROM
     lineitem
    WHERE
     l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year
     AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
     AND l_quantity < 24;
    
    SELECT
     supp_nation,
     cust_nation,
     l_year,
     sum(volume) AS revenue
    FROM (
     SELECT
     n1.n_name AS supp_nation,
     n2.n_name AS cust_nation,
     extract(year FROM l_shipdate) AS l_year,
     l_extendedprice * (1 - l_discount) AS volume
     FROM
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2
     WHERE
     s_suppkey = l_suppkey
     AND o_orderkey = l_orderkey
     AND c_custkey = o_custkey
     AND s_nationkey = n1.n_nationkey
     AND c_nationkey = n2.n_nationkey
     AND ((n1.n_name = 'FRANCE'
     AND n2.n_name = 'GERMANY')
     OR (n1.n_name = 'GERMANY'
     AND n2.n_name = 'FRANCE'))
     AND l_shipdate BETWEEN date'1995-01-01' AND date'1996-12-31') AS shipping
    GROUP BY
     supp_nation,
     cust_nation,
     l_year
    ORDER BY
     supp_nation,
     cust_nation,
     l_year;
    
    SELECT
     o_year,
     sum(
     CASE WHEN nation = 'BRAZIL' THEN
     volume
     ELSE
     0
     END) / sum(volume) AS mkt_share
    FROM (
     SELECT
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) AS volume,
     n2.n_name AS nation
     FROM
     part,
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2,
     region
     WHERE
     p_partkey = l_partkey
     AND s_suppkey = l_suppkey
     AND l_orderkey = o_orderkey
     AND o_custkey = c_custkey
     AND c_nationkey = n1.n_nationkey
     AND n1.n_regionkey = r_regionkey
     AND r_name = 'AMERICA'
     AND s_nationkey = n2.n_nationkey
     AND o_orderdate BETWEEN date'1995-01-01' AND date'1996-12-31'
     AND p_type = 'ECONOMY ANODIZED STEEL') AS all_nations
    GROUP BY
     o_year
    ORDER BY
     o_year;
    
    SELECT
     nation,
     o_year,
     sum(amount) AS sum_profit
    FROM (
     SELECT
     n_name AS nation,
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
     FROM
     part,
     supplier,
     lineitem,
     partsupp,
     orders,
     nation
     WHERE
     s_suppkey = l_suppkey
     AND ps_suppkey = l_suppkey
     AND ps_partkey = l_partkey
     AND p_partkey = l_partkey
     AND o_orderkey = l_orderkey
     AND s_nationkey = n_nationkey
     AND p_name LIKE '%green%') AS profit
    GROUP BY
     nation,
     o_year
    ORDER BY
     nation,
     o_year DESC;
    
    SELECT
     c_custkey,
     c_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     c_acctbal,
     n_name,
     c_address,
     c_phone,
     c_comment
    FROM
     customer,
     orders,
     lineitem,
     nation
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate >= date'1993-10-01' AND o_orderdate < date'1993-10-01' + interval 3 month
     AND l_returnflag = 'R'
     AND c_nationkey = n_nationkey
    GROUP BY
     c_custkey,
     c_name,
     c_acctbal,
     c_phone,
     n_name,
     c_address,
     c_comment
    ORDER BY
     revenue DESC
    LIMIT 20;
    
    SELECT
     ps_partkey,
     sum(ps_supplycost * ps_availqty) AS value
    FROM
     partsupp,
     supplier,
     nation
    WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY'
    GROUP BY
     ps_partkey
    HAVING
     sum(ps_supplycost * ps_availqty) > (
     SELECT
     sum(ps_supplycost * ps_availqty) * 0.0001000000
     FROM
     partsupp,
     supplier,
     nation
     WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY')
    ORDER BY
     value DESC;
    
    SELECT
     l_shipmode,
     sum(
     CASE WHEN o_orderpriority = '1-URGENT'
     OR o_orderpriority = '2-HIGH' THEN
     1
     ELSE
     0
     END) AS high_line_count,
     sum(
     CASE WHEN o_orderpriority <> '1-URGENT'
     AND o_orderpriority <> '2-HIGH' THEN
     1
     ELSE
     0
     END) AS low_line_count
    FROM
     orders,
     lineitem
    WHERE
     o_orderkey = l_orderkey
     AND l_shipmode IN ('MAIL', 'SHIP')
     AND l_commitdate < l_receiptdate
     AND l_shipdate < l_commitdate
     AND l_receiptdate >= date'1994-01-01' AND l_receiptdate < date'1994-01-01' + interval 1 year
    GROUP BY
     l_shipmode
    ORDER BY
     l_shipmode;
    
    SELECT
     c_count,
     count(*) AS custdist
    FROM (
     SELECT
     c_custkey,
     count(o_orderkey)
     FROM
     customer
     LEFT OUTER JOIN orders ON c_custkey = o_custkey
     AND o_comment NOT LIKE '%special%requests%'
    GROUP BY
     c_custkey) AS c_orders (c_custkey,
     c_count)
    GROUP BY
     c_count
    ORDER BY
     custdist DESC,
     c_count DESC;
    
    SELECT
     100.00 * sum(
     CASE WHEN p_type LIKE 'PROMO%' THEN
     l_extendedprice * (1 - l_discount)
     ELSE
     0
     END) / sum(l_extendedprice * (1 - l_discount)) AS promo_revenue
    FROM
     lineitem,
     part
    WHERE
     l_partkey = p_partkey
     AND l_shipdate >= date'1995-09-01'
     AND l_shipdate < date'1995-09-01' + interval 1 month;
    
    
    SELECT
     s_suppkey,
     s_name,
     s_address,
     s_phone,
     total_revenue
    FROM
     supplier,
     (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue0
    WHERE
     s_suppkey = supplier_no
     AND total_revenue = (
     SELECT
     max(total_revenue)
     FROM (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue1)
    ORDER BY
     s_suppkey;
    
    SELECT
     p_brand,
     p_type,
     p_size,
     count(DISTINCT ps_suppkey) AS supplier_cnt
    FROM
     partsupp,
     part
    WHERE
     p_partkey = ps_partkey
     AND p_brand <> 'Brand#45'
     AND p_type NOT LIKE 'MEDIUM POLISHED%'
     AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
     AND ps_suppkey NOT IN (
     SELECT
     s_suppkey
     FROM
     supplier
     WHERE
     s_comment LIKE '%Customer%Complaints%')
    GROUP BY
     p_brand,
     p_type,
     p_size
    ORDER BY
     supplier_cnt DESC,
     p_brand,
     p_type,
     p_size;
    
    SELECT
     sum(l_extendedprice) / 7.0 AS avg_yearly
    FROM
     lineitem,
     part
    WHERE
     p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container = 'MED BOX'
     AND l_quantity < (
     SELECT
     0.2 * avg(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = p_partkey);
    
    SELECT
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice,
     sum(l_quantity)
    FROM
     customer,
     orders,
     lineitem
    WHERE
     o_orderkey IN (
     SELECT
     l_orderkey
     FROM
     lineitem
     GROUP BY
     l_orderkey
     HAVING
     sum(l_quantity) > 300)
     AND c_custkey = o_custkey
     AND o_orderkey = l_orderkey
    GROUP BY
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice
    ORDER BY
     o_totalprice DESC,
     o_orderdate
    LIMIT 100;
    
    SELECT
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     lineitem,
     part
    WHERE (p_partkey = l_partkey
     AND p_brand = 'Brand#12'
     AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
     AND l_quantity >= 1
     AND l_quantity <= 1 + 10
     AND p_size BETWEEN 1 AND 5
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
     AND l_quantity >= 10
     AND l_quantity <= 10 + 10
     AND p_size BETWEEN 1 AND 10
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#34'
     AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
     AND l_quantity >= 20
     AND l_quantity <= 20 + 10
     AND p_size BETWEEN 1 AND 15
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON');
    
    SELECT
     s_name,
     s_address
    FROM
     supplier,
     nation
    WHERE
     s_suppkey IN (
     SELECT
     ps_suppkey
     FROM
     partsupp
     WHERE
     ps_partkey IN (
     SELECT
     p_partkey
     FROM
     part
     WHERE
     p_name LIKE 'forest%')
     AND ps_availqty > (
     SELECT
     0.5 * sum(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = ps_partkey
     AND l_suppkey = ps_suppkey
     AND l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year))
     AND s_nationkey = n_nationkey
     AND n_name = 'CANADA'
     ORDER BY
     s_name;
    
    SELECT
     s_name,
     count(*) AS numwait
    FROM
     supplier,
     lineitem l1,
     orders,
     nation
    WHERE
     s_suppkey = l1.l_suppkey
     AND o_orderkey = l1.l_orderkey
     AND o_orderstatus = 'F'
     AND l1.l_receiptdate > l1.l_commitdate
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem l2
     WHERE
     l2.l_orderkey = l1.l_orderkey
     AND l2.l_suppkey <> l1.l_suppkey)
     AND NOT EXISTS (
     SELECT
     *
     FROM
     lineitem l3
     WHERE
     l3.l_orderkey = l1.l_orderkey
     AND l3.l_suppkey <> l1.l_suppkey
     AND l3.l_receiptdate > l3.l_commitdate)
     AND s_nationkey = n_nationkey
     AND n_name = 'SAUDI ARABIA'
    GROUP BY
     s_name
    ORDER BY
     numwait DESC,
     s_name
    LIMIT 100;
    
    SELECT
     cntrycode,
     count(*) AS numcust,
     sum(c_acctbal) AS totacctbal
    FROM (
     SELECT
     substring(c_phone FROM 1 FOR 2) AS cntrycode,
     c_acctbal
     FROM
     customer
     WHERE
     substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17')
     AND c_acctbal > (
     SELECT
     avg(c_acctbal)
     FROM
     customer
     WHERE
     c_acctbal > 0.00
     AND substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17'))
     AND NOT EXISTS (
     SELECT
     *
     FROM
     orders
     WHERE
     o_custkey = c_custkey)) AS custsale
    GROUP BY
     cntrycode
    ORDER BY
     cntrycode;
    

Test case 3: Use the High-performance Edition Spark confidential engine to process ciphertext data

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development.

  3. On the SQLConsole tab, select the Spark engine and the job resource group.

  4. Enable the Spark confidential engine, specify a list of master encryption keys (MEKs), and then create a database.

    -- Enable native computing.
    SET spark.adb.native.enabled=true;
    -- Specify a list of MEKs, a Key Management Service (KMS) client class, and a CryptoFactory class. After you enable the Spark confidential engine, the engine can support plaintext and ciphertext data.
    SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
    SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
    SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
    -- Configure Spark resource specifications.
    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    -- Create a database.
    CREATE database IF NOT EXISTS adb_encryption_db;
  5. Create an OSS external table to store plaintext data. The LOCATION parameter specifies the OSS path of the TPC-H test dataset. In this example, the customer_tmp table is used, and the OSS path is oss://testBucketName/adb/Spark/customer.

    SET spark.adb.native.enabled=true;
    SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
    SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
    SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    USE adb_encryption_db;
    CREATE TABLE IF NOT EXISTS customer_tmp
     (
     c_custkey long,
     c_name string,
     c_address string,
     c_nationkey long,
     c_phone string,
     c_acctbal decimal(12, 2),
     c_mktsegment string,
     c_comment string
    )
    USING parquet 
    LOCATION 'oss://testBucketName/adb/Spark/customer';
  6. Create an OSS external table to store ciphertext data. In this example, the customer external table is stored in the oss://testBucketName/adb/Spark/enc_customer path.

    SET spark.adb.native.enabled=true;
    SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
    SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
    SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    USE adb_encryption_db;
    CREATE TABLE IF NOT EXISTS customer
    USING Parquet
    OPTIONS (
     'parquet.encryption.column.keys'='kc1:c_name;kc2:c_phone',
     'parquet.encryption.footer.key'='kf'
    )
    LOCATION 'oss://testBucketName/adb/Spark/enc_customer'
    AS
    SELECT *
    FROM customer_tmp;
  7. Perform SQL queries and record the total execution duration.

    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    SELECT
     l_returnflag,
     l_linestatus,
     sum(l_quantity) AS sum_qty,
     sum(l_extendedprice) AS sum_base_price,
     sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
     sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
     avg(l_quantity) AS avg_qty,
     avg(l_extendedprice) AS avg_price,
     avg(l_discount) AS avg_disc,
     count(*) AS count_order
    FROM
     lineitem
    WHERE
     l_shipdate <= date'1998-09-02' - interval 1 day
    GROUP BY
     l_returnflag,
     l_linestatus
    ORDER BY
     l_returnflag,
     l_linestatus;
    
    SELECT
     s_acctbal,
     s_name,
     n_name,
     p_partkey,
     p_mfgr,
     s_address,
     s_phone,
     s_comment
    FROM
     part,
     supplier,
     partsupp,
     nation,
     region
    WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND p_size = 15
     AND p_type LIKE '%BRASS'
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE'
     AND ps_supplycost = (
     SELECT
     min(ps_supplycost)
     FROM
     partsupp,
     supplier,
     nation,
     region
     WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE')
    ORDER BY
     s_acctbal DESC,
     n_name,
     s_name,
     p_partkey
    LIMIT 100;
    
    
    SELECT
     l_orderkey,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     o_orderdate,
     o_shippriority
    FROM
     customer,
     orders,
     lineitem
    WHERE
     c_mktsegment = 'BUILDING'
     AND c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate < date'1995-03-15'
     AND l_shipdate > date'1995-03-15'
    GROUP BY
     l_orderkey,
     o_orderdate,
     o_shippriority
    ORDER BY
     revenue DESC,
     o_orderdate
    LIMIT 10;
    
    SELECT
     o_orderpriority,
     count(*) AS order_count
    FROM
     orders
    WHERE
     o_orderdate >= date'1993-07-01'
     AND o_orderdate < date'1993-07-01' + interval 3 month
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem
     WHERE
     l_orderkey = o_orderkey
     AND l_commitdate < l_receiptdate)
    GROUP BY
     o_orderpriority
    ORDER BY
     o_orderpriority;
    
    SELECT
     n_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     customer,
     orders,
     lineitem,
     supplier,
     nation,
     region
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND l_suppkey = s_suppkey
     AND c_nationkey = s_nationkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'ASIA'
     AND o_orderdate >= date'1994-01-01'
     AND o_orderdate < date'1994-01-01' + interval 1 year
    GROUP BY
     n_name
    ORDER BY
     revenue DESC;
    
    SELECT
     sum(l_extendedprice * l_discount) AS revenue
    FROM
     lineitem
    WHERE
     l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year
     AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
     AND l_quantity < 24;
    
    SELECT
     supp_nation,
     cust_nation,
     l_year,
     sum(volume) AS revenue
    FROM (
     SELECT
     n1.n_name AS supp_nation,
     n2.n_name AS cust_nation,
     extract(year FROM l_shipdate) AS l_year,
     l_extendedprice * (1 - l_discount) AS volume
     FROM
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2
     WHERE
     s_suppkey = l_suppkey
     AND o_orderkey = l_orderkey
     AND c_custkey = o_custkey
     AND s_nationkey = n1.n_nationkey
     AND c_nationkey = n2.n_nationkey
     AND ((n1.n_name = 'FRANCE'
     AND n2.n_name = 'GERMANY')
     OR (n1.n_name = 'GERMANY'
     AND n2.n_name = 'FRANCE'))
     AND l_shipdate BETWEEN date'1995-01-01' AND date'1996-12-31') AS shipping
    GROUP BY
     supp_nation,
     cust_nation,
     l_year
    ORDER BY
     supp_nation,
     cust_nation,
     l_year;
    
    SELECT
     o_year,
     sum(
     CASE WHEN nation = 'BRAZIL' THEN
     volume
     ELSE
     0
     END) / sum(volume) AS mkt_share
    FROM (
     SELECT
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) AS volume,
     n2.n_name AS nation
     FROM
     part,
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2,
     region
     WHERE
     p_partkey = l_partkey
     AND s_suppkey = l_suppkey
     AND l_orderkey = o_orderkey
     AND o_custkey = c_custkey
     AND c_nationkey = n1.n_nationkey
     AND n1.n_regionkey = r_regionkey
     AND r_name = 'AMERICA'
     AND s_nationkey = n2.n_nationkey
     AND o_orderdate BETWEEN date'1995-01-01' AND date'1996-12-31'
     AND p_type = 'ECONOMY ANODIZED STEEL') AS all_nations
    GROUP BY
     o_year
    ORDER BY
     o_year;
    
    SELECT
     nation,
     o_year,
     sum(amount) AS sum_profit
    FROM (
     SELECT
     n_name AS nation,
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
     FROM
     part,
     supplier,
     lineitem,
     partsupp,
     orders,
     nation
     WHERE
     s_suppkey = l_suppkey
     AND ps_suppkey = l_suppkey
     AND ps_partkey = l_partkey
     AND p_partkey = l_partkey
     AND o_orderkey = l_orderkey
     AND s_nationkey = n_nationkey
     AND p_name LIKE '%green%') AS profit
    GROUP BY
     nation,
     o_year
    ORDER BY
     nation,
     o_year DESC;
    
    SELECT
     c_custkey,
     c_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     c_acctbal,
     n_name,
     c_address,
     c_phone,
     c_comment
    FROM
     customer,
     orders,
     lineitem,
     nation
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate >= date'1993-10-01' AND o_orderdate < date'1993-10-01' + interval 3 month
     AND l_returnflag = 'R'
     AND c_nationkey = n_nationkey
    GROUP BY
     c_custkey,
     c_name,
     c_acctbal,
     c_phone,
     n_name,
     c_address,
     c_comment
    ORDER BY
     revenue DESC
    LIMIT 20;
    
    SELECT
     ps_partkey,
     sum(ps_supplycost * ps_availqty) AS value
    FROM
     partsupp,
     supplier,
     nation
    WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY'
    GROUP BY
     ps_partkey
    HAVING
     sum(ps_supplycost * ps_availqty) > (
     SELECT
     sum(ps_supplycost * ps_availqty) * 0.0001000000
     FROM
     partsupp,
     supplier,
     nation
     WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY')
    ORDER BY
     value DESC;
    
    SELECT
     l_shipmode,
     sum(
     CASE WHEN o_orderpriority = '1-URGENT'
     OR o_orderpriority = '2-HIGH' THEN
     1
     ELSE
     0
     END) AS high_line_count,
     sum(
     CASE WHEN o_orderpriority <> '1-URGENT'
     AND o_orderpriority <> '2-HIGH' THEN
     1
     ELSE
     0
     END) AS low_line_count
    FROM
     orders,
     lineitem
    WHERE
     o_orderkey = l_orderkey
     AND l_shipmode IN ('MAIL', 'SHIP')
     AND l_commitdate < l_receiptdate
     AND l_shipdate < l_commitdate
     AND l_receiptdate >= date'1994-01-01' AND l_receiptdate < date'1994-01-01' + interval 1 year
    GROUP BY
     l_shipmode
    ORDER BY
     l_shipmode;
    
    SELECT
     c_count,
     count(*) AS custdist
    FROM (
     SELECT
     c_custkey,
     count(o_orderkey)
     FROM
     customer
     LEFT OUTER JOIN orders ON c_custkey = o_custkey
     AND o_comment NOT LIKE '%special%requests%'
    GROUP BY
     c_custkey) AS c_orders (c_custkey,
     c_count)
    GROUP BY
     c_count
    ORDER BY
     custdist DESC,
     c_count DESC;
    
    SELECT
     100.00 * sum(
     CASE WHEN p_type LIKE 'PROMO%' THEN
     l_extendedprice * (1 - l_discount)
     ELSE
     0
     END) / sum(l_extendedprice * (1 - l_discount)) AS promo_revenue
    FROM
     lineitem,
     part
    WHERE
     l_partkey = p_partkey
     AND l_shipdate >= date'1995-09-01'
     AND l_shipdate < date'1995-09-01' + interval 1 month;
    
    
    SELECT
     s_suppkey,
     s_name,
     s_address,
     s_phone,
     total_revenue
    FROM
     supplier,
     (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue0
    WHERE
     s_suppkey = supplier_no
     AND total_revenue = (
     SELECT
     max(total_revenue)
     FROM (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue1)
    ORDER BY
     s_suppkey;
    
    SELECT
     p_brand,
     p_type,
     p_size,
     count(DISTINCT ps_suppkey) AS supplier_cnt
    FROM
     partsupp,
     part
    WHERE
     p_partkey = ps_partkey
     AND p_brand <> 'Brand#45'
     AND p_type NOT LIKE 'MEDIUM POLISHED%'
     AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
     AND ps_suppkey NOT IN (
     SELECT
     s_suppkey
     FROM
     supplier
     WHERE
     s_comment LIKE '%Customer%Complaints%')
    GROUP BY
     p_brand,
     p_type,
     p_size
    ORDER BY
     supplier_cnt DESC,
     p_brand,
     p_type,
     p_size;
    
    SELECT
     sum(l_extendedprice) / 7.0 AS avg_yearly
    FROM
     lineitem,
     part
    WHERE
     p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container = 'MED BOX'
     AND l_quantity < (
     SELECT
     0.2 * avg(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = p_partkey);
    
    SELECT
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice,
     sum(l_quantity)
    FROM
     customer,
     orders,
     lineitem
    WHERE
     o_orderkey IN (
     SELECT
     l_orderkey
     FROM
     lineitem
     GROUP BY
     l_orderkey
     HAVING
     sum(l_quantity) > 300)
     AND c_custkey = o_custkey
     AND o_orderkey = l_orderkey
    GROUP BY
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice
    ORDER BY
     o_totalprice DESC,
     o_orderdate
    LIMIT 100;
    
    SELECT
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     lineitem,
     part
    WHERE (p_partkey = l_partkey
     AND p_brand = 'Brand#12'
     AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
     AND l_quantity >= 1
     AND l_quantity <= 1 + 10
     AND p_size BETWEEN 1 AND 5
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
     AND l_quantity >= 10
     AND l_quantity <= 10 + 10
     AND p_size BETWEEN 1 AND 10
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#34'
     AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
     AND l_quantity >= 20
     AND l_quantity <= 20 + 10
     AND p_size BETWEEN 1 AND 15
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON');
    
    SELECT
     s_name,
     s_address
    FROM
     supplier,
     nation
    WHERE
     s_suppkey IN (
     SELECT
     ps_suppkey
     FROM
     partsupp
     WHERE
     ps_partkey IN (
     SELECT
     p_partkey
     FROM
     part
     WHERE
     p_name LIKE 'forest%')
     AND ps_availqty > (
     SELECT
     0.5 * sum(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = ps_partkey
     AND l_suppkey = ps_suppkey
     AND l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year))
     AND s_nationkey = n_nationkey
     AND n_name = 'CANADA'
     ORDER BY
     s_name;
    
    SELECT
     s_name,
     count(*) AS numwait
    FROM
     supplier,
     lineitem l1,
     orders,
     nation
    WHERE
     s_suppkey = l1.l_suppkey
     AND o_orderkey = l1.l_orderkey
     AND o_orderstatus = 'F'
     AND l1.l_receiptdate > l1.l_commitdate
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem l2
     WHERE
     l2.l_orderkey = l1.l_orderkey
     AND l2.l_suppkey <> l1.l_suppkey)
     AND NOT EXISTS (
     SELECT
     *
     FROM
     lineitem l3
     WHERE
     l3.l_orderkey = l1.l_orderkey
     AND l3.l_suppkey <> l1.l_suppkey
     AND l3.l_receiptdate > l3.l_commitdate)
     AND s_nationkey = n_nationkey
     AND n_name = 'SAUDI ARABIA'
    GROUP BY
     s_name
    ORDER BY
     numwait DESC,
     s_name
    LIMIT 100;
    
    SELECT
     cntrycode,
     count(*) AS numcust,
     sum(c_acctbal) AS totacctbal
    FROM (
     SELECT
     substring(c_phone FROM 1 FOR 2) AS cntrycode,
     c_acctbal
     FROM
     customer
     WHERE
     substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17')
     AND c_acctbal > (
     SELECT
     avg(c_acctbal)
     FROM
     customer
     WHERE
     c_acctbal > 0.00
     AND substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17'))
     AND NOT EXISTS (
     SELECT
     *
     FROM
     orders
     WHERE
     o_custkey = c_custkey)) AS custsale
    GROUP BY
     cntrycode
    ORDER BY
     cntrycode;
    

Test results

The following table describes the test results of the preceding test cases.

Test case

Total execution duration (minutes)

Use AnalyticDB for MySQL Spark 3.2.0 to process plaintext data

72

Use the High-performance Edition Spark confidential engine to process plaintext data

34

Use the High-performance Edition Spark confidential engine to process ciphertext data

37

image

The preceding test results indicate that the performance of the High-performance Edition Spark confidential engine is 1.9 times the performance of AnalyticDB for MySQL Spark 3.2.0 in scenarios that involve large amounts of data. The High-performance Edition Spark confidential engine ensures secure data transmission and storage and improves data processing efficiency.