
AnalyticDB: Use Spark SQL to read and write C-Store tables

Last Updated: Dec 24, 2024

AnalyticDB for MySQL allows you to use Spark SQL to read and write C-Store tables (tables that use the XUANWU engine) in real-time or batch mode. This topic describes how to do so in each mode.

Prerequisites

  • An AnalyticDB for MySQL Data Lakehouse Edition cluster is created.

    Note

    The cluster must have more than 0 AnalyticDB compute units (ACUs) of reserved storage resources.

  • A job resource group is created for the AnalyticDB for MySQL Data Lakehouse Edition cluster. For more information, see Create a resource group.

  • A database account is created for the AnalyticDB for MySQL Data Lakehouse Edition cluster.

Step 1: Go to the SQL Development page

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development.

  3. On the SQLConsole tab, select the Spark engine and the job resource group.

Step 2: Create a database and a C-Store table

Note

You can execute the following SQL statements in batch or interactive mode. For more information, see Spark SQL execution modes.

  1. Execute the following statement to create a database:

    CREATE DATABASE spark_create_adb_db_test;
  2. Execute the following statement to create a C-Store table. For more information about the Spark SQL syntax, see Use Spark SQL to create a C-Store table.

    CREATE TABLE spark_create_adb_db_test.test_adb_tbl (
      id int,
      name string COMMENT 'test_name',
      age int
    ) USING adb TBLPROPERTIES (
      'primaryKey'='id,age',
      'distributeType'='HASH',
      'distributeColumns'='id',
      'partitionType'='value',
      'partitionColumn'='age',
      'partitionCount'='120'
    );

Step 3: Read and write the C-Store table

Note
  • You can execute the following SQL statement in batch or interactive mode. For more information, see Spark SQL execution modes.

  • When you execute a Spark SQL statement, the system returns a message indicating the success or failure of the execution but does not return data. To view data, you can go to the Spark JAR Development page and click Logs in the Actions column corresponding to your application on the Applications tab. For more information, see the "View information about a Spark application" section of the Spark editor topic.

To read and write data in batch mode, you only need to select the Spark engine and a job resource group; no other parameters are required. To read and write data in real-time mode, you must execute SET statements to specify an interactive resource group for executing the SQL statements.

Read and write data in batch mode

Important

When you read and write data in batch mode, take note of the following items:

  • You can use Spark SQL to execute SELECT and INSERT statements only on partitioned C-Store tables, but not non-partitioned tables. For information about how to create a partitioned table, see CREATE TABLE.

  • You cannot use Spark SQL to execute UPDATE or DELETE statements on C-Store tables, including partitioned tables and non-partitioned tables.

  • Before you query hot data, you must use the XIHE engine to execute the following SQL statements to modify the relevant configuration parameters. Otherwise, the SQL statements fail.

    SET adb_config CSTORE_HOT_TABLE_ALLOW_SINGLE_REPLICA_BUILD=true;
    SET adb_config ELASTIC_ENABLE_HOT_PARTITION_HAS_HDD_REPLICA=true;
    SET adb_config ELASTIC_PRODUCT_ENABLE_MIXED_STORAGE_POLICY=true;

    After you modify the configuration parameters, you must forcibly trigger a BUILD job. For more information, see BUILD.

  1. Write data to the C-Store table.

    AnalyticDB for MySQL allows you to write data to C-Store tables in batch mode by using the INSERT OVERWRITE statement. You can use one of the following methods to write data to the C-Store table:

    • Method 1: Execute the INSERT OVERWRITE statement to write data to a static partition

      INSERT OVERWRITE spark_create_adb_db_test.test_adb_tbl partition(age=10) VALUES (1, 'bom');
    • Method 2: Execute the INSERT OVERWRITE statement to write data to a dynamic partition

      INSERT OVERWRITE spark_create_adb_db_test.test_adb_tbl partition (age) VALUES (1, 'bom', 10);
  2. Read data from the C-Store table.

    SELECT * FROM spark_create_adb_db_test.test_adb_tbl;

Read and write data in real-time mode

AnalyticDB for MySQL allows you to write data to C-Store tables in real-time mode by using the INSERT INTO statement.

-- Specify the database account, password, and internal endpoint of the cluster.
SET spark.adb.username=user;
SET spark.adb.password=password;
SET spark.adb.endpoint=amv-bp1a74zh1aqi2a6910000****.ads.aliyuncs.com:3306;
-- Enable Elastic Network Interface (ENI).
SET spark.adb.eni.enabled=true;
SET spark.adb.eni.vswitchId=vsw-bp12ldm83z4zu9k4d****;
SET spark.adb.eni.securityGroupId=sg-bp1cdm3acizrgq6x****;
-- Specify that the Java Database Connectivity (JDBC) mode is used.
SET spark.adb.useJdbc=true;
-- Specify an interactive resource group for executing SQL statements.
SET spark.adb.resourceGroup=user_default;
-- Insert data into the table.
INSERT INTO spark_create_adb_db_test.test_adb_tbl VALUES (1, 'adb', 20);
-- Read data from the table.
SELECT * FROM spark_create_adb_db_test.test_adb_tbl;

The following list describes the parameters.

  • spark.adb.username: the name of the database account of the AnalyticDB for MySQL cluster.

  • spark.adb.password: the password of the database account.

  • spark.adb.endpoint: the internal endpoint and port number of the AnalyticDB for MySQL cluster. Example: amv-bp1a74zh1aqi2a6910000****.ads.aliyuncs.com:3306.

  • spark.adb.eni.enabled: specifies whether to enable ENI. To access data, you must set this parameter to true.

  • spark.adb.eni.vswitchId: the vSwitch ID of the AnalyticDB for MySQL cluster.

  • spark.adb.eni.securityGroupId: the security group ID of the AnalyticDB for MySQL cluster. You can select an existing security group or create a new one.

    Important

    The security group must reside in the same virtual private cloud (VPC) as the AnalyticDB for MySQL Data Lakehouse Edition cluster.

  • spark.adb.useJdbc: specifies whether to use the JDBC mode to access data. Valid values: true and false (default). To write data in real-time mode, you must set this parameter to true.

  • spark.adb.resourceGroup: the interactive resource group that is used to execute SQL statements.

    Note

    When you create an AnalyticDB for MySQL Data Lakehouse Edition cluster, an interactive resource group named user_default is created. You can select this resource group or create another interactive resource group. For information about how to create a resource group, see Create a resource group.
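The real-time configuration above can also be assembled programmatically before submission. The following is a minimal Python sketch; the `build_realtime_config` helper and its argument names are illustrative (not part of any AnalyticDB API), and only the `spark.adb.*` keys come from the documentation above.

```python
# Illustrative helper: render the SET statements required before real-time
# reads and writes. Only the spark.adb.* keys are documented; the helper
# itself is a sketch, not an AnalyticDB API.

def build_realtime_config(username, password, endpoint,
                          vswitch_id, security_group_id,
                          resource_group="user_default"):
    """Return the SET statements that must precede real-time SQL."""
    settings = {
        "spark.adb.username": username,
        "spark.adb.password": password,
        "spark.adb.endpoint": endpoint,
        "spark.adb.eni.enabled": "true",      # ENI must be enabled
        "spark.adb.eni.vswitchId": vswitch_id,
        "spark.adb.eni.securityGroupId": security_group_id,
        "spark.adb.useJdbc": "true",          # required for real-time writes
        "spark.adb.resourceGroup": resource_group,
    }
    return [f"SET {key}={value};" for key, value in settings.items()]

statements = build_realtime_config(
    "user", "password",
    "amv-bp1a74zh1aqi2a6910000****.ads.aliyuncs.com:3306",
    "vsw-bp12ldm83z4zu9k4d****", "sg-bp1cdm3acizrgq6x****",
)
for stmt in statements:
    print(stmt)
```

Prepending the rendered statements to your INSERT INTO and SELECT statements yields a script equivalent to the example above.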

Spark configuration parameters

The following list describes the configuration parameters that are supported when you use Spark SQL to read and write C-Store tables.

  • spark.adb.write.batchSize: the maximum number of data records that can be written per batch in JDBC mode. The value must be an integer that is greater than 0. Default value: 600.

    Note

    This parameter is supported only if you write data in real-time mode.

  • spark.adb.write.arrow.maxMemoryBufferSize: the maximum size of the write memory buffer. Unit: MB. The value must be an integer that is greater than 0. Default value: 1024.

    Note

    This parameter is supported only if you write data in batch mode.

  • spark.adb.write.arrow.maxRecordSizePerBatch: the maximum number of data records that can be written per batch in SDK mode. The value must be an integer that is greater than 0. Default value: 500.

    Note

    This parameter is supported only if you write data in batch mode.
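The constraints above (each value must be an integer greater than 0, with the stated defaults) can be checked before a job is submitted. The following is a minimal Python sketch; the `validate_write_settings` helper is illustrative, while the parameter names and default values come from the documentation above.

```python
# Illustrative validation of the documented write-tuning parameters.
# Defaults and the "integer greater than 0" rule come from the docs;
# the helper itself is a sketch, not an AnalyticDB API.

DEFAULTS = {
    "spark.adb.write.batchSize": 600,                    # JDBC (real-time) mode
    "spark.adb.write.arrow.maxMemoryBufferSize": 1024,   # batch mode, in MB
    "spark.adb.write.arrow.maxRecordSizePerBatch": 500,  # SDK (batch) mode
}

def validate_write_settings(overrides=None):
    """Merge overrides with defaults, rejecting non-positive values."""
    merged = dict(DEFAULTS)
    for key, value in (overrides or {}).items():
        if key not in DEFAULTS:
            raise KeyError(f"unknown parameter: {key}")
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be an integer greater than 0")
        merged[key] = value
    return merged

settings = validate_write_settings({"spark.adb.write.batchSize": 1000})
```

For example, raising `spark.adb.write.batchSize` above its default of 600 can reduce round trips for real-time writes, while a value of 0 or a negative value is rejected.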