AnalyticDB for MySQL allows you to use Spark SQL to read and write C-Store tables (tables that use the XUANWU engine) in real-time or batch mode. This topic describes how to use Spark SQL to read and write C-Store tables in both modes.
Prerequisites
An AnalyticDB for MySQL Data Lakehouse Edition cluster is created.
Note The cluster must have more than 0 AnalyticDB compute units (ACUs) of reserved storage resources.
A job resource group is created for the AnalyticDB for MySQL Data Lakehouse Edition cluster. For more information, see Create a resource group.
A database account is created for the AnalyticDB for MySQL Data Lakehouse Edition cluster.
Step 1: Go to the SQL Development page
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Job Development > SQL Development.
On the SQLConsole tab, select the Spark engine and the job resource group.
Step 2: Create a database and a C-Store table
You can execute the following SQL statements in batch or interactive mode. For more information, see Spark SQL execution modes.
Execute the following statement to create a database:
CREATE DATABASE spark_create_adb_db_test;
Execute the following statement to create a C-Store table. For more information about the Spark SQL syntax, see Use Spark SQL to create a C-Store table.
CREATE TABLE spark_create_adb_db_test.test_adb_tbl (
    id int,
    name string COMMENT 'test_name',
    age int
) USING adb
TBLPROPERTIES(
    'primaryKey'='id,age',
    'distributeType'='HASH',
    'distributeColumns'='id',
    'partitionType'='value',
    'partitionColumn'='age',
    'partitionCount'='120'
);
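In this example, the id and age columns form the primary key, data is hash-distributed across nodes by the id column, and the table is partitioned by the value of the age column, with up to 120 partitions retained.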
Step 3: Read and write the C-Store table
You can execute the following SQL statement in batch or interactive mode. For more information, see Spark SQL execution modes.
When you execute a Spark SQL statement, the system returns a message that indicates whether the execution succeeded, but does not return data. To view the data, go to the Spark JAR Development page, find your application on the Applications tab, and then click Logs in the Actions column. For more information, see the "View information about a Spark application" section of the Spark editor topic.
To read and write data in batch mode, you need to select only the Spark engine and a job resource group. No other parameters are required. To read and write data in real-time mode, you must also execute the SET statement to specify an interactive resource group that executes the SQL statements.
Read and write data in batch mode
When you read and write data in batch mode, take note of the following items:
You can use Spark SQL to execute SELECT and INSERT statements only on partitioned C-Store tables. Non-partitioned tables are not supported. For information about how to create a partitioned table, see CREATE TABLE.
You cannot use Spark SQL to execute UPDATE or DELETE statements on C-Store tables, regardless of whether the tables are partitioned.
Before you query hot data, you must use the XIHE engine to execute the following SQL statements to modify the relevant configuration parameters. Otherwise, the SQL statements fail.
SET adb_config CSTORE_HOT_TABLE_ALLOW_SINGLE_REPLICA_BUILD=true;
SET adb_config ELASTIC_ENABLE_HOT_PARTITION_HAS_HDD_REPLICA=true;
SET adb_config ELASTIC_PRODUCT_ENABLE_MIXED_STORAGE_POLICY=true;
After you modify the configuration parameters, you must forcibly trigger a BUILD job. For more information, see BUILD.
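For example, the following statement triggers a BUILD job on the sample table that is used in this topic. This is a minimal sketch; for the exact syntax and supported options, see the BUILD topic.
BUILD TABLE spark_create_adb_db_test.test_adb_tbl;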
Write data to the C-Store table.
AnalyticDB for MySQL allows you to write data to C-Store tables in batch mode by using the INSERT OVERWRITE statement. You can use one of the following methods to write data to the C-Store table:
Method 1: Execute the INSERT OVERWRITE statement to write data to a static partition
INSERT OVERWRITE spark_create_adb_db_test.test_adb_tbl partition(age=10) VALUES (1, 'bom');
Method 2: Execute the INSERT OVERWRITE statement to write data to a dynamic partition
INSERT OVERWRITE spark_create_adb_db_test.test_adb_tbl partition (age) VALUES (1, 'bom', 10);
Read data from the C-Store table.
SELECT * FROM spark_create_adb_db_test.test_adb_tbl;
Read and write data in real-time mode
AnalyticDB for MySQL allows you to write data to C-Store tables in real-time mode by using the INSERT INTO statement.
-- Specify the database account, password, and internal endpoint of the cluster.
SET spark.adb.username=user;
SET spark.adb.password=password;
SET spark.adb.endpoint=amv-bp1a74zh1aqi2a6910000****.ads.aliyuncs.com:3306;
-- Enable Elastic Network Interface (ENI).
SET spark.adb.eni.enabled=true;
SET spark.adb.eni.vswitchId=vsw-bp12ldm83z4zu9k4d****;
SET spark.adb.eni.securityGroupId=sg-bp1cdm3acizrgq6x****;
-- Specify that the Java Database Connectivity (JDBC) mode is used.
SET spark.adb.useJdbc=true;
-- Specify an interactive resource group for executing SQL statements.
SET spark.adb.resourceGroup=user_default;
-- Insert data into the table.
INSERT INTO spark_create_adb_db_test.test_adb_tbl VALUES (1, 'adb', 20);
-- Read data from the table.
SELECT * FROM spark_create_adb_db_test.test_adb_tbl;
The following table describes the parameters.
| Parameter | Description |
| --- | --- |
| spark.adb.username | The name of the database account of the AnalyticDB for MySQL cluster. |
| spark.adb.password | The password of the database account. |
| spark.adb.endpoint | The internal endpoint and port number of the AnalyticDB for MySQL cluster. Example: amv-bp1a74zh1aqi2a6910000****.ads.aliyuncs.com:3306. |
| spark.adb.eni.enabled | Specifies whether to enable ENI. To access data, you must set this parameter to true. |
| spark.adb.eni.vswitchId | The vSwitch ID of the AnalyticDB for MySQL cluster. |
| spark.adb.eni.securityGroupId | The security group ID of the AnalyticDB for MySQL cluster. You can select an existing security group or create a new one. Important: The security group must reside in the same virtual private cloud (VPC) as the AnalyticDB for MySQL Data Lakehouse Edition cluster. |
| spark.adb.useJdbc | Specifies whether to use the JDBC mode to access data. Valid values: true and false. To write data in real-time mode, you must set this parameter to true. |
| spark.adb.resourceGroup | The interactive resource group that is used to execute SQL statements. Note: When you create an AnalyticDB for MySQL Data Lakehouse Edition cluster, an interactive resource group named user_default is created by default. |
Spark configuration parameters
The following table describes the supported configuration parameters when you use Spark SQL to read and write C-Store tables.
| Parameter | Description | Default value |
| --- | --- | --- |
| spark.adb.write.batchSize | The maximum number of data records that can be written per batch in JDBC mode. The value must be an integer greater than 0. Note: This parameter is supported only if you write data in real-time mode. | 600 |
| spark.adb.write.arrow.maxMemoryBufferSize | The maximum size of the write memory buffer. Unit: MB. The value must be an integer greater than 0. Note: This parameter is supported only if you write data in batch mode. | 1024 |
| spark.adb.write.arrow.maxRecordSizePerBatch | The maximum number of data records that can be written per batch in SDK mode. The value must be an integer greater than 0. Note: This parameter is supported only if you write data in batch mode. | 500 |
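For example, if you want real-time writes to use smaller batches, you can adjust the batch size before you execute the INSERT statement. The value 200 below is only an illustrative choice, not a recommendation:
SET spark.adb.write.batchSize=200;
INSERT INTO spark_create_adb_db_test.test_adb_tbl VALUES (2, 'adb', 21);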