Use Spark SQL in an AnalyticDB for MySQL cluster to read from and write to C-Store tables (XUANWU engine) in batch or real-time mode.
Prerequisites
Before you begin, make sure you have:
-
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster:
-
Data Lakehouse Edition: must have more than 0 ACUs of reserved storage resources
-
Enterprise Edition: must have more than 0 ACUs of reserved resources
-
Basic Edition: must have more than 0 ACUs of reserved resources
-
-
A job resource group created for the cluster
-
A database account created for the cluster
Choose a mode
Two modes are available depending on your use case:
| Mode | Statement | Resource group required |
|---|---|---|
| Batch mode | INSERT OVERWRITE |
Spark engine and a job resource group |
| Real-time mode | INSERT INTO |
Spark engine, job resource group, and an interactive resource group (specified via SET command) |
Step 1: Go to SQL development
-
Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters, find your cluster, and click the cluster ID.
-
In the left-side navigation pane, choose Job Development > SQL Development.
-
On the SQLConsole tab, select the Spark engine and a job resource group.
Step 2: Create a database and a C-Store table
Run the following statements in batch processing mode or interactive mode. For details, see Spark SQL execution modes.
-
Create a database:
CREATE DATABASE spark_create_adb_db_test; -
Create a C-Store table. For the full table creation syntax, see Create an internal table using Spark SQL.
CREATE TABLE spark_create_adb_db_test.test_adb_tbl ( id int, name string COMMENT 'test_name', age int ) using adb TBLPROPERTIES('primaryKey'='id,age', 'distributeType'='HASH', 'distributeColumns'='id', 'partitionType'='value', 'partitionColumn'='age', 'partitionCount'='120', 'storagePolicy'='COLD' );
Step 3: Read and write data
When a Spark SQL statement runs, the system returns only a success or failure message — no data is returned inline. To view table data, check Logs on the Application List tab of the Spark Jar Development page. For details, see View Spark application information.
Batch mode (INSERT OVERWRITE)
Usage notes
-
Spark SQL supports
SELECTandINSERTfor partitioned tables only. Non-partitioned tables are not supported. For how to create partitioned tables, see CREATE TABLE. -
UPDATEandDELETEare not supported for C-Store tables, regardless of partitioning. -
To query hot data, complete the following steps before running any query — queries fail if you skip this.
-
Use the XIHE engine to enable the hot partition configuration:
SET adb_config ELASTIC_ENABLE_HOT_PARTITION_HAS_HDD_REPLICA=true; -
Trigger a BUILD job to rebuild the index:
-
Build specific partitions (requires cluster version V3.1.6.0 or later): > Important: To view and update the minor version, go to the Configuration Information section on the Cluster Information page.
BUILD TABLE <table_name> force partitions='partition1,partition2'; -
Build the entire table (disabled by default — rebuilding all partitions is time-consuming; submit a ticket to enable):
submit a ticketBUILD TABLE <table_name> force = true;
-
-
Check the BUILD job status:
SELECT table_name, schema_name, status FROM INFORMATION_SCHEMA.KEPLER_META_BUILD_TASK ORDER BY create_time DESC LIMIT 10;Wait for the job to complete before running queries.
-
Write and read data
Write data to the C-Store table using INSERT OVERWRITE. Two partition methods are available:
-
Static partition — specify the partition value in the
PARTITIONclause:INSERT OVERWRITE spark_create_adb_db_test.test_adb_tbl partition(age=10) VALUES (1, 'bom'); -
Dynamic partition — include the partition column value in the
VALUESlist:INSERT OVERWRITE spark_create_adb_db_test.test_adb_tbl partition (age) VALUES (1, 'bom', 10);
Read data from the table:
SELECT * FROM spark_create_adb_db_test.test_adb_tbl;
Real-time mode (INSERT INTO)
Use the INSERT INTO statement to write data in real-time mode. Three connection methods are available: JDBC, View, and Catalog.
All three methods require Elastic Network Interface (ENI)-based access, which routes traffic through your cluster's VPC using a vSwitch and security group.
Catalog method
Spark parameters
Configure the following parameters to tune read and write performance.
| Parameter | Description | Default |
|---|---|---|
spark.adb.write.batchSize |
Number of records written per batch. Must be a positive integer. Applies to real-time mode only | 600 |
spark.adb.write.arrow.maxMemoryBufferSize |
Maximum memory buffer size for write operations, in MB. Must be a positive integer. Applies to batch mode only | 1024 |
spark.adb.write.arrow.maxRecordSizePerBatch |
Maximum number of records per batch write. Must be a positive integer. Applies to batch mode only | 500 |
What's next
-
Spark SQL execution modes — learn when to use batch processing mode vs. interactive mode
-
Create an internal table using Spark SQL — full C-Store table creation syntax reference
-
Create and manage a resource group — set up job and interactive resource groups