All Products
Search
Document Center

AnalyticDB:Use Spark SQL to read from and write to C-Store tables

Last Updated:Mar 30, 2026

Use Spark SQL in an AnalyticDB for MySQL cluster to read from and write to C-Store tables (XUANWU engine) in batch or real-time mode.

Prerequisites

Before you begin, make sure you have:

  • An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster:

    • Data Lakehouse Edition: must have more than 0 ACUs of reserved storage resources

    • Enterprise Edition: must have more than 0 ACUs of reserved resources

    • Basic Edition: must have more than 0 ACUs of reserved resources

  • A job resource group created for the cluster

  • A database account created for the cluster

Choose a mode

Two modes are available depending on your use case:

Mode Statement Resource group required
Batch mode INSERT OVERWRITE Spark engine and a job resource group
Real-time mode INSERT INTO Spark engine, job resource group, and an interactive resource group (specified via SET command)

Step 1: Go to SQL development

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters, find your cluster, and click the cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development.

  3. On the SQLConsole tab, select the Spark engine and a job resource group.

Step 2: Create a database and a C-Store table

Run the following statements in batch processing mode or interactive mode. For details, see Spark SQL execution modes.
  1. Create a database:

    CREATE DATABASE spark_create_adb_db_test;
  2. Create a C-Store table. For the full table creation syntax, see Create an internal table using Spark SQL.

    CREATE TABLE spark_create_adb_db_test.test_adb_tbl (
      id int,
      name string COMMENT 'test_name',
      age int
    ) using adb TBLPROPERTIES('primaryKey'='id,age',
                    'distributeType'='HASH',
                    'distributeColumns'='id',
                    'partitionType'='value',
                    'partitionColumn'='age',
                    'partitionCount'='120',
                    'storagePolicy'='COLD'
                   );

Step 3: Read and write data

When a Spark SQL statement runs, the system returns only a success or failure message — no data is returned inline. To view table data, check Logs on the Application List tab of the Spark Jar Development page. For details, see View Spark application information.

Batch mode (INSERT OVERWRITE)

Usage notes

  • Spark SQL supports SELECT and INSERT for partitioned tables only. Non-partitioned tables are not supported. For how to create partitioned tables, see CREATE TABLE.

  • UPDATE and DELETE are not supported for C-Store tables, regardless of partitioning.

  • To query hot data, complete the following steps before running any query — queries fail if you skip this.

    1. Use the XIHE engine to enable the hot partition configuration:

      SET adb_config ELASTIC_ENABLE_HOT_PARTITION_HAS_HDD_REPLICA=true;
    2. Trigger a BUILD job to rebuild the index:

      • Build specific partitions (requires cluster version V3.1.6.0 or later): > Important: To view and update the minor version, go to the Configuration Information section on the Cluster Information page.

        BUILD TABLE <table_name> force partitions='partition1,partition2';
      • Build the entire table (disabled by default — rebuilding all partitions is time-consuming; submit a ticket to enable):

        BUILD TABLE <table_name> force = true;
        submit a ticket
    3. Check the BUILD job status:

      SELECT table_name, schema_name, status FROM INFORMATION_SCHEMA.KEPLER_META_BUILD_TASK ORDER BY create_time DESC LIMIT 10;

      Wait for the job to complete before running queries.

Write and read data

Write data to the C-Store table using INSERT OVERWRITE. Two partition methods are available:

  • Static partition — specify the partition value in the PARTITION clause:

    INSERT OVERWRITE spark_create_adb_db_test.test_adb_tbl partition(age=10) VALUES (1, 'bom');
  • Dynamic partition — include the partition column value in the VALUES list:

    INSERT OVERWRITE spark_create_adb_db_test.test_adb_tbl partition (age) VALUES (1, 'bom', 10);

Read data from the table:

SELECT * FROM spark_create_adb_db_test.test_adb_tbl;

Real-time mode (INSERT INTO)

Use the INSERT INTO statement to write data in real-time mode. Three connection methods are available: JDBC, View, and Catalog.

All three methods require Elastic Network Interface (ENI)-based access, which routes traffic through your cluster's VPC using a vSwitch and security group.

JDBC method

Configure connection parameters, then write and read directly using SQL statements.

-- Connection credentials and internal endpoint
conf spark.adb.username=<database-account>;
conf spark.adb.password=<password>;
conf spark.adb.endpoint=<cluster-internal-endpoint>:3306;
-- Enable ENI-based access
SET spark.adb.eni.enabled=true;
SET spark.adb.eni.vswitchId=<vswitch-id>;
SET spark.adb.eni.securityGroupId=<security-group-id>;
-- Use JDBC mode for real-time writes
SET spark.adb.useJdbc=true;
-- Set the interactive resource group for SQL execution
SET spark.adb.resourceGroup=user_default;
-- Write data
INSERT INTO spark_create_adb_db_test.test_adb_tbl VALUES (1, 'adb', 20);
-- Read data
SELECT * FROM spark_create_adb_db_test.test_adb_tbl;
Parameter Description
spark.adb.username Database account for the cluster
spark.adb.password Password for the database account
spark.adb.endpoint Internal endpoint and port of the cluster, in the format <endpoint>:3306
spark.adb.eni.enabled Set to true to enable ENI-based access
spark.adb.eni.vswitchId ID of the vSwitch that the cluster belongs to
spark.adb.eni.securityGroupId ID of the security group for the cluster. The security group must be in the same VPC as the cluster
spark.adb.useJdbc Whether to use JDBC mode. Set to true for real-time writes. Default: false
spark.adb.resourceGroup Interactive resource group for SQL execution. The user_default group is created automatically when you create a cluster. To create a new resource group, see Create and manage a resource group

View method

Create a temporary view that maps to the C-Store table, then use standard SQL to read and write through the view.

-- Enable ENI-based access
SET spark.adb.eni.enabled=true;
SET spark.adb.eni.vswitchId=<vswitch-id>;
SET spark.adb.eni.securityGroupId=<security-group-id>;
-- Create a temporary view mapped to the target table
CREATE TEMPORARY VIEW table_tmp
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:mysql://<cluster-internal-endpoint>:3306/spark_create_adb_db_test?useServerPrepStmts=false&rewriteBatchedStatements=true',
  dbtable 'spark_create_adb_db_test.test_adb_tbl',
  user '<database-account>',
  password '<password>'
);
-- Write data
INSERT INTO table_tmp VALUES (1, 'adb', 20);
-- Read data
SELECT * FROM table_tmp;
Parameter Description
spark.adb.eni.enabled Set to true to enable ENI-based access
spark.adb.eni.vswitchId ID of the vSwitch that the cluster belongs to
spark.adb.eni.securityGroupId ID of the security group for the cluster. Must be in the same VPC as the cluster
table_tmp Name of the temporary view. You can use any name
USING org.apache.spark.sql.jdbc Fixed value — specifies the JDBC data source
url JDBC connection string in the format jdbc:mysql://<endpoint>:3306/<db_name>?useServerPrepStmts=false&rewriteBatchedStatements=true. The useServerPrepStmts=false&rewriteBatchedStatements=true parameters are required for batch writes and improve write performance
dbtable Target table in the format <db_name>.<table_name>
user Database account for the cluster
password Password for the database account

Catalog method

Register the cluster as a named JDBC catalog in Spark, then query tables using the catalog prefix.

-- Enable ENI-based access
SET spark.adb.eni.enabled=true;
SET spark.adb.eni.vswitchId=<vswitch-id>;
SET spark.adb.eni.securityGroupId=<security-group-id>;
-- Register the JDBC catalog (catalog name "jdbc" is used as an example — you can customize it)
SET spark.sql.catalog.jdbc=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog;
SET spark.sql.catalog.jdbc.url=jdbc:mysql://<cluster-internal-endpoint>:3306/?useServerPrepStmts=false&rewriteBatchedStatements=true;
SET spark.sql.catalog.jdbc.user=<database-account>;
SET spark.sql.catalog.jdbc.password=<password>;
-- Switch to the catalog and query
use jdbc;
SELECT * FROM spark_create_adb_db_test.test_adb_tbl;
Parameter Description
spark.sql.catalog.jdbc Fixed value: org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog. The jdbc part of the parameter name is the catalog name and can be customized
spark.sql.catalog.jdbc.url JDBC URL in the format jdbc:mysql://<endpoint>:3306/?useServerPrepStmts=false&rewriteBatchedStatements=true
spark.sql.catalog.jdbc.user Database account for the cluster
spark.sql.catalog.jdbc.password Password for the database account

Spark parameters

Configure the following parameters to tune read and write performance.

Parameter Description Default
spark.adb.write.batchSize Number of records written per batch. Must be a positive integer. Applies to real-time mode only 600
spark.adb.write.arrow.maxMemoryBufferSize Maximum memory buffer size for write operations, in MB. Must be a positive integer. Applies to batch mode only 1024
spark.adb.write.arrow.maxRecordSizePerBatch Maximum number of records per batch write. Must be a positive integer. Applies to batch mode only 500

What's next