Common questions about writing and querying data in AnalyticDB for MySQL.
Unless a product edition is specified, questions on this page apply only to the Data Warehouse Edition.
Hudi tables
Do Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters support querying Hudi tables using JDBC?
Yes. After you create a Hudi table in an Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster, you can query it directly over JDBC.
Can Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters read data from Hudi tables in OSS?
Yes. Use an external table to read Hudi data from OSS. For details, see Import data using an external table in a Data Lakehouse Edition cluster.
XIHE jobs
Do Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters support automatic switchover between XIHE MPP and XIHE BSP jobs?
No. When submitting a job, manually specify whether to submit it to an Interactive resource group (XIHE Massively Parallel Processing (MPP)) or a Job resource group (XIHE BSP).
How do I choose between XIHE MPP and XIHE BSP?
The key difference is whether the client must wait for the query to finish: XIHE MPP runs queries synchronously and returns results to the client, while XIHE BSP submits jobs asynchronously by default.
Use XIHE BSP (asynchronous submission) for long-running, compute-heavy queries that produce small result sets, such as INSERT INTO SELECT, INSERT OVERWRITE SELECT, and CREATE TABLE AS SELECT.
XIHE BSP asynchronous submission has the following limits:
Result sets are capped at 10,000 rows.
Result sets, including CSV download links, are retained for up to 30 days. A maximum of 1,000 result sets can be stored at a time.
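As an illustrative sketch, a long-running ETL statement can be submitted asynchronously with the SUBMIT JOB syntax, which returns a job ID immediately instead of blocking the client. The tables target_table and source_table are hypothetical placeholders:

```sql
-- Submit the INSERT OVERWRITE asynchronously as a XIHE BSP job.
-- The statement returns a job ID right away; use the job ID to
-- poll the job status later instead of holding the connection open.
SUBMIT JOB INSERT OVERWRITE target_table
SELECT * FROM source_table;
```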
How do I check the status of a XIHE BSP job?
Submitted from Job Editor (SQL Development page): View the status on the Execution Records tab.
Submitted from another client: Query the built-in memory table:
SELECT status FROM information_schema.kepler_meta_elastic_job_list WHERE process_id='<job_id>';
To get a status summary across all XIHE BSP jobs:
SELECT status, count(*) FROM information_schema.kepler_meta_elastic_job_list GROUP BY status;
Resource isolation
How do I isolate resources to reduce interference between SQL tasks?
Create separate resource groups and route SQL jobs to the appropriate group. Data Warehouse Edition clusters in elastic mode and Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters all support resource groups.
For Data Warehouse Edition resource groups, see Resource groups for Data Warehouse Edition.
For Data Lakehouse Edition resource groups, see Resource groups for Data Lakehouse Edition.
Query errors
"Query exceeded maximum time limit of 1800000.00ms"
The query exceeded the default timeout of 1,800,000 ms. Increase the timeout for a single query or for all queries in the cluster:
Single query:
/*+ QUERY_TIMEOUT=xxx */ SELECT count(*) FROM t;
All queries in the cluster:
SET ADB_CONFIG QUERY_TIMEOUT=xxx;
For a full list of configurable parameters, see Common configuration parameters.
"Out of Memory Pool size pre cal. available 0 require 3"
A large SQL statement is consuming excessive memory in the cluster. Wait a few minutes and run the query again.
How do I resolve the "STORAGE_INDEX_ERROR, msg:Index does not exist" error when I query data?
Cause: This error is caused by a known bug in some earlier kernel versions of AnalyticDB for MySQL. Under specific conditions, this bug can cause inconsistencies in a table's index metadata. As a result, the index cannot be found during a query, which triggers a STORAGE_INDEX_ERROR exception.
Solution: Upgrade the kernel version of your cluster. The bug is fixed in later kernel versions.
Upgrade duration: A kernel version upgrade typically takes about 30 minutes.
Service interruption: During the upgrade, your cluster experiences a transient disconnection that lasts for a few seconds.
Recommendation: We strongly recommend that you perform the upgrade during a planned maintenance window or off-peak hours. Make sure your application has an automatic reconnection mechanism to handle the transient disconnection smoothly.
"multi-statement be found" when running multiple SQL statements consecutively
The multi-statement feature requires kernel version 3.1.9.3 or later. Check your cluster's kernel version first. If the version is earlier than 3.1.9.3, contact technical support to upgrade.
If the kernel version meets the requirement but the error persists, the feature may not be enabled on the client side. For MySQL JDBC clients, both of the following are required:
Run SET ADB_CONFIG ALLOW_MULTI_QUERIES=true; to enable multi-statement support on the server.
Set the allowMultiQueries JDBC connection property to true, for example: jdbc:mysql://<host>:3306/<database>?allowMultiQueries=true
"Column 'XXX' not in GROUP BY clause" when running SELECT * FROM TABLE GROUP BY KEY
SELECT * FROM table GROUP BY key is not supported. List all fields explicitly:
-- Before (unsupported)
SELECT * FROM nation GROUP BY nation.nationkey;
-- After (correct)
SELECT nation.name FROM nation GROUP BY nation.nationkey;
Query behavior
Why do my query results fluctuate even though the data hasn't changed?
Two common causes:
LIMIT without ORDER BY: AnalyticDB for MySQL is a distributed database that runs queries across multiple nodes in parallel. If enough threads return rows to satisfy the LIMIT condition, the query stops early — which nodes respond first is non-deterministic, so results vary. Add an ORDER BY clause to get stable results:
-- Before (non-deterministic results)
SELECT * FROM orders LIMIT 100;
-- After (stable results)
SELECT * FROM orders ORDER BY order_id LIMIT 100;
Aggregate query with an ungrouped field: If a SELECT clause includes a field that has no aggregate function and is not in the GROUP BY clause, the database returns a random value for that field.
If neither cause applies, contact technical support.
Why does an ORDER BY query on a single table take a long time?
Data in AnalyticDB for MySQL is distributed across shards with no guaranteed sort order, so an ORDER BY query must scan a large amount of data before it can sort the results.
Add a clustered index on the field used in the ORDER BY clause. After adding the index, data is pre-sorted at the storage layer, so ORDER BY queries read far less data.
For the full syntax, see Add a clustered index.
A table supports only one clustered index. Remove the existing clustered index before adding a new one.
Adding a clustered index to a large table increases the time required for BUILD tasks and raises CPU utilization on storage nodes.
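As a sketch of the recommendation above (the table orders, column order_time, and index name idx_order_time are hypothetical; see Add a clustered index for the authoritative syntax):

```sql
-- Pre-sort storage-layer data by order_time so that
-- ORDER BY order_time queries read far fewer rows.
ALTER TABLE orders ADD CLUSTERED INDEX idx_order_time (order_time);
```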
Why is the sum of table scan rows in the execution plan different from the total scanned rows in the query?
This discrepancy is caused by replicated tables. AnalyticDB for MySQL stores a copy of each replicated table in every shard, so scan counts are tallied once per shard, inflating the total.
Why does INSERT OVERWRITE on a table without a primary key create duplicate data?
AnalyticDB for MySQL does not automatically deduplicate tables that lack a primary key.
Why is data in the original table not overwritten after INSERT OVERWRITE SELECT?
INSERT OVERWRITE overwrites data partition by partition — it replaces existing partitions with the partitions produced by the SELECT result. If the SELECT result is empty, no new partition is produced, and the existing data is left untouched.
Make sure the SELECT statement returns at least one row so that the target partition is replaced.
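One way to guard against an empty SELECT is to verify the source row count before overwriting. The table and partition names below are hypothetical:

```sql
-- Step 1: confirm the source actually produces rows for the partition
SELECT count(*) FROM source_table WHERE dt = '2024-01-01';

-- Step 2: only if the count is greater than 0, run the overwrite,
-- so the target partition is guaranteed to be replaced
INSERT OVERWRITE target_table
SELECT * FROM source_table WHERE dt = '2024-01-01';
```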
Why do truncated time values appear in query results?
First, verify the results using a MySQL client. If the MySQL client shows the correct time values, the truncation is caused by post-processing in the other client you are using — check that client's result handling configuration.
IN operator
How do I handle queries with a large number of IN conditions?
The default limit is 2,000 IN conditions. Increase it as needed, up to a maximum of 5,000:
SET ADB_CONFIG max_in_items_count=3000;
Exceeding 5,000 IN conditions degrades performance.
Is there a limit on IN operator values when query results are returned in JSON format?
For clusters with kernel version 3.1.4 or earlier, the IN operator is limited to 16 values in JSON result mode. Clusters with kernel version later than 3.1.4 have no limit. To check your kernel version, see How do I view the kernel version of a cluster?
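If your cluster supports it, you can also query the kernel version directly from a SQL client (the adb_version function is assumed here; fall back to the linked FAQ entry if it is unavailable):

```sql
-- Returns the cluster's kernel (minor) version string
SELECT adb_version();
```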
Functions and encryption
How do I resolve errors with the AES_ENCRYPT function?
The first argument to AES_ENCRYPT(varbinary x, varchar y) must be of type VARBINARY. Cast string literals explicitly:
-- Before (causes an error)
SELECT CONVERT(AES_DECRYPT(AES_ENCRYPT('ABC123', 'key_string'), 'key_string'), char(10));
-- After (correct)
SELECT CONVERT(AES_DECRYPT(AES_ENCRYPT(CAST('ABC123' AS VARBINARY), 'key_string'), 'key_string'), char(10));
Feature support
Can I use GZIP-compressed CSV files in OSS as an external data source?
Yes. Add compress_type=gzip to the external table definition. For the full syntax, see Non-partitioned OSS external tables.
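A sketch of a non-partitioned OSS external table with GZIP compression follows; the endpoint, bucket URL, credentials, and column list are placeholders, so verify the exact TABLE_PROPERTIES keys against Non-partitioned OSS external tables:

```sql
-- External table over a GZIP-compressed CSV file in OSS.
-- Replace the endpoint, url, and credential placeholders with real values.
CREATE TABLE oss_csv_gzip (
  id INT,
  name VARCHAR
) ENGINE = 'OSS'
TABLE_PROPERTIES = '{
  "endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
  "url": "oss://my-bucket/path/data.csv.gz",
  "accessid": "<access_key_id>",
  "accesskey": "<access_key_secret>",
  "delimiter": ",",
  "compress_type": "gzip"
}';
```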
Does AnalyticDB for MySQL support INSERT ON DUPLICATE KEY?
Yes, but only value updates are supported. Arithmetic expressions in the UPDATE clause are not.
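For example (the table t and its columns are hypothetical), assigning a plain value in the UPDATE clause is supported, while an arithmetic expression is not:

```sql
-- Supported: plain value assignment on duplicate key
INSERT INTO t (id, cnt) VALUES (1, 10)
ON DUPLICATE KEY UPDATE cnt = 10;

-- Not supported: arithmetic expression in the UPDATE clause
-- INSERT INTO t (id, cnt) VALUES (1, 10)
-- ON DUPLICATE KEY UPDATE cnt = cnt + 1;
```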
Does AnalyticDB for MySQL support JOIN in UPDATE statements?
This feature requires kernel version 3.1.6.4 or later. For syntax details, see UPDATE.
Can I set variables in SQL?
No. AnalyticDB for MySQL does not support setting variables in SQL.
Can I use the Logstash plugin to bulk insert data with INSERT ON DUPLICATE KEY UPDATE?
Yes. When bulk inserting with INSERT ON DUPLICATE KEY UPDATE, add ON DUPLICATE KEY UPDATE only after the last VALUES() clause — not after each one.
For example, to insert three records into the student_course table:
INSERT INTO student_course(`id`, `user_id`, `nc_id`, `nc_user_id`, `nc_commodity_id`, `course_no`, `course_name`, `business_id`)
VALUES
(277943, 11056941, '1001EE1000000043G2T5', '1001EE1000000043G2TO', '1001A5100000003YABO2', 'kckm303', 'Industrial Accounting Practice V9.0--77', 'accounting'),
(277944, 11056943, '1001EE1000000043G2T5', '1001EE1000000043G2TO', '1001A5100000003YABO2', 'kckm303', 'Industrial Accounting Practice V9.0--88', 'accounting'),
(277945, 11056944, '1001EE1000000043G2T5', '1001EE1000000043G2TO', '1001A5100000003YABO2', 'kckm303', 'Industrial Accounting Practice V9.0--99', 'accounting')
ON DUPLICATE KEY UPDATE
course_name = 'Industrial Accounting Practice V9.0--77',
business_id = 'accounting';
Built-in dataset
What are the prerequisites for loading a built-in dataset?
At least 24 AnalyticDB Compute Units (ACUs) of reserved storage resources.
The user_default resource group must have at least 16 ACUs of reserved computing resources.
How do I check if a built-in dataset loaded successfully?
On the Job Development > SQL Development page, the dataset is loaded successfully when:
The Load Built-in Dataset button is grayed out.
The ADB_SampleData_TPCH database and its tables appear on the Databases and Tables tab.
What do I do if loading a built-in dataset fails or takes too long?
Run DROP TABLE table_name; to delete all tables in the ADB_SampleData_TPCH database.
Run DROP DATABASE ADB_SampleData_TPCH; to delete the database.
Load the dataset again.
How do I grant a standard database account access to the built-in dataset?
The built-in dataset follows AnalyticDB for MySQL's standard permission management. A privileged account must explicitly grant access to the standard account:
GRANT SELECT ON ADB_SampleData_TPCH.* TO <user_name>;
How do I test the built-in dataset after loading?
AnalyticDB for MySQL provides sample query scripts by default. Run them from the Scripts tab on the SQL Development page. For the full list of queries, see TPC-H test dataset.
Run only SELECT queries on the ADB_SampleData_TPCH database. DDL or DML changes may corrupt the dataset loading state. If the status becomes abnormal, delete the database and reload the dataset.
How are the computing resources used by a XIHE BSP job calculated in a Data Lakehouse Edition (V3.0) cluster?
When you run a XIHE BSP job in an Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster, the resources used are divided into two parts:
AppMaster node: Occupies 1 ACU by default. This cannot be changed.
Worker nodes: Node specifications × Number of workers.
For example, with the system default configurations, a single query uses 2 normal-spec workers of 4 ACUs each, so the total is: 1 ACU (AppMaster) + 4 ACUs × 2 (workers) = 9 ACUs.