High-frequency data writes—such as logging pipelines or IoT event streams—create two problems when each write is a separate import: each write incurs its own transaction overhead (SQL parsing, plan generation), and each write creates a new table version, accelerating compaction pressure in the background. The group commit feature solves both by merging multiple INSERT INTO VALUES, Stream Load, or HTTP Stream imports server-side into a single internal transaction, reducing I/O overhead and improving write throughput without requiring any client-side batching logic.
How it works
Group commit is not a separate import method. It intercepts qualifying INSERT INTO VALUES, Stream Load, and HTTP Stream requests and batches them into one internal commit. Auto commit is triggered when either threshold is reached:
The commit interval elapses (default: 10 seconds)
The accumulated data size reaches the limit (default: 64 MB)
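For intuition, the two triggers above can be sketched as a small accumulator model. This is illustrative only, not SelectDB's internal code; the class and field names are invented here, and only the default thresholds come from this section:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of group commit's two auto-commit triggers:
// a batch flushes when either the commit interval elapses or the
// accumulated data size reaches the limit.
class GroupCommitModel {
    static final long COMMIT_INTERVAL_MS = 10_000;          // default: 10 seconds
    static final long MAX_BATCH_BYTES = 64L * 1024 * 1024;  // default: 64 MB

    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private long batchStartMs;

    GroupCommitModel(long nowMs) {
        this.batchStartMs = nowMs;
    }

    /** Accepts one write; returns true if it triggered a flush. */
    boolean write(byte[] row, long nowMs) {
        pending.add(row);
        pendingBytes += row.length;
        if (nowMs - batchStartMs >= COMMIT_INTERVAL_MS || pendingBytes >= MAX_BATCH_BYTES) {
            flush(nowMs);
            return true;
        }
        return false;
    }

    private void flush(long nowMs) {
        // In the real system, one internal transaction commits all pending rows here.
        pending.clear();
        pendingBytes = 0;
        batchStartMs = nowMs;
    }
}
```

Because either condition forces a flush, a trickle of small writes still becomes visible within one interval, while a burst of large writes commits early on the size threshold.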
Three modes control this behavior:
| Mode | Behavior | Use when |
|---|---|---|
| `off_mode` | Group commit disabled. INSERT INTO VALUES, Stream Load, and HTTP Stream behave as normal. | Group commit is not needed |
| `sync_mode` | Batches multiple imports in one transaction based on load and the `group_commit_interval` table property. Returns after the transaction commits. Data is visible immediately. | High-concurrency writes that require immediate data visibility |
| `async_mode` | Writes data to write-ahead logging (WAL) logs first, then returns immediately. Commits asynchronously based on load and `group_commit_interval`. Data is visible after the commit. Automatically switches to `sync_mode` when large data volumes are detected. | High-frequency writes where low write latency is the priority |
Choosing between sync and async mode:
- Use `sync_mode` for high-concurrency scenarios where data must be visible immediately after import. It blocks until the transaction commits, so the returned result confirms data is persisted and immediately queryable.
- Use `async_mode` when write latency is the highest priority. The server acknowledges the import as soon as data is written to WAL, without waiting for the internal commit to complete. If the internal commit fails, WAL logs are used to recover the data. Data is not visible immediately after the import returns.
Create the example table
The examples in this topic use the following table:
```sql
CREATE TABLE `dt` (
    `id` int(11) NOT NULL,
    `name` varchar(50) NULL,
    `score` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 1;
```

Import data using JDBC
ApsaraDB for SelectDB supports the prepared statement feature of MySQL over Java Database Connectivity (JDBC). When you use prepared statements, SQL statements and their import plans are cached in session-level memory, reducing CPU overhead on repeated inserts.
Add the MySQL connector dependency to your project:
```xml
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.49</version>
</dependency>
```

Build the JDBC URL with server-side prepared statements enabled:

```
jdbc:mysql://selectdb-cn-****.selectdbfe.rds.aliyuncs.com:9030/db?useServerPrepStmts=true
```

Set the `group_commit` session variable. Use either approach:

- Append it to the JDBC URL:

  ```
  jdbc:mysql://selectdb-cn-****.selectdbfe.rds.aliyuncs.com:9030/db?useServerPrepStmts=true&sessionVariables=group_commit=async_mode
  ```

- Execute a SQL statement at connection time:

  ```java
  try (Statement statement = conn.createStatement()) {
      statement.execute("SET group_commit = async_mode;");
  }
  ```
Use a prepared statement to insert rows:
```java
private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
private static final String URL_PATTERN = "jdbc:mysql://%s:%d/%s?useServerPrepStmts=true";
private static final String HOST = "selectdb-cn-****.selectdbfe.rds.aliyuncs.com";
private static final int PORT = 9030;
private static final String DB = "db";
private static final String TBL = "dt";
private static final String USER = "admin";
private static final String PASSWD = "***";
private static final int INSERT_BATCH_SIZE = 10;

public static void main(String[] args) throws Exception {
    groupCommitInsert();
    // groupCommitInsertBatch();
}

private static void groupCommitInsert() throws Exception {
    Class.forName(JDBC_DRIVER);
    try (Connection conn = DriverManager.getConnection(
            String.format(URL_PATTERN, HOST, PORT, DB), USER, PASSWD)) {
        // Set the session variable 'group_commit'.
        try (Statement statement = conn.createStatement()) {
            statement.execute("SET group_commit = async_mode;");
        }
        String query = "INSERT INTO " + TBL + " VALUES(?, ?, ?)";
        try (PreparedStatement stmt = conn.prepareStatement(query)) {
            for (int i = 0; i < INSERT_BATCH_SIZE; i++) {
                stmt.setInt(1, i);
                stmt.setString(2, "name" + i);
                stmt.setInt(3, i + 10);
                int result = stmt.executeUpdate();
                System.out.println("rows: " + result);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

private static void groupCommitInsertBatch() throws Exception {
    Class.forName(JDBC_DRIVER);
    // Add rewriteBatchedStatements=true and cachePrepStmts=true to the JDBC URL,
    // and set the session variable with sessionVariables=group_commit=async_mode.
    try (Connection conn = DriverManager.getConnection(
            String.format(URL_PATTERN
                    + "&rewriteBatchedStatements=true&cachePrepStmts=true&sessionVariables=group_commit=async_mode",
                    HOST, PORT, DB), USER, PASSWD)) {
        String query = "INSERT INTO " + TBL + " VALUES(?, ?, ?)";
        try (PreparedStatement stmt = conn.prepareStatement(query)) {
            for (int j = 0; j < 5; j++) {
                // 10 rows per batched insert.
                for (int i = 0; i < INSERT_BATCH_SIZE; i++) {
                    stmt.setInt(1, i);
                    stmt.setString(2, "name" + i);
                    stmt.setInt(3, i + 10);
                    stmt.addBatch();
                }
                int[] result = stmt.executeBatch();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```
Import data using INSERT INTO
Enable group commit by setting the group_commit session variable before running INSERT INTO statements.
Asynchronous mode — data is batched and committed in the background:
```sql
-- Enable group commit in async mode. The default value is off_mode.
mysql> SET group_commit = async_mode;

-- The returned label starts with "group_commit", confirming group commit is active.
mysql> INSERT INTO dt VALUES(1, 'Bob', 90), (2, 'Alice', 99);
Query OK, 2 rows affected (0.05 sec)
{'label':'group_commit_a145ce07f1c972fc-bd2c54597052a9ad', 'status':'PREPARE', 'txnId':'181508'}

-- Consecutive inserts sharing the same label and txnId are batched into one import job.
mysql> INSERT INTO dt(id, name) VALUES(3, 'John');
Query OK, 1 row affected (0.01 sec)
{'label':'group_commit_a145ce07f1c972fc-bd2c54597052a9ad', 'status':'PREPARE', 'txnId':'181508'}

-- Data is not visible immediately after the import returns.
mysql> SELECT * FROM dt;
Empty set (0.01 sec)

-- After ~10 seconds (controlled by group_commit_interval), data becomes visible.
mysql> SELECT * FROM dt;
+------+-------+-------+
| id   | name  | score |
+------+-------+-------+
|    1 | Bob   |    90 |
|    2 | Alice |    99 |
|    3 | John  |  NULL |
+------+-------+-------+
3 rows in set (0.02 sec)
```

Synchronous mode — returns only after the transaction commits; data is immediately visible:
```sql
-- Enable group commit in sync mode.
mysql> SET group_commit = sync_mode;

-- The commit interval is controlled by group_commit_interval. The call blocks until the transaction commits.
mysql> INSERT INTO dt VALUES(4, 'Bob', 90), (5, 'Alice', 99);
Query OK, 2 rows affected (10.06 sec)
{'label':'group_commit_d84ab96c09b60587_ec455a33cb0e9e87', 'status':'PREPARE', 'txnId':'3007', 'query_id':'fc6b94085d704a94-a69bfc9a202e66e2'}

-- Data is visible immediately.
mysql> SELECT * FROM dt;
+------+-------+-------+
| id   | name  | score |
+------+-------+-------+
|    1 | Bob   |    90 |
|    2 | Alice |    99 |
|    3 | John  |  NULL |
|    4 | Bob   |    90 |
|    5 | Alice |    99 |
+------+-------+-------+
5 rows in set (0.03 sec)
```

Disable group commit:

```sql
mysql> SET group_commit = off_mode;
```

Import data using Stream Load
For logging or HTTP-based pipelines, pass the group_commit header to activate group commit on Stream Load requests. For more information about Stream Load, see Stream Load.
Create a file named `data.csv`:

```
6,Amy,60
7,Ross,98
```

Run the import with the appropriate mode header.

Asynchronous mode:

```shell
# Pass group_commit:async_mode as a request header.
curl --location-trusted -u {user}:{passwd} -T data.csv \
    -H "group_commit:async_mode" \
    -H "column_separator:," \
    http://{selectdbHost}:{selectdbHttpPort}/api/db/dt/_stream_load
```

Expected response:

```json
{
    "TxnId": 7009,
    "Label": "group_commit_c84d2099208436ab_96e33fda01eddba8",
    "Comment": "",
    "GroupCommit": true,
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 2,
    "NumberLoadedRows": 2,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 19,
    "LoadTimeMs": 35,
    "StreamLoadPutTimeMs": 5,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 26
}
```

Synchronous mode:

```shell
# Pass group_commit:sync_mode as a request header.
curl --location-trusted -u {user}:{passwd} -T data.csv \
    -H "group_commit:sync_mode" \
    -H "column_separator:," \
    http://{selectdbHost}:{selectdbHttpPort}/api/db/dt/_stream_load
```

Expected response:

```json
{
    "TxnId": 3009,
    "Label": "group_commit_d941bf17f6efcc80_ccf4afdde9881293",
    "Comment": "",
    "GroupCommit": true,
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 2,
    "NumberLoadedRows": 2,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 19,
    "LoadTimeMs": 10044,
    "StreamLoadPutTimeMs": 4,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 10038
}
```

`"GroupCommit": true` in the response confirms that group commit is active. The label always starts with `group_commit`.
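The same async-mode Stream Load request can be constructed from Java using the JDK 11+ `java.net.http` API. This is a hedged sketch: host, port, and credentials are placeholders, and unlike curl's `--location-trusted`, the JDK client does not automatically re-send credentials if the frontend redirects the request, so redirect handling is left out here:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Base64;

// Builds (but does not send) a Stream Load request with the group_commit
// header set to async_mode. Stream Load uses HTTP PUT with basic auth.
class StreamLoadRequestDemo {
    static HttpRequest buildAsyncModeRequest(String host, int httpPort,
                                             String user, String passwd, String csv) {
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + passwd).getBytes());
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + host + ":" + httpPort + "/api/db/dt/_stream_load"))
                .header("Authorization", "Basic " + auth)
                .header("group_commit", "async_mode")   // activates group commit
                .header("column_separator", ",")
                .PUT(HttpRequest.BodyPublishers.ofString(csv))
                .build();
    }
}
```

To send it, pass the built request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and check `"GroupCommit": true` in the response body.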
Configure auto commit thresholds
Adjust the commit interval or data size thresholds per table using ALTER TABLE.
Commit interval
The default commit interval is 10 seconds.
```sql
-- Change the commit interval to 2 seconds.
ALTER TABLE dt SET ("group_commit_interval_ms" = "2000");
```

Trade-offs:
| Setting | Pros | Cons |
|---|---|---|
| Shorter interval (e.g., 2 seconds) | Lower data visibility latency | More frequent commits, faster version growth, higher background compaction pressure |
| Longer interval (e.g., 30 seconds) | Larger commit batches, lower system overhead | Higher data visibility latency |
Set the interval based on how much latency your application can tolerate between a write and when the data becomes queryable. If your system is under high compaction pressure, increase the interval.
Data size threshold
The default data size threshold for auto commit is 64 MB.
```sql
-- Change the data size threshold to 128 MB.
ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
```

Limitations
INSERT INTO VALUES degradation
When group commit is enabled, the following INSERT INTO VALUES statements are automatically downgraded to non-group-commit mode:
- The statement is inside an explicit transaction: `BEGIN; INSERT INTO VALUES; COMMIT;`
- A label is specified: `INSERT INTO dt WITH LABEL {label} VALUES`
- VALUES contains an expression, for example `INSERT INTO dt VALUES (1 + 100)`
- The statement performs a column update
- The target table does not support light schema changes
Stream Load and HTTP Stream degradation
The following Stream Load and HTTP Stream jobs are automatically downgraded to non-group-commit mode:
- A label is specified using `-H "label:my_label"`
- Two-phase commit (2PC) mode is used
- The job performs a column update
- The target table does not support light schema changes
Unique Key model
Group commit does not guarantee commit order with the Unique Key model. To ensure data consistency, use group commit together with a sequence column.
max_filter_ratio support
In standard import mode, max_filter_ratio determines whether an import commits based on the ratio of failed rows to total rows. In group commit mode, imports from multiple clients are merged into one internal import and committed once as a unit.
Group commit partially supports max_filter_ratio semantics: the semantics take effect only when the total number of imported rows does not exceed the value of the group_commit_memory_rows_for_max_filter_ratio backend (BE) configuration item. The default value is 10000.
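The rule above can be expressed as a small decision function. This is an illustrative model, not server code; in particular, the behavior when total rows exceed the limit (any filtered row fails the merged import) is an assumption made for this sketch:

```java
// Illustrative model of when max_filter_ratio takes effect under group commit:
// the ratio check applies only while total rows stay within the
// group_commit_memory_rows_for_max_filter_ratio BE config (default 10000).
class FilterRatioModel {
    static final long MEMORY_ROWS_LIMIT = 10_000; // BE config default

    static boolean commitAllowed(long totalRows, long failedRows, double maxFilterRatio) {
        if (totalRows > MEMORY_ROWS_LIMIT) {
            // Beyond the limit, max_filter_ratio semantics do not take effect.
            // Assumption in this sketch: any failed row then fails the import.
            return failedRows == 0;
        }
        return (double) failedRows / totalRows <= maxFilterRatio;
    }
}
```

The practical consequence: if your pipeline relies on max_filter_ratio to tolerate bad rows, keep batch sizes under the configured row limit or raise the BE setting.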
WAL behavior in async mode
In async_mode, each import is first written to WAL logs:
If the internal commit succeeds, WAL logs are deleted immediately.
If the internal commit fails, WAL logs are used to recover the data.
The system automatically switches from async_mode to sync_mode in the following situations to protect disk space:
Imported data occupies more than 80% of a single WAL directory.
A chunked Stream Load job is submitted with an unknown total data size.
The data amount is small, but available disk space is insufficient.
Schema changes
If a heavy schema change is in its final metadata modification phase, the system rejects new group commits to ensure WAL logs remain compatible with the table schema. Affected clients receive the exception:
```
insert table ${table_name} is blocked on schema change
```

Retry the import on the client when this exception occurs.
Light schema changes (add or delete columns, change VARCHAR length, rename columns) do not block group commits. All other schema changes are heavy schema changes.
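Since the schema-change rejection is transient, a thin client-side retry wrapper is usually enough. This is a hedged sketch: the attempt count, backoff, and message matching are illustrative choices, not SelectDB recommendations:

```java
import java.util.concurrent.Callable;

// Retries an import task when it fails with the transient
// "is blocked on schema change" rejection; rethrows anything else.
class SchemaChangeRetry {
    static final int MAX_ATTEMPTS = 3;
    static final long BACKOFF_MS = 100;

    static <T> T runWithRetry(Callable<T> importTask) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return importTask.call();
            } catch (Exception e) {
                String msg = e.getMessage();
                if (msg == null || !msg.contains("is blocked on schema change")) {
                    throw e; // not the transient schema-change rejection
                }
                last = e;
                Thread.sleep(BACKOFF_MS * attempt); // linear backoff before retrying
            }
        }
        throw last; // schema change still in progress after all attempts
    }
}
```

Wrap each INSERT or Stream Load call in `runWithRetry` so the brief metadata-modification window of a heavy schema change does not surface as a hard failure.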