When you need to migrate millions of rows from a MySQL database into a Distributed Relational Database Service (DRDS) database, loading data directly through a JDBC program gives you full control over throughput and batch size. This guide walks you through exporting the source data to a CSV file, creating the target table with a shard key, and running a Java batch-insert program to load the data.
Reference benchmark: 8,143,801 rows inserted in ~916 seconds (~9,000 TPS) on a client with an i5 CPU, 8 GB RAM, and SSD storage, against a 4-core 4 GB DRDS instance.
This approach is suited for one-time offline imports. For ongoing incremental replication, use Data Transmission Service (DTS).
Prerequisites
Before you begin, ensure that you have:
Access to the source MySQL database with
SELECT ... INTO OUTFILEprivilegeA running DRDS instance with credentials (host, port, database name, username, and password)
Java and the MySQL Connector/J JDBC driver configured in your project
How it works
Export the source table to a CSV file using
SELECT INTO OUTFILE.Create the target table in DRDS with a shard key that matches your access patterns.
Run a Java program that reads the CSV and inserts rows in batches using
PreparedStatement.executeBatch().
Export source data
Export the source table to a CSV file on the MySQL server's local filesystem. Run the following on the MySQL client:
SELECT postingType, id, acceptedAnswer, parentId, score, tags
INTO OUTFILE '/tmp/stackoverflow.csv'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM post;The output file does not include the table schema. Create the target table in DRDS separately in the next step.
Create the target table in DRDS
Create the table in DRDS and choose a shard key. The shard key determines how DRDS distributes rows across shards. Choose a column with high cardinality and even distribution — id is a good default for this schema.
CREATE TABLE `post` (
`postingType` int NOT NULL,
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`acceptedAnswer` bigint(20) DEFAULT NULL,
`parentId` bigint(20) DEFAULT NULL,
`score` int DEFAULT NULL,
`tags` varchar(128) DEFAULT NULL,
PRIMARY KEY (`id`)
) DBPARTITION BY hash(id) ENGINE=InnoDB DEFAULT CHARSET=utf8;DBPARTITION BY hash(id) partitions the data across database shards by hashing the id column.
Import data with a Java program
Understand JDBC batch insert behavior
By default, MySQL Connector/J sends each addBatch() call as a separate INSERT statement, even when you call executeBatch(). Setting rewriteBatchedStatements=true rewrites those into a single multi-row INSERT, which significantly reduces network round trips.
Without rewriteBatchedStatements=true, the driver sends one statement per row:
INSERT INTO post (postingType, id, ...) VALUES (1, 101, ...);
INSERT INTO post (postingType, id, ...) VALUES (2, 102, ...);
INSERT INTO post (postingType, id, ...) VALUES (3, 103, ...);With rewriteBatchedStatements=true, the driver combines them:
INSERT INTO post (postingType, id, ...) VALUES (1, 101, ...), (2, 102, ...), (3, 103, ...);The following table describes the JDBC connection parameters used in this example:
| Parameter | Description | Value |
|---|---|---|
autoReconnect | Reconnects automatically after a dropped connection | true |
socketTimeout | Prevents the connection from hanging indefinitely (milliseconds) | 600000 (10 minutes) |
rewriteBatchedStatements | Rewrites addBatch() calls into a single multi-row INSERT | true |
Run the import program
Add stackoverflow.csv to your project's classpath resources directory, then run the following Java program:
import java.io.*;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.sql.*;
public class Main {
public static void main(String[] args)
throws IOException, URISyntaxException, ClassNotFoundException, SQLException {
// Load the CSV file from the classpath
URL url = Main.class.getClassLoader().getResource("stackoverflow.csv");
File dataFile = new File(url.toURI());
String sql = "INSERT INTO post (postingType, id, acceptedAnswer, parentId, score, tags) "
+ "VALUES (?, ?, ?, ?, ?, ?)";
int batchSize = 10000;
try (
Connection connection = getConnection(
"<drds-host>", // Replace with your DRDS endpoint
3306,
"<database-name>", // Replace with your database name
"<username>", // Replace with your username
"<password>" // Replace with your password
);
BufferedReader br = new BufferedReader(new FileReader(dataFile))
) {
PreparedStatement st = connection.prepareStatement(sql);
long startTime = System.currentTimeMillis();
int batchCount = 0;
String line;
while ((line = br.readLine()) != null) {
String[] data = line.split(",");
st.setInt(1, Integer.valueOf(data[0]));
st.setInt(2, Integer.valueOf(data[1]));
st.setObject(3, "".equals(data[2]) ? null : Integer.valueOf(data[2]));
st.setObject(4, "".equals(data[3]) ? null : Integer.valueOf(data[3]));
st.setObject(5, "".equals(data[4]) ? null : Integer.valueOf(data[4]));
if (data.length >= 6) {
st.setObject(6, data[5]);
}
st.addBatch();
// Flush every batchSize rows to keep memory usage bounded
if (++batchCount % batchSize == 0) {
st.executeBatch();
System.out.printf("Inserted %d rows%n", batchCount);
}
}
// Flush remaining rows
if (batchCount % batchSize != 0) {
st.executeBatch();
}
long elapsed = System.currentTimeMillis() - startTime;
System.out.printf(
"Done: %d rows in %d seconds (~%d TPS)%n",
batchCount, elapsed / 1000, batchCount / (elapsed / 1000)
);
}
}
/**
* Creates a JDBC connection to the DRDS database.
*
* @param host DRDS endpoint (for example, xxxxx.drds.aliyuncs.com)
* @param port Port number (default: 3306)
* @param database Database name
* @param username Username
* @param password Password
*/
private static Connection getConnection(
String host, int port, String database, String username, String password)
throws ClassNotFoundException, SQLException {
Class.forName("com.mysql.jdbc.Driver");
String url = String.format(
"jdbc:mysql://%s:%d/%s?autoReconnect=true&socketTimeout=600000&rewriteBatchedStatements=true",
host, port, database
);
return DriverManager.getConnection(url, username, password);
}
}Replace the following placeholders with your actual values:
| Placeholder | Description | Example |
|---|---|---|
<drds-host> | DRDS endpoint | xxxxx.drds.aliyuncs.com |
<database-name> | Target database name | mydb |
<username> | Database username | root |
<password> | Database password | — |
Tune batch size for your workload
The example uses a batch size of 10,000 rows. A larger batch size reduces the number of round trips but increases memory usage. Start with 5,000–10,000 rows per batch and adjust based on observed TPS and heap usage.
Verify the import
After the program finishes, confirm the row count on the DRDS database:
SELECT COUNT(*) FROM post;Compare the result with the row count from the source table to confirm a complete import.
What's next
For ongoing incremental replication instead of a one-time bulk import, use Data Transmission Service (DTS).