This guide shows how to write a Java program that reads data from a CSV file and imports it into PolarDB-X using batch inserts via JDBC (Java Database Connectivity).
Prerequisites
Before you begin, make sure you have:
A running PolarDB-X instance
Java Development Kit (JDK) 8 or later
mysql-connector-java-8.0.20.jar — download from Maven Central
A MySQL database with the source data you want to import
How it works
The import process follows three steps:
Export data from the source MySQL database to a CSV file.
Create the destination table in PolarDB-X.
Run a Java program that reads the CSV file and inserts rows into PolarDB-X in batches.
The example in this guide uses the following source table schema:
CREATE TABLE `sourcedb` (
`id` int(11) NOT NULL,
`k` int(11) NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4 dbpartition by hash(`id`);
Export data from the source database
Run the following commands to export data from a MySQL database. You can export as tab-separated text first, then convert to CSV:
mysql -h <host> -P <port> -u <username> -p<password> <db_name> -N \
-e "SELECT id,k,c,pad FROM sourcedb;" > /home/data_1000w.txt
The values in the exported file are separated by tab characters:
188092293 27267211 59775766593-64673028018-...-09474402685 01705051424-...-54211554755
To convert to CSV format, run:
mysql -h <host> -P <port> -u <username> -p<password> <db_name> -N \
-e "SELECT id,k,c,pad FROM sourcedb;" | sed 's/\t/,/g' > /home/data_1000w.csv
CSV is the recommended format because it is easier to parse in the import program than tab-separated text. The resulting file looks like:
188092293,27267211,59775766593-64673028018-...-09474402685,01705051424-...-54211554755
Replace <host>, <port>, <username>, <password>, and <db_name> with your actual MySQL connection details.
Create the destination table in PolarDB-X
The CSV file does not contain a CREATE TABLE statement, so create the destination table in PolarDB-X before importing:
CREATE TABLE `targetdb` (
`id` int(11) NOT NULL,
`k` int(11) NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4 dbpartition by hash(`id`);
Import data with a Java program
The program below reads the CSV file line by line and inserts rows into PolarDB-X in batches of 1,000 using JDBC PreparedStatement.
Why batch inserts matter
By default, JDBC sends each INSERT as a separate statement:
INSERT INTO targetdb(id, k, c, pad) VALUES(1, 10, 'aaa', 'bbb');
INSERT INTO targetdb(id, k, c, pad) VALUES(2, 20, 'ccc', 'ddd');
Setting rewriteBatchedStatements=true in the JDBC URL causes the driver to rewrite batched statements into a single multi-value INSERT, which is significantly faster:
INSERT INTO targetdb(id, k, c, pad) VALUES(1, 10, 'aaa', 'bbb'), (2, 20, 'ccc', 'ddd');
Key JDBC parameters
| Parameter | Effect | Recommended value |
|---|---|---|
| rewriteBatchedStatements | Rewrites batched INSERT calls into a single multi-value statement | true |
| autoReconnect | Reconnects automatically if the connection drops during a long import | true |
| socketTimeout | Sets the socket timeout (milliseconds) for long-running batch operations | 600000 (10 min) |
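Combining the three parameters, the full JDBC URL passed to DriverManager looks like this (host, port, and database name are placeholders to replace):

```
jdbc:mysql://<host>:3306/<db_name>?rewriteBatchedStatements=true&autoReconnect=true&socketTimeout=600000
```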
Example program
Compile with: javac LoadData.java
Run with: java -cp .:mysql-connector-java-8.0.20.jar LoadData
// Requires mysql-connector-java-8.0.20.jar
// Download: https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.20/mysql-connector-java-8.0.20.jar
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
public class LoadData {
public static void main(String[] args) throws IOException, ClassNotFoundException, SQLException {
File dataFile = new File("/home/data_1000w.csv");
String sql = "INSERT INTO targetdb(id, k, c, pad) VALUES(?, ?, ?, ?)";
int batchSize = 1000;
try (
Connection connection = getConnection("<host>", 3306, "<db_name>", "<username>", "<password>");
BufferedReader br = new BufferedReader(new FileReader(dataFile));
// Declare the statement as a resource too, so it is always closed
PreparedStatement st = connection.prepareStatement(sql)
) {
long startTime = System.currentTimeMillis();
int batchCount = 0;
String line;
while ((line = br.readLine()) != null) {
String[] data = line.split(",");
st.setInt(1, Integer.valueOf(data[0]));
st.setInt(2, Integer.valueOf(data[1]));
st.setString(3, data[2]);
st.setString(4, data[3]);
st.addBatch();
if (++batchCount % batchSize == 0) {
st.executeBatch();
System.out.println(String.format("Inserted %d records", batchCount));
}
}
// Flush remaining rows
if (batchCount % batchSize != 0) {
st.executeBatch();
}
long elapsed = System.currentTimeMillis() - startTime;
// Use fractional seconds and guard against elapsed < 1000 ms,
// which would cause integer division by zero
double seconds = Math.max(elapsed, 1) / 1000.0;
System.out.println(String.format(
"Done: %d records in %.1f seconds (TPS: %.0f)",
batchCount, seconds, batchCount / seconds
));
}
}
/**
* Opens a JDBC connection to PolarDB-X.
*
* @param host IP address of the PolarDB-X instance
* @param port Port number (default: 3306)
* @param database Database name
* @param username Login username
* @param password Login password
*/
private static Connection getConnection(String host, int port, String database,
String username, String password)
throws ClassNotFoundException, SQLException {
Class.forName("com.mysql.cj.jdbc.Driver");
String url = String.format(
"jdbc:mysql://%s:%d/%s?autoReconnect=true&socketTimeout=600000&rewriteBatchedStatements=true",
host, port, database
);
return DriverManager.getConnection(url, username, password);
}
}
Replace the following placeholders with your PolarDB-X connection details:
| Placeholder | Description | Example |
|---|---|---|
| <host> | IP address of the PolarDB-X instance | 192.168.1.100 |
| <db_name> | Target database name | mydb |
| <username> | Login username | admin |
| <password> | Login password | — |
Optimize performance
Two adjustments have the most impact on import throughput:
Batch size: A batch size of 1,000 works well for most workloads. Increase it (for example, 5,000) if network latency is low and rows are small; decrease it if you encounter memory or timeout errors.
Multithreading: Partition the CSV file into segments and process each segment in a separate thread with its own JDBC connection. This is the most effective way to saturate PolarDB-X write capacity.
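The multithreaded approach can be sketched as follows. This is an illustrative example, not part of the original program: the ParallelLoad class name, the partition helper, and the four-thread pool size are assumptions, and the connection placeholders must be replaced as before. Each worker runs the same batched-insert loop as LoadData over its own segment, on its own connection.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: split the CSV into contiguous segments and import
// each segment on its own thread, each holding its own JDBC connection.
public class ParallelLoad {

    // Split `lines` into `n` contiguous, nearly equal segments.
    static List<List<String>> partition(List<String> lines, int n) {
        List<List<String>> parts = new ArrayList<>();
        int size = lines.size(), base = size / n, rem = size % n, start = 0;
        for (int i = 0; i < n; i++) {
            int len = base + (i < rem ? 1 : 0);
            parts.add(new ArrayList<>(lines.subList(start, start + len)));
            start += len;
        }
        return parts;
    }

    // Import one segment with the same batched-insert loop as LoadData.
    static void importSegment(List<String> segment) {
        String url = "jdbc:mysql://<host>:3306/<db_name>"
                + "?rewriteBatchedStatements=true&autoReconnect=true&socketTimeout=600000";
        String sql = "INSERT INTO targetdb(id, k, c, pad) VALUES(?, ?, ?, ?)";
        try (Connection conn = DriverManager.getConnection(url, "<username>", "<password>");
             PreparedStatement st = conn.prepareStatement(sql)) {
            int count = 0;
            for (String line : segment) {
                String[] data = line.split(",");
                st.setInt(1, Integer.parseInt(data[0]));
                st.setInt(2, Integer.parseInt(data[1]));
                st.setString(3, data[2]);
                st.setString(4, data[3]);
                st.addBatch();
                if (++count % 1000 == 0) {
                    st.executeBatch();
                }
            }
            st.executeBatch(); // flush the final partial batch
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        Path csv = Paths.get("/home/data_1000w.csv");
        if (!Files.exists(csv)) return; // nothing to import
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (List<String> segment : partition(Files.readAllLines(csv), 4)) {
            pool.submit(() -> importSegment(segment));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

Because every row carries its own complete values, contiguous segments can be inserted in any order; no coordination between threads is needed beyond waiting for the pool to finish.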