
PolarDB:Import and export data using the Batch Tool

Last Updated: Mar 30, 2026

Batch Tool is a Java-based CLI utility developed by the PolarDB-X team for bulk data operations on PolarDB-X databases. Built on a producer/consumer model, it supports importing data from files, exporting data to files, and migrating data across MySQL, PolarDB-X 1.0, and PolarDB-X 2.0 data sources.

Prerequisites

Before you begin, make sure you have:

  • Java installed on your machine

  • Network access to the target PolarDB-X database

Download Batch Tool

Download the JAR package from the following link:

batch-tool.jar

To view all available options, run:

java -jar batch-tool.jar --help

Parameters

All examples in this document use UTF-8 encoding and commas (,) as field separators by default.

The table below lists all parameters. Required parameters vary by operation; see the Examples section for command patterns.

| Parameter | Long form | Description | Default |
|---|---|---|---|
| -o | --operation | Operation type: export, import, delete, or update | |
| -h | --host | Host IP address of the database | |
| -P | --port | Port number of the database | |
| -u | --user | Username | |
| -p | --password | Password | |
| -D | --database | Database name | |
| -t | --table | Target table name | |
| -s | --sep | Field separator (delimiter) | |
| -f | --file | Source file path(s), separated by semicolons | |
| -dir | --directory | Directory containing files to import | |
| -F | --filenum | Fixed number of exported files | |
| -L | --line | Maximum rows per exported file | |
| -w | --where | WHERE condition for filtering rows (e.g., col1>99 AND col2<100) | |
| -col | --columns | Target columns to export, separated by semicolons | |
| -format | --fileFormat | File format: NONE, TXT, CSV, XLS, or XLSX | NONE |
| -comp | --compress | Compression: NONE or GZIP | NONE |
| -encrypt | --encrypt | Encryption: NONE, AES, or SM4 | NONE |
| -key | --secretKey | Secret key used for encryption | |
| -DDL | --DDL | DDL mode: NONE, ONLY, or WITH | NONE |
| -cs | --charset | Character set of the files | |
| -header | --header | Treat the first line as column names | false |
| -quote | --quoteMode | Quote mode for field values: AUTO, FORCE, or NONE | AUTO |
| -lastSep | --withLastSep | Whether each line ends with the separator | false |
| -sharding | --sharding | Enable sharding mode | Depends on operation |
| -batchsize | --batchSize | Batch size for insert operations | |
| -pro | --producer | Number of producer threads | |
| -con | --consumer | Number of consumer threads | |
| -fcon | --forceConsumer | Force consumer parallelism | |
| -tps | --tpsLimit | Transactions per second (TPS) limit (-1 means no limit) | -1 |
| -readsize | --readSize | Read block size (MB) | |
| -ringsize | --ringSize | Ring buffer size (must be a power of 2) | |
| -maxConn | --maxConnection | Maximum connection count (Druid pool) | |
| -minConn | --minConnection | Minimum connection count (Druid pool) | |
| -maxWait | --connMaxWait | Maximum wait time when acquiring a connection (ms) | |
| -initSqls | --initSqls | Connection initialization SQL statements (Druid) | |
| -param | --connParam | JDBC connection parameters in key1=val1&key2=val2 format | |
| -lb | --loadbalance | Enable JDBC load balancing; specify multiple hosts in -h as host1:port1,host2:port2 | false |
| -O | --orderby | Sort order for export: asc or desc | |
| -OC | --orderCol | Column names to sort by, separated by semicolons | |
| -para | --paraMerge | Use parallel merge when exporting with ORDER BY | false |
| -local | --localMerge | Use local merge sort | false |
| -mask | --mask | JSON configuration for masking sensitive columns during export | |
| -in | --whereIn | Use WHERE col IN (values) syntax | false |
| -func | --sqlFunc | Use SQL functions for update operations | false |
| -noEsc | --noEscape | Disable value escaping for SQL | false |
| -i | --ignore | Enable insert-ignore mode and resume from breakpoint | false |
| -error | --maxError | Maximum error count before the program exits | |
| -H | --historyFile | History file path for breakpoint resumption | |
| -pre | --prefix | Prefix for exported file names | |
| -rfonly | --readFileOnly | Read and process files without executing SQL | false |
| -perf | --perfMode | Enable performance mode (reduces compatibility) | false |
| -config | --configFile | Path to a YAML configuration file | |
| -help | --help | Print the help message | |
| -v | --version | Print the Batch Tool version and exit | |

Examples

Replace the placeholder values in the commands below (host, port, username, and password) with your actual connection details.
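
For reference, a data file in the default format assumed by these examples looks like the sketch below: UTF-8 encoding, comma-separated fields, no header line. The rows are illustrative, not real tpch customer data.

```shell
# Create a minimal sample file in the default format used throughout
# this document: UTF-8, comma field separator, no header line.
# The column values are illustrative, not real tpch data.
printf '1,Customer#000000001,13\n2,Customer#000000002,23\n' > customer_sample
cat customer_sample
```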

Export data

The following examples export data from the customer table in the tpch database in sharding mode.

Export with default settings

The number of output files equals the number of table shards.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s ,

Export to a fixed number of files

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -F 3

-F 3 splits the output into exactly three files.
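
After a split export, a quick file-level sanity check is to confirm that the combined row count across the output files equals the source table's row count. The sketch below fabricates three stand-in files; the customer_0 through customer_2 names are an assumption for illustration, not guaranteed output names.

```shell
# Fabricate three files standing in for a 3-way export (-F 3).
# NOTE: the customer_0..customer_2 naming is an illustrative assumption.
mkdir -p demo_export
printf '1,Alice,10\n2,Bob,11\n'   > demo_export/customer_0
printf '3,Carol,12\n4,Dave,13\n'  > demo_export/customer_1
printf '5,Eve,14\n'               > demo_export/customer_2

# The combined row count should match the source table (5 rows here).
cat demo_export/customer_* | wc -l
```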

Limit rows per file

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -L 100000

-L 100000 caps each output file at 100,000 rows; Batch Tool creates as many files as needed to hold the full result set.

Filter rows with a WHERE condition

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -w "c_nationkey=10"

-w "c_nationkey=10" exports only rows where c_nationkey equals 10.

Force-quote field values

If field values contain the delimiter character, use -quote force to wrap all values in double quotes.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -quote force
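
To see why force-quoting matters, consider a value that itself contains the separator. The sketch below (plain shell and awk, independent of Batch Tool) shows that naive comma splitting miscounts the fields of an unquoted line, while the force-quoted form lets a CSV-aware reader recover the correct field count.

```shell
# A value containing the separator breaks naive comma splitting.
unquoted='1,Smith, John,10'
quoted='"1","Smith, John","10"'

# Naive split sees 4 fields in the unquoted line (wrong: the row has 3 columns)...
echo "$unquoted" | awk -F',' '{print NF}'    # 4
# ...while the force-quoted line lets a CSV-aware reader recover 3 fields.
echo "$quoted"   | awk -F'","' '{print NF}'  # 3
```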

Export selected columns

Use -col to export only the specified columns.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -col "c_custkey;c_name;c_nationkey"

-col "c_custkey;c_name;c_nationkey" exports only the three listed columns; column names in the argument are separated by semicolons.

Export with GZIP compression

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -comp GZIP

-comp GZIP compresses each output file. Use this when exporting large tables to reduce disk usage.
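
Compressed exports can be inspected without writing decompressed files to disk. The sketch below fabricates a compressed shard (the customer_0.gz name is an illustrative assumption) and counts its rows through gzip:

```shell
# Fabricate a compressed export shard.
# NOTE: the customer_0.gz name is an illustrative assumption.
printf '1,Alice,10\n2,Bob,11\n3,Carol,12\n' | gzip > customer_0.gz

# Count rows without writing a decompressed file to disk (3 here).
gzip -dc customer_0.gz | wc -l
```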

Export with DDL

Use -DDL WITH to include the CREATE TABLE statement alongside the data.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -DDL WITH

To export the DDL only (no data), use -DDL ONLY.

Export with ORDER BY

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -O asc -OC "c_custkey"
  • -O asc sets the sort direction to ascending.

  • -OC "c_custkey" specifies the column to sort by.

For large tables, add -para true to use parallel merge sort, which improves export speed.
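
An ordered export can be verified at the file level with sort -c, which exits successfully only if the input is already in the requested order. The sketch below assumes a comma-separated file whose first column is the numeric sort key; the customer_sorted file name is illustrative.

```shell
# Fabricate an export sorted ascending by the first column (c_custkey).
# NOTE: the customer_sorted name is an illustrative assumption.
printf '1,Alice\n2,Bob\n5,Eve\n' > customer_sorted

# sort -c exits 0 only if the file is already in the requested order.
sort -t, -k1,1n -c customer_sorted && echo "ascending order confirmed"
```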

Import data

The following examples import data into the lineitem table in the tpch database in sharding mode.

Create the target table before importing. The table structure must match the structure of the source file.

Import a single file

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t lineitem -s , -f "./data/lineitem.csv"

-f "./data/lineitem.csv" specifies the source CSV file path.

Import all files in a directory

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t lineitem -s , -dir "./data/lineitem/"

-dir "./data/lineitem/" imports all files in the specified directory. Each file name must start with the target table name (e.g., lineitem0_1).
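
If your source files do not already follow the table-name prefix convention, a short rename pass fixes them up before the -dir import. The part_*.csv source names below are illustrative assumptions:

```shell
# Source files that do NOT follow the required <table-name> prefix.
# NOTE: the part_*.csv names are illustrative assumptions.
mkdir -p demo_data
touch demo_data/part_0.csv demo_data/part_1.csv

# Rename so each file name starts with the target table name (lineitem),
# as the -dir import requires.
i=0
for f in demo_data/part_*.csv; do
  mv "$f" "demo_data/lineitem_${i}.csv"
  i=$((i + 1))
done
ls demo_data
```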

Import with breakpoint resumption

If an import is interrupted, use -i true and -H to resume from where it stopped.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t lineitem -s , -f "./data/lineitem.csv" -i true -H "./lineitem_history.txt"
  • -i true enables insert-ignore mode and breakpoint resumption.

  • -H "./lineitem_history.txt" specifies the history file that tracks progress.

Tune import throughput

For large imports, adjust concurrency and batch size to improve performance.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t lineitem -s , -dir "./data/lineitem/" -pro 4 -con 8 -batchsize 500
  • -pro 4 sets four producer threads to read data from files.

  • -con 8 sets eight consumer threads to write data to the database.

  • -batchsize 500 inserts 500 rows per batch.

Start with these values and adjust based on your database capacity and observed throughput.
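
When a single very large file limits producer parallelism, you can pre-split it into table-prefixed chunks and import the directory instead. The sketch below uses the standard split utility; the 2-row chunk size and file names are illustrative (use a chunk size in the millions for real data):

```shell
# Fabricate a "large" source file (6 rows) standing in for lineitem data.
mkdir -p chunks
printf '%s\n' 1,a 2,b 3,c 4,d 5,e 6,f > lineitem_all.csv

# Split into 2-row chunks whose names start with the table name,
# as required for a -dir import. split appends aa, ab, ac, ... suffixes.
split -l 2 lineitem_all.csv chunks/lineitem_
ls chunks
```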