
PolarDB:Import and export data using the Batch Tool

Last Updated: Mar 30, 2026

Batch Tool is a Java-based CLI utility developed by the PolarDB-X team for bulk data operations on PolarDB-X databases. Built on a producer/consumer model, it supports importing data from files, exporting data to files, and migrating data across MySQL, PolarDB-X 1.0, and PolarDB-X 2.0 data sources.

Prerequisites

Before you begin, make sure you have:

  • Java installed on your machine

  • Network access to the target PolarDB-X database

Download Batch Tool

Download the JAR package from the following link:

batch-tool.jar

To view all available options, run:

java -jar batch-tool.jar --help

Parameters

All examples in this document use UTF-8 encoding and commas (,) as field separators by default.

The table below lists all parameters. Required parameters vary by operation; see the Examples section for command patterns.

| Parameter | Long form | Description | Default |
|---|---|---|---|
| -o | --operation | Operation type: export, import, delete, or update | |
| -h | --host | Host IP address of the database | |
| -P | --port | Port number of the database | |
| -u | --user | Username | |
| -p | --password | Password | |
| -D | --database | Database name | |
| -t | --table | Target table name | |
| -s | --sep | Field separator (delimiter) | |
| -f | --file | Source file path(s), separated by semicolons | |
| -dir | --directory | Directory containing files to import | |
| -F | --filenum | Fixed number of exported files | |
| -L | --line | Maximum rows per exported file | |
| -w | --where | WHERE condition for filtering rows (e.g., col1>99 AND col2<100) | |
| -col | --columns | Target columns to export, separated by semicolons | |
| -format | --fileFormat | File format: NONE, TXT, CSV, XLS, or XLSX | NONE |
| -comp | --compress | Compression: NONE or GZIP | NONE |
| -encrypt | --encrypt | Encryption: NONE, AES, or SM4 | NONE |
| -key | --secretKey | Secret key used for encryption | |
| -DDL | --DDL | DDL mode: NONE, ONLY, or WITH | NONE |
| -cs | --charset | Character set of the files | |
| -header | --header | Treat the first line as column names | false |
| -quote | --quoteMode | Quote mode for field values: AUTO, FORCE, or NONE | AUTO |
| -lastSep | --withLastSep | Whether each line ends with the separator | false |
| -sharding | --sharding | Enable sharding mode | Depends on operation |
| -batchsize | --batchSize | Batch size for insert operations | |
| -pro | --producer | Number of producer threads | |
| -con | --consumer | Number of consumer threads | |
| -fcon | --forceConsumer | Force consumer parallelism | |
| -tps | --tpsLimit | Transactions per second (TPS) limit (-1 means no limit) | -1 |
| -readsize | --readSize | Read block size (MB) | |
| -ringsize | --ringSize | Ring buffer size (must be a power of 2) | |
| -maxConn | --maxConnection | Maximum connection count (Druid pool) | |
| -minConn | --minConnection | Minimum connection count (Druid pool) | |
| -maxWait | --connMaxWait | Maximum wait time when acquiring a connection (ms) | |
| -initSqls | --initSqls | Connection initialization SQL statements (Druid) | |
| -param | --connParam | JDBC connection parameters in key1=val1&key2=val2 format | |
| -lb | --loadbalance | Enable JDBC load balancing; specify multiple hosts in -h as host1:port1,host2:port2 | false |
| -O | --orderby | Sort order for export: asc or desc | |
| -OC | --orderCol | Column names to sort by, separated by semicolons | |
| -para | --paraMerge | Use parallel merge when exporting with ORDER BY | false |
| -local | --localMerge | Use local merge sort | false |
| -mask | --mask | JSON configuration for masking sensitive columns during export | |
| -in | --whereIn | Use WHERE col IN (values) syntax | false |
| -func | --sqlFunc | Use SQL functions for update operations | false |
| -noEsc | --noEscape | Disable value escaping for SQL | false |
| -i | --ignore | Enable insert-ignore mode and resume from breakpoint | false |
| -error | --maxError | Maximum error count before the program exits | |
| -H | --historyFile | History file path for breakpoint resumption | |
| -pre | --prefix | Prefix for exported file names | |
| -rfonly | --readFileOnly | Read and process files without executing SQL | false |
| -perf | --perfMode | Enable performance mode (reduces compatibility) | false |
| -config | --configFile | Path to a YAML configuration file | |
| -help | --help | Print the help message | |
| -v | --version | Print the Batch Tool version and exit | |

Examples

Replace the placeholder values in the commands below (host, port, username, and password) with your actual connection details.
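
For reference, a data file in the default format assumed by these examples looks like the sketch below: UTF-8 encoding, comma-separated fields, no header line. The rows are illustrative, not real tpch customer data.

```shell
# Create a minimal sample file in the default format used throughout
# this document: UTF-8, comma field separator, no header line.
# The column values are illustrative, not real tpch data.
printf '1,Customer#000000001,13\n2,Customer#000000002,23\n' > customer_sample
cat customer_sample
```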

Export data

The following examples export data from the customer table in the tpch database in sharding mode.

Export with default settings

The number of output files equals the number of table shards.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s ,

Export to a fixed number of files

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -F 3

-F 3 splits the output into exactly three files.
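
After a split export, a quick file-level sanity check is to confirm that the combined row count across the output files equals the source table's row count. The sketch below fabricates three stand-in files; the customer_0 through customer_2 names are an assumption for illustration, not guaranteed output names.

```shell
# Fabricate three files standing in for a 3-way export (-F 3).
# NOTE: the customer_0..customer_2 naming is an illustrative assumption.
mkdir -p demo_export
printf '1,Alice,10\n2,Bob,11\n'   > demo_export/customer_0
printf '3,Carol,12\n4,Dave,13\n'  > demo_export/customer_1
printf '5,Eve,14\n'               > demo_export/customer_2

# The combined row count should match the source table (5 rows here).
cat demo_export/customer_* | wc -l
```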

Limit rows per file

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -L 100000

-L 100000 caps each output file at 100,000 rows; Batch Tool creates as many files as needed to hold the full result set.

Filter rows with a WHERE condition

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -w "c_nationkey=10"

-w "c_nationkey=10" exports only rows where c_nationkey equals 10.

Force-quote field values

If field values contain the delimiter character, use -quote force to wrap all values in double quotes.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -quote force
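
To see why force-quoting matters, consider a value that itself contains the separator. The sketch below (plain shell and awk, independent of Batch Tool) shows that naive comma splitting miscounts the fields of an unquoted line, while the force-quoted form lets a CSV-aware reader recover the correct field count.

```shell
# A value containing the separator breaks naive comma splitting.
unquoted='1,Smith, John,10'
quoted='"1","Smith, John","10"'

# Naive split sees 4 fields in the unquoted line (wrong: the row has 3 columns)...
echo "$unquoted" | awk -F',' '{print NF}'    # 4
# ...while the force-quoted line lets a CSV-aware reader recover 3 fields.
echo "$quoted"   | awk -F'","' '{print NF}'  # 3
```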

Export selected columns

Use -col to export only the specified columns.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -col "c_custkey;c_name;c_nationkey"

-col "c_custkey;c_name;c_nationkey" exports only the three listed columns; column names in the argument are separated by semicolons.

Export with GZIP compression

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -comp GZIP

-comp GZIP compresses each output file. Use this when exporting large tables to reduce disk usage.
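
Compressed exports can be inspected without writing decompressed files to disk. The sketch below fabricates a compressed shard (the customer_0.gz name is an illustrative assumption) and counts its rows through gzip:

```shell
# Fabricate a compressed export shard.
# NOTE: the customer_0.gz name is an illustrative assumption.
printf '1,Alice,10\n2,Bob,11\n3,Carol,12\n' | gzip > customer_0.gz

# Count rows without writing a decompressed file to disk (3 here).
gzip -dc customer_0.gz | wc -l
```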

Export with DDL

Use -DDL WITH to include the CREATE TABLE statement alongside the data.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -DDL WITH

To export the DDL only (no data), use -DDL ONLY.

Export with ORDER BY

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -O asc -OC "c_custkey"
  • -O asc sets the sort direction to ascending.

  • -OC "c_custkey" specifies the column to sort by.

For large tables, add -para true to use parallel merge sort, which improves export speed.
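
An ordered export can be verified at the file level with sort -c, which exits successfully only if the input is already in the requested order. The sketch below assumes a comma-separated file whose first column is the numeric sort key; the customer_sorted file name is illustrative.

```shell
# Fabricate an export sorted ascending by the first column (c_custkey).
# NOTE: the customer_sorted name is an illustrative assumption.
printf '1,Alice\n2,Bob\n5,Eve\n' > customer_sorted

# sort -c exits 0 only if the file is already in the requested order.
sort -t, -k1,1n -c customer_sorted && echo "ascending order confirmed"
```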

Import data

The following examples import data into the lineitem table in the tpch database in sharding mode.

Create the target table before importing. The table structure must match the structure of the source file.

Import a single file

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t lineitem -s , -f "./data/lineitem.csv"

-f "./data/lineitem.csv" specifies the source CSV file path.

Import all files in a directory

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t lineitem -s , -dir "./data/lineitem/"

-dir "./data/lineitem/" imports all files in the specified directory. Each file name must start with the target table name (e.g., lineitem0_1).
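
If your source files do not already follow the table-name prefix convention, a short rename pass fixes them up before the -dir import. The part_*.csv source names below are illustrative assumptions:

```shell
# Source files that do NOT follow the required <table-name> prefix.
# NOTE: the part_*.csv names are illustrative assumptions.
mkdir -p demo_data
touch demo_data/part_0.csv demo_data/part_1.csv

# Rename so each file name starts with the target table name (lineitem),
# as the -dir import requires.
i=0
for f in demo_data/part_*.csv; do
  mv "$f" "demo_data/lineitem_${i}.csv"
  i=$((i + 1))
done
ls demo_data
```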

Import with breakpoint resumption

If an import is interrupted, use -i true and -H to resume from where it stopped.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t lineitem -s , -f "./data/lineitem.csv" -i true -H "./lineitem_history.txt"
  • -i true enables insert-ignore mode and breakpoint resumption.

  • -H "./lineitem_history.txt" specifies the history file that tracks progress.

Tune import throughput

For large imports, adjust concurrency and batch size to improve performance.

java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t lineitem -s , -dir "./data/lineitem/" -pro 4 -con 8 -batchsize 500
  • -pro 4 sets four producer threads to read data from files.

  • -con 8 sets eight consumer threads to write data to the database.

  • -batchsize 500 inserts 500 rows per batch.

Start with these values and adjust based on your database capacity and observed throughput.
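
When a single very large file limits producer parallelism, you can pre-split it into table-prefixed chunks and import the directory instead. The sketch below uses the standard split utility; the 2-row chunk size and file names are illustrative (use a chunk size in the millions for real data):

```shell
# Fabricate a "large" source file (6 rows) standing in for lineitem data.
mkdir -p chunks
printf '%s\n' 1,a 2,b 3,c 4,d 5,e 6,f > lineitem_all.csv

# Split into 2-row chunks whose names start with the table name,
# as required for a -dir import. split appends aa, ab, ac, ... suffixes.
split -l 2 lineitem_all.csv chunks/lineitem_
ls chunks
```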