This topic describes how to use Batch Tool to export and import data.

Introduction

Batch Tool is developed by the PolarDB-X team. You can use Batch Tool to export and import data for PolarDB-X databases.

Batch Tool is developed in Java. Batch Tool uses a producer/consumer model and integrates the features of distributed databases to support one-stop data management, such as data import from files, data export to files, and data migration across MySQL, PolarDB-X 1.0, and PolarDB-X 2.0 data sources. Batch Tool also provides experimental features, such as batch update and deletion based on text files.

Download Batch Tool

Click the following link to download the JAR package of Batch Tool:

batch-tool.jar

Parameters in Batch Tool-related commands

For more information about the Batch Tool-related commands, run the following command in a CLI: java -jar batch-tool.jar --help

usage: BatchTool [-batchsize <size>] [-col <col1;col2;col3>] [-comp <NONE | GZIP>] [-con <consumer count>]
       [-config <filepath>] [-cs <charset>] [-D <database>] [-DDL <NONE | ONLY | WITH>] [-dir <directory
       path>] [-encrypt <NONE | AES | SM4>] [-error <max error count>] [-f <filepath1;filepath2>] [-F <file
       count>] [-fcon <parallelism>] [-format <NONE | TXT | CSV | XLS | XLSX>] [-func <true | false>] [-h
       <host>] [-H <filepath>] [-header <true | false>] [-help] [-i <true | false>] [-in <true | false>]
       [-initSqls <sqls>] [-key <string-type key>] [-L <line count>] [-lastSep <true | false>] [-lb <true |
       false>] [-local <true | false>] [-mask <Json format config>] [-maxConn <max connection>] [-maxWait
       <wait time(ms)>] [-minConn <min connection>] [-noEsc <true | false>] [-o <operation>] [-O <asc | desc>]
       [-OC <col1;col2;col3>] [-p <password>] [-P <port>] [-para <true | false>] [-param
       <key1=val1&key2=val2>] [-perf <true | false>] [-pre <prefix>] [-pro <producer count>] [-quote <AUTO |
       FORCE | NONE>] [-readsize <size(MB)>] [-rfonly <true | false>] [-ringsize <size (power of 2)>] [-s
       <separator char or string>] [-sharding <true | false>] [-t <tableName>] [-tps <tps limit>] [-u
       <username>] [-v] [-w <where condition>]
     -batchsize,--batchSize <size>                          Batch size of insert.
     -col,--columns <col1;col2;col3>                        Target columns for export.
     -comp,--compress <NONE | GZIP>                         Export or import compressed file (default NONE).
     -con,--consumer <consumer count>                       Configure number of consumer threads.
     -config,--configFile <filepath>                        Use yaml config file.
     -cs,--charset <charset>                                The charset of files.
     -D,--database <database>                               Database name.
     -DDL,--DDL <NONE | ONLY | WITH>                        Export or import with DDL sql mode (default NONE).
     -dir,--directory <directory path>                      Directory path including files to import.
     -encrypt,--encrypt <NONE | AES | SM4>                  Export or import with encrypted file (default NONE).
     -error,--maxError <max error count>                    Max error count threshold, program exits when the
                                                            limit is exceeded.
     -f,--file <filepath1;filepath2>                        Source file(s).
     -F,--filenum <file count>                              Fixed number of exported files.
     -fcon,--forceConsumer <parallelism>                    Configure if allow force consumer parallelism.
     -format,--fileFormat <NONE | TXT | CSV | XLS | XLSX>   File format (default NONE).
     -func,--sqlFunc <true | false>                         Use sql function to update (default false).
     -h,--host <host>                                       Host of database.
     -H,--historyFile <filepath>                            History file name.
     -header,--header <true | false>                        Whether the header line is column names (default
                                                            false).
     -help,--help                                           Help message.
     -i,--ignore <true | false>                             Flag of insert ignore and resume breakpoint (default
                                                            false).
     -in,--whereIn <true | false>                           Using where cols in (values).
     -initSqls,--initSqls <sqls>                            Connection init sqls (druid).
     -key,--secretKey <string-type key>                     Secret key used during encryption.
     -L,--line <line count>                                 Max line limit of one single export file.
     -lastSep,--withLastSep <true | false>                  Whether line ends with separator (default false).
     -lb,--loadbalance <true | false>                       Use jdbc load balance, filling the arg in $host like
                                                            'host1:port1,host2:port2' (default false).
     -local,--localMerge <true | false>                     Use local merge sort (default false).
     -mask,--mask <Json format config>                      Masking sensitive columns while exporting data.
     -maxConn,--maxConnection <max connection>              Max connection count (druid).
     -maxWait,--connMaxWait <wait time(ms)>                 Max wait time when getting a connection.
     -minConn,--minConnection <min connection>              Min connection count (druid).
     -noEsc,--noEscape <true | false>                       Do not escape value for sql (default false).
     -o,--operation <operation>                             Batch operation type: export / import / delete /
                                                            update.
     -O,--orderby <asc | desc>                              Order by type: asc / desc.
     -OC,--orderCol <col1;col2;col3>                        Ordered column names.
     -p,--password <password>                               Password of user.
     -P,--port <port>                                       Port number of database.
     -para,--paraMerge <true | false>                       Use parallel merge when doing order by export
                                                            (default false).
     -param,--connParam <key1=val1&key2=val2>               Jdbc connection params.
     -perf,--perfMode <true | false>                        Use performance mode at the sacrifice of compatibility
                                                            (default false).
     -pre,--prefix <prefix>                                 Export file name prefix.
     -pro,--producer <producer count>                       Configure number of producer threads (export /
                                                            import).
     -quote,--quoteMode <AUTO | FORCE | NONE>               The mode of how field values are enclosed by
                                                            double-quotes when exporting table (default AUTO).
     -readsize,--readSize <size(MB)>                        Read block size.
     -rfonly,--readFileOnly <true | false>                  Only read and process file, no sql execution (default
                                                            false).
     -ringsize,--ringSize <size (power of 2)>               Ring buffer size.
     -s,--sep <separator char or string>                    Separator between fields (delimiter).
     -sharding,--sharding <true | false>                    Whether enable sharding mode (default value depends on
                                                            operation).
     -t,--table <tableName>                                 Target table.
     -tps,--tpsLimit <tps limit>                            Configure of tps limit (default -1: no limit).
     -u,--user <username>                                   User for login.
     -v,--version                                           Show batch-tool version.
     -w,--where <where condition>                           Where condition: col1>99 AND col2<100 ...

Batch Tool-related commands can be classified into the following types:

  • Commands used to configure the database connection
    • Basic connection configurations, such as host, port, user, and password
    • Connection pool configurations, such as the maximum and minimum number of connections
    • Java Database Connectivity (JDBC) connection string
  • Commands used to configure batch processing
    • Basic configurations of the batch processing feature
      • Table names and file names
      • Configurations of text import, such as delimiters, whether to use a delimiter at the end, character sets, and whether to escape quotation marks
      • Export configurations, such as the number of files and the number of rows in the files
      • Import configurations, such as INSERT IGNORE and resumable upload
      • SQL conditions, such as WHERE and ORDER BY
      • Compression algorithm, encryption algorithm, and masking algorithm
      • Supported file formats, such as CSV, EXCEL, and TXT
    • Configurations of the batch processing performance
      • Parallelism of producers and consumers
      • Ring buffer size, number of batches, and the size of the read file block
      • Pre-partition settings and local merge settings
      • Transactions per second (TPS) throttling

Examples

This section provides examples on how to use Batch Tool. In the following examples, the default character set is UTF-8 and commas (,) are used as file delimiters.

  1. Export data from the customer table in the tpch database in sharding mode.
    1. Use the default settings. The number of files equals the number of table shards.
      java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s ,
    2. Export data to three files.
      java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -F 3
    3. Export data to multiple files. Each file can contain a maximum of 100,000 rows.
      java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -L 100000
    4. Add a WHERE condition and use the default settings for other configurations.
      java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -w "c_nationkey=10"
    5. Add quotations if the text contains delimiters. Use the default settings for other configurations.
      java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -quote force
  2. Import one or more CSV files into the lineitem table in the tpch database in sharding mode. The tpch database and the lineitem table have been created.
    1. Import one CSV file.
      java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t customer -s , -f "./data/lineitem.tbl"
    2. Import all CSV files in the specified folder.
      java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t customer -s , -dir "./data/lineitem/"