本文介紹了通過Batch Tool工具匯入匯出資料的方法。
工具介紹
Batch Tool工具是PolarDB-X團隊開發的專門為PolarDB-X資料庫提供資料匯入匯出服務的工具,其採用Java語言實現,基於生產者和消費者模型並結合分散式資料庫特點,實現了一站式且高效地從檔案匯入、匯出到檔案以及跨庫的離線資料移轉(MySQL、PolarDB-X 1.0、PolarDB-X 2.0)等功能。
工具擷取
參數介紹
上傳工具包後,執行如下命令,可查看工具包全部參數說明:
java -jar batch-tool.jar --help結果如下:
usage: BatchTool [-batchsize <size>] [-col <col1;col2;col3>] [-comp <NONE | GZIP>] [-con <consumer count>]
[-config <filepath>] [-cs <charset>] [-D <database>] [-DDL <NONE | ONLY | WITH>] [-dir <directory
path>] [-encrypt <NONE | AES | SM4>] [-error <max error count>] [-f <filepath1;filepath2>] [-F <file
count>] [-fcon <parallelism>] [-format <NONE | TXT | CSV | XLS | XLSX>] [-func <true | false>] [-h
<host>] [-H <filepath>] [-header <true | false>] [-help] [-i <true | false>] [-in <true | false>]
[-initSqls <sqls>] [-key <string-type key>] [-L <line count>] [-lastSep <true | false>] [-lb <true |
false>] [-local <true | false>] [-mask <Json format config>] [-maxConn <max connection>] [-maxWait
<wait time(ms)>] [-minConn <min connection>] [-noEsc <true | false>] [-o <operation>] [-O <asc | desc>]
[-OC <col1;col2;col3>] [-p <password>] [-P <port>] [-para <true | false>] [-param
<key1=val1&key2=val2>] [-perf <true | false>] [-pre <prefix>] [-pro <producer count>] [-quote <AUTO |
FORCE | NONE>] [-readsize <size(MB)>] [-rfonly <true | false>] [-ringsize <size (power of 2)>] [-s
<separator char or string>] [-sharding <true | false>] [-t <tableName>] [-tps <tps limit>] [-u
<username>] [-v] [-w <where condition>]
-batchsize,--batchSize <size> Batch size of insert.
-col,--columns <col1;col2;col3> Target columns for export.
-comp,--compress <NONE | GZIP> Export or import compressed file (default NONE).
-con,--consumer <consumer count> Configure number of consumer threads.
-config,--configFile <filepath> Use yaml config file.
-cs,--charset <charset> The charset of files.
-D,--database <database> Database name.
-DDL,--DDL <NONE | ONLY | WITH> Export or import with DDL sql mode (default NONE).
-dir,--directory <directory path> Directory path including files to import.
-encrypt,--encrypt <NONE | AES | SM4> Export or import with encrypted file (default NONE).
-error,--maxError <max error count> Max error count threshold, program exits when the
limit is exceeded.
-f,--file <filepath1;filepath2> Source file(s).
-F,--filenum <file count> Fixed number of exported files.
-fcon,--forceConsumer <parallelism> Configure if allow force consumer parallelism.
-format,--fileFormat <NONE | TXT | CSV | XLS | XLSX> File format (default NONE).
-func,--sqlFunc <true | false> Use sql function to update (default false).
-h,--host <host> Host of database.
-H,--historyFile <filepath> History file name.
-header,--header <true | false> Whether the header line is column names (default
false).
-help,--help Help message.
-i,--ignore <true | false> Flag of insert ignore and resume breakpoint (default
false).
-in,--whereIn <true | false> Using where cols in (values).
-initSqls,--initSqls <sqls> Connection init sqls (druid).
-key,--secretKey <string-type key> Secret key used during encryption.
-L,--line <line count> Max line limit of one single export file.
-lastSep,--withLastSep <true | false> Whether line ends with separator (default false).
-lb,--loadbalance <true | false> Use jdbc load balance, filling the arg in $host like
'host1:port1,host2:port2' (default false).
-local,--localMerge <true | false> Use local merge sort (default false).
-mask,--mask <Json format config> Masking sensitive columns while exporting data.
-maxConn,--maxConnection <max connection> Max connection count (druid).
-maxWait,--connMaxWait <wait time(ms)> Max wait time when getting a connection.
-minConn,--minConnection <min connection> Min connection count (druid).
-noEsc,--noEscape <true | false> Do not escape value for sql (default false).
-o,--operation <operation> Batch operation type: export / import / delete /
update.
-O,--orderby <asc | desc> Order by type: asc / desc.
-OC,--orderCol <col1;col2;col3> Ordered column names.
-p,--password <password> Password of user.
-P,--port <port> Port number of database.
-para,--paraMerge <true | false> Use parallel merge when doing order by export
(default false).
-param,--connParam <key1=val1&key2=val2> Jdbc connection params.
-perf,--perfMode <true | false> Use performance mode at the sacrifice of compatibility
(default false).
-pre,--prefix <prefix> Export file name prefix.
-pro,--producer <producer count> Configure number of producer threads (export /
import).
-quote,--quoteMode <AUTO | FORCE | NONE> The mode of how field values are enclosed by
double-quotes when exporting table (default AUTO).
-readsize,--readSize <size(MB)> Read block size.
-rfonly,--readFileOnly <true | false> Only read and process file, no sql execution (default
false).
-ringsize,--ringSize <size (power of 2)> Ring buffer size.
-s,--sep <separator char or string> Separator between fields (delimiter).
-sharding,--sharding <true | false> Whether enable sharding mode (default value depends on
operation).
-t,--table <tableName> Target table.
-tps,--tpsLimit <tps limit> Configure of tps limit (default -1: no limit).
-u,--user <username> User for login.
-v,--version Show batch-tool version.
-w,--where <where condition> Where condition: col1>99 AND col2<100 ...說明
更多資訊,請參見快速入門Batch Tool。
使用情境
在無特殊說明的情況下,下文中匯入匯出預設指定的字元集是utf-8,檔案分隔字元是,。
假設需要匯出
tpch庫下的customer表(分庫分表模式)。預設匯出,檔案數等於表的分區數:
java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s ,匯出為三個檔案:
java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -F 3匯出為多個檔案,單個檔案最大行數為100000行,最小200行:
java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -L 100000指定
WHERE條件,預設匯出:java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -w "c_nationkey=10"如果文字欄位包含分隔字元,則指定引號模式,預設匯出:
java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o export -t customer -s , -quote force
假設需要將CSV檔案匯入到
tpch庫下的lineitem表(分庫分表模式)。指定單個檔案匯入:
java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t lineitem -s , -f "./data/lineitem.csv"說明目標表
lineitem需已建立,且目標表和源表的結構需要一致。指定檔案夾路徑下所有檔案匯入:
java -jar batch-tool.jar -P 3306 -h 127.0.XX.XX -u user_**** -p 12**** -D tpch -o import -t lineitem -s , -dir "./data/lineitem/"說明目標目錄下的檔案名稱必須以目標表名開頭。例如:
lineitem0_1。