This topic describes how to use Batch Tool to export and import data.

Introduction

Batch Tool is developed by the PolarDB-X team. You can use Batch Tool to import and export data for PolarDB-X databases.

Batch Tool is implemented based on Java statements. Batch Tool uses a producer/consumer model. Batch Tool allows you to perform operations by using multiple threads. Batch Tool also allows you to export, import, delete, and update multiple data rows at a time. If you use Batch Tool, data is transmitted in the CSV format. This allows you to perform operations on data in a convenient manner.

The following code block describes the parameters of and how to run the command that is provided by Batch Tool.

usage: BatchTool [-?] [-batchsize <arg>] [-con <consumer count>] [-cs
           <charset>] -D <database> [-dir <directory>] [-f <from>] [-F
           <filenum>] [-func] -h <host> [-header] [-i] [-in] [-L <line>]
           [-lastSep] [-lb] [-local] [-maxConn <arg>] [-minConn <arg>]
           [-noesc] [-np] [-O <order by type>] -o <operation> [-OC <ordered
           column>] -p <password> [-P <port>] [-para] [-pre <prefix>] [-pro
           <producer count>] [-quote <auto/force/none>] [-readsize <arg>]
           [-rfonly] [-ringsize <arg>] -s <sep> [-t <table>] -u <user> [-w
           <where>]
     -?,--help                              Help message.
     -batchsize,--batchSize <arg>           Batch size of emitted tuples.
     -con,--consumer <consumer count>       Configure number of consumer
                                            threads.
     -cs,--charset <charset>                Define charset of files.
     -D,--database <database>               Database to use.
     -dir,--dir <directory>                 Directory path including files to
                                            import.
     -f,--from <from>                       Source file(s), separated by ; .
     -F,--filenum <filenum>                 Fixed number of exported files.
     -func,--sqlfunc                        Use sql function to update.
     -h,--host <host>                       Connect to host.
     -header,--header                       Whether the header line is column
                                            names.
     -H,--historyFile <filename>            history file name for resuming from breakpoint
     -i,--ignoreandresume                   Flag of insert ignore & resume from breakpoint
     -in,--wherein                          Using where ... in (...)
     -L,--line <line>                       Max line limit of exported files.
     -lastSep,--withLastSep                 Whether line ends with separator.
     -lb,--loadbalance                      If using load balance.
     -local,--localmerge                    o local merge sort.
     -maxConn,--maxConnection <arg>         Max connection number limit.
     -minConn,--minConnection <arg>         Mim connection number limit.
     -noesc,--noescape                      Don't escape values.
     -np,--noparttion                       No use of parttion.
     -O,--orderby <order by type>           asc or desc.
     -o,--operation <operation>             Batch operation type: export /
                                            import / delete / update.
     -OC,--orderCol <ordered column>        col1;col2;col3.
     -p,--password <password>               Password to use when connecting to
                                            server.
     -P,--port <port>                       Port number to use for connection.
     -para,--paraMerge                      Using parallel merge when doing
                                            order by export.
     -pre,--prefix <prefix>                 Export file name prefix.
     -pro,--producer <producer count>       Configure number of producer
                                            threads.
     -quote,--quoteMode <auto/force/none>   The mode of how field values are
                                            enclosed by double-quotes when
                                            exporting table. Default value is
                                            auto.
     -readsize,--readSize <arg>             Read block size in MB.
     -rfonly,--rfonly                       Only read and process file, no sql
                                            execution.
     -ringsize,--ringBufferSize <arg>       Ring buffer size.
     -s,--sep <sep>                         Separator between fields
                                            (delimiter).
     -t,--table <table>                     Target table.
     -tps,--tpsLimit <arg>                  Tps limit
     -u,--user <user>                       User for login.
     -w,--where <where>                     Where condition: col1>99 AND
                                            col2<100 ...
Parameter Description

The following section describes the common parameters:

  • -o: specifies the operation that you want to perform. Valid values: export, import, delete, and update.
  • -t: specifies the name of the destination table. Only one table can be specified.
  • -s: specifies the delimiter. You can use a string or a character as a delimiter.
  • -f: specifies one or more source files. If you specify multiple source files, separate the names of the source files with semicolons (;).
  • -OC: specifies one or more columns based on which data is sorted when the data is exported. If you specify multiple columns, separate the names of the columns with semicolons (;).
  • -cs: specifies the character set of the source file or source files. Default value: UTF-8.
  • -lastSep: specifies whether each line in the source file or source files ends with a delimiter.
  • -quote: specifies how double quotation marks (") enclose a column value when the data is exported or imported. Valid values:
    • auto: uses double quotation marks (") to enclose a column value based on whether the column value includes special characters, such as a delimiter and a line feed. This value is the default value.
    • force: enforces the usage of double quotation marks (") to enclose each column value.
    • none: forbids the usage of double quotation marks (") to enclose each column value. This value can be applied in scenarios in which the columns of the specified table contain only numeric values or contain string values that do not include special characters.
  • -header: specifies whether the first row is a column header.
  • -i: specifies whether to enable resumable uploads and execute the INSERT IGNORE statement.
  • -pre: specifies the prefix of the files that you want to export.
  • -F: specifies the number of files that you want to export.

Download Batch Tool

Click Batch Tool to download the JAR file of Batch Tool.

Examples

The following section provides examples on how to use Batch Tool. A compiled batch-tool.jar is used in these examples. For more information about batch-tool.jar, see the following code:
java -jar batch-tool.jar -?
  • Export multiple data rows at a time.
    ## 1. Export multiple files at a time in a default manner. The number of files to be exported equals the number of sharded tables from which you export the files.
    java -jar batch-tool.jar -h 127.0.0.1 -u testUser -p testPasswd -P 3306 -D db_name -o export -t table_name -s ,
    
    ## 2. Export three files at a time. -F specifies the number of files that you want to export.
    java -jar batch-tool.jar -h 127.0.0.1 -u testUser -p testPasswd -P 3306 -D db_name -o export -t table_name -s , -F 3
    
    ## 3. Export multiple files at a time. Each file that is exported contains a maximum of ten thousand rows. -L specifies the maximum number of rows that the files that are exported can contain.
    java -jar batch-tool.jar -h 127.0.0.1 -u testUser -p testPasswd -P 3306 -D db_name -o export -t table_name -s , -L 10000
    
    ## 4. Use the -w parameter to specify a condition. The -w parameter works in the same manner as the WHERE clause. If the condition contains a space, enclose the condition within double quotation marks (").
    java -jar batch-tool.jar -h 127.0.0.1 -u testUser -p testPasswd -P 3306 -D db_name -o export -t table_name -s , -w "id < 150 and id > 120"
  • Import an amount of data at a time. Before you import data, you must create a destination table. Batch Tool supports only data transmission and cannot create destination tables.
    ## 1. Separate multiple files with semicolons (;).
    java -jar batch-tool.jar -h 127.0.0.1 -u testUser -p testPasswd -P 3306 -D db_name -o import -t table_name -s , -f "table_name_0;table_name_1;table_name_2;table_name_3"
    
    ## 2. By default, data is inserted based on the shard keys of the destination table. If you do not want to insert data based on the shard keys, enter -np in your command.
    java -jar batch-tool.jar -h 127.0.0.1 -u testUser -p testPasswd -P 3306 -D db_name -o import -t table_name -s , -f "file0;file2" -np
    
    ## 3. Specify numbers of producer threads and consumer threads. -pro specifies the number of producer threads. The producer threads are used to read files. -con specifies the number of consumer threads. The consumer threads are used to import files.
    java -jar batch-tool.jar -h 127.0.0.1 -u testUser -p testPasswd -P 3306 -D db_name -o import -t table_name -s , -f "table_name_0;table_name_1" -np -pro 16 -con 16
    
    ## 4. Allow Batch Tool to perform resumable uploads and execute the INSERT IGNORE statement.
    java -jar batch-tool.jar -h 127.0.0.1 -u testUser -p testPasswd -P 3306 -D db_name -o import -t table_name -s , -f "table_name_0;table_name_1" -i
  • Delete multiple data rows at a time. The following sample code provides an example on how to delete the data in the specified database. The data is contained in the specified file. In the following code, a delete query is created to delete the data based on the table schema of the specified table.
    java -jar batch-tool.jar -h 127.0.0.1 -u testUser -p testPasswd -P 3306 -D db_name -o delete -t table_name -s , -f "file0"
  • Update multiple data rows at a time. The following sample code provides an example on how to update the data in the specified database. The data is contained in the specified file. In the following code, an update query is created to update the data based on the table schema of the specified table.
    java -jar batch-tool.jar -h 127.0.0.1 -u testUser -p testPasswd -P 3306 -D db_name -o update -t table_name -s , -f "file0"