Tunnel

Last Updated: Sep 28, 2017

In the original version of ODPS, a standalone tool (DShip) was provided for uploading and downloading data. In the latest ODPS service, the same functions are available through the Tunnel commands provided by the client tool. We recommend using the new version of the client and its Tunnel commands.

Tunnel commands are used to upload and download data. The main functions include:

  • Upload: uploads files or directories (level-one directories only). Data can be uploaded to only one table or one partition at a time. For a partitioned table, the target partition must be specified.
  1. tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
  2. tunnel upload log.txt test_table --scan=only;
  • Download: downloads data to a single local file. Data from only one table or one partition can be downloaded to a single file at a time. For a partitioned table, the source partition must be specified.
  1. tunnel download test_project.test_table/p1="b1",p2="b2" log.txt;
  • Resume: resumes the transfer of files or directories after a failure caused by a network error or a Tunnel service error. Only uploads can be resumed.
  1. tunnel resume;
  • Show: shows the task history.
  1. tunnel show history -n 5
  2. tunnel show log
  • Purge: clears session directories. By default, directories from the last three days are cleared.
  1. tunnel purge 5

Note: Tunnel commands are ported from the original DShip commands.

Usage of Tunnel Commands

The help subcommand can be used to obtain help information on the ODPS client. Every command and option also supports an abbreviated form, as illustrated after the subcommand descriptions below.

  1. odps@ project_name>tunnel help;
  2. Usage: tunnel <subcommand> [options] [args]
  3. Type 'tunnel help <subcommand>' for help on a specific subcommand.
  4. Available subcommands:
  5. upload (u)
  6. download (d)
  7. resume (r)
  8. show (s)
  9. purge (p)
  10. help (h)
  11. tunnel is a command for uploading data to / downloading data from ODPS.

Description:

  • upload: uploads data into a table in ODPS.
  • download: downloads data from a table in ODPS.
  • resume: if an upload fails because of an error, the resume command continues the transfer from the breakpoint. At present, only uploads can be resumed. Each upload or download operation is called a session. Run the resume command with the session ID to continue the transfer.
  • show: views the history of executed commands.
  • purge: clears session directories.
  • help: outputs Tunnel help information.
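
As noted above, every subcommand has an abbreviated form. For example, the upload command from the overview can equally be written with its short name (the file and table names are the same illustrative ones used earlier):

  1. tunnel u log.txt test_project.test_table/p1="b1",p2="b2";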

Upload

Imports data from local files into ODPS tables in append mode. The usage of the subcommand is as follows:

  1. odps@ project_name>tunnel help upload
  2. usage: tunnel upload [options] <path> <[project.]table[/partition]>
  3. upload data from local file
  4. -bs,--block-size <ARG> block size in MiB, default 100
  5. -c,--charset <ARG> specify file charset, default utf8.
  6. set ignore to download raw data
  7. -cp,--compress <ARG> compress, default true
  8. -dbr,--discard-bad-records <ARG> specify discard bad records
  9. action(true|false), default false
  10. -dfp,--date-format-pattern <ARG> specify date format pattern, default
  11. yyyy-MM-dd HH:mm:ss
  12. -e,--endpoint <ARG> odps endpoint
  13. -fd,--field-delimiter <ARG> specify field delimiter, default ","
  14. -h,--header <ARG> if local file should have table
  15. header, default false
  16. -mbr,--max-bad-records <ARG> max bad records, default 1000
  17. -ni,--null-indicator <ARG> specify null indicator string, default
  18. ""(empty string)
  19. -rd,--record-delimiter <ARG> specify record delimiter, default
  20. "\r\n"
  21. -s,--scan <ARG> specify scan file
  22. action(true|false|only), default true
  23. -sd,--session-dir <ARG> set session dir, default
  24. /D:/console/plugins/dship/lib/..
  25. -te,--tunnel-endpoint <ARG> tunnel endpoint
  26. -tz,--time-zone <ARG> time zone, default local timezone:
  27. Asia/Shanghai
  28. Example:
  29. tunnel upload log.txt test_project.test_table/p1="b1",p2="b2"

Parameter Description:

  • -bs, --block-size: the size of each data block uploaded through Tunnel. The default is 100 MiB (1 MiB = 1024 × 1024 bytes).
  • -c, --charset: specifies the character set of the local data file. The default is UTF-8. If it is set to ignore, raw data is transferred.
  • -cp, --compress: specifies whether data is compressed before being uploaded. The default is true.
  • -dfp: the format pattern for DATETIME data. The default is 'yyyy-MM-dd HH:mm:ss'.
  • -dbr: specifies whether to discard bad records (extra columns, missing columns, or mismatched column data types). When the value is true, all records that do not match the table definition are discarded. When the value is false, an error is raised as soon as a bad record is found.
  • -e: specifies the ODPS endpoint.
  • -fd: the column delimiter of the local data file. The default is a comma ','.
  • -h: specifies whether the local file has a table header. If true, DShip skips the header and uploads data starting from the second row.
  • -mbr, --max-bad-records: by default, the upload is terminated once the number of bad records exceeds 1000.
  • -ni: the NULL indicator string. The default is "" (empty string).
  • -rd: the row delimiter of the local data file. The default is '\r\n'.
  • -s: specifies whether to scan the local data file. It is 'false' by default. When the value is true, the data is scanned first and imported only if the format is correct. When the value is false, the data is imported directly without scanning. When the value is 'only', only the local data is scanned and no data is imported afterwards.
  • -te: specifies the Tunnel endpoint.
  • -tz: specifies the time zone. The default is the local time zone: Asia/Shanghai.
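
These options can be combined in a single command. For example, the following sketch uploads a GBK-encoded, tab-delimited file that contains a header row and discards bad records (the file and table names reuse the illustrative ones from the overview):

  1. tunnel upload log.txt test_project.test_table/p1="b1",p2="b2" -c "gbk" -fd "\t" -h true -dbr true;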

Program Example:

  • Create the target table:
  1. CREATE TABLE IF NOT EXISTS sale_detail(
  2. shop_name STRING,
  3. customer_id STRING,
  4. total_price DOUBLE)
  5. PARTITIONED BY (sale_date STRING,region STRING);
  • Add the partition:
  1. alter table sale_detail add partition (sale_date='201312', region='hangzhou');
  • Prepare the data file 'data.txt' with the following contents:
  1. shop9,97,100
  2. shop10,10,200
  3. shop11,11

The third line in this file does not comply with the definition of the table sale_detail: the table defines three columns, but the line contains only two.

  • Import the data:
  1. odps@ project_name>tunnel u d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false
  2. Upload session: 201506101639224880870a002ec60c
  3. Start upload:d:\data.txt
  4. Total bytes:41 Split input to 1 blocks
  5. 2015-06-10 16:39:22 upload block: '1'
  6. ERROR: column mismatch -,expected 3 columns, 2 columns found, please check data or delimiter

Because a bad record exists in data.txt, the import fails. The session ID and an error message are displayed.

Data verification:

  1. odps@ odpstest_ay52c_ay52> select * from sale_detail where sale_date='201312';
  2. ID = 20150610084135370gyvc61z5
  3. +-----------+-------------+-------------+-----------+--------+
  4. | shop_name | customer_id | total_price | sale_date | region |
  5. +-----------+-------------+-------------+-----------+--------+
  6. +-----------+-------------+-------------+-----------+--------+

Because of the bad record, the import fails and the table contains no data.
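
If bad records should be skipped rather than failing the whole import, the -dbr option described above can be added; a sketch reusing the same file and partition:

  1. tunnel upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou -dbr true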

Show

Shows historical records. The usage of the subcommand is as follows:

  1. usage: tunnel show history [options]
  2. show session information
  3. -c,--charset <ARG> specify file charset, default utf8.
  4. set ignore to download raw data
  5. -cp,--compress <ARG> compress, default true
  6. -dfp,--date-format-pattern <ARG> specify date format pattern, default
  7. yyyy-MM-dd HH:mm:ss
  8. -e,--endpoint <ARG> odps endpoint
  9. -fd,--field-delimiter <ARG> specify field delimiter, default ","
  10. -h,--header <ARG> if local file should have table
  11. header, default false
  12. -n,--number <ARG> lines
  13. -ni,--null-indicator <ARG> specify null indicator string, default
  14. ""(empty string)
  15. -rd,--record-delimiter <ARG> specify record delimiter, default
  16. "\r\n"
  17. -sd,--session-dir <ARG> set session dir, default
  18. /D:/console/plugins/dship/lib/..(the actual path of plugins/dship/lib)
  19. -te,--tunnel-endpoint <ARG> tunnel endpoint
  20. -tz,--time-zone <ARG> time zone, default local timezone:
  21. Asia/Shanghai
  22. Example:
  23. tunnel show history -n 5
  24. tunnel show log

Command Example:

  1. odps@ project_name>tunnel show history;
  2. 201506101639224880870a002ec60c failed 'u --config-file /D:/console/conf/odps_config.ini --project odpstest_ay52c_ay52 --endpoint http://service.odps.aliyun.com/api --id UlVxOHuthHV1QrI1 --key 2m4r3WvTZbsNJjybVXj0InVke7UkvR d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false'

Description: 201506101639224880870a002ec60c is the session ID of the import that failed in the previous section.

Resume

Resumes the execution of a historical session. This is valid only for uploaded data. The usage of the subcommand is as follows:

  1. usage: tunnel resume [session_id] [--force]
  2. download data to local file
  3. -c,--charset <ARG> specify file charset, default utf8.
  4. set ignore to download raw data
  5. -cp,--compress <ARG> compress, default true
  6. -dfp,--date-format-pattern <ARG> specify date format pattern, default
  7. yyyy-MM-dd HH:mm:ss
  8. -e,--endpoint <ARG> odps endpoint
  9. -f,--force force resume
  10. -fd,--field-delimiter <ARG> specify field delimiter, default ","
  11. -h,--header <ARG> if local file should have table
  12. header, default false
  13. -ni,--null-indicator <ARG> specify null indicator string, default
  14. ""(empty string)
  15. -rd,--record-delimiter <ARG> specify record delimiter, default
  16. "\r\n"
  17. -sd,--session-dir <ARG> set session dir, default
  18. /D:/console/plugins/dship/lib/..
  19. -te,--tunnel-endpoint <ARG> tunnel endpoint
  20. -tz,--time-zone <ARG> time zone, default local timezone:
  21. Asia/Shanghai
  22. Example:
  23. tunnel resume

Command Example:

Modify data.txt to be:

  1. shop9,97,100
  2. shop10,10,200

Resume the upload:

  1. odps@ project_name>tunnel resume 201506101639224880870a002ec60c --force;
  2. start resume
  3. 201506101639224880870a002ec60c
  4. Upload session: 201506101639224880870a002ec60c
  5. Start upload:d:\data.txt
  6. Resume 1 blocks
  7. 2015-06-10 16:46:42 upload block: '1'
  8. 2015-06-10 16:46:42 upload block complete, blockid=1
  9. upload complete, average speed is 0 KB/s
  10. OK

201506101639224880870a002ec60c is the session ID of the failed upload.

Data verification:

  1. odps@ project_name>select * from sale_detail where sale_date='201312';
  2. ID = 20150610084801405g0a741z5
  3. +-----------+-------------+-------------+-----------+--------+
  4. | shop_name | customer_id | total_price | sale_date | region |
  5. +-----------+-------------+-------------+-----------+--------+
  6. | shop9 | 97 | 100.0 | 201312 | hangzhou |
  7. | shop10 | 10 | 200.0 | 201312 | hangzhou |
  8. +-----------+-------------+-------------+-----------+--------+

Download

The usage of the subcommand is as follows:

  1. odps@ project_name>tunnel help download
  2. usage: tunnel download [options] <[project.]table[/partition]> <path>
  3. download data to local file
  4. -c,--charset <ARG> specify file charset, default utf8.
  5. set ignore to download raw data
  6. -cp,--compress <ARG> compress, default true
  7. -dfp,--date-format-pattern <ARG> specify date format pattern, default
  8. yyyy-MM-dd HH:mm:ss
  9. -e,--endpoint <ARG> odps endpoint
  10. -fd,--field-delimiter <ARG> specify field delimiter, default ","
  11. -h,--header <ARG> if local file should have table
  12. header, default false
  13. -ni,--null-indicator <ARG> specify null indicator string, default
  14. ""(empty string)
  15. -rd,--record-delimiter <ARG> specify record delimiter, default
  16. "\r\n"
  17. -sd,--session-dir <ARG> set session dir, default
  18. /D:/console/plugins/dship/lib/..(the actual path of plugins/dship/lib)
  19. -te,--tunnel-endpoint <ARG> tunnel endpoint
  20. -tz,--time-zone <ARG> time zone, default local timezone:
  21. Asia/Shanghai
  22. Example:
  23. tunnel download test_project.test_table/p1="b1",p2="b2" log.txt

Description:

  • -fd: the column delimiter of the local data file. The default is a comma ','.
  • -rd: the row delimiter of the local data file. The default is '\r\n'.
  • -dfp: the format pattern for DATETIME data. The default is 'yyyy-MM-dd HH:mm:ss'.
  • -ni: the NULL indicator string. The default is "" (empty string).
  • -c: the character set of the local data file. The default is UTF-8.
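
These options can also be combined. For instance, a sketch that downloads a partition to a local file with a '|' field delimiter and a header row (the table, partition, and file names are illustrative):

  1. tunnel download test_project.test_table/p1="b1",p2="b2" out.txt -fd "|" -h true;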

Command Example:

Download the data into the file result.txt:

  1. $ ./tunnel download sale_detail/sale_date=201312,region=hangzhou result.txt;
  2. Download session: 201506101658245283870a002ed0b9
  3. Total records: 2
  4. 2015-06-10 16:58:24 download records: 2
  5. 2015-06-10 16:58:24 file size: 30 bytes
  6. OK

Data verification. The contents of the result.txt file are as follows:

  1. shop9,97,100.0
  2. shop10,10,200.0

Purge

Clears session history. By default, session directories from the last three days are cleared. The usage of the subcommand is as follows:

  1. usage: tunnel purge [n]
  2. force session history to be purged.([n] days before, default
  3. 3 days)
  4. -c,--charset <ARG> specify file charset, default utf8.
  5. set ignore to download raw data
  6. -cp,--compress <ARG> compress, default true
  7. -dfp,--date-format-pattern <ARG> specify date format pattern, default
  8. yyyy-MM-dd HH:mm:ss
  9. -e,--endpoint <ARG> odps endpoint
  10. -fd,--field-delimiter <ARG> specify field delimiter, default ","
  11. -h,--header <ARG> if local file should have table
  12. header, default false
  13. -ni,--null-indicator <ARG> specify null indicator string, default
  14. " "(empty string)
  15. -rd,--record-delimiter <ARG> specify record delimiter, default
  16. "\r\n"
  17. -sd,--session-dir <ARG> set session dir, default
  18. /D:/console/plugins/dship/lib/..
  19. -te,--tunnel-endpoint <ARG> tunnel endpoint
  20. -tz,--time-zone <ARG> time zone, default local timezone:
  21. Asia/Shanghai
  22. Example:
  23. tunnel purge 5
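
To clear sessions using the default three-day window, the day count can be omitted, per the usage line above:

  1. tunnel purge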

Data Types Description

  • STRING: string type; the length cannot exceed 8 MB.
  • BOOLEAN: the uploaded value can only be "true", "false", "0", or "1". Downloaded values are expressed as true/false. The values are case insensitive.
  • BIGINT: value range [-9223372036854775807, 9223372036854775807].
  • DOUBLE:
  1. 16 significant digits.
  2. Scientific notation is supported for upload.
  3. Downloaded values are expressed only as plain numbers.
  4. Maximum value: 1.7976931348623157E308.
  5. Minimum value: 4.9E-324.
  6. ∞: Infinity.
  7. -∞: -Infinity.
  • DATETIME: supports data uploading based on the GMT+8 time zone. The date format pattern can be specified on the command line.

To upload data of the DATETIME type, specify the datetime format pattern. See SimpleDateFormat for reference.

  1. "yyyyMMddHHmmss": data format "20140209101000"
  2. "yyyy-MM-dd HH:mm:ss" (default): data format "2014-02-09 10:10:00"
  3. "yyyy年MM月dd日": data format "2014年09月01日"

Example:

  1. tunnel upload log.txt test_table -dfp "yyyy-MM-dd HH:mm:ss"

NULL

All data types can be null:

  • An empty string is treated as NULL by default.

  • You can also specify the NULL indicator through the '--null-indicator' parameter on the command line.

  1. tunnel upload log.txt test_table -ni "NULL"

Character Set

You can specify the character set of the local data file; the default is UTF-8.

  1. tunnel upload log.txt test_table -c "gbk"

Separator

Tunnel commands support user-defined file separators: --record-delimiter is the row separator and --field-delimiter is the column separator. Separators are described as follows:

  • Row and column separators with multiple characters are supported.
  • A column separator cannot include a row separator.
  • On the command line, only the escape characters \r, \n, and \t are supported in separators.

Example:

  1. tunnel upload log.txt test_table -fd "||" -rd "\r\n"