MaxCompute allows you to run Tunnel commands to upload and download data. This topic describes how to use MaxCompute Tunnel commands, such as UPLOAD, SHOW, and RESUME.

Descriptions

Command syntax
odps@ project_name>tunnel help;
    Usage: tunnel <subcommand> [options] [args]
    Type 'tunnel help <subcommand>' for help on a specific subcommand.
Available subcommands:
    upload (u)
    download (d)
    resume (r)
    show (s)
    purge (p)
    help (h)
tunnel is a command for uploading data to / downloading data from ODPS.
Command functions
  • UPLOAD: uploads data to a MaxCompute table. You can upload files or level-1 directories to only one table or partition in a table each time. For a partitioned table, you must specify the partition to which you want to upload data. For a multi-level partitioned table, you must specify the last-level partition.
    -- Upload data in the log.txt file to the p1="b1" and p2="b2" partitions in the test_table table, which has two levels of partitions, in the test_project project.
    tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
    
    -- Check whether data in the log.txt file complies with the definition of the test_table table. The --scan=only setting scans the data without uploading it. If the data does not comply, the system reports an error and stops.
    tunnel upload  log.txt  test_table --scan=only;
  • DOWNLOAD: downloads data from a MaxCompute table. You can only download data from one table or partition to a single file each time. For a partitioned table, you must specify the partition from which you want to download data. For a multi-level partitioned table, you must specify the last-level partition.
    -- Download data from the test_project.test_table table, which has two levels of partitions, to the test_table.txt file.
    tunnel download  test_project.test_table/p1="b1",p2="b2"  test_table.txt;
  • RESUME: resumes the transfer of files or directories after a network disconnection or a Tunnel service failure. You can use this command to resume only data uploads. Each data upload or download is referred to as a session. To resume a specific session, you must obtain its session ID before you run this command.
    tunnel resume;
  • SHOW: displays historical task information.
    -- Display commands used in the last five data uploads or downloads.
    tunnel show history -n 5;
    -- Display the logs of the last data upload or download.
    tunnel show log;
  • PURGE: clears the session directory. Sessions from the last three days are cleared by default.
    -- Clear logs from the last five days.
    tunnel purge 5;
  • HELP: obtains help information.
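
These subcommands are often used together. The following sketch shows a typical recovery workflow; the file name, table name, and session ID are placeholders, and only subcommands and options documented in this topic are used.
    -- Upload data. Assume that the upload fails midway.
    tunnel upload log.txt test_table;
    -- Query the most recent session to obtain the ID of the failed upload.
    tunnel show history -n 1;
    -- Resume the failed upload by its session ID.
    tunnel resume 20150610xxxxxxxxxxx70a002ec60c -force;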

UPLOAD

Uploads local data to a MaxCompute table in append mode.

Command syntax
odps@ project_name>tunnel help upload;
usage: tunnel upload [options] <path> <[project.]table[/partition]>
              upload data from local file
 -acp,-auto-create-partition <ARG>   auto create target partition if not
                                     exists, default false
 -bs,-block-size <ARG>               block size in MiB, default 100
 -c,-charset <ARG>                   specify file charset, default ignore.
                                     set ignore to download raw data
 -cp,-compress <ARG>                 compress, default true
 -dbr,-discard-bad-records <ARG>     specify discard bad records
                                     action(true|false), default false
 -dfp,-date-format-pattern <ARG>     specify date format pattern, default
                                     yyyy-MM-dd HH:mm:ss; 
 -fd,-field-delimiter <ARG>          specify field delimiter, support
                                     unicode, eg \u0001. default ","
 -h,-header <ARG>                    if local file should have table
                                     header, default false
 -mbr,-max-bad-records <ARG>         max bad records, default 1000
 -ni,-null-indicator <ARG>           specify null indicator string,
                                     default ""(empty string)
 -rd,-record-delimiter <ARG>         specify record delimiter, support
                                     unicode, eg \u0001. default "\r\n"
 -s,-scan <ARG>                      specify scan file
                                     action(true|false|only), default true
 -sd,-session-dir <ARG>              set session dir, default
                                     D:\software\odpscmd_public\plugins\ds
                                     hip
 -ss,-strict-schema <ARG>            specify strict schema mode. If false,
                                     extra data will be abandoned and
                                     insufficient field will be filled
                                     with null. Default true
 -te,-tunnel_endpoint <ARG>          tunnel endpoint
    -threads <ARG>                   number of threads, default 1
 -tz,-time-zone <ARG>                time zone, default local timezone:
                                     Asia/Shanghai
Example:
    tunnel upload log.txt test_project.test_table/p1="b1",p2="b2"
Parameters
  • Required parameters
    • path

      Specifies the path and name of the local data file you want to upload.

    • [project.]table[/partition]

      Specifies the name of the destination table to which you want to upload data. You must specify the last-level partition for a partitioned table. If the table does not belong to the current project, you must specify the project where the table is located.

  • (Optional parameters) Options
    • -acp

      Creates a destination partition if the specified partition does not exist.

      Default value: false.

    • -bs

      Specifies the size of each data block uploaded by Tunnel each time.

      Default value: 100 MiB (1 MiB = 1,024 × 1,024 bytes).

    • -c

      Specifies the encoding format of the local data file.

      Default value: ignore. If you do not specify this option, the raw source data is uploaded without encoding conversion.

    • -cp

      Specifies whether to compress the local data file before uploading it to MaxCompute to reduce network traffic.

      Default value: true.

    • -dbr

      Specifies whether to ignore dirty data, such as additional columns, missing columns, or column data whose type does not match the table definition.

      • If you set this option to true, all data that does not comply with the table definition is ignored.
      • If you set this option to false, an error is returned when dirty data is detected. This ensures that raw data in the destination table is not contaminated. For a command that combines this option with others, see the example after this parameter list.
    • -dfp

      Specifies the format of DATETIME data. The default format is yyyy-MM-dd HH:mm:ss. If you want to specify the accuracy of DATETIME-type data to milliseconds, you can run the tunnel upload -dfp 'yyyy-MM-dd HH:mm:ss.SSS' command. For more information about the DATETIME type, see Data types.

    • -fd

      Specifies the column delimiter used in the local data file.

      Default value: comma (,).

    • -h

      Specifies whether the local data file has a table header. If you set this option to true, Dship skips the table header and uploads data from the second row.

    • -mbr

      Specifies the maximum number of dirty data records that can be tolerated.

      By default, if more than 1,000 records of dirty data are uploaded, the upload is terminated.

    • -ni

      Specifies the NULL data identifier.

      Default value: ""(empty string).

    • -rd

      Specifies the row delimiter used in the local data file.

      Default value: \r\n.

    • -s
      Specifies whether to scan the local data file. Valid values:
      • true: The system scans the data and starts to import it only if it is in the correct format.
      • false: The system does not scan the data before it imports it.
      • only: The system only scans the data but does not import it.

      Default value: true.

    • -sd

      Specifies the path of the session directory.

    • -te

      Specifies the endpoint of Tunnel.

    • -threads

      Specifies the number of threads.

      Default value: 1.

    • -tz

      Specifies the time zone.

      Default value: local time zone, for example, Asia/Shanghai.
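
The following commands sketch how several of these options can be combined. The file names, table name, and partition value are hypothetical; only options documented in the help output above are used.
    -- Upload a file that has a header row, create the target partition if it does not
    -- exist, and discard dirty records instead of failing.
    tunnel upload data.csv test_project.sale_detail/sale_date="201312" -h true -acp true -dbr true;
    -- Upload a GBK-encoded file that uses "|" as the column delimiter and tolerates
    -- at most 100 dirty records.
    tunnel upload data.txt test_table -c "gbk" -fd "|" -mbr 100;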

SHOW

Displays historical records.

Command syntax
odps@ project_name>tunnel help show;
usage: tunnel show history [options]
              show session information
 -n,-number <ARG>   lines
Example:
       tunnel show history -n 5
       tunnel show log

Parameters

-n: specifies the number of rows to be returned.

Example
odps@ project_name>tunnel show  history;
20150610xxxxxxxxxxx70a002ec60c  failed  'u --config-file /D:/console/conf/odps_config.ini --project odpstest_ay52c_ay52 --endpoint http://service.odps.aliyun.com/api --id UlxxxxxxxxxxxrI1 --key 2m4r3WvTxxxxxxxxxx0InVke7UkvR d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false'
Note: 20150610xxxxxxxxxxx70a002ec60c is the session ID of a failed data upload.

RESUME

Resumes the execution of historical operations. Only data uploads can be resumed.

Command syntax
odps@ project_name>tunnel help resume;
usage: tunnel resume [session_id] [-force]
              resume an upload session
 -f,-force   force resume
Example:
       tunnel resume

Parameters

session_id: specifies the session ID of a failed data upload. This parameter is required.

Example

Run the following command to resume the data upload:
odps@ project_name>tunnel resume 20150610xxxxxxxxxxx70a002ec60c --force;
start resume
20150610xxxxxxxxxxx70a002ec60c
Upload session: 20150610xxxxxxxxxxx70a002ec60c
Start upload:d:\data.txt
Resume 1 blocks 
2015-06-10 16:46:42     upload block: '1'
2015-06-10 16:46:42     upload block complete, blockid=1
upload complete, average speed is 0 KB/s
OK
Note: 20150610xxxxxxxxxxx70a002ec60c is the session ID of the failed data upload.

DOWNLOAD

Downloads MaxCompute table data or the execution result of a specific instance to a local directory.

Command syntax
odps@ project_name>tunnel help download;
usage: tunnel download [options] <[project.]table[/partition]> <path>
              download data to local file
 -c,-charset <ARG>                 specify file charset, default ignore.
                                   set ignore to download raw data
 -ci,-columns-index <ARG>          specify the columns index(starts from
                                   0) to download, use comma to split each
                                   index
 -cn,-columns-name <ARG>           specify the columns name to download,
                                   use comma to split each name
 -cp,-compress <ARG>               compress, default true
 -dfp,-date-format-pattern <ARG>   specify date format pattern, default
                                   yyyy-MM-dd HH:mm:ss
 -e,-exponential <ARG>             When download double values, use
                                   exponential express if necessary.
                                   Otherwise at most 20 digits will be
                                   reserved. Default false
 -fd,-field-delimiter <ARG>        specify field delimiter, support
                                   unicode, eg \u0001. default ","
 -h,-header <ARG>                  if local file should have table header,
                                   default false
    -limit <ARG>                   specify the number of records to
                                   download
 -ni,-null-indicator <ARG>         specify null indicator string, default
                                   ""(empty string)
 -rd,-record-delimiter <ARG>       specify record delimiter, support
                                   unicode, eg \u0001. default "\r\n"
 -sd,-session-dir <ARG>            set session dir, default
                                   D:\software\odpscmd_public\plugins\dshi
                                   p
 -te,-tunnel_endpoint <ARG>        tunnel endpoint
    -threads <ARG>                 number of threads, default 1
 -tz,-time-zone <ARG>              time zone, default local timezone:
                                   Asia/Shanghai
usage: tunnel download [options] instance://<[project/]instance_id> <path>
              download instance result to local file
 -c,-charset <ARG>                 specify file charset, default ignore.
                                   set ignore to download raw data
 -ci,-columns-index <ARG>          specify the columns index(starts from
                                   0) to download, use comma to split each
                                   index
 -cn,-columns-name <ARG>           specify the columns name to download,
                                   use comma to split each name
 -cp,-compress <ARG>               compress, default true
 -dfp,-date-format-pattern <ARG>   specify date format pattern, default
                                   yyyy-MM-dd HH:mm:ss
 -e,-exponential <ARG>             When download double values, use
                                   exponential express if necessary.
                                   Otherwise at most 20 digits will be
                                   reserved. Default false
 -fd,-field-delimiter <ARG>        specify field delimiter, support
                                   unicode, eg \u0001. default ","
 -h,-header <ARG>                  if local file should have table header,
                                   default false
    -limit <ARG>                   specify the number of records to
                                   download
 -ni,-null-indicator <ARG>         specify null indicator string, default
                                   ""(empty string)
 -rd,-record-delimiter <ARG>       specify record delimiter, support
                                   unicode, eg \u0001. default "\r\n"
 -sd,-session-dir <ARG>            set session dir, default
                                   D:\software\odpscmd_public\plugins\dshi
                                   p
 -te,-tunnel_endpoint <ARG>        tunnel endpoint
    -threads <ARG>                 number of threads, default 1
 -tz,-time-zone <ARG>              time zone, default local timezone:
                                   Asia/Shanghai
Example:
       tunnel download test_project.test_table/p1="b1",p2="b2" log.txt   //Download data from a specific table.
       tunnel download instance://test_project/test_instance log.txt     //Download the execution result of a specific instance.
Parameters
  • Required parameters
    • path

      Specifies the local path for saving the downloaded data file.

    • [project.]table[/partition]

      Specifies the name of the table from which you want to download data. You must specify the last-level partition for a partitioned table. If the table does not belong to the current project, you must specify the project where the table is located.

    • instance://<[project/]instance_id>

      Specifies the project name and the ID of the instance whose execution result you want to download. This parameter is required only when you download the execution result of a specific instance.

  • (Optional parameters) Options
    • -c: specifies the encoding format of the local data file. Default value: ignore.
    • -ci: specifies the indexes (starting from 0) of the columns to download. Separate multiple indexes with commas (,). For a command that combines several download options, see the example after this list.
    • -cn: specifies the name of the column from which you want to download data. Separate multiple names with commas (,).
    • -cp,-compress: specifies whether to compress data before it is transferred to reduce network traffic. Default value: true.
    • -dfp: specifies the format of DATETIME data. The default format is yyyy-MM-dd HH:mm:ss.
    • -e: specifies whether to express DOUBLE values in exponential notation when they are downloaded. If you do not specify this option, a maximum of 20 digits are retained.
    • -fd: specifies the column delimiter used in the local data file. Default value: comma (,).
    • -h: specifies whether the local data file has a table header. If you set this option to true, Dship skips the table header and downloads data from the second row.
      Note: -h=true cannot be used together with threads>1 (multiple threads).
    • -limit: specifies the maximum number of records to download at a time.
    • -ni: specifies the NULL data identifier. Default value: ""(empty string).
    • -rd: specifies the row delimiter used in the local data file. Default value: \r\n.
    • -sd: specifies the path of the session directory.
    • -te: specifies the endpoint of Tunnel.
    • -threads: specifies the number of threads. Default value: 1.
    • -tz: specifies the time zone. Default value: local time zone, for example, Asia/Shanghai.
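
The following commands sketch combined download options. The table, column, and file names are hypothetical; only options documented in the help output above are used.
    -- Download at most 1,000 records of the first two columns.
    tunnel download test_table result.txt -ci 0,1 -limit 1000;
    -- Download two named columns and write a header row. Because -h=true cannot be used
    -- with threads>1, this command runs with the default single thread.
    tunnel download test_table result.csv -cn shop_name,total_price -h true;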

PURGE

Clears the session directory.

Command syntax
odps@ project_name>tunnel help purge;
usage: tunnel purge [n]
              force session history to be purged.([n] days before, default
              3 days)
Example:
       tunnel purge 5

Parameters

n: specifies the number of days before which session history is cleared. Default value: 3.

Precautions

  • The following list describes how Tunnel commands handle each data type:
    • STRING: the length of data of this type cannot exceed 8 MB.
    • BOOLEAN: for data uploads, the value can only be true, false, 0, or 1. For data downloads, the value can be true or false and is not case-sensitive.
    • BIGINT: value range: [-9223372036854775807,9223372036854775807].
    • DOUBLE:
      • Data of this type has 16 significant digits.
      • Data of this type is expressed in scientific notation during data uploads.
      • Data of this type is expressed in numerals during data downloads.
      • Maximum value: 1.7976931348623157E308.
      • Minimum value: 4.9E-324.
      • Positive infinity: Infinity.
      • Negative infinity: -Infinity.
    • DATETIME: by default, data of the DATETIME type can be uploaded when the time zone is GMT+8. You can use the command line to specify the date format. If you upload data of this type, you must specify the time format. For more information, see Data types. The following patterns are supported:
      "yyyyMMddHHmmss": for example, "20140209101000"
      "yyyy-MM-dd HH:mm:ss" (default): for example, "2014-02-09 10:10:00"
      "MM/dd/yyyy": for example, "09/01/2014"
      Example:
      tunnel upload log.txt test_table -dfp "yyyy-MM-dd HH:mm:ss"
  • Null: Every data type can have null values.
    • By default, a blank string indicates a null value.
    • The -null-indicator parameter is used to specify a null string.
      tunnel upload log.txt test_table -ni "NULL"
  • Encoding format: You can specify the encoding format of the file. Default value: UTF-8.
    tunnel upload log.txt test_table -c "gbk"
  • Delimiter: Tunnel commands support custom file delimiters. -record-delimiter is used to customize row delimiters, and -field-delimiter is used to customize column delimiters.
    • A row or column delimiter can contain multiple characters.
    • A column delimiter cannot contain a row delimiter.
    • Only the following escape character delimiters are supported in the command line: \r, \n, and \t.
    tunnel upload log.txt test_table -fd "||" -rd "\r\n"
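
The preceding precautions can be combined in a single command. This sketch uses hypothetical file and table names and only the options documented above:
    -- Upload a GBK-encoded file that uses "NULL" as the null indicator, "||" as the
    -- column delimiter, and millisecond precision for DATETIME values.
    tunnel upload log.txt test_table -c "gbk" -ni "NULL" -fd "||" -dfp "yyyy-MM-dd HH:mm:ss.SSS";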