MaxCompute allows you to run Tunnel commands to upload and download data. This topic provides detailed instructions on Tunnel commands for data upload and download.

Descriptions

  • Syntax
    odps@ project_name>tunnel help;
        Usage: tunnel <subcommand> [options] [args]
        Type 'tunnel help <subcommand>' for help on a specific subcommand.
    Available subcommands:
        upload (u)
        download (d)
        resume (r)
        show (s)
        purge (p)
        help (h)
    tunnel is a command for uploading data to / downloading data from ODPS.
  • Parameters
    • upload: uploads data to a MaxCompute table. Each upload writes files or first-level directories to only one table or one partition of a table. For a partitioned table, you must specify the partition to which you want to upload data. For a multi-level partitioned table, you must specify the last-level partition.
      -- Upload data in the log.txt file to the p1="b1" and p2="b2" partitions in the test_table table, which has two levels of partitions, in the test_project project.
      tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
      
      -- Upload data in the log.txt file to the test_table table. The scan parameter specifies whether to check that the data in the log.txt file complies with the definition of the test_table table. If the data does not comply, the system reports an error and the upload stops.
      tunnel upload log.txt test_table --scan=true;
    • download: downloads data from a MaxCompute table. You can only download data from one table or partition to a single file each time. For a partitioned table, you must specify the partition from which you want to download data. For a multi-level partitioned table, you must specify the last-level partition.
      -- Download data from the p1="b1",p2="b2" partition of the test_project.test_table table, which has two levels of partitions, to the test_table.txt file.
      tunnel download test_project.test_table/p1="b1",p2="b2" test_table.txt;
    • resume: resumes the transfer of files or directories after a network disconnection or a Tunnel service fault. You can use this command to resume only data uploads. One data upload or download is referred to as a session. When you run the resume command, you must specify the ID of the session that you want to resume.
      tunnel resume;
    • show: displays historical task information.
      -- Display commands used in the last five data uploads or downloads.
      tunnel show history -n 5;
      -- Display the logs of the last data upload or download.
      tunnel show log;
    • purge: clears the session directory. By default, sessions from the last three days are cleared.
      -- Clear logs from the last five days.
      tunnel purge 5;
    • help: obtains help information.

Upload

  • Function

    Uploads local data to a MaxCompute table in append mode.

  • Syntax
    odps@ project_name>tunnel help upload;
    usage: tunnel upload [options] <path> <[project.]table[/partition]>
    
                  upload data from local file
     -acp,-auto-create-partition <ARG>   auto create target partition if not
                                         exists, default false
     -bs,-block-size <ARG>               block size in MiB, default 100
     -c,-charset <ARG>                   specify file charset, default ignore.
                                         set ignore to download raw data
     -cf,-csv-format <ARG>               use csv format (true|false), default
                                         false. When uploading in csv format,
                                         file splitting not supported.
     -cp,-compress <ARG>                 compress, default true
     -dbr,-discard-bad-records <ARG>     specify discard bad records
                                         action(true|false), default false
     -dfp,-date-format-pattern <ARG>     specify date format pattern, default
                                         yyyy-MM-dd HH:mm:ss
     -fd,-field-delimiter <ARG>          specify field delimiter, support
                                         unicode, eg \u0001. default ","
     -h,-header <ARG>                    if local file should have table
                                         header, default false
     -mbr,-max-bad-records <ARG>         max bad records, default 1000
     -ni,-null-indicator <ARG>           specify null indicator string,
                                         default ""(empty string)
     -ow,-overwrite <true | false>       overwrite specified table or
                                         partition, default: false
     -rd,-record-delimiter <ARG>         specify record delimiter, support
                                         unicode, eg \u0001. default "\r\n"
     -s,-scan <ARG>                      specify scan file
                                         action(true|false|only), default true
     -sd,-session-dir <ARG>              set session dir, default
                                         D:\software\odpscmd_public\plugins\dship
     -ss,-strict-schema <ARG>            specify strict schema mode. If false,
                                         extra data will be abandoned and
                                         insufficient field will be filled
                                         with null. Default true
     -t,-threads <ARG>                   number of threads, default 1
     -te,-tunnel_endpoint <ARG>          tunnel endpoint
     -time,-time <ARG>                   keep track of upload/download elapsed
                                         time or not. Default false
     -tz,-time-zone <ARG>                time zone, default local timezone:
                                         Asia/Shanghai
    Example:
        tunnel upload log.txt test_project.test_table/p1="b1",p2="b2"
  • Parameters
    • Required parameters
      • path

        Specifies the path and name of the local data file you want to upload.

      • [project.]table[/partition]

        Specifies the name of the table to which you want to upload data. You must specify the last-level partition for a partitioned table. If the table does not belong to the current project, you must specify the project where the table is located.

    • Optional parameters
      • -acp

        Specifies whether to automatically create the destination partition if the specified partition does not exist. Default value: False.
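        For example, the following command (a minimal sketch that reuses the sample file and partitioned table from this topic) automatically creates the p1="b1",p2="b2" partition if it does not exist:
        tunnel upload log.txt test_project.test_table/p1="b1",p2="b2" -acp true;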

      • -bs

        Specifies the size of each data block uploaded by Tunnel each time. Default value: 100 MiB (1 MiB = 1024 × 1024 bytes).
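        For example, the following command (the block size of 200 MiB is illustrative) uploads data in 200 MiB blocks:
        tunnel upload log.txt test_table -bs 200;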

      • -c

        Specifies the encoding format of the local data file. Default value: ignore, which indicates that raw data is uploaded without encoding conversion.

      • -cf

        Specifies whether the file to upload is a CSV file. Default value: False. If you upload a file in the CSV format, file splitting is not supported.

      • -cp

        Specifies whether to compress the local data file before you upload it to MaxCompute to reduce network traffic. Default value: True.

      • -dbr

        Specifies whether to discard dirty data, such as data with extra columns, missing columns, or mismatched column types. Default value: False.

        • True: discards all data that does not match the table definition.
        • False: returns an error when dirty data is detected, which ensures that raw data in the destination table is not contaminated.
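        For example, the following command (a sketch that reuses the sample names from this topic) uploads the file and discards records that do not match the table definition:
        tunnel upload log.txt test_table -dbr true;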
      • -dfp

        Specifies the format of DATETIME data. The default format is yyyy-MM-dd HH:mm:ss. To upload DATETIME data that is accurate to the millisecond, specify -dfp 'yyyy-MM-dd HH:mm:ss.SSS', as shown in the following example. For more information about the DATETIME data type, see Date types.
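        For example, the following command (a sketch that reuses the sample names from this topic) uploads DATETIME data that is accurate to the millisecond:
        tunnel upload log.txt test_table -dfp 'yyyy-MM-dd HH:mm:ss.SSS';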

      • -fd

        Specifies the column delimiter used in the local data file. Default value: comma (,).

      • -h

        Specifies whether the local data file has a table header. Default value: False. If this parameter is set to True, Dship skips the table header and uploads data from the second row.
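        For example, the following command (a sketch that reuses the sample names from this topic) skips the header row in log.txt and uploads data from the second row:
        tunnel upload log.txt test_table -h true;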

      • -mbr

        Specifies the maximum number of dirty data records that are tolerated. If the number of dirty data records exceeds this value, the upload stops. Default value: 1000.
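        For example, the following command (a sketch in which -dbr is combined with -mbr; the threshold of 500 is illustrative) discards dirty data records but stops the upload if more than 500 such records are found:
        tunnel upload log.txt test_table -dbr true -mbr 500;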

      • -ni

        Specifies the NULL data identifier. Default value: an empty string.

      • -ow
        Specifies whether the uploaded data overwrites the table or partition. The default value is False, which indicates that data is uploaded in append mode. Example:
        tunnel upload -overwrite true log.txt test_project.test_table/p1="b1",p2="b2";
      • -rd

        Specifies the row delimiter used in the local data file. Default value: \r\n.

      • -s
        Specifies whether to scan the local data file. Default value: True.
        • True: The system scans the data and starts to import it only if it is in the correct format.
        • False: The system imports data without scanning.
        • Only: The system only scans the data. The data is not imported after the scan.
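        For example, the following command (a sketch that reuses the sample names from this topic) only checks whether log.txt matches the table definition, without importing any data:
        tunnel upload log.txt test_table -s only;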
      • -sd

        Specifies the session directory.

      • -ss

        Specifies whether to use the strict schema mode. Default value: True. If this parameter is set to False, extra data is discarded and missing fields are filled with NULL.
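        For example, the following command (a sketch that reuses the sample names from this topic) disables the strict schema mode so that rows with missing fields are still uploaded:
        tunnel upload log.txt test_table -ss false;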

      • -t

        Specifies the number of threads. Default value: 1.

      • -te

        Specifies the endpoint of Tunnel.

      • -time

        Specifies whether to track the elapsed time of the upload. Default value: False.

      • -tz

        Specifies the time zone. Default value: local time zone, for example, Asia/Shanghai.

Show

  • Function

    Displays historical records.

  • Syntax
    odps@ project_name>tunnel help show;
    usage: tunnel show history [options]
                  show session information
     -n,-number <ARG>   lines
    Example:
           tunnel show history -n 5
           tunnel show log
  • Parameter

    -n: specifies the number of rows that you want to display.

  • Example
    odps@ project_name>tunnel show history;
    20150610xxxxxxxxxxx70a002ec60c  failed  'u --config-file /D:/console/conf/odps_config.ini --project odpstest_ay52c_ay52 --endpoint http://service.odps.aliyun.com/api --id UlxxxxxxxxxxxrI1 --key 2m4r3WvTxxxxxxxxxx0InVke7UkvR d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false'
    Note In the preceding example, 20150610xxxxxxxxxxx70a002ec60c indicates the ID of the session for which the upload failed.

Resume

  • Function

    Resumes the execution of historical operations. Only data uploads can be resumed.

  • Syntax
    odps@  project_name>tunnel help resume;
    usage: tunnel resume [session_id] [-force]
                  resume an upload session
     -f,-force   force resume
    Example:
           tunnel resume
  • Parameters
    • session_id

      Specifies the ID of the session for which the upload failed. It is a required parameter.

    • -f

      Specifies whether to forcibly resume the execution of historical operations. By default, this parameter is omitted.

  • Example
    Run the following command to resume the session for which the upload failed. In this command, 20150610xxxxxxxxxxx70a002ec60c indicates the ID of the session for which the upload failed.
    odps@ project_name>tunnel resume 20150610xxxxxxxxxxx70a002ec60c -force;
    start resume
    20150610xxxxxxxxxxx70a002ec60c
    Upload session: 20150610xxxxxxxxxxx70a002ec60c
    Start upload:d:\data.txt
    Resume 1 blocks 
    2015-06-10 16:46:42     upload block: '1'
    2015-06-10 16:46:42     upload block complete, blockid=1
    upload complete, average speed is 0 KB/s
    OK

Download

  • Function

    Downloads MaxCompute table data or the execution result of a specific instance to a local directory.

  • Syntax
    odps@ project_name>tunnel help download;
    usage: tunnel download [options] <[project.]table[/partition]> <path>
    
                  download data to local file
     -c,-charset <ARG>                 specify file charset, default ignore.
                                       set ignore to download raw data
     -cf,-csv-format <ARG>             use csv format (true|false), default
                                       false. When uploading in csv format,
                                       file splitting not supported.
     -ci,-columns-index <ARG>          specify the columns index(starts from
                                       0) to download, use comma to split each
                                       index
     -cn,-columns-name <ARG>           specify the columns name to download,
                                       use comma to split each name
     -cp,-compress <ARG>               compress, default true
     -dfp,-date-format-pattern <ARG>   specify date format pattern, default
                                       yyyy-MM-dd HH:mm:ss
     -e,-exponential <ARG>             When download double values, use
                                       exponential express if necessary.
                                       Otherwise at most 20 digits will be
                                       reserved. Default false
     -fd,-field-delimiter <ARG>        specify field delimiter, support
                                       unicode, eg \u0001. default ","
     -h,-header <ARG>                  if local file should have table header,
                                       default false
        -limit <ARG>                   specify the number of records to
                                       download
     -ni,-null-indicator <ARG>         specify null indicator string, default
                                       ""(empty string)
     -rd,-record-delimiter <ARG>       specify record delimiter, support
                                       unicode, eg \u0001. default "\r\n"
     -sd,-session-dir <ARG>            set session dir, default
                                       D:\software\odpscmd_public\plugins\dship
     -t,-threads <ARG>                 number of threads, default 1
     -te,-tunnel_endpoint <ARG>        tunnel endpoint
     -time,-time <ARG>                 keep track of upload/download elapsed
                                       time or not. Default false
     -tz,-time-zone <ARG>              time zone, default local timezone:
                                       Asia/Shanghai
    usage: tunnel download [options] instance://<[project/]instance_id> <path>
    
                  download instance result to local file
     -c,-charset <ARG>                 specify file charset, default ignore.
                                       set ignore to download raw data
     -cf,-csv-format <ARG>             use csv format (true|false), default
                                       false. When uploading in csv format,
                                       file splitting not supported.
     -ci,-columns-index <ARG>          specify the columns index(starts from
                                       0) to download, use comma to split each
                                       index
     -cn,-columns-name <ARG>           specify the columns name to download,
                                       use comma to split each name
     -cp,-compress <ARG>               compress, default true
     -dfp,-date-format-pattern <ARG>   specify date format pattern, default
                                       yyyy-MM-dd HH:mm:ss
     -e,-exponential <ARG>             When download double values, use
                                       exponential express if necessary.
                                       Otherwise at most 20 digits will be
                                       reserved. Default false
     -fd,-field-delimiter <ARG>        specify field delimiter, support
                                       unicode, eg \u0001. default ","
     -h,-header <ARG>                  if local file should have table header,
                                       default false
        -limit <ARG>                   specify the number of records to
                                       download
     -ni,-null-indicator <ARG>         specify null indicator string, default
                                       ""(empty string)
     -rd,-record-delimiter <ARG>       specify record delimiter, support
                                       unicode, eg \u0001. default "\r\n"
     -sd,-session-dir <ARG>            set session dir, default
                                       D:\software\odpscmd_public\plugins\dship
     -t,-threads <ARG>                 number of threads, default 1
     -te,-tunnel_endpoint <ARG>        tunnel endpoint
     -time,-time <ARG>                 keep track of upload/download elapsed
                                       time or not. Default false
     -tz,-time-zone <ARG>              time zone, default local timezone:
                                       Asia/Shanghai
    Example:
        tunnel download test_project.test_table/p1="b1",p2="b2" log.txt //Download data from a specific table.
        tunnel download instance://test_project/test_instance log.txt   //Download the execution result of a specific instance.
  • Parameters
    • Required parameters
      • path

        Specifies the local path in which the downloaded data file is saved.

      • [project.]table[/partition]

        Specifies the name of the table from which you want to download data. You must specify the last-level partition for a partitioned table. If the table does not belong to the current project, you must specify the project where the table is located.

      • [project/]instance_id

        Specifies the ID of the instance. You must specify this parameter to download the execution result of a specific instance.

    • Optional parameters
      • -c

        Specifies the encoding format of the downloaded data file. Default value: ignore, which indicates that raw data is downloaded without encoding conversion.

      • -cf

        Specifies whether the downloaded file is a CSV file. Default value: False.

      • -ci

        Specifies the indexes of the columns to download. Indexes start from 0. Separate multiple indexes with commas (,).
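        For example, the following command (a sketch; the indexes are illustrative) downloads only the first and third columns:
        tunnel download test_table test_table.txt -ci 0,2;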

      • -cn

        Specifies the names of the columns to download. Separate multiple names with commas (,).
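        For example, the following command (a sketch; col1 and col2 are hypothetical column names) downloads only the specified columns:
        tunnel download test_table test_table.txt -cn col1,col2;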

      • -cp

        Specifies whether to compress data before it is downloaded to reduce network traffic. Default value: True.

      • -dfp

        Specifies the data format of the DATETIME type. The default format is yyyy-MM-dd HH:mm:ss.

      • -e

        Specifies whether DOUBLE values are expressed in scientific notation when necessary during the download. If scientific notation is not used, a maximum of 20 digits are retained. Default value: False.
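        For example, the following command (a sketch that reuses the sample names from this topic) allows DOUBLE values to be written in scientific notation when necessary:
        tunnel download test_table test_table.txt -e true;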

      • -fd

        Specifies the column delimiter for the local data file. Default value: comma (,).

      • -h
        Specifies whether the downloaded data file contains a table header. Default value: False. If this parameter is set to True, a header row is written as the first row of the file and data starts from the second row.
        Note: -h=true cannot be used together with -t (threads) greater than 1.
      • -limit

        Specifies the maximum number of records to download.
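        For example, the following command (a sketch; the count of 100 is illustrative) downloads only the first 100 records:
        tunnel download test_table test_table.txt -limit 100;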

      • -ni

        Specifies the NULL data identifier. Default value: an empty string.

      • -rd

        Specifies the row delimiter used in the local data file. Default value: \r\n.

      • -sd

        Specifies the session directory.

      • -t

        Specifies the number of threads. Default value: 1.

      • -te

        Specifies the endpoint of Tunnel.

      • -time

        Specifies whether to track the elapsed time of the download. Default value: False.

      • -tz

        Specifies the time zone. Default value: local time zone, for example, Asia/Shanghai.

Purge

  • Function

    Clears the session directory.

  • Syntax
    odps@ project_name>tunnel help purge;
    usage: tunnel purge [n]
                  force session history to be purged.([n] days before, default
                  3 days)
    Example:
           tunnel purge 5
  • Parameter

    n: specifies the number of days for which historical logs are cleared. Default value: 3.

Notes

  • The following data types are supported:
    • STRING: a string with a maximum length of 8 MB.
    • BOOLEAN: for uploads, the value can be True, False, 0, or 1. For downloads, the value can be True or False. Values are not case-sensitive.
    • BIGINT: value range: [-9223372036854775807,9223372036854775807].
    • DOUBLE:
      • A 16-digit number.
      • Data of this type is expressed in scientific notation during uploads.
      • Data of this type is expressed in numbers during downloads.
      • Maximum value: 1.7976931348623157E308.
      • Minimum value: 4.9E-324.
      • Positive infinity: Infinity.
      • Negative infinity: -Infinity.
    • DATETIME: by default, data of the DATETIME type can be uploaded when the time zone is GMT+8. You can use command lines to specify the format pattern of this type. If you upload data of this type, you must specify the time format. For more information, see Date types. The following format patterns are supported:
      • "yyyyMMddHHmmss": for example, "20140209101000".
      • "yyyy-MM-dd HH:mm:ss" (default): for example, "2014-02-09 10:10:00".
      • "MM/dd/yyyy": for example, "09/01/2014".
      Example:
      tunnel upload log.txt test_table -dfp "yyyy-MM-dd HH:mm:ss"
  • Null: Every data type can have null values.
    • By default, a blank string indicates a null value.
    • In the command line interface (CLI), the -null-indicator parameter is used to specify a null string.
      tunnel upload log.txt test_table -ni "NULL"
  • Encoding format: You can specify the encoding format of the file. Default value: UTF-8.
    tunnel upload log.txt test_table -c "gbk"
  • Delimiter: Tunnel commands support custom file delimiters. -record-delimiter is used to customize row delimiters, and -field-delimiter is used to customize column delimiters.
    • A row or column delimiter can contain multiple characters.
    • A column delimiter cannot contain a row delimiter.
    • Only the following escape character delimiters are supported in the CLI: \r, \n, and \t.
    tunnel upload log.txt test_table -fd "||" -rd "\r\n"