MaxCompute allows you to run Tunnel commands to upload and download data. This topic describes these commands in detail.

Command description

  • Syntax
    odps@ project_name>tunnel help;
        Usage: tunnel <subcommand> [options] [args]
        Type 'tunnel help <subcommand>' for help on a specific subcommand.
    Available subcommands:
        upload (u)
        download (d)
        resume (r)
        show (s)
        purge (p)
        help (h)
    tunnel is a command for uploading data to / downloading data from ODPS.
  • Parameters
    • UPLOAD: uploads data to a MaxCompute table. Each upload writes files or level-1 directories to a single table or a single partition. For a partitioned table, you must specify the partition to which you want to upload data. For a multi-level partitioned table, you must specify a lowest-level partition.
      -- Upload data in the log.txt file to the p1="b1" and p2="b2" partitions of the test_table table that has two levels of partitions in the test_project project. The log.txt file is saved in the bin directory of the MaxCompute client. 
      tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
      
      -- Upload data in the log.txt file to the test_table table. The scan parameter is used to check whether data in the log.txt file complies with the schema of the test_table table. If it does not, the system reports an error and stops the upload. 
      tunnel upload  log.txt  test_table --scan=true;
      
      -- Upload data in the log.txt file from another directory to the p1="b1" and p2="b2" partitions of the test_table table that has two levels of partitions in the test_project project. 
      tunnel upload D:\test\log.txt test_project.test_table/p1="b1",p2="b2";
    • DOWNLOAD: downloads data from a MaxCompute table. You can download data only from one table or partition to a single file each time. For a partitioned table, you must specify the partition from which you want to download data. For a multi-level partitioned table, you must specify a lowest-level partition.
      -- Download data from the test_project.test_table table, which has two levels of partitions, to the test_table.txt file in the bin directory of the MaxCompute client. 
      tunnel download  test_project.test_table/p1="b1",p2="b2"  test_table.txt;
      -- Download the data from the test_project.test_table table, which has two levels of partitions, to the test_table.txt file in another directory. 
      tunnel download  test_project.test_table/p1="b1",p2="b2"  D:\test\test_table.txt;
    • RESUME: resumes the transfer of files or directories after an interruption, such as a network disconnection or a Tunnel fault. Only data uploads can be resumed. One data upload or download is referred to as a session. You must specify the ID of the session that you want to resume in the RESUME command.
      tunnel resume;
    • SHOW: displays historical task information.
      -- Display the commands used in the last five data uploads or downloads. 
      tunnel show history -n 5;
      -- Display the logs of the last data upload or download. 
      tunnel show log;
    • PURGE: clears the session directory. Sessions from the last three days are cleared by default.
      -- Clear logs from the last five days. 
      tunnel purge 5;
    • HELP: obtains help information.

Upload

  • Description

    Uploads local data to a MaxCompute table in append mode.

  • Syntax
    odps@ project_name>tunnel help upload;
    usage: tunnel upload [options] <path> <[project.]table[/partition]>
    
                  upload data from local file
     -acp,-auto-create-partition <ARG>   auto create target partition if not
                                         exists, default false
     -bs,-block-size <ARG>               block size in MiB, default 100
     -c,-charset <ARG>                   specify file charset, default ignore.
                                         set ignore to download raw data
     -cf,-csv-format <ARG>               use csv format (true|false), default
                                         false. When uploading in csv format,
                                         file splitting not supported.
     -cp,-compress <ARG>                 compress, default true
     -dbr,-discard-bad-records <ARG>     specify discard bad records
                                         action(true|false), default false
     -dfp,-date-format-pattern <ARG>     specify date format pattern, default
                                         yyyy-MM-dd HH:mm:ss
     -fd,-field-delimiter <ARG>          specify field delimiter, support
                                         unicode, eg \u0001. default ","
     -h,-header <ARG>                    if local file should have table
                                         header, default false
     -mbr,-max-bad-records <ARG>         max bad records, default 1000
     -ni,-null-indicator <ARG>           specify null indicator string,
                                         default ""(empty string)
     -ow,-overwrite <true | false>       overwrite specified table or
                                         partition, default: false
     -rd,-record-delimiter <ARG>         specify record delimiter, support
                                         unicode, eg \u0001. default "\r\n"
     -s,-scan <ARG>                      specify scan file
                                         action(true|false|only), default true
     -sd,-session-dir <ARG>              set session dir, default
                                         D:\software\odpscmd_public\plugins\dship
     -ss,-strict-schema <ARG>            specify strict schema mode. If false,
                                         extra data will be abandoned and
                                         insufficient field will be filled
                                         with null. Default true
     -t,-threads <ARG>                   number of threads, default 1
     -te,-tunnel_endpoint <ARG>          tunnel endpoint
     -time,-time <ARG>                   keep track of upload/download elapsed
                                         time or not. Default false
     -tz,-time-zone <ARG>                time zone, default local timezone:
                                         Asia/Shanghai
    Example:
        tunnel upload log.txt test_project.test_table/p1="b1",p2="b2"
  • Parameters
    • Required parameters
      • path

        Specifies the path and name of the file that you want to upload.

        If the data file is saved in the bin directory of the MaxCompute client, set path to a value in the File name.File name extension format. If the file is saved in another directory, such as the test folder in drive D, set path to a value in the D:\test\File name.File name extension format.

      • [project.]table[/partition]

        Specifies the name of the table to which you want to upload data. You must specify a lowest-level partition for a partitioned table. If the table does not belong to the current project, you must specify the project where the table is located.

    • Optional parameters
      • -acp

        Specifies whether to automatically create the destination partition if it does not exist. Default value: False.

      • -bs

        Specifies the size of the data block uploaded by Tunnel each time. Default value: 100 MiB (1 MiB = 1024 × 1024 bytes).

      • -c

        Specifies the encoding format of the data file. By default, this parameter is not specified, and raw data is transferred without conversion.

      • -cf

        Specifies whether the file is a CSV file. Default value: False.

        Note Only TXT and CSV files can be uploaded. By default, TXT files are uploaded. If you want to upload a CSV file, you must specify the -cf parameter and download the latest version of the MaxCompute client.
      • -cp

        Specifies whether to compress the local data file before you upload it to MaxCompute to reduce network traffic. Default value: True.

      • -dbr

        Specifies whether to omit dirty data, such as additional columns, missing columns, or column data whose type does not match. Default value: False.

        • True: All data that does not match the table definition is omitted.
        • False: An error is returned if dirty data is detected. This ensures that the raw data in the destination table is not contaminated.
      • -dfp

        Specifies the format of DATETIME data. The default format is yyyy-MM-dd HH:mm:ss. If you want to upload DATETIME data that is accurate to the millisecond, you can use a command in the tunnel upload -dfp 'yyyy-MM-dd HH:mm:ss.SSS' format. For more information about the DATETIME data type, see Data type editions.

      • -fd

        Specifies the column delimiter used in the local data file. Default value: comma (,).

      • -h

        Specifies whether the data file has a table header. Default value: False. If this parameter is set to True, Dship skips the table header and uploads data from the second row.

      • -mbr

        Specifies the maximum number of dirty data records allowed. If the number of dirty data records exceeds the value of this parameter, the upload stops. Default value: 1000.

      • -ni

        Specifies the NULL data identifier. Default value: an empty string.

      • -ow
        Specifies whether the uploaded data overwrites the table or partition. The default value is False. This indicates that data is uploaded in append mode. The following code shows an example:
        tunnel upload -overwrite true log.txt test_project.test_table/p1="b1",p2="b2";
      • -rd

        Specifies the row delimiter used in the local data file. Default value: \r\n.

      • -s
        Specifies whether to scan the local data file. Default value: True.
        • True: The system scans the data and starts to import the data only if it is in the correct format.
        • False: The system imports data without scanning.
        • Only: The system only scans the data. The data is not imported after the scan.
      • -sd

        Specifies the session directory.

      • -ss

        Specifies the strict schema mode. Default value: True. If the parameter is set to False, extra data is discarded and the fields that are not specified are filled with NULL.

      • -t

        Specifies the number of threads. Default value: 1.

      • -te

        Specifies the endpoint of Tunnel.

      • -time

        Specifies whether the upload time is tracked. Default value: False.

      • -tz

        Specifies the time zone. Default value: the local time zone, for example, Asia/Shanghai.
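The -dfp patterns above use Java SimpleDateFormat-style tokens. If you generate the upload file with a script, the default pattern and the millisecond variant can be reproduced in Python as follows. This is a minimal sketch: format_default and format_millis are illustrative helper names, not part of the Tunnel client.

```python
from datetime import datetime

# "yyyy-MM-dd HH:mm:ss" (the -dfp default) maps to strftime "%Y-%m-%d %H:%M:%S".
def format_default(dt):
    return dt.strftime("%Y-%m-%d %H:%M:%S")

# "yyyy-MM-dd HH:mm:ss.SSS" keeps milliseconds; strftime's %f yields
# microseconds, so derive the three millisecond digits from dt.microsecond.
def format_millis(dt):
    return dt.strftime("%Y-%m-%d %H:%M:%S.") + "%03d" % (dt.microsecond // 1000)

dt = datetime(2014, 2, 9, 10, 10, 0, 123000)
print(format_default(dt))  # 2014-02-09 10:10:00
print(format_millis(dt))   # 2014-02-09 10:10:00.123
```

A file whose DATETIME column is written with format_millis can then be uploaded with tunnel upload -dfp 'yyyy-MM-dd HH:mm:ss.SSS'.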

Show

  • Description

    Displays historical records. By default, 500 data records are saved.

  • Syntax
    odps@ project_name>tunnel help show;
    usage: tunnel show history [options]
                  show session information
     -n,-number <ARG>   lines
    Example:
           tunnel show history -n 5
           tunnel show log
  • Parameters

    -n: specifies the number of rows that you want to display.

  • Example
    odps@ project_name>tunnel show history;
    20150610xxxxxxxxxxx70a002ec60c  failed  'u --config-file /D:/console/conf/odps_config.ini --project odpstest_ay52c_ay52 --endpoint http://service.<regionid>.maxcompute.aliyun.com/api --id UlxxxxxxxxxxxrI1 --key 2m4r3WvTxxxxxxxxxx0InVke7UkvR d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false'
    Note In the preceding example, 20150610xxxxxxxxxxx70a002ec60c indicates the ID of the session for which the upload failed.

Resume

  • Description

    Resumes the execution of historical operations. Only data uploads can be resumed.

  • Syntax
    odps@  project_name>tunnel help resume;
    usage: tunnel resume [session_id] [-force]
                  resume an upload session
     -f,-force   force resume
    Example:
           tunnel resume
  • Parameters
    • session_id

      Specifies the ID of the session for which the upload failed. It is a required parameter.

    • -f

      Specifies whether to forcibly resume the execution of historical operations. This parameter is omitted by default.

  • Example
    Run the following command to resume the session for which the upload failed. In this command, 20150610xxxxxxxxxxx70a002ec60c indicates the ID of the session for which the upload failed.
    odps@ project_name>tunnel resume 20150610xxxxxxxxxxx70a002ec60c -force;
    start resume
    20150610xxxxxxxxxxx70a002ec60c
    Upload session: 20150610xxxxxxxxxxx70a002ec60c
    Start upload:d:\data.txt
    Resume 1 blocks 
    2015-06-10 16:46:42     upload block: '1'
    2015-06-10 16:46:42     upload block complete, blockid=1
    upload complete, average speed is 0 KB/s
    OK

Download

  • Description

    Downloads MaxCompute table data or the execution result of a specific instance to a local directory.

  • Syntax
    odps@ project_name>tunnel help download;
    usage: tunnel download [options] <[project.]table[/partition]> <path>
    
                  download data to local file
     -c,-charset <ARG>                 specify file charset, default ignore.
                                       set ignore to download raw data
     -cf,-csv-format <ARG>             use csv format (true|false), default
                                       false. When uploading in csv format,
                                       file splitting not supported.
     -ci,-columns-index <ARG>          specify the columns index(starts from
                                       0) to download, use comma to split each
                                       index
     -cn,-columns-name <ARG>           specify the columns name to download,
                                       use comma to split each name
     -cp,-compress <ARG>               compress, default true
     -dfp,-date-format-pattern <ARG>   specify date format pattern, default
                                       yyyy-MM-dd HH:mm:ss
     -e,-exponential <ARG>             When download double values, use
                                       exponential express if necessary.
                                       Otherwise at most 20 digits will be
                                       reserved. Default false
     -fd,-field-delimiter <ARG>        specify field delimiter, support
                                       unicode, eg \u0001. default ","
     -h,-header <ARG>                  if local file should have table header,
                                       default false
        -limit <ARG>                   specify the number of records to
                                       download
     -ni,-null-indicator <ARG>         specify null indicator string, default
                                       ""(empty string)
     -rd,-record-delimiter <ARG>       specify record delimiter, support
                                       unicode, eg \u0001. default "\r\n"
     -sd,-session-dir <ARG>            set session dir, default
                                       D:\software\odpscmd_public\plugins\dship
     -t,-threads <ARG>                 number of threads, default 1
     -te,-tunnel_endpoint <ARG>        tunnel endpoint
     -time,-time <ARG>                 keep track of upload/download elapsed
                                       time or not. Default false
     -tz,-time-zone <ARG>              time zone, default local timezone:
                                       Asia/Shanghai
    usage: tunnel download [options] instance://<[project/]instance_id> <path>
    
                  download instance result to local file
     -c,-charset <ARG>                 specify file charset, default ignore.
                                       set ignore to download raw data
     -cf,-csv-format <ARG>             use csv format (true|false), default
                                       false. When uploading in csv format,
                                       file splitting not supported.
     -ci,-columns-index <ARG>          specify the columns index(starts from
                                       0) to download, use comma to split each
                                       index
     -cn,-columns-name <ARG>           specify the columns name to download,
                                       use comma to split each name
     -cp,-compress <ARG>               compress, default true
     -dfp,-date-format-pattern <ARG>   specify date format pattern, default
                                       yyyy-MM-dd HH:mm:ss
     -e,-exponential <ARG>             When download double values, use
                                       exponential express if necessary.
                                       Otherwise at most 20 digits will be
                                       reserved. Default false
     -fd,-field-delimiter <ARG>        specify field delimiter, support
                                       unicode, eg \u0001. default ","
     -h,-header <ARG>                  if local file should have table header,
                                       default false
        -limit <ARG>                   specify the number of records to
                                       download
     -ni,-null-indicator <ARG>         specify null indicator string, default
                                       ""(empty string)
     -rd,-record-delimiter <ARG>       specify record delimiter, support
                                       unicode, eg \u0001. default "\r\n"
     -sd,-session-dir <ARG>            set session dir, default
                                        D:\software\odpscmd_public\plugins\dship
     -t,-threads <ARG>                 number of threads, default 1
     -te,-tunnel_endpoint <ARG>        tunnel endpoint
     -time,-time <ARG>                 keep track of upload/download elapsed
                                       time or not. Default false
     -tz,-time-zone <ARG>              time zone, default local timezone:
                                       Asia/Shanghai
    Example:
        tunnel download test_project.test_table/p1="b1",p2="b2" log.txt // Download data from a specific table.
        tunnel download instance://test_project/test_instance log.txt   // Download the execution result of a specific instance.
  • Parameters
    • Required parameters
      • path

        Specifies the path in which the downloaded data file is saved.

        If the data file is saved in the bin directory of the MaxCompute client, set path to a value in the File name.File name extension format. If the file is saved in another directory, such as the test folder in drive D, set path to a value in the D:\test\File name.File name extension format.

      • [project.]table[/partition]

        Specifies the name of the table that you want to download. You must specify a lowest-level partition for a partitioned table. If the table does not belong to the current project, you must specify the project where the table is located.

      • [project/]instance_id

        Specifies the ID of the instance. You must specify this parameter to download the execution result of a specific instance.

    • Optional parameters
      • -c

        Specifies the encoding format of the local data file. By default, this parameter is not specified, and raw data is downloaded without conversion.

      • -cf

        Specifies whether the file is a CSV file. Default value: False.

        Note Only TXT and CSV files can be downloaded. By default, TXT files are downloaded. If you want to download a CSV file, you must specify the -cf parameter and download the latest version of the MaxCompute client.
      • -ci

        Specifies the column indexes that you want to download. The column indexes start from 0. Separate the column indexes with commas (,).

      • -cn

        Specifies the names of the columns that you want to download. Separate the column names with commas (,).

      • -cp

        Specifies whether to compress the data before it is downloaded to reduce network traffic. Default value: True.

      • -dfp

        Specifies the data format of the DATETIME type. The default format is yyyy-MM-dd HH:mm:ss.

      • -e

        Specifies whether DOUBLE data that you download is expressed in exponential notation when required. If exponential notation is not used, a maximum of 20 digits are retained. Default value: False.

      • -fd

        Specifies the column delimiter for the local data file. Default value: comma (,).

      • -h
        Specifies whether the data file has a table header. Default value: False. If this parameter is set to True, the data from the second row in the file is downloaded.
        Note -h=true and threads>1 cannot be used together. threads>1 indicates multiple threads.
      • -limit

        Specifies the number of rows to download.

      • -ni

        Specifies the NULL data identifier. Default value: an empty string.

      • -rd

        Specifies the row delimiter used in the local data file. Default value: \r\n.

      • -sd

        Specifies the session directory.

      • -t

        Specifies the number of threads. Default value: 1.

      • -te

        Specifies the endpoint of Tunnel.

      • -time

        Specifies whether the download time is tracked. Default value: False.

      • -tz

        Specifies the time zone. The local time zone is used by default, for example, Asia/Shanghai.
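A downloaded file is plain text that uses the delimiters and null indicator chosen with -fd, -rd, and -ni. The following sketch shows how such a file can be read back in Python, assuming the defaults -fd "," and -rd "\r\n" together with -ni "NULL"; parse_download is an illustrative helper, not part of the Tunnel client.

```python
# Parse text produced by tunnel download with field delimiter "," (-fd),
# record delimiter "\r\n" (-rd), and null indicator "NULL" (-ni "NULL").
def parse_download(text, field_delim=",", record_delim="\r\n", null_indicator="NULL"):
    rows = []
    for record in text.split(record_delim):
        if not record:
            continue  # skip the empty trailer after the final record delimiter
        rows.append([None if field == null_indicator else field
                     for field in record.split(field_delim)])
    return rows

sample = "1,hangzhou,2014-02-09 10:10:00\r\n2,NULL,2014-02-09 10:11:00\r\n"
print(parse_download(sample))
# [['1', 'hangzhou', '2014-02-09 10:10:00'], ['2', None, '2014-02-09 10:11:00']]
```

If you download with -cf true, the file is standard CSV and a CSV parser is the safer choice, because quoted fields may themselves contain the delimiter.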

Purge

  • Description

    Clears the session directory.

  • Syntax
    odps@ project_name>tunnel help purge;
    usage: tunnel purge [n]
                  force session history to be purged.([n] days before, default
                  3 days)
    Example:
           tunnel purge 5
  • Parameters

    n: specifies the number of days before which session history is cleared. Default value: 3.

Usage notes

  • The following table describes the data types.
    Type Description
    STRING A string. The maximum length is 8 MB.
    BOOLEAN For uploads, the value can be True, False, 0, or 1. For downloads, the value can be True or False. Values are not case-sensitive.
    BIGINT Valid values: [-9223372036854775807,9223372036854775807].
    DOUBLE
    • A 16-digit number.
    • Data of this type is expressed in scientific notation during uploads.
    • Data of this type is expressed in plain numbers during downloads.
    • Maximum value: 1.7976931348623157E308.
    • Minimum value: 4.9E-324.
    • Positive infinity: Infinity.
    • Negative infinity: -Infinity.
    DATETIME By default, data of the DATETIME type can be uploaded when the time zone is UTC+8. You can use the -dfp parameter to specify the format pattern of this type. If you upload data of this type, you must specify the time format. For more information, see Data type editions.
    "yyyyMMddHHmmss": for example, "20140209101000".
    "yyyy-MM-dd HH:mm:ss" (default format): for example, "2014-02-09 10:10:00".
    Example:
    tunnel upload log.txt test_table -dfp "yyyy-MM-dd HH:mm:ss"
  • Null: Every data type can have null values.
    • By default, a blank string indicates a null value.
    • The -null-indicator parameter is used to specify a null string.
      tunnel upload log.txt test_table -ni "NULL"
  • Encoding format: You can specify the encoding format of the file. Default value: UTF-8.
    tunnel upload log.txt test_table -c "gbk"
  • Delimiter: Tunnel commands support custom file delimiters. -record-delimiter is used to customize row delimiters, and -field-delimiter is used to customize column delimiters.
    • A row or column delimiter can contain multiple characters.
    • A column delimiter cannot contain a row delimiter.
    • Only the following escape character delimiters are supported in the command line: \r, \n, and \t.
    tunnel upload log.txt test_table -fd "||" -rd "\r\n"
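Putting the delimiter and null-indicator rules together, a local file that matches the command above can be generated with a short script. This is a sketch under stated assumptions: the file name log.txt, the rows, and the use of "NULL" as the null indicator are examples, not requirements of the client.

```python
# Build an upload file that matches:
#   tunnel upload log.txt test_table -fd "||" -rd "\r\n" -ni "NULL"
rows = [
    ["1", "hangzhou", "2014-02-09 10:10:00"],
    ["2", "NULL", "2014-02-09 10:11:00"],  # "NULL" marks a null field (-ni "NULL")
]

# Join fields with the "||" column delimiter and records with the "\r\n"
# row delimiter; the file ends with a trailing record delimiter.
data = "\r\n".join("||".join(row) for row in rows) + "\r\n"

# newline="" stops Python from translating the "\r\n" record delimiters.
with open("log.txt", "w", encoding="utf-8", newline="") as f:
    f.write(data)
```

Because a column delimiter must not contain the row delimiter, a choice such as -fd "||" with -rd "\r\n" is valid, whereas -fd "\r\n|" would not be.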