Tunnel commands

Last Updated: May 08, 2018

The MaxCompute client provides Tunnel commands that implement the functionality of the original Dship tool.

Tunnel commands are mainly used to upload data or download data. They provide the following functions:

  • Upload: Supports uploading a file or a level-one directory. Each upload targets only a single table or table partition. For partitioned tables, the destination partition must be specified.

    1. tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
    2. -- Uploads data in log.txt to the test_project project's test_table table, partitions: p1="b1",p2="b2".
    3. tunnel upload log.txt test_table --scan=only;
    4. -- Uploads data from log.txt to the test_table table. The --scan parameter indicates that the data in log.txt is scanned to check whether it complies with the test_table definition. If it does not, the system reports an error and stops the upload.
  • Download: Data can be downloaded only to a single file. Each download retrieves data from only one table or partition. For partitioned tables, the source partition must be specified.

    1. tunnel download test_project.test_table/p1="b1",p2="b2" test_table.txt;
    2. -- Download data from the table to the test_table.txt file.
  • Resume: If an error occurs due to the network or the Tunnel service, you can resume transmission of the file or directory after interruption. This command allows you to resume a previous data upload operation, but does not support download operations.

    1. tunnel resume;
  • Show: Displays the performed commands.

    1. tunnel show history -n 5;
    2. --Displays details for the last five data upload/download commands.
    3. tunnel show log;
    4. --Displays the log for the last data upload/download.
  • Purge: Clears the session directory. By default, this command clears session information from the last three days.

    1. tunnel purge 5;
    2. --Clears logs from the previous five days.

Using Tunnel commands

The Tunnel commands allow you to obtain help information by using the help subcommand on the client. Each command and option supports a short command format, as shown in the example after the following help output:

  1. odps@ project_name>tunnel help;
  2. Usage: tunnel <subcommand> [options] [args]
  3. Type 'tunnel help <subcommand>' for help on a specific subcommand.
  4. Available subcommands:
  5. upload (u)
  6. download (d)
  7. resume (r)
  8. show (s)
  9. purge (p)
  10. help (h)
  11. tunnel is a command for uploading data to / downloading data from ODPS.
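
For example, the short form u can be used in place of the upload subcommand (a minimal illustration; log.txt and test_table are placeholders):

  1. tunnel u log.txt test_table;
  2. -- Equivalent to: tunnel upload log.txt test_table;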

Parameters:

  • upload: Uploads data to a MaxCompute table.

  • download: Downloads data from a MaxCompute table.

  • resume: If data fails to be uploaded, use the Resume command to resume the upload from where it was interrupted. This command cannot be used for download operations. Each data upload or download operation is called a session. Run the Resume command and specify the session ID to be resumed.

  • show: Displays the performed operations.

  • purge: Clears the session directory.

  • help: Outputs Tunnel help information.

Upload

The upload command can import data from a local file to a MaxCompute table in append mode. The subcommands are used as follows:

  1. odps@ project_name>tunnel help upload;
  2. usage: tunnel upload [options] <path> <[project.]table[/partition]>
  3. upload data from local file
  4. -acp,-auto-create-partition <ARG> auto create target partition if not
  5. exists, default false
  6. -bs,-block-size <ARG> block size in MiB, default 100
  7. -c,-charset <ARG> specify file charset, default ignore.
  8. set ignore to download raw data
  9. -cp,-compress <ARG> compress, default true
  10. -dbr,-discard-bad-records <ARG> specify discard bad records
  11. action(true|false), default false
  12. -dfp,-date-format-pattern <ARG> specify date format pattern, default
  13. yyyy-MM-dd HH:mm:ss
  14. -fd,-field-delimiter <ARG> specify field delimiter, support
  15. unicode, eg \u0001. default ","
  16. -h,-header <ARG> if local file must have table
  17. header, default false
  18. -mbr,-max-bad-records <ARG> max bad records, default 1000
  19. -ni,-null-indicator <ARG> specify null indicator string,
  20. default ""(empty string)
  21. -rd,-record-delimiter <ARG> specify record delimiter, support
  22. unicode, eg \u0001. default "\r\n"
  23. -s,-scan <ARG> specify scan file
  24. action(true|false|only), default true
  25. -sd,-session-dir <ARG> set session dir, default
  26. D:\software\odpscmd_public\plugins\ds
  27. hip
  28. -ss,-strict-schema <ARG> specify strict schema mode. If false,
  29. extra data is abandoned and
  30. insufficient field is filled
  31. with null. Default true
  32. -te,-tunnel_endpoint <ARG> tunnel endpoint
  33. -threads <ARG> number of threads, default 1
  34. -tz,-time-zone <ARG> time zone, default local timezone:
  35. Asia/Shanghai
  36. Example:
  37. tunnel upload log.txt test_project.test_table/p1="b1",p2="b2"

Parameters:

  • -acp: Determines whether the destination partition is automatically created if it does not exist. Disabled by default.

  • -bs: Specifies the size of each data block uploaded using Tunnel. Default value: 100 MiB (MiB=1024*1024 B).

  • -c: Specifies the encoding of the local data file. Default value: UTF-8. If this parameter is not set, the raw data is uploaded without encoding conversion.

  • -cp: Determines whether the local file is compressed before being uploaded to reduce traffic. Enabled by default.

  • -dbr: Determines whether to ignore corrupted data, such as extra or missing columns and mismatched column data types. A combined example that uses this and several other options follows this parameter list.

    • When this parameter is set to ‘true’, all data that conflicts with the table definition is ignored.

    • When this parameter is set to ‘false’, the system displays an error message when corrupted data is encountered, but the destination table is not contaminated with corrupted records.

  • -dfp: Specifies the format of DateTime data. Default value: yyyy-MM-dd HH:mm:ss.

  • -fd: Specifies the column delimiter of the local data file. The default value is comma.

  • -h: Determines whether the data file contains the header. If it is set to ‘true’, Dship skips the header and starts uploading from the second row.

  • -mbr: By default, if more than 1,000 rows of corrupted data are uploaded, the upload is terminated. This parameter allows you to adjust the tolerated volume of corrupted data.

  • -ni: Specifies the NULL data identifier. Default value: "" (empty string).

  • -rd: Specifies the row delimiter of the local data file. Default value: \r\n.

  • -s: Determines whether to scan the local data file. Default value: false.

    • If set to ‘true’, the system scans the data first, and then imports the data if the format is correct.

    • If set to ‘false’, the system imports the data directly without scanning.

    • If set to ‘only’, the system only scans the local data, and does not import data after scanning.

  • -sd: Sets the session directory.

  • -te: Specifies the tunnel endpoint.

  • -threads: Specifies the number of threads. Default value: 1.

  • -tz: Specifies the time zone. The default value is the local time zone: Asia/Shanghai.
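
Several of these options can be combined in a single command. The following sketch is for illustration only; the tab-delimited file data.tsv and the table and partition names are placeholders, not objects created in this topic:

  1. tunnel upload data.tsv test_project.test_table/p1="b1",p2="b2" -fd "\t" -ni "NULL" -h true -dbr true;
  2. -- Uploads a tab-delimited file that contains a header row, treats the string NULL as a null value,
  3. -- and discards any records that do not match the table definition.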

Example:

  • To create a destination table:

    1. CREATE TABLE IF NOT EXISTS sale_detail(
    2. shop_name STRING,
    3. customer_id STRING,
    4. total_price DOUBLE)
    5. PARTITIONED BY (sale_date STRING,region STRING);
  • To add a partition:

    1. alter table sale_detail add partition (sale_date='201312', region='hangzhou');
  • Prepare the data file data.txt with the following content:

    1. shop9,97,100
    2. shop10,10,200
    3. shop11,11

    The third row of this file does not comply with the definitions of the sale_detail table. Three columns are defined by sale_detail, but this row only has two.

  • To import data:

    1. odps@ project_name>tunnel u d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false
    2. Upload session: 201506101639224880870a002ec60c
    3. Start upload:d:\data.txt
    4. Total bytes:41 Split input to 1 blocks
    5. 2015-06-10 16:39:22 upload block: '1'
    6. ERROR: column mismatch -,expected 3 columns, 2 columns found, please check data or delimiter

    Because data.txt contains corrupted data, data import fails. The system displays the session ID and error message.

  • To verify data:

    1. odps@ odpstest_ay52c_ay52> select * from sale_detail where sale_date='201312';
    2. ID = 20150610084135370gyvc61z5
    3. +-----------+-------------+-------------+-----------+--------+
    4. | shop_name | customer_id | total_price | sale_date | region |
    5. +-----------+-------------+-------------+-----------+--------+
    6. +-----------+-------------+-------------+-----------+--------+

    Because there is corrupted data, data import fails, and the table contains no data.
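
  • Alternatively, to discard the corrupted row instead of fixing the file, you can use the -dbr option described above (an optional sketch; the session output is omitted):

    1. odps@ project_name>tunnel u d:\data.txt sale_detail/sale_date=201312,region=hangzhou -dbr true
    2. -- The row shop11,11 does not match the table definition and is discarded; only the valid rows are uploaded.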

Show

The show command is used to display the performed operations. The subcommands are used as follows:

  1. odps@ project_name>tunnel help show;
  2. usage: tunnel show history [options]
  3. show session information
  4. -n,-number <ARG> lines
  5. Example:
  6. tunnel show history -n 5
  7. tunnel show log

Parameter:

-n: Specifies the number of rows to display.

Example:

  1. odps@ project_name>tunnel show history;
  2. 201506101639224880870a002ec60c failed 'u --config-file /D:/console/conf/odps_config.ini --project odpstest_ay52c_ay52 --endpoint http://service.odps.aliyun.com/api --id UlVxOHuthHV1QrI1 --key 2m4r3WvTZbsNJjybVXj0InVke7UkvR d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false'

Note:

201506101639224880870a002ec60c is the session ID of the failed data import in the previous section.

Resume

The resume command resumes a previously interrupted operation. It is valid only for data uploads. The subcommands are used as follows:

  1. odps@ project_name>tunnel help resume;
  2. usage: tunnel resume [session_id] [-force]
  3. resume an upload session
  4. -f,-force force resume
  5. Example:
  6. tunnel resume

Example:

Modify the data.txt file as follows:

  1. shop9,97,100
  2. shop10,10,200

To resume data uploading:

  1. odps@ project_name>tunnel resume 201506101639224880870a002ec60c --force;
  2. start resume
  3. 201506101639224880870a002ec60c
  4. Upload session: 201506101639224880870a002ec60c
  5. Start upload:d:\data.txt
  6. Resume 1 blocks
  7. 2015-06-10 16:46:42 upload block: '1'
  8. 2015-06-10 16:46:42 upload block complete, blockid=1
  9. upload complete, average speed is 0 KB/s
  10. OK

Note:

201506101639224880870a002ec60c is the session ID of the failed upload.

To verify data:

  1. odps@ project_name>select * from sale_detail where sale_date='201312';
  2. ID = 20150610084801405g0a741z5
  3. +-----------+-------------+-------------+-----------+--------+
  4. | shop_name | customer_id | total_price | sale_date | region |
  5. +-----------+-------------+-------------+-----------+--------+
  6. | shop9 | 97 | 100.0 | 201312 | hangzhou |
  7. | shop10 | 10 | 200.0 | 201312 | hangzhou |
  8. +-----------+-------------+-------------+-----------+--------+

Download

The download command exports data from a MaxCompute table, or from the result of an instance, to a local file. The subcommands are used as follows:

  1. odps@ project_name>tunnel help download;
  2. usage: tunnel download [options] <[project.]table[/partition]> <path>
  3. download data to local file
  4. -c,-charset <ARG> specify file charset, default ignore.
  5. set ignore to download raw data
  6. -ci,-columns-index <ARG> specify the columns index(starts from
  7. 0) to download, use comma to split each
  8. index
  9. -cn,-columns-name <ARG> specify the columns name to download,
  10. use comma to split each name
  11. -cp,-compress <ARG> compress, default true
  12. -dfp,-date-format-pattern <ARG> specify date format pattern, default
  13. yyyy-MM-dd HH:mm:ss
  14. -e,-exponential <ARG> When download double values, use
  15. exponential express if necessary.
  16. Otherwise at most 20 digits are
  17. reserved. Default false
  18. -fd,-field-delimiter <ARG> specify field delimiter, support
  19. unicode, eg \u0001. default ","
  20. -h,-header <ARG> if local file must have table header,
  21. default false
  22. -limit <ARG> specify the number of records to
  23. download
  24. -ni,-null-indicator <ARG> specify null indicator string, default
  25. ""(empty string)
  26. -rd,-record-delimiter <ARG> specify record delimiter, support
  27. unicode, eg \u0001. default "\r\n"
  28. -sd,-session-dir <ARG> set session dir, default
  29. D:\software\odpscmd_public\plugins\dshi
  30. p
  31. -te,-tunnel_endpoint <ARG> tunnel endpoint
  32. -threads <ARG> number of threads, default 1
  33. -tz,-time-zone <ARG> time zone, default local timezone:
  34. Asia/Shanghai
  35. usage: tunnel download [options] instance://<[project/]instance_id> <path>
  36. download instance result to local file
  37. -c,-charset <ARG> specify file charset, default ignore.
  38. set ignore to download raw data
  39. -ci,-columns-index <ARG> specify the columns index(starts from
  40. 0) to download, use comma to split each
  41. index
  42. -cn,-columns-name <ARG> specify the columns name to download,
  43. use comma to split each name
  44. -cp,-compress <ARG> compress, default true
  45. -dfp,-date-format-pattern <ARG> specify date format pattern, default
  46. yyyy-MM-dd HH:mm:ss
  47. -e,-exponential <ARG> When download double values, use
  48. exponential express if necessary.
  49. Otherwise at most 20 digits are
  50. reserved. Default false
  51. -fd,-field-delimiter <ARG> specify field delimiter, support
  52. unicode, eg \u0001. default ","
  53. -h,-header <ARG> if local file must have table header,
  54. default false
  55. -limit <ARG> specify the number of records to
  56. download
  57. -ni,-null-indicator <ARG> specify null indicator string, default
  58. ""(empty string)
  59. -rd,-record-delimiter <ARG> specify record delimiter, support
  60. unicode, eg \u0001. default "\r\n"
  61. -sd,-session-dir <ARG> set session dir, default
  62. D:\software\odpscmd_public\plugins\dshi
  63. p
  64. -te,-tunnel_endpoint <ARG> tunnel endpoint
  65. -threads <ARG> number of threads, default 1
  66. -tz,-time-zone <ARG> time zone, default local timezone:
  67. Asia/Shanghai
  68. Example:
  69. tunnel download test_project.test_table/p1="b1",p2="b2" log.txt
  70. tunnel download instance://test_project/test_instance log.txt

Parameter description:

  • -c: Specifies the local data file encoding. Default value: UTF-8.

  • -ci: Specifies the index (starting from 0) of the columns to download. Separate multiple entries with commas (,).

  • -cn: Specifies the names of the columns to download. Separate multiple entries with commas (,).

  • -cp: Determines whether the data is compressed before being downloaded to reduce traffic. Enabled by default.

  • -dfp: Specifies the format of DateTime data. Default value: yyyy-MM-dd HH:mm:ss.

  • -e: When downloading DOUBLE values, you can use this parameter to express them in exponential notation if necessary. Otherwise, at most 20 digits are retained. Default value: false.

  • -fd: Specifies the column delimiter of the local data file. The default value is comma.

  • -h: Determines whether the data file contains the header. If set to ‘true’, Dship skips the header and starts downloading from the second row.

  • -limit: Specifies the number of records to download (see the sketch after this list).

  • -ni: Specifies the NULL data identifier. Default value: "" (empty string).

  • -rd: Specifies the row delimiter of the local data file. Default value: \r\n.

  • -sd: Sets the session directory.

  • -te: Specifies the tunnel endpoint.

  • -threads: Specifies the number of threads. Default value: 1.

  • -tz: Specifies the time zone. The default value is the local time zone: Asia/Shanghai.
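
Several of these options can be combined as well. The following sketch is for illustration only; it assumes the sale_detail table and partition from the upload example, and columns.txt is a hypothetical output file name:

  1. tunnel download sale_detail/sale_date=201312,region=hangzhou columns.txt -cn shop_name,total_price -limit 1;
  2. -- Downloads only the shop_name and total_price columns of at most one record.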

Example:

To download data to the result.txt file:

  1. $ ./tunnel download sale_detail/sale_date=201312,region=hangzhou result.txt;
  2. Download session: 201506101658245283870a002ed0b9
  3. Total records: 2
  4. 2015-06-10 16:58:24 download records: 2
  5. 2015-06-10 16:58:24 file size: 30 bytes
  6. OK

To verify the content of the result.txt file:

  1. shop9,97,100.0
  2. shop10,10,200.0

Purge

The purge command clears the session directory. By default, it clears session information from the last three days. The subcommands are used as follows:

  1. odps@ project_name>tunnel help purge;
  2. usage: tunnel purge [n]
  3. force session history to be purged.([n] days before, default
  4. 3 days)
  5. Example:
  6. tunnel purge 5

Data types:

  • STRING: String data. The length cannot exceed 8 MB.

  • BOOLEAN: Only the values "true", "false", "0", and "1" are supported for uploads. Only the values true/false (case-insensitive) are supported for downloads.

  • BIGINT: Value range: [-9223372036854775807, 9223372036854775807].

  • DOUBLE:

    1. Supports 16 significant digits.
    2. Supports scientific notation for uploads.
    3. Supports only numerical notation for downloads.
    4. Maximum value: 1.7976931348623157E308.
    5. Minimum value: 4.9E-324.
    6. Positive infinity: Infinity.
    7. Negative infinity: -Infinity.

  • DATETIME: By default, DATETIME data is uploaded in the UTC+8 time zone. You can use the -dfp option to specify the date format pattern of your data.

If you upload DATETIME type data, you must specify the time and date format. For more information on specific formats, see SimpleDateFormat.

  1. "yyyyMMddHHmmss": data format "20140209101000"
  2. "yyyy-MM-dd HH:mm:ss" (default): data format "2014-02-09 10:10:00"
  3. "MM/dd/yyyy": data format "09/01/2014"

Example:

  1. tunnel upload log.txt test_table -dfp "yyyy-MM-dd HH:mm:ss"

Null: All data types can be Null.

  • By default, a blank string indicates a Null value.

  • The parameter ‘-null-indicator’ can be used at the command line to specify the Null string.

    1. tunnel upload log.txt test_table -ni "NULL"

Character encoding: You can specify the character encoding of the file. Default value: UTF-8.

  1. tunnel upload log.txt test_table -c "gbk"

Delimiters: The Tunnel commands support custom file delimiters. The row delimiter option is ‘-record-delimiter’, and the column delimiter option is ‘-field-delimiter’.

Description:

  • Row and column delimiters of multiple characters are supported.

  • A column delimiter cannot contain the row delimiter.

  • Only the following escaped characters are supported as delimiters at the command line: \r, \n, and \t.

Example:

  1. tunnel upload log.txt test_table -fd "||" -rd "\r\n"