Tunnel

Last Updated: Sep 28, 2017

In the original version of ODPS, a standalone tool (DShip) was provided for uploading and downloading data. In the latest ODPS service, the same functions are available through the Tunnel commands provided by the client tool. We recommend using the new version of the client and its Tunnel commands.

Tunnel commands are used to upload and download data. The main functions include:

  • Upload: uploads files or directories (level-one directories only). Data can be uploaded to only one table or one partition at a time. For a partitioned table, the target partition must be specified.
  1. tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
  2. tunnel upload log.txt test_table --scan=only;
  • Download: downloads data to a single local file. Data from only one table or one partition can be downloaded to a single file at a time. For a partitioned table, the source partition must be specified.
  1. tunnel download test_project.test_table/p1="b1",p2="b2" log.txt;
  • Resume: resumes the transfer of files or directories after a failure caused by a network error or a Tunnel service error. Only uploads can be resumed.
  1. tunnel resume;
  • Show: shows the task history.
  1. tunnel show history -n 5
  2. tunnel show log
  • Purge: clears session directories. By default, directories from the last three days are cleared.
  1. tunnel purge 5

Note: Tunnel commands are ported from the original DShip commands.

Usage of Tunnel Commands

The help subcommand can be used to obtain help information on the ODPS client. Every command and option also supports an abbreviated form, as illustrated after the subcommand descriptions below.

  1. odps@ project_name>tunnel help;
  2. Usage: tunnel <subcommand> [options] [args]
  3. Type 'tunnel help <subcommand>' for help on a specific subcommand.
  4. Available subcommands:
  5. upload (u)
  6. download (d)
  7. resume (r)
  8. show (s)
  9. purge (p)
  10. help (h)
  11. tunnel is a command for uploading data to / downloading data from ODPS.

Description:

  • upload: uploads data into a table in ODPS.
  • download: downloads data from a table in ODPS.
  • resume: if an upload fails because of an error, the resume command continues the transfer from the breakpoint. At present, only uploads can be resumed. Each upload or download operation is called a session. Run the resume command with the session ID to continue the transfer.
  • show: views the history of executed commands.
  • purge: clears session directories.
  • help: outputs Tunnel help information.
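
As noted above, every subcommand has an abbreviated form. For example, the upload command from the overview can equally be written with its short name (the file and table names are the same illustrative ones used earlier):

  1. tunnel u log.txt test_project.test_table/p1="b1",p2="b2";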

Upload

Imports data from local files into ODPS tables in append mode. The usage of the subcommand is as follows:

  1. odps@ project_name>tunnel help upload
  2. usage: tunnel upload [options] <path> <[project.]table[/partition]>
  3. upload data from local file
  4. -bs,--block-size <ARG> block size in MiB, default 100
  5. -c,--charset <ARG> specify file charset, default utf8.
  6. set ignore to download raw data
  7. -cp,--compress <ARG> compress, default true
  8. -dbr,--discard-bad-records <ARG> specify discard bad records
  9. action(true|false), default false
  10. -dfp,--date-format-pattern <ARG> specify date format pattern, default
  11. yyyy-MM-dd HH:mm:ss
  12. -e,--endpoint <ARG> odps endpoint
  13. -fd,--field-delimiter <ARG> specify field delimiter, default ","
  14. -h,--header <ARG> if local file should have table
  15. header, default false
  16. -mbr,--max-bad-records <ARG> max bad records, default 1000
  17. -ni,--null-indicator <ARG> specify null indicator string, default
  18. ""(empty string)
  19. -rd,--record-delimiter <ARG> specify record delimiter, default
  20. "\r\n"
  21. -s,--scan <ARG> specify scan file
  22. action(true|false|only), default true
  23. -sd,--session-dir <ARG> set session dir, default
  24. /D:/console/plugins/dship/lib/..
  25. -te,--tunnel-endpoint <ARG> tunnel endpoint
  26. -tz,--time-zone <ARG> time zone, default local timezone:
  27. Asia/Shanghai
  28. Example:
  29. tunnel upload log.txt test_project.test_table/p1="b1",p2="b2"

Parameter Description:

  • -bs, --block-size: the size of each data block uploaded through Tunnel. The default is 100 MiB (1 MiB = 1024 × 1024 bytes).
  • -c, --charset: specifies the character set of the local data file. The default is UTF-8. If it is set to ignore, raw data is transferred.
  • -cp, --compress: specifies whether data is compressed before being uploaded. The default is true.
  • -dfp: the format pattern for DATETIME data. The default is 'yyyy-MM-dd HH:mm:ss'.
  • -dbr: specifies whether to discard bad records (extra columns, missing columns, or mismatched column data types). When the value is true, all records that do not match the table definition are discarded. When the value is false, an error is raised as soon as a bad record is found.
  • -e: specifies the ODPS endpoint.
  • -fd: the column delimiter of the local data file. The default is a comma ','.
  • -h: specifies whether the local file has a table header. If true, DShip skips the header and uploads data starting from the second row.
  • -mbr, --max-bad-records: by default, the upload is terminated once the number of bad records exceeds 1000.
  • -ni: the NULL indicator string. The default is "" (empty string).
  • -rd: the row delimiter of the local data file. The default is '\r\n'.
  • -s: specifies whether to scan the local data file. It is 'false' by default. When the value is true, the data is scanned first and imported only if the format is correct. When the value is false, the data is imported directly without scanning. When the value is 'only', only the local data is scanned and no data is imported afterwards.
  • -te: specifies the Tunnel endpoint.
  • -tz: specifies the time zone. The default is the local time zone: Asia/Shanghai.
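
These options can be combined in a single command. For example, the following sketch uploads a GBK-encoded, tab-delimited file that contains a header row and discards bad records (the file and table names reuse the illustrative ones from the overview):

  1. tunnel upload log.txt test_project.test_table/p1="b1",p2="b2" -c "gbk" -fd "\t" -h true -dbr true;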

Program Example:

  • Create the target table:
  1. CREATE TABLE IF NOT EXISTS sale_detail(
  2. shop_name STRING,
  3. customer_id STRING,
  4. total_price DOUBLE)
  5. PARTITIONED BY (sale_date STRING,region STRING);
  • Add the partition:
  1. alter table sale_detail add partition (sale_date='201312', region='hangzhou');
  • Prepare the data file 'data.txt' with the following contents:
  1. shop9,97,100
  2. shop10,10,200
  3. shop11,11

The third line in this file does not comply with the definition of the table sale_detail: the table defines three columns, but the line contains only two.

  • Import the data:
  1. odps@ project_name>tunnel u d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false
  2. Upload session: 201506101639224880870a002ec60c
  3. Start upload:d:\data.txt
  4. Total bytes:41 Split input to 1 blocks
  5. 2015-06-10 16:39:22 upload block: '1'
  6. ERROR: column mismatch -,expected 3 columns, 2 columns found, please check data or delimiter

Because a bad record exists in data.txt, the import fails. The session ID and an error message are displayed.

Data verification:

  1. odps@ odpstest_ay52c_ay52> select * from sale_detail where sale_date='201312';
  2. ID = 20150610084135370gyvc61z5
  3. +-----------+-------------+-------------+-----------+--------+
  4. | shop_name | customer_id | total_price | sale_date | region |
  5. +-----------+-------------+-------------+-----------+--------+
  6. +-----------+-------------+-------------+-----------+--------+

Because of the bad record, the import fails and the table contains no data.
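
If bad records should be skipped rather than failing the whole import, the -dbr option described above can be added; a sketch reusing the same file and partition:

  1. tunnel upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou -dbr true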

Show

Shows historical records. The usage of the subcommand is as follows:

  1. usage: tunnel show history [options]
  2. show session information
  3. -c,--charset <ARG> specify file charset, default utf8.
  4. set ignore to download raw data
  5. -cp,--compress <ARG> compress, default true
  6. -dfp,--date-format-pattern <ARG> specify date format pattern, default
  7. yyyy-MM-dd HH:mm:ss
  8. -e,--endpoint <ARG> odps endpoint
  9. -fd,--field-delimiter <ARG> specify field delimiter, default ","
  10. -h,--header <ARG> if local file should have table
  11. header, default false
  12. -n,--number <ARG> lines
  13. -ni,--null-indicator <ARG> specify null indicator string, default
  14. ""(empty string)
  15. -rd,--record-delimiter <ARG> specify record delimiter, default
  16. "\r\n"
  17. -sd,--session-dir <ARG> set session dir, default
  18. /D:/console/plugins/dship/lib/..(the actual path of plugins/dship/lib)
  19. -te,--tunnel-endpoint <ARG> tunnel endpoint
  20. -tz,--time-zone <ARG> time zone, default local timezone:
  21. Asia/Shanghai
  22. Example:
  23. tunnel show history -n 5
  24. tunnel show log

Command Example:

  1. odps@ project_name>tunnel show history;
  2. 201506101639224880870a002ec60c failed 'u --config-file /D:/console/conf/odps_config.ini --project odpstest_ay52c_ay52 --endpoint http://service.odps.aliyun.com/api --id UlVxOHuthHV1QrI1 --key 2m4r3WvTZbsNJjybVXj0InVke7UkvR d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false'

Description: 201506101639224880870a002ec60c is the session ID of the import that failed in the previous section.

Resume

Resumes the execution of a historical session. This is valid only for uploaded data. The usage of the subcommand is as follows:

  1. usage: tunnel resume [session_id] [--force]
  2. download data to local file
  3. -c,--charset <ARG> specify file charset, default utf8.
  4. set ignore to download raw data
  5. -cp,--compress <ARG> compress, default true
  6. -dfp,--date-format-pattern <ARG> specify date format pattern, default
  7. yyyy-MM-dd HH:mm:ss
  8. -e,--endpoint <ARG> odps endpoint
  9. -f,--force force resume
  10. -fd,--field-delimiter <ARG> specify field delimiter, default ","
  11. -h,--header <ARG> if local file should have table
  12. header, default false
  13. -ni,--null-indicator <ARG> specify null indicator string, default
  14. ""(empty string)
  15. -rd,--record-delimiter <ARG> specify record delimiter, default
  16. "\r\n"
  17. -sd,--session-dir <ARG> set session dir, default
  18. /D:/console/plugins/dship/lib/..
  19. -te,--tunnel-endpoint <ARG> tunnel endpoint
  20. -tz,--time-zone <ARG> time zone, default local timezone:
  21. Asia/Shanghai
  22. Example:
  23. tunnel resume

Command Example:

Modify data.txt to be:

  1. shop9,97,100
  2. shop10,10,200

Resume the upload:

  1. odps@ project_name>tunnel resume 201506101639224880870a002ec60c --force;
  2. start resume
  3. 201506101639224880870a002ec60c
  4. Upload session: 201506101639224880870a002ec60c
  5. Start upload:d:\data.txt
  6. Resume 1 blocks
  7. 2015-06-10 16:46:42 upload block: '1'
  8. 2015-06-10 16:46:42 upload block complete, blockid=1
  9. upload complete, average speed is 0 KB/s
  10. OK

201506101639224880870a002ec60c is the session ID of the failed upload.

Data verification:

  1. odps@ project_name>select * from sale_detail where sale_date='201312';
  2. ID = 20150610084801405g0a741z5
  3. +-----------+-------------+-------------+-----------+--------+
  4. | shop_name | customer_id | total_price | sale_date | region |
  5. +-----------+-------------+-------------+-----------+--------+
  6. | shop9 | 97 | 100.0 | 201312 | hangzhou |
  7. | shop10 | 10 | 200.0 | 201312 | hangzhou |
  8. +-----------+-------------+-------------+-----------+--------+

Download

The usage of the subcommand is as follows:

  1. odps@ project_name>tunnel help download
  2. usage: tunnel download [options] <[project.]table[/partition]> <path>
  3. download data to local file
  4. -c,--charset <ARG> specify file charset, default utf8.
  5. set ignore to download raw data
  6. -cp,--compress <ARG> compress, default true
  7. -dfp,--date-format-pattern <ARG> specify date format pattern, default
  8. yyyy-MM-dd HH:mm:ss
  9. -e,--endpoint <ARG> odps endpoint
  10. -fd,--field-delimiter <ARG> specify field delimiter, default ","
  11. -h,--header <ARG> if local file should have table
  12. header, default false
  13. -ni,--null-indicator <ARG> specify null indicator string, default
  14. ""(empty string)
  15. -rd,--record-delimiter <ARG> specify record delimiter, default
  16. "\r\n"
  17. -sd,--session-dir <ARG> set session dir, default
  18. /D:/console/plugins/dship/lib/..(the actual path of plugins/dship/lib)
  19. -te,--tunnel-endpoint <ARG> tunnel endpoint
  20. -tz,--time-zone <ARG> time zone, default local timezone:
  21. Asia/Shanghai
  22. Example:
  23. tunnel download test_project.test_table/p1="b1",p2="b2" log.txt

Description:

  • -fd: the column delimiter of the local data file. The default is a comma ','.
  • -rd: the row delimiter of the local data file. The default is '\r\n'.
  • -dfp: the format pattern for DATETIME data. The default is 'yyyy-MM-dd HH:mm:ss'.
  • -ni: the NULL indicator string. The default is "" (empty string).
  • -c: the character set of the local data file. The default is UTF-8.
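
These options can also be combined. For instance, a sketch that downloads a partition to a local file with a '|' field delimiter and a header row (the table, partition, and file names are illustrative):

  1. tunnel download test_project.test_table/p1="b1",p2="b2" out.txt -fd "|" -h true;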

Command Example:

Download the data into the file result.txt:

  1. $ ./tunnel download sale_detail/sale_date=201312,region=hangzhou result.txt;
  2. Download session: 201506101658245283870a002ed0b9
  3. Total records: 2
  4. 2015-06-10 16:58:24 download records: 2
  5. 2015-06-10 16:58:24 file size: 30 bytes
  6. OK

Data verification. The contents of the result.txt file are as follows:

  1. shop9,97,100.0
  2. shop10,10,200.0

Purge

Clears session history. By default, session directories from the last three days are cleared. The usage of the subcommand is as follows:

  1. usage: tunnel purge [n]
  2. force session history to be purged.([n] days before, default
  3. 3 days)
  4. -c,--charset <ARG> specify file charset, default utf8.
  5. set ignore to download raw data
  6. -cp,--compress <ARG> compress, default true
  7. -dfp,--date-format-pattern <ARG> specify date format pattern, default
  8. yyyy-MM-dd HH:mm:ss
  9. -e,--endpoint <ARG> odps endpoint
  10. -fd,--field-delimiter <ARG> specify field delimiter, default ","
  11. -h,--header <ARG> if local file should have table
  12. header, default false
  13. -ni,--null-indicator <ARG> specify null indicator string, default
  14. " "(empty string)
  15. -rd,--record-delimiter <ARG> specify record delimiter, default
  16. "\r\n"
  17. -sd,--session-dir <ARG> set session dir, default
  18. /D:/console/plugins/dship/lib/..
  19. -te,--tunnel-endpoint <ARG> tunnel endpoint
  20. -tz,--time-zone <ARG> time zone, default local timezone:
  21. Asia/Shanghai
  22. Example:
  23. tunnel purge 5
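
To clear sessions using the default three-day window, the day count can be omitted, per the usage line above:

  1. tunnel purge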

Data Types Description

  • STRING: string type; the length cannot exceed 8 MB.
  • BOOLEAN: the uploaded value can only be "true", "false", "0", or "1". Downloaded values are expressed as true/false. The values are case insensitive.
  • BIGINT: value range [-9223372036854775807, 9223372036854775807].
  • DOUBLE:
  1. 16 significant digits.
  2. Scientific notation is supported for upload.
  3. Downloaded values are expressed only as plain numbers.
  4. Maximum value: 1.7976931348623157E308.
  5. Minimum value: 4.9E-324.
  6. ∞: Infinity.
  7. -∞: -Infinity.
  • DATETIME: supports data uploading based on the GMT+8 time zone. The date format pattern can be specified on the command line.

To upload data of the DATETIME type, specify the datetime format pattern. See SimpleDateFormat for reference.

  1. "yyyyMMddHHmmss": data format "20140209101000"
  2. "yyyy-MM-dd HH:mm:ss" (default): data format "2014-02-09 10:10:00"
  3. "yyyy年MM月dd日": data format "2014年09月01日"

Example:

  1. tunnel upload log.txt test_table -dfp "yyyy-MM-dd HH:mm:ss"

NULL

All data types can be null:

  • An empty string is treated as NULL by default.

  • You can also specify the NULL indicator through the '--null-indicator' parameter on the command line.

  1. tunnel upload log.txt test_table -ni "NULL"

Character Set

You can specify the character set of the local data file; the default is UTF-8.

  1. tunnel upload log.txt test_table -c "gbk"

Separator

Tunnel commands support user-defined file separators: --record-delimiter is the row separator and --field-delimiter is the column separator. Separators are described as follows:

  • Row and column separators with multiple characters are supported.
  • A column separator cannot include a row separator.
  • On the command line, only the escape characters \r, \n, and \t are supported in separators.

Example:

  1. tunnel upload log.txt test_table -fd "||" -rd "\r\n"