MaxCompute allows you to run Tunnel commands to upload and download data. This topic describes these commands in detail.
Command description
- Syntax
tunnel <subcommand> [options] [args]
Available subcommands: upload (u) download (d) resume (r) show (s) purge (p) help (h)
- Parameters
- upload: uploads data to a MaxCompute table. You can upload files to only one table or only one partition in a table each time. For a partitioned table, you must specify the partition to which you want to upload data. For a multi-level partitioned table, you must specify the lowest-level partition.
-- Upload data in the log.txt file to the p1="b1" and p2="b2" partitions of the test_table table that has two levels of partitions in the test_project project. The log.txt file is saved in the bin directory of the MaxCompute client.
tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
-- Upload data in the log.txt file to the test_table table. The scan parameter is used to check whether data in the log.txt file complies with the schema of the test_table table. If the data does not comply, the system reports an error and stops the upload.
tunnel upload log.txt test_table --scan=true;
-- Upload data in the log.txt file from another directory to the p1="b1" and p2="b2" partitions of the test_table table that has two levels of partitions in the test_project project.
tunnel upload D:\test\log.txt test_project.test_table/p1="b1",p2="b2";
- download: downloads data from a MaxCompute table. You can download data from only one table or partition to a single local file each time. For a partitioned table, you must specify the partition from which you want to download data. For a multi-level partitioned table, you must specify the lowest-level partition.
-- Download data from the test_project.test_table table, which has two levels of partitions, to the test_table.txt file in the bin directory of the MaxCompute client.
tunnel download test_project.test_table/p1="b1",p2="b2" test_table.txt;
-- Download data from the test_project.test_table table, which has two levels of partitions, to the test_table.txt file in another directory.
tunnel download test_project.test_table/p1="b1",p2="b2" D:\test\test_table.txt;
- resume: resumes the transfer of files or directories after an interruption, such as a network disconnection or a Tunnel fault. You can use this command to resume only data uploads. One data upload or download is referred to as a session. You must specify the ID of the session that you want to resume when you run this command.
tunnel resume;
- show: displays historical task information.
-- Display the commands used in the last five data uploads or downloads.
tunnel show history -n 5;
-- Display the logs of the last data upload or download.
tunnel show log;
- purge: clears the session directory. Logs from the last three days are cleared by default.
-- Clear logs from the last five days.
tunnel purge 5;
- help: obtains help information.
Upload
- Description
Uploads local data to a MaxCompute table in append mode.
Note Append mode: If the data that you want to import already exists in the MaxCompute table, the data is not overwritten after you run the Upload command. In this case, both the existing data and the imported data exist in the MaxCompute table.
- Syntax
tunnel upload [options] <path> <[project.]table[/partition]>
Available options:
 -acp,-auto-create-partition <ARG>   auto create target partition if not exists, default false
 -bs,-block-size <ARG>               block size in MiB, default 100
 -c,-charset <ARG>                   specify file charset, default ignore. set ignore to download raw data
 -cf,-csv-format <ARG>               use csv format (true|false), default false. When uploading in csv format, file splitting not supported.
 -cp,-compress <ARG>                 compress, default true
 -dbr,-discard-bad-records <ARG>     specify discard bad records action(true|false), default false
 -dfp,-date-format-pattern <ARG>     specify date format pattern, default yyyy-MM-dd HH:mm:ss
 -fd,-field-delimiter <ARG>          specify field delimiter, support unicode, eg \u0001. default ","
 -h,-header <ARG>                    if local file should have table header, default false
 -mbr,-max-bad-records <ARG>         max bad records, default 1000
 -ni,-null-indicator <ARG>           specify null indicator string, default ""(empty string)
 -ow,-overwrite <true | false>       overwrite specified table or partition, default: false
 -rd,-record-delimiter <ARG>         specify record delimiter, support unicode, eg \u0001. default "\r\n"
 -s,-scan <ARG>                      specify scan file action(true|false|only), default true
 -sd,-session-dir <ARG>              set session dir, default D:\software\odpscmd_public\plugins\dship
 -ss,-strict-schema <ARG>            specify strict schema mode. If false, extra data will be abandoned and insufficient field will be filled with null. Default true
 -t,-threads <ARG>                   number of threads, default 1
 -te,-tunnel_endpoint <ARG>          tunnel endpoint
 -time,-time <ARG>                   keep track of upload/download elapsed time or not. Default false
 -tz,-time-zone <ARG>                time zone, default local timezone: Asia/Shanghai
Example:
tunnel upload log.txt test_project.test_table/p1="b1",p2="b2"
- Parameters
- Required parameters
- path
Specifies the path and name of the data file that you want to upload.
You can save data files in the bin directory of the MaxCompute client. In this case, you must set the path parameter to a value in the File name.File name extension format. You can also save data files to another directory, such as the test folder in drive D. In this case, you must set the path parameter to a value in the D:\test\File name.File name extension format.
Note In macOS, the value of the path parameter can only be an absolute path. For example, if the data files are saved in the bin directory of the MaxCompute client, you must set the path parameter to a value in the D:\MaxCompute\bin\File name.File name extension format.
- [project.]table[/partition]
Specifies the name of the table to which you want to upload data. You must specify the lowest-level partition for a partitioned table. If the table does not belong to the current project, you must specify the project where the table is located.
- Optional parameters
- -acp
Specifies the partition to which you want to upload data. If the specified partition does not exist, a partition is automatically created. Default value: False.
- -bs
Specifies the size of the data block uploaded by Tunnel each time. Default value: 100 MiB (1 MiB = 1024 × 1024 bytes).
- -c
Specifies the encoding format of the data file. By default, this parameter is not specified, and raw data is downloaded.
- -cf
Specifies whether the file is a CSV file. Default value: False.
Note Only TXT and CSV files can be uploaded. TXT files are uploaded by default. If you want to upload a CSV file, you must configure the -cf parameter and download the latest version of the MaxCompute client.
- -cp
Specifies whether to compress the local data file before you upload it to MaxCompute to reduce network traffic. Default value: True.
- -dbr
Specifies whether to omit dirty data, such as additional columns, missing columns, or column data whose type does not match the schema. Default value: False.
- True: omits all data that does not match the definition of the table. By default, 1000 data records are omitted. To change the number of data records that you want to omit, specify the -mbr parameter.
- False: An error is returned after dirty data is detected. This ensures that raw data in the table to which you want to upload data is not contaminated.
- -dfp
Specifies the format of DATETIME data. The default format is yyyy-MM-dd HH:mm:ss. If you want to specify DATETIME data that is accurate to the millisecond, you can use the yyyy-MM-dd HH:mm:ss.SSS format. For more information about the DATETIME data type, see Data type editions.
- -fd
Specifies the column delimiter used in the local data file. Default value: comma (,).
- -h
Specifies whether the data file that you want to upload has a table header. Default value: False. This value indicates that the data file cannot contain a table header. If you set this parameter to True, the data file can contain a table header. In this case, the system skips the table header and uploads data from the second row.
- -mbr
Specifies the maximum number of allowed dirty data records. This parameter must be used together with the -dbr parameter and is valid only when the -dbr parameter is set to True. If the number of dirty data records exceeds the specified value, the upload stops. Default value: 1000.
- -ni
Specifies the NULL data identifier. Default value: an empty string.
- -ow
Specifies whether the uploaded data overwrites the table or partition. Default value: False. This value indicates that data is uploaded in append mode. The following code shows an example:
-- Create a partitioned table.
CREATE TABLE IF NOT EXISTS sale_detail(
    shop_name STRING,
    customer_id STRING,
    total_price DOUBLE)
PARTITIONED BY (sale_date STRING, region STRING);
alter table sale_detail add partition (sale_date='201312', region='hangzhou');
-- Prepare the local data file data.txt. The file contains the following content:
shopx,x_id,100
shopy,y_id,200
-- Upload data to the partitioned table.
tunnel upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou;
-- Query the sale_detail table.
select * from sale_detail;
-- The following result is returned:
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| shopx      | x_id        | 100.0       | 201312     | hangzhou   |
| shopy      | y_id        | 200.0       | 201312     | hangzhou   |
+------------+-------------+-------------+------------+------------+
-- Modify the data in the file data.txt. The file contains the following content:
shopx,x_id,300
shopy,y_id,400
-- Upload the file after the data in the file is modified. The new data overwrites the existing data.
tunnel upload -ow true data.txt sale_detail/sale_date=201312,region=hangzhou;
-- Query the sale_detail table.
select * from sale_detail;
-- The following result is returned:
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| shopx      | x_id        | 300.0       | 201312     | hangzhou   |
| shopy      | y_id        | 400.0       | 201312     | hangzhou   |
+------------+-------------+-------------+------------+------------+
- -rd
Specifies the row delimiter used in the local data file. Default value: \r\n.
- -s
Specifies whether to scan the local data file. Default value: True.
- True: The system scans the data and starts to import the data only if the data is in the correct format.
- False: The system imports data without scanning.
- Only: The system scans only the local data. The data is not imported after the scan.
- -sd
Specifies the session directory.
- -ss
Specifies the strict schema mode. Default value: True. If you set this parameter to False, extra data is discarded, and the fields that are not specified are filled with NULL.
- -t
Specifies the number of threads. Default value: 1.
- -te
Specifies the endpoint of Tunnel.
- -time
Specifies whether the upload time is tracked. Default value: False.
- -tz
Specifies the time zone. Default value: local time zone, such as Asia/Shanghai. For more information about time zones, see Time zones.
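The -dbr, -mbr, and -ss parameters above all deal with records that do not match the table schema. As a rough local pre-check before an upload, the following Python sketch counts records whose column count differs from the target schema and stops once a limit is exceeded. This is an illustrative helper only, not part of the Tunnel client; the sample records, delimiter, and column count are assumptions.

```python
# Local pre-check that mirrors the -dbr/-mbr idea: count records whose
# column count does not match the table schema; stop when max_bad is exceeded.
def scan_bad_records(lines, expected_cols, field_delimiter=",", max_bad=1000):
    bad = 0
    for line in lines:
        if len(line.rstrip("\r\n").split(field_delimiter)) != expected_cols:
            bad += 1
            if bad > max_bad:
                raise ValueError(f"more than {max_bad} bad records")
    return bad

records = ["shopx,x_id,100", "shopy,y_id", "shopz,z_id,300,extra"]
print(scan_bad_records(records, expected_cols=3))  # 2 bad records
```

Running a check like this before `tunnel upload` makes it easier to decide whether -dbr true with a suitable -mbr value is appropriate for the file.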
Show
- Displays historical records.
- Syntax
tunnel show history [-n <number>];
-n <number>: specifies the number of times that the command is executed.
- Examples
Example 1: Display historical records. By default, 500 data records are saved.
tunnel show history;
The following result is returned:
20230505xxxxxxxxxxxxxx0b0d5b3c  bad     'upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou -dbr true -time true'
20230505xxxxxxxxxxxxxx0ad720a3  failed  'upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou -time true'
20230505xxxxxxxxxxxxxx0ad5ca68  bad     'upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou -dbr true'
......
Example 2: Display the commands used in the last five data uploads or downloads.
tunnel show history -n 5;
The following result is returned:
20230505xxxxxxxxxxxxxx0aa48c4b  success 'download sale_detail/sale_date=201312,region=hangzhou result.txt'
20230505xxxxxxxxxxxxxx0aa6165c  success 'download sale_detail/sale_date=201312,region=hangzhou result.txt'
20230505xxxxxxxxxxxxxx0af11472  failed  'upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false'
20230505xxxxxxxxxxxxxx0b464374  success 'upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false'
20230505xxxxxxxxxxxxxx02dbb6bd  failed  'upload d:\data.txt sale_detail/sale_date="201312",region="hangzhou" -s false'
- Displays the logs of the last data upload or download.
tunnel show log;
Resume
- Description
Resumes the execution of historical operations. Only data uploads can be resumed.
- Syntax
odps@ project_name>tunnel help resume;
usage: tunnel resume [session_id] [-force]
              resume an upload session
 -f,-force   force resume
Example:
       tunnel resume
- Parameters
- session_id
Specifies the ID of the session for which the upload failed. This parameter is required.
- -f
Specifies whether to forcefully resume the execution of historical operations. This parameter is omitted by default.
- Example
Run the following command to resume the session for which the upload failed. In this command, 20150610xxxxxxxxxxx70a002ec60c indicates the ID of the session for which the upload failed.
odps@ project_name>tunnel resume 20150610xxxxxxxxxxx70a002ec60c -force;
start resume 20150610xxxxxxxxxxx70a002ec60c
Upload session: 20150610xxxxxxxxxxx70a002ec60c
Start upload:d:\data.txt
Resume 1 blocks
2015-06-10 16:46:42  upload block: '1'
2015-06-10 16:46:42  upload block complete, blockid=1
upload complete, average speed is 0 KB/s
OK
Download
- Description
Downloads MaxCompute table data or the execution result of a specific instance to a local directory.
You must be granted the Download permission before you use Tunnel to download data. If you do not have the Download permission, you must contact the project owner or a user who is assigned the Super_Administrator role to complete the authorization. For more information about how to grant the Download permission, see Policy-based access control.
- Syntax
odps@ project_name>tunnel help download;
usage: tunnel download [options] <[project.]table[/partition]> <path>
              download data to local file
 -c,-charset <ARG>                specify file charset, default ignore. set ignore to download raw data
 -cf,-csv-format <ARG>            use csv format (true|false), default false. When uploading in csv format, file splitting not supported.
 -ci,-columns-index <ARG>         specify the columns index(starts from 0) to download, use comma to split each index
 -cn,-columns-name <ARG>          specify the columns name to download, use comma to split each name
 -cp,-compress <ARG>              compress, default true
 -dfp,-date-format-pattern <ARG>  specify date format pattern, default yyyy-MM-dd HH:mm:ss
 -e,-exponential <ARG>            When download double values, use exponential express if necessary. Otherwise at most 20 digits will be reserved. Default false
 -fd,-field-delimiter <ARG>       specify field delimiter, support unicode, eg \u0001. default ","
 -h,-header <ARG>                 if local file should have table header, default false
 -limit <ARG>                     specify the number of records to download
 -ni,-null-indicator <ARG>        specify null indicator string, default ""(empty string)
 -rd,-record-delimiter <ARG>      specify record delimiter, support unicode, eg \u0001. default "\r\n"
 -sd,-session-dir <ARG>           set session dir, default D:\software\odpscmd_public\plugins\dship
 -t,-threads <ARG>                number of threads, default 1
 -te,-tunnel_endpoint <ARG>       tunnel endpoint
 -time,-time <ARG>                keep track of upload/download elapsed time or not. Default false
 -tz,-time-zone <ARG>             time zone, default local timezone: Asia/Shanghai
usage: tunnel download [options] instance://<[project/]instance_id> <path>
              download instance result to local file
The options are the same as those for downloading data from a table.
Example:
       tunnel download test_project.test_table/p1="b1",p2="b2" log.txt  // Download data from a specific table.
       tunnel download instance://test_project/test_instance log.txt    // Download the execution result of a specific instance.
- Parameters
- Required parameters
- path
Specifies the path in which the downloaded data file is saved.
You can save data files in the bin directory of the MaxCompute client. In this case, you must set path to a value in the File name.File name extension format. You can also save data files to another directory, such as the test folder in drive D. In this case, you must set path to a value in the D:\test\File name.File name extension format.
- [project.]table[/partition]
Specifies the name of the table that you want to download. You must specify the lowest-level partition for a partitioned table. If the table does not belong to the current project, you must specify the project where the table is located.
- [project/]instance_id
Specifies the ID of an instance. You can download the execution result of a specific instance.
- Optional parameters
- -c
Specifies the encoding format of a local data file. This parameter is omitted by default.
- -cf
Specifies whether the file is a CSV file. Default value: False.
Note Only TXT and CSV files can be downloaded. TXT files are downloaded by default. If you want to download a CSV file, you must configure the -cf parameter and download the latest version of the MaxCompute client.
- -ci
Specifies the indexes of the columns that you want to download. The column indexes start from 0. Separate the column indexes with commas (,).
- -cn
Specifies the names of the columns that you want to download. Separate the column names with commas (,).
- -cp
Specifies whether to compress the local file before you download it to reduce network traffic. Default value: True.
- -dfp
Specifies the format of DATETIME data. The default format is yyyy-MM-dd HH:mm:ss.
- -e
Specifies whether to use exponential notation for DOUBLE values that you download. If exponential notation is not used, a maximum of 20 digits is retained. Default value: False.
- -fd
Specifies the column delimiter for the local data file. Default value: comma (,).
- -h
Specifies whether the data file has a table header. Default value: False. This value indicates that the data file does not have a table header. If you set this parameter to True, the data file has a table header.
Note -h=true and threads>1 cannot be used together. threads>1 indicates that the number of threads is greater than 1.
- -limit
Specifies the number of rows to download.
- -ni
Specifies the NULL data identifier. Default value: an empty string.
- -rd
Specifies the row delimiter used in the local data file. Default value: \r\n.
- -sd
Specifies the session directory.
- -t
Specifies the number of threads. Default value: 1.
- -te
Specifies the endpoint of Tunnel.
- -time
Specifies whether the download time is tracked. Default value: False.
- -tz
Specifies the time zone. The local time zone is used by default, such as Asia/Shanghai.
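The -ci and -cn parameters above identify columns by index and by name. If you know the table schema, the zero-based index string that -ci expects can be derived from column names. The following Python sketch does this for the sale_detail schema used earlier in this topic; the helper function is a hypothetical convenience, not part of the Tunnel client.

```python
# Derive the comma-separated, zero-based column index string that -ci
# expects from a list of column names.
schema = ["shop_name", "customer_id", "total_price"]  # assumed table schema

def columns_index_arg(schema, wanted):
    return ",".join(str(schema.index(name)) for name in wanted)

print(columns_index_arg(schema, ["shop_name", "total_price"]))  # 0,2
```

The resulting string could then be passed as, for example, -ci 0,2 to download only those columns.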
Purge
- Description
Clears the session directory.
- Syntax
odps@ project_name>tunnel help purge;
usage: tunnel purge [n]
              force session history to be purged.([n] days before, default 3 days)
Example:
       tunnel purge 5
- Parameter
n: specifies the number of days after which historical logs are cleared. Default value: 3.
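As a quick sanity check of the n parameter, the following Python sketch computes the cutoff date that a purge would use, assuming n counts back whole days from the current date. The helper is illustrative only and does not reflect the Tunnel client's internal implementation.

```python
from datetime import date, timedelta

# tunnel purge [n] clears session history from n days before (default 3).
# Compute the cutoff date for a given n, assuming whole-day arithmetic.
def purge_cutoff(today, n=3):
    return today - timedelta(days=n)

print(purge_cutoff(date(2023, 5, 5), n=5))  # 2023-04-30
```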
Precautions
- The following data types are supported:
  - STRING: a string with a maximum length of 8 MB.
  - BOOLEAN: for file uploads, the value can be True, False, 0, or 1. For file downloads, the value can be True or False. The value is not case-sensitive.
  - BIGINT: valid values: [-9223372036854775807, 9223372036854775807].
  - DOUBLE:
    - A 16-digit number.
    - Data of this type is expressed in scientific notation during data uploads.
    - Data of this type is expressed in numbers during data downloads.
    - Maximum value: 1.7976931348623157E308. Minimum value: 4.9E-324.
    - Positive infinity: Infinity. Negative infinity: -Infinity.
  - DATETIME: by default, data of the DATETIME type can be uploaded when the time zone is GMT+8. You can use command lines to specify the format pattern of this type. If you upload data of this type, you must specify the time format. For more information, see Data type editions. Examples:
    - "yyyyMMddHHmmss", for example, "20140209101000".
    - "yyyy-MM-dd HH:mm:ss" (default format), for example, "2014-02-09 10:10:00".
    - "MM/dd/yyyy", for example, "09/01/2014".
tunnel upload log.txt test_table -dfp "yyyy-MM-dd HH:mm:ss"
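A data file uploaded with a custom -dfp pattern must contain timestamps in exactly that pattern. The following Python sketch renders a timestamp in the default pattern and in the millisecond pattern mentioned for -dfp; the sample date is an arbitrary example.

```python
from datetime import datetime

ts = datetime(2014, 2, 9, 10, 10, 0)

# Default -dfp pattern: yyyy-MM-dd HH:mm:ss
print(ts.strftime("%Y-%m-%d %H:%M:%S"))          # 2014-02-09 10:10:00

# Millisecond pattern: yyyy-MM-dd HH:mm:ss.SSS (strftime emits microseconds,
# so truncate the last three digits)
print(ts.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3])  # 2014-02-09 10:10:00.000
```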
- Null: Each data type can have null values.
- By default, a blank string indicates a null value.
- The -null-indicator parameter is used to specify a null string.
tunnel upload log.txt test_table -ni "NULL"
- Encoding format: You can specify the encoding format of the file. Default value: UTF-8.
tunnel upload log.txt test_table -c "gbk"
- Delimiter: Tunnel commands support custom file delimiters. -record-delimiter is used to customize row delimiters, and -field-delimiter is used to customize column delimiters.
- A row or column delimiter can contain multiple characters.
- A column delimiter cannot contain a row delimiter.
- Only the following escape character delimiters are supported in the command line: \r, \n, and \t.
tunnel upload log.txt test_table -fd "" -rd "\r\n"
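When a data file is generated locally, the delimiter and null-indicator rules above must match the -fd, -rd, and -ni values that are passed to Tunnel. The following Python sketch serializes records accordingly; the || field delimiter and the NULL indicator are example choices for illustration, not Tunnel defaults.

```python
# Serialize records using a multi-character field delimiter, the default
# \r\n record delimiter, and "NULL" as the null indicator, matching
# hypothetical -fd "||" -rd "\r\n" -ni "NULL" upload options.
def serialize(records, field_delimiter="||", record_delimiter="\r\n",
              null_indicator="NULL"):
    rows = (
        field_delimiter.join(null_indicator if v is None else str(v) for v in row)
        for row in records
    )
    return record_delimiter.join(rows) + record_delimiter

data = [("shopx", "x_id", 100), ("shopy", None, 200)]
print(serialize(data))
```

Note that, per the rules above, the chosen field delimiter must not contain the record delimiter.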