This topic describes the features of Tunnel commands and how to use them to upload and download data.
Supported tools
You can run Tunnel commands in odpscmd and MaxCompute Studio. For more information, see Connect using a local client (odpscmd) and Introduction to MaxCompute Studio.
Tunnel command features
The Tunnel commands available in the client replace the Dship tool. You can use these commands to upload and download data. The commands are as follows:
-
Upload: Uploads data from a local file to a MaxCompute table. Each upload operation can write data to a single table or a single partition. For a partitioned table, you must specify the destination partition. For a multi-level partitioned table, you must specify the lowest-level partition. For more information, see Upload.
-
Download: Downloads data from a MaxCompute table or the results of a specific instance to a local file. Each download operation can download one table or one partition to a single file. For a partitioned table, you must specify the source partition. For a multi-level partitioned table, you must specify the lowest-level partition. For more information, see Download.
-
Resume: If an upload fails because of network issues or Tunnel service errors, you can use the
Resumecommand to continue the file upload. This command resumes the previous upload operation. The Resume command does not currently support download operations. For more information, see Resume. -
Show: Displays information about historical tasks. For more information, see Show.
-
Purge: Clears the session directory. By default, this command removes logs from the last three days. For more information, see Purge.
-
Help: Displays help information. Short command formats are supported for each command and option.
-
Upsert: Writes data using a combination of Update and Insert semantics. This command is supported only for Transaction Table 2.0 tables.
If matching data is not found in the destination table, new data is inserted. If the data already exists, it is updated.
Tunnel upload and download limits
-
The Tunnel feature and the Tunnel software development kit (SDK) do not currently support operations on external tables. You can use Tunnel to upload data directly to a MaxCompute internal table. Alternatively, you can upload data to OSS using the OSS Python SDK and then create an external table in MaxCompute to map to the data. For more information about external tables, see Overview of external tables.
-
Tunnel commands do not support uploading or downloading data of the ARRAY, MAP, or STRUCT data types.
-
The lifecycle of each Tunnel session on the server-side is 24 hours. A session can be used within 24 hours after it is created. It can also be shared across processes or threads. However, you must ensure that the same BlockId is not reused.
NoteA Tunnel Session: The server-side creates a session for each upload or download and generates a unique UploadId or DownloadId to identify the operation. The lifecycle of a Tunnel session on the server-side is 24 hours. After 24 hours, the session becomes inactive.
-
When you download data using a session, note that a session created by an Alibaba Cloud account can be used for downloads only by that Alibaba Cloud account or its RAM users.
Upload and download table data
Before you start, prepare a data file named data.txt and save it to d:\data.txt. The file contains the following data.
shopx,x_id,100
shopy,y_id,200
shopz,z_id
The third row of data in the data.txt file does not match the schema of the partitioned table sale_detail that will be created. The sale_detail table is defined with three columns, but this row has only two.
The following steps describe how to upload and download table data.
-
In the MaxCompute client, run the following statements to create the partitioned table sale_detail and add a partition to it.
CREATE TABLE IF NOT EXISTS sale_detail( shop_name STRING, customer_id STRING, total_price DOUBLE) PARTITIONED BY (sale_date STRING,region STRING); alter table sale_detail add partition (sale_date='201312', region='hangzhou'); -
Use the
uploadcommand to upload the data.txt data file to the partitioned table sale_detail.-
Sample command
tunnel upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false; -
Return value
Upload session: 20230505xxxxxxxxxxxb0b02dbb6bd Start upload:d:\data.txt Using \r\n to split records Upload in strict schema mode: true Total bytes:42 Split input to 1 blocks 2023-05-05 10:11:35 upload block: '1' ERROR: column mismatch -,expected 3 columns, 2 columns found, please check data or delimiterNoteBecause data.txt contains dirty data, the data import fails. A session ID and an error message are returned.
-
-
Run the following statement to verify the data.
-
Sample command
select * from sale_detail where sale_date='201312'; -
Return values
ID = 20230505xxxxxxxxxxxvc61z5 +-----------+-------------+-------------+-----------+--------+ | shop_name | customer_id | total_price | sale_date | region | +-----------+-------------+-------------+-----------+--------+ +-----------+-------------+-------------+-----------+--------+NoteThe data import failed due to dirty data, leaving the table empty.
-
-
Use the
showcommand to query the session ID of the failed upload.-
Sample command
tunnel show history; -
Returned result
20230505xxxxxxxxxxxb0b02dbb6bd failed 'upload d:\data.txt sale_detail/sale_date=201312,region=Hangzhou -s false'
-
-
Modify the data.txt sample data file to contain the following data, which matches the table schema of sale_detail.
shopx,x_id,100 shopy,y_id,200 -
Run the
resumecommand to resume the data upload. In this command, 20230505xxxxxxxxxxxb0b02dbb6bd is the session ID of the failed upload.-
Sample command
tunnel resume 20230505xxxxxxxxxxxb0b02dbb6bd --force; -
Return value
start resume 20230505xxxxxxxxxxxb0b02dbb6bd Upload session: 20230505xxxxxxxxxxxb0b02dbb6bd Start upload:d:\data.txt Using \r\n to split records Upload in strict schema mode: true Resume 1 blocks 2023-05-05 10:32:39 upload block: '1' 2023-05-05 10:32:40 upload block complete, block id: 1 upload complete, average speed is 0 bytes/s OK
-
-
Run the following statement to verify the data. The upload is successful.
-
Sample command
select * from sale_detail where sale_date='201312'; -
Return value
ID = 20230505xxxxxxxxxxxx7afc9qcg +-----------+-------------+-------------+-----------+--------+ | shop_name | customer_id | total_price | sale_date | region | +-----------+-------------+-------------+-----------+--------+ | shopx | x_id | 100.0 | 201312 | hangzhou| | shopy | y_id | 200.0 | 201312 | hangzhou| +-----------+-------------+-------------+-----------+--------+
-
-
Run the
downloadcommand to download the data from the sale_detail table to a local file named result.txt.NoteThe naming conventions for the local path path where the data is downloaded are as follows:
-
To save the file directly to the bin directory of the MaxCompute client, set path to
filename.extension. -
To save the file to another path, such as the test folder on drive D, set path to
D:\test\filename.extension. -
If a local file with the same name exists, it is overwritten.
tunnel download sale_detail/sale_date=201312,region=hangzhou result.txt; -
-
Verify the content of the result.txt file. The file contains the following content, which indicates that the download is successful.
NoteThe downloaded file displays only the values of non-partition fields.
shopx,x_id,100.0 shopy,y_id,200.0
Download instance data
-
Method 1: Use the tunnel download command to download the execution results of a specific instance to a local file.
-
Run a SELECT statement to query the sale_detail table.
select * from sale_detail;NoteIf full table scans are disabled for the partitioned table sale_detail, this command returns the error
Table(xxxx) is full scan with all partitions, please specify partition predicates. For more information about how to resolve this issue, see Appendix: Error codes. -
Run the following Tunnel command to download the execution results to a local file.
--View the instance ID of the SELECT command show p; --Run the download command tunnel download instance://20170724071705393ge3csfb8 result.txt;
-
-
Method 2: Configure parameters to output SQL query results through InstanceTunnel by default.
After you enable the
use_instance_tunneloption in the MaxCompute client, SELECT queries use InstanceTunnel by default to download results. This helps avoid timeout errors and data volume limits when you retrieve SQL query results from the MaxCompute platform. You can enable this configuration in one of the following two ways:NoteFor more information about the constraints and limits of InstanceTunnel, see InstanceTunnel constraints and limits.
-
In the latest version of the client, this option is enabled by default in the odps_config.ini file, and
instance_tunnel_max_recordis set to 10000 by default.# Download SQL results by instance tunnel use_instance_tunnel=true # The maximum number of records to download when using instance tunnel for SQL results instance_tunnel_max_record=10000NoteThe
instance_tunnel_max_recordparameter specifies the maximum number of SQL query results to download through InstanceTunnel. If you do not set this parameter, an unlimited number of records can be downloaded. -
Run the
set console.sql.result.instancetunnel=truecommand to enable this feature.-
Enable the Instance Tunnel option.
set console.sql.result.instancetunnel=true; -
Run a select query.
select * from wc_in;The following result is returned.
ID = 20170724081946458g14csfb8 Log view: http://logview/xxxxx..... +------------+ | key | +------------+ | slkdfj | | hellp | | apple | | tea | | peach | | apple | | tea | | teaa | +------------+ A total of 8 records fetched by instance tunnel. Max record number: 10000
NoteWhen you use InstanceTunnel to output the results of a SELECT query, the system prints a message on the last line. In this example, the instance execution returned a total of 8 records. Similarly, you can run the
set console.sql.result.instancetunnel = falsecommand to disable this feature. -
-