MaxCompute SQL Limits & Project Space Restrictions Explained - MaxCompute - Alibaba Cloud - MaxCompute

This topic describes the features of Tunnel commands and how to use them to upload and download data.

Supported tools

You can run Tunnel commands in odpscmd and MaxCompute Studio. For more information, see Connect using a local client (odpscmd) and Introduction to MaxCompute Studio.

Tunnel command features

The Tunnel commands available in the client replace the Dship tool. You can use these commands to upload and download data. The commands are as follows:

Upload: Uploads data from a local file to a MaxCompute table. Each upload operation can write data to a single table or a single partition. For a partitioned table, you must specify the destination partition. For a multi-level partitioned table, you must specify the lowest-level partition. For more information, see Upload.
Download: Downloads data from a MaxCompute table or the results of a specific instance to a local file. Each download operation can download one table or one partition to a single file. For a partitioned table, you must specify the source partition. For a multi-level partitioned table, you must specify the lowest-level partition. For more information, see Download.
Resume: If an upload fails because of network issues or Tunnel service errors, you can use the Resume command to continue the file upload. This command resumes the previous upload operation. The Resume command does not currently support download operations. For more information, see Resume.
Show: Displays information about historical tasks. For more information, see Show.
Purge: Clears the session directory. By default, this command removes logs from the last three days. For more information, see Purge.
Help: Displays help information. Short command formats are supported for each command and option.
Upsert: Writes data using a combination of Update and Insert semantics. This command is supported only for Transaction Table 2.0 tables.

If matching data is not found in the destination table, new data is inserted. If the data already exists, it is updated.

Tunnel upload and download limits

The Tunnel feature and the Tunnel software development kit (SDK) do not currently support operations on external tables. You can use Tunnel to upload data directly to a MaxCompute internal table. Alternatively, you can upload data to OSS using the OSS Python SDK and then create an external table in MaxCompute to map to the data. For more information about external tables, see Overview of external tables.
Tunnel commands do not support uploading or downloading data of the ARRAY, MAP, or STRUCT data types.
The lifecycle of each Tunnel session on the server-side is 24 hours. A session can be used within 24 hours after it is created. It can also be shared across processes or threads. However, you must ensure that the same BlockId is not reused.

Note
A Tunnel Session: The server-side creates a session for each upload or download and generates a unique UploadId or DownloadId to identify the operation. The lifecycle of a Tunnel session on the server-side is 24 hours. After 24 hours, the session becomes inactive.
When you download data using a session, note that a session created by an Alibaba Cloud account can be used for downloads only by that Alibaba Cloud account or its RAM users.

Upload and download table data

Before you start, prepare a data file named data.txt and save it to d:\data.txt. The file contains the following data.

shopx,x_id,100
shopy,y_id,200
shopz,z_id

Note

The third row of data in the data.txt file does not match the schema of the partitioned table sale_detail that will be created. The sale_detail table is defined with three columns, but this row has only two.

The following steps describe how to upload and download table data.

In the MaxCompute client, run the following statements to create the partitioned table sale_detail and add a partition to it.

CREATE TABLE IF NOT EXISTS sale_detail(
      shop_name     STRING,
      customer_id   STRING,
      total_price   DOUBLE)
PARTITIONED BY (sale_date STRING,region STRING);

alter table sale_detail add partition (sale_date='201312', region='hangzhou');

Use the upload command to upload the data.txt data file to the partitioned table sale_detail.

Sample command

tunnel upload d:\data.txt sale_detail/sale_date=201312,region=hangzhou -s false;

Return value

Upload session: 20230505xxxxxxxxxxxb0b02dbb6bd
Start upload:d:\data.txt
Using \r\n to split records
Upload in strict schema mode: true
Total bytes:42   Split input to 1 blocks
2023-05-05 10:11:35     upload block: '1'
ERROR: column mismatch -,expected 3 columns, 2 columns found, please check data or delimiter

Note

Because data.txt contains dirty data, the data import fails. A session ID and an error message are returned.

Run the following statement to verify the data.

Sample command

select * from sale_detail where sale_date='201312';

Return values

ID = 20230505xxxxxxxxxxxvc61z5
+-----------+-------------+-------------+-----------+--------+
| shop_name | customer_id | total_price | sale_date | region |
+-----------+-------------+-------------+-----------+--------+
+-----------+-------------+-------------+-----------+--------+

Note

The data import failed due to dirty data, leaving the table empty.

Use the show command to query the session ID of the failed upload.

Sample command
```
tunnel show history;
```

Returned result

20230505xxxxxxxxxxxb0b02dbb6bd  failed  'upload d:\data.txt sale_detail/sale_date=201312,region=Hangzhou -s false'

Modify the data.txt sample data file to contain the following data, which matches the table schema of sale_detail.
```
shopx,x_id,100
shopy,y_id,200
```

Run the resume command to resume the data upload. In this command, 20230505xxxxxxxxxxxb0b02dbb6bd is the session ID of the failed upload.

Sample command

tunnel resume 20230505xxxxxxxxxxxb0b02dbb6bd --force;

Return value

start resume
20230505xxxxxxxxxxxb0b02dbb6bd
Upload session: 20230505xxxxxxxxxxxb0b02dbb6bd
Start upload:d:\data.txt
Using \r\n to split records
Upload in strict schema mode: true
Resume 1 blocks 
2023-05-05 10:32:39     upload block: '1'
2023-05-05 10:32:40     upload block complete, block id: 1
upload complete, average speed is 0 bytes/s
OK

Run the following statement to verify the data. The upload is successful.

Sample command

select * from sale_detail where sale_date='201312';

Return value

ID = 20230505xxxxxxxxxxxx7afc9qcg
 +-----------+-------------+-------------+-----------+--------+
 | shop_name | customer_id | total_price | sale_date | region |
 +-----------+-------------+-------------+-----------+--------+
 | shopx     | x_id        | 100.0       | 201312    | hangzhou|
 | shopy     | y_id        | 200.0       | 201312    | hangzhou|
 +-----------+-------------+-------------+-----------+--------+

Run the download command to download the data from the sale_detail table to a local file named result.txt.
Note
The naming conventions for the local path path where the data is downloaded are as follows:
- To save the file directly to the bin directory of the MaxCompute client, set path to filename.extension.
- To save the file to another path, such as the test folder on drive D, set path to D:\test\filename.extension.
- If a local file with the same name exists, it is overwritten.
```
tunnel download sale_detail/sale_date=201312,region=hangzhou result.txt;
```
Verify the content of the result.txt file. The file contains the following content, which indicates that the download is successful.

Note
The downloaded file displays only the values of non-partition fields.
```
shopx,x_id,100.0
shopy,y_id,200.0
```

Download instance data

Method 1: Use the tunnel download command to download the execution results of a specific instance to a local file.
1. Run a SELECT statement to query the sale_detail table.
```
select * from sale_detail;
```
  Note
  If full table scans are disabled for the partitioned table sale_detail, this command returns the error Table(xxxx) is full scan with all partitions, please specify partition predicates. For more information about how to resolve this issue, see Appendix: Error codes.
2. Run the following Tunnel command to download the execution results to a local file.
```
--View the instance ID of the SELECT command
show p;

--Run the download command
tunnel download instance://20170724071705393ge3csfb8 result.txt;
```
Method 2: Configure parameters to output SQL query results through InstanceTunnel by default.
After you enable the use_instance_tunnel option in the MaxCompute client, SELECT queries use InstanceTunnel by default to download results. This helps avoid timeout errors and data volume limits when you retrieve SQL query results from the MaxCompute platform. You can enable this configuration in one of the following two ways:

Note
For more information about the constraints and limits of InstanceTunnel, see InstanceTunnel constraints and limits.
- In the latest version of the client, this option is enabled by default in the odps_config.ini file, and instance_tunnel_max_record is set to 10000 by default.
```
# Download SQL results by instance tunnel
use_instance_tunnel=true
# The maximum number of records to download when using instance tunnel for SQL results
instance_tunnel_max_record=10000
```
  Note
  The instance_tunnel_max_record parameter specifies the maximum number of SQL query results to download through InstanceTunnel. If you do not set this parameter, an unlimited number of records can be downloaded.
- Run the set console.sql.result.instancetunnel=true command to enable this feature.
  - Enable the Instance Tunnel option.
```
set console.sql.result.instancetunnel=true;
```
  - Run a select query.
```
select * from wc_in;
```
    The following result is returned.
    ID = 20170724081946458g14csfb8 Log view: http://logview/xxxxx..... +------------+ | key | +------------+ | slkdfj | | hellp | | apple | | tea | | peach | | apple | | tea | | teaa | +------------+ A total of 8 records fetched by instance tunnel. Max record number: 10000
  Note
  When you use InstanceTunnel to output the results of a SELECT query, the system prints a message on the last line. In this example, the instance execution returned a total of 8 records. Similarly, you can run the set console.sql.result.instancetunnel = false command to disable this feature.