TableTunnel is an entry class of the MaxCompute Tunnel service. You can use TableTunnel to upload or download only table data. Views are not supported.

Definition

The following code defines TableTunnel. For more information, visit Java-sdk-doc.
public class TableTunnel {
 public DownloadSession createDownloadSession(String projectName, String tableName);
 public DownloadSession createDownloadSession(String projectName, String tableName, PartitionSpec partitionSpec);
 public UploadSession createUploadSession(String projectName, String tableName, boolean overwrite);
 public UploadSession createUploadSession(String projectName, String tableName, PartitionSpec partitionSpec, boolean overwrite);
 public DownloadSession getDownloadSession(String projectName, String tableName, PartitionSpec partitionSpec, String id);
 public DownloadSession getDownloadSession(String projectName, String tableName, String id);
 public UploadSession getUploadSession(String projectName, String tableName, PartitionSpec partitionSpec, String id);
 public UploadSession getUploadSession(String projectName, String tableName, String id);
}
Descriptions:
  • The lifecycle of a TableTunnel instance spans from the creation of the instance to the end of the program.
  • TableTunnel provides a method to create UploadSession and DownloadSession objects. TableTunnel.UploadSession is used to upload data. TableTunnel.DownloadSession is used to download data.
  • A session refers to the process of uploading or downloading a table or partition. A session consists of one or more HTTP requests to Tunnel RESTful APIs.
  • In an upload session, each RecordWriter matches an HTTP request and is identified by a unique block ID. The block ID is the name of the file that corresponds to the RecordWriter.
  • If you use the same block ID to enable a RecordWriter multiple times in the same session, the data uploaded when the RecordWriter calls the close() method for the last time overwrites all previous data. This feature can be used to retransmit a data block that fails to be uploaded.
  • In UploadSession of TableTunnel:
    • If the boolean overwrite parameter is not specified, the INSERT INTO statement is used.
    • If the boolean overwrite parameter is set to true, the INSERT OVERWRITE statement is used.
    • If the boolean overwrite parameter is set to false, the INSERT INTO statement is used.
    Descriptions of the two statements:
    • INSERT INTO: Upload sessions of the same table or partition do not affect each other. Data uploaded in each session is saved in different directories.
    • INSERT OVERWRITE: All data in a table or partition is overwritten by the data in the current upload session. If you use this statement to upload data, do not perform concurrent operations on the same table or partition.
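The difference between the two statements can be sketched with a small in-memory model of a partition's storage (all class and method names here are illustrative, not the Tunnel SDK API): each INSERT INTO session commits into its own directory, while an INSERT OVERWRITE commit first discards every existing directory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a partition's storage to illustrate INSERT INTO vs
// INSERT OVERWRITE semantics. Illustrative only; not the Tunnel SDK API.
public class OverwriteSemantics {
    // Maps a session-specific directory name to the number of rows it holds.
    private final Map<String, Integer> sessionDirs = new LinkedHashMap<>();

    // INSERT INTO: each session's data lands in its own directory;
    // data from earlier sessions is untouched.
    void commitInsertInto(String sessionId, int rowCount) {
        sessionDirs.put("dir-" + sessionId, rowCount);
    }

    // INSERT OVERWRITE: the committing session replaces everything
    // previously stored in the partition.
    void commitInsertOverwrite(String sessionId, int rowCount) {
        sessionDirs.clear();
        sessionDirs.put("dir-" + sessionId, rowCount);
    }

    int totalRows() {
        return sessionDirs.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        OverwriteSemantics partition = new OverwriteSemantics();
        partition.commitInsertInto("s1", 100);
        partition.commitInsertInto("s2", 50);
        System.out.println(partition.totalRows());  // two INTO sessions accumulate: 150
        partition.commitInsertOverwrite("s3", 30);
        System.out.println(partition.totalRows());  // OVERWRITE discards s1 and s2: 30
    }
}
```

This model also shows why concurrent INSERT OVERWRITE sessions on the same partition are unsafe: each commit clears the work of the others.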

Implementation process

  1. The RecordWriter.write() method uploads your data as files to a temporary directory.
  2. The RecordWriter.close() method moves the files from the temporary directory to the data directory.
  3. The session.commit() method moves all files in the data directory to the directory where the required table is located, and updates the table metadata. This way, the data moved into a table by the current job is visible to other MaxCompute jobs such as SQL and MapReduce jobs.
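The three steps above can be sketched as a small in-memory model (hypothetical class and field names; the real work happens inside the Tunnel service): write() accumulates files in a temporary directory, close() promotes them to the session data directory, and only commit() makes them visible in the table.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the upload flow: temp dir -> data dir -> table dir.
// Illustrative only; not the Tunnel SDK implementation.
public class UploadFlow {
    private final List<String> tempDir = new ArrayList<>();   // files being written
    private final List<String> dataDir = new ArrayList<>();   // files of closed writers
    private final List<String> tableDir = new ArrayList<>();  // committed, visible files

    // Step 1 - RecordWriter.write(): data accumulates as a file
    // in a temporary directory.
    void write(String file) {
        tempDir.add(file);
    }

    // Step 2 - RecordWriter.close(): move this writer's files from the
    // temporary directory to the session data directory.
    void close() {
        dataDir.addAll(tempDir);
        tempDir.clear();
    }

    // Step 3 - session.commit(): move the data directory into the table
    // directory and update metadata; only now is the data visible to
    // other MaxCompute jobs.
    void commit() {
        tableDir.addAll(dataDir);
        dataDir.clear();
    }

    int visibleFiles() {
        return tableDir.size();
    }
}
```

The key property the model captures is that data written, or even closed, but not yet committed is invisible to other jobs.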

Limits

  • The value of a block ID must be greater than or equal to 0 but less than 20000. The size of the data that you want to upload in a block cannot exceed 100 GB.
  • A session is uniquely identified by its ID. The lifecycle of a session is 24 hours. If the amount of data to transfer is too large to finish within 24 hours, you must split the transfer across multiple sessions.
  • The lifecycle of an HTTP request that corresponds to a RecordWriter is 120 seconds. If no data flows over an HTTP connection within 120 seconds, the server closes the connection.
    Note HTTP has an 8 KB buffer. When you call the RecordWriter.write() method, your data may be saved to the buffer and no inbound traffic flows over the HTTP connection. In this case, you can call the TunnelRecordWriter.flush() method to forcibly flush data from the buffer.
  • If you use a RecordWriter to write logs to MaxCompute, the RecordWriter may time out due to unexpected traffic fluctuations.
    • We recommend that you do not use a RecordWriter for each data record. If you use a RecordWriter for each data record, a large number of small files are generated, because each RecordWriter corresponds to a file. This affects the performance of MaxCompute.
    • We recommend that you cache your data and use a single RecordWriter to write multiple data records at a time, for example, after the size of the cached data reaches 64 MB.
  • The lifecycle of a RecordReader is 300 seconds.
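A client-side pattern that respects these limits is to validate the block ID up front and batch many records behind one writer, flushing once the locally buffered size passes a threshold such as 64 MB. The sketch below models only that batching logic with hypothetical names; in real code the writer would be a TunnelRecordWriter obtained from an upload session.

```java
// Sketch of a batching writer that stays within the documented limits:
// block IDs in [0, 20000) and periodic flushes once the buffered size
// passes a threshold (64 MB here). Names are illustrative, not the SDK API.
public class BatchingWriter {
    static final long FLUSH_THRESHOLD = 64L * 1024 * 1024; // 64 MB

    private long bufferedBytes = 0;
    private int flushCount = 0;

    BatchingWriter(long blockId) {
        // The Tunnel service only accepts block IDs in [0, 20000).
        if (blockId < 0 || blockId >= 20000) {
            throw new IllegalArgumentException("block ID out of range: " + blockId);
        }
    }

    // Buffer one record; flush once the threshold is reached so that
    // traffic flows over the HTTP connection and small files are avoided.
    void write(long recordBytes) {
        bufferedBytes += recordBytes;
        if (bufferedBytes >= FLUSH_THRESHOLD) {
            flush();
        }
    }

    // In real code this would call TunnelRecordWriter.flush().
    void flush() {
        bufferedBytes = 0;
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }
}
```

Batching this way avoids both the one-file-per-record problem and the 120-second idle timeout, because each flush sends a sizable chunk over the connection.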