This topic describes how to use UploadSession to upload data to a table.

UploadSession definition

UploadSession is defined as follows:
public class UploadSession {
    UploadSession(Configuration conf, String projectName, String tableName,
                  String partitionSpec) throws TunnelException;
    UploadSession(Configuration conf, String projectName, String tableName, 
                  String partitionSpec, String uploadId) throws TunnelException;
    public void commit(Long[] blocks);
    public Long[] getBlockList();
    public String getId();
    public TableSchema getSchema();
    public UploadSession.Status getStatus();
    public Record newRecord();
    public RecordWriter openRecordWriter(long blockId);
    public RecordWriter openRecordWriter(long blockId, boolean compress);
    public RecordWriter openBufferedWriter();
    public RecordWriter openBufferedWriter(boolean compress);
}

UploadSession description

  • Lifecycle: An upload session starts when you create an Upload instance and ends when the upload is complete.
  • Create an Upload instance: Call the constructor or use TableTunnel to create an Upload instance.
    • Request mode: synchronous.
    • The server creates a session for the Upload instance and generates a unique upload ID to identify it. You can call the getId method on the client to obtain the upload ID.
  • Upload data:
    • Request mode: synchronous.
    • Call the openRecordWriter method to generate a RecordWriter instance. The blockId parameter identifies the block of data to upload and describes the position of the data in the table. The value range of blockId is [0, 20000]. If a block fails to upload, you can re-upload it by using its blockId.
  • View upload:
    • Request mode: synchronous.
    • Call the getStatus method to obtain the current upload status.
    • Call the getBlockList method to obtain the list of blocks that have been uploaded. Compare the result with the list of block IDs that you sent to the server, and re-upload any blocks that failed to upload.
  • End upload:
    • Request mode: synchronous.
    • Call the commit(Long[] blocks) method. The blocks parameter lists the blocks that have been uploaded. The server verifies this list against its own block list.
    • This verification helps ensure data accuracy. If the provided block list does not match the block list on the server, an error is returned.
    • If the commit operation fails, retry it.
  • State description:
    1. UNKNOWN: This is the initial state when the server creates a session.
    2. NORMAL: The upload session is created.
    3. CLOSING: When you call the commit method to end an upload session, the server changes the state to CLOSING.
    4. CLOSED: The upload is complete. The data is moved to the directory where the result table is located.
    5. EXPIRED: The upload session has timed out.
    6. CRITICAL: A service error has occurred.
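
The View upload and End upload steps above can be sketched as follows. This is a minimal sketch that does not call the Tunnel SDK: UploadBlockCheck, ThrowingAction, missingBlocks, and retry are hypothetical names introduced for illustration, and the Long[] arrays stand in for the block IDs that you sent and for the result of getBlockList. In real code, the action passed to retry would presumably wrap the commit(Long[] blocks) call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the Tunnel SDK.
final class UploadBlockCheck {

    // An action that may fail, such as a wrapper around commit(Long[] blocks).
    interface ThrowingAction {
        void run() throws Exception;
    }

    // Returns the block IDs that were sent but do not appear in the list
    // reported by the server. These are the blocks to re-upload.
    static List<Long> missingBlocks(Long[] sent, Long[] uploaded) {
        Set<Long> onServer = new HashSet<>(Arrays.asList(uploaded));
        List<Long> missing = new ArrayList<>();
        for (Long id : sent) {
            if (!onServer.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    // Runs the action until it succeeds or maxAttempts is reached.
    // Returns the number of attempts used; throws if every attempt fails.
    static int retry(ThrowingAction action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException("all " + maxAttempts + " attempts failed", last);
    }

    public static void main(String[] args) {
        Long[] sent = {0L, 1L, 2L, 3L};
        Long[] uploaded = {0L, 2L}; // stands in for the result of getBlockList()
        System.out.println(missingBlocks(sent, uploaded)); // prints [1, 3]

        // Simulated commit that fails twice and then succeeds.
        int[] calls = {0};
        int attempts = retry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new Exception("simulated commit failure");
            }
        }, 5);
        System.out.println(attempts); // prints 3
    }
}
```

Re-uploading the missing blocks before calling commit keeps the client's block list in line with the server's, which is the condition the server checks during verification.
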
Note
  • Block IDs that are used within the same upload session must be unique. If you use a block ID to open a RecordWriter, write data, and then call the close and commit methods, you cannot use this block ID to open another RecordWriter.
  • The maximum size of a block is 100 GB. We recommend that you store more than 64 MB of data in each block.
  • The TTL of each session on the server is 24 hours.
  • When you upload data, a network action is triggered every time the RecordWriter writes 8 KB of data. If no network actions are triggered within 120 seconds, the server closes the connection and the RecordWriter becomes unavailable. You must open a new RecordWriter to upload data.
  • We recommend that you use openBufferedWriter to upload data. This method hides the blockId details and maintains an internal data buffer. When a block fails to upload, the writer automatically re-uploads it. For more information, see Data upload by using BufferedWriter.
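
To illustrate how the block size and block ID limits in the notes above interact, the following sketch estimates how many blocks an upload needs for a chosen block size. BlockPlan and blockCount are hypothetical names and are not part of the Tunnel SDK. The sketch assumes binary units (1 MB = 1024 × 1024 bytes) and conservatively allows at most 20,000 blocks per session (IDs 0 through 19999), which stays within the blockId range stated earlier.

```java
// Hypothetical helper for illustration; not part of the Tunnel SDK.
final class BlockPlan {
    static final long MB = 1024L * 1024L;          // assumption: binary units
    static final long GB = 1024L * MB;
    static final long MAX_BLOCK_BYTES = 100L * GB; // maximum block size from the notes
    static final long RECOMMENDED_MIN_BYTES = 64L * MB;
    static final long MAX_BLOCKS = 20000L;         // conservative: IDs 0 through 19999

    // Returns the number of blocks needed to upload totalBytes when each
    // block holds at most blockBytes. Throws if the plan breaks a limit.
    static long blockCount(long totalBytes, long blockBytes) {
        if (totalBytes < 0) {
            throw new IllegalArgumentException("totalBytes must not be negative");
        }
        if (blockBytes <= 0 || blockBytes > MAX_BLOCK_BYTES) {
            throw new IllegalArgumentException("blockBytes must be in (0, 100 GB]");
        }
        // Ceiling division without risking overflow.
        long count = totalBytes / blockBytes + (totalBytes % blockBytes == 0 ? 0 : 1);
        if (count > MAX_BLOCKS) {
            throw new IllegalArgumentException(
                "plan needs " + count + " blocks; use a larger block size");
        }
        return count;
    }

    public static void main(String[] args) {
        // 10 GB split into 256 MB blocks.
        System.out.println(blockCount(10L * GB, 256L * MB)); // prints 40

        // 2 TB split into 64 MB blocks would need 32768 blocks, which exceeds
        // the limit, so a larger block size is required.
        try {
            blockCount(2048L * GB, RECOMMENDED_MIN_BYTES);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

When you use openBufferedWriter, block IDs are managed for you, so a calculation like this is mainly relevant when you assign block IDs yourself through openRecordWriter.
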