
Tablestore: Use Tablestore SDKs

Last Updated: Nov 09, 2023

Before you use Tablestore SDKs to deliver data, familiarize yourself with the usage notes and the delivery operations. You can create a delivery task to deliver data from a Tablestore data table to an Object Storage Service (OSS) bucket.

Usage notes

  • Data delivery is available in the China (Hangzhou), China (Shanghai), China (Beijing), and China (Zhangjiakou) regions.

  • Delete operations on Tablestore data are ignored during delivery. Data that has already been delivered to OSS is not deleted when you delete the corresponding data in Tablestore.

  • Initialization takes up to one minute when you create a delivery task.

  • When data is written at a steady rate, the delivery latency is within 3 minutes. The P99 latency of data synchronization is within 10 minutes.

    Note

    P99 indicates the average latency of the slowest 1% of requests over the last 10 seconds.

Operations

  • CreateDeliveryTask: creates a delivery task.

  • ListDeliveryTask: lists all delivery tasks of a table.

  • DescribeDeliveryTask: queries the details of a delivery task.

  • DeleteDeliveryTask: deletes a delivery task.

Use Tablestore SDKs

You can use Tablestore SDKs to create and manage delivery tasks. The following example uses the Tablestore SDK for Java.

Prerequisites

Parameters


tableName

The name of the table.

taskName

The name of the delivery task.

The name must be 3 to 16 characters in length and can contain only lowercase letters, digits, and hyphens (-). It must start and end with a lowercase letter or digit.

taskConfig

The configuration of the delivery task. The following fields are supported:

  • ossPrefix: the prefix of the folder in the bucket. The data from Tablestore is delivered to the folder. The path of the destination folder supports the following time variables: $yyyy, $MM, $dd, $HH, and $mm.

    • When the path uses time variables, OSS folders are dynamically generated based on the time at which data is written. This way, the data in OSS is organized, partitioned, and distributed by time, following the Hive partition naming style.

    • When the path does not use time variables, all files are delivered to an OSS folder whose name contains this prefix.

  • ossBucket: the name of the OSS bucket.

  • ossEndpoint: the endpoint of the region where an OSS bucket is located.

  • ossStsRole: the Alibaba Cloud Resource Name (ARN) of the Tablestore service-linked role.

  • format: the format in which the delivered data is stored in OSS. Default value: Parquet.

    Currently, only Parquet is supported, so you do not need to specify this parameter. By default, PLAIN encoding is used for all data types.

  • eventTimeColumn: the event time column. This parameter specifies that data is partitioned based on the time of the data in a column. You can specify a name and EventTimeFormat. Valid values for EventTimeFormat: RFC822, RFC850, RFC1123, RFC3339, and Unix. Specify this parameter based on your requirements.

    If you do not specify this parameter, data is partitioned based on the time at which the data is written to Tablestore.

  • parquetSchema: the columns that you want to deliver. You must specify the source field, the destination field, and the destination field type for each column.

    You can specify the order and name of the fields that you want to deliver in the schema. After data is delivered to OSS, it is distributed based on the order of fields in the schema.

    Important

    The data types of the source and destination fields must be consistent. Otherwise, the fields are discarded as dirty data. For more information, see Data type mappings.

taskType

The type of the delivery task. Default value: BASE_INC. Valid values:

  • INC: incremental data delivery. Only incremental data is synchronized.

  • BASE: full data delivery. All data in the table is scanned and synchronized.

  • BASE_INC: differential data delivery. After full data is synchronized, Tablestore synchronizes incremental data.

    When Tablestore synchronizes incremental data, you can view the time when data is last delivered and the status of the current delivery task.
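
For illustration, the time-variable substitution in ossPrefix described above can be sketched as follows. The `expand` helper below is a hypothetical client-side model for demonstration only; the actual expansion is performed server-side by the delivery service based on the write or event time.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class OssPrefixExpansion {
    // Illustrative only: models how the delivery service substitutes the
    // supported time variables ($yyyy, $MM, $dd, $HH, $mm) in an ossPrefix
    // value for a given write time. String.replace is case-sensitive, so
    // $MM (month) and $mm (minute) are substituted independently.
    public static String expand(String ossPrefix, LocalDateTime writeTime) {
        return ossPrefix
                .replace("$yyyy", writeTime.format(DateTimeFormatter.ofPattern("yyyy")))
                .replace("$MM", writeTime.format(DateTimeFormatter.ofPattern("MM")))
                .replace("$dd", writeTime.format(DateTimeFormatter.ofPattern("dd")))
                .replace("$HH", writeTime.format(DateTimeFormatter.ofPattern("HH")))
                .replace("$mm", writeTime.format(DateTimeFormatter.ofPattern("mm")));
    }

    public static void main(String[] args) {
        LocalDateTime writeTime = LocalDateTime.of(2023, 11, 9, 8, 30);
        // Hive-style partition layout, as in the sample task configuration.
        System.out.println(expand("sampledeliverytask/year=$yyyy/month=$MM", writeTime));
        // -> sampledeliverytask/year=2023/month=11
    }
}
```

With this layout, downstream engines that understand Hive-style partitioning (such as Spark or Hive) can prune partitions by the year= and month= folder names.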

Examples

import com.alicloud.openservices.tablestore.ClientException;
import com.alicloud.openservices.tablestore.SyncClient;
import com.alicloud.openservices.tablestore.TableStoreException;
import com.alicloud.openservices.tablestore.model.delivery.*;
public class DeliveryTask {

        public static void main(String[] args) {
            final String endPoint = "https://yourinstancename.cn-hangzhou.ots.aliyuncs.com";

            final String accessKeyId = System.getenv("OTS_AK_ENV");
            
            final String accessKeySecret = System.getenv("OTS_SK_ENV");

            final String instanceName = "yourinstancename";

            SyncClient client = new SyncClient(endPoint, accessKeyId, accessKeySecret, instanceName);
            try {
                createDeliveryTask(client);
                System.out.println("end");
            } catch (TableStoreException e) {
                System.err.println("The operation failed. Details:" + e.getMessage() + e.getErrorCode() + e.toString());
                System.err.println("Request ID:" + e.getRequestId());
            } catch (ClientException e) {
                System.err.println("The request failed. Details:" + e.getMessage());
            } finally {
                client.shutdown();
            }
        }

        private static void createDeliveryTask(SyncClient client){
            String tableName = "sampleTable";
            String taskName = "sampledeliverytask";
            OSSTaskConfig taskConfig = new OSSTaskConfig();
            taskConfig.setOssPrefix("sampledeliverytask/year=$yyyy/month=$MM");
            taskConfig.setOssBucket("datadeliverytest");
            taskConfig.setOssEndpoint("oss-cn-hangzhou.aliyuncs.com");
            taskConfig.setOssStsRole("acs:ram::17************45:role/aliyunserviceroleforotsdatadelivery");
            // eventColumn is optional. This parameter specifies that data is partitioned based on the time of data in a column. If you do not specify this parameter, data is partitioned based on the time when the data is written to Tablestore. 
            EventColumn eventColumn = new EventColumn("Col1", EventTimeFormat.RFC1123);
            taskConfig.setEventTimeColumn(eventColumn);
            taskConfig.addParquetSchema(new ParquetSchema("PK1", "PK1", DataType.UTF8));
            taskConfig.addParquetSchema(new ParquetSchema("PK2", "PK2", DataType.BOOL));
            taskConfig.addParquetSchema(new ParquetSchema("Col1", "Col1", DataType.UTF8));
            CreateDeliveryTaskRequest request = new CreateDeliveryTaskRequest();
            request.setTableName(tableName);
            request.setTaskName(taskName);
            request.setTaskConfig(taskConfig);
            request.setTaskType(DeliveryTaskType.BASE_INC);
            CreateDeliveryTaskResponse response = client.createDeliveryTask(request);
            System.out.println("requestID: "+ response.getRequestId());
            System.out.println("traceID: " + response.getTraceId());
            System.out.println("create delivery task success");
        }
}
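
Beyond CreateDeliveryTask, the other operations listed earlier follow the same request/response pattern. The following sketch assumes the request constructors and getter names shown here (for example, a ListDeliveryTaskRequest that takes a table name, and getTaskInfos() on its response); verify them against the version of the Tablestore SDK for Java that you use.

```java
import com.alicloud.openservices.tablestore.SyncClient;
import com.alicloud.openservices.tablestore.model.delivery.*;

public class ManageDeliveryTask {
    // Sketch of the remaining delivery operations, using the same SyncClient
    // as the CreateDeliveryTask example above.
    public static void manage(SyncClient client, String tableName, String taskName) {
        // List all delivery tasks of the table.
        ListDeliveryTaskRequest listRequest = new ListDeliveryTaskRequest(tableName);
        ListDeliveryTaskResponse listResponse = client.listDeliveryTask(listRequest);
        System.out.println("task count: " + listResponse.getTaskInfos().size());

        // Query the details of a single delivery task.
        DescribeDeliveryTaskRequest describeRequest = new DescribeDeliveryTaskRequest(tableName, taskName);
        DescribeDeliveryTaskResponse describeResponse = client.describeDeliveryTask(describeRequest);
        System.out.println("task type: " + describeResponse.getTaskType());

        // Delete the delivery task. Data already delivered to OSS is not removed.
        DeleteDeliveryTaskRequest deleteRequest = new DeleteDeliveryTaskRequest(tableName, taskName);
        client.deleteDeliveryTask(deleteRequest);
    }
}
```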