When your source database isn't supported by Data Transmission Service (DTS), or when security constraints prevent DTS from connecting directly to it, you can use the data shipping SDK to push data into DTS manually. DTS then synchronizes the data to a destination AnalyticDB for PostgreSQL (ADB for PostgreSQL) instance.
When to use data shipping
Use data shipping in these situations:
Unsupported source database: The source is a database that DTS doesn't natively support, such as a database provided by another Chinese vendor.
Non-standard data types: The data is log data or another special type that requires custom encoding before ingestion.
Restricted direct access: The source database can't expose its credentials or network endpoint directly to DTS for security reasons.
If your source database is already supported by DTS, use a standard data synchronization task instead. See the list of databases supported by DTS.
How it works
Data shipping follows a two-phase setup:
Create a data shipping instance in the DTS console. This provisions the channel that receives data from the SDK.
Configure and start the SDK. After the instance is created, retrieve the connection parameters from the instance details page, configure the SDK, and start pushing data.
DTS synchronizes the data from the shipping channel to the destination ADB for PostgreSQL instance.
Prerequisites
Before you begin, ensure that you have:
An ADB for PostgreSQL instance to receive the data. Only ADB for PostgreSQL is supported as a destination. See Create an instance.
A database and schema created in the destination instance. In this guide, the schema is named dts_deliver_test. See the Import data section in "Use SQL to import vector data".
An AccessKey ID and AccessKey secret for the account that owns the data shipping instance. See Create an AccessKey pair.
(Conditional) If you select Express Connect, VPN Gateway, or Smart Access Gateway as the access method, configure DTS access over VPN Gateway first. See Connect a data center to DTS by using VPN Gateway.
Development capability to implement the data shipping SDK in your codebase.
Important notes
Using the data shipping SDK to ship data from a data source to DTS requires you to write your own encoding and integration code.
The schema name in the destination ADB for PostgreSQL instance must exactly match the database name that you enter in the Drop data object configuration step and the dbName parameter in the SDK. A mismatch causes the destination database to fail to receive data.
After a data shipping instance is created, the number of shards cannot be changed.
Start the data shipping SDK immediately after the instance is created. If the SDK is not started promptly, incremental data cannot be collected and the instance fails.
Billing
See Billing overview.
Create a data shipping instance
Go to the Data Synchronization Tasks page.
Log on to the Data Management (DMS) console.
In the top navigation bar, click Data + AI.
In the left-side navigation pane, choose Data Transmission Service (DTS) > Data Synchronization.
Operations may vary based on the mode and layout of the DMS console. See Simple mode and Customize the layout and style of the DMS console. You can also go directly to the Data Synchronization Tasks page in the new DTS console.
Select the region where the data synchronization instance resides.
In the new DTS console, select the region in the top navigation bar.
Click Create Task. In the Create Task wizard, configure the source and destination databases.
| Section | Parameter | Description |
|---|---|---|
| N/A | Task Name | A name for the DTS task. DTS generates a name automatically. Specify a descriptive name to make the task easy to identify. The name does not need to be unique. |
| Source Database | Select a DMS database instance | Do not select an existing instance. Configure the source parameters manually for the data shipping task. |
| | Database Type | Select Data Shipping. |
| | Access Method | The access method of the source. In this example, Public IP Address is selected. If you select Express Connect, VPN Gateway, or Smart Access Gateway, also configure the VPC and vSwitch parameters. |
| | Instance Region | The region where the source database resides. If the region is not listed, select the geographically closest region. |
| Destination Database | Select a DMS database instance | Select an existing database instance, or leave blank and configure the parameters below manually. If you select an existing instance, DTS populates the parameters automatically. |
| | Database Type | Select AnalyticDB for PostgreSQL. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | The region where the destination ADB for PostgreSQL instance resides. |
| | Instance ID | The ID of the destination ADB for PostgreSQL instance. |
| | Database Name | The name of the database in the destination instance that receives data. |
| | Database Account | The account for the destination instance. The account must have read and write permissions on the destination database. See Create and manage a database account. |
| | Database Password | The password for the database account. |

Click Test Connectivity and Proceed. DTS automatically adds its server CIDR blocks to the whitelist of Alibaba Cloud database instances and to the security group rules of ECS-hosted databases. For self-managed databases in a data center or hosted by a third-party provider, manually add the DTS server CIDR blocks to the database whitelist. See Add the CIDR blocks of DTS servers.
Warning: Adding DTS server CIDR blocks to your database whitelist or security group rules introduces security exposure. Before proceeding, take preventive measures such as strengthening credentials, restricting exposed ports, authenticating API calls, and auditing the whitelist regularly. Alternatively, connect the database to DTS through Express Connect, VPN Gateway, or Smart Access Gateway.
Configure the objects to ship and advanced settings.
| Parameter | Description |
|---|---|
| Processing Mode of Conflicting Tables | Precheck and Report Errors: checks whether the destination contains tables with the same names as the source. If identical names exist, the precheck fails and the task cannot start. To resolve naming conflicts without deleting or renaming destination tables, use object name mapping. See Map object names. Ignore Errors and Proceed: skips the precheck for identical table names. If the source and destination schemas match and a record has the same primary key or unique key value, full synchronization retains the existing destination record while incremental synchronization overwrites it. If the schemas differ, initialization may fail or only some columns are synchronized. Proceed with caution. |
| Capitalization of Object Names in Destination Instance | Controls the capitalization of database, table, and column names in the destination. Defaults to DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Drop data object configuration | Configure the source databases and tables to ship: 1. Click Add Library. In the New Database dialog box, enter the source database name. This name must match the schema name of the destination ADB for PostgreSQL instance. In this example, enter dts_deliver_test. To add another database, click Add next to the existing entry. 2. Click OK. 3. Click the expand icon next to the database to expand the list. 4. Click Add Table next to Table. In the Add Table dialog box, enter the table names. The table name must match the tableName parameter in the SDK. In this example, enter tab1, tab2, and tab3. 5. Click OK. 6. (Optional) Configure table and column name mappings: click Edit next to a table, modify Table Name for table mappings, then clear Synchronize All Tables and modify Column Name and Map column name for column mappings. The Column Name value corresponds to the name in the createField method of FakeSource.java. The Map column name value is the column name in the destination instance. Click the add icon to add more columns. Click OK when done. |

Click Next: Advanced Settings and configure the advanced parameters.
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules the task to a shared cluster. To use a dedicated cluster, purchase one separately. See What is a DTS dedicated cluster? |
| Set Alerts | Whether to enable alerting. No: alerting disabled. Yes: configure the alert threshold and notification settings. See Configure monitoring and alerting when you create a DTS task. |
| Retry Time for Failed Connections | How long DTS retries after a connection failure, in minutes. Valid values: 10–1440. Default: 720. Set this to more than 30 minutes. If DTS reconnects within this window, the task resumes. Otherwise, the task fails. Note: If multiple tasks share the same source or destination, the shortest configured retry time applies to all. You are charged for the instance during retry periods. |
| Retry Time for Other Issues | How long DTS retries after DDL or DML operation failures, in minutes. Valid values: 1–1440. Default: 10. Set this to more than 10 minutes. This value must be less than Retry Time for Failed Connections. |
| Enable Throttling for Incremental Data Synchronization | Whether to throttle incremental synchronization. If enabled, configure RPS of Incremental Data Migration and BPS of Incremental Data Migration to reduce load on the destination. |
| Environment Tag | A tag to identify the DTS instance. Select based on your environment. |
| Configure ETL | Whether to enable extract, transform, and load (ETL). Yes: enter data processing statements in the code editor. See Configure ETL in a data migration or data synchronization task. No: ETL disabled. See What is ETL? |

Save the task and run a precheck.
To preview the API parameters for this configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
Click Next: Save Task Settings and Precheck.
DTS runs a precheck before starting the task. If the precheck fails, click View Details next to each failed item, fix the issues, and click Precheck Again. If an alert is triggered for an item that can be safely ignored, click Confirm Alert Details, then click Ignore in the dialog box, click OK, and click Precheck Again. Ignoring an alert may result in data inconsistency.
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the Purchase Instance page, configure billing and instance settings.
| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront for a fixed term. More cost-effective for long-term use. Pay-as-you-go: billed hourly. Suitable for short-term use. Release the instance when no longer needed to avoid unnecessary charges. |
| Number of Shards | The number of partitions in the destination topic. This cannot be changed after the instance is created. |
| Resource Group Settings | The resource group for the instance. Default: default resource group. See What is Resource Management? |
| Instance Class | The synchronization specification, which determines performance. See Instance classes of data synchronization instances. |
| Subscription Duration | Available only for the Subscription billing method. Valid values: 1–9 months, or 1, 2, 3, or 5 years. |

Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the confirmation dialog box.
The task appears in the task list. Start the data shipping SDK immediately after the instance is created.
Configure and start the SDK
After creating the instance, retrieve the connection parameters and configure the SDK.
Step 1: Add the SDK dependency
Open your project in an IDE (such as IntelliJ IDEA) and add the following dependency to pom.xml:
```xml
<dependency>
    <groupId>com.aliyun.dts.deliver</groupId>
    <artifactId>dts-deliver-client</artifactId>
    <version>1.0.0</version>
</dependency>
```

For the latest version, see the dts-deliver-client page.
Step 2: Download the sample code and retrieve connection parameters
Download the sample code from dts-deliver-test on GitHub. Use DtsDeliverTest.java in the dts-deliver-test folder as the starting point.
In FakeSource.java, the read method shows an example data source implementation. The name field in createField is the column name of the source table. Implement encoding based on your actual data source.
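The pattern that FakeSource.java demonstrates can be sketched as follows. This is a hypothetical illustration, not the real dts-deliver-client API: the Map-based field representation and the MySource, createField, and read names are stand-ins for the actual classes in the sample code. It only shows the encoding idea that each field's name must match a source column name.

```java
// Hypothetical sketch of a data source implementation in the spirit of
// FakeSource.java. The real SDK types differ; each field is modeled here as a
// plain Map so the encoding idea is visible: every field's "name" must match
// a column name of the source table.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MySource {

    // Builds one field; "name" is the source column name.
    static Map<String, Object> createField(String name, Object value) {
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("name", name);
        field.put("value", value);
        return field;
    }

    // Plays the role of FakeSource.read(): encode one source row into fields.
    static List<Map<String, Object>> read() {
        List<Map<String, Object>> row = new ArrayList<>();
        row.add(createField("id", 1));               // column "id" of tab1
        row.add(createField("payload", "log line")); // column "payload" of tab1
        return row;
    }

    public static void main(String[] args) {
        System.out.println(read().get(0).get("name")); // prints "id"
    }
}
```

In the real sample, the same mapping is what the Column Name value in the console configuration refers back to.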
To retrieve connection parameters, go to the Data Synchronization Tasks page, click the ID of your data shipping instance, and navigate to Basic Information in the left pane. The Shipping Channel Information section contains the following values.
| Parameter | Description | Where to find it |
|---|---|---|
| ip:port | Endpoint of the data shipping instance | In Shipping Channel Information, click Copy next to Public Endpoint or VPC Endpoint. Use VPC Endpoint only if the source database is in the same VPC as the data shipping instance. |
| ak | AccessKey ID of the account that owns the instance | See Create an AccessKey pair and View the information about AccessKey pairs of a RAM user. |
| secret | AccessKey secret | Same as above. |
| dts_job_id | Task ID of the data shipping instance (not the instance ID) | Call the API. From the response, find the Shipped Topic value and extract the substring between _vpc_ and _data_delivery_. For example, if Shipped Topic is cn_hangzhou_vpc_cxti86dc11z*_data_delivery_version2, the dts_job_id is cxti86dc11z*. |
| topic | Destination topic of the data shipping instance | In Shipping Channel Information, click Copy next to Shipped Topic. |
| partition | Number of shards in the destination topic | In Shipping Channel Information, view the shard count. |
| region | Region where the data shipping instance resides | In Shipping Channel Information, view Instance Region. |
| dbName | Source database name. Must match the schema name of the destination ADB for PostgreSQL instance. | In this example: dts_deliver_test. |
| tableName | Source table name. Must match the table name configured in Drop data object configuration. | In this example: tab1, tab2, and tab3. |
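As a worked example of the dts_job_id extraction described in the table, the sketch below pulls the substring between _vpc_ and _data_delivery_ out of a Shipped Topic value and gathers the channel parameters into a java.util.Properties object. The Properties keys simply mirror the table's parameter names; how the real dts-deliver-client consumes these values may differ, so treat this as an assumption-laden illustration.

```java
// Sketch: derive dts_job_id from the Shipped Topic value and collect the
// shipping-channel parameters. Only java.util and java.lang are used; the
// Properties object is an illustrative container, not the real SDK config.
import java.util.Properties;

public class ShippingConfig {

    // Extracts the dts_job_id: the substring between "_vpc_" and
    // "_data_delivery_" in the Shipped Topic value.
    static String extractJobId(String shippedTopic) {
        int start = shippedTopic.indexOf("_vpc_") + "_vpc_".length();
        int end = shippedTopic.indexOf("_data_delivery_", start);
        if (start < "_vpc_".length() || end < 0) {
            throw new IllegalArgumentException("Unexpected topic format: " + shippedTopic);
        }
        return shippedTopic.substring(start, end);
    }

    public static void main(String[] args) {
        String topic = "cn_hangzhou_vpc_cxti86dc11z*_data_delivery_version2";
        Properties props = new Properties();
        props.setProperty("topic", topic);
        props.setProperty("dts_job_id", extractJobId(topic));
        props.setProperty("dbName", "dts_deliver_test"); // must match the destination schema name
        System.out.println(props.getProperty("dts_job_id")); // prints "cxti86dc11z*"
    }
}
```

Running this with the example topic from the table yields cxti86dc11z* as the dts_job_id.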
Step 3: Start the SDK and set the offset
Note the current time, then start the data shipping SDK.
Update the current offset of the data shipping instance to the time when the SDK started. See Change the current offset of a data synchronization or migration instance.
By default, Current Offset is the time when the Incremental Write module started, not when the SDK started. Change it to the SDK start time to ensure data is collected from the correct point.
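The "note the current time" step above can be sketched as follows: record a timestamp immediately before launching the SDK so that you can later set Current Offset to it in the console. The startShipping() call is a hypothetical placeholder for your actual SDK start-up code.

```java
// Record the wall-clock time just before starting the data shipping SDK, so
// the instance's Current Offset can later be set to this value.
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class RecordStartTime {

    // Formats a start time in ISO-8601 with offset, e.g. 2024-01-02T03:04:05Z.
    static String formatStartTime(ZonedDateTime t) {
        return t.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        ZonedDateTime sdkStart = ZonedDateTime.now(ZoneOffset.UTC);
        System.out.println("SDK start time: " + formatStartTime(sdkStart));
        // startShipping();  // hypothetical placeholder: launch the configured SDK here
    }
}
```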
Step 4: Verify data flow
View the data synchronized to the destination database.