Use Data Transmission Service (DTS) to synchronize data from a self-managed Oracle database to a MaxCompute project. DTS supports schema synchronization, full data synchronization, and incremental data synchronization (change data capture via redo and archived logs).
How it works
DTS synchronizes data in three sequential phases:
1. Schema synchronization — DTS creates the destination tables in MaxCompute. Each table name gets a `_base` suffix. For example, if the source table is `customer`, DTS creates `customer_base` in MaxCompute.
2. Full data synchronization — DTS copies all existing rows from the source Oracle database to the corresponding `_base` tables in MaxCompute.
3. Incremental data synchronization — DTS reads Oracle redo and archived logs and writes change records to a separate incremental data table in MaxCompute. The incremental data table name gets a `_log` suffix. For example, changes to `customer` are written to `customer_log`.
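Because the `_base` table holds the initial copy and the `_log` table holds every change event, a consolidated current view can be built by taking the latest image per primary key. The following MaxCompute SQL is a sketch only: it assumes a source table `customer` with primary key `id` and a column `name` (both illustrative), the default `_base`/`_log` naming, and the DTS metadata columns described in the Incremental data table schema section below.

```sql
-- Sketch: reconstruct the current state of customer from its base and log tables.
SET odps.sql.allow.fullscan=true;

SELECT id, name
FROM (
    SELECT t.id, t.name, t.operation_flag,
           -- latest entry per key wins; base rows use record_id 0 as a sentinel
           ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY t.record_id DESC) AS rn
    FROM (
        SELECT id, name, 'I' AS operation_flag, 0 AS record_id
        FROM customer_base                                -- initial full copy
        UNION ALL
        SELECT id, name, operation_flag, record_id
        FROM customer_log
        WHERE after_flag = 'Y' OR operation_flag = 'D'    -- post-images and deletes
    ) t
) m
WHERE rn = 1 AND operation_flag <> 'D';                   -- drop rows deleted at the source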
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. For more information, see Billing overview. |
Supported SQL operations
| Operation type | SQL statements |
|---|---|
| DML (data manipulation language) | INSERT, UPDATE, DELETE |
| DDL (data definition language) | ADD COLUMN (attribute columns not supported) |
DTS does not synchronize foreign keys. Cascade and delete operations on the source database are not replicated to the destination.
Prerequisites
Before you begin, make sure you have:
Activated MaxCompute and created a MaxCompute project. See Activate MaxCompute and DataWorks and Create a MaxCompute project.
Configured an IP address whitelist on the MaxCompute client to allow DTS access. See Configure an IP address whitelist to allow access from Alibaba Cloud services to MaxCompute.
Created an AccessKey pair for the Alibaba Cloud account that owns the destination MaxCompute project. See Create an AccessKey pair.
Note: Alternatively, create a RAM user and set it as the super administrator for the MaxCompute project.
Configured the source Oracle database:
Running in ARCHIVELOG mode with accessible archived log files and an appropriate retention period. See Managing archived redo log files.
Supplemental logging enabled, with `SUPPLEMENTAL_LOG_DATA_PK` and `SUPPLEMENTAL_LOG_DATA_UI` set to Yes. See Supplemental logging.
A database account with the required permissions. See Prepare an Oracle database.
Reviewed the capabilities and limits of DTS for Oracle synchronization, and (optionally) evaluated migration feasibility with Advanced Database & Application Migration (ADAM). See Overview.
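The Oracle-side prerequisites above can be verified directly on the source database. The statements below are a sketch, run as a suitably privileged user; exact behavior can vary by Oracle version and edition.

```sql
-- Confirm the database runs in ARCHIVELOG mode (should return ARCHIVELOG).
SELECT log_mode FROM v$database;

-- Check the supplemental logging settings DTS requires (both should be YES).
SELECT supplemental_log_data_pk, supplemental_log_data_ui FROM v$database;

-- Enable primary-key and unique-key supplemental logging if either is NO.
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY, UNIQUE) COLUMNS;
```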
Limitations
Source database requirements
Tables must have a primary key or unique key, with all fields unique. Without this, duplicate rows may appear in the destination.
Oracle 12c and later: table names cannot exceed 30 bytes.
If you rename tables or columns at the destination (Selected Objects > right-click), a single task supports up to 1,000 tables. For larger workloads, split tables across multiple tasks or synchronize the entire database without renaming.
Oracle Real Application Cluster (RAC): use a virtual IP address (VIP), not a Single Client Access Name (SCAN) IP address. After you specify the VIP, node failover is not supported.
If the Oracle RAC database connects over Express Connect, specify a VIP when configuring the task.
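Tables that violate the primary-key/unique-key requirement can be found before configuring the task. This query is a sketch; `APP_USER` is a placeholder schema name to replace with your own.

```sql
-- Sketch: list tables in one schema that have neither a primary key nor a
-- unique constraint, and so may produce duplicate rows in the destination.
SELECT t.owner, t.table_name
FROM all_tables t
WHERE t.owner = 'APP_USER'
  AND NOT EXISTS (
        SELECT 1
        FROM all_constraints c
        WHERE c.owner = t.owner
          AND c.table_name = t.table_name
          AND c.constraint_type IN ('P', 'U')
      );
```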
Log retention requirements
The redo logs and archived logs must meet the following minimum retention periods:
| Synchronization mode | Minimum retention period |
|---|---|
| Incremental data synchronization only | 24 hours |
| Full data synchronization + incremental data synchronization | 7 days |
If logs are purged before DTS can read them, the task fails. In edge cases, data inconsistency or loss may occur. After full data synchronization completes, you can reduce the retention period to 24 hours.
Retention periods below these minimums may cause DTS to miss log data, which can compromise the Service Level Agreement (SLA) guarantees.
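A quick way to see how far back the still-available archived logs reach is to query `v$archived_log` on the source; the window between the oldest undeleted log and the current time should cover the minimums above. A sketch:

```sql
-- Sketch: the oldest archived log still on disk versus the current time.
SELECT MIN(first_time) AS oldest_available_log, SYSDATE AS current_time
FROM v$archived_log
WHERE deleted = 'NO';
```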
Other limitations
Do not perform a primary/secondary switchover while the task is running — the task will fail.
Empty `VARCHAR2` strings (treated as NULL by Oracle) cause failures if the corresponding destination column has a `NOT NULL` constraint.
Tables with the Fine-Grained Audit (FGA) policy enabled: DTS cannot read the `ORA_ROWSCN` pseudocolumn and the task fails. Disable the FGA policy on affected tables, or exclude them from synchronization.
Do not update `LONGTEXT` fields while synchronization is running.
Do not run DDL operations during schema synchronization or initial full data synchronization.
Do not use Oracle Data Pump to write to the source database during incremental data synchronization — data loss may occur.
External tables cannot be synchronized.
MaxCompute does not support primary key constraints. If network errors occur during synchronization, DTS may write duplicate rows to MaxCompute.
Write to the destination MaxCompute project only through DTS to prevent data inconsistency.
If DTS needs to recover a failed instance, the process may take up to 8 hours and may involve restarting the instance or adjusting DTS instance parameters (database parameters are not modified). For the parameters that may be changed, see Modify instance parameters.
Performance considerations
Synchronize data during off-peak hours. Full data synchronization consumes read and write resources on both the source and destination databases.
After full data synchronization, the destination tablespace may be larger than the source due to fragmentation from concurrent INSERT operations.
Synchronization latency is calculated from the timestamp of the latest synchronized record in the destination versus the current timestamp in the source. If no DML operations run on the source for an extended period, the latency reading may be inaccurate. Run a DML operation on the source to refresh it, or enable a heartbeat table if you synchronize an entire database.
Required database account permissions
| Database | Required permissions | Reference |
|---|---|---|
| Self-managed Oracle database | Fine-grained permissions | Prepare a database account, CREATE USER, GRANT |
To synchronize incremental data, archive logging and supplemental logging must be enabled on the Oracle database. See Configure an Oracle database.
Create a data synchronization task
Step 1: Open the Data Synchronization page
Use one of the following consoles to start:
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the data synchronization task will run.
DMS console
The exact navigation path may vary based on the DMS console mode. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
From the drop-down list next to Data Synchronization Tasks, select the region where the task will run.
Step 2: Configure source and destination databases
Click Create Task to open the task configuration page. Configure the parameters as follows:
| Section | Parameter | Description |
|---|---|---|
| N/A | Task Name | A name for the DTS task. DTS generates a name automatically. Specify a descriptive name for easy identification. Task names do not need to be unique. |
| Source Database | Select Existing Connection | Select a registered database instance from the drop-down list to auto-populate the connection parameters. If the instance is not registered, configure the parameters manually. |
| Database Type | Select Oracle. | |
| Access Method | Select the method that matches where the Oracle database is deployed. This example uses Self-managed Database on ECS. For other access methods, see Preparation overview. | |
| Instance Region | The region where the source Oracle database resides. | |
| ECS Instance ID | The ID of the ECS instance running the source Oracle database. | |
| Port Number | The Oracle service port. Default: 1521. | |
| Oracle Type | Non-RAC Instance: requires the SID (system ID). RAC or PDB Instance: requires the Service Name. This example uses RAC or PDB Instance with a Service Name. | |
| Database Account | The Oracle account. For required permissions, see Required database account permissions. | |
| Database Password | The password for the Oracle account. | |
| Destination Database | Select Existing Connection | Select a registered database instance from the drop-down list to auto-populate the connection parameters. If the instance is not registered, configure the parameters manually. |
| Database Type | Select MaxCompute. | |
| Access Method | Select Alibaba Cloud Instance. | |
| Instance Region | The region where the destination MaxCompute project resides. | |
| Project | The name of the destination MaxCompute project. | |
| AccessKey ID of Alibaba Cloud Account | The AccessKey ID from the AccessKey pair you prepared in the Prerequisites section. | |
| AccessKey Secret of Alibaba Cloud Account | The AccessKey secret from the AccessKey pair you prepared in the Prerequisites section. |
Step 3: Test connectivity
At the bottom of the page, click Test Connectivity and Proceed. In the CIDR Blocks of DTS Servers dialog box, click Test Connectivity.
The CIDR blocks of DTS servers must be added to the security settings of both the source and destination databases. This can be done automatically or manually. See Add DTS server IP addresses to a whitelist.
Click OK to grant permissions to your MaxCompute account, then click Test Connectivity and Proceed.
Step 4: Configure objects to synchronize
In the Configure Objects step, set the following parameters:
| Parameter | Description |
|---|---|
| Synchronization Types | Incremental Data Synchronization is selected by default. Also select Schema Synchronization and Full Data Synchronization to run all three phases. |
| Naming Rules of Additional Columns | DTS adds extra columns to destination tables. If an additional column name conflicts with an existing column, the task fails and data loss can occur. Select New Rule or Previous Rule based on your requirements. Check for naming conflicts before setting this parameter. See Naming rules for additional columns. |
| Partition Definition of Incremental Data Table | Select partition names based on your requirements. See Partition. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors (default): checks for identical table names in source and destination. The precheck fails if conflicts exist. To resolve conflicts, use the object name mapping feature to rename destination tables. See Map object names. Ignore Errors and Proceed: skips the conflict check. During full synchronization, existing destination records are kept; during incremental synchronization, existing records are overwritten. If schemas differ, initialization may fail or only some columns may be synchronized. |
| Capitalization of Object Names in Destination Instance | Controls the case of database, table, and column names in the destination. Default: DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select one or more tables and click the rightwards arrow icon to add them to the Selected Objects section. |
| Selected Objects | To rename a single object: right-click it and specify a new name. See Map the name of a single object. To rename multiple objects: click Batch Edit. See Map multiple object names at a time. To filter rows: right-click the table and set filter conditions. See Specify filter conditions. Renaming an object may cause other objects that depend on it to fail synchronization. |
Step 5: Configure advanced settings
Click Next: Advanced Settings and configure the following:
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules tasks to a shared cluster. Purchase a dedicated cluster for improved stability. See What is a DTS dedicated cluster. |
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Valid values: 10–1440 minutes. Default: 720 minutes. Set this to at least 30 minutes. If multiple tasks share the same source or destination database, the shortest retry time applies. DTS charges for the instance during retries. |
| Retry Time for Other Issues | How long DTS retries after a DDL or DML failure. Valid values: 1–1440 minutes. Default: 10 minutes. Set this to a value greater than 10 minutes, and to a value smaller than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Throttle the full synchronization to reduce load on the database servers. Configure QPS (queries per second) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Throttle the incremental synchronization by configuring RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Actual Write Code | The encoding format for data written to the destination. Select based on your requirements. |
| Environment Tag | An optional tag to identify the DTS instance. |
| Configure ETL | Enables the extract, transform, and load (ETL) feature. Select Yes to enter data processing statements in the code editor. See Configure ETL in a data migration or data synchronization task. Select No to skip. See also What is ETL? |
| Monitoring and Alerting | Configure alerts for task failures or high synchronization latency. Select Yes and set the alert threshold and notification contacts. See Configure monitoring and alerting when you create a DTS task. |
Step 6: Run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this task configuration, hover over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters.
DTS runs a precheck before the task can start. If the precheck fails:
Click View Details next to each failed item, resolve the issue, and click Precheck Again.
For alert items that can be ignored: click Confirm Alert Details, then Ignore in the dialog, then OK, then Precheck Again. Ignored alerts may result in data inconsistency.
Step 7: Purchase a data synchronization instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the buy page, configure the following:
| Section | Parameter | Description |
|---|---|---|
| New Instance Class | Billing Method | Subscription: pay upfront for a fixed term — more cost-effective for long-term use. Pay-as-you-go: billed hourly — suitable for short-term use. Release the instance when no longer needed to stop charges. |
| | Resource Group Settings | The resource group for the instance. Default: default resource group. See What is Resource Management? |
| | Instance Class | The synchronization throughput class. See Instance classes of data synchronization instances. |
| | Subscription Duration | Available for the Subscription billing method. Options: 1–9 months, 1 year, 2 years, 3 years, or 5 years. |

Read and accept Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the confirmation dialog.
The task appears in the task list. Monitor its progress from the list.
Incremental data table schema
Run `set odps.sql.allow.fullscan=true;` in MaxCompute to allow full table scans on the project.
DTS writes incremental change records to the _log table in MaxCompute. Each row in the incremental data table represents one change event and includes the following metadata fields in addition to the source table columns:
| Field | Description |
|---|---|
| `record_id` | Unique identifier for the log entry. Auto-increments for each new entry. For an UPDATE operation, DTS generates two entries (pre-update and post-update values) with the same `record_id`. |
| `operation_flag` | The operation type: I (INSERT), D (DELETE), or U (UPDATE). |
| `utc_timestamp` | The operation timestamp in UTC, taken from the redo log. |
| `before_flag` | Whether the row contains pre-update values: Y or N. |
| `after_flag` | Whether the row contains post-update values: Y or N. |
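These metadata fields make it possible to filter the change stream directly. The query below is a sketch: `customer_log` and the `id` column are illustrative names, while the metadata columns are the DTS fields described above.

```sql
-- Sketch: fetch only post-images (and deletes) for one key, in change order.
SET odps.sql.allow.fullscan=true;

SELECT record_id, operation_flag, utc_timestamp, id
FROM customer_log
WHERE id = 1001
  AND (after_flag = 'Y' OR operation_flag = 'D')
ORDER BY record_id;
```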