All Products
Search
Document Center

DataWorks:DRDS (PolarDB-X) data source

Last Updated:Mar 20, 2024

DataWorks provides DRDS (PolarDB-X) Reader and DRDS (PolarDB-X) Writer for you to read data from and write data to DRDS (PolarDB-X) data sources. This topic describes the capabilities of synchronizing data from or to DRDS (PolarDB-X) data sources.

Supported DRDS (PolarDB-X) versions

Real-time data read

Non-read-only instances of DRDS (PolarDB-X 1.0) and PolarDB-X 2.0 are supported. For more information about how to create a DRDS (PolarDB-X) instance, see Create a PolarDB-X 1.0 instance or Create a PolarDB-X 2.0 instance.

Limits

Batch data read and write

  • DRDS (PolarDB-X) Reader supports only MySQL engines. DRDS (PolarDB-X) is a distributed MySQL database service. Most communication protocols that are used by DRDS (PolarDB-X) are the same as the communication protocols that are used by MySQL.

  • Only exclusive resource groups for Data Integration can be used to read data from DRDS (PolarDB-X) instances that run MySQL 8.0.

  • DRDS (PolarDB-X) Writer connects to the proxy of a remote DRDS (PolarDB-X) database by using Java Database Connectivity (JDBC) and executes the REPLACE INTO statement to write data to the DRDS (PolarDB-X) database.

    To execute the REPLACE INTO statement, you must make sure that your table has the primary key or a unique index to prevent duplicate data.

  • DRDS (PolarDB-X) Writer obtains data from a reader and executes the REPLACE INTO statement to write the data to the destination database. If no primary key conflict or unique index conflict occurs, data is processed in the same way as that when you execute the INSERT INTO statement. If a conflict occurs, data in conflicting rows in the destination table is replaced by new data. DRDS (PolarDB-X) Writer sends data to the DRDS (PolarDB-X) proxy when the amount of buffered data reaches a specific threshold. The proxy determines whether to write the data to one or more tables and how to route the data when the data is written to multiple tables.

    Note

    A synchronization task that uses DRDS (PolarDB-X) Writer must have at least the permissions to execute the REPLACE INTO statement. Whether other permissions are required depends on the SQL statements that you specify in the preSql and postSql parameters when you configure the task.

  • Data of views can be read.

Real-time data read

  • Only the data sources that are added in DataWorks in Alibaba Cloud instance mode are supported. If you add a data source in connection string mode and use the data source in a data synchronization task, the task fails.

  • The storage type must be PolarDB for MySQL or ApsaraDB RDS (excluding ApsaraDB RDS for MySQL). ApsaraDB RDS can be used only for existing DRDS (PolarDB-X) instances and cannot be used for newly purchased DRDS (PolarDB-X) instances.

  • Real-time synchronization of data on which XA ROLLBACK statements are executed is not supported.

    For transaction data on which XA PREPARE statements are executed, you can use the real-time synchronization feature to synchronize the data to a destination. If XA ROLLBACK statements are executed later on the data, the rollback changes to the data cannot be synchronized to the destination. If the tables that you want to synchronize include tables on which XA ROLLBACK statements are executed, you must remove the tables on which XA ROLLBACK statements are executed from the real-time synchronization task and add the removed tables back to synchronize data.

Data type mappings

Most DRDS (PolarDB-X) data types are supported. Make sure that the data types of your database are supported.

The following table lists the data type mappings based on which DRDS (PolarDB-X) Reader or DRDS (PolarDB-X) Writer converts data types.

Category

DRDS (PolarDB-X) data type

Integer

INT, TINYINT, SMALLINT, MEDIUMINT, and BIGINT

Floating point

FLOAT, DOUBLE, and DECIMAL

String

VARCHAR, CHAR, TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT

Date and time

DATE, DATETIME, TIMESTAMP, TIME, and YEAR

Boolean

BIT and BOOLEAN

Binary

TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, and VARBINARY

Preparations before data synchronization

You need to create an account that is used to log on to the DRDS (PolarDB-X) databases for subsequent operations and grant the required permissions to the account. For more information, see Create an account.

Develop a data synchronization task

For information about the entry point for and the procedure of configuring a data synchronization task, see the following sections. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.

Add a data source

Before you configure a data synchronization task to synchronize data from or to a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.

Configure a batch synchronization task to synchronize data of a single table

Configure synchronization settings to implement batch synchronization of all data in a database or real-time synchronization of full and incremental data in a single table or a database

For more information about the configuration procedure, see Configure a synchronization task in Data Integration.

Additional information

  • Consistent views

    DRDS (PolarDB-X) cannot provide a consistent view for multiple tables in multiple databases because it is a distributed database service. DRDS (PolarDB-X) Reader obtains different snapshots from different table shards, but cannot obtain the snapshot of table shards at the same time slice. As a result, DRDS (PolarDB-X) Reader cannot ensure strong consistency for data queries.

  • Character encoding

    DRDS (PolarDB-X) supports flexible encoding configurations. You can specify the encoding format for an entire instance and specific fields, tables, and databases. The configurations at the field, table, database, and instance levels are prioritized in descending order. We recommend that you use UTF-8 for a database.

    DRDS (PolarDB-X) Reader uses JDBC to read data. This enables DRDS (PolarDB-X) Reader to automatically convert the encoding formats of characters. Therefore, you do not need to specify the encoding format.

    If you specify an encoding format for a DRDS (PolarDB-X) database but data is written to the DRDS (PolarDB-X) database in a different encoding format, DRDS (PolarDB-X) Reader cannot recognize this inconsistency and may export garbled characters.

  • Incremental data synchronization

    DRDS (PolarDB-X) Reader uses JDBC to connect to a database and uses a SELECT statement with a WHERE clause to read incremental data.

    • For batch data, incremental add, update, and delete operations (including logically delete operations) are distinguished by timestamps. Specify the WHERE clause based on a specific timestamp. The time indicated by the timestamp must be later than the time indicated by the latest timestamp in the previous synchronization.

    • For streaming data, specify the WHERE clause based on the ID of a specific record. The ID must be greater than the maximum ID involved in the previous synchronization.

    If the data that is added or modified cannot be distinguished, DRDS (PolarDB-X) Reader can read only full data.

  • You cannot configure filter conditions for physical tables in the WHERE clause.

Appendix: Code and parameters

Appendix: Configure a batch synchronization task by using the code editor

If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader and writer of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader and writer in the code editor.

Code for DRDS (PolarDB-X) Reader

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"drds",// The plug-in name.
            "parameter":{
                "datasource":"",// The name of the data source.
                "column":[// The names of the columns.
                    "id",
                    "name"
                ],
                "where":"",// The WHERE clause.
                "table":"",// The name of the table.
                "splitPk": ""// The shard key.
            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"stream",// The plug-in name.
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1,// The maximum number of parallel threads.
            "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Parameters in code for DRDS (PolarDB-X) Reader

Parameter

Description

Required

Default value

datasource

The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor.

Yes

No default value

table

The name of the table from which you want to read data.

Yes

No default value

column

The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is [*], which indicates all the columns in the source table.

  • You can select specific columns to read.

  • The column order can be changed. This indicates that you can specify columns in an order different from the order specified by the schema of the source table.

  • Constants are supported. The column names must be arranged in compliance with the SQL syntax supported by MySQL, such as ["id", "`table`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3", "true"]. Description of the parameters in the preceding example:

    • id: a column name.

    • table: the name of a column that contains reserved keywords.

    • 1: an integer constant.

    • bazhen.csy: a string constant.

    • null: a null pointer.

    • to_char(a + 1): a function expression that is used to calculate the length of a string.

    • 2.3: a floating-point constant.

    • true: a Boolean value.

  • The column parameter must explicitly specify all the columns from which you want to read data. The parameter cannot be left empty.

Yes

No default value

where

The WHERE clause. DRDS (PolarDB-X) Reader generates an SQL statement based on the settings of the column, table, and where parameters, and uses the generated statement to read data.

  • You can use the WHERE clause to read incremental data.

  • If the where parameter is not configured or is left empty, DRDS (PolarDB-X) Reader reads full data.

For example, you can set this parameter to STRTODATE('${bdp.system.bizdate}','%Y%m%d') <= today AND today < DATEADD(STRTODATE('${bdp.system.bizdate}', '%Y%m%d'), interval 1 day) to read the data that is generated on the current day.

No

No default value

Code for DRDS (PolarDB-X) Writer

{
    "type":"job",
    "version":"2.0",// The version number. 
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
                },
        {
            "stepType":"drds",// The plug-in name. 
            "parameter":{
                "postSql":[],// The SQL statement that you want to execute after the synchronization task is run. 
                "datasource":"",// The name of the data source. 
                "column":[// The names of the columns. 
                "id"
                ],
                "writeMode":"insert ignore",
                "batchSize":"1024",// The number of data records to write at a time. 
                "table":"test",// The name of the table. 
                "preSql":[]// The SQL statement that you want to execute before the synchronization task is run. 
                },
            "name":"Writer",
            "category":"writer"
                }
                ],
    "setting":{
        "errorLimit":{
        "record":"0"// The maximum number of dirty data records allowed. 
            },
        "speed":{
            "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
                }
            },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
                }
            ]
        }
    }

Parameters in code for DRDS (PolarDB-X) Writer

Parameter

Description

Required

Default value

datasource

The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor.

Yes

No default value

table

The name of the table to which you want to write data.

Yes

No default value

writeMode

The write mode. Valid values:

  • insert ignore: If a primary key conflict or unique index conflict occurs, the source data cannot be written.

  • replace into: If a primary key conflict or unique index conflict occurs, the original data is deleted, and new data is inserted.

No

insert ignore

column

The names of the columns to which you want to write data. Separate the names with commas (,), such as "column": ["id","name","age"]. If you want to write data to all the columns in the destination table, set this parameter to an asterisk (*), such as "column": ["*"].

Yes

No default value

preSql

The SQL statement that you want to execute before the synchronization task is run. You can execute only one SQL statement on the codeless UI and multiple SQL statements in the code editor.

For example, you can set this parameter to delete * from table xxx; to delete data from the xxx table before data synchronization. You can configure this parameter based on your business requirements.

No

No default value

postSql

The SQL statement that you want to execute after the synchronization task is run. You can execute only one SQL statement on the codeless UI and multiple SQL statements in the code editor.

For example, you can set this parameter to delete * from table xxx where xx=xx; to delete specific data from the xxx table after data synchronization. You can configure this parameter based on your business requirements.

No

No default value

batchSize

The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. This greatly reduces the interactions between Data Integration and DRDS (PolarDB-X) and increases throughput. If you set this parameter to an excessively large value, an out of memory (OOM) error may occur during data synchronization.

No

1,024