The AnalyticDB for MySQL 3.0 data source lets you read data from and write data to AnalyticDB for MySQL 3.0. This topic describes the data synchronization capabilities that DataWorks provides for AnalyticDB for MySQL 3.0.
Limitations
You cannot configure or run synchronization tasks for a Data Lakehouse Edition data source on a shared resource group.
If you switch an AnalyticDB for MySQL instance from Data Warehouse Edition to Data Lakehouse Edition, synchronization tasks that use this data source and run on a shared resource group will fail. Before you switch the instance edition, check for such tasks and reconfigure them to run on a Serverless resource group (recommended) or an exclusive resource group for Data Integration.
Batch synchronization supports reading from views.
Supported data types
Batch read
The AnalyticDB for MySQL 3.0 Reader supports the following data types.
| Type | AnalyticDB for MySQL 3.0 data type |
| --- | --- |
| Integer types | INT, INTEGER, TINYINT, SMALLINT, and BIGINT |
| Floating-point types | FLOAT, DOUBLE, and DECIMAL |
| String types | VARCHAR |
| Date and time types | DATE, DATETIME, TIMESTAMP, and TIME |
| Boolean types | BOOLEAN |
Batch write
The AnalyticDB for MySQL 3.0 Writer supports the following data types.
| Type | AnalyticDB for MySQL 3.0 data type |
| --- | --- |
| Integer types | INT, INTEGER, TINYINT, SMALLINT, and BIGINT |
| Floating-point types | FLOAT, DOUBLE, and DECIMAL |
| String types | VARCHAR |
| Date and time types | DATE, DATETIME, TIMESTAMP, and TIME |
| Boolean types | BOOLEAN |
Add a data source
Before you develop a synchronization task in DataWorks, add the required data source by following the instructions in Data source management. When you add the data source, you can refer to the parameter descriptions in the DataWorks console.
Develop a data synchronization task
For information about how to open and complete the configuration of a synchronization task, see the following configuration guides.
Configure a single-table batch synchronization task
For instructions, see Configure in codeless UI and Configure in code editor.
For a list of all parameters and a sample script for the code editor, see Appendix: Script examples and parameters.
Configure a single-table real-time synchronization task
For instructions, see Configure a single-table real-time synchronization task.
Configure full-database batch synchronization
For instructions, see Configure a full-database batch synchronization task.
Configure full-database real-time synchronization
For instructions, see Configure a full-database real-time synchronization task.
Appendix: Script examples and parameters
Configure a batch synchronization task by using the code editor
To configure a batch synchronization task in the code editor, write the script based on the unified script format requirements. For more information, see Use the code editor. The following sections describe the data source parameters that you must configure in the script.
Reader script example
```json
{
    "type": "job",
    "steps": [
        {
            "stepType": "analyticdb_for_mysql", // The plugin name.
            "parameter": {
                "column": [ // The columns to read.
                    "id",
                    "value",
                    "table"
                ],
                "connection": [
                    {
                        "datasource": "xxx", // The data source name.
                        "table": [ // The table name.
                            "xxx"
                        ]
                    }
                ],
                "where": "", // The filter condition.
                "splitPk": "", // The split key.
                "encoding": "UTF-8" // The encoding format.
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0" // The maximum number of error records allowed.
        },
        "speed": {
            "throttle": true, // A value of true enables throttling and enforces the mbps limit. A value of false disables throttling.
            "concurrent": 1, // The concurrency level for the task.
            "mbps": "12" // The rate limit in MB/s.
        }
    }
}
```
Reader parameters
| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| datasource | The name of the data source that you added in DataWorks. | Yes | None |
| table | The name of the source table. | Yes | None |
| column | The columns to synchronize from the source table, specified as an array. | Yes | None |
| splitPk | The field used to shard the data. If you specify splitPk, AnalyticDB for MySQL 3.0 Reader shards the data based on this field and reads it by using concurrent tasks, which improves efficiency. | No | None |
| where | A filter condition. For example, you can specify a condition on a date column so that only the data generated on the current day is synchronized. | No | None |
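As a sketch, the reader parameters above can be combined in the reader's parameter block as follows. The data source, table, and column names are hypothetical, and the `${bizdate}` scheduling variable is assumed to be defined for the task:

```json
"parameter": {
    "column": ["id", "gmt_create", "value"],  // Hypothetical source columns.
    "connection": [
        {
            "datasource": "my_adb_source",    // Hypothetical data source name.
            "table": ["orders"]
        }
    ],
    "where": "gmt_create >= '${bizdate}'",    // Read only the current day's data.
    "splitPk": "id",                          // Shard by this key for concurrent reads.
    "encoding": "UTF-8"
}
```

Choosing a splitPk with evenly distributed values, such as a numeric primary key, helps the concurrent shards carry a similar amount of data.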
Writer script example
```json
{
    "type": "job",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "analyticdb_for_mysql", // The plugin name.
            "parameter": {
                "postSql": [], // SQL statements to execute after the import.
                "tableType": null, // Reserved field. The default value is null.
                "datasource": "hangzhou_ads", // The data source name.
                "column": [ // The columns to synchronize.
                    "id",
                    "value"
                ],
                "guid": null,
                "writeMode": "insert", // The write mode. See the description of the writeMode parameter.
                "batchSize": 2048, // The number of records in each batch write. See the description of the batchSize parameter.
                "encoding": "UTF-8", // The encoding format.
                "table": "t5", // The destination table name.
                "preSql": [] // SQL statements to execute before the import.
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0", // The version of the configuration file format.
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0" // The maximum number of error records allowed.
        },
        "speed": {
            "throttle": true, // A value of true enables throttling and enforces the mbps limit. A value of false disables throttling.
            "concurrent": 2, // The concurrency level for the task.
            "mbps": "12" // The rate limit in MB/s.
        }
    }
}
```
Writer parameters
| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| datasource | The name of the data source that you added in DataWorks. | Yes | None |
| table | The name of the destination table. | Yes | None |
| writeMode | The write mode. Valid values: insert, replace, and update. | No | insert |
| column | The columns in the destination table to which data is written. Separate column names with commas. Note: If a column name contains a reserved keyword such as select, enclose the name in backticks. For example, specify item_select_no as \`item_select_no\`. | Yes | None |
| preSql | One or more SQL statements to execute before the write task starts. The codeless UI supports a single statement, and the code editor supports multiple statements. Note: Multiple SQL statements are not executed in a transaction. | No | None |
| postSql | One or more SQL statements to execute after the write task finishes. The codeless UI supports a single statement, and the code editor supports multiple statements. Note: Multiple SQL statements are not executed in a transaction. | No | None |
| batchSize | The number of records to write in each batch. A larger value can improve throughput but also increases memory consumption, which may cause out-of-memory (OOM) errors. | No | 1,024 |
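For example, a writer parameter block that clears the destination table before the import and writes in replace mode might look like the following sketch. The data source, table, and column names are hypothetical:

```json
"parameter": {
    "datasource": "my_adb_target",             // Hypothetical data source name.
    "table": "orders_copy",                    // Hypothetical destination table.
    "column": ["id", "value"],
    "preSql": ["TRUNCATE TABLE orders_copy"],  // Runs once before the import starts.
    "postSql": [],                             // Nothing to run after the import.
    "writeMode": "replace",                    // Replace conflicting rows instead of reporting them.
    "batchSize": 2048,                         // Larger batches raise throughput and memory use.
    "encoding": "UTF-8"
}
```

Because multiple preSql or postSql statements are not executed in a transaction, keep each statement independently safe to rerun in case the task is retried.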