When you need to load data from a relational database such as MySQL into ApsaraDB for SelectDB, you can use the Data Integration feature of DataWorks with SelectDB Writer to run an offline batch synchronization job — without writing custom ingestion code.
This topic walks you through adding the required data sources in DataWorks and configuring a batch synchronization task that reads from MySQL and writes to ApsaraDB for SelectDB.
Limits
Data Integration only supports offline writes to ApsaraDB for SelectDB. Real-time streaming is not supported.
SelectDB Writer cannot write fields of the following data types: BITMAP, HyperLogLog (HLL), or QUANTILE_STATE.
Prerequisites
Before you begin, make sure you have:
An ApsaraDB for SelectDB instance
A MySQL database containing the data to synchronize
Access to the DataWorks console with permissions to create data synchronization tasks
Add data sources
Before configuring the synchronization task, add both a MySQL data source and an ApsaraDB for SelectDB data source to DataWorks.
Add a MySQL data source. For details, see MySQL data source.
Add an ApsaraDB for SelectDB data source. For details, see Add and manage data sources. Use the following parameters:
If MySQL and ApsaraDB for SelectDB are in the same virtual private cloud (VPC), use the VPC endpoint. If they are in different VPCs, use the Public endpoint.
ImportantAdd the IP addresses of your DataWorks resource groups to the IP address whitelist of ApsaraDB for SelectDB. Exclusive resource group IPs and shared resource group IPs must be added to separate whitelists. For details, see Configure an IP address whitelist.
Parameter Description Data Source Name The name of the data source. JDBC URL The Java Database Connectivity (JDBC) connection string. Format: jdbc:mysql://<ip>:<port>/<dbname>. Find the VPC endpoint and MySQL port in the Network information section on the Basic information tab of your instance. Example:jdbc:mysql://selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:9030/test_dbHTTP address The HTTP access address. Format: <ip>:<port>. Find the VPC endpoint and HTTP port in the Network information section. Example:selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:8080Username The username of the ApsaraDB for SelectDB instance owner account. Password The password of the ApsaraDB for SelectDB instance owner account.
Configure a batch synchronization task
DataWorks provides two ways to configure the synchronization task:
| Method | Best for |
|---|---|
| Codeless UI | Visual configuration without writing scripts |
| Code editor | Full control over the synchronization script (recommended for complex mappings) |
Configure a batch synchronization task by using the codeless UI
Configure a batch synchronization task by using the code editor
The following sections provide a complete code example and parameter reference for the code editor approach.
Sample code and parameters
Code example
The following script uses MySQL Reader and SelectDB Writer to synchronize data from a MySQL table to ApsaraDB for SelectDB.
{
"type": "job",
"version": "2.0",
"steps": [
{
"stepType": "mysql",
"parameter": {
"column": [
"<id>",
"<table_id>",
"<table_no>",
"<table_name>",
"<table_status>"
],
"connection": [
{
"datasource": "<mysql_datasource>",
"table": [
"<mysql_table_name>"
]
}
],
"where": "",
"splitPk": "",
"encoding": "UTF-8"
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "selectdb",
"parameter": {
"postSql": [],
"preSql": [],
"username": "<selectdb_username>",
"password": "<selectdb_password>",
"loadUrl": [
"<ip:port>"
],
"column": [
"<id>",
"<table_id>",
"<table_no>",
"<table_name>",
"<table_status>"
],
"connection": [
{
"datasource": "<selectdb_datasource>",
"table": [
"<selectdb_table_name>"
]
}
],
"maxBatchRows": 1000000,
"loadProps": {
"format": "csv",
"column_separator": "\\x01",
"line_delimiter": "\\x02"
}
},
"name": "Writer",
"category": "writer"
}
],
"setting": {
"errorLimit": {
"record": "0"
},
"speed": {
"throttle": false,
"concurrent": 1
}
},
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
}
}Replace the placeholders with your actual values:
| Placeholder | Description | Example |
|---|---|---|
<mysql_datasource> | Name of the MySQL data source added in DataWorks | my_mysql_source |
<mysql_table_name> | Source table name in MySQL | orders |
<selectdb_username> | ApsaraDB for SelectDB username | admin |
<selectdb_password> | ApsaraDB for SelectDB password | — |
<ip:port> | VPC endpoint and HTTP port of the SelectDB instance | selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:8080 |
<selectdb_datasource> | Name of the ApsaraDB for SelectDB data source added in DataWorks | my_selectdb_source |
<selectdb_table_name> | Destination table name in SelectDB | orders_replica |
SelectDB Writer parameters
datasource
The name of the data source configured in DataWorks. Must match the name used when adding the data source.
Required: Yes
table
The name of the destination table to write data into.
Required: Yes
column
The column names to write to in the destination table. Separate multiple column names with commas. To write to all columns, use ["*"].
Required: Yes
Example:
["id", "name", "age"]
loadUrl
The HTTP endpoint of the ApsaraDB for SelectDB instance. Use the VPC endpoint and HTTP port.
Required: Yes
Format:
ip:portExample:
selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:8080
username
The username for accessing the ApsaraDB for SelectDB database.
Required: Yes
password
The password for accessing the ApsaraDB for SelectDB database.
Required: Yes
preSql
SQL statements to run before the synchronization task starts — for example, to delete stale data. The codeless UI supports one statement; the code editor supports multiple.
Required: No
postSql
SQL statements to run after the synchronization task completes — for example, to add a timestamp. The codeless UI supports one statement; the code editor supports multiple.
Required: No
maxBatchRows
The maximum number of rows written per batch. maxBatchRows controls how frequently data is flushed to SelectDB — a higher value means fewer, larger batches.
Required: No
Default:
500000
loadProps
Parameters for the COPY INTO request that control the import format. If left blank or set to {}, JSON format is used by default.
Required: No
Default:
json
JSON format (default): The strip_outer_array parameter is automatically set to true.
"loadProps": {
"format": "json",
"strip_outer_array": true
}CSV format: Specify delimiters explicitly. If you do not specify the row and column delimiters, the imported data is converted into strings by default — \t is used as the column delimiter and \n as the row delimiter.
"loadProps": {
"format": "csv",
"column_separator": "\\x01",
"line_delimiter": "\\x02"
}Supported data types
SelectDB Writer supports the following ApsaraDB for SelectDB data types for offline writes.
| Data type | Supported |
|---|---|
| INT | Yes |
| BIGINT | Yes |
| LARGEINT | Yes |
| SMALLINT | Yes |
| TINYINT | Yes |
| BOOLEAN | Yes |
| DECIMAL | Yes |
| DOUBLE | Yes |
| FLOAT | Yes |
| CHAR | Yes |
| VARCHAR | Yes |
| STRING | Yes |
| DATE | Yes |
| DATEV2 | Yes |
| DATETIME | Yes |
| DATETIMEV2 | Yes |
| ARRAY | Yes |
| JSONB | Yes |
| BITMAP | No |
| HLL | No |
| QUANTILE_STATE | No |