Import data from MySQL to ApsaraDB for SelectDB using DataWorks - ApsaraDB for SelectDB

When you need to load data from a relational database such as MySQL into ApsaraDB for SelectDB, you can use the Data Integration feature of DataWorks with SelectDB Writer to run an offline batch synchronization job — without writing custom ingestion code.

This topic walks you through adding the required data sources in DataWorks and configuring a batch synchronization task that reads from MySQL and writes to ApsaraDB for SelectDB.

Limits

Data Integration only supports offline writes to ApsaraDB for SelectDB. Real-time streaming is not supported.
SelectDB Writer cannot write fields of the following data types: BITMAP, HyperLogLog (HLL), or QUANTILE_STATE.

Prerequisites

Before you begin, make sure you have:

An ApsaraDB for SelectDB instance
A MySQL database containing the data to synchronize
Access to the DataWorks console with permissions to create data synchronization tasks

Add data sources

Before configuring the synchronization task, add both a MySQL data source and an ApsaraDB for SelectDB data source to DataWorks.

Add a MySQL data source. For details, see MySQL data source.

Add an ApsaraDB for SelectDB data source. For details, see Add and manage data sources. Use the following parameters:

If MySQL and ApsaraDB for SelectDB are in the same virtual private cloud (VPC), use the VPC endpoint. If they are in different VPCs, use the Public endpoint.

Important

Add the IP addresses of your DataWorks resource groups to the IP address whitelist of ApsaraDB for SelectDB. Exclusive resource group IPs and shared resource group IPs must be added to separate whitelists. For details, see Configure an IP address whitelist.

Parameter	Description
Data Source Name	The name of the data source.
JDBC URL	The Java Database Connectivity (JDBC) connection string. Format: `jdbc:mysql://<ip>:<port>/<dbname>`. Find the VPC endpoint and MySQL port in the Network information section on the Basic information tab of your instance. Example: `jdbc:mysql://selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:9030/test_db`
HTTP address	The HTTP access address. Format: `<ip>:<port>`. Find the VPC endpoint and HTTP port in the Network information section. Example: `selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:8080`
Username	The username of the ApsaraDB for SelectDB instance owner account.
Password	The password of the ApsaraDB for SelectDB instance owner account.

Configure a batch synchronization task

DataWorks provides two ways to configure the synchronization task:

Method	Best for
Codeless UI	Visual configuration without writing scripts
Code editor	Full control over the synchronization script (recommended for complex mappings)

The following sections provide a complete code example and parameter reference for the code editor approach.

Sample code and parameters

Code example

The following script uses MySQL Reader and SelectDB Writer to synchronize data from a MySQL table to ApsaraDB for SelectDB.

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "mysql",
      "parameter": {
        "column": [
          "<id>",
          "<table_id>",
          "<table_no>",
          "<table_name>",
          "<table_status>"
        ],
        "connection": [
          {
            "datasource": "<mysql_datasource>",
            "table": [
              "<mysql_table_name>"
            ]
          }
        ],
        "where": "",
        "splitPk": "",
        "encoding": "UTF-8"
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "selectdb",
      "parameter": {
        "postSql": [],
        "preSql": [],
        "username": "<selectdb_username>",
        "password": "<selectdb_password>",
        "loadUrl": [
          "<ip:port>"
        ],
        "column": [
          "<id>",
          "<table_id>",
          "<table_no>",
          "<table_name>",
          "<table_status>"
        ],
        "connection": [
          {
            "datasource": "<selectdb_datasource>",
            "table": [
              "<selectdb_table_name>"
            ]
          }
        ],
        "maxBatchRows": 1000000,
        "loadProps": {
          "format": "csv",
          "column_separator": "\\x01",
          "line_delimiter": "\\x02"
        }
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": false,
      "concurrent": 1
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Replace the placeholders with your actual values:

Placeholder	Description	Example
`<mysql_datasource>`	Name of the MySQL data source added in DataWorks	`my_mysql_source`
`<mysql_table_name>`	Source table name in MySQL	`orders`
`<selectdb_username>`	ApsaraDB for SelectDB username	`admin`
`<selectdb_password>`	ApsaraDB for SelectDB password	—
`<ip:port>`	VPC endpoint and HTTP port of the SelectDB instance	`selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:8080`
`<selectdb_datasource>`	Name of the ApsaraDB for SelectDB data source added in DataWorks	`my_selectdb_source`
`<selectdb_table_name>`	Destination table name in SelectDB	`orders_replica`

SelectDB Writer parameters

datasource

The name of the data source configured in DataWorks. Must match the name used when adding the data source.

Required: Yes

table

The name of the destination table to write data into.

Required: Yes

column

The column names to write to in the destination table. Separate multiple column names with commas. To write to all columns, use ["*"].

Required: Yes
Example: ["id", "name", "age"]

loadUrl

The HTTP endpoint of the ApsaraDB for SelectDB instance. Use the VPC endpoint and HTTP port.

Required: Yes
Format: ip:port
Example: selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:8080

username

The username for accessing the ApsaraDB for SelectDB database.

Required: Yes

password

The password for accessing the ApsaraDB for SelectDB database.

Required: Yes

preSql

SQL statements to run before the synchronization task starts — for example, to delete stale data. The codeless UI supports one statement; the code editor supports multiple.

Required: No

postSql

SQL statements to run after the synchronization task completes — for example, to add a timestamp. The codeless UI supports one statement; the code editor supports multiple.

Required: No

maxBatchRows

The maximum number of rows written per batch. maxBatchRows controls how frequently data is flushed to SelectDB — a higher value means fewer, larger batches.

Required: No
Default: 500000

loadProps

Parameters for the COPY INTO request that control the import format. If left blank or set to {}, JSON format is used by default.

Required: No
Default: json

JSON format (default): The strip_outer_array parameter is automatically set to true.

"loadProps": {
  "format": "json",
  "strip_outer_array": true
}

CSV format: Specify delimiters explicitly. If you do not specify the row and column delimiters, the imported data is converted into strings by default — \t is used as the column delimiter and \n as the row delimiter.

"loadProps": {
  "format": "csv",
  "column_separator": "\\x01",
  "line_delimiter": "\\x02"
}

Supported data types

SelectDB Writer supports the following ApsaraDB for SelectDB data types for offline writes.

Data type	Supported
INT	Yes
BIGINT	Yes
LARGEINT	Yes
SMALLINT	Yes
TINYINT	Yes
BOOLEAN	Yes
DECIMAL	Yes
DOUBLE	Yes
FLOAT	Yes
CHAR	Yes
VARCHAR	Yes
STRING	Yes
DATE	Yes
DATEV2	Yes
DATETIME	Yes
DATETIMEV2	Yes
ARRAY	Yes
JSONB	Yes
BITMAP	No
HLL	No
QUANTILE_STATE	No

ApsaraDB for SelectDB:Use DataWorks to import data