DataWorks Data Integration supports writing data to Alibaba Cloud SelectDB via single-table or Entire-Database Offline Synchronization.
Supported SelectDB versions
SelectDB Writer uses MySQL Driver 5.1.47. The following versions are supported:
Standard SelectDB: 2.2.31 and 2.2.22
Alibaba Cloud SelectDB: 2.4, 3.0, and 4.0
For details on driver capabilities, see MySQL Connectors.
Limitations
Writing data to BITMAP, HLL (HyperLogLog), or QUANTILE_STATE fields is not supported.
Supported field types
The following table lists the supported field types using SelectDB 2.2.22 as an example. For a complete list for each version, see the official SelectDB documentation.
| SelectDB field type | Offline write (SelectDB Writer) |
|---|---|
| INT | Supported |
| BIGINT | Supported |
| LARGEINT | Supported |
| SMALLINT | Supported |
| TINYINT | Supported |
| BOOLEAN | Supported |
| DECIMAL | Supported |
| DOUBLE | Supported |
| FLOAT | Supported |
| CHAR | Supported |
| VARCHAR | Supported |
| STRING | Supported |
| DATE | Supported |
| DATEV2 | Supported |
| DATETIME | Supported |
| DATETIMEV2 | Supported |
| ARRAY | Supported |
| JSONB | Supported |
| BITMAP | Not supported |
| HLL (HyperLogLog) | Not supported |
| QUANTILE_STATE | Not supported |
Prerequisites
Before you configure a synchronization task, complete the following steps.
Step 1: Confirm the SelectDB version
Data Integration has version requirements for SelectDB. Verify that your SelectDB version meets the requirements listed in Supported SelectDB versions. Check your version in the SelectDB console.
Step 2: Create an account and grant permissions
Create a dedicated SelectDB account for DataWorks to use when accessing the data source. If you use the default admin user of the SelectDB data warehouse instead, set a password for that user.
Step 3: Configure network connectivity
Choose a connection type based on your network environment.
| Resource group network | SelectDB connection type | Steps |
|---|---|---|
| VPC | Private network (recommended) | In the SelectDB console, create a private endpoint following the Network Configuration instructions. After the endpoint status changes from Creating to Available and the connection status changes from Connecting to Connected, bind the endpoint's VPC to the resource group's VPC. See Network connectivity solution. |
| Public | Internet | Add the elastic IP addresses (EIPs) of the resource group to the SelectDB whitelist. See Add a whitelist. |
A private network connection provides higher bandwidth and a more stable connection than an Internet connection.
Add a data source
Add SelectDB as a data source in DataWorks following the instructions in Data source management. The following parameters are specific to SelectDB:
MySQL Connection Address: The Java Database Connectivity (JDBC) connection string. Copy the JDBC Connection String from Connection Method > Other Methods in the SelectDB console. Both public and private IP addresses are supported.
HTTP Connection Address: The HTTP protocol endpoint. Copy the HTTP Protocol Endpoint from Connection Method > Other Methods in the SelectDB console. Both public and private IP addresses are supported.
Username: The username for the SelectDB data warehouse.
Password: The password for the specified username.
View parameter descriptions directly in the DataWorks console when adding a data source.
Develop a synchronization task
Single-table offline synchronization
Configure a task using either the codeless UI or the code editor:
Codeless UI: Configure an offline synchronization task using the codeless UI
Code editor: Configure an offline synchronization task using the code editor
For the complete parameter reference and script examples, see Appendix: Script demo and parameter reference.
Entire-database offline synchronization
Synchronize an entire database from MySQL or PostgreSQL to SelectDB. For the configuration procedure, see Configure an Entire-Database Offline Synchronization task.
Appendix: Script demo and parameter reference
Use the code editor to configure a batch synchronization task in script format. For the general script format requirements, see Configure an offline synchronization task using the code editor.
Script examples
All examples use "stepType": "selectdb" to identify the SelectDB Writer plug-in.
Example 1: Write in JSON format (default)
JSON is the default import format. Use this when the source data is already structured as JSON objects.
{
"stepType": "selectdb",
"parameter": {
"datasource": "selectdb_datasource",
"table": "selectdb_table_name",
"column": ["id", "table_id", "table_no", "table_name", "table_status"],
"loadProps": {
"format": "json",
"strip_outer_array": true
}
},
"name": "Writer",
"category": "writer"
}Example 2: Write in CSV format
Use CSV format when you need custom column and row delimiters.
{
"stepType": "selectdb",
"parameter": {
"datasource": "selectdb_datasource",
"table": "selectdb_table_name",
"column": ["id", "table_id", "table_no", "table_name", "table_status"],
"loadProps": {
"format": "csv",
"column_separator": "\\x01",
"line_delimiter": "\\x02"
}
},
"name": "Writer",
"category": "writer"
}If you use CSV format without specifyingcolumn_separatorandline_delimiter, all imported data is converted to strings, with\tas the column delimiter and\nas the row delimiter.
Writer parameters
| Parameter | Description | Required | Default value |
|---|---|---|---|
datasource | The name of the data source added in DataWorks. | Yes | None |
table | The name of the destination table. | Yes | None |
column | The columns to write data to, specified as a list. For example: ["id", "name", "age"]. To write to all columns in order, use ["*"]. | Yes | None |
preSql | SQL statements to run before the synchronization task starts. The codeless UI supports one statement; the code editor supports multiple statements. | No | None |
postSql | SQL statements to run after the synchronization task completes. The codeless UI supports one statement; the code editor supports multiple statements. | No | None |
maxBatchRows | The maximum number of rows per import batch. The import starts when either maxBatchRows or batchSize is reached. | No | 500000 |
batchSize | The maximum data volume per import batch, in bytes. The import starts when either batchSize or maxBatchRows is reached. | No | 94371840 |
maxRetries | The number of retry attempts after a batch import fails. | No | 3 |
labelPrefix | The prefix for each uploaded file's label. The final label is labelPrefix + UUID, forming a globally unique label that prevents duplicate imports. | No | datax_selectdb_writer_ |
loadProps | The COPY INTO request parameters, used to configure the import format. If omitted or set to {}, data is imported in JSON format with strip_outer_array=true. | No | JSON format (strip_outer_array=true) |
clusterName | The name of the SelectDB Cloud cluster. View the name in the SelectDB console. | No | None |
flushInterval | The interval between batch writes, in milliseconds. If maxBatchRows and batchSize thresholds are not reached within this interval, the batch is imported based on the interval. | No | 30000 |