Data Integration is a visual tool for importing external data into ApsaraDB for SelectDB instances and databases without writing code. Use it to load benchmark datasets for performance testing or production data from Object Storage Service (OSS).
Supported integration types
| Type | Description |
|---|---|
| Sample data | Preloaded benchmark datasets (ClickBench, TPC-H, Github Demo, SSB-FLAT) for performance testing |
| OSS | Data files stored in an OSS bucket, supporting JSON, CSV, ORC, Parquet, and automatic format detection |
Prerequisites
Before you begin, ensure that you have:
An ApsaraDB for SelectDB instance running version 3.0.7 or later. For details, see Create an instance.
Open Data Integration
Log on to the ApsaraDB for SelectDB console.
In the top navigation bar, select the region where your instance resides.
In the left-side navigation pane, click Instances. On the Instances page, find your instance and click its ID to go to the Instance Details page.
In the left-side navigation pane, choose Data Development and Management (Studio) > Data Integration.
Note: The first time you open Data Development and Management (Studio), the console prompts you to add your machine's public IP address to the webui_whitelist IP address whitelist. Click OK to proceed.
If you have not logged on to the WebUI before, the WebUI logon page appears. Log on with the admin account. If you do not know the password, see Reset the password of an account.
The Integration page opens. If you have not created any tasks yet, the Stage page opens instead; you can create your first task from there.
Load sample data
Use this option to import benchmark datasets for performance testing.
On the Integration page, click Create in the upper-left corner.
On the New Integration page, select a dataset in the Sample Data section.
| Sample data | Description |
|---|---|
| ClickBench | The ClickBench datasets |
| TPC-H | The TPC-H datasets |
| Github Demo | The GitHub events dataset |
| SSB-FLAT | The SSB-FLAT datasets |
Configure the following parameters and click Create and Load.
| Parameter | Description | Example |
|---|---|---|
| Integration name | Name for this integration task | test |
| Comment | Description of the task | test comment |
| Cluster | The cluster to run the task on | new_cluster |
| Sample data size | Amount of sample data to load | 1GB |
Import data from OSS
Use this option to load your own data files from an OSS bucket.
On the Integration page, click Create in the upper-left corner.
On the New Integration page, click Object Storage in the Stage section.
On the New Integration - Object Storage OSS page, configure the parameters in the following sections and click Confirm.
Connection settings
| Parameter | Description | Example |
|---|---|---|
| Integration name | Name for this integration task | test |
| Comment | Description of the task | test comment |
| Bucket | Name of the OSS bucket | test_bucket_name |
| Default data file path | Default path within the bucket | — |
| Authentication | Authorization method to access OSS | Access Key |
| Access Key | The AccessKey ID of your Alibaba Cloud account | akdemo |
| Secret Key | The AccessKey secret of your Alibaba Cloud account | skdemo |
| Advanced settings | Default properties applied during all object imports | — |
File configuration
| Parameter | Description | Example |
|---|---|---|
| File type | File format of OSS objects. Options: JSON, ORC, CSV, Parquet, Automatic Recognition | JSON |
| Compression method | Compression format of OSS objects | gz |
| Column separator | Column delimiter for data in OSS objects | \t |
| Line delimiter | Row delimiter for data in OSS objects | \n |
| File size | Size limits on OSS objects | Unlimited |
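Before you configure the task, it can help to confirm that the column and line delimiters you plan to enter actually match your files. The following sketch uses Python's standard csv module to parse a small sample with the example settings from the table above (column separator `\t`, line delimiter `\n`); the `raw` string is placeholder data standing in for the contents of an OSS object.

```python
import csv
import io

# Placeholder data standing in for an OSS object's contents;
# in practice, download a small sample of the object first.
raw = "1\tAlice\n2\tBob\n"

# Parse with the delimiters you intend to configure in the task:
# column separator "\t" and line delimiter "\n".
reader = csv.reader(io.StringIO(raw, newline=""), delimiter="\t")
rows = list(reader)
print(rows)
```

If the parsed rows have the wrong number of columns, adjust the Column separator and Line delimiter values before creating the task.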
Loading configuration
| Parameter | Description | Default |
|---|---|---|
| On Error | How to handle errors during import: Continue keeps importing, Abort stops the task, and Customized applies a custom policy | Abort |
| Strict mode | Controls whether rows that fail column type conversion are filtered. Open filters out error data after column type conversion. Two rules apply: (1) Error data means NULL values generated in NOT NULL destination columns by the conversion itself; NULL values generated by functions are not filtered. (2) Strict mode does not apply to range violations: if a destination column restricts values to a specific range and the converted value falls outside that range, the row is kept (for example, a source value of 10 converts successfully but does not fit in DECIMAL(1,0)). Close does not filter out error data after column type conversion. | Open |
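The strict mode rules above can be sketched as follows. This is a simplified, hypothetical model of the filtering decision, not SelectDB's implementation; the `strict_filter` helper and its parameters are illustrative only, modeling a NOT NULL DECIMAL(1,0) destination column.

```python
from decimal import Decimal, InvalidOperation

def strict_filter(raw, not_null=True, precision=1, scale=0):
    """Decide whether strict mode keeps a value destined for a
    NOT NULL DECIMAL(precision, scale) column.
    Returns (converted_value, kept). Hypothetical sketch only."""
    try:
        value = Decimal(raw)
    except InvalidOperation:
        value = None  # type conversion failed, producing NULL
    if value is None:
        # Rule 1: NULL generated by conversion into a NOT NULL
        # column is error data and is filtered out.
        return None, not not_null
    # Rule 2: a value that converts successfully but falls outside
    # the column's range (DECIMAL(1,0) allows |v| < 10) is NOT
    # filtered by strict mode.
    return value, True

print(strict_filter("abc"))  # conversion fails, so the row is filtered
print(strict_filter("10"))   # converts but is out of range; still kept
print(strict_filter("5"))    # normal value, kept
```

The key distinction is that strict mode acts on conversion failures, not on range violations: a value like 10 passes strict mode and is rejected later, if at all, by the column's own constraints.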
Manage integration tasks
Search for a task: On the Integration page, click the Search icon in the upper-right corner and enter the task name.
Delete a task: On the Integration page, find the task and click the Delete icon in the Actions column.
Deleting a task does not affect data that has already been imported, but may affect data currently being imported. Deleted tasks cannot be recovered.