OSS Load lets you import hundreds of gigabytes of data at a time from Object Storage Service (OSS) into ApsaraDB for SelectDB over an internal network. The import runs asynchronously — after you submit the job, SelectDB processes the files in the background while you continue working.
When to use OSS Load
OSS Load is the right choice when:
You need to import large batches of files (tens to hundreds of gigabytes)
Your files are stored in an OSS bucket in the same region as your SelectDB instance
You need to load CSV, PARQUET, or ORC files
OSS Load uses the S3 protocol internally, so the syntax contains AWS and S3 keywords. This is expected — ApsaraDB for SelectDB supports any S3-compatible object storage, and OSS is accessed using the S3-compatible endpoint.
Prerequisites
Before you begin, ensure that you have:
An AccessKey pair. See Create an AccessKey pair.
An OSS bucket in the same region as your ApsaraDB for SelectDB instance. See Get started by using the OSS console.
OSS access over a virtual private cloud (VPC). OSS Load uses the internal VPC endpoint, not the public endpoint.
Syntax
LOAD LABEL <load_label>
(
data_desc1[, data_desc2, ...]
)
WITH S3
(
"AWS_PROVIDER" = "OSS",
"AWS_REGION" = "<region>",
"AWS_ENDPOINT" = "<endpoint>",
"AWS_ACCESS_KEY" = "<AccessKey ID>",
"AWS_SECRET_KEY" = "<AccessKey secret>"
)
PROPERTIES
(
"key1" = "value1", ...
);The OSS bucket path must start with s3://.
Parameters
Data description (data_desc)
Each data_desc block describes a group of files to import. The full syntax is:
[MERGE|APPEND|DELETE]
DATA INFILE
(
"file_path1"[, "file_path2", ...]
)
[NEGATIVE]
INTO TABLE `<table_name>`
[PARTITION (p1, p2, ...)]
[COLUMNS TERMINATED BY "<column_separator>"]
[FORMAT AS "<file_type>"]
[(column_list)]
[COLUMNS FROM PATH AS (c1, c2, ...)]
[PRECEDING FILTER predicate]
[SET (column_mapping)]
[WHERE predicate]
[DELETE ON expr]
[ORDER BY source_sequence]
[PROPERTIES ("key1"="value1", ...)]

| Parameter | Description |
|---|---|
| MERGE\|APPEND\|DELETE | Data merge type. Default: APPEND. MERGE and DELETE apply only to tables in the Unique Key model. When set to MERGE, use DELETE ON to specify the delete_flag column. When set to DELETE, the imported rows are deleted from the table. |
| DATA INFILE | Path of the file(s) to import. Wildcards are supported. Must match files, not a directory. |
| NEGATIVE | Imports data using the negative method, which negates integer values in SUM-aggregated columns to offset previously imported erroneous data. Applies only to tables whose value columns use SUM aggregation on integer types. |
| PARTITION (p1, p2, ...) | Partitions to import into. Data that falls outside the specified partitions is ignored. |
| COLUMNS TERMINATED BY | Column delimiter. Valid only for CSV files. Must be a single byte. |
| FORMAT AS | File format. Valid values: CSV, PARQUET, ORC. Default: CSV. |
| column_list | Column order in the source file. See Converting source data. |
| COLUMNS FROM PATH AS | Columns to extract from the file path. |
| PRECEDING FILTER predicate | Pre-filter applied before transformation. Source columns from column_list and COLUMNS FROM PATH AS are assembled into rows, which are then filtered by this predicate before the SET mappings run. |
| SET (column_mapping) | Column transformation functions applied to the source data. |
| WHERE predicate | Filter applied to the transformed data. Rows that do not match the predicate are discarded. |
| DELETE ON expr | Used only with the MERGE type. Specifies the delete_flag column and condition. Applies only to tables in the Unique Key model. |
| ORDER BY | Sequence column for the import. Applies only to tables in the Unique Key model; ensures data order during import. |
| PROPERTIES | Additional format parameters. For JSON files, you can set json_root, jsonpaths, and fuzzy_parse. |
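To illustrate how several of these clauses combine, the following sketch loads CSV files that match a wildcard path, extracts a dt column from the file path, transforms a source column with SET, and filters rows with WHERE. The bucket, table, and column names are hypothetical, chosen for illustration only:

```sql
-- Illustrative data_desc block; all names (bucket, table, columns) are hypothetical.
DATA INFILE("s3://example_bucket/logs/dt=*/events_*.csv")
INTO TABLE `events`
COLUMNS TERMINATED BY ","
FORMAT AS "CSV"
(user_id, raw_ts, country)
COLUMNS FROM PATH AS (dt)                 -- extract dt from the dt=... path segment
PRECEDING FILTER raw_ts != ""             -- drop rows with empty timestamps before mapping
SET (event_time = from_unixtime(raw_ts))  -- derive a column during import
WHERE country = "CN"                      -- keep only matching rows after mapping
```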
OSS connection parameters (WITH S3)
| Parameter | Description |
|---|---|
| AWS_PROVIDER | Object storage provider. Set to OSS. |
| AWS_REGION | Region where the OSS bucket resides. |
| AWS_ENDPOINT | The endpoint used to access OSS data. See Regions and endpoints. The OSS bucket and SelectDB instance must be in the same region. |
| AWS_ACCESS_KEY | AccessKey ID used to access OSS. |
| AWS_SECRET_KEY | AccessKey secret used to access OSS. |
Job properties (PROPERTIES)
| Parameter | Default | Description |
|---|---|---|
| timeout | 14400 (4 hours) | Timeout for the import job, in seconds. |
| max_filter_ratio | 0 | Maximum proportion of rows that can be filtered out due to data quality issues. Valid values: 0 to 1. |
| exec_mem_limit | 2147483648 (2 GiB) | Maximum memory available to the import job, in bytes. |
| strict_mode | false | Whether to enable strict mode for the import job. |
| timezone | Asia/Shanghai | Time zone for time-related functions such as strftime, alignment_timestamp, and from_unixtime. See List of all time zones. |
| load_parallelism | 1 | Number of concurrent import tasks. Increase this value to speed up large imports. |
| send_batch_parallelism | — | Number of concurrent tasks for sending data in batches. Capped by the BE configuration max_send_batch_parallelism_per_job. |
| load_to_single_tablet | false | Whether to import all data into a single tablet per partition. Applies only to tables in the Duplicate Key model that use random bucketing. |
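As a sketch of how these settings are tuned together for a large import, the PROPERTIES clause below uses illustrative values, not recommended defaults:

```sql
-- Illustrative PROPERTIES clause; values are starting points, not recommendations.
PROPERTIES
(
    "timeout" = "28800",          -- 8 hours, for a job larger than the default allows
    "max_filter_ratio" = "0.01",  -- tolerate up to 1% of rows filtered for quality issues
    "load_parallelism" = "4",     -- run 4 import tasks concurrently
    "strict_mode" = "true"        -- reject rows whose column transformation fails
)
```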
Import data from OSS
This example creates a table, prepares a sample CSV file in OSS, and loads it using OSS Load.
Step 1: Create the target table.
CREATE TABLE test_table
(
id int,
name varchar(50),
age int,
address varchar(50),
url varchar(500)
)
DISTRIBUTED BY HASH(id) BUCKETS 4
PROPERTIES("replication_num" = "1");

Step 2: Upload your data file to OSS.
Upload a CSV file named test_file.txt to your OSS bucket. The file should look like:
1,yang,32,shanghai,http://example.com
2,wang,22,beijing,http://example.com
3,xiao,23,shenzhen,http://example.com

Step 3: Submit the import job.
LOAD LABEL test_db.test_label_1
(
DATA INFILE("s3://your_bucket_name/test_file.txt")
INTO TABLE test_table
COLUMNS TERMINATED BY ","
)
WITH S3
(
"AWS_PROVIDER" = "OSS",
"AWS_REGION" = "oss-cn-beijing",
"AWS_ENDPOINT" = "oss-cn-beijing-internal.aliyuncs.com",
"AWS_ACCESS_KEY" = "<your_access_key>",
"AWS_SECRET_KEY" = "<your_secret_key>"
)
PROPERTIES
(
"timeout" = "3600"
);

The LOAD statement has four sections:
LABEL: A unique identifier for this import job, used to query its status and cancel it if needed.
Data declaration: The source file path, format, and destination table.
WITH S3: OSS connection credentials and endpoint.
PROPERTIES: Job-level settings such as timeout.
Monitor and manage import jobs
Check job status
OSS Load is asynchronous. After submitting the job, use SHOW LOAD to track its progress.
SHOW LOAD
[FROM <db_name>]
[
WHERE
[LABEL [ = "your_label" | LIKE "label_matcher"]]
[STATE = ["PENDING"|"ETL"|"LOADING"|"FINISHED"|"CANCELLED"]]
]
[ORDER BY ...]
[LIMIT limit][OFFSET offset];

| Parameter | Default | Description |
|---|---|---|
| db_name | Current database | Database to query. |
| LABEL | — | Filter by label. Supports exact match (=) and pattern match (LIKE). |
| STATE | — | Filter by job state. |
| ORDER BY | — | Sort the results. |
| LIMIT | All records | Maximum number of records to return. |
| OFFSET | 0 | Number of records to skip. |
Examples:
Query jobs in example_db with labels matching 2014_01_02, returning the 10 oldest:
SHOW LOAD FROM example_db WHERE LABEL LIKE "2014_01_02" LIMIT 10;

Query a specific job and sort by start time:
SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20140102" ORDER BY LoadStartTime DESC;

Query jobs currently in the LOADING state:
SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20140102" AND STATE = "LOADING";

Query with pagination (skip the first 5, return the next 10):
SHOW LOAD FROM example_db ORDER BY LoadStartTime DESC LIMIT 5,10;
SHOW LOAD FROM example_db ORDER BY LoadStartTime DESC LIMIT 10 OFFSET 5;

Cancel an import job
Cancel a job that is not yet in the FINISHED or CANCELLED state. After cancellation, any data written by the job is rolled back.
CANCEL LOAD
[FROM <db_name>]
WHERE [LABEL = "<load_label>" | LABEL LIKE "<label_pattern>"];

| Parameter | Default | Description |
|---|---|---|
| db_name | Current database | Database containing the import job. |
| load_label | — | Label of the job to cancel. Supports exact match and LABEL LIKE pattern match. |
Examples:
Cancel a specific job:
CANCEL LOAD
FROM example_db
WHERE LABEL = "example_db_test_load_label";

Cancel all jobs whose labels start with example_:
CANCEL LOAD
FROM example_db
WHERE LABEL LIKE "example_";

Troubleshooting
Import times out
The default timeout is 14400 seconds (4 hours). If a job exceeds this limit, avoid simply raising the timeout: long-running jobs that fail late are expensive to retry.
Instead, split large files into smaller ones and import them in multiple jobs.
If your data volume is reasonable but the job still times out, increase load_parallelism to run more import tasks concurrently, and set timeout to match the expected import duration.
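The last point can be made concrete with a back-of-the-envelope calculation. The sketch below estimates a reasonable timeout from the total data size, an assumed per-task throughput (a placeholder figure, not a SelectDB metric), and load_parallelism, with a safety margin on top:

```python
def estimate_timeout_seconds(total_bytes: int,
                             bytes_per_sec_per_task: float,
                             load_parallelism: int = 1,
                             safety_factor: float = 2.0) -> int:
    """Rough timeout estimate: time to move the data at an assumed
    per-task throughput, divided across parallel tasks, with headroom."""
    expected = total_bytes / (bytes_per_sec_per_task * load_parallelism)
    return int(expected * safety_factor)

# 100 GiB at an assumed 50 MiB/s per task, with 4 parallel tasks:
timeout = estimate_timeout_seconds(
    total_bytes=100 * 1024**3,
    bytes_per_sec_per_task=50 * 1024**2,
    load_parallelism=4,
)
print(timeout)  # 1024 seconds with the 2x safety factor
```

If the estimate exceeds the 14400-second default even with higher parallelism, that is the signal to split the files into multiple smaller jobs rather than stretch the timeout.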