
ApsaraDB for SelectDB: Use OSS Load to import data

Last Updated: Mar 28, 2026

OSS Load lets you import hundreds of gigabytes of data at a time from Object Storage Service (OSS) into ApsaraDB for SelectDB over an internal network. The import runs asynchronously — after you submit the job, SelectDB processes the files in the background while you continue working.

When to use OSS Load

OSS Load is the right choice when:

  • You need to import large batches of files (tens to hundreds of gigabytes)

  • Your files are stored in an OSS bucket in the same region as your SelectDB instance

  • You need to load CSV, PARQUET, or ORC files

OSS Load uses the S3 protocol internally, so the syntax contains AWS and S3 keywords. This is expected — ApsaraDB for SelectDB supports any S3-compatible object storage, and OSS is accessed using the S3-compatible endpoint.

Prerequisites

Before you begin, ensure that you have:

  • An ApsaraDB for SelectDB instance.

  • An OSS bucket in the same region as your SelectDB instance, containing the files to import.

  • An AccessKey pair (AccessKey ID and AccessKey secret) with read permission on the bucket.

Syntax

LOAD LABEL <load_label>
(
    data_desc1[, data_desc2, ...]
)
WITH S3
(
    "AWS_PROVIDER" = "OSS",
    "AWS_REGION" = "<region>",
    "AWS_ENDPOINT" = "<endpoint>",
    "AWS_ACCESS_KEY" = "<AccessKey ID>",
    "AWS_SECRET_KEY" = "<AccessKey secret>"
)
PROPERTIES
(
    "key1" = "value1", ...
);
Note

The OSS bucket path must start with s3://.
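The DATA INFILE path also supports wildcards in the object name, which is useful when one job should pick up many files under a prefix. A minimal sketch, assuming a hypothetical bucket named example-bucket:

```sql
-- Matches every object named part_*.csv under the logs/ prefix (hypothetical layout).
DATA INFILE("s3://example-bucket/logs/part_*.csv")
```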

Parameters

Data description (data_desc)

Each data_desc block describes a group of files to import. The full syntax is:

[MERGE|APPEND|DELETE]
DATA INFILE
(
    "file_path1"[, "file_path2", ...]
)
[NEGATIVE]
INTO TABLE `<table_name>`
[PARTITION (p1, p2, ...)]
[COLUMNS TERMINATED BY "<column_separator>"]
[FORMAT AS "<file_type>"]
[(column_list)]
[COLUMNS FROM PATH AS (c1, c2, ...)]
[PRECEDING FILTER predicate]
[SET (column_mapping)]
[WHERE predicate]
[DELETE ON expr]
[ORDER BY source_sequence]
[PROPERTIES ("key1"="value1", ...)]

  • MERGE|APPEND|DELETE: Data merge type. Default: APPEND. MERGE and DELETE apply only to tables in the Unique Key model. When set to MERGE, use DELETE ON to specify the delete_flag column. When set to DELETE, all rows in the table whose keys match the imported rows are deleted.

  • DATA INFILE: Path of the file(s) to import. Wildcards are supported. Must be a file path, not a directory path.

  • NEGATIVE: Imports data using a negative method, which reverses integer values in SUM-aggregated columns to offset previously imported error data. Applies only to tables that use SUM aggregation with integer columns.

  • PARTITION (p1, p2, ...): Partitions to import into. Data outside the specified partitions is ignored.

  • COLUMNS TERMINATED BY: Column delimiter. Valid only for CSV files. Must be a single byte.

  • FORMAT AS: File format. Valid values: CSV, PARQUET, ORC. Default: CSV.

  • column_list: Column order in the source file. See Converting source data.

  • COLUMNS FROM PATH AS: Columns to extract from the file path.

  • PRECEDING FILTER predicate: Filters source rows before transformation. Rows are first assembled from the columns in column_list and COLUMNS FROM PATH AS, then filtered by this predicate before any SET mappings are applied.

  • SET (column_mapping): Column transformation functions applied to the source data.

  • WHERE predicate: Filters the transformed rows. Rows that do not match the predicate are not imported.

  • DELETE ON expr: Used with the MERGE type only. Specifies the delete_flag column and condition. Applies only to tables in the Unique Key model.

  • ORDER BY: Sequence column for the import. Applies only to tables in the Unique Key model, ensuring data order during import.

  • PROPERTIES: Additional format parameters. For JSON files, you can set json_root, jsonpaths, and fuzzy_parse.
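The clauses above compose into a single block. A sketch of one data_desc, assuming a hypothetical table user_events and a path layout that encodes a city value in a directory name (all table and column names here are illustrative):

```sql
DATA INFILE("s3://example-bucket/events/city=beijing/events_*.csv")
INTO TABLE `user_events`
COLUMNS TERMINATED BY ","
FORMAT AS "CSV"
(user_id, raw_age, event_time)   -- column order in the source file
COLUMNS FROM PATH AS (city)      -- extracts city=beijing from the path
PRECEDING FILTER raw_age > 0     -- drop bad rows before transformation
SET (age = raw_age + 1)          -- derive the target column age
WHERE age < 120                  -- filter again after transformation
```

Note the order of evaluation: PRECEDING FILTER runs on the raw assembled row, SET then computes target columns, and WHERE filters the result.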

OSS connection parameters (WITH S3)

  • AWS_PROVIDER: Object storage provider. Set to OSS.

  • AWS_REGION: Region where the OSS bucket resides.

  • AWS_ENDPOINT: The endpoint used to access OSS data. See Regions and endpoints. The OSS bucket and SelectDB instance must be in the same region.

  • AWS_ACCESS_KEY: AccessKey ID used to access OSS.

  • AWS_SECRET_KEY: AccessKey secret used to access OSS.

Job properties (PROPERTIES)

  • timeout: Timeout for the import job, in seconds. Default: 14400 (4 hours).

  • max_filter_ratio: Maximum proportion of rows that can be filtered out due to data quality issues. Valid values: 0 to 1. Default: 0.

  • exec_mem_limit: Maximum memory available for the import job, in bytes. Default: 2147483648 (2 GiB).

  • strict_mode: Whether to enable strict mode for the import job. Default: false.

  • timezone: Time zone for time-related functions such as strftime, alignment_timestamp, and from_unixtime. See List of all time zones. Default: Asia/Shanghai.

  • load_parallelism: Number of concurrent import tasks. Increase this value to speed up large imports. Default: 1.

  • send_batch_parallelism: Number of concurrent tasks for sending data in batches. Capped by the BE configuration max_send_batch_parallelism_per_job.

  • load_to_single_tablet: Whether to import all data into a single tablet per partition. Applies only to tables in the Duplicate Key model with random partitioning. Default: false.
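These settings are combined in one PROPERTIES clause of the LOAD statement. A sketch with illustrative values (not tuning recommendations):

```sql
PROPERTIES
(
    "timeout" = "7200",            -- 2 hours
    "max_filter_ratio" = "0.1",    -- tolerate up to 10% filtered rows
    "strict_mode" = "true",
    "load_parallelism" = "4"
)
```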

Import data from OSS

This example creates a table, prepares a sample CSV file in OSS, and loads it using OSS Load.

Step 1: Create the target table.

CREATE TABLE test_table
(
    id int,
    name varchar(50),
    age int,
    address varchar(50),
    url varchar(500)
)
DISTRIBUTED BY HASH(id) BUCKETS 4
PROPERTIES("replication_num" = "1");

Step 2: Upload your data file to OSS.

Upload a CSV file named test_file.txt to your OSS bucket. The file should look like:

1,yang,32,shanghai,http://example.com
2,wang,22,beijing,http://example.com
3,xiao,23,shenzhen,http://example.com

Step 3: Submit the import job.

LOAD LABEL test_db.test_label_1
(
    DATA INFILE("s3://your_bucket_name/test_file.txt")
    INTO TABLE test_table
    COLUMNS TERMINATED BY ","
)
WITH S3
(
    "AWS_PROVIDER" = "OSS",
    "AWS_REGION" = "oss-cn-beijing",
    "AWS_ENDPOINT" = "oss-cn-beijing-internal.aliyuncs.com",
    "AWS_ACCESS_KEY" = "<your_access_key>",
    "AWS_SECRET_KEY" = "<your_secret_key>"
)
PROPERTIES
(
    "timeout" = "3600"
);

The LOAD statement has four sections:

  • LABEL: A unique identifier for this import job, used to query its status and cancel it if needed.

  • Data declaration: The source file path, format, and destination table.

  • WITH S3: OSS connection credentials and endpoint.

  • PROPERTIES: Job-level settings such as timeout.
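For tables in the Unique Key model, the same statement shape also supports the MERGE type with DELETE ON. A sketch, assuming a hypothetical Unique Key table named orders whose source file carries an is_deleted flag column:

```sql
LOAD LABEL test_db.merge_orders_1
(
    MERGE
    DATA INFILE("s3://your_bucket_name/orders_delta.csv")
    INTO TABLE `orders`
    COLUMNS TERMINATED BY ","
    (order_id, status, is_deleted)
    DELETE ON is_deleted = 1    -- rows flagged 1 are deleted; the rest are upserted
)
WITH S3
(
    "AWS_PROVIDER" = "OSS",
    "AWS_REGION" = "oss-cn-beijing",
    "AWS_ENDPOINT" = "oss-cn-beijing-internal.aliyuncs.com",
    "AWS_ACCESS_KEY" = "<your_access_key>",
    "AWS_SECRET_KEY" = "<your_secret_key>"
);
```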

Monitor and manage import jobs

Check job status

OSS Load is asynchronous. After submitting the job, use SHOW LOAD to track its progress.

SHOW LOAD
[FROM <db_name>]
[
   WHERE
   [LABEL [ = "your_label" | LIKE "label_matcher"]]
   [STATE = ["PENDING"|"ETL"|"LOADING"|"FINISHED"|"CANCELLED"]]
]
[ORDER BY ...]
[LIMIT limit][OFFSET offset];
  • db_name: Database to query. Default: current database.

  • LABEL: Filter by label. Supports exact match (=) and pattern match (LIKE).

  • STATE: Filter by job state.

  • ORDER BY: Sort the results.

  • LIMIT: Maximum number of records to return. Default: all records.

  • OFFSET: Number of records to skip. Default: 0.

Examples:

Query jobs in example_db with labels matching 2014_01_02, returning the 10 oldest:

SHOW LOAD FROM example_db WHERE LABEL LIKE "2014_01_02" LIMIT 10;

Query a specific job and sort by start time:

SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20140102" ORDER BY LoadStartTime DESC;

Query jobs currently in the LOADING state:

SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20140102" AND STATE = "LOADING";

Query with pagination (skip the first 5, return the next 10):

SHOW LOAD FROM example_db ORDER BY LoadStartTime DESC LIMIT 5,10;
SHOW LOAD FROM example_db ORDER BY LoadStartTime DESC LIMIT 10 OFFSET 5;

Cancel an import job

Cancel a job that is not yet in the FINISHED or CANCELLED state. After cancellation, any data written by the job is rolled back.

CANCEL LOAD
[FROM <db_name>]
WHERE [LABEL = "<load_label>" | LABEL LIKE "<label_pattern>"];
  • db_name: Database containing the import job. Default: current database.

  • load_label: Label of the job to cancel. Supports exact match and LABEL LIKE pattern match.

Examples:

Cancel a specific job:

CANCEL LOAD
FROM example_db
WHERE LABEL = "example_db_test_load_label";

Cancel all jobs whose labels start with example_:

CANCEL LOAD
FROM example_db
WHERE LABEL LIKE "example_";

Troubleshooting

Import times out

The default timeout is 4 hours (14400 seconds). If a job exceeds this limit, avoid simply increasing the timeout — long retries are costly when a job fails late.

Instead, split large files into smaller ones and import them in multiple jobs.

If the data volume is reasonable but the job still times out, increase load_parallelism to run more import tasks concurrently, and then set timeout to match the expected run time.
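Splitting can be as simple as submitting one job per file-name range, each under its own label. A sketch assuming a hypothetical part_NN.csv naming scheme in the bucket:

```sql
-- First batch: part_00.csv through part_09.csv
LOAD LABEL test_db.big_import_part1
(
    DATA INFILE("s3://your_bucket_name/part_0*.csv")
    INTO TABLE test_table
    COLUMNS TERMINATED BY ","
)
WITH S3
(
    "AWS_PROVIDER" = "OSS",
    "AWS_REGION" = "oss-cn-beijing",
    "AWS_ENDPOINT" = "oss-cn-beijing-internal.aliyuncs.com",
    "AWS_ACCESS_KEY" = "<your_access_key>",
    "AWS_SECRET_KEY" = "<your_secret_key>"
)
PROPERTIES ("timeout" = "7200", "load_parallelism" = "4");
-- Submit a second job with a new label for part_1*.csv, and so on.
```

Because each job has its own label, a late failure only forces a retry of that batch, not the whole import.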