OpenSearch: OSS + API data source

Last Updated: Apr 01, 2026

Configure Object Storage Service (OSS) as a full data source for an OpenSearch Retrieval Engine Edition table. This setup loads bulk data from an OSS bucket into a search index and keeps it current through the API.

The process has two phases:

  1. Prepare OSS: activate the service, create a bucket, upload your data objects, and apply the required tag.

  2. Configure the table in the OpenSearch console.

Prerequisites

Before you begin, ensure that you have:

  • An Alibaba Cloud account with permission to activate OSS and create OpenSearch instances

  • An OpenSearch Retrieval Engine Edition instance already created in the target region

Constraints

Constraint | Detail
Region match | OSS must be activated in the same region as your OpenSearch instance
Bucket type | OSS buckets without regional attributes are not supported
Directory tagging | The OSS directory must either include opensearch in its name or have the opensearch:opensearch tag
Path characters | OSS paths cannot contain ?, =, or &
Object placement | Objects must be inside a directory, not in the bucket root

Prepare OSS

Activate OSS and create a bucket

  1. Activate OSS in the same region as your OpenSearch instance.

When you add an OSS data source in the OpenSearch console, the system automatically creates a service-linked role named AliyunServiceRoleForSearchEngine if the role does not already exist. OpenSearch assumes this role to access your OSS resources.

Add the OSS + API data source

Step 1: Configure basic table settings

  1. In the OpenSearch console, go to Instance Details > Table Management and click Add Table.

Step 2: Configure the data source

  1. Under Configure a data source, set the following fields:

    The directory in OSS path must either include opensearch in its name (for example, /opensearch_index_data/) or have the opensearch:opensearch tag. Without one of these, the system cannot read the objects.
    Field | Value / requirement
    Full data source | Object Storage Service (OSS) + API
    OSS path | The path to the directory containing your data objects. Must start with /. Cannot contain ?, =, or &. Example: /opensearch_index_data/
    OSS bucket | The bucket name as shown on the Buckets page of the OSS console
    Data format | HA3 or JSON
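
The path rules from this step and from the Constraints section can be checked locally before you enter the path in the console. The following is a minimal sketch, not part of any OpenSearch or OSS SDK: the function name check_oss_path is hypothetical, and it assumes that a match on any directory name in the path satisfies the "include opensearch in its name" rule. It cannot see OSS tags, so a path without opensearch in its name is only flagged as needing the tag.

```python
FORBIDDEN_CHARS = set("?=&")


def check_oss_path(path: str) -> list:
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    if not path.startswith("/"):
        problems.append("path must start with /")
    bad = sorted(FORBIDDEN_CHARS & set(path))
    if bad:
        problems.append("path contains forbidden characters: " + " ".join(bad))
    # Objects must live inside a directory, so at least one directory name is needed.
    parts = [p for p in path.split("/") if p]
    if not parts:
        problems.append("path points at the bucket root")
    elif not any("opensearch" in p for p in parts):
        # Assumption: any directory name in the path may carry the marker.
        problems.append(
            "no directory name contains 'opensearch'; "
            "the opensearch:opensearch tag is required instead"
        )
    return problems


if __name__ == "__main__":
    for candidate in ("/opensearch_index_data/", "/data?x=1/", "/"):
        print(candidate, check_oss_path(candidate))
```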

Step 3: Configure fields and schema

  1. Configure the fields and click Next. The following sample shows two fields, pk and namespace. Download the full sample file (oss_test.txt) to see the complete data format.

    CMD=add
    pk=999000
    namespace=0.00.0039257140.0098142860.0039257140.00
    pk=999000
    namespace=0.00.0039257140

    For object format details, see Object format reference.

Step 4: Create the table and verify

  1. Click Confirm Creation. The system generates the table automatically.

Object format reference

Objects must be UTF-8 encoded. Two formats are supported: HA3 and JSON.

HA3 format

HA3 is a line-oriented format. Each document is a set of key-value pairs separated by ASCII control characters.

Delimiters

C++ encoding | ASCII hex | Description | Display form
\x1F\n | 1F0A | Key-value delimiter | ^_ (followed by line feed)
\x1E\n | 1E0A | Command delimiter | ^^ (followed by line feed)
\x1D | 1D | Multi-value delimiter | ^]
\x1C | 1C | Section weight flag | ^\
\x1D | 1D | Section delimiter | ^]
\x03 | 03 | Sub-doc field delimiter | ^C
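
Because the delimiters are invisible control characters, it can help to define them once and to render raw data in the display form used in this document. The sketch below is illustrative only: the constant names and the to_display helper are hypothetical, and the helper simply applies standard caret notation (control code XOR 0x40) to every control character except the line feed.

```python
# Delimiters from the table above, written as Python string escapes.
KV_DELIM = "\x1f\n"           # key-value delimiter, shown as ^_ plus line feed
CMD_DELIM = "\x1e\n"          # command delimiter, shown as ^^ plus line feed
MULTI_VALUE_DELIM = "\x1d"    # shown as ^]
SECTION_WEIGHT_FLAG = "\x1c"  # shown as ^\
SECTION_DELIM = "\x1d"        # shown as ^]
SUB_DOC_FIELD_DELIM = "\x03"  # shown as ^C


def to_display(raw: str) -> str:
    """Render control characters in caret notation, keeping line feeds as-is."""
    out = []
    for ch in raw:
        code = ord(ch)
        if code < 0x20 and ch != "\n":
            out.append("^" + chr(code ^ 0x40))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_display("CMD=add" + KV_DELIM + "PK=12345321" + KV_DELIM + CMD_DELIM), end="")
```

Note that other control characters, such as a tab, are also rendered in caret form by this helper.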

Add command

Use CMD=add to insert a document. The first line must be CMD=add, followed by field key-value pairs. Field order must match the schema, and all fields must be declared in the schema.

CMD=add^_
PK=12345321^_
url=http://www.aliyun.com/index.html^_
title=Alibaba Cloud Computing Co., Ltd.^_
body=xxxxxx xxx^_
time=3123423421^_
multi_value_field=1234^]324^]342^_
bidwords=mp3^\price=35.8^Ptime=13867236221^]mp4^\price=32.8^Ptime=13867236221^_
^^
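
An add command like the one above can be generated programmatically. The following is a minimal sketch under stated assumptions: ha3_add is a hypothetical helper, not an SDK function; it handles only single-value and multi-value fields (no section weights or sub-documents); and it relies on the caller passing fields in schema order, which Python dictionaries preserve on insertion.

```python
KV_DELIM = "\x1f\n"         # ^_ followed by line feed
CMD_DELIM = "\x1e\n"        # ^^ followed by line feed
MULTI_VALUE_DELIM = "\x1d"  # ^]


def ha3_add(fields: dict) -> str:
    """Serialize one document as an HA3 add command.

    List or tuple values are treated as multi-value fields and joined
    with the multi-value delimiter; everything else is converted with str().
    """
    lines = ["CMD=add" + KV_DELIM]
    for name, value in fields.items():
        if isinstance(value, (list, tuple)):
            value = MULTI_VALUE_DELIM.join(str(v) for v in value)
        lines.append(f"{name}={value}{KV_DELIM}")
    lines.append(CMD_DELIM)
    return "".join(lines)


if __name__ == "__main__":
    command = ha3_add({
        "PK": 12345321,
        "url": "http://www.aliyun.com/index.html",
        "multi_value_field": [1234, 324, 342],
    })
    # repr() makes the control characters visible.
    print(repr(command))
```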

Delete command

Use CMD=delete to remove a document. The first line must be CMD=delete, followed by the primary key field. If the primary key and the partition hashing field differ, include both; if they are the same, include only one.

CMD=delete^_
PK=12345321^_
^^
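
The rule about the primary key and the partition hashing field can be expressed as a small helper. This is a sketch only: ha3_delete and its parameter names are hypothetical, and the field name shard_key in the usage example is made up for illustration.

```python
KV_DELIM = "\x1f\n"   # ^_ followed by line feed
CMD_DELIM = "\x1e\n"  # ^^ followed by line feed


def ha3_delete(pk_name, pk_value, hash_name=None, hash_value=None) -> str:
    """Serialize an HA3 delete command.

    The hashing field is written only when it is given and differs
    from the primary key field.
    """
    lines = ["CMD=delete" + KV_DELIM, f"{pk_name}={pk_value}{KV_DELIM}"]
    if hash_name is not None and hash_name != pk_name:
        lines.append(f"{hash_name}={hash_value}{KV_DELIM}")
    lines.append(CMD_DELIM)
    return "".join(lines)


if __name__ == "__main__":
    print(repr(ha3_delete("PK", 12345321)))
    # Hypothetical table whose hashing field differs from the primary key.
    print(repr(ha3_delete("PK", 12345321, "shard_key", 7)))
```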

Complete data file example

A single data file can contain multiple commands:

CMD=add^_
PK=12345321^_
url=http://www.aliyun.com/index.html^_
title=Alibaba Cloud Computing Co., Ltd.^_
body=xxxxxx xxx^_
time=3123423421^_
multi_value_field=1234^]324^]342^_
bidwords=mp3^\price=35.8^Ptime=13867236221^]mp4^\price=32.8^Ptime=13867236221^_
^^
CMD=delete^_
PK=12345321^_
^^
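
Before uploading a file, you may want to read it back and confirm that it splits into the commands you expect. The parser below is a minimal sketch, assuming the file content is already loaded as a string; parse_ha3 is a hypothetical name, and multi-value fields are returned as raw strings without further splitting. It deliberately avoids str.splitlines(), which also treats \x1c, \x1d, and \x1e as line boundaries and would break the records apart.

```python
KV_DELIM = "\x1f\n"   # ^_ followed by line feed
CMD_DELIM = "\x1e\n"  # ^^ followed by line feed


def parse_ha3(text: str) -> list:
    """Split HA3 text into a list of dicts, one per command."""
    commands = []
    for block in text.split(CMD_DELIM):
        if not block:
            continue  # empty tail after the final command delimiter
        record = {}
        for pair in block.split(KV_DELIM):
            if not pair:
                continue
            # Split on the first "=" only, so values may contain "=".
            key, sep, value = pair.partition("=")
            if not sep:
                raise ValueError(f"entry without '=': {pair!r}")
            record[key] = value
        commands.append(record)
    return commands


if __name__ == "__main__":
    sample = (
        "CMD=add\x1f\nPK=12345321\x1f\nbody=xxxxxx xxx\x1f\n\x1e\n"
        "CMD=delete\x1f\nPK=12345321\x1f\n\x1e\n"
    )
    for cmd in parse_ha3(sample):
        print(cmd)
```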

JSON format

Each line in the file is a single JSON record. A record must not contain any line feeds. Multiple records appear on separate lines.

{"field_double": ["100.0", "221.123", "500.3333333"], "field_int32": ["100", "200", "300"], "title": "Huawei Mate 9 Kirin 960 chip Leica dual lens", "color": "Red", "empty_int32": "", "price": "3599", "CMD": "add", "nid": "1", "gather_cn_str": "", "desc": ["str1", "str2", "str3"], "brand": "Huawei", "size": "5.9","__subdocs__":[{"sub_pk":"100","sub_field1":"200","sub_field2":["100","200","300"]},{"sub_pk":"200","sub_field1":"200","sub_field2":["100","200","300"]}]}
{"field_double": ["100.0", "221.123", "500.3333333", "100.0", "221.123", "500.3333333"], "field_int32": ["100", "200", "300", "100", "200", "300"], "title": "Huawei/Huawei P10 Plus all-network phone", "color": "Blue", "empty_int32": "", "price": "4388", "CMD": "add", "nid": "2", "gather_cn_str": "color Blue", "desc": ["str1", "str2", "str3", "str1", "str2", "str3"], "brand": "Huawei", "size": "5.5","__subdocs__":[{"sub_pk":"100","sub_field1":"200","sub_field2":["100","200","300"]},{"sub_pk":"200","sub_field1":"200","sub_field2":["100","200","300"]}]}

Each record includes a CMD field ("add" or "delete") and the document fields. Multi-value fields use JSON arrays. Sub-documents appear under the __subdocs__ key as an array of objects.
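
A record in this format can be produced with Python's standard json module. The sketch below is illustrative: to_json_line is a hypothetical helper, and passing values as strings simply mirrors the sample records above rather than a confirmed requirement. Without an indent argument, json.dumps emits a single line and escapes any line feed inside a string value, so each record stays on one line.

```python
import json


def to_json_line(cmd: str, doc: dict) -> str:
    """Serialize one record as a single-line JSON string with a CMD field."""
    if cmd not in ("add", "delete"):
        raise ValueError("CMD must be 'add' or 'delete'")
    record = {"CMD": cmd, **doc}
    # ensure_ascii=False keeps non-ASCII text readable in the UTF-8 file.
    return json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    docs = [
        {"nid": "1", "brand": "Huawei", "field_int32": ["100", "200", "300"],
         "__subdocs__": [{"sub_pk": "100", "sub_field1": "200"}]},
        {"nid": "2", "brand": "Huawei", "field_int32": ["100", "200"]},
    ]
    # One record per line, joined with line feeds.
    print("\n".join(to_json_line("add", d) for d in docs))
```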