Configure Object Storage Service (OSS) as a full data source for an OpenSearch Retrieval Engine Edition table. This setup loads bulk data from an OSS bucket into a search index and keeps it current through the API.
The process has two phases:
Prepare OSS — activate the service, create a bucket, upload your data objects, and apply the required tag.
Configure the table in the OpenSearch console.
Prerequisites
Before you begin, ensure that you have:
An Alibaba Cloud account with permission to activate OSS and create OpenSearch instances
An OpenSearch Retrieval Engine Edition instance already created in the target region
Constraints
| Constraint | Detail |
|---|---|
| Region match | OSS must be activated in the same region as your OpenSearch instance |
| Bucket type | OSS buckets without regional attributes are not supported |
| Directory tagging | The OSS directory must either include opensearch in its name or have the opensearch:opensearch tag |
| Path characters | OSS paths cannot contain ?, =, or & |
| Object placement | Objects must be inside a directory — not in the bucket root |
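The path constraints above can be checked before you upload anything. The following is a minimal sketch, not part of the OpenSearch or OSS tooling: `check_oss_path` and its `has_opensearch_tag` parameter are hypothetical names, the leading-slash rule is taken from the OSS path field described in the data source step, and the case-sensitive match on `opensearch` is an assumption.

```python
FORBIDDEN_CHARS = set("?=&")  # characters that OSS paths cannot contain


def check_oss_path(directory: str, has_opensearch_tag: bool = False) -> list:
    """Return a list of constraint violations for an OSS directory path.

    An empty list means the path passes every check in this sketch.
    """
    problems = []
    if not directory.startswith("/"):
        problems.append("path must start with '/'")
    bad = sorted(FORBIDDEN_CHARS & set(directory))
    if bad:
        problems.append("path contains forbidden characters: " + " ".join(bad))
    # Objects must live inside a directory, so the bucket root is rejected.
    if directory.strip("/") == "":
        problems.append("path must name a directory, not the bucket root")
    # Either the directory name includes 'opensearch' or the tag is applied.
    # Assumption: the name match is case-sensitive.
    if "opensearch" not in directory and not has_opensearch_tag:
        problems.append(
            "directory name lacks 'opensearch' and no opensearch:opensearch tag is set"
        )
    return problems


if __name__ == "__main__":
    for path in ("/opensearch_index_data/", "/data?x=1/", "/"):
        print(path, check_oss_path(path))
```

The function only inspects the path string. Whether the tag is actually present on the directory still has to be confirmed in the OSS console, which is why it is passed in as a flag.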
Prepare OSS
Activate OSS and create a bucket
Activate OSS in the same region as your OpenSearch instance.
When you add an OSS data source in the OpenSearch console, the system automatically creates a service-linked role named AliyunServiceRoleForSearchEngine if the role does not already exist. OpenSearch assumes this role to access your OSS resources.
Add the OSS + API data source
Step 1: Configure basic table settings
In the OpenSearch console, go to Instance Details > Table Management and click Add Table.
Step 2: Configure the data source
Under Configure a data source, set the following fields:
| Field | Value / requirement |
|---|---|
| Full data source | Object Storage Service (OSS) + API |
| OSS path | The path to the directory containing your data objects. Must start with /. Cannot contain ?, =, or &. Example: /opensearch_index_data/ |
| OSS bucket | The bucket name as shown on the Buckets page of the OSS console |
| Data format | HA3 or JSON |
Note: The directory in OSS path must either include opensearch in its name (for example, /opensearch_index_data/) or have the opensearch:opensearch tag. Without one of these, the system cannot read the objects.
Step 3: Configure fields and schema
Configure the fields and click Next. The following sample shows two fields, pk and namespace. Download the full sample file (oss_test.txt) to see the complete data format.
CMD=add pk=999000 namespace=0.00.0039257140.0098142860.0039257140.00 pk=999000 namespace=0.00.0039257140
For object format details, see Object format reference.
Step 4: Create the table and verify
Click Confirm Creation. The system generates the table automatically.
Object format reference
Objects must be UTF-8 encoded. Two formats are supported: HA3 and JSON.
HA3 format
HA3 is a line-oriented format. Each document is a set of key-value pairs separated by ASCII control characters.
Delimiters
| C++ encoding | ASCII hex | Description | Display form |
|---|---|---|---|
| \x1F\n | 1F0A | Key-value delimiter | ^_ (followed by line feed) |
| \x1E\n | 1E0A | Command delimiter | ^^ (followed by line feed) |
| \x1D | 1D | Multi-value delimiter | ^] |
| \x1C | 1C | Section weight flag | ^\ |
| \x1D | 1D | Section delimiter | ^] |
| \x03 | 03 | Sub-doc field delimiter | ^C |
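The display forms in the table stand for single control bytes, which is easy to get wrong when generating files by hand. As an illustration, the sketch below serializes one command using the key-value, command, and multi-value delimiters from the table. `encode_ha3_command` is a hypothetical helper, not an OpenSearch API. It does not handle section weights or sub-documents, and it writes fields in the order the caller supplies them.

```python
KV_DELIM = "\x1f\n"          # key-value delimiter, displayed as ^_ plus line feed
CMD_DELIM = "\x1e\n"         # command delimiter, displayed as ^^ plus line feed
MULTI_VALUE_DELIM = "\x1d"   # multi-value delimiter, displayed as ^]


def encode_ha3_command(cmd: str, fields: dict) -> bytes:
    """Serialize one command (for example "add" or "delete") as UTF-8 bytes.

    List or tuple values are joined with the multi-value delimiter.
    Fields are emitted in dict insertion order.
    """
    parts = ["CMD=" + cmd + KV_DELIM]
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            value = MULTI_VALUE_DELIM.join(str(v) for v in value)
        parts.append(f"{key}={value}{KV_DELIM}")
    parts.append(CMD_DELIM)  # terminates the command
    return "".join(parts).encode("utf-8")


if __name__ == "__main__":
    data = encode_ha3_command(
        "add", {"PK": "12345321", "multi_value_field": [1234, 324, 342]}
    )
    print(data)
```

Several commands can be concatenated into one file by joining the returned byte strings, since each one already ends with the command delimiter.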
Add command
Use CMD=add to insert a document. The first line must be CMD=add, followed by field key-value pairs. Field order must match the schema, and all fields must be declared in the schema.
CMD=add^_
PK=12345321^_
url=http://www.aliyun.com/index.html^_
title=Alibaba Cloud Computing Co., Ltd.^_
body=xxxxxx xxx^_
time=3123423421^_
multi_value_field=1234^]324^]342^_
bidwords=mp3^\price=35.8^Ptime=13867236221^]mp4^\price=32.8^Ptime=13867236221^_
^^
Delete command
Use CMD=delete to remove a document. The first line must be CMD=delete, followed by the primary key field. If the primary key and the partition hashing field differ, include both; if they are the same, include only one.
CMD=delete^_
PK=12345321^_
^^
Complete data file example
A single data file can contain multiple commands:
CMD=add^_
PK=12345321^_
url=http://www.aliyun.com/index.html^_
title=Alibaba Cloud Computing Co., Ltd.^_
body=xxxxxx xxx^_
time=3123423421^_
multi_value_field=1234^]324^]342^_
bidwords=mp3^\price=35.8^Ptime=13867236221^]mp4^\price=32.8^Ptime=13867236221^_
^^
CMD=delete^_
PK=12345321^_
^^
JSON format
Each line in the file is a single JSON record. A record must not contain any line feeds. Multiple records appear on separate lines.
{"field_double": ["100.0", "221.123", "500.3333333"], "field_int32": ["100", "200", "300"], "title": "Huawei Mate 9 Kirin 960 chip Leica dual lens", "color": "Red", "empty_int32": "", "price": "3599", "CMD": "add", "nid": "1", "gather_cn_str": "", "desc": ["str1", "str2", "str3"], "brand": "Huawei", "size": "5.9","__subdocs__":[{"sub_pk":"100","sub_field1":"200","sub_field2":["100","200","300"]},{"sub_pk":"200","sub_field1":"200","sub_field2":["100","200","300"]}]}
{"field_double": ["100.0", "221.123", "500.3333333", "100.0", "221.123", "500.3333333"], "field_int32": ["100", "200", "300", "100", "200", "300"], "title": "Huawei/Huawei P10 Plus all-network phone", "color": "Blue", "empty_int32": "", "price": "4388", "CMD": "add", "nid": "2", "gather_cn_str": "color Blue", "desc": ["str1", "str2", "str3", "str1", "str2", "str3"], "brand": "Huawei", "size": "5.5","__subdocs__":[{"sub_pk":"100","sub_field1":"200","sub_field2":["100","200","300"]},{"sub_pk":"200","sub_field1":"200","sub_field2":["100","200","300"]}]}
Each record includes a CMD field ("add" or "delete") and the document fields. Multi-value fields use JSON arrays. Sub-documents appear under the __subdocs__ key as an array of objects.
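One way to produce records in this shape is with Python's standard `json` module, which writes each record on a single line by default and escapes any line feeds inside string values. This is a sketch under assumptions: `to_json_record` is a hypothetical helper, and values are passed through unchanged, so quoting numbers as strings as the sample does is left to the caller.

```python
import json


def to_json_record(cmd: str, fields: dict, subdocs: list = None) -> str:
    """Build one single-line JSON record with a CMD field.

    Multi-value fields are passed as lists; sub-documents, if any,
    are stored under the __subdocs__ key as a list of objects.
    """
    if cmd not in ("add", "delete"):
        raise ValueError("CMD must be 'add' or 'delete'")
    record = dict(fields)
    record["CMD"] = cmd
    if subdocs:
        record["__subdocs__"] = list(subdocs)
    # Without an indent argument, json.dumps emits no raw line feeds.
    return json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    lines = [
        to_json_record("add", {"nid": "1", "desc": ["str1", "str2", "str3"]},
                       [{"sub_pk": "100", "sub_field1": "200"}]),
        to_json_record("delete", {"nid": "2"}),
    ]
    # One record per line, UTF-8 encoded when written to a file.
    print("\n".join(lines))
```

`ensure_ascii=False` keeps non-ASCII text readable in the file. The file itself still has to be written with UTF-8 encoding, for example `open(path, "w", encoding="utf-8")`.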