
OpenSearch:MaxCompute data source

Last Updated:Apr 01, 2026

Use a MaxCompute table as a data source in OpenSearch Retrieval Engine Edition to build full-text search indexes over your data warehouse data. When you enable Automatic Reindexing and configure a done table, OpenSearch can automatically rebuild the index each time it detects a new semaphore in the done table.

Prerequisites

Before you begin, make sure that:

  • You are familiar with MaxCompute (formerly known as ODPS). For background, see What is MaxCompute?

  • The MaxCompute table is a partitioned internal table. External tables are not supported.

  • The table fields use only the following data types: STRING, BOOLEAN, DOUBLE, BIGINT, and DATETIME.

  • The account you use to log on to the OpenSearch console has DESCRIBE, SELECT, and DOWNLOAD permissions on the MaxCompute table, and LABEL permission on the table fields.

To grant the required permissions, run the following statements in MaxCompute:

-- Add the account.
add user ****@aliyun.com;

-- Grant table-level permissions.
GRANT describe,select,download ON TABLE table_xxx TO USER ****@aliyun.com;
GRANT describe,select,download ON TABLE table_xxx_done TO USER ****@aliyun.com;

-- Grant LABEL permissions.
-- Option 1: Grant permissions on all fields in the project.
SET LABEL 3 to USER ****@aliyun.com;

-- Option 2: Grant permissions on specific fields in a table.
GRANT LABEL 3 ON TABLE table_xxx(col1, col2) TO USER ****@aliyun.com;
Important

If field permission verification is enabled on your MaxCompute table, you must grant LABEL permissions on all fields in the table. Otherwise, OpenSearch cannot pull data and index creation fails.

For the CREATE TABLE statement used to build an index from a MaxCompute data source, see CREATE TABLE statement for creating a table in a MaxCompute data source.
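The data type prerequisite can be checked before you add the data source. The following is a minimal sketch, not part of OpenSearch or MaxCompute: the function name `unsupported_columns` and the example schema are hypothetical, and the column-to-type mapping is assumed to come from wherever you keep your table definition.

```python
from typing import Dict

# Field types listed as supported in the prerequisites of this topic.
SUPPORTED_TYPES = {"STRING", "BOOLEAN", "DOUBLE", "BIGINT", "DATETIME"}


def unsupported_columns(schema: Dict[str, str]) -> Dict[str, str]:
    """Return the columns whose declared type is not in SUPPORTED_TYPES.

    `schema` maps column names to MaxCompute type names (case-insensitive).
    """
    return {
        name: col_type
        for name, col_type in schema.items()
        if col_type.strip().upper() not in SUPPORTED_TYPES
    }


if __name__ == "__main__":
    # Hypothetical schema, for illustration only.
    example = {"order_id": "bigint", "title": "string", "price": "decimal(10,2)"}
    print(unsupported_columns(example))  # {'price': 'decimal(10,2)'}
```

An empty result means every listed column uses one of the supported types.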

How data sync works

MaxCompute data sources support two sync modes, which are typically used together:

| Mode | How it works | When to use |
|------|--------------|-------------|
| Full indexing | OpenSearch reads the entire MaxCompute table and rebuilds the index | Initial setup; periodic full refreshes triggered by the done table |
| Incremental sync | Real-time updates via an API data source | After full indexing, to keep the index current with row-level changes |

This topic covers full indexing. To set up incremental sync, use an API data source alongside your MaxCompute data source.

Add a MaxCompute data source

  1. Log on to the OpenSearch console. In the upper-left corner, select OpenSearch Retrieval Engine Edition.

  2. On the Instances page, find your instance and click Manage in the Actions column.

  3. In the left-side navigation pane, choose Configuration Center > Data Source, then click Add Data Source.

  4. In the panel that appears, select MaxCompute as the data source type and configure the parameters.

    | Parameter | Description | Example |
    |-----------|-------------|---------|
    | Data Source Name | Name of the data source. Format: InstanceName_CustomName. Cannot be changed after creation. | myinstance_orders |
    | Project | The MaxCompute project that contains your table. | my_project |
    | AccessKey | The AccessKey ID of the account. | LTAI5tXxx |
    | AccessKey Secret | The AccessKey secret of the account. | |
    | Table | The MaxCompute table to use as the data source. Must be a partitioned internal table. | order_records |
    | Partition Key | The partition key of the table. Use the yyyymmddhh format (for example, 2022011314) for hourly partitions to trigger multiple full indexing tasks per day. | ds |
    | Automatic Reindexing | When enabled, OpenSearch automatically rebuilds indexes each time a change is detected in the data source. Requires a done table. See Configure automatic reindexing. | |
  5. Click Verify. After the configuration passes verification, click OK.

  6. Configure an index schema to create an index table for this data source. For details, see the Add an index table section of the index schema topic.

  7. Update configurations and trigger reindexing to make the data source available to online clusters. For details, see Update configurations.
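The Partition Key description mentions the yyyymmddhh format for hourly partitions. The following is a small sketch of how a scheduling script might derive partition values. The helper name `partition_value` is hypothetical, and the daily yyyymmdd format is inferred from the ds=20220114 example used in this topic.

```python
from datetime import datetime


def partition_value(ts: datetime, hourly: bool = False) -> str:
    """Format a timestamp as a partition value.

    Daily partitions use yyyymmdd; hourly partitions use yyyymmddhh.
    """
    return ts.strftime("%Y%m%d%H" if hourly else "%Y%m%d")


if __name__ == "__main__":
    print(partition_value(datetime(2022, 1, 14)))                   # 20220114
    print(partition_value(datetime(2022, 1, 13, 14), hourly=True))  # 2022011314
```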

Configure automatic reindexing

When automatic reindexing is enabled, OpenSearch watches a done table in MaxCompute and rebuilds the index each time a new partition appears in that table. The done table acts as a signal: you insert a record into it to tell OpenSearch that new data is ready.

Scenario: Your MaxCompute table mytable is partitioned by ds and receives a new daily partition containing the full dataset. Each day, after the new partition is ready, you want OpenSearch to automatically pick it up and rebuild the index.

Done table requirements:

  • Name: {data_table_name}_done (for example, if the data table is mytable, the done table is mytable_done)

  • Partition key: must match the partition key of the data table (for example, ds)

  • Schema: exactly one field named attribute of type STRING

  • The partition you add to the done table must already exist in the data table

Follow these steps to set up the done table and trigger automatic reindexing:

Step 1: Enable automatic reindexing when adding the data source. See Add a MaxCompute data source.

Step 2: Create the done table in MaxCompute:

create table mytable_done (attribute string) partitioned by (ds string);

After creation, both tables are visible in MaxCompute:

odps:sql:xxx> show tables;
ALIYUN$****@aliyun.com:mytable          -- The data table
ALIYUN$****@aliyun.com:mytable_done     -- The done table

Step 3: Signal OpenSearch to reindex after each new partition is ready. When partition ds=20220114 is generated in mytable, run:

-- Add the partition to the done table.
alter table mytable_done add if not exists partition (ds="20220114");

-- Insert the semaphore to trigger automatic full data synchronization.
insert into table mytable_done partition (ds="20220114") select '{"swift_start_timestamp":1642003200}';

The swift_start_timestamp value is a Unix timestamp that specifies the start offset for real-time incremental synchronization.

After the insert, the done table contains:

odps:sql:xxx> select * from mytable_done where ds=20220114 limit 1;
+--------------------------------------+----------+
| attribute                            | ds       |
+--------------------------------------+----------+
| {"swift_start_timestamp":1642003200} | 20220114 |
+--------------------------------------+----------+

OpenSearch scans the done table, detects the new semaphore, and automatically starts a reindexing task.
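If a scheduled job produces the daily partition, the two statements in Step 3 can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the helper names `semaphore` and `done_statements` are hypothetical, the generated SQL simply mirrors the statements shown above, and how you submit the statements to MaxCompute is left to your own tooling.

```python
import json
from typing import List


def semaphore(start_timestamp: int) -> str:
    """Build the JSON value stored in the done table's attribute field."""
    # Compact separators reproduce the exact form shown in this topic.
    return json.dumps(
        {"swift_start_timestamp": int(start_timestamp)}, separators=(",", ":")
    )


def done_statements(
    data_table: str, partition_key: str, partition: str, start_timestamp: int
) -> List[str]:
    """Return the SQL statements that signal a new partition to OpenSearch.

    Identifiers are interpolated as-is, so pass only trusted table and
    partition names.
    """
    done_table = f"{data_table}_done"
    spec = f'{partition_key}="{partition}"'
    return [
        f"alter table {done_table} add if not exists partition ({spec});",
        f"insert into table {done_table} partition ({spec}) "
        f"select '{semaphore(start_timestamp)}';",
    ]


if __name__ == "__main__":
    for statement in done_statements("mytable", "ds", "20220114", 1642003200):
        print(statement)
```

Run the generated statements only after the matching partition exists in the data table, as the done table requirements state.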

Important

The attribute field value must be a JSON string in the format {"swift_start_timestamp":<unix_timestamp>}.
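A malformed attribute value is easy to produce by accident, so a job can validate the string before inserting it. The following is a hedged sketch: the function name `is_valid_semaphore` is hypothetical, and the checks (a JSON object with a non-negative integer swift_start_timestamp) are an interpretation of the format above, not a documented OpenSearch validation rule.

```python
import json


def is_valid_semaphore(value: str) -> bool:
    """Check that value looks like {"swift_start_timestamp":<unix_timestamp>}."""
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        # Not a string, or not parseable as JSON.
        return False
    if not isinstance(parsed, dict):
        return False
    ts = parsed.get("swift_start_timestamp")
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(ts, int) and not isinstance(ts, bool) and ts >= 0


if __name__ == "__main__":
    print(is_valid_semaphore('{"swift_start_timestamp":1642003200}'))  # True
    print(is_valid_semaphore('{"swift_start_timestamp":"abc"}'))       # False
```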

Modify a MaxCompute data source

  1. On the Data Source page, find the data source and click Modify in the Actions column.

  2. In the Modify Data Source panel, update the parameters you want to change: Project, AccessKey, AccessKey Secret, Table, or Partition Key.

    The data source name cannot be changed.
  3. Click Verify. After the modified configuration passes verification, click OK.

  4. Update configurations and trigger reindexing to apply the changes to online clusters. For details, see Update configurations.

Delete a MaxCompute data source

  1. On the Data Source page, find the data source and click Delete in the Actions column.

  2. The system checks whether the data source is referenced by an index table:

    • Not referenced: Click OK to delete. Then update configurations and rebuild indexes.

    • Referenced: The system returns an error. Delete the referencing index table first, then delete the data source. For details, see the Delete an index table section of the index schema topic.

Limitations

  • The MaxCompute table must be a partitioned internal table. External tables are not supported.

  • Data source names cannot be changed after creation.

  • Supported field data types: STRING, BOOLEAN, DOUBLE, BIGINT, and DATETIME.

  • Full indexing uses MaxCompute table data directly. For real-time incremental updates, use an API data source alongside the MaxCompute data source.
