Getting started - OpenSearch

Prerequisites

1. An Alibaba Cloud account has been created and has passed real-name verification.

2. An AccessKey pair has been created. The first time you log on to the console with a new Alibaba Cloud account, the system prompts you to create an AccessKey pair before you can perform other operations.

An AccessKey pair is required to create and use an OpenSearch application, so you must create one for your Alibaba Cloud account.
After that, you can also create an AccessKey pair for a RAM user and access the application as that RAM user. For more information about how to grant permissions to RAM users, see RAM authorization.

3. A virtual private cloud (VPC) is available. For more information, see What is a VPC?

Note

If you want to access a Retrieval Engine Edition instance as a RAM user, use your Alibaba Cloud account to grant the AliyunSearchEngineFullAccess and AliyunSearchEngineReadOnlyAccess permissions to the RAM user.

Purchase an instance

Log on to the OpenSearch console. In the upper-left corner, switch to OpenSearch Retrieval Engine Edition.

In the left-side navigation pane, click Instance Management. On the page that appears, click Create Instance.

Set Service Edition to Retrieval Engine Edition. Select a region and configure the Query Node Quantity, Query Node Type, Data Node Quantity, Data Node Type, VPC, and vSwitch parameters. Set the username and password of the instance as prompted. This password is used to verify query permissions and is separate from the password of your Alibaba Cloud account. Then, click Buy Now.

Note

Specify the number and specifications of Query Result Searcher (QRS) workers (query nodes) and Searcher workers (data nodes) based on your business requirements. After you specify them, the fee for the selected configuration is displayed on the buy page.

The VPC and vSwitch that you specify must be the same as those of the Elastic Compute Service (ECS) instance from which you access the Retrieval Engine Edition instance. Otherwise, the error {'errors':{'code':'403','message':'Forbidden'}} is returned when you access the Retrieval Engine Edition instance.
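If a request from an ECS instance returns the 403 error shown above, a quick client-side check can help confirm whether a network mismatch is the likely cause. The following Python sketch is illustrative only: the dictionary keys vpc_id and vswitch_id are hypothetical, and the response body is assumed to look like the error string in this topic. Because that string uses single quotes and is therefore not valid JSON, the sketch falls back to ast.literal_eval when json.loads fails.

```python
import ast
import json


def is_forbidden_error(body: str) -> bool:
    """Return True if body looks like {'errors': {'code': '403', 'message': 'Forbidden'}}."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # The error string in this topic uses single quotes, which is not valid JSON,
        # so fall back to parsing it as a Python literal.
        try:
            payload = ast.literal_eval(body)
        except (ValueError, SyntaxError, TypeError):
            return False
    errors = payload.get("errors") if isinstance(payload, dict) else None
    if not isinstance(errors, dict):
        return False
    return str(errors.get("code")) == "403" and errors.get("message") == "Forbidden"


def same_network(ecs: dict, search_instance: dict) -> bool:
    """Hypothetical check: both sides must use the same VPC and the same vSwitch."""
    return (
        ecs["vpc_id"] == search_instance["vpc_id"]
        and ecs["vswitch_id"] == search_instance["vswitch_id"]
    )
```

If is_forbidden_error returns True and same_network returns False for the IDs you copied from the ECS and OpenSearch consoles, the network settings are the first thing to correct.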

On the Confirm Order page, review the configurations and the service agreement, and then click Activate Now.

After you purchase the instance, click Console. On the Instance Management page, you can view the purchased instance.

By default, the instance name is generated automatically. To rename the instance, click Manage in the Actions column to go to the instance details page.

Click the Modify icon, enter a new name as prompted, and then click OK.

Configure a cluster

On the Instance Management page, the purchased instance is in the Pending state. An empty cluster is automatically deployed for the instance, using the numbers and specifications of QRS workers and Searcher workers that you specified at purchase. Before you can use the search service, you must configure the data source, the index schema, and reindexing.

1. Configure the data source. MaxCompute data sources and API data sources are supported. For more information, see MaxCompute data source and API data source. In this example, a MaxCompute data source is used. To configure a MaxCompute data source, perform the following operations:

Click Add Data Source. In the panel that appears, set the data source type to MaxCompute.
Configure the Project, AccessKey ID, AccessKey Secret, Table, and Partition Key parameters.
Select Yes for Automatic Reindexing.

After the verification passes, click OK to add the data source.
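The AccessKey pair entered in this panel grants access to your MaxCompute project, so it is safer to keep it out of scripts and notes. The following Python sketch gathers the panel values into a plain object and reports empty fields before you copy them into the console. The class, its attribute names, and the environment variable names ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET are assumptions for illustration; the console does not read this object, and treating every field as required is also an assumption.

```python
import os
from dataclasses import dataclass


@dataclass
class MaxComputeSourceConfig:
    """Hypothetical mirror of the Add Data Source panel; not an OpenSearch SDK class."""
    project: str
    access_key_id: str
    access_key_secret: str
    table: str
    partition_key: str
    automatic_reindexing: bool = True  # corresponds to selecting Yes for Automatic Reindexing

    def missing_fields(self) -> list:
        """List the panel fields that are still empty."""
        names = ("project", "access_key_id", "access_key_secret", "table", "partition_key")
        return [name for name in names if not getattr(self, name)]


def config_from_env(project: str, table: str, partition_key: str) -> MaxComputeSourceConfig:
    # ASSUMPTION: credentials are kept in these environment variables instead of in code.
    return MaxComputeSourceConfig(
        project=project,
        access_key_id=os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID", ""),
        access_key_secret=os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET", ""),
        table=table,
        partition_key=partition_key,
    )
```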


2. Configure the index schema. After the data source is configured, click Next to go to the index schema step.

2.1. If no index schema version exists yet, you are prompted to add an index table. Click Add Index Table.

2.2. Configure an index table.

Index Table: Enter a custom name.
Data Source: Select the data source that you configured in Step 1.
Data Shards: Enter a value based on the number of Searcher workers that you purchased.

2.3. Configure fields. You must specify at least two fields: a primary key field and a vector field. The vector field must be of the multi-value FLOAT data type.

If you want to configure a vector index with categories, you can add a category field between the primary key field and the vector field. The category field must be a single-value or multi-value field of the INTEGER data type.

Specify whether to compress attribute fields and field data:

Attribute fields: By default, attribute fields are not compressed. If file_compressor is selected for an attribute field, the attribute field is compressed.
Field data: By default, field data is not compressed. For multi-value fields or fields of the STRING type, uniq is selected by default. For single-value fields, equal is selected by default.
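The field data defaults above can be read as a small decision rule. This Python sketch encodes only those stated rules. The function name is hypothetical, and the sketch assumes that a single-value STRING field follows the STRING rule (uniq) rather than the single-value rule (equal), because the two rules overlap for that case.

```python
def default_field_data_encoding(field_type: str, multi_value: bool) -> str:
    """Default option selected for field data, following the rules listed above."""
    # Multi-value fields and STRING fields default to uniq.
    if multi_value or field_type.upper() == "STRING":
        return "uniq"
    # Other single-value fields default to equal.
    return "equal"
```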

When you configure a vector index, you must specify the fields in the order of the primary key field, category field, and vector field. The category field is optional. The preceding figure shows an example.
The primary key field cannot be compressed.
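The ordering and type rules for the vector index fields can be checked before you fill in the console form. The following Python sketch is a hypothetical pre-check, not part of OpenSearch: FieldSpec and check_vector_index_fields are illustrative names, and the type strings are assumed to match the names shown in the console. It treats the first field as the primary key, the last as the vector field, and a middle field, if present, as the category field.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """Hypothetical description of one field in the index table."""
    name: str
    field_type: str  # for example "STRING", "INTEGER", or "FLOAT"
    multi_value: bool = False
    compressed: bool = False


def check_vector_index_fields(fields: list) -> list:
    """Check the fields used by the vector index: primary key, optional category, vector."""
    if len(fields) not in (2, 3):
        return [f"expected 2 or 3 fields, got {len(fields)}"]
    problems = []
    primary_key, vector = fields[0], fields[-1]
    if primary_key.compressed:
        problems.append(f"primary key field {primary_key.name!r} cannot be compressed")
    # The optional category field must be INTEGER (single-value or multi-value).
    if len(fields) == 3 and fields[1].field_type.upper() != "INTEGER":
        problems.append(f"category field {fields[1].name!r} must be INTEGER")
    if not (vector.multi_value and vector.field_type.upper() == "FLOAT"):
        problems.append(f"vector field {vector.name!r} must be a multi-value FLOAT field")
    return problems
```

An empty list means the order and types match the rules above; a wrong order usually shows up as a complaint about the last field not being a multi-value FLOAT field.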

3. Configure the indexes. You must set the type of the primary key index to PRIMARYKEY64 and the type of the vector index to CUSTOMIZED.

Specify whether to compress index fields:

By default, index fields are not compressed. If file_compressor is selected for an index field, the index field is compressed.

The primary key index cannot be compressed.
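The index type and compression rules above can be expressed as a short table of required types. In this Python sketch, the role names primary_key and vector and the settings keys type and compressor are hypothetical; only the values PRIMARYKEY64, CUSTOMIZED, and file_compressor come from this topic.

```python
REQUIRED_INDEX_TYPES = {"primary_key": "PRIMARYKEY64", "vector": "CUSTOMIZED"}


def check_indexes(indexes: dict) -> list:
    """indexes maps a role to settings, e.g. {"primary_key": {"type": "PRIMARYKEY64"}}."""
    problems = []
    for role, expected_type in REQUIRED_INDEX_TYPES.items():
        settings = indexes.get(role)
        if settings is None:
            problems.append(f"missing {role} index")
        elif settings.get("type") != expected_type:
            problems.append(f"{role} index must be {expected_type}, got {settings.get('type')!r}")
    # file_compressor may be selected for other indexes, but not for the primary key index.
    primary_key = indexes.get("primary_key")
    if primary_key is not None and primary_key.get("compressor") == "file_compressor":
        problems.append("primary key index cannot be compressed")
    return problems
```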

3.1. Specify the fields contained in the vector index.

Important

The primary key field and the vector field are required. The category field is optional and can be left empty.
The vector index can contain only these three fields. You cannot add other fields.

3.2. Configure advanced settings. You must configure the parameters for the vector index. The following figure shows an example. For more information, see Vector indexes.

The following figure shows more parameters.

The following sample code shows how to configure the build_index_params parameter:

{
  "proxima.qc.builder.quantizer_class": "Int8QuantizerConverter",
  "proxima.qc.builder.quantize_by_centroid": true,
  "proxima.qc.builder.optimizer_class": "BruteForceBuilder",
  "proxima.qc.builder.thread_count": 10,
  "proxima.qc.builder.optimizer_params": {
    "proxima.linear.builder.column_major_order": true
  },
  "proxima.qc.builder.store_original_features": false,
  "proxima.qc.builder.train_sample_count": 3000000,
  "proxima.qc.builder.train_sample_ratio": 0.5
}
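If you keep the vector index parameters in code or version control, generating the JSON text avoids a common mistake: Python writes booleans as True and False, while the console expects JSON true and false. The following sketch reproduces the sample values above and serializes them with json.dumps. The sanity checks are assumptions (for example, that train_sample_ratio lies between 0 and 1) and do not replace the validation that the console performs; see Vector indexes for the authoritative parameter descriptions.

```python
import json

# The sample build_index_params values as a Python dict.
build_index_params = {
    "proxima.qc.builder.quantizer_class": "Int8QuantizerConverter",
    "proxima.qc.builder.quantize_by_centroid": True,
    "proxima.qc.builder.optimizer_class": "BruteForceBuilder",
    "proxima.qc.builder.thread_count": 10,
    "proxima.qc.builder.optimizer_params": {
        "proxima.linear.builder.column_major_order": True,
    },
    "proxima.qc.builder.store_original_features": False,
    "proxima.qc.builder.train_sample_count": 3000000,
    "proxima.qc.builder.train_sample_ratio": 0.5,
}


def sanity_check(params: dict) -> list:
    """Hypothetical client-side checks; the console performs its own validation."""
    problems = []
    ratio = params.get("proxima.qc.builder.train_sample_ratio")
    if ratio is not None and not 0 < ratio <= 1:
        problems.append("train_sample_ratio should be in (0, 1]")
    threads = params.get("proxima.qc.builder.thread_count")
    # bool is a subclass of int in Python, so reject it explicitly.
    if threads is not None and (isinstance(threads, bool) or not isinstance(threads, int) or threads < 1):
        problems.append("thread_count should be a positive integer")
    return problems


# JSON text with lowercase true/false, ready to paste into the console.
build_index_params_text = json.dumps(build_index_params, indent=2)
```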

The following sample code shows how to configure the search_index_params parameter:

{
 "proxima.qc.searcher.scan_ratio": 0.01
}
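To reason about the cost of scan_ratio, the sketch below assumes that the searcher scans at most total_docs * scan_ratio documents per query. That interpretation is an assumption, not a statement of how the engine works; confirm it in Vector indexes. Under that assumption, the sample value of 0.01 limits a 1,000,000-document table to about 10,000 scanned documents per query.

```python
import json

search_index_params = {"proxima.qc.searcher.scan_ratio": 0.01}


def estimated_scan_budget(total_docs: int, scan_ratio: float) -> int:
    """ASSUMPTION: at most total_docs * scan_ratio documents are scanned per query (at least 1)."""
    return max(1, int(total_docs * scan_ratio))


# JSON text to paste into the search_index_params field.
search_index_params_text = json.dumps(search_index_params)
```

A larger scan_ratio generally trades query latency for recall, so it is worth testing a few values against your own data on the Query Test page.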

The system provides default values for the vector index parameters. If you have no special requirements, click OK to complete the configuration.

After the configuration is complete, click Save Version. In the dialog box that appears, enter an optional description and click Publish.

After the version is published, click Next to go to the reindexing step.

4. Rebuild the index. Configure the parameters based on your reindexing requirements and click Next.


In the left-side navigation pane, choose O&M Center > Change History and click the Data Source Changes tab to view the reindexing progress. After reindexing is complete, you can perform a query test.

The index table can be queried online as soon as reindexing is complete.

Note

You can select a valid index schema version based on your business requirements.
The advanced settings version is used for custom tokenization.
By default, only one cluster is selected when you configure the settings for the first time.
When you perform reindexing for a MaxCompute data source, you can select a configured data partition based on your business requirements.

View the reindexing progress

After the data source and index schema are configured, you can view the cluster topology on the Deployment Management page.

After the configuration is updated, choose O&M Center > Change History in the left-side navigation pane and view the reindexing progress on the Data Source Changes tab.

On the details page of the instance, you can view the status of the QRS workers and Searcher workers of the cluster. If the status is normal, you can perform a query test.

Perform a query test

Choose Extended Features > Query Test in the left-side navigation pane. On the page that appears, you can perform a basic query test. Both HA3 query clauses and SQL clauses are supported.

Note

For more information, see Query syntax and SQL syntax.
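For a quick vector query on the Query Test page, the HA3 query string can be assembled programmatically. The sketch below assumes the clause layout config=...&&query=... and a vector clause of the form index_name:'v1,v2,...&n=top_n'; verify both against the Query syntax topic before relying on them. The function name and the index name vector_index used in the example are hypothetical.

```python
def vector_query_clause(index_name: str, vector: list, top_n: int = 10, hit: int = 10) -> str:
    """Build an HA3-style query string for a vector index (clause syntax is an assumption)."""
    # Render each vector component as a float, separated by commas.
    values = ",".join(str(float(v)) for v in vector)
    # config clause: number of hits and response format.
    # query clause: index name, quoted vector, and the assumed &n= top-N suffix.
    return f"config=hit:{hit},format:json&&query={index_name}:'{values}&n={top_n}'"
```

For example, vector_query_clause("vector_index", [0.1, 0.2, 0.3], top_n=5) produces config=hit:10,format:json&&query=vector_index:'0.1,0.2,0.3&n=5', which can be pasted into the query box.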

Usage notes

Important

Retrieval Engine Edition supports only the pay-as-you-go billing method.
The username and password that you set when you purchase an instance can be modified on the details page of the instance.
The cluster name is assigned by the system when you purchase an instance and cannot be modified.