FAQ about service use - OpenSearch

This topic lists frequently asked questions about using OpenSearch.

System

Q: What is OpenSearch?

OpenSearch is a cloud search service tailored for structured data. You can use OpenSearch to easily create a high-quality, scalable, and customizable search service without needing to manage the underlying technical details. You only need to configure the settings and upload your data. Then, you can retrieve search results using an API. We also provide software development kits (SDKs) for common programming languages. To view the supported languages, go to the Download Hub on the management interface.

Q: What are the benefits of OpenSearch?

OpenSearch is a highly scalable, cloud-based search service that automatically expands hardware resources as your data grows. This lets you create a search service with no hardware investment. The search feature does not add any load to your existing servers. Fast, high-quality search results can better meet the information needs of your users. This improves user engagement and increases traffic and popularity for your product. Ultimately, this can provide significant economic value.

Q: How many search requests can I send?

There is no fixed limit. You configure the peak queries per second (QPS) in the console as needed. Requests for a very high QPS require manual review, so submit them as early as possible. For a very high QPS, the system must make internal adjustments, which may take several days.

Q: What is the latency of a search request?

Latency varies by workload. It is heavily influenced by the complexity of the query and the number of matching documents. Test with your own data to determine the latency for your use case.

Q: How is OpenSearch billed?

Two billing methods are available after the billing model update: the legacy method, based on storage and QPS, and the new method, based on LCU. For more information, see Billing overview for standard OpenSearch instances. Note: If you use the legacy billing method, the LCU consumption shown in monitoring does not affect your bill. Use it only as a reference for search performance.

Flow

Q: What is an application and how do I create one?

An application is a collection of searchable documents that share the same application schema. Within an application, you can define data tables, field types, and search properties. You can also upload data and retrieve search results. You can create, manage, and delete applications from the console or using a web API.

Q: What is the purpose of 'index to' when defining an application schema?

When you use a `query` clause to search for a keyword, you must specify the index to search. This is the field that you define in the 'index to' property of the application schema. An index field contains one or more source fields. When you define an index field, the DPI engine builds an inverted index that maps search queries to documents. This allows the system to quickly locate documents based on search queries, which significantly improves query performance.

For example, a forum has two search requirements: (1) a combined search of the title, body, and author, and (2) a search of only the title. The source fields are `title`, `body`, and `author`. You can index `title`, `body`, and `author` to a `default` field, and index `title` to a `title_search` field. This way, `query=default:'keyword'` fulfills the first requirement, and `query=title_search:'keyword'` fulfills the second requirement.
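The forum example can be sketched as a toy inverted index. This is only an illustration of the idea, not how the engine is implemented: the `INDEX_TO` mapping, the helper names, the sample documents, and the whitespace tokenizer are all assumptions made for the sketch.

```python
from collections import defaultdict

# Hypothetical 'index to' mapping: index field -> source fields it covers.
INDEX_TO = {
    "default": ["title", "body", "author"],
    "title_search": ["title"],
}

def build_inverted_index(docs):
    """Map (index_field, term) -> set of document ids."""
    inverted = defaultdict(set)
    for doc_id, doc in docs.items():
        for index_field, sources in INDEX_TO.items():
            for source in sources:
                # Naive whitespace tokenization; a real analyzer does more.
                for term in doc.get(source, "").lower().split():
                    inverted[(index_field, term)].add(doc_id)
    return inverted

def search(inverted, index_field, keyword):
    """Return sorted ids of documents whose index field contains the keyword."""
    return sorted(inverted.get((index_field, keyword.lower()), set()))

docs = {
    1: {"title": "Search tips", "body": "How to tune ranking", "author": "alice"},
    2: {"title": "Ranking basics", "body": "Search relevance explained", "author": "bob"},
}
inverted = build_inverted_index(docs)
```

Here `search(inverted, "default", "search")` looks at all three source fields, while `search(inverted, "title_search", "search")` only considers titles.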

Q: What is a template?

A template is an application container created for typical internet data to reduce your workload. It includes an application schema, sorting methods, and other configurations. You can view details about the available templates when you create an application. More templates will be added in the future.

Q: Can I modify the application schema of a template after selecting it?

Yes, you can. To do so, go to Application Management > Details > Offline Change. For specific steps, see Offline change.

Q: How do I upload data to an application?

If you use RDS, MaxCompute (formerly ODPS), or PolarDB, you can configure automatic synchronization by providing the instance information in the console. Other users can upload data using the data upload API or the upload feature in the management interface.

Q: How many documents can I upload?

There is no upper limit on the number of documents that you can upload to an application. You can configure the document capacity quota in the console. For more information, see the Quota and Billing section in Application Management. Similar to peak QPS, requests for a large capacity require manual review. This quota affects your billing, so you should configure it as needed.

Q: How do I delete an application?

To delete an application, click the name of the application in the console. On the details page, click Delete Application and confirm the action in the prompt.

Q: How do I delete documents from an application?

To delete a specific document, set the operation to `delete` in the data that you upload using the SDK. The Search Test page in the console also provides a feature to delete specific documents.
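As a rough sketch, a delete request body might be assembled as follows. The key names `cmd` and `fields` and the primary key name `id` are assumptions made for illustration; check the data processing section of the API documentation for the exact format that your SDK version expects.

```python
import json

def build_delete_payload(doc_ids, pk_field="id"):
    """Build a JSON upload body that marks each listed document for deletion.

    "cmd", "fields", and the default primary key name "id" are assumed names.
    """
    commands = [
        {"cmd": "delete", "fields": {pk_field: str(doc_id)}}
        for doc_id in doc_ids
    ]
    return json.dumps(commands)

payload = build_delete_payload([7, 8])
```

Only the primary key is included, because a delete operation needs to identify the document and nothing else.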

Q: How do I clear an application?

To keep the application name, you can delete the documents in the application one by one to clear the index. If you do not need to keep the application name, you can create a new application and copy the configuration to it. A separate data clearing feature will be provided in the future. The Premium Edition now supports scheduled cleanup tasks to retain documents for 7 to 180 days.

Data import

Q: Can OpenSearch be used with Alibaba Cloud database services like RDS?

Yes, it can. You can configure the RDS instance information in the application's data source settings. This enables automatic synchronization of RDS operations to the OpenSearch system. After you configure the data source, import the data and rebuild the index so that all existing data is loaded into the system.

Q: Which Alibaba Cloud products does OpenSearch directly integrate with?

Currently, OpenSearch supports RDS, MaxCompute (formerly ODPS), and PolarDB. More products will be integrated in the future.

Q: What is the document format for uploads using the API or SDK?

OpenSearch currently supports the JSON format. For more information, see the sample file on the template page or the data processing section in the API documentation.

Q: What is the difference between the 'add' and 'update' commands?

The two commands differ in how they treat fields that you omit from the uploaded document. The `add` command overwrites the omitted fields with default values, while the `update` command leaves their existing values unchanged.
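The difference can be illustrated with a small in-memory simulation. This models only the behavior described above; the `DEFAULTS` values and the `apply_command` helper are hypothetical and are not part of the OpenSearch API.

```python
# Hypothetical schema defaults; real defaults depend on the field types.
DEFAULTS = {"title": "", "views": 0, "author": ""}

def apply_command(store, cmd, doc_id, fields):
    """Apply an 'add' or 'update' command to an in-memory document store."""
    if cmd == "add":
        # Fields missing from the upload are overwritten with defaults.
        store[doc_id] = {**DEFAULTS, **fields}
    elif cmd == "update":
        # Fields missing from the upload keep their current values.
        # This sketch assumes the document already exists.
        store[doc_id].update(fields)
    else:
        raise ValueError(f"unsupported command: {cmd}")
    return store[doc_id]

store = {}
apply_command(store, "add", 1, {"title": "A", "views": 10, "author": "alice"})
after_update = dict(apply_command(store, "update", 1, {"views": 11}))
after_add = dict(apply_command(store, "add", 1, {"views": 12}))
```

After the `update`, the title and author are preserved; after the second `add`, they fall back to the defaults.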

Search

Q: What search features does OpenSearch provide?

OpenSearch provides search for primitive data types such as text and numbers. It also offers features such as query, filtering, sorting, statistics, and aggregation. Other features include typical data templates, custom index schemas, custom search result sorting, custom query analysis (such as synonyms and error correction), and drop-down suggestions.

Q: How do I retrieve all documents?

This feature is not supported. A search engine is designed to return the best results in the shortest possible time. Therefore, a feature to view all documents is not provided. OpenSearch has a limit on the maximum number of results that can be returned (see System Limits). Pagination is also limited to this total number of available results.

Q: Why is an Array type returned as a string instead of an array in the search results?

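In search results, the elements of an Array field are returned as a single string in which the elements are separated by a tab character (\t). To get the individual elements back, split the string on the tab character.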
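On the client side, a tab-separated value can be turned back into a list with a small helper. The function name and the sample value are illustrative only.

```python
def parse_array_field(raw):
    """Split a tab-separated Array value from a search result into a list."""
    # An empty string is treated as an empty array rather than [""].
    return raw.split("\t") if raw else []

tags = parse_array_field("red\tgreen\tblue")
```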

Q: Can I specify my own tokenization method and dictionary in OpenSearch?

Tokenization is used in two places: when you build an index and when you perform a query. The current dictionaries are configured globally for the entire system. User-defined dictionaries are not supported. However, OpenSearch supports multiple tokenization methods, including custom tokenization. For more information, see "Field types and analyzer types" in the User Guide.

Q: My document contains 'run fast', but a search for 'run fast now' returns no results. Why?

This is because OpenSearch requires all terms from the tokenized query to be present in a document for that document to be retrieved. The query `query=default:'run fast now'` is equivalent to `query=default:'run' AND default:'fast' AND default:'now'`. Because the document does not contain the term 'now', it is not retrieved.

In this case, you can configure the 'word weight' feature in query analysis to apply a RANK operation to unimportant words. This resolves retrieval issues for long-tail queries. For example, a query such as 'Have you eaten?' is automatically rewritten as `query=default:'eaten' RANK default:'Have' RANK default:'you'`. This ensures that documents that match the full query are retrieved and ranked higher than documents that only match the primary term, 'eaten'.
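The following sketch contrasts the two behaviors: strict AND matching, and retrieval on required terms with RANK terms used only for ordering. It is a simplified model with whitespace tokenization and a hit-count score, both of which are assumptions; the actual tokenization and scoring in OpenSearch are more involved.

```python
def tokenize(text):
    """Naive lowercase whitespace tokenizer, used for illustration only."""
    return text.lower().split()

def matches_all(query, document):
    """AND semantics: every query term must appear in the document."""
    doc_terms = set(tokenize(document))
    return all(term in doc_terms for term in tokenize(query))

def rank_search(required, optional, documents):
    """Retrieve documents containing all required terms.

    Optional (RANK) terms do not affect retrieval; they only raise the score.
    """
    hits = []
    for doc in documents:
        terms = set(tokenize(doc))
        if all(term in terms for term in required):
            score = sum(1 for term in optional if term in terms)
            hits.append((score, doc))
    # Stable sort: higher score first, original order kept for ties.
    hits.sort(key=lambda hit: -hit[0])
    return [doc for _, doc in hits]

ranked = rank_search(
    ["eaten"], ["have", "you"],
    ["i have eaten", "have you eaten", "eaten"],
)
```

With AND semantics, 'run fast now' does not match a document that contains only 'run fast'. With the RANK rewrite, all three sample documents are retrieved because they contain 'eaten', and the one that matches the full query comes first.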

Q: I searched for 'mx' and 'player', but my document containing 'mxplayer' was not retrieved. Why?

For English, the minimum tokenization granularity is at the word level. The term 'mxplayer' in the document is treated as a single word. Therefore, searches for 'mx', 'player', or 'mx player' cannot retrieve the document that contains 'mxplayer'.
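A word-level tokenizer makes the effect easy to see. This is a minimal sketch that assumes a simple alphanumeric tokenizer, not the analyzer that OpenSearch actually uses.

```python
import re

def word_tokens(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrievable(query, document):
    """True if every query word is a whole word in the document."""
    doc_terms = set(word_tokens(document))
    return all(term in doc_terms for term in word_tokens(query))

doc = "Install mxplayer today"
```

Because 'mxplayer' is indexed as one token, `retrievable("mx player", doc)` is false, while `retrievable("mxplayer", doc)` is true.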

Q: Does OpenSearch support search in less common languages?

The system currently supports semantic tokenization only for Chinese and English. Other languages that use spaces as delimiters, such as Portuguese and Russian, are supported for basic word matching but not for semantic analysis, such as phrase matching. Languages that do not use spaces as delimiters, such as Japanese and Korean, are not supported. Analyzers for Thai and Vietnamese are now available. For more information, see Text analyzers.

Q: In search results, how can I show only the single most relevant product from each member, and then display the total count?

You can use a combination of the `Aggregate` clause and the `Distinct` clause. The `Distinct` clause can be used to diversify results from the same category. The `Aggregate` clause can be used to provide statistics for results from the same category.
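Conceptually, the combination behaves like the following client-side sketch: keep the best-scoring result per member, and count all matches per member. The field names `member_id`, `product`, and `score` are hypothetical, and the real clauses run inside the engine rather than in application code.

```python
from collections import Counter

def distinct_and_aggregate(results, key="member_id"):
    """Return (top result per key, match count per key)."""
    # Aggregate: count every matching result for each member.
    counts = Counter(result[key] for result in results)
    # Distinct: keep only the highest-scoring result for each member.
    seen = set()
    top = []
    for result in sorted(results, key=lambda r: r["score"], reverse=True):
        if result[key] not in seen:
            seen.add(result[key])
            top.append(result)
    return top, dict(counts)

results = [
    {"member_id": "m1", "product": "a", "score": 0.9},
    {"member_id": "m1", "product": "b", "score": 0.5},
    {"member_id": "m2", "product": "c", "score": 0.7},
]
top, counts = distinct_and_aggregate(results)
```

The `top` list holds one product per member in relevance order, and `counts` gives the total number of matching products for each member.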

Q: Is there a caching mechanism for search?

Results for the same query are cached for 5 minutes. Caching is enabled by default and cannot be modified or disabled.

API and SDK

Q: What is the endpoint address for pushing data with the SDK?

OpenSearch is deployed in multiple regions, and each region has a unique API endpoint. You can find the endpoint in the application details.

Q: The SDK returned 'OK' after I uploaded data, but I see errors on the page. How can I get the error messages?

When the SDK returns a status of 'OK', this only means that the system has received the data. Errors that occur during subsequent data processing are displayed in the error log for the application in the console. If data is uploaded successfully but is not searchable, first check the information in the error log. You can then correct the data and import it again. An API to retrieve error log information is not currently available.

If the issue persists, submit a ticket.