Independent phase-1 queries
You can include the no_summary parameter in a config clause to specify whether to run only the phase-1 query. The default value of no_summary is no. If you set no_summary to yes, the system returns the results as soon as the phase-1 query is complete and does not run the phase-2 query.
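As a minimal sketch of how a client might assemble such a request, the following Python helper builds a query string with no_summary:yes in the config clause. The function name and its parameters are hypothetical, and joining several attributes with commas is an assumption. The clause layout follows the sample request in this section.

```python
def build_phase1_query(cluster, query, start=0, hit=10, attributes=()):
    """Assemble a query string that runs only the phase-1 query.

    Hypothetical helper: the clause names come from the sample request,
    but this function is not part of any official SDK.
    """
    # no_summary:yes asks the system to return results after phase 1.
    config = f"config=no_summary:yes,start:{start},hit:{hit}"
    clauses = [config, f"cluster={cluster}", f"query={query}"]
    if attributes:
        # Assumption: multiple attributes are separated by commas.
        clauses.append("attribute=" + ",".join(attributes))
    # Clauses are separated by "&&", as in the sample request.
    return "&&".join(clauses)


if __name__ == "__main__":
    print(build_phase1_query("daogou", "test", start=0, hit=2, attributes=["attr1"]))
```

The string is not URL-encoded here; encode it as your client or gateway requires before you send it.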
Results of independent phase-1 queries
The result of an independent phase-1 query includes the hit parameter and other parameters such as the gid parameter, the pk parameter if you specify a field as the primary key field in the query statement, and the attribute parameter if you specify attributes in the query statement.
The gid parameter has the following format:
gid := Cluster name|Version of full data|Version of incremental data|Hash ID|Document ID|Hash value of the primary key|IP address of the Searcher worker
Sample request
config=no_summary:yes,start:0,hit:2&&cluster=daogou&&query=test&&attribute=attr1
Sample response
<Root>
<TotalTime>0.000</TotalTime>
<SortExprMeta><![CDATA[-RANK]]></SortExprMeta>
<hits numhits="1" totalhits="1" coveredPercent="100.00">
<hit cluster_name="daogou" hash_id="49152" docid="13988568" gid="daogou|1495852466|80|49152|13988568|00000000000000000000000000000000|201042447">
<fields>
</fields>
<property>
</property>
<attribute>
<att1>100</att1>
</attribute>
<sortExprValues>1</sortExprValues>
</hit>
</hits>
<AggregateResults>
</AggregateResults>
<Error>
<ErrorCode>0</ErrorCode>
<ErrorDescription></ErrorDescription>
</Error>
</Root>
Independent phase-2 queries
You can add a fetch_summary clause to a query statement to perform only a phase-2 query. The phase-2 query fetches the summary of the search result. For information about the syntax of the fetch_summary clause, see the fetch_summary clause.
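A client that runs the two phases separately may want to read the gid of each hit from the phase-1 response first. The following sketch parses a response in the sample format shown earlier and splits each gid into the seven components of the gid definition. The dictionary keys and function names are illustrative, not official field names, and the sample XML is a trimmed copy of the sample response.

```python
import xml.etree.ElementTree as ET

# Illustrative key names for the seven "|"-separated gid components.
GID_FIELDS = (
    "cluster_name",
    "full_version",
    "incremental_version",
    "hash_id",
    "doc_id",
    "pk_hash",
    "searcher_ip",
)


def parse_gid(gid):
    """Split a gid string into its named components."""
    parts = gid.split("|")
    if len(parts) != len(GID_FIELDS):
        raise ValueError(f"expected {len(GID_FIELDS)} gid parts, got {len(parts)}")
    return dict(zip(GID_FIELDS, parts))


def extract_gids(response_xml):
    """Return the parsed gid of every <hit> element in a phase-1 response."""
    root = ET.fromstring(response_xml)
    return [parse_gid(hit.get("gid")) for hit in root.iter("hit")]


# Trimmed copy of the sample response.
SAMPLE = """<Root>
<hits numhits="1" totalhits="1" coveredPercent="100.00">
<hit cluster_name="daogou" hash_id="49152" docid="13988568" gid="daogou|1495852466|80|49152|13988568|00000000000000000000000000000000|201042447">
</hit>
</hits>
</Root>"""

if __name__ == "__main__":
    for gid in extract_gids(SAMPLE):
        print(gid["cluster_name"], gid["hash_id"], gid["doc_id"])
```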
Parallel queries
Overview
The parallel query feature is an extension of the query feature. Parallel queries are performed based on graphics architectures. After you enable the parallel query feature, the system splits a query into multiple query processes and uses multiple threads to process the query. This helps reduce the overall query latency. When you write a query statement, you can specify the number of threads that you want to use to perform the query.
The parallel query feature is suitable for scenarios in which seek timeout errors may occur and incomplete search results are returned. You can perform parallel queries in the following scenarios:
Your business uses complex computing logic, including complex filtering, statistical operations, and calculation.
You use a cluster in which computing and storage are decoupled and you frequently perform index dictionary lookup operations and inverted seek operations to access the remote storage.
How to perform a parallel query
Make sure that your Searcher workers are deployed in a multi-core and multi-thread runtime environment.
The parallel query feature of OpenSearch Vector Search Edition supports queries on 2, 4, 8, or 16 parallel threads. You can select the number of threads based on your business requirements.
By default, the parallel query feature is enabled. When you configure Searcher workers, you can specify the supported numbers of threads in the paraSearchWays parameter. For example, you can specify --env paraSearchWays=2,4,8. In this case, the workers can use 2, 4, or 8 parallel threads to perform queries. If you do not specify the paraSearchWays parameter, the default value is used, and each worker supports 2 or 4 parallel threads.
In a query statement, you can specify the default cluster as the cluster on which you want to perform the query and the number of threads that you want to use in the para_search parameter. The name of the default cluster is general. For example: config=cluster:general.para_search_2, .... In this example, para_search_2 specifies that the query is performed on the general cluster by two parallel threads.
You can also specify a custom cluster on which you want to perform the query. For example: config=cluster:daogou.para_search_2, .... In this example, para_search_2 specifies that the query is performed on the daogou cluster by two parallel threads.
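The following sketch shows one way a client could build the cluster part of the config clause and reject thread counts that the feature does not support. The helper name and the SUPPORTED_WAYS constant are hypothetical. The accepted values (2, 4, 8, 16) and the cluster.para_search_N form are taken from the text above.

```python
# Thread counts that the parallel query feature supports.
SUPPORTED_WAYS = (2, 4, 8, 16)


def para_search_config(threads, cluster="general"):
    """Build the cluster part of a config clause for a parallel query.

    Hypothetical helper. The Searcher workers must also list the chosen
    thread count in paraSearchWays; this function cannot check that.
    """
    if threads not in SUPPORTED_WAYS:
        raise ValueError(f"threads must be one of {SUPPORTED_WAYS}, got {threads}")
    return f"config=cluster:{cluster}.para_search_{threads}"


if __name__ == "__main__":
    print(para_search_config(2))            # default cluster "general"
    print(para_search_config(2, "daogou"))  # custom cluster
```

Other config parameters, such as start and hit, would be appended after a comma in the same clause.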
Partition-based queries
Overview
When OpenSearch Vector Search Edition creates an index for a document, the system applies a hash algorithm to the value of a specified field and allocates the document to a partition based on the resulting hash value. Hash values are in the range [0, 65535]. For example, you import documents that describe products to OpenSearch Vector Search Edition and specify the Type field as the hash field. When the index is created, the system reads the value of this field in each document and calculates its hash value. If the value is Women's clothing and its hash value is 8, OpenSearch Vector Search Edition distributes the data that is related to Women's clothing to a specific partition based on the hash value and the number of partitions that you specify. To specify the hash field, set the hash_field parameter to Type in the xxx_cluster.json configuration file. In the configuration file, hash_field takes the name of the field (Type). In a query statement, hash_field takes values of that field, such as Women's clothing.
When you write a query statement, you can set the hash_field parameter to one or more values of the Type field. The system executes the query statement only on the partitions to which those values are hashed.
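To make the routing idea concrete, the following sketch maps field values to partitions. It rests on two assumptions that the text above does not confirm: the stand-in hash function (CRC32 reduced to the [0, 65535] range) is not the product's hash algorithm, and the split of the hash range into equal contiguous slices per partition is only one plausible scheme. All names are hypothetical.

```python
import zlib

HASH_SPACE = 65536  # hash values lie in [0, 65535]


def hash_id(value):
    """Stand-in hash. Assumption: NOT the algorithm the product uses."""
    return zlib.crc32(value.encode("utf-8")) % HASH_SPACE


def partition_for(hash_value, partition_count):
    """Map a hash value to a partition index.

    Assumption: the hash range is cut into partition_count contiguous,
    roughly equal slices, numbered from 0.
    """
    if not 0 <= hash_value < HASH_SPACE:
        raise ValueError("hash value out of range")
    if not 1 <= partition_count <= HASH_SPACE:
        raise ValueError("invalid partition count")
    return hash_value * partition_count // HASH_SPACE


def partitions_for_values(values, partition_count):
    """Return the sorted set of partitions that a query on these values touches."""
    return sorted({partition_for(hash_id(v), partition_count) for v in values})


if __name__ == "__main__":
    # With 4 partitions, hash value 8 falls into the first slice.
    print(partition_for(8, 4))
    print(partitions_for_values(["Women's clothing", "Accessories"], 4))
```

Under these assumptions, a query restricted to a few field values touches at most that many partitions, which is why partition-based queries can reduce the work per request.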
Examples
Query Coat from all partitions of the general cluster.
cluster=general&&query=Coat
Query Coat from the partitions in which Women's clothing and Accessories are stored.
cluster=general:hash_field=Women's clothing|Accessories&&query=Coat
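The two examples above can be generated with a small helper such as the following sketch. The function name is hypothetical. The clause layout, including the "|" separator between hash field values, is copied from the examples.

```python
def build_partition_query(query, hash_values=(), cluster="general"):
    """Build a query string, optionally restricted to certain partitions.

    Hypothetical helper. Without hash_values, the query runs on all
    partitions of the cluster.
    """
    cluster_clause = f"cluster={cluster}"
    if hash_values:
        # Multiple hash field values are separated by "|".
        cluster_clause += ":hash_field=" + "|".join(hash_values)
    return f"{cluster_clause}&&query={query}"


if __name__ == "__main__":
    print(build_partition_query("Coat"))
    print(build_partition_query("Coat", ["Women's clothing", "Accessories"]))
```

The helper does not URL-encode spaces or apostrophes; apply whatever encoding your client requires before you send the request.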