Use a Java SDK code example to perform scroll queries

Regular queries return up to 5,000 documents. When a result set exceeds that limit, use scroll queries to retrieve all matching documents in batches.

Important

Scroll queries are designed for bulk data retrieval, not for real-time user-facing search. Each batch is limited to 500 documents.

Limitations

Scroll queries do not support the aggregate, distinct, or rank clause.
The start parameter in the config clause has no effect on scroll queries. The offset is always 0.
Each scroll result set cannot exceed 500 documents.
Determine whether an error has occurred based on the error code and message, not the status field. For a full list of error codes, see Error codes.
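
The last point above can be illustrated with a minimal sketch. It assumes the response has already been parsed into a hypothetical `Response` type that holds the status field and a list of error entries, each with a code and a message. The type names and the sample error values are placeholders for illustration and are not part of the OpenSearch SDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ScrollErrorCheck {

    /** Hypothetical holder for one error entry (code and message) from a response. */
    static final class ApiError {
        final int code;
        final String message;

        ApiError(int code, String message) {
            this.code = code;
            this.message = message;
        }
    }

    /** Hypothetical parsed response: the status field plus any error entries. */
    static final class Response {
        final String status;
        final List<ApiError> errors;

        Response(String status, List<ApiError> errors) {
            this.status = status;
            this.errors = Collections.unmodifiableList(new ArrayList<>(errors));
        }
    }

    /** A response counts as failed when it carries any error entry, whatever the status says. */
    static boolean hasError(Response response) {
        return !response.errors.isEmpty();
    }

    /** Joins the error entries into one line for logging, in the form "code: message". */
    static String describe(Response response) {
        StringBuilder sb = new StringBuilder();
        for (ApiError e : response.errors) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.code).append(": ").append(e.message);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder values, not real OpenSearch error codes.
        Response r = new Response("OK",
                Collections.singletonList(new ApiError(1000, "example error")));
        if (hasError(r)) {
            System.out.println("Request failed: " + describe(r));
        }
    }
}
```

The check deliberately ignores `status`, so a response that reports errors is treated as failed even if its status field looks healthy.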

Prerequisites

Before you begin, ensure that you have:

An OpenSearch application
A Resource Access Management (RAM) user with the required permissions. See AliyunServiceRoleForOpenSearch and Access authorization rules
An AccessKey pair for the RAM user. See Create an AccessKey pair

Important

Use RAM user credentials instead of your Alibaba Cloud account credentials for API calls. Never embed your AccessKey pair in source code or other publicly accessible materials.

Set environment variables

Set the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables before running the sample code.

Linux and macOS — Run the following commands. Replace <access_key_id> and <access_key_secret> with the AccessKey ID and AccessKey secret of your RAM user.
```
export ALIBABA_CLOUD_ACCESS_KEY_ID=<access_key_id>
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<access_key_secret>
```
Windows — Create an environment variable file, add ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET to the file, set their values to your AccessKey ID and AccessKey secret, and then restart Windows for the changes to take effect.
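
Before running the sample, it can help to confirm that both variables are visible to the Java process. The following is a small sketch of a fail-fast check; the `CredentialCheck` class and `requireEnv` helper are hypothetical names introduced here, not part of the SDK.

```java
import java.util.Map;

final class CredentialCheck {

    /** Returns the value of the named variable, or throws if it is missing or blank. */
    static String requireEnv(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        try {
            requireEnv(env, "ALIBABA_CLOUD_ACCESS_KEY_ID");
            requireEnv(env, "ALIBABA_CLOUD_ACCESS_KEY_SECRET");
            // Only report presence; never print the credential values themselves.
            System.out.println("Both credential variables are set.");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Passing the environment in as a map keeps the helper easy to exercise without changing the real process environment.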

Implement scroll queries using OpenSearch SDK for Java V3.1

How it works

Submit a search query with a DeepPaging object attached. The first response returns a scroll ID, which you need in order to retrieve document data.
Read the scroll ID from each response and pass it to the next request to fetch the next batch.
Repeat until the response returns an empty result set.

The scroll expire setting (for example, 3m) specifies the validity period for the scroll ID used by the next scroll query. Default value: 1m. If you do not want to use the default value, you must set a validity period each time before you run a scroll query.
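
The control flow described above can be sketched independently of the SDK. In this illustration, `Page`, `ScrollSource`, and the in-memory `fake` backend are hypothetical stand-ins for the real request and response objects. The fake backend returns only a scroll ID on the initial call, as the comments in the sample code describe, but the loop also keeps any documents that an initial response might carry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ScrollLoopSketch {

    /** Hypothetical view of one scroll response: the scroll ID plus the documents in the batch. */
    static final class Page {
        final String scrollId;
        final List<String> docs;

        Page(String scrollId, List<String> docs) {
            this.scrollId = scrollId;
            this.docs = docs;
        }
    }

    /** Hypothetical abstraction over "run the query with this scroll ID" (null on the first call). */
    interface ScrollSource {
        Page fetch(String scrollId);
    }

    /** Collects every document by following scroll IDs until a follow-up batch is empty. */
    static List<String> fetchAll(ScrollSource source) {
        List<String> all = new ArrayList<>();
        Page page = source.fetch(null);          // initial query: no scroll ID yet
        all.addAll(page.docs);                   // keep any documents it happens to carry
        while (true) {
            page = source.fetch(page.scrollId);  // follow-up query with the latest scroll ID
            if (page.docs.isEmpty()) {
                break;                           // an empty batch means everything was read
            }
            all.addAll(page.docs);
        }
        return all;
    }

    /** In-memory stand-in for the service: `total` documents served `batch` at a time. */
    static ScrollSource fake(final int total, final int batch) {
        return new ScrollSource() {
            private int next = 0;

            @Override
            public Page fetch(String scrollId) {
                if (scrollId == null) {
                    return new Page("scroll-0", Collections.<String>emptyList());
                }
                List<String> docs = new ArrayList<>();
                for (int i = 0; i < batch && next < total; i++) {
                    docs.add("doc-" + next++);
                }
                return new Page("scroll-" + next, docs);
            }
        };
    }

    public static void main(String[] args) {
        // 25 matching documents, 5 per batch, mirroring the sample code's assumption.
        System.out.println(fetchAll(fake(25, 5)).size() + " documents retrieved");
    }
}
```

Looping until an empty batch avoids hard-coding the number of requests, which is only known in advance when the total number of matching documents is known.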

Sample code

```java
package com.aliyun.opensearch;

import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.SearcherClient;
import com.aliyun.opensearch.sdk.dependencies.com.google.common.collect.Lists;
import com.aliyun.opensearch.sdk.dependencies.org.json.JSONObject;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.search.*;
import com.aliyun.opensearch.sdk.generated.search.general.SearchResult;
import com.aliyun.opensearch.search.SearchParamsBuilder;

import java.nio.charset.Charset;

public class TestScroll {

    // Scroll queries do not support the aggregate, distinct, or rank clause,
    // but do support sorting by a single field.
    private static String appName = "The name of the OpenSearch application for which you want to implement scroll queries";
    private static String host = "The endpoint of the OpenSearch API in your region";

    public static void main(String[] args) {
        // Read credentials from environment variables.
        // Configure the environment variables before running this code.
        String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
        String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");

        // Print the default encoding format.
        System.out.println(String.format("file.encoding: %s", System.getProperty("file.encoding")));
        System.out.println(String.format("defaultCharset: %s", Charset.defaultCharset().name()));

        // Create an OpenSearch object.
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);

        // Create an OpenSearchClient object.
        OpenSearchClient serviceClient = new OpenSearchClient(openSearch);

        // Create a SearcherClient object.
        SearcherClient searcherClient = new SearcherClient(serviceClient);

        // Configure the config clause: paging settings, data format, and application name.
        Config config = new Config(Lists.newArrayList(appName));

        // The start parameter has no effect on scroll queries. The offset is always 0.
        config.setStart(0);
        // Return 5 documents per batch. The maximum is 500.
        config.setHits(5);

        // Set the data format to FULLJSON. Supported formats: JSON, FULLJSON.
        config.setSearchFormat(SearchFormat.FULLJSON);

        // Specify the fields to return.
        config.setFetchFields(Lists.newArrayList("id", "name", "phone", "int_arr", "literal_arr", "float_arr", "cate_id"));
        // Note: rerank_size is set through the setReRankSize method of the Rank class.
        // It is not set here because scroll queries do not support the rank clause.

        // Create a SearchParams object and set the search query.
        SearchParams searchParams = new SearchParams(config);

        // To query multiple index fields, specify all fields in a single setQuery call.
        // Separate setQuery calls overwrite each other.
        searchParams.setQuery("name:'opensearch'");

        // (Optional) Set a filter condition.
        // searchParams.setFilter("cate_id<=3");

        // Create a DeepPaging object to enable scroll queries.
        DeepPaging deep = new DeepPaging();
        // Specify a validity period for the scroll ID to be used by the next scroll query, in minutes.
        // Default value: 1m. In this example, the value is set to 3m.
        deep.setScrollExpire("3m");

        // Attach the DeepPaging object to the search parameters.
        searchParams.setDeepPaging(deep);

        // Create a SearchParamsBuilder for easier parameter configuration.
        SearchParamsBuilder paramsBuilder = SearchParamsBuilder.create(searchParams);

        // Run the scroll queries.
        // This example assumes 25 matching documents with 5 documents per batch,
        // so 5 batches return results and the 6th batch is empty.
        SearchResult searchResult;
        try {
            searchResult = searcherClient.execute(paramsBuilder);
            String result = searchResult.getResult();
            JSONObject obj = new JSONObject(result);

            for (int i = 1; i <= 6; i++) {
                // When you run the first scroll query, a scroll ID is returned.
                // To obtain document data, use this scroll ID to run the scroll query again.
                deep.setScrollId(obj.getJSONObject("result").get("scroll_id").toString());
                // Specify a validity period for the scroll ID to be used by the next scroll query, in minutes.
                // Default value: 1m. In this example, the value is set to 3m.
                // If you do not want to use the default value, you must set a validity period each time before you run a scroll query.
                deep.setScrollExpire("3m");

                searchResult = searcherClient.execute(paramsBuilder);
                result = searchResult.getResult();
                obj = new JSONObject(result);

                System.out.println("Results for Query No." + i + ": " + obj.get("result"));

                // Sleep for 1 second to avoid exceeding the queries per second (QPS) limit.
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop scrolling.
                    Thread.currentThread().interrupt();
                    break;
                }
            }

        } catch (OpenSearchException | OpenSearchClientException e) {
            e.printStackTrace();
        }
    }
}
```
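
The sample spaces its requests with a fixed one-second sleep. As an alternative sketch, a small throttle can wait only for the part of the interval that has not already elapsed, which helps when processing a batch takes noticeable time. `MinIntervalThrottle` is a hypothetical helper written for this illustration, and the 1000 ms interval is taken from the sample rather than from a documented QPS quota.

```java
final class MinIntervalThrottle {

    private final long minIntervalMillis;
    private long lastCallMillis = -1;   // -1 means no call has been made yet

    MinIntervalThrottle(long minIntervalMillis) {
        if (minIntervalMillis < 0) {
            throw new IllegalArgumentException("interval must be >= 0");
        }
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Returns how long to wait at time nowMillis before the next request,
     * and records that request as happening once the wait is over.
     */
    long reserve(long nowMillis) {
        long wait = 0;
        if (lastCallMillis >= 0) {
            wait = Math.max(0, lastCallMillis + minIntervalMillis - nowMillis);
        }
        lastCallMillis = nowMillis + wait;
        return wait;
    }

    /** Blocks until the next request may be sent. */
    void acquire() throws InterruptedException {
        long wait = reserve(System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MinIntervalThrottle throttle = new MinIntervalThrottle(1000);
        for (int i = 1; i <= 3; i++) {
            throttle.acquire();   // call this before each scroll request
            System.out.println("request " + i + " may be sent now");
        }
    }
}
```

In the sample's loop, a call to `throttle.acquire()` before each `searcherClient.execute(paramsBuilder)` would take the place of the fixed sleep.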

Key parameters

| Parameter | Description |
| --- | --- |
| `appName` | The name of your OpenSearch application |
| `host` | The endpoint of the OpenSearch API for your region |
| `config.setHits(n)` | The number of documents per batch. Maximum: 500 |
| `deep.setScrollExpire("3m")` | The validity period for the scroll ID used by the next scroll query. Default: `1m`. Set this value before each request if you do not want to use the default |
| `deep.setScrollId(id)` | The scroll ID returned by the previous scroll query |
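
The constraints on these parameters can be checked before a request is built. The following sketch uses a hypothetical `ScrollSettings` holder that rejects a batch size outside 1 to 500 and formats a validity period in whole minutes, such as `3m`. It assumes minutes are the only unit needed, as in the examples in this document.

```java
import java.time.Duration;

final class ScrollSettings {

    static final int MAX_HITS = 500;   // per-batch limit stated in this document

    final int hits;              // value intended for config.setHits(...)
    final String scrollExpire;   // value intended for deep.setScrollExpire(...)

    ScrollSettings(int hits, Duration expire) {
        if (hits < 1 || hits > MAX_HITS) {
            throw new IllegalArgumentException("hits must be between 1 and " + MAX_HITS);
        }
        long minutes = expire.toMinutes();   // any partial minute is truncated
        if (minutes < 1) {
            throw new IllegalArgumentException("expire must be at least one minute");
        }
        this.hits = hits;
        this.scrollExpire = minutes + "m";
    }

    public static void main(String[] args) {
        ScrollSettings settings = new ScrollSettings(5, Duration.ofMinutes(3));
        System.out.println("hits=" + settings.hits + ", scrollExpire=" + settings.scrollExpire);
    }
}
```

With the sample's values, `settings.hits` is 5 and `settings.scrollExpire` is `3m`.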
