OpenSearch: Demo code for pushing unstructured documents

Last Updated: Nov 21, 2023

To upload data in push mode, first generate the datasets in a valid format and add them to the client buffer. Then call the push method to submit the buffered datasets to the application in a single request.

Dependencies

To use the OpenSearch SDK to upload files, add the following Maven dependency:

<dependency>
 <groupId>com.aliyun.opensearch</groupId>
 <artifactId>aliyun-sdk-opensearch</artifactId>
 <version>4.0.0</version>
</dependency>

Configure environment variables

Configure the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables.

Important
  • The AccessKey pair of an Alibaba Cloud account can be used to access all API operations. We recommend that you use a Resource Access Management (RAM) user to call API operations or perform routine O&M. For information about how to use a RAM user, see Create a RAM user.

  • For information about how to create an AccessKey pair, see Create an AccessKey pair.

  • If you use the AccessKey pair of a RAM user, make sure that the required permissions are granted to the AliyunServiceRoleForOpenSearch role by using your Alibaba Cloud account. For more information, see AliyunServiceRoleForOpenSearch and Access authorization rules.

  • We recommend that you do not include your AccessKey pair in materials that are easily accessible to others, such as the project code. Otherwise, the AccessKey pair may be leaked, which puts the resources in your account at risk.

  • Linux and macOS

    Run the following commands. Replace <access_key_id> and <access_key_secret> with the AccessKey ID and AccessKey secret of the RAM user that you use.

    export ALIBABA_CLOUD_ACCESS_KEY_ID=<access_key_id>
    export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<access_key_secret>
  • Windows

    1. Create an environment variable file, add the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables to the file, and then set the environment variables to your AccessKey ID and AccessKey secret.

    2. Restart Windows for the environment variables to take effect.
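
Before running the demo, it can help to confirm that both variables are visible to the Java process. The following is a minimal sketch that uses only the JDK. The CredentialCheck class and its require method are hypothetical helpers introduced for illustration and are not part of the OpenSearch SDK.

```java
import java.util.Map;

final class CredentialCheck {
    /** Returns the value of the named variable, or throws if it is missing or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }

    public static void main(String[] args) {
        // System.getenv() returns the environment of the current process.
        Map<String, String> env = System.getenv();
        require(env, "ALIBABA_CLOUD_ACCESS_KEY_ID");
        require(env, "ALIBABA_CLOUD_ACCESS_KEY_SECRET");
        // The values themselves are deliberately not printed.
        System.out.println("Both AccessKey environment variables are set.");
    }
}
```

If the check throws, open a new terminal session (or restart Windows, as described above) so that the process picks up the updated environment.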

Demo code for pushing documents

package com.leiyu.push;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;


public class PushNonStructuralLLM {
    private static String appName = "The name of the OpenSearch application to which you want to upload data";
    private static String host = "The endpoint of the OpenSearch API in your region";
    private static String path = "/apps/%s/actions/knowledge-bulk";

    public static void main(String[] args) throws IOException {
        // Specify your AccessKey pair.
        // Obtain the AccessKey ID and the AccessKey secret from environment variables. You must configure the environment variables before you run the sample code.
        String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
        String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
        
        String appPath = String.format(path, appName);

        // Create an OpenSearch object.
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        // Use the OpenSearch object as a parameter to create an OpenSearchClient object.
        OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);

        // Create a JSON object for adding a single document.
        // filePath is the local file to upload. It is named filePath so that it does not shadow the static path field.
        Path filePath = Paths.get("C:/Users/LEIYU/Desktop/Word/test.docx");
        JSONObject oneRequest = new JSONObject();
        oneRequest.put("cmd", "BASE64");
        JSONObject fields = new JSONObject();
        fields.put("id", 50);
        fields.put("title", "test.docx");
        fields.put("url", "www.baidu.com");
        // Read the file and Base64-encode its bytes for the content field.
        fields.put("content", Base64.getEncoder().encodeToString(Files.readAllBytes(filePath)));
        fields.put("category", "docs");
        oneRequest.put("fields", fields);

        // Create a JSON array. You can use the JSON array to add multiple documents at a time.
        final JSONArray request = new JSONArray();
        request.add(oneRequest);
        //request.add(twoRequest);

        // Request parameters: the response format and the JSON array as the POST body.
        Map<String, String> params = new HashMap<String, String>();
        params.put("format", "full_json");
        params.put("_POST_BODY", request.toString());
        try {
            OpenSearchResult openSearchResult = openSearchClient.callAndDecodeResult(appPath, params, "POST");
            // Display the returned result.
            System.out.println(openSearchResult.getResult());
        } catch (OpenSearchException e) {
            e.printStackTrace();
        } catch (OpenSearchClientException e) {
            e.printStackTrace();
        }
    }

}
Note
  • The cmd parameter must be set to BASE64, as in the preceding demo code.

  • The content parameter specifies the unstructured content to be pushed. In the preceding demo code, it is the Base64-encoded bytes of the file.

  • The title parameter specifies the name of the document to be pushed.
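
The parameters above can be sketched as a small helper that prepares several documents for one request. This is only an illustration that uses JDK classes: PushPayloadSketch and its command method are hypothetical names, the second document (id 51, notes.txt) is made-up sample data, and plain maps stand in for the JSONObject and JSONArray objects that the demo code sends to OpenSearch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PushPayloadSketch {
    /** Builds one command with the same keys as the demo code: cmd and fields. */
    static Map<String, Object> command(int id, String title, String url,
                                       String category, byte[] fileBytes) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", id);
        fields.put("title", title);
        fields.put("url", url);
        // content carries the raw file bytes as a Base64 string.
        fields.put("content", Base64.getEncoder().encodeToString(fileBytes));
        fields.put("category", category);

        Map<String, Object> command = new LinkedHashMap<>();
        command.put("cmd", "BASE64");
        command.put("fields", fields);
        return command;
    }

    public static void main(String[] args) {
        // In-memory bytes stand in for Files.readAllBytes(...) on real files.
        byte[] first = "first document".getBytes(StandardCharsets.UTF_8);
        byte[] second = "second document".getBytes(StandardCharsets.UTF_8);

        List<Map<String, Object>> batch = new ArrayList<>();
        batch.add(command(50, "test.docx", "www.baidu.com", "docs", first));
        batch.add(command(51, "notes.txt", "www.baidu.com", "docs", second));

        System.out.println(batch.size() + " commands prepared");
    }
}
```

To send the batch, each command would still need to be added to the JSON array that the demo code passes as _POST_BODY.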