
OpenSearch: Unstructured document push demo

Last updated: Apr 22, 2025

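With the push method, you first prepare a collection of records in the required format, then call the Push method to send the whole collection to the application in a single batch.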
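The prepare-then-push flow described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's API: the `PushBatch` class and its `sender` callable are hypothetical names introduced here, and in practice the sender would wrap the SDK request shown in the sample code later on this page.

```python
import json
from typing import Callable, Dict, List


class PushBatch:
    """Collects push commands and hands them to a sender in one call."""

    def __init__(self, sender: Callable[[str], None]):
        # `sender` is a hypothetical callable that receives the JSON body.
        self._sender = sender
        self._commands: List[Dict] = []

    def add(self, cmd: str, fields: Dict) -> None:
        # Each record pairs a command name with the document fields.
        self._commands.append({"cmd": cmd, "fields": fields})

    def push(self) -> int:
        # Serialize the whole collection once and send it as a single batch.
        count = len(self._commands)
        if count:
            self._sender(json.dumps(self._commands))
            self._commands = []
        return count


if __name__ == "__main__":
    sent = []  # stands in for the network call in this sketch
    batch = PushBatch(sender=sent.append)
    batch.add("BASE64", {"id": "1", "title": "a.docx", "content": "..."})
    batch.add("BASE64", {"id": "2", "title": "b.docx", "content": "..."})
    print(batch.push(), "records sent in", len(sent), "request(s)")
```

Accumulating records first and serializing them once keeps the number of requests low, which is the point of the batch push approach.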

Dependencies

The dependencies required to upload files with the SDK are as follows.

Java (Maven):

<dependency>
    <groupId>com.aliyun.opensearch</groupId>
    <artifactId>aliyun-sdk-opensearch</artifactId>
    <version>6.0.0</version>
</dependency>

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.83</version>
</dependency>

Python:

pip install alibabacloud_tea_util
pip install alibabacloud_opensearch_util
pip install alibabacloud_credentials

PHP:

V3.4.1 (2021-05-11)
Download: https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20230719/mxik/opensearch-sdk-php-release-v3.4.1.zip

Configure environment variables

Configure the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET.

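Important
  • The AccessKey of an Alibaba Cloud account has access to all APIs. We recommend that you use a RAM user for API access and daily operations. For details, see Create a RAM user.

  • To create an AccessKey ID and AccessKey Secret, see Create an AccessKey.

  • If you use the AccessKey of a RAM user, make sure the Alibaba Cloud account has authorized the AliyunServiceRoleForOpenSearch service-linked role. See OpenSearch Industry Algorithm Edition service-linked role. For related documentation, see Access authentication rules.

  • Do not store the AccessKey ID and AccessKey Secret in your project code. Otherwise the AccessKey may be leaked, which threatens the security of all resources under your account.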

  • Linux and macOS:

    Run the following commands, replacing <access_key_id> with the AccessKey ID of your RAM user and <access_key_secret> with the AccessKey Secret of your RAM user.

    export ALIBABA_CLOUD_ACCESS_KEY_ID=<access_key_id>
    export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<access_key_secret>
  • Windows:

    1. Create an environment variable file, add the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET, and set them to the AccessKey ID and AccessKey Secret you prepared.

    2. Restart Windows for the change to take effect.
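Before running the samples, it can help to confirm that both variables are actually visible to the process. The following is a small sketch using only the Python standard library. The helper name `load_credentials` is an assumption made for illustration and is not part of the OpenSearch SDK.

```python
import os
from typing import Mapping, Tuple

REQUIRED_VARS = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (access_key_id, access_key_secret) or raise if either is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[REQUIRED_VARS[0]], env[REQUIRED_VARS[1]]


if __name__ == "__main__":
    try:
        key_id, _secret = load_credentials()
        # Print only a prefix of the ID; never print the secret.
        print("AccessKey ID loaded:", key_id[:4] + "...")
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is easier to diagnose than an authentication error returned by the service when a variable is empty.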

Push demo sample code

For BaseRequest (imported by the Python code), see: Python client sample

Java:

package com.leiyu.push;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;


public class PushNonStructuralLLM {
    private static String appName = "replace with your application name";
    private static String host = "replace with the API endpoint of your application";
    private static String path = "/apps/%s/actions/knowledge-bulk";

    public static void main(String[] args) throws IOException {
        // User credentials
        // Read the AccessKey ID and AccessKey Secret from environment variables;
        // configure the environment variables before running this sample
        String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
        String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
        
        String appPath = String.format(path, appName);

        // Create and construct the OpenSearch object
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        // Create the OpenSearchClient object, passing the OpenSearch object to its constructor
        OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);

        // Build a single document
        // Replace with the path of your local file
        Path filePath = Paths.get("C:/Users/LEIYU/Desktop/Word/test.docx");
        JSONObject oneRequest = new JSONObject();
        // Use cmd BASE64 when uploading unstructured documents (pdf, word, html)
        oneRequest.put("cmd", "BASE64");
        JSONObject fields = new JSONObject();
        // Primary key ID; must be unique
        fields.put("id", "50");
        // File name including the extension
        fields.put("title", "test.docx");
        // Document link
        fields.put("url", "www.baidu.com");
        // Base64-encoded file content
        fields.put("content", Base64.getEncoder().encodeToString(Files.readAllBytes(filePath)));
        fields.put("category", "docs");
        oneRequest.put("fields", fields);

        // Multiple documents can be added at the same time
        final JSONArray request = new JSONArray();
        request.add(oneRequest);
        //request.add(twoRequest);

        Map<String, String> params = new HashMap<String, String>() {{
            put("format", "full_json");
            put("_POST_BODY", request.toString());
        }};
        try {
            OpenSearchResult openSearchResult = openSearchClient.callAndDecodeResult(appPath, params, "POST");
            // Print the result
            System.out.println(openSearchResult.getResult());
        } catch (OpenSearchException e) {
            e.printStackTrace();
        } catch (OpenSearchClientException e) {
            e.printStackTrace();
        }
    }

}

Python:

# -*- coding: utf-8 -*-

import time, os
import base64
from Tea.exceptions import TeaException
from Tea.request import TeaRequest
from alibabacloud_tea_util import models as util_models
from BaseRequest import Config, Client


class knowledge:
    def __init__(self, config: Config):
        self.Clients = Client(config=config)
        self.runtime = util_models.RuntimeOptions(
            connect_timeout=10000,
            read_timeout=10000,
            autoretry=False,
            ignore_ssl=False,
            max_idle_conns=50,
            max_attempts=3
        )
        self.header = {}

    def docBulk(self, app_name: str,doc_content: list):
        try:
            response = self.Clients._request(method="POST",
                                             pathname=f'/v3/openapi/apps/{app_name}/actions/knowledge-bulk',
                                             query={}, headers=self.header,
                                             body=doc_content, runtime=self.runtime)
            return response
        except Exception as e:
            print(e)


if __name__ == "__main__":
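    # Configure the unified request endpoint. Note: remove the http:// prefix from the host
    endpoint = "<endpoint>"
    # The protocol can be set to HTTPS or HTTP
    endpoint_protocol = "HTTP"
    # User credentials
    # Read the AccessKey ID and AccessKey Secret from environment variables.
    # Configure the environment variables before running this sample; see the environment variable step above
    access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    # type selects the authentication mode. The default is access_key; use sts for RAM-STS authentication.
    # Allowed values: sts or access_key
    auth_type = "access_key"
    # For RAM-STS authentication, set security_token. You can call Alibaba Cloud AssumeRole to obtain the STS credentials.
    security_token = "<security_token>"
    # Configure the common request settings.
    # Note: omit the security_token and type parameters if you are not using a RAM user (sub-account)
    Configs = Config(endpoint=endpoint, access_key_id=access_key_id, access_key_secret=access_key_secret,
                     security_token=security_token, type=auth_type, protocol=endpoint_protocol)
    # Create an OpenSearch LLM-Based Conversational Search Edition instance
    # Replace <app_name> with the name of the LLM-Based Conversational Search Edition instance you created
    ops = knowledge(Configs)
    app_name = "<app_name>"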

    # --------------- Push an unstructured document (LLM-Based Conversational Search Edition) ---------------
    # Only the local file path needs to be changed
    with open('/Users/liu/Downloads/test.docx', 'rb') as file:
        data = file.read()
    # b64encode returns bytes; decode to str so the value is JSON-serializable
    data_b64 = base64.b64encode(data).decode("utf-8")

    documents = [
        {
            "fields": {
                "id": "1",
                "title": "test.docx",
                "url": "www.baidu.com",
                "content": data_b64,
                "category": "opensearch",
                "timestamp": 1691722088645,
                "score": 0.8821945219723084
            },
            "cmd": "BASE64"
        }
    ]

    res5 = ops.docBulk(app_name=app_name, doc_content=documents)
    print(res5)

Note
  • cmd must be "BASE64".

  • Put the unstructured content to push in the content field, as shown in the sample code above.

  • Put the name of the file to push in the title field.
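The three rules above can be captured in one small helper. This is a sketch under assumptions: the function name `build_base64_command` and the `extra_fields` keyword arguments are introduced here for illustration and are not part of the SDK. Only the `cmd`, `id`, `title`, and `content` keys come from the sample code.

```python
import base64
import json
from typing import Dict


def build_base64_command(doc_id: str, file_name: str, raw: bytes, **extra_fields) -> Dict:
    """Build one push record for an unstructured file (pdf, word, html)."""
    fields = {
        "id": doc_id,
        # The file name, including its extension, goes in title.
        "title": file_name,
        # b64encode returns bytes; decode so the record is JSON-serializable.
        "content": base64.b64encode(raw).decode("ascii"),
    }
    # Optional fields such as url or category, as used in the samples.
    fields.update(extra_fields)
    return {"cmd": "BASE64", "fields": fields}


if __name__ == "__main__":
    record = build_base64_command("1", "test.docx", b"example bytes", category="docs")
    # The request body is a list of such records.
    print(json.dumps([record]))
```

Decoding the Base64 output to a string matters because raw `bytes` values cannot be serialized by `json.dumps`.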

Delete operation

Java:

import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;

import java.util.HashMap;
import java.util.Map;

/**
 * Document deletion demo
 */
public class testDeleteDemo {


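    private static String appName = "replace with your application name";
    // Read the AccessKey ID and AccessKey Secret from environment variables instead of hard-coding them
    private static String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
    private static String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
    private static String host = "replace with the API endpoint of your application";
    private static String path = "/apps/%s/actions/knowledge-bulk";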


    public static void main(String[] args) {


        String appPath = String.format(path, appName);


        // Create and construct the OpenSearch object
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        // Create the OpenSearchClient object, passing the OpenSearch object to its constructor
        OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);


        // Build the delete request

        String request = "[{\"cmd\": \"DELETE\", \"fields\": {\"id\": \"ID of the document to delete\"}}]";
        Map<String, String> params = new HashMap<String, String>() {{
            put("format", "full_json");
            put("_POST_BODY", request);
        }};
        try {
            OpenSearchResult openSearchResult = openSearchClient.callAndDecodeResult(appPath, params, "POST");
            // Print the result
            System.out.println(openSearchResult.getResult());
        } catch (OpenSearchException e) {
            e.printStackTrace();
        } catch (OpenSearchClientException e) {
            e.printStackTrace();
        }
    }
}

Python:

# -*- coding: utf-8 -*-

import time, os
from typing import Dict, Any
from Tea.exceptions import TeaException
from Tea.request import TeaRequest
from alibabacloud_tea_util import models as util_models
from BaseRequest import Config, Client


class LLMDocumentPush:
    def __init__(self, config: Config):
        self.Clients = Client(config=config)
        self.runtime = util_models.RuntimeOptions(
            connect_timeout=10000,
            read_timeout=10000,
            autoretry=False,
            ignore_ssl=False,
            max_idle_conns=50,
            max_attempts=3
        )
        self.header = {}

    def docBulk(self, app_name: str,doc_content: list) -> Dict[str, Any]:
        try:
            response = self.Clients._request(method="POST",
                                             pathname=f'/v3/openapi/apps/{app_name}/actions/knowledge-bulk',
                                             query={}, headers=self.header,
                                             body=doc_content, runtime=self.runtime)
            return response
        except Exception as e:
            print(e)

if __name__ == "__main__":
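    # Configure the unified request endpoint. Note: remove the http:// prefix from the host
    endpoint = "<endpoint>"
    # The protocol can be set to HTTPS or HTTP
    endpoint_protocol = "HTTP"
    # User credentials
    # Read the AccessKey ID and AccessKey Secret from environment variables.
    # Configure the environment variables before running this sample; see the environment variable step above
    access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    # type selects the authentication mode. The default is access_key; use sts for RAM-STS authentication.
    # Allowed values: sts or access_key
    auth_type = "access_key"
    # For RAM-STS authentication, set security_token. You can call Alibaba Cloud AssumeRole to obtain the STS credentials.
    security_token = "<security_token>"
    # Configure the common request settings.
    # Note: omit the security_token and type parameters if you are not using a RAM user (sub-account)
    Configs = Config(endpoint=endpoint, access_key_id=access_key_id, access_key_secret=access_key_secret,
                     security_token=security_token, type=auth_type, protocol=endpoint_protocol)
    # Create an OpenSearch LLM-Based Conversational Search Edition instance
    # Replace <app_name> with the name of the LLM-Based Conversational Search Edition instance you created
    ops = LLMDocumentPush(Configs)
    app_name = "<app_name>"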

    # Delete
    deletedocument = [{"cmd": "DELETE", "fields": {"id": "ID of the document to delete"}}]
    res5 = ops.docBulk(app_name=app_name, doc_content=deletedocument)
    print(res5)
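
When several documents need to be removed, the delete payload can be generated from a list of primary keys. The sketch below assumes that the knowledge-bulk endpoint accepts more than one DELETE command per request, in the same way that the push sample allows multiple documents. The helper name `build_delete_commands` is hypothetical.

```python
import json
from typing import Dict, Iterable, List


def build_delete_commands(doc_ids: Iterable[str]) -> List[Dict]:
    # One DELETE command per primary key, in the same shape as the samples above.
    return [{"cmd": "DELETE", "fields": {"id": str(doc_id)}} for doc_id in doc_ids]


if __name__ == "__main__":
    # The resulting list could be passed as doc_content to docBulk.
    print(json.dumps(build_delete_commands(["50", "51"])))
```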