Use the OpenSearch LLM-Based Conversational Search Edition SDK to push structured documents

To upload data in push mode, first generate a dataset in a valid format and upload it to the client buffer. Then call the push method to send the dataset to the application in a single request.

Dependencies

To upload files by using the OpenSearch SDK, add the following dependencies.

Java

<dependency>
    <groupId>com.aliyun.opensearch</groupId>
    <artifactId>aliyun-sdk-opensearch</artifactId>
    <version>6.0.0</version>
</dependency>

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.83</version>
</dependency>

Python

pip install alibabacloud_tea_util 
pip install alibabacloud_opensearch_util
pip install alibabacloud_credentials

PHP

V3.4.1 (2021-05-11)
Download URL: https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20230719/mxik/opensearch-sdk-php-release-v3.4.1.zip

Demo code

For details about BaseRequest, see "Demo code for using the Python client".

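Usage notes

The dataset that you push must be in a valid format. To view the valid format, log on to the OpenSearch console, go to the details page of the LLM-Based Conversational Search Edition instance, and choose [Configuration Center] > [Data Configuration] in the left-side pane. On the page that appears, click [Import File]. In the panel that appears, click [Data Sample] to download a sample file. You can use the sample file as a template to generate your dataset.
You can also use JSONObject and JSONArray objects to generate a dataset, and then call the push method to send the dataset to the application in a single request.
If the number of documents pushed in a single request exceeds the limit, an error is returned and the push fails. For more information, see "Limits".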
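
The per-request limit described above can be respected on the client side by splitting a large dataset before pushing it. The following is a minimal Python sketch, not part of the SDK: the helper names build_add_command and split_into_batches are hypothetical, and MAX_DOCS_PER_PUSH = 100 is a placeholder rather than the documented limit, so check "Limits" for the real value. The field names follow the demo code on this page.

```python
from typing import Any, Dict, Iterator, List

# Placeholder only: replace with the actual per-request limit from "Limits".
MAX_DOCS_PER_PUSH = 100


def build_add_command(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the fields of one document in an ADD command (hypothetical helper)."""
    return {"cmd": "ADD", "fields": dict(fields)}


def split_into_batches(commands: List[Dict[str, Any]],
                       batch_size: int = MAX_DOCS_PER_PUSH) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most batch_size commands."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(commands), batch_size):
        yield commands[start:start + batch_size]


# Build 250 sample ADD commands and split them into batches.
commands = [
    build_add_command({"id": str(i), "title": f"Document {i}", "content": "sample"})
    for i in range(250)
]
batches = list(split_into_batches(commands))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch can then be sent in its own request, for example by passing it as doc_content to the docBulk method in the Python demo.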

ADD operation:

Java

import java.util.HashMap;
import java.util.Map;


import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;


import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;


/**
 * Demo for adding or updating documents.
 */
public class testPushDemo {


    // The name of the OpenSearch application to which you want to push documents
    private static String appName = "The name of the OpenSearch application to which you want to push documents";
    // The AccessKey ID
    private static String accesskey = "The AccessKey ID";
    // The AccessKey secret
    private static String secret = "The AccessKey secret";
    // The API endpoint of the OpenSearch application
    private static String host = "The API endpoint of the OpenSearch application";
    private static String path = "/apps/%s/actions/knowledge-bulk";


    public static void main(String[] args) {


        String appPath = String.format(path, appName);


        // Create an OpenSearch object.
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        // Create an OpenSearchClient object that uses the OpenSearch object as a parameter.
        OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);


        // Create a JSON object for adding a single document.
        JSONObject oneRequest = new JSONObject();
        oneRequest.put("cmd", "ADD");
        JSONObject fields = new JSONObject();
        // The ID of the test document
        fields.put("id", "The ID of the test document");
        // The title of the test document
        fields.put("title", "The title of the test document");
        // The URL of the test document
        fields.put("url", "The URL of the test document");
        // The content of the test document
        fields.put("content", "The content of the test document");
        // The category of the test document
        fields.put("category", "The category of the test document");
        oneRequest.put("fields", fields);


        // Create a JSON array. You can use a JSON array to add multiple documents at a time.
        JSONArray request = new JSONArray();
        request.add(oneRequest);


        Map<String, String> params = new HashMap<String, String>() {{
            put("format", "full_json");
            put("_POST_BODY", request.toJSONString());
        }};
        try {
            OpenSearchResult openSearchResult = openSearchClient.callAndDecodeResult(appPath, params, "POST");
            // Print the returned result.
            System.out.println(openSearchResult.getResult());
        } catch (OpenSearchException e) {
            e.printStackTrace();
        } catch (OpenSearchClientException e) {
            e.printStackTrace();
        }
    }
}

Python

# -*- coding: utf-8 -*-

import time, os
from typing import Dict, Any
from Tea.exceptions import TeaException
from Tea.request import TeaRequest
from alibabacloud_tea_util import models as util_models
from BaseRequest import Config, Client


class LLMDocumentPush:
    def __init__(self, config: Config):
        self.Clients = Client(config=config)
        self.runtime = util_models.RuntimeOptions(
            connect_timeout=10000,
            read_timeout=10000,
            autoretry=False,
            ignore_ssl=False,
            max_idle_conns=50,
            max_attempts=3
        )
        self.header = {}

    def docBulk(self, app_name: str,doc_content: list) -> Dict[str, Any]:
        try:
            response = self.Clients._request(method="POST",
                                             pathname=f'/v3/openapi/apps/{app_name}/actions/knowledge-bulk',
                                             query={}, headers=self.header,
                                             body=doc_content, runtime=self.runtime)
            return response
        except Exception as e:
            print(e)

if __name__ == "__main__":
    # Specify the endpoint of the OpenSearch API. The value does not include the http:// prefix.
    endpoint = "<endpoint>"
    # Specify the request protocol. Valid values: HTTPS and HTTP.
    endpoint_protocol = "HTTP"
    # Specify the AccessKey pair.
    # Obtain the AccessKey ID and AccessKey secret from environment variables.
    # You must configure the environment variables before you run this code.
    access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    # Specify the authentication method. Default value: access_key. A value of sts specifies authentication based on Resource Access Management (RAM) and Security Token Service (STS).
    # Valid values: sts and access_key.
    auth_type = "access_key"
    # If you use authentication based on RAM and STS, you must specify the security_token parameter. You can call the AssumeRole operation of Alibaba Cloud RAM to obtain an STS token.
    security_token = "<security_token>"
    # Specify the common request parameters.
    # Note: The security_token and type parameters are required only if you use the SDK as a RAM user.
    Configs = Config(endpoint=endpoint, access_key_id=access_key_id, access_key_secret=access_key_secret,
                     security_token=security_token, type=auth_type, protocol=endpoint_protocol)
    # Create a client for the OpenSearch LLM-Based Conversational Search Edition instance.
    # Replace <Application name> with the name of the OpenSearch LLM-Based Conversational Search Edition instance.
    ops = LLMDocumentPush(Configs)
    app_name = "<Application name>"

    # --------------- Push structured documents to the OpenSearch LLM-Based Conversational Search Edition instance ---------------

    document = [
        {
            "fields": {
                "id": "1",
                "title": "Benefits",
                "url": "https://www.alibabacloud.com/help/document_detail/464900.html",
                "content": "Industry Algorithm Edition: Intelligence: Industry Algorithm Edition provides rich built-in and customized algorithm models and introduces industry retrieval and sorting algorithms based on the search needs of different industries. This way, optimal search results are ensured. Flexibility and customization: Industry Algorithm Edition allows you to customize configurations such as algorithm models, application schema, data processing, query analysis, and sorting to meet personalized search requirements. This improves the click-through rate of search results, accelerates service iteration, and greatly shortens the rollout cycle. Security and stability: O&M services are available on a 24/7 basis. You can get technical support by submitting tickets online or using the telephone. A series of complete fault emergency response mechanisms are provided, such as fault monitoring, automatic alerting, and rapid troubleshooting. Alibaba Cloud assigns AccessKey IDs and AccessKey secrets to users to control permissions on OpenSearch. This ensures data security by isolating the data of different users. Multiple copies of data are backed up to implement data redundancy, which ensures data security. Auto scaling: The auto scaling capability allows you to scale up or down the resources based on your business requirements. Rich extended features: OpenSearch supports a variety of extended search features, such as top searches, hints, drop-down suggestions, and report statistics. This helps you view and analyze search results. Out-of-the-box service: You do not need to deploy or perform O&M operations on clusters before you access OpenSearch. High-performance Search Edition: High throughput: A single table supports tens of thousands of write transactions per second (TPS) and data updates within seconds. Security and stability: O&M services are available on a 24/7 basis. You can get technical support by submitting tickets online or using the telephone. A series of complete fault emergency response mechanisms are provided, such as fault monitoring, automatic alerting, and rapid troubleshooting. Alibaba Cloud assigns AccessKey IDs and AccessKey secrets to users to control permissions on OpenSearch. This ensures data security by isolating the data of different users. Multiple copies of data are backed up to implement data redundancy, which ensures data security. Auto scaling: The auto scaling capability allows you to scale up or down the resources based on your business requirements. Out-of-the-box service: You do not need to deploy or perform O&M operations on clusters before you access OpenSearch. Vector Search Edition: Stability: The underlying layer of Vector Search Edition is developed by using the C++ programming language. After more than ten years of development, Vector Search Edition provides stable search services for various core business systems. Vector Search Edition is suitable for core search scenarios that require high stability. High efficiency: Vector Search Edition provides a distributed search engine that allows you to retrieve large amounts of data. Vector Search Edition supports real-time data updates within seconds. Therefore, Vector Search Edition is applicable to query and search scenarios that are time-sensitive. Cost-effectiveness: Vector Search Edition supports multiple policies for index compression and multi-value index loading tests. You can use Vector Search Edition to meet your query requirements at low costs. Vector algorithm: Vector Search Edition supports vector searches for various types of unstructured data, such as voice data, images, videos, natural languages, and behavior data. SQL query: Vector Search Edition allows you to use SQL syntax and join tables online and provides a variety of built-in user-defined functions (UDFs) and function customization mechanisms to meet different requirements for data retrieval. To facilitate SQL development and testing, an SQL studio is integrated into the O&M system of Vector Search Edition. Retrieval Engine Edition: Stability: The underlying layer of Retrieval Engine Edition is developed by using the C++ programming language. After more than ten years of development, Retrieval Engine Edition provides stable search services for various core business systems. Retrieval Engine Edition is suitable for core search scenarios that require high stability. High efficiency: Retrieval Engine Edition provides a distributed search engine that allows you to retrieve large amounts of data. Retrieval Engine Edition supports real-time data updates within seconds. Therefore, Retrieval Engine Edition is suitable for query and search scenarios that are time-sensitive. Cost-effectiveness: Retrieval Engine Edition supports multiple policies for index compression and multi-value index loading tests. You can use Retrieval Engine Edition to meet your query requirements at low costs. Enriched features: Retrieval Engine Edition supports multiple types of analyzers and indexes and powerful query syntax. This service can meet your data retrieval requirements. Retrieval Engine Edition also supports plug-ins. This way, you can customize your own business logic. SQL query: Retrieval Engine Edition allows you to use SQL syntax and join tables online, and provides a variety of built-in UDFs and function customization mechanisms to meet different requirements for data retrieval. To facilitate SQL development and testing, an SQL studio will be integrated into the O&M system of Retrieval Engine Edition in later versions.",
                "category": "opensearch",
                "timestamp": 1691722088645,
                "score": 0.8821945219723084
            },
            "cmd": "ADD"
        },
        {
            "fields": {
                "id": "2",
                "title": "Scenarios",
                "url": "https://www.alibabacloud.com/help/document_detail/464901.html",
                "content": "Industry Algorithm Edition: Features: provides industry built-in capabilities such as semantic understanding and machine learning-based algorithms, and supports lightweight custom models and search guidance. This helps you build intelligent search services in a quick manner. <br/><img src=\"https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/4685770861/p622804.png\" width=300>Business scenarios: intelligent searches in industries such as e-commerce, content communities, and games, and educational Q&A searches. Target customers: Industry Algorithm Edition is out-of-the-box and suitable for small and medium-sized enterprises and developers that have intelligent search requirements. High-performance Search Edition: Features: Deep optimization is performed for big data search performance. OpenSearch supports quick response within seconds and real-time queries, and provides a one-stop solution for you to build big data search services in various scenarios such as searches for orders, coupons, logistics, and insurance policies. <br/><img src=\"https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/3685770861/p622799.png\" width=300>Business scenarios: searches for orders, coupons, logistics, and insurance policies. Target customers: High-performance Search Edition is out-of-the-box and suitable for small and medium-sized enterprises and developers that have high requirements for search performance. Vector Search Edition: Features: provides a large-scale distributed and high-performance vector search solution in Alibaba Cloud. Vector Search Edition supports multiple search algorithms to achieve a balance between precision and performance. Other features such as building index in streaming mode and instant queries are also supported. <br/><img src=\"https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/4685770861/p622805.png\" width=300>Business scenarios: graph searches, audio or video searches, natural language processing (NLP) vector searches, and intelligent Q&A. Target customers: enterprises and developers that face large-scale vectors and require flexible development. Retrieval Engine Edition: Features: provides you with high-performance, low-cost, easy-to-use, and large-scale online search services. Retrieval Engine Edition supports customized development based on your business requirements and fast iteration of search algorithms. <br/><img src=\"https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/4685770861/p622806.png\" width=300>Business scenarios: searches for enterprise information, tags, and financial research reports, and intelligent searches. Target customers: enterprises and developers that face a large amount of data and require flexible data development.",
                "category": "opensearch",
                "timestamp": 1691722088646,
                "score": 0.8993507402088953
            },
            "cmd": "ADD"
        }
    ]

    documents = document
    res5 = ops.docBulk(app_name=app_name, doc_content=documents)
    print(res5)

PHP

<?php
  
require_once($path . "/OpenSearch/Autoloader/Autoloader.php");

use OpenSearch\Client\OpenSearchClient;

// Specify the AccessKey pair.
// Obtain the AccessKey ID and AccessKey secret from environment variables.
// You must configure the environment variables before you run this code.
// Specify the AccessKey ID.
$accessKeyId = getenv('ALIBABA_CLOUD_ACCESS_KEY_ID');
// Specify the AccessKey secret.
$secret = getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET');
$endPoint = '<The API endpoint of the OpenSearch application>';
$appName = '<Application name>';
$options = array('debug' => true);
$requestBody = "[
 {
  \"fields\":{
   \"id\":\"15739\",
   \"title\":\"Benefits\",
   \"url\":\"https://www.alibabacloud.com/help/document_detail/464900.html\",
   \"content\":\"Industry Algorithm Edition: Features: provides industry built-in capabilities such as semantic understanding and machine learning-based algorithms, and supports lightweight custom models and search guidance. This helps you build intelligent search services in a quick manner. <br/><img src=\\\"https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/4685770861/p622804.png\\\" width=300>Business scenarios: intelligent searches in industries such as e-commerce, content communities, and games, and educational Q&A searches. Target customers: Industry Algorithm Edition is out-of-the-box and suitable for small and medium-sized enterprises and developers that have intelligent search requirements. High-performance Search Edition: Features: Deep optimization is performed for big data search performance. OpenSearch supports quick response within seconds, real-time queries, and provides a one-stop solution for you to build big data search services in various scenarios such as searches for orders, coupons, logistics, and insurance policies. \",
   \"category\":\"opensearch\",
   \"timestamp\":1691722088646,\"score\":0.8993507402088953},
   \"cmd\":\"ADD\"
 }
]";

$client = new OpenSearchClient($accessKeyId, $secret, $endPoint, $options);

$uri = "/apps/{$appName}/actions/knowledge-bulk";

try{
    $ret = $client->post($uri, $requestBody);
    print_r(json_decode($ret->result, true));
}catch (\Throwable $e) {
    print_r($e);
}

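Note

No separate operation is provided for updating documents. To update a document, use the ADD operation and specify all fields in the request. Because the ADD operation overwrites the existing data with the new data, any field that you do not specify is left empty in the updated document.
The value of the title parameter can be up to 64 characters in length. If the value exceeds this limit, the document cannot be added.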
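
Because an ADD request overwrites the whole document, one way to update a single field is to merge the change into a full copy of the fields before pushing. The following is a minimal Python sketch under stated assumptions: build_update_command is a hypothetical helper, not an SDK function; the 64-character limit is assumed to be counted the way Python's len() counts characters; and how you obtain the current fields (for example, from your own source data) is left to your application.

```python
from typing import Any, Dict

# 64 comes from the note above; counting with len() is an assumption.
MAX_TITLE_LENGTH = 64


def build_update_command(current_fields: Dict[str, Any],
                         changes: Dict[str, Any]) -> Dict[str, Any]:
    """Merge changes into the full field set and wrap it in an ADD command."""
    merged = {**current_fields, **changes}
    title = merged.get("title", "")
    if len(title) > MAX_TITLE_LENGTH:
        raise ValueError(
            f"title has {len(title)} characters; the maximum is {MAX_TITLE_LENGTH}"
        )
    return {"cmd": "ADD", "fields": merged}


# The full field set of the document as your application knows it.
current = {
    "id": "1",
    "title": "Benefits",
    "url": "https://www.alibabacloud.com/help/document_detail/464900.html",
    "content": "Old content",
    "category": "opensearch",
}
command = build_update_command(current, {"content": "New content"})
print(command["fields"]["content"])   # New content
print(command["fields"]["category"])  # opensearch
```

Validating the title before the request is sent surfaces the length problem locally instead of as a failed push.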

DELETE operation:

Java

import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;

import java.util.HashMap;
import java.util.Map;

/**
 * Demo for deleting documents.
 */
public class testDeleteDemo {


    // The name of the OpenSearch application from which you want to delete data
    private static String appName = "The name of the OpenSearch application from which you want to delete data";
    // The AccessKey ID
    private static String accesskey = "The AccessKey ID";
    // The AccessKey secret
    private static String secret = "The AccessKey secret";
    // The API endpoint of the OpenSearch application
    private static String host = "The API endpoint of the OpenSearch application";
    private static String path = "/apps/%s/actions/knowledge-bulk";


    public static void main(String[] args) {


        String appPath = String.format(path, appName);


        // Create an OpenSearch object.
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        // Create an OpenSearchClient object that uses the OpenSearch object as a parameter.
        OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);


        // Create the JSON request body for deleting a document.
        // Replace the value of id with the ID of the test document.
        String request = "[{\"cmd\": \"DELETE\", \"fields\": {\"id\": \"The ID of the test document.\"}}]";
        Map<String, String> params = new HashMap<String, String>() {{
            put("format", "full_json");
            put("_POST_BODY", request);
        }};
        try {
            OpenSearchResult openSearchResult = openSearchClient.callAndDecodeResult(appPath, params, "POST");
            // Print the returned result.
            System.out.println(openSearchResult.getResult());
        } catch (OpenSearchException e) {
            e.printStackTrace();
        } catch (OpenSearchClientException e) {
            e.printStackTrace();
        }
    }
}

Python

# -*- coding: utf-8 -*-

import time, os
from typing import Dict, Any
from Tea.exceptions import TeaException
from Tea.request import TeaRequest
from alibabacloud_tea_util import models as util_models
from BaseRequest import Config, Client


class LLMDocumentPush:
    def __init__(self, config: Config):
        self.Clients = Client(config=config)
        self.runtime = util_models.RuntimeOptions(
            connect_timeout=10000,
            read_timeout=10000,
            autoretry=False,
            ignore_ssl=False,
            max_idle_conns=50,
            max_attempts=3
        )
        self.header = {}

    def docBulk(self, app_name: str,doc_content: list) -> Dict[str, Any]:
        try:
            response = self.Clients._request(method="POST",
                                             pathname=f'/v3/openapi/apps/{app_name}/actions/knowledge-bulk',
                                             query={}, headers=self.header,
                                             body=doc_content, runtime=self.runtime)
            return response
        except Exception as e:
            print(e)

if __name__ == "__main__":
    # Specify the endpoint of the OpenSearch API. The value does not include the http:// prefix.
    endpoint = "<endpoint>"
    # Specify the request protocol. Valid values: HTTPS and HTTP.
    endpoint_protocol = "HTTP"
    # Specify the AccessKey pair.
    # Obtain the AccessKey ID and AccessKey secret from environment variables.
    # You must configure the environment variables before you run this code.
    access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    # Specify the authentication method. Default value: access_key. A value of sts specifies authentication based on RAM and STS.
    # Valid values: sts and access_key.
    auth_type = "access_key"
    # If you use authentication based on RAM and STS, you must specify the security_token parameter. You can call the AssumeRole operation of Alibaba Cloud RAM to obtain an STS token.
    security_token = "<security_token>"
    # Specify the common request parameters.
    # Note: The security_token and type parameters are required only if you use the SDK as a RAM user.
    Configs = Config(endpoint=endpoint, access_key_id=access_key_id, access_key_secret=access_key_secret,
                     security_token=security_token, type=auth_type, protocol=endpoint_protocol)
    # Create a client for the OpenSearch LLM-Based Conversational Search Edition instance.
    # Replace <Application name> with the name of the OpenSearch LLM-Based Conversational Search Edition instance.
    ops = LLMDocumentPush(Configs)
    app_name = "<Application name>"

    # Delete a document.
    deletedocument = [{"cmd": "DELETE", "fields": {"id": "The ID of the test document."}}]  # The ID of the test document
    res5 = ops.docBulk(app_name=app_name, doc_content=deletedocument)
    print(res5)
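
As the demos show, a DELETE command carries only the document ID, so a request body for several documents is one command per ID. The following is a minimal Python sketch; build_delete_commands is a hypothetical helper rather than an SDK function, and the resulting list still has to respect the per-request limit described in "Limits".

```python
import json
from typing import Any, Dict, Iterable, List


def build_delete_commands(doc_ids: Iterable[str]) -> List[Dict[str, Any]]:
    """Build one DELETE command per document ID (hypothetical helper)."""
    commands = []
    for doc_id in doc_ids:
        if not doc_id:
            raise ValueError("document ID must not be empty")
        commands.append({"cmd": "DELETE", "fields": {"id": str(doc_id)}})
    return commands


delete_commands = build_delete_commands(["1", "2"])
print(json.dumps(delete_commands))
# [{"cmd": "DELETE", "fields": {"id": "1"}}, {"cmd": "DELETE", "fields": {"id": "2"}}]
```

The list can be passed as doc_content to the docBulk method in the Python demo above.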