
OpenSearch: Service deployment

Last Updated: Aug 05, 2025

You can deploy models from the AI Search Open Platform, models from ModelScope, and your own custom models to provide inference services with higher concurrency and lower latency.

Model list

Text vectorization

Source: AI Search Open Platform

  • OpenSearch Text Vectorization Service-001: Multilingual (40+ languages) text vectorization, with a maximum input length of 300 tokens and an output vector dimension of 1536.

  • OpenSearch General Text Vectorization Service-002: Multilingual (100+ languages) text vectorization, with a maximum input length of 8192 tokens and an output vector dimension of 1024.

  • OpenSearch Text Vectorization Service-Chinese-001: Chinese text vectorization, with a maximum input length of 1024 tokens and an output vector dimension of 768.

  • OpenSearch Text Vectorization Service-English-001: English text vectorization, with a maximum input length of 512 tokens and an output vector dimension of 768.

For model invocation, see Text vector.

Source: ModelScope

  • GTE Multilingual General Text Vector Model: Maximum context length of 8192 tokens, supporting over 70 languages.

Source: Model customization

  • Text vectorization models independently trained in Model customization.

Re-ranking

Source: AI Search Open Platform

  • BGE Reranking Model: Document scoring based on the BGE model. It ranks documents from high to low by the relevance between the query and the document content and outputs the corresponding scores. Supports Chinese and English, with a maximum input length of 512 tokens (query + document).

  • OpenSearch Self-developed Reranking Model: Trained on multi-industry datasets; provides high-quality reranking that orders documents from high to low by the semantic relevance between the query and the document. Supports Chinese and English, with a maximum input length of 512 tokens (query + document).

For model invocation, see Ranking service.

Multimodal Vector

Source: ModelScope

Deploy a service

  1. In the AI Search Open Platform console, select Model Service > Service Deployment, and then click Deploy Service.

    If you use a RAM account to create, modify, or view service details, you need to grant the RAM account the relevant operation permissions for Model Service-Service Deployment in advance.

  2. On the Deploy Service page, configure the service name, deployment region, and other information.


    • Currently, the only supported deployment region is Germany (Frankfurt).

    • Resource Type: The type for model deployment.

    • Estimated Price: The cost of model deployment.

  3. Click Deploy, and the system starts deploying the service. Service status descriptions:

    • Deploying: The system is deploying the service, and the service is temporarily unavailable. In the service list, you can click Manage to view service details or click Delete to delete the task.

    • Normal: Indicates successful deployment. In the service list, you can click Manage to view service details. On the service details page, you can use Change Configuration to modify the resource configuration of the service. In the service list, you can click Delete to delete the service.

    • Deployment Failed: View deployment details, redeploy, or delete the deployment task.

View service invocation information

Log on to the AI Search Open Platform console, select Model Service > Service Deployment, and click Manage in the service list.


  • Service ID: This parameter is required when you call the service through the SDK.

  • Public and private API: You can call the model service through either a public or a private address.

  • Token: The credential for service invocation. Separate tokens are issued for the public network and the private network; when calling the service through a public or private address, provide the corresponding token.

  • API-KEY: Used for identity authentication when you call the service with an API-KEY.


Test the service

When you test the model service with curl, you must provide the API-KEY and Token.

Run the following command to call the text vectorization model and embed the inputs "Science and technology is the primary productive force" and "opensearch product documentation". The optional dimension parameter takes effect only when you deploy a custom model with vector dimensionality reduction enabled, and its value cannot exceed the dimension of the foundation model:

curl -X POST \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer Your-API-KEY" \
     -H "Token: NjU0ZDkzYjUwZTQ1NDI1OGRiN2ExMmFmNjQxMDYyN2M5*******==" \
     "http://default-0fm.platform-cn-hangzhou.opensearch.aliyuncs.com/v3/openapi/deployments/******_1zj19x_1yc/predict" \
     -d '{
           "input": [
             "Science and technology is the primary productive force",
             "opensearch product documentation"
           ],
           "input_type": "query",
           "dimension": 567
         }'

A successful call returns a response similar to the following:

{
  "embeddings": [
    {
      "index": 0,
      "embedding": [
        -0.028656005859375,
        0.0218963623046875,
        -0.04168701171875,
        -0.0440673828125,
        0.02142333984375,
        0.012345678901234568,
        ...
        0.0009876543210987654
      ]
    }
  ]
}
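If curl is unavailable, the same test call can be made from Python's standard library. The following is a minimal sketch, not an official client: it reuses the placeholder endpoint, deployment ID, API-KEY, and Token from the curl example above, all of which you must replace with your own values.

```python
import json
from urllib import request as urlrequest

# Placeholders copied from the curl example above; replace them with your own values.
ENDPOINT = "http://default-0fm.platform-cn-hangzhou.opensearch.aliyuncs.com"
PREDICT_PATH = "/v3/openapi/deployments/******_1zj19x_1yc/predict"  # elided deployment ID
API_KEY = "Your-API-KEY"
TOKEN = "NjU0ZDkzYjUwZTQ1NDI1OGRiN2ExMmFmNjQxMDYyN2M5*******=="

def build_predict_request(texts, input_type="query"):
    """Assemble the same headers and JSON body that the curl example sends."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
        "Token": TOKEN,
    }
    body = {"input": list(texts), "input_type": input_type}
    return headers, body

def send_predict_request(texts):
    """POST the request with the standard library (no third-party dependency)."""
    headers, body = build_predict_request(texts)
    req = urlrequest.Request(
        ENDPOINT + PREDICT_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urlrequest.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

headers, body = build_predict_request(
    ["Science and technology is the primary productive force",
     "opensearch product documentation"])
print(json.dumps(body))
# With real credentials and a real deployment ID in place, send the request:
# print(send_predict_request(["Science and technology is the primary productive force"]))
```

Note that the request body is plain JSON, so comments are not allowed inside it; any optional parameter such as dimension must be added as a real key.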

Call the service through SDK

After testing, refer to the following Python SDK invocation example to integrate the SDK into your business system for service invocation.

import json

from alibabacloud_tea_openapi.models import Config
from alibabacloud_searchplat20240529.client import Client
from alibabacloud_searchplat20240529.models import GetPredictionRequest
from alibabacloud_searchplat20240529.models import GetPredictionHeaders
from alibabacloud_tea_util import models as util_models

if __name__ == '__main__':
    config = Config(bearer_token="API-KEY",
                    # endpoint configuration for unified request entry, remove http:// or https://
                    endpoint="default-xxx.platform-cn-shanghai.opensearch.aliyuncs.com",
                    # protocol supports HTTPS and HTTP
                    protocol="http")
    client = Client(config=config)

    # --------------- Request body parameters ---------------
    request = GetPredictionRequest().from_map({"body":{"input_type": "document", "input": ["search", "test"]}})

    headers = GetPredictionHeaders(token="xxxxxxxxYjIyNjNjMjc2MTU1MTQ3MmI0ZmQ3OGQ0ZjJlMxxxxxxxx==")

    runtime = util_models.RuntimeOptions()

    # deploymentId: deployment id
    response = client.get_prediction_with_options("Service ID of the deployed service", request, headers, runtime)
    print(response)
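Whether the call goes through curl or the SDK, the response body carries one vector per input, keyed by index, as shown in the sample response earlier. The following sketch (the helper names and the abbreviated sample response are illustrative, not part of the service API) shows how such a body can be parsed and how two returned vectors can be compared with cosine similarity:

```python
import math

def parse_embeddings(response_body):
    """Extract the vectors from a /predict response body, ordered by index."""
    items = sorted(response_body["embeddings"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Abbreviated, illustrative response body in the format shown above.
sample = {
    "embeddings": [
        {"index": 0, "embedding": [0.1, 0.2, 0.3]},
        {"index": 1, "embedding": [0.3, 0.2, 0.1]},
    ]
}
vectors = parse_embeddings(sample)
print(round(cosine_similarity(vectors[0], vectors[1]), 4))  # → 0.7143
```

Higher values indicate closer semantic similarity, which is the typical way the returned vectors are used in retrieval and ranking scenarios.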