You can deploy models from the AI Search Open Platform, models from ModelScope, and custom models to provide inference services with higher concurrency and lower latency.
Model list

Model category | Model name | Model source
--- | --- | ---
Text vectorization | For model invocation, see: Text vector. | AI Search Open Platform
Text vectorization | GTE Multilingual General Text Vector Model: maximum context length of 8,192 tokens; supports over 70 languages. | ModelScope
Text vectorization | Text vectorization models trained independently in Model customization. | Model customization
Re-ranking | For model invocation, see: Ranking service. | AI Search Open Platform
Multimodal vector | | ModelScope
Deploy a service
In the AI Search Open Platform console, select Model Service > Service Deployment, and then click Deploy Service.
If you use a RAM account to create, modify, or view service details, you need to grant the RAM account the relevant operation permissions for Model Service-Service Deployment in advance.
On the Deploy Service page, configure the service name, deployment region, and other information.

Currently, the only supported deployment region is Germany (Frankfurt).
Resource Type: The resource type used for the model deployment.
Estimated Price: The estimated cost of the model deployment.
Click Deploy to start deploying the service. Service status descriptions:
Deploying: The system is deploying the service, and the service is temporarily unavailable. In the service list, click Manage to view service details, or click Delete to delete the task.
Normal: The deployment succeeded. In the service list, you can click Manage to view service details. On the service details page, you can use Change Configuration to modify the resource configuration of the service. In the service list, you can click Delete to delete the service.
Deployment Failed: You can view the deployment details, redeploy, or delete the deployment task.
View service invocation information
Log on to the AI Search Open Platform console, select Model Service > Service Deployment, and click Manage in the service list.

Service ID: Required when calling the service through the SDK.
Public and private API: You can call the model service through either a public or a private endpoint.
Token: The credential for service invocation. Public-network and private-network Tokens are separate; supply the Token that matches the endpoint you call.
API-KEY: Used for identity authentication when calling the service with an API-KEY.

Test the service
When testing the model service with curl, you must provide your API-KEY and Token.
Run the following command to call the text vectorization model and embed the two inputs "Science and technology is the primary productive force" and "opensearch product documentation":
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer Your-API-KEY" \
-H "Token: NjU0ZDkzYjUwZTQ1NDI1OGRiN2ExMmFmNjQxMDYyN2M5*******==" \
"http://default-0fm.platform-cn-hangzhou.opensearch.aliyuncs.com/v3/openapi/deployments/******_1zj19x_1yc/predict" \
-d '{
  "input": [
    "Science and technology is the primary productive force",
    "opensearch product documentation"
  ],
  "input_type": "query",
  "dimension": 567
}'
Note: The dimension parameter takes effect only when a custom model is deployed with vector dimensionality reduction enabled, and its value cannot exceed the dimension of the foundation model. Because JSON does not support comments, omit this parameter unless it applies to your deployment.
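The same call can be issued from Python. The sketch below only assembles the request URL, headers, and body; the endpoint, deployment ID, API-KEY, and Token values are placeholders, not real credentials. You would then send the request with any HTTP client, for example requests.post(url, headers=headers, json=body).

```python
def build_predict_request(endpoint, deployment_id, api_key, token,
                          texts, input_type="query"):
    """Assemble the URL, headers, and JSON body for the predict API.

    All values are supplied by the caller; nothing here is hard-coded
    to a real service.
    """
    url = f"{endpoint}/v3/openapi/deployments/{deployment_id}/predict"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # API-KEY authentication
        "Token": token,                        # public- or private-network Token
    }
    body = {"input": list(texts), "input_type": input_type}
    return url, headers, body

# Placeholder values for illustration only
url, headers, body = build_predict_request(
    "http://default-xxx.platform-cn-hangzhou.opensearch.aliyuncs.com",
    "my-deployment-id", "Your-API-KEY", "Your-Token",
    ["search", "test"])
print(url)
```

Keeping request assembly separate from the HTTP client makes it easy to switch between public and private endpoints by swapping only the endpoint and Token values.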
A successful response looks like this:
{
  "embeddings": [
    {
      "index": 0,
      "embedding": [
        -0.028656005859375,
        0.0218963623046875,
        -0.04168701171875,
        -0.0440673828125,
        0.02142333984375,
        0.012345678901234568,
        ...
        0.0009876543210987654
      ]
    }
  ]
}

Call the service through SDK
After testing, refer to the following Python SDK example to integrate the SDK into your business system for service invocation.
import json
from alibabacloud_tea_openapi.models import Config
from alibabacloud_searchplat20240529.client import Client
from alibabacloud_searchplat20240529.models import GetPredictionRequest
from alibabacloud_searchplat20240529.models import GetPredictionHeaders
from alibabacloud_tea_util import models as util_models

if __name__ == '__main__':
    config = Config(bearer_token="API-KEY",
                    # endpoint of the unified request entry; remove http:// or https://
                    endpoint="default-xxx.platform-cn-shanghai.opensearch.aliyuncs.com",
                    # protocol supports HTTPS and HTTP
                    protocol="http")
    client = Client(config=config)
    # --------------- Request body parameters ---------------
    request = GetPredictionRequest().from_map({"body": {"input_type": "document", "input": ["search", "test"]}})
    headers = GetPredictionHeaders(token="xxxxxxxxYjIyNjNjMjc2MTU1MTQ3MmI0ZmQ3OGQ0ZjJlMxxxxxxxx==")
    runtime = util_models.RuntimeOptions()
    # first argument is the Service ID (deployment id) of the deployed service
    response = client.get_prediction_with_options("Service ID of the deployed service", request, headers, runtime)
    print(response)
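The returned embeddings can then be consumed directly in your application. As an illustration (not part of the SDK), the sketch below assumes a parsed response body in the JSON shape shown earlier and computes the cosine similarity between two returned vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-in for a parsed response body (real vectors are much longer)
response_body = {
    "embeddings": [
        {"index": 0, "embedding": [0.1, 0.2, 0.3]},
        {"index": 1, "embedding": [0.2, 0.1, 0.4]},
    ]
}
vectors = {e["index"]: e["embedding"] for e in response_body["embeddings"]}
print(round(cosine_similarity(vectors[0], vectors[1]), 4))  # → 0.9331
```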