Deploy models from Alibaba Cloud Model Studio in Hologres using an API key, then call them with AI Functions. Your data stays in the database throughout inference and AI application development.
This integration is in beta.
Hologres integrates with Model Studio so you can call large language models (LLMs) directly from SQL. Authenticate once with an API key, deploy a supported model, and invoke it without moving data out of the database. Model Studio provides OpenAI-compatible APIs, visual application building tools, and out-of-the-box model services — supporting Qwen and leading third-party models including DeepSeek, Kimi, GLM, and MiniMax — with no infrastructure to manage.
Prerequisites
Before you begin, make sure that you have:
- A Hologres instance running V4.0.18 or later, or V4.1.2 or later, in the Ulanqab or Beijing region
- An Alibaba Cloud Model Studio API key — see Get an API key
Billing
Network fees: Model Studio is available in the Beijing and Singapore regions. Calling it from a Hologres instance may incur cross-region network fees. During the beta period, no network fees apply. Check the official website for updates on when billing begins.
Model invocation fees: Model Studio charges per invocation based on usage volume. See Model Invocation Billing and the Model Studio console for details.
Limitations
- Supported Hologres versions: V4.0.18 or later, or V4.1.2 or later.
- Supported regions: Ulanqab and Beijing only (currently).
Deploy a model
In the Hologres Management Console, go to Instances, find your target instance, and select AI Models at the top of the instance details page. On the Models page, select Alibaba Cloud Model Studio as the model provider and configure the following:
| Parameter | Description |
|---|---|
| Model Type | The model to deploy. Must be from the model list below. Unlisted models are not supported. |
| API key | Your Model Studio API key, used for authentication. Get one from the Model Studio console. |
| Model parameter settings | Model-specific parameters configured after selecting a model type. See Parameter descriptions below. |
Parameter descriptions
Parameters vary by model category. For full details, see the Model Studio console and API reference.
Text models support:
| Parameter | Description | Valid range |
|---|---|---|
| max_tokens | Maximum tokens returned per request. The per-model maximum appears in the Model Studio documentation. | Model-specific |
| temperature | Sampling temperature controlling output diversity. | [0, 2.0) |
| top_p | Nucleus sampling probability threshold. | (0, 1.0] |
temperature and top_p both control output diversity. Configure only one of them.
Qwen-Omni series models support additional parameters: modalities (text or audio output), audio.voice (voice tone), and audio.format (audio format, such as WAV).
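Because temperature and top_p must not be configured together, it can help to validate the parameter settings before deployment. A minimal sketch in Python, using the ranges from the table above (the helper function is illustrative, not part of any Hologres or Model Studio client):

```python
def validate_text_model_params(params: dict) -> None:
    """Reject text-model parameter settings that break the documented rules."""
    if "temperature" in params and "top_p" in params:
        raise ValueError("Configure only one of temperature or top_p, not both.")
    t = params.get("temperature")
    if t is not None and not (0 <= t < 2.0):
        raise ValueError("temperature must be in [0, 2.0)")
    p = params.get("top_p")
    if p is not None and not (0 < p <= 1.0):
        raise ValueError("top_p must be in (0, 1.0]")

validate_text_model_params({"max_tokens": 1024, "temperature": 0.7})  # passes
```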
Translation models support parameters to improve translation quality. See Translation Models for the full language list and usage.
| Parameter | Description |
|---|---|
| source_lang | Source language code. See the language list in the Model Studio documentation. |
| terms | Translation terms as a JSON array of source-target pairs. |
| tm_list | Translation memory — source-target sentence pairs in JSON format used as examples. |
| domains | Domain context passed as plain text to improve domain-specific translation. |
Example configuration:
```json
{
  "extra_body": {
    "translation_options": {
      "source_lang": "zh",
      "domains": "The sentence is from the Alibaba Cloud IT domain.",
      "terms": [
        {"source": "生物传感器", "target": "biological sensor"},
        {"source": "身体健康状况", "target": "health status of the body"}
      ],
      "tm_list": [
        {"source": "您可以通过如下方式查看集群的内核版本信息:", "target": "You can use one of the following methods to query the engine version of a cluster:"},
        {"source": "bla", "target": "bla"}
      ]
    }
  }
}
```
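A configuration like this can be sanity-checked as JSON before it is saved. A short Python sketch (field names follow the example above; the checks are illustrative, not enforced by the product):

```python
import json

config = json.loads("""
{
  "extra_body": {
    "translation_options": {
      "source_lang": "zh",
      "domains": "The sentence is from the Alibaba Cloud IT domain.",
      "terms": [{"source": "生物传感器", "target": "biological sensor"}]
    }
  }
}
""")

opts = config["extra_body"]["translation_options"]
# Every terms/tm_list entry must be a source-target pair.
for entry in opts.get("terms", []) + opts.get("tm_list", []):
    assert set(entry) == {"source", "target"}
```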
Embedding models support a dimension parameter to set vector output dimensions. Only some models allow changes to this value. See Embedding Models for full usage details.
| Model | Supported dimensions |
|---|---|
| text-embedding-v4 | 2,048 / 1,536 / 1,024 (default) / 768 / 512 / 256 / 128 / 64 |
| text-embedding-v3 | 1,024 (default) / 768 / 512 / 256 / 128 / 64 |
| qwen3-vl-embedding | 2,560 (default) / 2,048 / 1,536 / 1,024 / 768 / 512 / 256 |
Model retry mechanism
Configure retry behavior for failed invocations at deployment time.
| Parameter | Description | Default | Valid range |
|---|---|---|---|
| max_retries | Maximum retry attempts. | 2 | [0, 100] |
| initial_retry_delay | Delay before the first retry, in seconds. | 0.5 | [0.5, 8] |
| max_retry_delay | Maximum delay between retries, in seconds. | 8 | [1, 60] |
| timeout | Timeout for a single request, in seconds. | 600 | [1, 1,200] |
Model list
Model Studio supports text generation, translation, embedding, and multimodal models. All models support cross-region access unless noted otherwise.
Text generation
All text generation models use the chat/completions task type and support temperature, top_p, and max_tokens. Input is text; output is text.
| Model |
|---|
| qwen3-max |
| qwen3-max-2026-01-23 |
| qwen3-max-preview |
| qwen-max |
| qwen-max-latest |
| qwen-plus |
| qwen-plus-latest |
| qwen-flash |
| qwen-long |
| qwen-long-latest |
| qwq-plus |
| qwq-plus-latest |
| deepseek-v3.2 |
| deepseek-v3.2-exp |
| deepseek-v3.1 |
| deepseek-r1 |
| deepseek-r1-0528 |
| deepseek-v3 |
| deepseek-r1-distill-qwen-1.5b |
| deepseek-r1-distill-qwen-7b |
| deepseek-r1-distill-qwen-14b |
| deepseek-r1-distill-qwen-32b |
| kimi-k2-thinking |
| Moonshot-Kimi-K2-Instruct |
| glm-4.6 |
| glm-4.7 |
| glm-5 |
| MiniMax-M2.1 |
| MiniMax-M2.5 |
| MiniMax/MiniMax-M2.1 |
| MiniMax/MiniMax-M2.5 |
Vision models — accept image or video input and return text:
| Model | Notes |
|---|---|
| qwen3-vl-235b-a22b-instruct | |
| qwen3-vl-235b-a22b-thinking | |
| qwen3-vl-32b-instruct | |
| qwen3-vl-32b-thinking | |
| qwen3-vl-8b-instruct | |
| qwen3-vl-8b-thinking | |
| qwen3-vl-plus | |
| qwen3-vl-flash | |
| qwen-vl-ocr | Accepts image input only. |
| qwen-vl-ocr-latest | Accepts image input only. |
Omni model — accepts text, image, audio, or video input and returns text or audio:
| Model | Notes |
|---|---|
| qwen3-omni-flash | Also supports modalities and audio parameters. |
Translation
Translation models use the translation task type and the ai_translate AI Function.
| Model | Notes |
|---|---|
| qwen-mt-plus | Supports source_lang, terms, tm_list, and domains. |
| qwen-mt-flash | |
| qwen-mt-turbo | |
| qwen-mt-lite | |
Embedding
Embedding models use the embedding task type and the ai_embed AI Function. Output is float[].
Text embedding — text input only:
| Model | Vector dimensions |
|---|---|
| text-embedding-v1 | 1,536 |
| text-embedding-v2 | 1,536 |
| text-embedding-v3 | 1,024 (default) / 768 / 512 / 256 / 128 / 64 |
| text-embedding-v4 | 2,048 / 1,536 / 1,024 (default) / 768 / 512 / 256 / 128 / 64 |
Multimodal embedding — accepts text, image, or video input:
| Model | Vector dimensions | Cross-region support |
|---|---|---|
| tongyi-embedding-vision-plus | 1,152 | Images: yes. Video: no. |
| tongyi-embedding-vision-flash | 768 | Images: yes. Video: no. |
| multimodal-embedding-v1 | 1,024 | Images: yes. Video: no. |
| qwen3-vl-embedding | 2,560 (default) / 2,048 / 1,536 / 1,024 / 768 / 512 / 256 | Images: yes. Video: no. |
Video inputs for multimodal embedding models are only supported in the Beijing and Singapore regions.
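The ai_embed function returns a float[], which client code typically compares with a vector distance such as cosine similarity. A minimal sketch over such vectors (plain Python, independent of Hologres):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cosine_similarity([1.0, 0.0], [1.0, 0.0])  # 1.0 (identical direction)
```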
Use a model
After deployment, call the model from Hologres using AI Functions. Data stays in the database during inference. See AI Functions for usage and Best practices: High-performance autonomous driving image analysis system for a real-world example.