Deploy models from Alibaba Cloud Model Studio in Hologres using an API key, then call them with AI Functions. Your data stays in the database throughout inference and AI application development.
This integration is in beta.
Hologres integrates with Model Studio so you can call large language models (LLMs) directly from SQL. Authenticate once with an API key, deploy a supported model, and invoke it without moving data out of the database. Model Studio provides OpenAI-compatible APIs, visual application building tools, and out-of-the-box model services — supporting Qwen and leading third-party models including DeepSeek, Kimi, GLM, and MiniMax — with no infrastructure to manage.
Prerequisites
Before you begin, make sure that you have:
- A Hologres instance running V4.0.18 or later, or V4.1.2 or later, in the Ulanqab or Beijing region
- An Alibaba Cloud Model Studio API key — see Get an API key
Billing
Network fees: Model Studio is available in the Beijing and Singapore regions. Calling it from a Hologres instance may incur cross-region network fees. During the beta period, no network fees apply. Check the official website for updates on when billing begins.
Model invocation fees: Model Studio charges per invocation based on usage volume. See Model Invocation Billing and the Model Studio console for details.
Limitations
- Supported Hologres versions: V4.0.18 or later, or V4.1.2 or later.
- Supported regions: Ulanqab and Beijing only (currently).
Deploy a model
In the Hologres Management Console, go to Instances, find your target instance, and select AI Models at the top of the instance details page. On the Models page, select Alibaba Cloud Model Studio as the model provider and configure the following:
| Parameter | Description |
|---|---|
| Model Type | The model to deploy. Must be from the model list below. Unlisted models are not supported. |
| API key | Your Model Studio API key, used for authentication. Get one from the Model Studio console. |
| Model parameter settings | Model-specific parameters configured after selecting a model type. See Parameter descriptions below. |
Parameter descriptions
Parameters vary by model category. For full details, see the Model Studio console and API reference.
Text models support:
| Parameter | Description | Valid range |
|---|---|---|
| max_tokens | Maximum tokens returned per request. The per-model maximum appears in the Model Studio documentation. | Model-specific |
| temperature | Sampling temperature controlling output diversity. | [0, 2.0) |
| top_p | Nucleus sampling probability threshold. | (0, 1.0] |
temperature and top_p both control output diversity. Configure only one of them.
Qwen-Omni series models support additional parameters: modalities (text or audio output), audio.voice (voice tone), and audio.format (audio format, such as WAV).
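Because temperature and top_p must not be configured together, it can help to validate the parameter settings before deployment. A minimal sketch in Python, using the ranges from the table above (the helper function is illustrative, not part of any Hologres or Model Studio client):

```python
def validate_text_model_params(params: dict) -> None:
    """Reject text-model parameter settings that break the documented rules."""
    if "temperature" in params and "top_p" in params:
        raise ValueError("Configure only one of temperature or top_p, not both.")
    t = params.get("temperature")
    if t is not None and not (0 <= t < 2.0):
        raise ValueError("temperature must be in [0, 2.0)")
    p = params.get("top_p")
    if p is not None and not (0 < p <= 1.0):
        raise ValueError("top_p must be in (0, 1.0]")

validate_text_model_params({"max_tokens": 1024, "temperature": 0.7})  # passes
```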
Translation models support parameters to improve translation quality. See Translation Models for the full language list and usage.
| Parameter | Description |
|---|---|
| source_lang | Source language code. See the language list in the Model Studio documentation. |
| terms | Translation terms as a JSON array of source-target pairs. |
| tm_list | Translation memory — source-target sentence pairs in JSON format used as examples. |
| domains | Domain context passed as plain text to improve domain-specific translation. |
Example configuration:
```json
{
  "extra_body": {
    "translation_options": {
      "source_lang": "zh",
      "domains": "The sentence is from the Alibaba Cloud IT domain.",
      "terms": [
        {"source": "生物传感器", "target": "biological sensor"},
        {"source": "身体健康状况", "target": "health status of the body"}
      ],
      "tm_list": [
        {"source": "您可以通过如下方式查看集群的内核版本信息:", "target": "You can use one of the following methods to query the engine version of a cluster:"},
        {"source": "bla", "target": "bla"}
      ]
    }
  }
}
```
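A configuration like this can be sanity-checked as JSON before it is saved. A short Python sketch (field names follow the example above; the checks are illustrative, not enforced by the product):

```python
import json

config = json.loads("""
{
  "extra_body": {
    "translation_options": {
      "source_lang": "zh",
      "domains": "The sentence is from the Alibaba Cloud IT domain.",
      "terms": [{"source": "生物传感器", "target": "biological sensor"}]
    }
  }
}
""")

opts = config["extra_body"]["translation_options"]
# Every terms/tm_list entry must be a source-target pair.
for entry in opts.get("terms", []) + opts.get("tm_list", []):
    assert set(entry) == {"source", "target"}
```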
Embedding models support a dimension parameter to set vector output dimensions. Only some models allow changes to this value. See Embedding Models for full usage details.
| Model | Supported dimensions |
|---|---|
| text-embedding-v4 | 2,048 / 1,536 / 1,024 (default) / 768 / 512 / 256 / 128 / 64 |
| text-embedding-v3 | 1,024 (default) / 768 / 512 / 256 / 128 / 64 |
| qwen3-vl-embedding | 2,560 (default) / 2,048 / 1,536 / 1,024 / 768 / 512 / 256 |
Model retry mechanism
Configure retry behavior for failed invocations at deployment time.
| Parameter | Description | Default | Valid range |
|---|---|---|---|
| max_retries | Maximum retry attempts. | 2 | [0, 100] |
| initial_retry_delay | Delay before the first retry, in seconds. | 0.5 | [0.5, 8] |
| max_retry_delay | Maximum delay between retries, in seconds. | 8 | [1, 60] |
| timeout | Timeout for a single request, in seconds. | 600 | [1, 1,200] |
Model list
Model Studio supports text generation, translation, embedding, and multimodal models. All models support cross-region access unless noted otherwise.
Text generation
All text generation models use the chat/completions task type and support temperature, top_p, and max_tokens. Input is text; output is text.
| Model |
|---|
| qwen3-max |
| qwen3-max-2026-01-23 |
| qwen3-max-preview |
| qwen-max |
| qwen-max-latest |
| qwen-plus |
| qwen-plus-latest |
| qwen-flash |
| qwen-long |
| qwen-long-latest |
| qwq-plus |
| qwq-plus-latest |
| deepseek-v3.2 |
| deepseek-v3.2-exp |
| deepseek-v3.1 |
| deepseek-r1 |
| deepseek-r1-0528 |
| deepseek-v3 |
| deepseek-r1-distill-qwen-1.5b |
| deepseek-r1-distill-qwen-7b |
| deepseek-r1-distill-qwen-14b |
| deepseek-r1-distill-qwen-32b |
| kimi-k2-thinking |
| Moonshot-Kimi-K2-Instruct |
| glm-4.6 |
| glm-4.7 |
| glm-5 |
| MiniMax-M2.1 |
| MiniMax-M2.5 |
| MiniMax/MiniMax-M2.1 |
| MiniMax/MiniMax-M2.5 |
Vision models — accept image or video input and return text:
| Model | Notes |
|---|---|
| qwen3-vl-235b-a22b-instruct | |
| qwen3-vl-235b-a22b-thinking | |
| qwen3-vl-32b-instruct | |
| qwen3-vl-32b-thinking | |
| qwen3-vl-8b-instruct | |
| qwen3-vl-8b-thinking | |
| qwen3-vl-plus | |
| qwen3-vl-flash | |
| qwen-vl-ocr | Accepts image input only. |
| qwen-vl-ocr-latest | Accepts image input only. |
Omni model — accepts text, image, audio, or video input and returns text or audio:
| Model | Notes |
|---|---|
| qwen3-omni-flash | Also supports modalities and audio parameters. |
Translation
Translation models use the translation task type and the ai_translate AI Function.
| Model | Notes |
|---|---|
| qwen-mt-plus | Supports source_lang, terms, tm_list, and domains. |
| qwen-mt-flash | |
| qwen-mt-turbo | |
| qwen-mt-lite | |
Embedding
Embedding models use the embedding task type and the ai_embed AI Function. Output is float[].
Text embedding — text input only:
| Model | Vector dimensions |
|---|---|
| text-embedding-v1 | 1,536 |
| text-embedding-v2 | 1,536 |
| text-embedding-v3 | 1,024 (default) / 768 / 512 / 256 / 128 / 64 |
| text-embedding-v4 | 2,048 / 1,536 / 1,024 (default) / 768 / 512 / 256 / 128 / 64 |
Multimodal embedding — accepts text, image, or video input:
| Model | Vector dimensions | Cross-region support |
|---|---|---|
| tongyi-embedding-vision-plus | 1,152 | Images: yes. Video: no. |
| tongyi-embedding-vision-flash | 768 | Images: yes. Video: no. |
| multimodal-embedding-v1 | 1,024 | Images: yes. Video: no. |
| qwen3-vl-embedding | 2,560 (default) / 2,048 / 1,536 / 1,024 / 768 / 512 / 256 | Images: yes. Video: no. |
Video inputs for multimodal embedding models are only supported in the Beijing and Singapore regions.
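The ai_embed function returns a float[], which client code typically compares with a vector distance such as cosine similarity. A minimal sketch over such vectors (plain Python, independent of Hologres):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cosine_similarity([1.0, 0.0], [1.0, 0.0])  # 1.0 (identical direction)
```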
Use a model
After deployment, call the model from Hologres using AI Functions. Data stays in the database during inference. See AI Functions for usage and Best practices: High-performance autonomous driving image analysis system for a real-world example.