
Hologres: Model Studio Models

Last Updated: Mar 27, 2026

Deploy models from Alibaba Cloud Model Studio in Hologres using an API key, then call them with AI Functions. Your data stays in the database throughout inference and AI application development.

This integration is in beta.

Hologres integrates with Model Studio so you can call large language models (LLMs) directly from SQL. Authenticate once with an API key, deploy a supported model, and invoke it without moving data out of the database. Model Studio provides OpenAI-compatible APIs, visual application building tools, and out-of-the-box model services — supporting Qwen and leading third-party models including DeepSeek, Kimi, GLM, and MiniMax — with no infrastructure to manage.

Prerequisites

Before you begin, make sure that you have:

  • A Hologres instance in the Ulanqab or Beijing region running V4.0.18 or later (V4.0 series) or V4.1.2 or later (V4.1 series)

  • An Alibaba Cloud Model Studio API key — see Get an API key

Billing

Network fees: Model Studio is available in the Beijing and Singapore regions. Calling it from a Hologres instance may incur cross-region network fees. During the beta period, no network fees apply. Check the official website for updates on when billing begins.

Model invocation fees: Model Studio charges per invocation based on usage volume. See Model Invocation Billing and the Model Studio console for details.

Limitations

  • Supported Hologres versions: V4.0.18 or later (V4.0 series) or V4.1.2 or later (V4.1 series).

  • Supported regions: currently Ulanqab and Beijing only.

Deploy a model

In the Hologres Management Console, go to Instances, find your target instance, and select AI Models at the top of the instance details page. On the Models page, select Alibaba Cloud Model Studio as the model provider and configure the following:

| Parameter | Description |
| --- | --- |
| Model Type | The model to deploy. Must be from the model list below. Unlisted models are not supported. |
| API key | Your Model Studio API key, used for authentication. Get one from the Model Studio console. |
| Model parameter settings | Model-specific parameters configured after selecting a model type. See Parameter descriptions below. |

Parameter descriptions

Parameters vary by model category. For full details, see the Model Studio console and API reference.

Text models support:

| Parameter | Description | Valid range |
| --- | --- | --- |
| max_tokens | Maximum tokens returned per request. The per-model maximum appears in the Model Studio documentation. | Model-specific |
| temperature | Sampling temperature controlling output diversity. | [0, 2.0) |
| top_p | Nucleus sampling probability threshold. | (0, 1.0] |

temperature and top_p both control output diversity. Configure only one.
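Both knobs reshape the token probability distribution the model samples from. The sketch below shows what temperature scaling does to a softmax distribution; it is illustrative only, not Model Studio internals:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits after temperature scaling.

    Lower temperature sharpens the distribution (more deterministic output);
    higher temperature flattens it (more diverse output).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax(logits, temperature=0.2)  # top token dominates
hot = softmax(logits, temperature=1.8)   # probabilities move closer together
print(cold[0], hot[0])
```

This is why the two parameters should not be tuned together: temperature rescales the whole distribution, while top_p truncates its tail, and combining both makes the effect hard to reason about.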

Qwen-Omni series models support additional parameters: modalities (text or audio output), audio.voice (voice tone), and audio.format (audio format, such as WAV).

Translation models support parameters to improve translation quality. See Translation Models for the full language list and usage.

| Parameter | Description |
| --- | --- |
| source_lang | Source language code. See the language list in the Model Studio documentation. |
| terms | Translation terms as a JSON array of source-target pairs. |
| tm_list | Translation memory: source-target sentence pairs in JSON format used as examples. |
| domains | Domain context passed as plain text to improve domain-specific translation. |

Example configuration:

{
  "extra_body": {
    "translation_options": {
      "source_lang": "zh",
      "domains": "The sentence is from the Alibaba Cloud IT domain.",
      "terms": [
        {"source": "生物传感器", "target": "biological sensor"},
        {"source": "身体健康状况", "target": "health status of the body"}
      ],
      "tm_list": [
        {"source": "您可以通过如下方式查看集群的内核版本信息:", "target": "You can use one of the following methods to query the engine version of a cluster:"},
        {"source": "bla", "target": "bla"}
      ]
    }
  }
}

Embedding models support a dimension parameter to set vector output dimensions. Only some models allow changes to this value. See Embedding Models for full usage details.

| Model | Supported dimensions |
| --- | --- |
| text-embedding-v4 | 2,048 / 1,536 / 1,024 (default) / 768 / 512 / 256 / 128 / 64 |
| text-embedding-v3 | 1,024 (default) / 768 / 512 / 256 / 128 / 64 |
| qwen3-vl-embedding | 2,560 (default) / 2,048 / 1,536 / 1,024 / 768 / 512 / 256 |
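When scripting deployments, the table above can double as a validation map. A minimal sketch (SUPPORTED_DIMENSIONS transcribes the table; validate_dimension is a hypothetical helper, not a Hologres or Model Studio API):

```python
# Supported output dimensions per embedding model, transcribed from the
# table above (values as listed on this page).
SUPPORTED_DIMENSIONS = {
    "text-embedding-v4": [2048, 1536, 1024, 768, 512, 256, 128, 64],
    "text-embedding-v3": [1024, 768, 512, 256, 128, 64],
    "qwen3-vl-embedding": [2560, 2048, 1536, 1024, 768, 512, 256],
}

def validate_dimension(model: str, dimension: int) -> int:
    """Return the dimension if the model supports it, else raise ValueError."""
    allowed = SUPPORTED_DIMENSIONS.get(model)
    if allowed is None:
        raise ValueError(f"{model} does not support a configurable dimension")
    if dimension not in allowed:
        raise ValueError(f"{model} supports {allowed}, got {dimension}")
    return dimension

print(validate_dimension("text-embedding-v4", 512))  # 512
```

Checking the value before deployment avoids a failed configuration for models whose dimension is fixed.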

Model retry mechanism

Configure retry behavior for failed invocations at deployment time.

| Parameter | Description | Default | Valid range |
| --- | --- | --- | --- |
| max_retries | Maximum retry attempts. | 2 | [0, 100] |
| initial_retry_delay | Delay before the first retry, in seconds. | 0.5 | [0.5, 8] |
| max_retry_delay | Maximum delay between retries, in seconds. | 8 | [1, 60] |
| timeout | Timeout for a single request, in seconds. | 600 | [1, 1,200] |
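Taken together, these parameters describe a capped backoff schedule. A minimal sketch, under the assumption that the delay doubles between attempts (the page documents the parameters, not the exact growth curve):

```python
def retry_schedule(max_retries=2, initial_retry_delay=0.5, max_retry_delay=8.0):
    """Delays (in seconds) before each retry, assuming exponential backoff
    capped at max_retry_delay. The doubling policy is an illustrative
    assumption, not documented Hologres behavior.
    """
    delays = []
    delay = initial_retry_delay
    for _ in range(max_retries):
        delays.append(min(delay, max_retry_delay))
        delay *= 2
    return delays

print(retry_schedule())               # defaults: [0.5, 1.0]
print(retry_schedule(max_retries=6))  # cap kicks in: [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```

With the defaults, a failed invocation costs at most two extra attempts and about 1.5 seconds of added delay before the error surfaces.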

Model list

Model Studio supports text generation, translation, embedding, and multimodal models. All models support cross-region access unless noted otherwise.

Text generation

All text generation models use the chat/completions task type and support temperature, top_p, and max_tokens. Input is text; output is text.

Model
qwen3-max
qwen3-max-2026-01-23
qwen3-max-preview
qwen-max
qwen-max-latest
qwen-plus
qwen-plus-latest
qwen-flash
qwen-long
qwen-long-latest
qwq-plus
qwq-plus-latest
deepseek-v3.2
deepseek-v3.2-exp
deepseek-v3.1
deepseek-r1
deepseek-r1-0528
deepseek-v3
deepseek-r1-distill-qwen-1.5b
deepseek-r1-distill-qwen-7b
deepseek-r1-distill-qwen-14b
deepseek-r1-distill-qwen-32b
kimi-k2-thinking
Moonshot-Kimi-K2-Instruct
glm-4.6
glm-4.7
glm-5
MiniMax-M2.1
MiniMax-M2.5
MiniMax/MiniMax-M2.1
MiniMax/MiniMax-M2.5

Vision models — accept image or video input and return text:

| Model | Notes |
| --- | --- |
| qwen3-vl-235b-a22b-instruct | |
| qwen3-vl-235b-a22b-thinking | |
| qwen3-vl-32b-instruct | |
| qwen3-vl-32b-thinking | |
| qwen3-vl-8b-instruct | |
| qwen3-vl-8b-thinking | |
| qwen3-vl-plus | |
| qwen3-vl-flash | |
| qwen-vl-ocr | Accepts image input only. |
| qwen-vl-ocr-latest | Accepts image input only. |

Omni model — accepts text, image, audio, or video input and returns text or audio:

| Model | Notes |
| --- | --- |
| qwen3-omni-flash | Also supports modalities and audio parameters. |

Translation

Translation models use the translation task type and the ai_translate AI Function.

| Model | Notes |
| --- | --- |
| qwen-mt-plus | Supports source_lang, terms, tm_list, and domains. |
| qwen-mt-flash | |
| qwen-mt-turbo | |
| qwen-mt-lite | |

Embedding

Embedding models use the embedding task type and the ai_embed AI Function. Output is float[].
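Depending on the client driver, a float[] result may surface as a Postgres array literal string such as '{0.12,-0.08,0.97}' rather than a native list. A minimal parser sketch (illustrative; drivers such as psycopg2 typically perform this conversion for you):

```python
def parse_float_array(pg_array: str) -> list[float]:
    """Parse a Postgres float[] literal like '{0.12,-0.08,0.97}' into floats."""
    body = pg_array.strip().strip("{}")
    if not body:
        return []
    return [float(x) for x in body.split(",")]

print(parse_float_array("{0.12,-0.08,0.97}"))  # [0.12, -0.08, 0.97]
```

The sketch handles only flat numeric arrays, which is the shape embedding functions return; it does not cover nested arrays or NULL elements.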

Text embedding — text input only:

| Model | Vector dimensions |
| --- | --- |
| text-embedding-v1 | 1,536 |
| text-embedding-v2 | 1,536 |
| text-embedding-v3 | 1,024 (default) / 768 / 512 / 256 / 128 / 64 |
| text-embedding-v4 | 2,048 / 1,536 / 1,024 (default) / 768 / 512 / 256 / 128 / 64 |

Multimodal embedding — accepts text, image, or video input:

| Model | Vector dimensions | Cross-region support |
| --- | --- | --- |
| tongyi-embedding-vision-plus | 1,152 | Images: yes. Video: no. |
| tongyi-embedding-vision-flash | 768 | Images: yes. Video: no. |
| multimodal-embedding-v1 | 1,024 | Images: yes. Video: no. |
| qwen3-vl-embedding | 2,560 (default) / 2,048 / 1,536 / 1,024 / 768 / 512 / 256 | Images: yes. Video: no. |

Video inputs for multimodal embedding models are only supported in the Beijing and Singapore regions.

Use a model

After deployment, call the model from Hologres using AI Functions. Data stays in the database during inference. See AI Functions for usage and Best practices: High-performance autonomous driving image analysis system for a real-world example.