
Alibaba Cloud Model Studio: GLM

Last Updated: Mar 15, 2026

Call GLM models on Alibaba Cloud Model Studio.

Overview

GLM models are hybrid reasoning models developed by Zhipu AI for agent applications. Each model offers both a thinking mode and a non-thinking mode.

| Model | Context window | Max input | Max CoT | Max response |
|---|---|---|---|---|
| glm-5 | 202,752 | 202,752 | 32,768 | 16,384 |
| glm-4.7 | 202,752 | 169,984 | 32,768 | 16,384 |
| glm-4.6 | 202,752 | 169,984 | 32,768 | 16,384 |

All values are in tokens.

These models are deployed on Model Studio servers, not third-party services.

Getting started

Thinking and non-thinking modes are controlled with the enable_thinking parameter. The following examples call glm-5 in thinking mode.

Before calling the API, obtain an API key and set it as an environment variable. If you use an SDK, install the SDK first.
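For example, in a bash-compatible shell you can export the key for the current session (replace the placeholder with your actual key):

```shell
# Set the Model Studio API key for the current shell session.
# Replace sk-xxx with your actual API key.
export DASHSCOPE_API_KEY="sk-xxx"
# Confirm the variable is set
echo "$DASHSCOPE_API_KEY"
```

To make the setting permanent, add the export line to your shell profile (for example, ~/.bashrc).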

OpenAI compatible

Note

enable_thinking is not a standard OpenAI parameter. Pass it via extra_body in the Python SDK, or as a top-level parameter in the Node.js SDK.

Python

Sample code

from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(
    # If the environment variable is not configured, replace the next line with: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you"}]
completion = client.chat.completions.create(
    model="glm-5",
    messages=messages,
    # Set enable_thinking to true via extra_body to enable thinking mode
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={
        "include_usage": True
    },
)

reasoning_content = ""  # Complete thinking process
answer_content = ""     # Complete response
is_answering = False    # Indicates whether the model has started generating the response
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")

for chunk in completion:
    if not chunk.choices:
        print("\n" + "=" * 20 + "Token Usage" + "=" * 20 + "\n")
        print(chunk.usage)
        continue

    delta = chunk.choices[0].delta

    # Collect only the thinking content
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content

    # Received content, model starts generating the response
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full Response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content

Sample response

====================Thinking Process====================

Let me carefully consider this seemingly simple but actually profound question from the user.

This is a self-introduction question that may contain multiple layers of meaning.

First, as a language model, I should honestly state my identity and nature. I am neither human nor do I possess true emotional consciousness; I am an AI assistant trained with deep learning technology. This is the fundamental truth.

Second, considering the user's potential needs, they might want to know:
1. What services I can provide
2. What my areas of expertise are
3. What my limitations are
4. How to interact with me more effectively

In my response, I should express a friendly and open attitude while remaining professional and accurate. I should explain my main areas of expertise, such as knowledge Q&A, writing assistance, and creative support, but also frankly point out my limitations, such as lacking genuine emotional experience.

Furthermore, to make the answer more complete, I should express a proactive attitude towards helping users solve problems. I can appropriately guide users to ask more specific questions, which can better showcase my capabilities.

Considering this is an open-ended opening, the answer should be concise yet contain enough information to give the user a clear understanding of my basic situation, while laying a good foundation for subsequent conversations.

Finally, the tone should remain humble and professional, neither overly technical nor too casual, making the user feel comfortable and natural.
====================Full Response====================

I am a GLM large language model trained by Zhipu AI, designed to provide information and help users solve problems. I am designed to understand and generate human language, and can answer questions, provide explanations, or participate in various topic discussions.

I do not store your personal data, and our conversations are anonymous. Is there anything I can help you understand or discuss?
====================Token Usage====================

CompletionUsage(completion_tokens=344, prompt_tokens=7, total_tokens=351, completion_tokens_details=None, prompt_tokens_details=None)

Node.js

Sample code

import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    // If the environment variable is not configured, replace the next line with: apiKey: "sk-xxx"
    apiKey: process.env.DASHSCOPE_API_KEY, 
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = ''; // Complete thinking process
let answerContent = ''; // Complete response
let isAnswering = false; // Indicates whether the model has started generating the response

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you' }];
        
        const stream = await openai.chat.completions.create({
            model: 'glm-5',
            messages,
            // Note: In Node.js SDK, non-standard parameters like enable_thinking are passed as top-level properties, not in extra_body
            enable_thinking: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });

        console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\n' + '='.repeat(20) + 'Token Usage' + '='.repeat(20) + '\n');
                console.log(chunk.usage);
                continue;
            }

            const delta = chunk.choices[0].delta;
            
            // Collect only the thinking content
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }

            // Received content, model starts generating the response
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full Response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

Sample response

====================Thinking Process====================

Let me carefully consider the user's question, "Who are you?" This requires analysis and response from multiple perspectives.

First, this is a fundamental identity recognition question. As a GLM large language model, I need to accurately express my identity. I should clearly state that I am an AI assistant developed by Zhipu AI.

Second, consider the user's possible intent behind this question. They might be new to it and want to understand basic functions; they might want to confirm if specific help can be provided; or they might just want to test the response method. Therefore, I need to provide an open and friendly answer.

Also, consider the completeness of the answer. Besides introducing my identity, I should briefly explain my main functions, such as Q&A, creation, and analysis, so users understand how to use this assistant.

Finally, ensure a friendly and approachable tone, expressing a willingness to help. Phrases like "I am happy to serve you" can make users feel comfortable.

Based on these considerations, I can formulate a concise and clear answer that both addresses the user's question and guides further interaction.
====================Full Response====================

I am GLM, a large language model trained by Zhipu AI. I am trained on large-scale text data, capable of understanding and generating human language, helping users answer questions, provide information, and engage in conversational exchanges.

I will continue to learn and improve to provide better services. I am happy to answer your questions or provide assistance! Is there anything I can do for you?
====================Token Usage====================

{ prompt_tokens: 7, completion_tokens: 248, total_tokens: 255 }

HTTP

Sample code

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "glm-5",
    "messages": [
        {
            "role": "user", 
            "content": "Who are you"
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "enable_thinking": true
}'

DashScope

Python

Sample code

import os
from dashscope import Generation

# Initialize request parameters
messages = [{"role": "user", "content": "Who are you?"}]

completion = Generation.call(
    # If the environment variable is not configured, replace the next line with: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="glm-5",
    messages=messages,
    result_format="message",  # Set result format to message
    enable_thinking=True,     # Enable thinking mode
    stream=True,              # Enable streaming output
    incremental_output=True,  # Enable incremental output
)

reasoning_content = ""  # Complete thinking process
answer_content = ""     # Complete response
is_answering = False    # Indicates whether the model has started generating the response

print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")

for chunk in completion:
    message = chunk.output.choices[0].message
    # Collect only the thinking content
    if "reasoning_content" in message:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content

    # Received content, model starts generating the response
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full Response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

print("\n" + "=" * 20 + "Token Usage" + "=" * 20 + "\n")
print(chunk.usage)

Sample response

====================Thinking Process====================

Let me carefully consider the user's question, "Who are you?" First, analyze the user's intent. This could be initial curiosity or a desire to understand my specific functions and capabilities.

From a professional perspective, I should clearly state my identity. As a GLM large language model, I need to explain my basic positioning and main functions. Avoid overly technical language; explain in an easy-to-understand way.

Also, consider practical issues users might care about, such as privacy protection and data security. These are key concerns for users of AI services.

Furthermore, to demonstrate professionalism and friendliness, proactively guide the conversation after the introduction. Ask if the user needs specific help. This helps users understand me better and sets the stage for future dialogue.

Finally, ensure the answer is concise and highlights key points, allowing users to quickly grasp my identity and purpose. Such an answer satisfies user curiosity and demonstrates professionalism and service orientation.
====================Full Response====================

I am a GLM large language model developed by Zhipu AI, designed to provide information and assistance to users through natural language processing technology. I am trained on large-scale text data, capable of understanding and generating human language, answering questions, providing knowledge support, and participating in conversations.

My design goal is to be a useful AI assistant while ensuring user privacy and data security. I do not store users' personal information and will continue to learn and improve to provide higher quality services.

Is there anything I can help you answer or any task I can assist with?
====================Token Usage====================

{"input_tokens": 8, "output_tokens": 269, "total_tokens": 277}

Java

Sample code

Important

This requires DashScope Java SDK 2.19.4 or later.
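As a reference, the SDK is published on Maven Central and can be pulled in with a dependency along these lines (verify the exact version for your build):

```xml
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>dashscope-sdk-java</artifactId>
    <version>2.19.4</version>
</dependency>
```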

// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;

public class Main {
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;
    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (reasoning != null && !reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking Process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }
        if (content != null && !content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Full Response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }
    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If no environment variable is configured, replace with your API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("glm-5")
                // Explicitly enable thinking mode (the default for GLM models is already true)
                .enableThinking(true)
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }
    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }
    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.err.println("An exception occurred: " + e.getMessage());
        }
    }
}

Sample response

====================Thinking Process====================
Let me consider how to answer the user's question. First, this is a simple identity recognition question, requiring a clear and direct answer.

As a large language model, I should accurately state my basic identity information. This includes:
- Name: GLM
- Developer: Zhipu AI
- Main functions: Language understanding and generation

Considering the user's question might stem from initial contact, I need to introduce myself in an easy-to-understand way, avoiding overly technical terms. At the same time, I should briefly explain my main capabilities, which can help users better understand how to interact with me.

I should also express a friendly and open attitude, welcoming users to ask various questions, which can lay a good foundation for subsequent conversations. However, the introduction should be concise and clear, not overly detailed, to avoid overwhelming the user with information.

Finally, to encourage further interaction, I can proactively ask if the user needs specific help, which can better serve their actual needs.
====================Full Response====================
I am GLM, a large language model developed by Zhipu AI. I am trained on massive text data, capable of understanding and generating human language, answering questions, providing information, and engaging in conversations.

My design purpose is to help users solve problems, provide knowledge, and support various language tasks. I will continuously learn and update to provide more accurate and useful answers.

Is there anything I can help you answer or discuss?

HTTP

Sample code

curl

curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "glm-5",
    "input":{
        "messages":[      
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters":{
        "enable_thinking": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'

Streaming tool calling

glm-5, glm-4.7, and glm-4.6 support the tool_stream parameter (boolean, default: false). It takes effect only when stream is true. When enabled, tool_call arguments are returned incrementally as a stream instead of all at once.

The combined behavior of stream and tool_stream:

| stream | tool_stream | Tool call return method |
|---|---|---|
| true | true | Arguments are returned incrementally in multiple chunks. |
| true | false (default) | Arguments are returned completely in a single chunk. |
| false | true/false | tool_stream has no effect; arguments are returned all at once in the full response. |
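When tool_stream is enabled, each chunk carries only a fragment of the arguments string, so the client must accumulate fragments (keyed by the tool call's index) until the stream ends. A minimal sketch, using hypothetical chunk data shaped like the OpenAI-compatible deltas in the examples below:

```python
import json

# Hypothetical incremental tool_call deltas, shaped like the OpenAI-compatible
# streaming chunks: only the first fragment carries the id and function name;
# later fragments extend the arguments string.
deltas = [
    {"index": 0, "id": "call_abc", "name": "get_weather", "arguments": '{"ci'},
    {"index": 0, "id": None, "name": None, "arguments": 'ty": "Beijing"}'},
]

calls = {}
for d in deltas:
    entry = calls.setdefault(d["index"], {"id": None, "name": None, "arguments": ""})
    entry["id"] = entry["id"] or d["id"]
    entry["name"] = entry["name"] or d["name"]
    entry["arguments"] += d["arguments"] or ""

# Parse the arguments only after the stream is complete
args = json.loads(calls[0]["arguments"])
print(calls[0]["name"], args["city"])  # get_weather Beijing
```

Keying by index rather than id matters because, as shown above, only the first fragment of a call carries the id.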

OpenAI compatible

Python

Sample code

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for the specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What's the weather like in Beijing?"}]

completion = client.chat.completions.create(
    model="glm-5",
    tools=tools,
    messages=messages,
    extra_body={
        "tool_stream": True,
    },
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if hasattr(delta, 'content') and delta.content:
            print(f"[content] {delta.content}")
        if hasattr(delta, 'tool_calls') and delta.tool_calls:
            for tc in delta.tool_calls:
                print(f"[tool_call] id={tc.id}, name={tc.function.name}, args={tc.function.arguments}")
        if chunk.choices[0].finish_reason:
            print(f"[finish_reason] {chunk.choices[0].finish_reason}")
    if not chunk.choices and chunk.usage:
        print(f"[usage] {chunk.usage}")

Node.js

Sample code

import OpenAI from "openai";
import process from 'process';

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

const tools = [
    {
        type: "function",
        function: {
            name: "get_weather",
            description: "Get weather information for the specified city",
            parameters: {
                type: "object",
                properties: {
                    city: { type: "string", description: "City name" }
                },
                required: ["city"]
            }
        }
    }
];

async function main() {
    try {
        const stream = await openai.chat.completions.create({
            model: 'glm-5',
            messages: [{ role: 'user', content: "What's the weather like in Beijing?" }],
            tools: tools,
            tool_stream: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                if (chunk.usage) {
                    console.log(`[usage] ${JSON.stringify(chunk.usage)}`);
                }
                continue;
            }

            const delta = chunk.choices[0].delta;

            if (delta.content) {
                console.log(`[content] ${delta.content}`);
            }

            if (delta.tool_calls) {
                for (const tc of delta.tool_calls) {
                    console.log(`[tool_call] id=${tc.id}, name=${tc.function.name}, args=${tc.function.arguments}`);
                }
            }

            if (chunk.choices[0].finish_reason) {
                console.log(`[finish_reason] ${chunk.choices[0].finish_reason}`);
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

HTTP

Sample code

cURL

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "glm-5",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in Beijing?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather information for the specified city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"]
                }
            }
        }
    ],
    "stream": true,
    "stream_options": {"include_usage": true},
    "tool_stream": true
}'

DashScope

Python

Sample code

import os
from dashscope import Generation

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for the specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What's the weather like in Beijing?"}]

completion = Generation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="glm-5",
    messages=messages,
    tools=tools,
    result_format="message",
    stream=True,
    tool_stream=True,
    incremental_output=True,
)

for chunk in completion:
    msg = chunk.output.choices[0].message
    if msg.content:
        print(f"[content] {msg.content}")
    if "tool_calls" in msg and msg.tool_calls:
        for tc in msg.tool_calls:
            fn = tc.get("function", {})
            print(f"[tool_call] id={tc.get('id','')}, name={fn.get('name','')}, args={fn.get('arguments','')}")
    finish = chunk.output.choices[0].get("finish_reason", "")
    if finish and finish != "null":
        print(f"[finish_reason] {finish}")

HTTP

Sample code

cURL

curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "glm-5",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": "What is the weather like in Beijing?"
            }
        ]
    },
    "parameters": {
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get weather information for the specified city",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "city": {"type": "string", "description": "City name"}
                        },
                        "required": ["city"]
                    }
                }
            }
        ],
        "tool_stream": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'

Features

| Model | Multi-turn conversation | Function calling | Structured output | Web search | Partial mode | Context cache |
|---|---|---|---|---|---|---|
| glm-5 | Supported | Supported | Supported (non-thinking mode only) | Not supported | Not supported | Supported (implicit cache only) |
| glm-4.7 | Supported | Supported | Supported (non-thinking mode only) | Not supported | Not supported | Not supported |
| glm-4.6 | Supported | Supported | Supported (non-thinking mode only) | Not supported | Not supported | Not supported |

Default parameter values

| Model | enable_thinking | temperature | top_p | top_k | repetition_penalty |
|---|---|---|---|---|---|
| glm-5 | true | 1.0 | 0.95 | 20 | 1.0 |
| glm-4.7 | true | 1.0 | 0.95 | 20 | 1.0 |
| glm-4.6 | true | 1.0 | 0.95 | 20 | 1.0 |
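These defaults can be overridden per request. An illustrative sketch for the OpenAI-compatible Python SDK, assuming the non-standard parameters (enable_thinking, top_k, repetition_penalty) are passed through extra_body, following the same convention as enable_thinking in the examples above:

```python
# Illustrative request arguments that override the defaults above; pass them
# to client.chat.completions.create(**request_kwargs) as in earlier examples.
request_kwargs = {
    "model": "glm-5",
    "messages": [{"role": "user", "content": "Who are you"}],
    "temperature": 0.7,  # default: 1.0
    "top_p": 0.8,        # default: 0.95
    # Non-standard parameters go through extra_body in the Python SDK
    "extra_body": {
        "enable_thinking": False,    # default: true
        "top_k": 50,                 # default: 20
        "repetition_penalty": 1.05,  # default: 1.0
    },
}
print(request_kwargs["temperature"])
```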

Billing

Billing is based on input and output token counts. See GLM for details.

In thinking mode, thinking (CoT) tokens are billed as output tokens.
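The token counts that drive billing are the usage fields shown in the sample responses above. A minimal sketch of reading them, assuming the DashScope-style field names:

```python
# Sketch: reading billable token counts from a DashScope usage payload like
# the ones in the sample responses above. In thinking mode, the CoT tokens
# are already included in output_tokens, so there is no separate line item.
usage = {"input_tokens": 8, "output_tokens": 269, "total_tokens": 277}

billable_input = usage["input_tokens"]
billable_output = usage["output_tokens"]  # includes thinking (CoT) tokens
assert billable_input + billable_output == usage["total_tokens"]
print(billable_input, billable_output)  # 8 269
```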

FAQ

Q: Can I use these models in Dify?

A: No. Model Studio GLM models are not supported in Dify. Use Qwen3 models via the TONGYI card instead. See Dify for details.

Error codes

If a request fails, see Error messages for troubleshooting.