
Alibaba Cloud Model Studio: GLM

Last Updated: Feb 04, 2026

This topic describes how to call GLM models on Alibaba Cloud Model Studio.

Model availability

The GLM models are hybrid reasoning models developed by Zhipu AI for agent scenarios. They provide two modes: thinking mode and non-thinking mode.

| Model | Context window | Max input | Max chain-of-thought | Max response |
| --- | --- | --- | --- | --- |
| glm-4.7 | 202,752 | 169,984 | 32,768 | 16,384 |
| glm-4.6 | 202,752 | 169,984 | 32,768 | 16,384 |

All values are in tokens.

These models are not third-party services. They are deployed on Model Studio servers.
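The limits in the table can be kept as constants in client code for a quick pre-flight check. A minimal sketch (the constant names and the `input_fits` helper are illustrative, not part of any SDK):

```python
# Token limits for glm-4.7 from the table above (all values in tokens).
CONTEXT_WINDOW = 202_752
MAX_INPUT = 169_984
MAX_CHAIN_OF_THOUGHT = 32_768
MAX_RESPONSE = 16_384

def input_fits(prompt_tokens: int) -> bool:
    """Rough pre-flight check that a prompt stays within the max input limit."""
    return prompt_tokens <= MAX_INPUT

print(input_fits(150_000))  # True
print(input_fits(200_000))  # False
```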

Getting started

glm-4.7 is the latest model in the series. Use the enable_thinking parameter to switch between thinking and non-thinking modes. Run the following code to quickly call glm-4.7 in thinking mode.

Before you start, create an API key and set it as an environment variable. If you call the model through an SDK, install the OpenAI or DashScope SDK.

OpenAI compatible

Note

enable_thinking is not a standard OpenAI parameter. In the OpenAI Python SDK, it is passed using extra_body. In the Node.js SDK, it is passed as a top-level parameter.

Python

Sample code

from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(
    # If the environment variable is not configured, replace the following with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you"}]
completion = client.chat.completions.create(
    model="glm-4.7",
    messages=messages,
    # Set enable_thinking in extra_body to enable thinking mode
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={
        "include_usage": True
    },
)

reasoning_content = ""  # Complete thought process
answer_content = ""  # Complete response
is_answering = False  # Indicates whether the response phase has started
print("\n" + "=" * 20 + "Thought Process" + "=" * 20 + "\n")

for chunk in completion:
    if not chunk.choices:
        print("\n" + "=" * 20 + "Token Usage" + "=" * 20 + "\n")
        print(chunk.usage)
        continue

    delta = chunk.choices[0].delta

    # Collect only the thinking content
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content

    # Once content is received, the response phase has started
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content

Response

====================Thought Process====================

Let me think carefully about this seemingly simple but profound question from the user.

From a linguistic perspective, the user is using Chinese, which means I should respond in Chinese. This is a basic self-introduction question, but it may have multiple layers of meaning.

First, I need to be clear and honest about my identity and nature as a language model. I am not a human, nor do I possess true emotional consciousness. I am an AI assistant trained using deep learning technology. This is the fundamental fact.

Second, considering the user's potential scenarios, they might want to know:
1. What services I can provide
2. What my areas of expertise are
3. What my limitations are
4. How to interact with me more effectively

In my response, I should express a friendly and open attitude while remaining professional and accurate. I need to state my main areas of expertise, such as knowledge Q&A, writing assistance, and creative support, while also honestly pointing out my limitations, such as the lack of real emotional experience.

Additionally, to make the response more complete, I should express a positive attitude and willingness to help users solve problems. I can guide the user to ask more specific questions to better demonstrate my capabilities.

Considering this is an open-ended opening, the answer should be concise and clear, yet contain enough information to give the user a clear understanding of my basic situation and lay a good foundation for subsequent conversations.

Finally, the tone should remain humble and professional, neither too technical nor too casual, to make the user feel comfortable and natural.
====================Complete Response====================

I am a GLM large language model trained by Zhipu AI, designed to provide users with information and help solve problems. I am designed to understand and generate human language, and I can answer questions, provide explanations, or participate in discussions on various topics.

I do not store your personal data, and our conversation is anonymous. Is there any topic I can help you understand or discuss?
====================Token Usage====================

CompletionUsage(completion_tokens=344, prompt_tokens=7, total_tokens=351, completion_tokens_details=None, prompt_tokens_details=None)
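The stream-handling pattern above can be exercised without a network call. A minimal sketch with mocked delta objects (SimpleNamespace stands in for the SDK's delta type, and the chunk contents are made up):

```python
from types import SimpleNamespace

def split_stream(deltas):
    """Accumulate reasoning_content and content separately,
    mirroring the loop in the sample above."""
    reasoning, answer = "", ""
    for delta in deltas:
        if getattr(delta, "reasoning_content", None):
            reasoning += delta.reasoning_content
        if getattr(delta, "content", None):
            answer += delta.content
    return reasoning, answer

# Mocked chunks: thinking content arrives first, then the response.
mock_deltas = [
    SimpleNamespace(reasoning_content="Thinking about the question...", content=None),
    SimpleNamespace(reasoning_content=None, content="I am GLM, "),
    SimpleNamespace(reasoning_content=None, content="an AI assistant."),
]
reasoning, answer = split_stream(mock_deltas)
print(answer)  # I am GLM, an AI assistant.
```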

Node.js

Sample code

import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    // If the environment variable is not configured, replace the following with your Model Studio API key: apiKey: "sk-xxx"
    apiKey: process.env.DASHSCOPE_API_KEY, 
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = ''; // Complete thought process
let answerContent = ''; // Complete response
let isAnswering = false; // Indicates whether the response phase has started

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you' }];
        
        const stream = await openai.chat.completions.create({
            model: 'glm-4.7',
            messages,
            // Note: In the Node.js SDK, non-standard parameters like enable_thinking are passed as top-level properties, not in extra_body
            enable_thinking: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });

        console.log('\n' + '='.repeat(20) + 'Thought Process' + '='.repeat(20) + '\n');

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\n' + '='.repeat(20) + 'Token Usage' + '='.repeat(20) + '\n');
                console.log(chunk.usage);
                continue;
            }

            const delta = chunk.choices[0].delta;
            
            // Collect only the thinking content
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }

            // Once content is received, the response phase has started
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

Response

====================Thought Process====================

Let me think carefully about the user's question, "Who are you." This requires analysis and a response from multiple perspectives.

First, this is a basic identity question. As a GLM large language model, I need to accurately state my identity. I should clearly state that I am an AI assistant developed by Zhipu AI.

Second, I need to consider the user's possible intent. They might be a first-time user wanting to understand basic features, or they might want to confirm if I can provide specific help, or they might just be testing my response style. Therefore, I need to provide an open and friendly answer.

I also need to consider the completeness of the answer. In addition to introducing myself, I should also briefly explain my main features, such as Q&A, creation, and analysis, to let the user know how to use this assistant.

Finally, I must ensure a friendly and approachable tone, expressing a willingness to help. I can use expressions like "I'm happy to help" to make the user feel the warmth of the conversation.

Based on these thoughts, I can formulate a concise and clear answer that both answers the user's question and guides subsequent conversation.
====================Complete Response====================

I am GLM, a large language model trained by Zhipu AI. I am trained on massive amounts of text data, enabling me to understand and generate human language to help users answer questions, provide information, and engage in conversations.

I continuously learn and improve to provide better service. I am happy to answer your questions or provide assistance! What can I do for you?
====================Token Usage====================

{ prompt_tokens: 7, completion_tokens: 248, total_tokens: 255 }

HTTP

Sample code

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "glm-4.7",
    "messages": [
        {
            "role": "user", 
            "content": "Who are you"
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "enable_thinking": true
}'

DashScope

Python

Sample code

import os
from dashscope import Generation

# Initialize request parameters
messages = [{"role": "user", "content": "Who are you?"}]

completion = Generation.call(
    # If the environment variable is not configured, replace the following with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="glm-4.7",
    messages=messages,
    result_format="message",  # Set the result format to message
    enable_thinking=True,     # Enable thinking mode
    stream=True,              # Enable streaming output
    incremental_output=True,  # Enable incremental output
)

reasoning_content = ""  # Complete thought process
answer_content = ""     # Complete response
is_answering = False    # Indicates whether the response phase has started

print("\n" + "=" * 20 + "Thought Process" + "=" * 20 + "\n")

for chunk in completion:
    message = chunk.output.choices[0].message
    # Collect only the thinking content
    if "reasoning_content" in message:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content

    # Once content is received, the response phase has started
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

print("\n" + "=" * 20 + "Token Usage" + "=" * 20 + "\n")
print(chunk.usage)

Response

====================Thought Process====================

Let me think carefully about the user's question, "Who are you?" First, I need to analyze the user's intent. This could be curiosity from a first-time user, or they might want to understand my specific features and capabilities.

From a professional perspective, I should clearly state my identity. As a GLM large language model, I need to explain my basic positioning and main features. I should avoid overly technical terms and explain in a way that is easy to understand.

At the same time, I should also consider practical issues that users might care about, such as privacy protection and data security. These are points of great concern for users when using AI services.

In addition, to show professionalism and friendliness, I can proactively guide the conversation after the introduction by asking if the user needs specific help. This helps the user understand me better and also paves the way for subsequent conversation.

Finally, I must ensure the answer is concise and clear, with key points highlighted, so the user can quickly understand my identity and purpose. Such a response can satisfy the user's curiosity while demonstrating professionalism and a service-oriented attitude.
====================Complete Response====================

I am a GLM large language model developed by Zhipu AI, designed to provide information and assistance to users through natural language processing technology. Trained on massive amounts of text data, I can understand and generate human language, answer questions, provide knowledge support, and participate in conversations.

My design goal is to be a useful AI assistant while ensuring user privacy and data security. I do not store users' personal information, and I will continuously learn and improve to provide higher quality service.

Are there any questions I can answer or tasks I can assist you with?
====================Token Usage====================

{"input_tokens": 8, "output_tokens": 269, "total_tokens": 277}

Java

Sample code

Important

The DashScope Java SDK version must be 2.19.4 or later.

// The DashScope SDK version must be 2.19.4 or later.
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;

public class Main {
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;
    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (reasoning != null && !reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thought Process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }
        if (content != null && !content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Complete Response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }
    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("glm-4.7")
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }
    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }
    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.err.println("An exception occurred: " + e.getMessage());
        }
    }
}

Response

====================Thought Process====================
Let me think about how to answer the user's question. First, this is a simple identity question that needs a clear and direct answer.

As a large language model, I should accurately state my basic identity information. This includes:
- Name: GLM
- Developer: Zhipu AI
- Main features: Language understanding and generation

Considering the user's question may stem from a first-time interaction, I need to introduce myself in an easy-to-understand way, avoiding overly technical terms. At the same time, I should also briefly explain my main capabilities to help the user better understand how to interact with me.

I should also express a friendly and open attitude, welcoming users to ask various questions, which lays a good foundation for subsequent conversations. However, the introduction should be concise and clear, not overly detailed, to avoid overwhelming the user with information.

Finally, to encourage further communication, I can proactively ask if the user needs specific help, which allows me to better serve their actual needs.
====================Complete Response====================
I am GLM, a large language model developed by Zhipu AI. I am trained on vast amounts of text data, enabling me to understand and generate human language, answer questions, provide information, and engage in conversations.

My purpose is to help users solve problems, provide knowledge, and support various language tasks. I continuously learn and update to provide more accurate and useful answers.

Are there any questions I can answer or discuss for you?

HTTP

Sample code

curl

curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "glm-4.7",
    "input":{
        "messages":[      
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters":{
        "enable_thinking": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'
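Note that the DashScope-native endpoint nests the request body differently from the OpenAI-compatible endpoint: messages go under input, and options such as enable_thinking go under parameters. A sketch contrasting the two payload shapes (nothing is sent here):

```python
# OpenAI-compatible endpoint: flat body, enable_thinking at the top level.
openai_compatible_body = {
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Who are you?"}],
    "enable_thinking": True,
}

# DashScope-native endpoint: messages under "input", options under "parameters".
dashscope_body = {
    "model": "glm-4.7",
    "input": {"messages": [{"role": "user", "content": "Who are you?"}]},
    "parameters": {
        "enable_thinking": True,
        "incremental_output": True,
        "result_format": "message",
    },
}

print(dashscope_body["input"]["messages"][0]["content"])  # Who are you?
```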

Features

| Model | Multi-turn conversation | Function calling | Structured output | Web search | Partial mode | Context cache |
| --- | --- | --- | --- | --- | --- | --- |
| glm-4.7 | Supported | Supported | Supported (non-thinking mode only) | Not supported | Not supported | Not supported |
| glm-4.6 | Supported | Supported | Supported (non-thinking mode only) | Not supported | Not supported | Not supported |

Default parameter values

| Model | enable_thinking | temperature | top_p | top_k | repetition_penalty |
| --- | --- | --- | --- | --- | --- |
| glm-4.7 | true | 1.0 | 0.95 | 20 | 1.0 |
| glm-4.6 | true | 1.0 | 0.95 | 20 | 1.0 |
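To deviate from these defaults, pass the parameters explicitly in the request. One way to keep the documented defaults in a single place (the helper below is illustrative, not part of any SDK):

```python
# Default sampling parameters for glm-4.7 and glm-4.6, from the table above.
GLM_DEFAULTS = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "repetition_penalty": 1.0,
}

def sampling_params(**overrides):
    """Merge caller overrides onto the documented defaults."""
    params = dict(GLM_DEFAULTS)
    params.update(overrides)
    return params

# Lower the temperature while keeping the other defaults.
print(sampling_params(temperature=0.7))
```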

Billing

Billing is based on the number of input and output tokens. For unit prices, see GLM.

In thinking mode, the chain-of-thought is billed as output tokens.
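Because the chain-of-thought counts toward output tokens, thinking mode can cost noticeably more per request. A sketch of the arithmetic, using the token counts from the Python sample above and placeholder prices (the per-token rates below are made up; see the GLM pricing page for actual rates):

```python
# Placeholder prices per 1,000 tokens -- NOT actual Model Studio rates.
INPUT_PRICE_PER_1K = 0.001
OUTPUT_PRICE_PER_1K = 0.002

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """In thinking mode, chain-of-thought tokens are already included in the
    output token count, so they are billed at the output rate."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# Usage from the earlier sample: prompt_tokens=7, completion_tokens=344
print(f"{estimate_cost(7, 344):.6f}")  # 0.000695
```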

FAQ

Q: How do I use GLM in Dify?

A: The GLM models on Model Studio currently cannot be integrated with Dify. We recommend using Qwen3 models through the TONGYI card instead. For details, see Dify.

Error codes

If an error occurs, see Error messages for troubleshooting.