Call GLM models on Alibaba Cloud Model Studio.
Overview
GLM models are hybrid reasoning models developed by Zhipu AI for agent scenarios. Each model offers both a thinking mode and a non-thinking mode.
All values are in tokens; an empty cell shares the value of the cell above.

| Model | Context window | Max input | Max CoT | Max response |
| --- | --- | --- | --- | --- |
| glm-5 | 202,752 | 202,752 | 32,768 | 16,384 |
| glm-4.7 | 169,984 | | | |
| glm-4.6 | | | | |
These models are deployed on Model Studio servers, not third-party services.
Getting started
Control the mode with the enable_thinking parameter. The following examples call glm-5 in thinking mode.
Before using the API, get an API key and set it as an environment variable. If you are using an SDK, install it first.
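For example, in bash or zsh the key can be set for the current shell session (sk-xxx is a placeholder for your actual key):

```shell
# Replace sk-xxx with your actual Model Studio API key
export DASHSCOPE_API_KEY="sk-xxx"
```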
OpenAI compatible
enable_thinking is not a standard OpenAI parameter. Pass it via extra_body in the Python SDK, or as a top-level parameter in the Node.js SDK.
Python
Sample code
from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(
    # If no environment variable is configured, replace $DASHSCOPE_API_KEY with your API key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you"}]

completion = client.chat.completions.create(
    model="glm-5",
    messages=messages,
    # Set enable_thinking to True via extra_body to enable thinking mode
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={"include_usage": True},
)

reasoning_content = ""  # Complete thinking process
answer_content = ""     # Complete response
is_answering = False    # Whether the model has started generating the response

print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")

for chunk in completion:
    if not chunk.choices:
        print("\n" + "=" * 20 + "Token Usage" + "=" * 20 + "\n")
        print(chunk.usage)
        continue
    delta = chunk.choices[0].delta
    # Collect the thinking content
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content
    # Once content arrives, the model has started generating the response
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full Response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content
Sample response
====================Thinking Process====================
Let me carefully consider this seemingly simple but actually profound question from the user.
This is a self-introduction question that may contain multiple layers of meaning.
First, as a language model, I should honestly state my identity and nature. I am neither human nor do I possess true emotional consciousness; I am an AI assistant trained with deep learning technology. This is the fundamental truth.
Second, considering the user's potential needs, they might want to know:
1. What services I can provide
2. What my areas of expertise are
3. What my limitations are
4. How to interact with me more effectively
In my response, I should express a friendly and open attitude while remaining professional and accurate. I should explain my main areas of expertise, such as knowledge Q&A, writing assistance, and creative support, but also frankly point out my limitations, such as lacking genuine emotional experience.
Furthermore, to make the answer more complete, I should express a proactive attitude towards helping users solve problems. I can appropriately guide users to ask more specific questions, which can better showcase my capabilities.
Considering this is an open-ended opening, the answer should be concise yet contain enough information to give the user a clear understanding of my basic situation, while laying a good foundation for subsequent conversations.
Finally, the tone should remain humble and professional, neither overly technical nor too casual, making the user feel comfortable and natural.
====================Full Response====================
I am a GLM large language model trained by Zhipu AI, designed to provide information and help users solve problems. I am designed to understand and generate human language, and can answer questions, provide explanations, or participate in various topic discussions.
I do not store your personal data, and our conversations are anonymous. Is there anything I can help you understand or discuss?
====================Token Usage====================
CompletionUsage(completion_tokens=344, prompt_tokens=7, total_tokens=351, completion_tokens_details=None, prompt_tokens_details=None)
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    // If no environment variable is configured, replace $DASHSCOPE_API_KEY with your API key
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = ''; // Complete thinking process
let answerContent = '';    // Complete response
let isAnswering = false;   // Whether the model has started generating the response

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you' }];
        const stream = await openai.chat.completions.create({
            model: 'glm-5',
            messages,
            // In the Node.js SDK, non-standard parameters such as enable_thinking are passed as top-level properties, not in extra_body
            enable_thinking: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });
        console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\n' + '='.repeat(20) + 'Token Usage' + '='.repeat(20) + '\n');
                console.log(chunk.usage);
                continue;
            }
            const delta = chunk.choices[0].delta;
            // Collect the thinking content
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }
            // Once content arrives, the model has started generating the response
            if (delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full Response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();
Sample response
====================Thinking Process====================
Let me carefully consider the user's question, "Who are you?" This requires analysis and response from multiple perspectives.
First, this is a fundamental identity recognition question. As a GLM large language model, I need to accurately express my identity. I should clearly state that I am an AI assistant developed by Zhipu AI.
Second, consider the user's possible intent behind this question. They might be new to it and want to understand basic functions; they might want to confirm if specific help can be provided; or they might just want to test the response method. Therefore, I need to provide an open and friendly answer.
Also, consider the completeness of the answer. Besides introducing my identity, I should briefly explain my main functions, such as Q&A, creation, and analysis, so users understand how to use this assistant.
Finally, ensure a friendly and approachable tone, expressing a willingness to help. Phrases like "I am happy to serve you" can make users feel comfortable.
Based on these considerations, I can formulate a concise and clear answer that both addresses the user's question and guides further interaction.
====================Full Response====================
I am GLM, a large language model trained by Zhipu AI. I am trained on large-scale text data, capable of understanding and generating human language, helping users answer questions, provide information, and engage in conversational exchanges.
I will continue to learn and improve to provide better services. I am happy to answer your questions or provide assistance! Is there anything I can do for you?
====================Token Usage====================
{ prompt_tokens: 7, completion_tokens: 248, total_tokens: 255 }
HTTP
Sample code
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "glm-5",
    "messages": [
        {
            "role": "user",
            "content": "Who are you"
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "enable_thinking": true
}'
DashScope
Python
Sample code
import os
from dashscope import Generation

# Initialize request parameters
messages = [{"role": "user", "content": "Who are you?"}]

completion = Generation.call(
    # If no environment variable is configured, replace $DASHSCOPE_API_KEY with your API key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="glm-5",
    messages=messages,
    result_format="message",  # Set result format to message
    enable_thinking=True,     # Enable thinking mode
    stream=True,              # Enable streaming output
    incremental_output=True,  # Enable incremental output
)

reasoning_content = ""  # Complete thinking process
answer_content = ""     # Complete response
is_answering = False    # Whether the model has started generating the response

print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")

for chunk in completion:
    message = chunk.output.choices[0].message
    # Collect the thinking content
    if "reasoning_content" in message:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content
    # Once content arrives, the model has started generating the response
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full Response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

print("\n" + "=" * 20 + "Token Usage" + "=" * 20 + "\n")
print(chunk.usage)
Sample response
====================Thinking Process====================
Let me carefully consider the user's question, "Who are you?" First, analyze the user's intent. This could be initial curiosity or a desire to understand my specific functions and capabilities.
From a professional perspective, I should clearly state my identity. As a GLM large language model, I need to explain my basic positioning and main functions. Avoid overly technical language; explain in an easy-to-understand way.
Also, consider practical issues users might care about, such as privacy protection and data security. These are key concerns for users of AI services.
Furthermore, to demonstrate professionalism and friendliness, proactively guide the conversation after the introduction. Ask if the user needs specific help. This helps users understand me better and sets the stage for future dialogue.
Finally, ensure the answer is concise and highlights key points, allowing users to quickly grasp my identity and purpose. Such an answer satisfies user curiosity and demonstrates professionalism and service orientation.
====================Full Response====================
I am a GLM large language model developed by Zhipu AI, designed to provide information and assistance to users through natural language processing technology. I am trained on large-scale text data, capable of understanding and generating human language, answering questions, providing knowledge support, and participating in conversations.
My design goal is to be a useful AI assistant while ensuring user privacy and data security. I do not store users' personal information and will continue to learn and improve to provide higher quality services.
Is there anything I can help you answer or any task I can assist with?
====================Token Usage====================
{"input_tokens": 8, "output_tokens": 269, "total_tokens": 277}
Java
Sample code
This requires DashScope Java SDK 2.19.4 or later.
// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;

public class Main {
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (reasoning != null && !reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking Process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }
        if (content != null && !content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Full Response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }

    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If no environment variable is configured, replace with your API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("glm-5")
                // Enable thinking mode
                .enableThinking(true)
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }

    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.err.println("An exception occurred: " + e.getMessage());
        }
    }
}
Sample response
====================Thinking Process====================
Let me consider how to answer the user's question. First, this is a simple identity recognition question, requiring a clear and direct answer.
As a large language model, I should accurately state my basic identity information. This includes:
- Name: GLM
- Developer: Zhipu AI
- Main functions: Language understanding and generation
Considering the user's question might stem from initial contact, I need to introduce myself in an easy-to-understand way, avoiding overly technical terms. At the same time, I should briefly explain my main capabilities, which can help users better understand how to interact with me.
I should also express a friendly and open attitude, welcoming users to ask various questions, which can lay a good foundation for subsequent conversations. However, the introduction should be concise and clear, not overly detailed, to avoid overwhelming the user with information.
Finally, to encourage further interaction, I can proactively ask if the user needs specific help, which can better serve their actual needs.
====================Full Response====================
I am GLM, a large language model developed by Zhipu AI. I am trained on massive text data, capable of understanding and generating human language, answering questions, providing information, and engaging in conversations.
My design purpose is to help users solve problems, provide knowledge, and support various language tasks. I will continuously learn and update to provide more accurate and useful answers.
Is there anything I can help you answer or discuss?
HTTP
Sample code
curl
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "glm-5",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters": {
        "enable_thinking": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'
Streaming tool calling
glm-5, glm-4.7, and glm-4.6 support the tool_stream parameter (boolean, default: false). It takes effect only when stream is true. When enabled, tool_call arguments are returned incrementally as a stream instead of all at once.
The combined behavior of stream and tool_stream:
| stream | tool_stream | Tool call return method |
| --- | --- | --- |
| true | true | Arguments are returned incrementally across multiple chunks. |
| true | false (default) | Arguments are returned completely in a single chunk. |
| false | true/false | tool_stream has no effect; arguments are returned all at once in the full response. |
OpenAI compatible
Python
Sample code
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for the specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What's the weather like in Beijing?"}]

completion = client.chat.completions.create(
    model="glm-5",
    tools=tools,
    messages=messages,
    extra_body={"tool_stream": True},
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if hasattr(delta, "content") and delta.content:
            print(f"[content] {delta.content}")
        if hasattr(delta, "tool_calls") and delta.tool_calls:
            for tc in delta.tool_calls:
                print(f"[tool_call] id={tc.id}, name={tc.function.name}, args={tc.function.arguments}")
        if chunk.choices[0].finish_reason:
            print(f"[finish_reason] {chunk.choices[0].finish_reason}")
    if not chunk.choices and chunk.usage:
        print(f"[usage] {chunk.usage}")
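Each streamed chunk carries only a fragment of the arguments string, so the fragments must be accumulated before they can be parsed as JSON. A minimal sketch of that accumulation, keyed by the tool call's index (the deltas here are simulated with plain dicts rather than SDK objects):

```python
import json

def accumulate_tool_calls(deltas):
    """Merge streamed tool_call fragments, keyed by index, into complete calls."""
    calls = {}
    for tc in deltas:
        entry = calls.setdefault(tc["index"], {"id": "", "name": "", "arguments": ""})
        if tc.get("id"):
            entry["id"] = tc["id"]
        fn = tc.get("function", {})
        if fn.get("name"):
            entry["name"] = fn["name"]
        # Argument fragments arrive in order and are concatenated
        entry["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]

# Simulated fragments, as they might arrive over the stream
fragments = [
    {"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"city": '}},
    {"index": 0, "function": {"arguments": '"Beijing"}'}},
]
merged = accumulate_tool_calls(fragments)
# Parse only after the stream completes, when the JSON is whole
print(json.loads(merged[0]["arguments"]))  # → {'city': 'Beijing'}
```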
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

const tools = [
    {
        type: "function",
        function: {
            name: "get_weather",
            description: "Get weather information for the specified city",
            parameters: {
                type: "object",
                properties: {
                    city: { type: "string", description: "City name" }
                },
                required: ["city"]
            }
        }
    }
];

async function main() {
    try {
        const stream = await openai.chat.completions.create({
            model: 'glm-5',
            messages: [{ role: 'user', content: "What's the weather like in Beijing?" }],
            tools: tools,
            tool_stream: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });
        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                if (chunk.usage) {
                    console.log(`[usage] ${JSON.stringify(chunk.usage)}`);
                }
                continue;
            }
            const delta = chunk.choices[0].delta;
            if (delta.content) {
                console.log(`[content] ${delta.content}`);
            }
            if (delta.tool_calls) {
                for (const tc of delta.tool_calls) {
                    console.log(`[tool_call] id=${tc.id}, name=${tc.function.name}, args=${tc.function.arguments}`);
                }
            }
            if (chunk.choices[0].finish_reason) {
                console.log(`[finish_reason] ${chunk.choices[0].finish_reason}`);
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();
HTTP
Sample code
cURL
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "glm-5",
    "messages": [
        {
            "role": "user",
            "content": "What'\''s the weather like in Beijing?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather information for the specified city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"]
                }
            }
        }
    ],
    "stream": true,
    "stream_options": {"include_usage": true},
    "tool_stream": true
}'
DashScope
Python
Sample code
import os
from dashscope import Generation

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for the specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What's the weather like in Beijing?"}]

completion = Generation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="glm-5",
    messages=messages,
    tools=tools,
    result_format="message",
    stream=True,
    tool_stream=True,
    incremental_output=True,
)

for chunk in completion:
    msg = chunk.output.choices[0].message
    if msg.content:
        print(f"[content] {msg.content}")
    if "tool_calls" in msg and msg.tool_calls:
        for tc in msg.tool_calls:
            fn = tc.get("function", {})
            print(f"[tool_call] id={tc.get('id', '')}, name={fn.get('name', '')}, args={fn.get('arguments', '')}")
    finish = chunk.output.choices[0].get("finish_reason", "")
    if finish and finish != "null":
        print(f"[finish_reason] {finish}")
HTTP
Sample code
cURL
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "glm-5",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": "What'\''s the weather like in Beijing?"
            }
        ]
    },
    "parameters": {
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get weather information for the specified city",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "city": {"type": "string", "description": "City name"}
                        },
                        "required": ["city"]
                    }
                }
            }
        ],
        "tool_stream": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'
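The samples above stop at printing the streamed tool call. In a complete flow, once the stream finishes with finish_reason tool_calls, the application executes the tool locally and sends the result back in a follow-up request as a tool-role message. A sketch of that second step, with a stubbed get_weather and the network call omitted (the tool_call values are illustrative):

```python
import json

def get_weather(city):
    # Stub: a real implementation would query a weather service
    return f"Sunny, 25°C in {city}"

# A completed tool call, as assembled from the stream (illustrative values)
tool_call = {"id": "call_1", "name": "get_weather", "arguments": '{"city": "Beijing"}'}

messages = [{"role": "user", "content": "What's the weather like in Beijing?"}]

# 1. Echo the assistant's tool call back into the conversation history
messages.append({
    "role": "assistant",
    "content": "",
    "tool_calls": [{
        "id": tool_call["id"],
        "type": "function",
        "function": {"name": tool_call["name"], "arguments": tool_call["arguments"]},
    }],
})

# 2. Execute the tool and append its result as a tool message
args = json.loads(tool_call["arguments"])
messages.append({
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "content": get_weather(**args),
})

# messages is now ready for a second chat.completions.create call
print(messages[-1]["content"])  # → Sunny, 25°C in Beijing
```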
Features
| Model |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| glm-5 |  |  | Non-thinking mode only |  |  | Implicit cache only. |
| glm-4.7 |  |  | Non-thinking mode only |  |  |  |
| glm-4.6 |  |  | Non-thinking mode only |  |  |  |
Default parameter values
| Model | enable_thinking | temperature | top_p | top_k | repetition_penalty |
| --- | --- | --- | --- | --- | --- |
| glm-5 | true | 1.0 | 0.95 | 20 | 1.0 |
| glm-4.7 | true | 1.0 | 0.95 | 20 | 1.0 |
| glm-4.6 | true | 1.0 | 0.95 | 20 | 1.0 |
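These defaults can be overridden per request. A sketch of the request parameters for the OpenAI-compatible endpoint, built as a plain dict so the shape is visible without sending anything (the sampling values here are illustrative, not recommendations):

```python
# Request parameters for client.chat.completions.create(**request_kwargs)
request_kwargs = {
    "model": "glm-5",
    "messages": [{"role": "user", "content": "Who are you"}],
    "temperature": 0.7,  # overrides the default of 1.0
    "top_p": 0.8,        # overrides the default of 0.95
    # Non-standard parameters go through extra_body in the Python SDK
    "extra_body": {"enable_thinking": False, "top_k": 40},
}
print(request_kwargs["extra_body"])  # → {'enable_thinking': False, 'top_k': 40}
```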
Billing
Billing is based on input and output token counts. See GLM for details.
In thinking mode, the tokens generated during the thinking process are billed as output tokens.
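As a sketch of how the billed amount follows from a usage object (the unit prices below are hypothetical placeholders, not actual Model Studio rates; see the GLM pricing page for the real figures):

```python
# Hypothetical prices in USD per 1,000 tokens -- placeholders, not real rates
PRICE_PER_1K_INPUT = 0.0006
PRICE_PER_1K_OUTPUT = 0.0022

def request_cost(prompt_tokens, completion_tokens):
    """Thinking tokens are included in completion_tokens, so they bill as output."""
    return ((prompt_tokens / 1000) * PRICE_PER_1K_INPUT
            + (completion_tokens / 1000) * PRICE_PER_1K_OUTPUT)

# Usage from the earlier sample response: prompt_tokens=7, completion_tokens=344
print(round(request_cost(7, 344), 6))  # → 0.000761
```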
FAQ
Q: Can I use Model Studio GLM models in Dify?
A: No. Model Studio GLM models are not supported in Dify. Use Qwen3 models through the TONGYI card instead. See Dify for details.
Error codes
If a request fails, see Error messages for troubleshooting.