This topic describes how to call GLM series models through the API on Alibaba Cloud Model Studio.
Model list
The GLM series models are hybrid reasoning models developed by Zhipu AI for agent-based applications. They support both thinking mode and non-thinking mode.
| Model name | Context length | Maximum input | Maximum chain-of-thought length | Maximum response length |
| --- | --- | --- | --- | --- |
| glm-5 | 202,752 | 202,752 | 32,768 | 16,384 |
| glm-4.7 | 202,752 | 169,984 | 32,768 | 16,384 |
| glm-4.6 | 202,752 | 169,984 | 32,768 | 16,384 |

All values are token counts.
These models are not third-party services. They are all deployed on Alibaba Cloud Model Studio servers.
Get started
glm-5 is the latest model in the GLM series. It supports switching between thinking mode and non-thinking mode through the enable_thinking parameter. Run the following code to quickly call the glm-5 model in thinking mode.
Before you begin, obtain an API key and configure it as an environment variable. If you use a software development kit (SDK), install the OpenAI or DashScope SDK.
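If you have not set these up yet, both steps can be done from a terminal. A minimal sketch, assuming a Unix-like shell and a Python environment (the placeholder key must be replaced with your own):

```shell
# Install the OpenAI-compatible SDK and the DashScope SDK for Python
pip install -U openai dashscope

# Make the API key available to the samples in this topic
export DASHSCOPE_API_KEY="sk-xxx"  # placeholder; use your Model Studio API key
```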
OpenAI compatible
enable_thinking is not a standard OpenAI parameter. In the OpenAI Python SDK, pass it through extra_body. In the Node.js SDK, pass it as a top-level parameter.
Python
Sample code
from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(
    # If the environment variable is not configured, replace the following value with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you"}]

completion = client.chat.completions.create(
    model="glm-5",
    messages=messages,
    # Set enable_thinking in extra_body to enable thinking mode
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={
        "include_usage": True
    },
)

reasoning_content = ""  # The complete thought process
answer_content = ""  # The complete response
is_answering = False  # Whether the response phase has started

print("\n" + "=" * 20 + "Thought Process" + "=" * 20 + "\n")

for chunk in completion:
    if not chunk.choices:
        print("\n" + "=" * 20 + "Token Usage" + "=" * 20 + "\n")
        print(chunk.usage)
        continue
    delta = chunk.choices[0].delta
    # Collect the thinking content
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content
    # Collect the response content once it starts arriving
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content
Response
====================Thought Process====================
Let me think carefully about this seemingly simple but profound question from the user.
Based on the language used, the user is speaking Chinese, so I should respond in Chinese. This is a basic self-introduction question, but it might have multiple layers of meaning.
First, as a language model, I must honestly state my identity and nature. I am not a human, nor do I have real emotional consciousness. I am an AI assistant trained with deep learning technology. This is the fundamental fact.
Second, considering the user's potential scenarios, they might want to know:
1. What services can I provide?
2. What are my areas of expertise?
3. What are my limitations?
4. How can we interact better?
In my response, I should be friendly and open, yet professional and accurate. I should state my main areas of expertise, such as knowledge Q&A, writing assistance, and creative support, while also frankly pointing out my limitations, such as the lack of real emotional experience.
Additionally, to make the response more complete, I should express a positive attitude and willingness to help the user solve problems. I can guide the user to ask more specific questions to better showcase my abilities.
Given that this is an open-ended opening, the response should be concise and clear, yet contain enough information for the user to have a clear understanding of my basic situation and to lay a good foundation for subsequent conversations.
Finally, the tone should remain humble and professional, neither too technical nor too casual, to make the user feel comfortable and natural.
====================Complete Response====================
I am a GLM large language model trained by Zhipu AI, designed to provide users with information and help solve problems. I am designed to understand and generate human language, and I can answer questions, provide explanations, or discuss various topics.
I do not store your personal data. Our conversations are anonymous. Is there any topic I can help you understand or explore?
====================Token Usage====================
CompletionUsage(completion_tokens=344, prompt_tokens=7, total_tokens=351, completion_tokens_details=None, prompt_tokens_details=None)
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    // If the environment variable is not configured, replace the following value with your Model Studio API key: apiKey: "sk-xxx"
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = ''; // The complete thought process
let answerContent = ''; // The complete response
let isAnswering = false; // Whether the response phase has started

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you' }];
        const stream = await openai.chat.completions.create({
            model: 'glm-5',
            messages,
            // Note: In the Node.js SDK, non-standard parameters such as enable_thinking are passed as top-level properties, not inside extra_body.
            enable_thinking: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });
        console.log('\n' + '='.repeat(20) + 'Thought Process' + '='.repeat(20) + '\n');
        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\n' + '='.repeat(20) + 'Token Usage' + '='.repeat(20) + '\n');
                console.log(chunk.usage);
                continue;
            }
            const delta = chunk.choices[0].delta;
            // Collect the thinking content
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }
            // Collect the response content once it starts arriving
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();
Response
====================Thought Process====================
Let me think carefully about the user's question, 'Who are you?'. This needs to be analyzed and responded to from multiple angles.
First, this is a basic identity question. As a GLM large language model, I need to accurately state my identity. I should clearly state that I am an AI assistant developed by Zhipu AI.
Second, I need to consider the user's possible intent in asking this question. They might be first-time users wanting to know basic features, or they might want to confirm if I can provide specific help, or they might just be testing my response style. Therefore, I need to give an open and friendly answer.
I also need to consider the completeness of the answer. In addition to introducing my identity, I should also briefly explain my main functions, such as Q&A, creation, and analysis, to let the user know how to use this assistant.
Finally, I must ensure a friendly and approachable tone and express a willingness to help. I can use expressions like 'I am happy to serve you' to make the user feel the warmth of the communication.
Based on these thoughts, I can organize a concise and clear answer that both answers the user's question and guides subsequent communication.
====================Complete Response====================
I am GLM, a large language model trained by Zhipu AI. I am trained on massive amounts of text data to understand and generate human language, helping users answer questions, provide information, and engage in conversations.
I will continue to learn and improve to provide better services. I am happy to answer your questions or provide assistance. What can I do for you?
====================Token Usage====================
{ prompt_tokens: 7, completion_tokens: 248, total_tokens: 255 }
HTTP
Sample code
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "glm-5",
    "messages": [
        {
            "role": "user",
            "content": "Who are you"
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "enable_thinking": true
}'
DashScope
Python
Sample code
import os
from dashscope import Generation

# Initialize the request parameters
messages = [{"role": "user", "content": "Who are you?"}]

completion = Generation.call(
    # If the environment variable is not configured, replace the following value with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="glm-5",
    messages=messages,
    result_format="message",  # Set the result format to message
    enable_thinking=True,  # Enable thinking mode
    stream=True,  # Enable streaming output
    incremental_output=True,  # Enable incremental output
)

reasoning_content = ""  # The complete thought process
answer_content = ""  # The complete response
is_answering = False  # Whether the response phase has started

print("\n" + "=" * 20 + "Thought Process" + "=" * 20 + "\n")

for chunk in completion:
    message = chunk.output.choices[0].message
    # Collect the thinking content
    if "reasoning_content" in message:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content
    # Collect the response content once it starts arriving
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

print("\n" + "=" * 20 + "Token Usage" + "=" * 20 + "\n")
print(chunk.usage)
Response
====================Thought Process====================
Let me think carefully about the user's question, 'Who are you?'. First, I need to analyze the user's intent. This could be curiosity from a first-time user, or they might want to know about my specific functions and capabilities.
From a professional perspective, I should clearly state my identity as a GLM large language model, explaining my basic positioning and main functions. I should avoid overly technical descriptions and explain in an easy-to-understand way.
At the same time, I should also consider some practical issues that users might care about, such as privacy protection and data security. These are points of great concern for users when using AI services.
In addition, to show professionalism and friendliness, I can proactively guide the conversation after the introduction by asking if the user needs specific help. This will help the user understand me better and pave the way for subsequent conversations.
Finally, I must ensure the answer is concise and clear, with key points highlighted, so that the user can quickly understand my identity and purpose. Such an answer can both satisfy the user's curiosity and demonstrate professionalism and a service-oriented attitude.
====================Complete Response====================
I am a GLM large language model developed by Zhipu AI, designed to provide users with information and help through natural language processing technology. I am trained on massive amounts of text data and can understand and generate human language, answer questions, provide knowledge support, and participate in conversations.
My design goal is to be a useful AI assistant while ensuring user privacy and data security. I do not store users' personal information and will continue to learn and improve to provide higher quality services.
Is there any question I can answer or any task I can assist you with?
====================Token Usage====================
{"input_tokens": 8, "output_tokens": 269, "total_tokens": 277}
Java
Sample code
Use DashScope Java SDK version 2.19.4 or later.
// The DashScope SDK version must be 2.19.4 or later.
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;
public class Main {
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (reasoning != null && !reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thought Process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }
        if (content != null && !content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Complete Response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }

    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If the environment variable is not configured, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("glm-5")
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }

    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.err.println("An exception occurred: " + e.getMessage());
        }
    }
}
Response
====================Thought Process====================
Let me think about how to answer the user's question. First, this is a simple identity question that needs a clear and direct answer.
As a large language model, I should accurately state my basic identity information. This includes:
- Name: GLM
- Developer: Zhipu AI
- Main functions: Language understanding and generation
Considering that the user's question may stem from their first interaction, I need to introduce myself in an easy-to-understand way, avoiding overly technical terms. At the same time, I should also briefly explain my main capabilities to help the user better understand how to interact with me.
I should also express a friendly and open attitude, welcoming users to ask various questions to lay a good foundation for subsequent conversations. However, the introduction should be concise and clear, without being too detailed, to avoid overwhelming the user with information.
Finally, to promote further communication, I can proactively ask if the user needs specific help to better serve their actual needs.
====================Complete Response====================
I am GLM, a large language model developed by Zhipu AI. I am trained on massive amounts of text data and can understand and generate human language, answer questions, provide information, and engage in conversations.
My design purpose is to help users solve problems, provide knowledge, and support various language tasks. I will continuously learn and update to provide more accurate and useful answers.
Is there any question I can help you answer or discuss?
HTTP
Sample code
curl
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "glm-5",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters": {
        "enable_thinking": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'
Streaming tool calls
The glm-5, glm-4.7, and glm-4.6 models support the tool_stream parameter. This Boolean parameter defaults to false and takes effect only when the stream parameter is set to true. When enabled, the `arguments` field in the `tool_call` response of a function call is returned incrementally through the stream, instead of all at once after generation is fully complete.
The combined behavior of the stream and tool_stream parameters is as follows:
| stream | tool_stream | tool_call return behavior |
| --- | --- | --- |
| true | true | Arguments are returned incrementally across multiple chunks. |
| true | false (default) | Arguments are returned in full in a single chunk. |
| false | true/false | tool_stream does not take effect. Arguments are returned all at once in the complete response. |
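When tool_stream is enabled, the argument fragments from successive chunks must be concatenated per tool call before they can be parsed as JSON. The sketch below shows this accumulation over simulated delta payloads; the dictionaries are illustrative stand-ins for the streamed deltas, not real API responses.

```python
import json

def accumulate_tool_calls(deltas):
    """Merge incremental tool_call deltas into complete calls, keyed by index."""
    calls = {}
    for delta in deltas:
        for tc in delta.get("tool_calls", []):
            idx = tc["index"]
            entry = calls.setdefault(idx, {"id": "", "name": "", "arguments": ""})
            if tc.get("id"):
                entry["id"] = tc["id"]
            fn = tc.get("function", {})
            if fn.get("name"):
                entry["name"] = fn["name"]
            if fn.get("arguments"):
                # Argument fragments arrive in order; concatenate them
                entry["arguments"] += fn["arguments"]
    return calls

# Simulated incremental deltas, as produced when tool_stream is true
deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1",
                     "function": {"name": "get_weather", "arguments": '{"ci'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": 'ty": "Beijing"}'}}]},
]
calls = accumulate_tool_calls(deltas)
print(json.loads(calls[0]["arguments"]))  # {'city': 'Beijing'}
```

Only after the stream finishes (or the chunk carries a finish_reason of tool_calls) is each accumulated `arguments` string guaranteed to be complete, valid JSON.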
OpenAI compatible
Python
Sample code
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "The name of the city"}
                },
                "required": ["city"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What is the weather like in Beijing"}]

completion = client.chat.completions.create(
    model="glm-5",
    tools=tools,
    messages=messages,
    extra_body={
        "tool_stream": True,
    },
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if hasattr(delta, 'content') and delta.content:
            print(f"[content] {delta.content}")
        if hasattr(delta, 'tool_calls') and delta.tool_calls:
            for tc in delta.tool_calls:
                print(f"[tool_call] id={tc.id}, name={tc.function.name}, args={tc.function.arguments}")
        if chunk.choices[0].finish_reason:
            print(f"[finish_reason] {chunk.choices[0].finish_reason}")
    if not chunk.choices and chunk.usage:
        print(f"[usage] {chunk.usage}")
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

const tools = [
    {
        type: "function",
        function: {
            name: "get_weather",
            description: "Get weather information for a specified city",
            parameters: {
                type: "object",
                properties: {
                    city: { type: "string", description: "The name of the city" }
                },
                required: ["city"]
            }
        }
    }
];

async function main() {
    try {
        const stream = await openai.chat.completions.create({
            model: 'glm-5',
            messages: [{ role: 'user', content: 'What is the weather like in Beijing' }],
            tools: tools,
            tool_stream: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });
        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                if (chunk.usage) {
                    console.log(`[usage] ${JSON.stringify(chunk.usage)}`);
                }
                continue;
            }
            const delta = chunk.choices[0].delta;
            if (delta.content) {
                console.log(`[content] ${delta.content}`);
            }
            if (delta.tool_calls) {
                for (const tc of delta.tool_calls) {
                    console.log(`[tool_call] id=${tc.id}, name=${tc.function.name}, args=${tc.function.arguments}`);
                }
            }
            if (chunk.choices[0].finish_reason) {
                console.log(`[finish_reason] ${chunk.choices[0].finish_reason}`);
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();
HTTP
Sample code
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "glm-5",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in Beijing"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather information for a specified city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "The name of the city"}
                    },
                    "required": ["city"]
                }
            }
        }
    ],
    "stream": true,
    "stream_options": {"include_usage": true},
    "tool_stream": true
}'
DashScope
Python
Sample code
import os
from dashscope import Generation

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "The name of the city"}
                },
                "required": ["city"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What is the weather like in Beijing"}]

completion = Generation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="glm-5",
    messages=messages,
    tools=tools,
    result_format="message",
    stream=True,
    tool_stream=True,
    incremental_output=True,
)

for chunk in completion:
    msg = chunk.output.choices[0].message
    if msg.content:
        print(f"[content] {msg.content}")
    if "tool_calls" in msg and msg.tool_calls:
        for tc in msg.tool_calls:
            fn = tc.get("function", {})
            print(f"[tool_call] id={tc.get('id', '')}, name={fn.get('name', '')}, args={fn.get('arguments', '')}")
    finish = chunk.output.choices[0].get("finish_reason", "")
    if finish and finish != "null":
        print(f"[finish_reason] {finish}")
HTTP
Sample code
curl
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "glm-5",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": "What is the weather like in Beijing"
            }
        ]
    },
    "parameters": {
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get weather information for a specified city",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "city": {"type": "string", "description": "The name of the city"}
                        },
                        "required": ["city"]
                    }
                }
            }
        ],
        "tool_stream": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'
Model features

| Model |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| glm-5 |  |  | Only in non-thinking mode |  |  | Currently, only implicit caching is supported |
| glm-4.7 |  |  | Only in non-thinking mode |  |  |  |
| glm-4.6 |  |  | Only in non-thinking mode |  |  |  |
Default parameter values

| Model | enable_thinking | temperature | top_p | top_k | repetition_penalty |
| --- | --- | --- | --- | --- | --- |
| glm-5 | true | 1.0 | 0.95 | 20 | 1.0 |
| glm-4.7 | true | 1.0 | 0.95 | 20 | 1.0 |
| glm-4.6 | true | 1.0 | 0.95 | 20 | 1.0 |
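These defaults can be overridden per request. The sketch below builds an overriding parameter set for the OpenAI-compatible endpoint; the values shown are illustrative, not recommendations.

```python
# Per-request overrides of the GLM sampling defaults listed above.
# enable_thinking is not a standard OpenAI parameter, so with the OpenAI
# Python SDK it is passed inside extra_body.
params = {
    "model": "glm-5",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,                        # default: 1.0
    "top_p": 0.8,                              # default: 0.95
    "extra_body": {"enable_thinking": False},  # default: true
}
# With a configured client: client.chat.completions.create(**params)
print(params["extra_body"]["enable_thinking"])  # False
```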
Billing
Billing is based on the number of input and output tokens used by the model. For more information about pricing, see GLM.
In thinking mode, the chain-of-thought output is billed as output tokens.
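Because chain-of-thought tokens are billed as output, the completion_tokens value in the usage object already includes them. A minimal cost estimate from a usage payload might look like the following; the unit prices here are placeholders, not actual GLM pricing.

```python
def estimate_cost(usage, input_price_per_1k, output_price_per_1k):
    """Estimate request cost from a usage dict; prices are per 1,000 tokens."""
    return (usage["prompt_tokens"] / 1000 * input_price_per_1k
            + usage["completion_tokens"] / 1000 * output_price_per_1k)

# Usage figures from the streaming example above; unit prices are hypothetical
usage = {"prompt_tokens": 7, "completion_tokens": 344, "total_tokens": 351}
cost = estimate_cost(usage, input_price_per_1k=0.001, output_price_per_1k=0.002)
print(round(cost, 6))  # 0.000695
```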
FAQ
Q: How do I configure Dify?
A: Currently, you cannot integrate GLM series models from Alibaba Cloud Model Studio with Dify. Instead, use Qwen3 models through the Qwen card. For more information, see Dify.
Error codes
If an error occurs during execution, see Error messages for troubleshooting.