FAQ - Alibaba Cloud Model Studio - Alibaba Cloud Documentation Center

This document answers frequently asked questions about Alibaba Cloud Model Studio.

Billing

What are the unit prices for the models in Alibaba Cloud Model Studio?
For model descriptions, see the Model Studio console. For pricing information, see Model inference pricing.
Are there any prepaid services available?
Yes, some models can be used with a prepaid service. For more information, see Savings plans.
Are pay-as-you-go bills settled monthly?
Bills are generated by the minute and settled monthly.
How can I view my charges and billing details?
Go to Expenses and Costs to view the details.
How can I request an invoice for my expenses?
On the Overview of Monthly Bill page, click Download Invoice in the Actions column for the desired account.
Wan membership Does Wan membership support Alibaba Cloud Model Studio API calls?
No. Wan membership benefits do not apply to Model Studio API calls because they use separate billing systems.

API/SDK

How can I find information about error codes?
API calls to Alibaba Cloud Model Studio return a status code that indicates the result of the call. For details about each code and its solution, see Error codes.
How do I install the SDK?
Alibaba Cloud Model Studio provides SDKs for Java and Python. For instructions, see Install the SDK.
When using a function call with the Assistant API, can I call two local functions in sequence?
a. Calling two separate functions sequentially is not currently supported.
b. As a workaround, you can create two separate Assistant APIs and handle the return value from each one.
Does the Assistant API have memory-related capabilities?
The memory configuration feature is not currently supported.

Product questions

How do I activate the Alibaba Cloud Model Studio service?
Alibaba Cloud Model Studio must be activated on a per-region basis. Log on with your Alibaba Cloud account and go to the Alibaba Cloud Model Studio console (Singapore), Alibaba Cloud Model Studio console (US (Virginia)), or Alibaba Cloud Model Studio console (China (Beijing)). Switch the target region in the upper-right corner of the console. After reading and agreeing to the service agreement, Alibaba Cloud Model Studio is activated automatically. If the agreement does not appear, the service is already activated for that region.
How can I deactivate the Alibaba Cloud Model Studio service?
Currently, the Alibaba Cloud Model Studio service cannot be deactivated. If you use the API to call models or applications, you can prevent future calls by deleting your API key on the API-Key (Singapore), API-Key (US (Virginia)), or API-Key (Beijing) page.
How can I try out the large model services?
You can go to the Playground (Singapore), Playground (US (Virginia)), or Playground (Beijing) page to try them out.
What is the difference between Alibaba Cloud Model Studio and Qwen?
Alibaba Cloud Model Studio is a large language model service platform that provides a variety of large models, including the Qwen series.
How can I implement business data isolation to ensure that data from different users is not associated?
You can use your Alibaba Cloud account to grant different workspace permissions to different RAM users. Data is isolated between workspaces. For more details, see Workspace permission management.
Does Alibaba Cloud Model Studio save data generated during model calls?
Alibaba Cloud strictly protects data privacy and never uses your data for model training. All data transmitted when you build applications or train large models is encrypted with AES-256 (Advanced Encryption Standard).

For details on how your data is handled, see the terms regarding Alibaba Cloud Model Studio in the Alibaba Cloud International Website Product Terms of Service.
How long are conversation histories kept in the Playground, and is there a limit to the number of saved conversations?
The Model Studio console displays a maximum of 100 historical conversation records with no time limit. If you manually delete some records, the system automatically displays older ones. Conversations from trial sessions while not logged in or those that result in inference errors are not saved.
Does Alibaba Cloud Model Studio support adding implicit identifiers to generated text?
No.
Does Alibaba Cloud Model Studio have a mobile app?
Alibaba Cloud Model Studio does not currently offer an official standalone mobile app. The service is primarily accessed through the web console.

Model center

How are the parameters of a large model stored?
You can download open source models from the ModelScope community. Their structure is typically defined in JSON files. You usually need to use open source Python libraries to parse these files, which contain vector information that helps in understanding the storage process.
How many languages do the Qwen series models support?
They support 14 languages: Chinese, English, Arabic, Spanish, French, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Indonesian.
Can the current models connect to structured data, such as MySQL or Hive?
This is not currently supported. However, this feature is under development, with priority given to integrating with ApsaraDB RDS.
Is the text generation speed for models like Qwen-3 and Qwen-Max fixed for all users? Is there a way to adjust the speed?
The generation speed is not fixed. It varies based on factors like the current overall service load and the concurrency of your requests.
After model rate limiting is triggered, how long should I typically wait before trying again?
The waiting time depends on your specific rate limit value, such as requests per second (RPS) or requests per minute (RPM). For example, if your limit is 120 RPM (which is 2 requests per second) and you submit 2 requests consecutively within 0.2 seconds, the third request will be throttled. You will need to wait approximately 0.8 seconds before you can successfully submit another request.

Model hallucination

What is model hallucination?
Model hallucination refers to the phenomenon where a large language model (LLM) generates content that is nonsensical, factually incorrect, distorted, or logically contradictory. The output may seem plausible and fluent but is inconsistent with the input prompt, real-world facts, or logical context. It is important to distinguish hallucination from factual errors (such as those caused by outdated training data), subjective opinions, or creative writing (like when a model is explicitly asked to write a story). The core of hallucination is a confident assertion without a factual basis.
How can I reduce model hallucination?
You can reduce model hallucination in the following ways:
1. Choose a more powerful model: Generally, selecting a larger and more advanced model can reduce hallucinations. For example, in the Qwen series, Max-level models perform better than Plus-level models, which in turn perform better than Turbo-level models.
2. Prompt engineering: Modifying the prompt is a simple and effective way to reduce model hallucination. For example, in a Retrieval-Augmented Generation (RAG) scenario, add instructions like, "Please answer based only on the provided documents. If the information is not available, say 'I don't know.'" You can also add, "Please cite specific data or reports to support your conclusion," use prompts to break down a task into multiple steps, or define a strict role for the model in the prompt.
3. Retrieval-Augmented Generation (RAG): With RAG, you can provide the model with reference materials for its responses and strictly limit its answers to the scope of the retrieved knowledge, significantly reducing hallucinations. When building a RAG system, ensure the retrieval system is high-quality, clearly labels information sources, and gracefully handles cases where no relevant information can be found.
4. Plugins/MCP: Use the capabilities of plugins or MCP to reduce model hallucination. For example, when using a large model to summarize data from a structured database, you can use plugins or MCP to call a database client to perform the calculations. The results can then be returned to the model for summarization, which avoids hallucinations that can occur when a model tries to perform numerical calculations directly.
5. Model parameter tuning: Lowering randomness parameters such astemperature,top_k, andtop_p makes the output more deterministic and less likely to generate bizarre content, though it may sacrifice creativity. In some scenarios, reducingmax_tokens can prevent the model from fabricating content after it has provided the key information.
6. Post-processing verification: After the model's inference is complete, use a subsequent step to verify the correctness of the response. This usually involves using another AI-driven process to check the response for hallucinations. This method increases costs and slows the overall response time.