DeepSeek-V3 is a Mixture-of-Experts (MoE) large language model with 671 billion parameters, released by DeepSeek. DeepSeek-R1 is a high-performance reasoning model trained on DeepSeek-V3-Base. The Model Gallery offers vLLM- or BladeLLM-accelerated deployment, enabling you to deploy the DeepSeek-V3 and DeepSeek-R1 series models with a single click.
Note: The full versions of DeepSeek-R1 and DeepSeek-V3 have a large parameter count (671B), which demands high-end hardware (8 GPUs with 96 GB of video memory or more each) at a correspondingly high cost. Consider the distilled models instead: they require fewer machine resources and cost less to deploy.
Tests indicate that the DeepSeek-R1-Distill-Qwen-32B model delivers better performance and cost-efficiency, making it suitable for cloud deployment. Other distilled models such as 7B, 8B, and 14B are also available for deployment. The Model Gallery also features a model evaluation tool to assess the actual performance of the model (the evaluation entry is located in the upper right corner of the model product page).
The table below shows the DeepSeek models supported for deployment on PAI (Platform for AI), along with their corresponding configurations and pricing. (When deploying DeepSeek through the PAI Model Gallery following the official tutorial, the platform will automatically preselect the recommended model configuration.)
Deployment Method Description:
For optimal performance and maximum supported token count, accelerated deployment (BladeLLM, vLLM) is recommended.
Accelerated deployment supports only the API call method. Standard deployment supports both the API call method and the WebUI chat interface.
1. Navigate to the Model Gallery page.
2. On the Model Gallery page, locate the model card you want to deploy, such as the DeepSeek-R1-Distill-Qwen-32B model, and click to access the model product page.
3. Click Deploy in the upper right corner, select the deployment method and resources, and deploy with one click to create a PAI-EAS service.
Note: For deploying DeepSeek-R1 and DeepSeek-V3, in addition to the ml.gu8v.c192m1024.8-gu120 and ecs.gn8v-8x.48xlarge instance types in the public resource group (inventory may be limited), the ecs.ebmgn8v.48xlarge instance type is also an option. Note that this instance type is not available through public resources; you must purchase EAS dedicated resources.
After successful deployment, click View Call Information on the service page to obtain the Endpoint and Token for the call.
The service call methods vary depending on the deployment method. Detailed instructions are available on the model introduction page of the Model Gallery.
| | BladeLLM deployment | vLLM deployment | Standard deployment |
|---|---|---|---|
| WebUI | Not supported (a local Web UI is available; see the note below) | Not supported (a local Web UI is available; see the note below) | Supported |
| Online debugging | Supported | Supported | Supported |
| API calls | completions interface: `<EAS_ENDPOINT>/v1/completions`; chat interface: `<EAS_ENDPOINT>/v1/chat/completions` | API description file: `<EAS_ENDPOINT>/openapi.json`; model list: `<EAS_ENDPOINT>/v1/models`; completions interface: `<EAS_ENDPOINT>/v1/completions`; chat interface: `<EAS_ENDPOINT>/v1/chat/completions` | `<EAS_ENDPOINT>` |
| Compatible with OpenAI SDK | Not compatible | Compatible | Not compatible |
| Request data format | The request data formats for completions and chat are different; see the examples below. | Same as BladeLLM, except that a `model` parameter must be added; its value can be obtained from the model list interface `<EAS_ENDPOINT>/v1/models`. | Supports string and JSON types; see the example below. |

Local Web UI note (BladeLLM and vLLM): you can download the Web UI code (BladeLLM: BladeLLM_github, BladeLLM_oss; vLLM: vLLM_github, vLLM_oss — note that the BladeLLM and vLLM codebases are different) and start a Web UI locally:

```shell
python webui_client.py --eas_endpoint "<EAS API Endpoint>" --eas_token "<EAS API Token>"
```

Online debugging note: select the deployment task under Task Management > Deployment Tasks to open the product page and find the entry for online debugging.
BladeLLM request data examples:

Completions request data:

```json
{"prompt": "hello world", "stream": "true"}
```

Chat request data:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello World!!"
    }
  ]
}
```
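The BladeLLM chat request above can be sent with a short Python sketch. The endpoint and token values are placeholders (use the ones from View Call Information), and the helper function names are illustrative, not part of any platform SDK:

```python
import json
import urllib.request

# Placeholders -- replace with the Endpoint and Token shown under
# "View Call Information" on the EAS service page.
EAS_ENDPOINT = "<EAS API Endpoint>"
EAS_TOKEN = "<EAS API Token>"

def build_chat_payload(user_message: str) -> dict:
    """Build a BladeLLM chat payload; note there is no 'model' field."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ]
    }

def call_chat(endpoint: str, token: str, payload: dict) -> str:
    """POST the payload to <EAS_ENDPOINT>/v1/chat/completions."""
    req = urllib.request.Request(
        endpoint.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Requires a live service:
# print(call_chat(EAS_ENDPOINT, EAS_TOKEN, build_chat_payload("Hello World!!")))
```

The `Authorization` header carries the EAS token obtained from the service page.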
vLLM request data examples. In the examples below, substitute `<model_name>` with a model name retrieved from the model list interface `<EAS_ENDPOINT>/v1/models`.

Completions request data:

```json
{"model": "<model_name>", "prompt": "hello world"}
```

Chat request data:

```json
{
  "model": "<model_name>",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
}
```
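For vLLM deployments, a sketch like the following first discovers the model name and then builds the chat payload. This assumes the model list endpoint returns the OpenAI-style schema (`{"data": [{"id": ...}]}`); the endpoint, token, and helper names are placeholders:

```python
import json
import urllib.request

EAS_ENDPOINT = "<EAS API Endpoint>"  # placeholder from "View Call Information"
EAS_TOKEN = "<EAS API Token>"

def list_models(endpoint: str, token: str) -> list:
    """Return model names from <EAS_ENDPOINT>/v1/models (assumed OpenAI-style schema)."""
    req = urllib.request.Request(
        endpoint.rstrip("/") + "/v1/models",
        headers={"Authorization": token},
    )
    with urllib.request.urlopen(req) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

def build_vllm_chat_payload(model_name: str, user_message: str) -> dict:
    """Unlike BladeLLM, the 'model' field is required for vLLM deployments."""
    return {
        "model": model_name,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }

# With a live service:
# model = list_models(EAS_ENDPOINT, EAS_TOKEN)[0]
# payload = build_vllm_chat_payload(model, "Hello!")
```

Because vLLM deployment is OpenAI-compatible, the same endpoints can also be called through the OpenAI SDK by pointing its base URL at `<EAS_ENDPOINT>/v1`.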
Standard deployment request data (JSON):

```json
{
  "max_new_tokens": 4096,
  "use_stream_chat": false,
  "prompt": "What is the capital of Canada?",
  "system_prompt": "Act like you are a knowledgeable assistant who can provide information on geography and related topics.",
  "history": [
    [
      "Can you tell me what's the capital of France?",
      "The capital of France is Paris."
    ]
  ],
  "temperature": 0.8,
  "top_k": 10,
  "top_p": 0.8,
  "do_sample": true,
  "use_cache": true
}
```
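A standard-deployment request like the one above can be assembled and posted directly to `<EAS_ENDPOINT>` (no `/v1/...` path). This is a minimal sketch: the endpoint and token are placeholders, and the helper names are illustrative:

```python
import json
import urllib.request

EAS_ENDPOINT = "<EAS API Endpoint>"  # placeholder from "View Call Information"
EAS_TOKEN = "<EAS API Token>"

def build_standard_payload(prompt, system_prompt="", history=None, **overrides):
    """Assemble the JSON body shown above; keyword overrides adjust sampling."""
    payload = {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "history": history or [],
        "max_new_tokens": 4096,
        "use_stream_chat": False,
        "temperature": 0.8,
        "top_k": 10,
        "top_p": 0.8,
        "do_sample": True,
        "use_cache": True,
    }
    payload.update(overrides)
    return payload

def call_service(endpoint: str, token: str, payload: dict) -> str:
    """POST the JSON body to the service root endpoint."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Requires a live service:
# print(call_service(EAS_ENDPOINT, EAS_TOKEN,
#                    build_standard_payload("What is the capital of Canada?")))
```

`history` takes a list of `[question, answer]` pairs, matching the example above.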
For standard deployments, Web applications are supported. In PAI-Model Gallery > Task Management > Deployment Tasks, click the deployed service name. On the Service Product Page, click View WEB Application in the upper right corner for real-time interaction through the ChatLLM WebUI.
For API calls, see how to use the API for model inference.