All Products
Search
Document Center

Platform For AI:ChatLLM-WebUI release notes

Last Updated:Mar 05, 2026

This topic provides important release information for ChatLLM-WebUI.

Important release information

Date

Image version

Built-in library version

Updates

2024.6.21

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.4

    Tag: chat-llm-webui:3.0

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.4-flash-attn

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.4-vllm

    Tag: chat-llm-webui:3.0-vllm

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.4-vllm-flash-attn

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.4-blade

    Tag: chat-llm-webui:3.0-blade

  • Torch: 2.3.0

  • Torchvision: 0.18.0

  • Transformers: 4.41.2

  • vLLM: 0.5.0.post1

  • vllm-flash-attn: 2.5.9

  • Blade: 0.7.0

  • Supports Rerank model deployment.

  • Supports simultaneous or separate deployment of Embedding, Rerank, and LLM models.

  • The Transformers backend supports Deepseek-V2, Yi1.5, and Qwen2.

  • Changes the model type of Qwen1.5 to qwen1.5.

  • The vLLM backend supports Qwen2.

  • The BladeLLM backend supports Llama3 and Qwen2.

  • The HuggingFace (HF) backend supports batch inputs.

  • The BladeLLM backend supports OpenAI Chat.

  • Fixes BladeLLM Metrics access.

  • The Transformers backend supports 8-bit floating point (FP8) model deployment.

  • The Transformers backend supports multiple quantization tools, such as AWQ, HQQ, and Quanto.

  • The vLLM backend supports FP8.

  • The vLLM and Blade inference parameters support setting stop words.

  • The Transformers backend is adapted for H-series GPUs.

2024.4.30

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.3

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.3-flash-attn

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.3-vllm

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.3-vllm-flash-attn

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.3-blade

  • Torch: 2.3.0

  • Torchvision: 0.18.0

  • Transformers: 4.40.2

  • vllm: 0.4.2

  • Blade: 0.5.1

  • Supports embedding model deployment.

  • The vLLM backend supports returning token usage.

  • Supports Sentence-Transformers model deployment.

  • The Transformers backend supports yi-9B, qwen2-moe, llama3, qwencode, qwen1.5-32G/110B, phi-3, and gemma-1.1-2/7B.

  • The vLLM backend supports yi-9B, qwen2-moe, SeaLLM, llama3, and phi-3.

  • The Blade backend supports qwen1.5 and SeaLLM.

  • Supports multi-model deployment of LLM and Embedding models.

  • Releases a flash-attn runtime image for the Transformers backend.

  • Releases a flash-attn runtime image for the vLLM backend.

2024.3.28

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.2

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.2-vllm

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.2-blade

  • Torch: 2.1.2

  • Torchvision: 0.16.2

  • Transformers: 4.38.2

  • Vllm: 0.3.3

  • Blade: 0.4.8

  • Adds the Blade inference backend, which supports multi-GPU configurations on a single machine and quantization settings.

  • The Transformers backend performs inference based on tokenizer chat templates.

  • The HF backend supports Multi-LoRA inference.

  • Blade supports quantized model deployment.

  • Blade automatically splits models.

  • The Transformers backend supports Deepseek and Gemma.

  • The vLLM backend supports Deepseek and Gemma.

  • The Blade backend supports qwen1.5 and yi models.

  • The vLLM and Blade runtime images provide access to /metrics.

  • The Transformers backend supports token statistics in streaming returns.

2024.2.22

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.1

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.1-vllm

  • Torch: 2.1.2

  • Torchvision: 0.16.0

  • Transformers: 4.37.2

  • vLLM: 0.3.0

  • Extends vLLM parameter settings to support changing all inference parameters during inference.

  • vLLM supports Multi-LoRA.

  • vLLM supports quantized model deployment.

  • The vLLM runtime image no longer depends on the LangChain demo.

  • The Transformers inference backend supports qwen1.5 and qwen2 models.

  • The vLLM inference backend supports qwen-1.5 and qwen-2 models.

2024.1.23

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0

  • eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0-vllm

  • Torch: 2.1.2

  • Torchvision: 0.16.2

  • Transformers: 4.37.2

  • vLLM: 0.2.6

  • Splits backend runtime images for independent compilation and publishing. Adds the new BladeLLM backend.

  • Supports the standard OpenAI API.

  • Models such as Baichuan support performance statistics.

  • Supports models such as yi-6b-chat, yi-34b-chat, and secgpt.

  • The openai/v1/chat/completions endpoint is adapted for the chatglm3 history format.

  • Optimizes asynchronous streaming.

  • vLLM model support is aligned with HF.

  • Optimizes backend API calls.

  • Improves error logs.

2023.12.6

eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:2.1

Tag: chat-llm-webui:2.1

  • Torch: 2.0.1

  • Torchvision: 0.15.2

  • Transformers: 4.33.3

  • vLLM: 0.2.0

  • The HF backend supports mistral, zephyr, yi-6b, yi-34b, qwen-72b, qwen-1.8b, qwen7b-int4, qwen14b-int4, qwen7b-int8, qwen14b-int8, qwen-72b-int4, qwen-72b-int8, qwen-1.8b-int4, and qwen-1.8b-int8 models.

  • The vLLM backend supports Qwen and ChatGLM1/2/3 models.

  • The HF inference backend supports flash attention.

  • The ChatGLM series of models supports performance statistics.

  • Adds the --history-format command-line parameter to support setting roles.

  • The LangChain demo supports the Qwen model.

  • Optimizes the FastAPI streaming access interface.

2023.9.13

eas-registry.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:2.0

Tag: chat-llm-webui:2.0

  • Torch: 2.0.1+cu117

  • Torchvision: 0.15.2+cu117

  • Transformers: 4.33.3

  • vLLM: 0.2.0

  • Supports multiple backends: vLLM and HF.

  • The LangChain demo supports ChatLLM and Llama2 models.

  • Supports models such as Baichuan, Baichuan2, Qwen, Falcon, Llama2, ChatGLM, ChatGLM2, ChatGLM3, and yi.

  • Adds HTTP and WebSocket support for conversation streaming.

  • Non-streaming responses include the number of generated tokens.

  • All models support multi-turn conversations.

  • Supports exporting conversation records.

  • Supports System Prompt settings and prompt concatenation for template-free inputs.

  • Inference parameters are configurable.

  • Supports Debug mode for logs, which includes inference time in the output.

  • The vLLM backend supports the transactional processing (TP) parallel solution by default for multi-GPU configurations on a single machine.

  • Supports model deployment with Float32, Float16, Int8, and Int4 precision.

References

EAS provides a scenario-based method to deploy ChatLLM. This method simplifies the deployment of popular open source large language model (LLM) applications because it requires only a few parameter configurations. For more information about deploying and calling LLM services, see Deploy large language models.