Intelligent Media Services: Real-time Conversational AI

Last Updated: Nov 12, 2025

This topic describes the billing details for Real-time Conversational AI.

Purchase guide

To use the Alibaba Cloud Real-time Conversational AI service, you must meet the following requirement:

  • The Real-time Conversational AI feature must be activated. If it is not yet activated, go to Activate Service to activate it. If it is already active, you can use it directly.

    Note

    If you see the message "The quantity you are purchasing exceeds the available limit. Please select a new quantity!", it means the service is already active.

Product pricing

Standard pricing for the AI agent service

  • If the agent's input and output are audio-only, billing is based on the audio specification. The standard audio price covers three services: Speech-to-Text (STT), which includes Automatic Speech Recognition (ASR) and acoustic modeling; Text-to-Speech (TTS); and agent runtime.

  • If the agent's input or output includes video, billing is based on the video specification. The standard video price covers the same three services: STT (ASR and acoustic modeling), TTS, and agent runtime. Digital humans are not included.

The standard pricing model is bundled: the fees for all three services (STT, TTS, and agent runtime) are charged in full.

Specification    The Chinese mainland (USD/minute)    Singapore (USD/minute)
Audio            0.014                                0.028
Video            0.0502                               0.1003

Pay-per-feature billing model

AI agent platform service pricing

The platform service fee is the basic charge for using the AI agent public cloud service. This fee is cost-based and covers expenses incurred during the agent's runtime, such as computing power and egress bandwidth on Alibaba Cloud. If the agent's input and output are audio-only, billing is based on the audio specification. If the agent's input or output includes video, billing is based on the video specification.

Billable item                Specification    Price (CNY/minute)
AI agent platform service    Audio            0.0328
AI agent platform service    Video            0.286

STT and TTS service pricing (optional)

If you use the built-in capabilities of the Real-time Conversational AI solution, you will be charged for them. If you use external services (third-party, self-developed, or Alibaba Cloud Model Studio), you will not be charged under this product.

Billable item            Price (CNY/minute)
Speech-to-Text (STT)     0.058
Text-to-Speech (TTS)     0.0072

Note

If you use an external large language model (LLM), you will incur corresponding LLM fees. For specific billing details, see the billing documentation for that product.

Example of pay-per-feature billing

User A has 10 audio-only calls with an AI agent. Each call lasts 2 minutes. The fees for each module are calculated as follows:

  • AI agent platform service: The billable duration is 20 minutes (10 calls × 2 minutes). The fee is CNY 0.656 (20 minutes × CNY 0.0328/minute).

  • Speech-to-Text: The billable duration is 20 minutes (10 calls × 2 minutes). The fee is CNY 1.16 (20 minutes × CNY 0.058/minute).

  • Text-to-Speech: The billable duration is 20 minutes (10 calls × 2 minutes). The fee is CNY 0.144 (20 minutes × CNY 0.0072/minute).

  • ApsaraVideo Real-time Communication (ARTC): Because the calls are bidirectional, the billable duration is 40 minutes (10 calls × 2 minutes × 2). The fee is CNY 0.24 (40 minutes × CNY 0.006/minute).

  • Total fee: CNY 2.2 = CNY 0.656 + CNY 1.16 + CNY 0.144 + CNY 0.24.
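The arithmetic in this example can be reproduced with a short script. The following is a minimal sketch, not an official calculator: the function name `pay_per_feature_fee` and the dictionary keys are hypothetical, the per-minute prices are copied from the tables and the example above, and it assumes audio-only calls that use the built-in STT and TTS.

```python
from decimal import Decimal

# Per-minute prices in CNY, copied from this topic.
PLATFORM_AUDIO = Decimal("0.0328")  # AI agent platform service, audio
STT = Decimal("0.058")              # Speech-to-Text
TTS = Decimal("0.0072")             # Text-to-Speech
ARTC = Decimal("0.006")             # ARTC rate used in the example above


def pay_per_feature_fee(calls: int, minutes_per_call: int) -> dict[str, Decimal]:
    """Hypothetical helper: itemized fees for audio-only calls."""
    agent_minutes = calls * minutes_per_call
    # Calls are bidirectional, so the ARTC duration is doubled.
    artc_minutes = agent_minutes * 2
    items = {
        "platform": agent_minutes * PLATFORM_AUDIO,
        "stt": agent_minutes * STT,
        "tts": agent_minutes * TTS,
        "artc": artc_minutes * ARTC,
    }
    items["total"] = sum(items.values(), Decimal("0"))
    return items


fees = pay_per_feature_fee(calls=10, minutes_per_call=2)
for name, amount in fees.items():
    print(f"{name}: CNY {amount.normalize()}")
```

Decimal arithmetic is used here instead of floats so that small per-minute prices add up without binary rounding noise.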

ApsaraVideo Real-time Communication service pricing

ApsaraVideo Real-time Communication provides calling capabilities and is billed based on call duration. The price is the same globally, regardless of the region. For more information about pricing, see ApsaraVideo Real-time Communication fees.

Digital human (optional)

Real-time Conversational AI lets you integrate digital human nodes. It currently supports FaceUnity and Lingjing digital humans.

  • FaceUnity: Go to the official FaceUnity website and contact their customer service for billing information.

  • Lingjing: To activate and use this service, submit a ticket.

Large language model (optional)

  • If you choose the system's preset large language model, this service is currently free of charge.

  • If you use an external large language model, you will incur corresponding LLM fees. For specific billing details, see the billing documentation for that product.

Billing rules

Total Real-time Conversational AI fee = AI agent service fee + ApsaraVideo Real-time Communication service fee

Fee for each item = Unit price of each service × Billable duration

Billing cycle: Fees are settled on an hourly basis, within 30 minutes after an AI agent session ends. Any duration less than one minute is rounded up to one minute.
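The per-item formula and the rounding rule can be sketched as follows. This is an illustration under stated assumptions rather than the service's actual metering logic: it assumes that rounding is applied per session, that any partial minute is rounded up to a full minute, and that a zero-length session is not billed. The names `billable_minutes` and `item_fee` are hypothetical.

```python
from decimal import Decimal


def billable_minutes(duration_seconds: int) -> int:
    """Round a duration given in whole seconds up to whole minutes."""
    if duration_seconds <= 0:
        return 0  # assumption: an empty session is not billed
    # Integer ceiling division: 1..60 s -> 1 min, 61..120 s -> 2 min, ...
    return (duration_seconds + 59) // 60


def item_fee(unit_price_per_minute: Decimal, duration_seconds: int) -> Decimal:
    """Fee for each item = unit price of the service x billable duration."""
    return unit_price_per_minute * billable_minutes(duration_seconds)


# A 45-second audio session at the Chinese mainland standard price
# is billed as one full minute under these assumptions.
print(billable_minutes(45))
print(item_fee(Decimal("0.014"), 45))
```

If rounding is instead applied to the aggregated duration of a billing cycle, the same helper could be called once on the summed seconds.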

Billing example

User A has 10 audio-only calls with an AI agent in the Chinese mainland region. Each call lasts 2 minutes. The fees for each module are calculated as follows:

  • AI agent service fee: The billable duration is 20 minutes (10 calls × 2 minutes). The fee is USD 0.28 (20 minutes × USD 0.014/minute).

  • ARTC: Because the calls are bidirectional, the billable duration is 40 minutes (10 calls × 2 minutes × 2). The fee is USD 0.0344 (40 minutes × USD 0.00086/minute).

  • Total fee: USD 0.3144 = USD 0.28 + USD 0.0344.
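The same calculation can be expressed as a lookup against the standard pricing table. This is a minimal sketch: the function `standard_total`, the key names, and the `artc_rate` parameter are hypothetical, and the ARTC rate must be supplied by the caller because this topic only quotes the audio rate used in the example above.

```python
from decimal import Decimal

# Standard (bundled) prices in USD per minute, from the table in this topic.
STANDARD_PRICE = {
    ("audio", "chinese_mainland"): Decimal("0.014"),
    ("audio", "singapore"): Decimal("0.028"),
    ("video", "chinese_mainland"): Decimal("0.0502"),
    ("video", "singapore"): Decimal("0.1003"),
}


def standard_total(spec: str, region: str, calls: int,
                   minutes_per_call: int, artc_rate: Decimal) -> Decimal:
    """Total fee = AI agent service fee + ARTC service fee."""
    agent_minutes = calls * minutes_per_call
    agent_fee = agent_minutes * STANDARD_PRICE[(spec, region)]
    # Calls are bidirectional, so the ARTC duration is doubled.
    artc_fee = agent_minutes * 2 * artc_rate
    return agent_fee + artc_fee


total = standard_total("audio", "chinese_mainland", calls=10,
                       minutes_per_call=2, artc_rate=Decimal("0.00086"))
print(f"USD {total.normalize()}")
```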