Intelligent Media Services: Real-time Conversational AI

Last Updated: Jan 10, 2026

This topic describes the billing for Real-time Conversational AI.

Purchase guide

To use the Alibaba Cloud Real-time Conversational AI service, you must first activate the Real-time Conversational AI feature. To do so, go to Activate Service. If the service is already active, you can use it immediately.

Note

If the message "The quantity you are purchasing exceeds the available limit. Please select a new quantity!" appears when you try to activate the service, the service is already active.

Product pricing

Standard pricing for the AI agent service

  • If an AI agent's input and output are audio-only, billing is based on the audio specifications. The standard audio price covers three services: speech-to-text (STT), which comprises automatic speech recognition (ASR) and acoustics; text-to-speech (TTS); and agent runtime.

  • If an AI agent's input or output includes video, billing is based on the video specifications. The standard video price covers the same three services: STT (ASR and acoustics), TTS, and agent runtime. Digital humans are not included.

The standard pricing model uses bundled billing: the fees for all three services are charged together at a single per-minute rate.

| Specification | The Chinese mainland (USD/minute) | Singapore (USD/minute) |
| --- | --- | --- |
| Audio | 0.014 | 0.028 |
| Video | 0.0502 | 0.1003 |

Itemized billing model

AI agent platform service pricing

Platform service fees are the basic costs for the AI agent runtime when you call the public AI agent cloud service. These fees are based on costs such as computing power and outbound bandwidth on Alibaba Cloud. If the agent's input and output are audio-only, billing is based on the audio specifications. If the input or output includes video, billing is based on the video specifications.

| Billable item | Specification | Price (CNY/minute) |
| --- | --- | --- |
| AI agent platform service | Audio | 0.0328 |
| AI agent platform service | Video | 0.286 |

STT and TTS service pricing (Optional)

You are charged the following fees when you use the built-in STT and TTS capabilities of the Real-time Conversational AI solution. No fees are incurred under this product if you use non-built-in services, such as third-party, self-developed, or Alibaba Cloud Model Studio services.

| Billable item | Price (CNY/minute) |
| --- | --- |
| Speech-to-text (STT) | 0.058 |
| Text-to-speech (TTS) | 0.0072 |

Note

If you use a non-built-in large language model (LLM), you will incur corresponding LLM fees. For billing details, see the pricing documentation for that product.

Itemized billing example

User A has 10 audio-only calls with an AI agent. Each call lasts 2 minutes. The fees for each module are calculated as follows:

  • AI agent platform service: The billable duration is 20 minutes (10 calls × 2 minutes). The fee is CNY 0.656 (20 minutes × CNY 0.0328/minute).

  • STT: The billable duration is 20 minutes (10 calls × 2 minutes). The fee is CNY 1.16 (20 minutes × CNY 0.058/minute).

  • TTS: The billable duration is 20 minutes (10 calls × 2 minutes). The fee is CNY 0.144 (20 minutes × CNY 0.0072/minute).

  • ApsaraVideo Real-time Communication (ARTC): Because the calls are bidirectional, the billable duration is 40 minutes (10 calls × 2 minutes × 2). The fee is CNY 0.24 (40 minutes × CNY 0.006/minute).

  • Total fee: CNY 0.656 + CNY 1.16 + CNY 0.144 + CNY 0.24 = CNY 2.2.
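The arithmetic in the itemized example can be reproduced with a short Python sketch. The unit prices are copied from the tables and the example above; the function name `itemized_audio_fee` and the dictionary keys are hypothetical names chosen for illustration and are not part of any Alibaba Cloud API. `Decimal` is used so that the sums match the documented figures exactly.

```python
from decimal import Decimal

# Unit prices in CNY per minute, copied from the tables and example above.
PLATFORM_AUDIO = Decimal("0.0328")  # AI agent platform service, audio
STT = Decimal("0.058")              # built-in speech-to-text
TTS = Decimal("0.0072")             # built-in text-to-speech
ARTC = Decimal("0.006")             # ARTC rate used in the example above


def itemized_audio_fee(calls: int, minutes_per_call: int) -> dict:
    """Return per-module and total fees (CNY) for audio-only calls.

    Hypothetical helper for illustration only.
    """
    agent_minutes = calls * minutes_per_call
    # ARTC is billed for both directions of a call, so its duration doubles.
    artc_minutes = agent_minutes * 2
    fees = {
        "platform": agent_minutes * PLATFORM_AUDIO,
        "stt": agent_minutes * STT,
        "tts": agent_minutes * TTS,
        "artc": artc_minutes * ARTC,
    }
    fees["total"] = sum(fees.values(), Decimal("0"))
    return fees


fees = itemized_audio_fee(calls=10, minutes_per_call=2)
for name, amount in fees.items():
    print(f"{name}: CNY {amount}")
```

For 10 calls of 2 minutes each, the total equals the CNY 2.2 given in the example.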

ApsaraVideo Real-time Communication service pricing

ApsaraVideo Real-time Communication provides calling capabilities and is billed based on call duration. The price is the same in all regions. For more information about pricing, see ApsaraVideo Real-time Communication Fees.

Important

By default, latency statistics are enabled for the agent. With this setting enabled, voice calls are billed for ApsaraVideo Real-time Communication at the 480P rate. To disable this setting, contact Alibaba Cloud sales for configuration.

Digital human (Optional)

Real-time Conversational AI supports connecting to digital human nodes. It currently supports FaceUnity and Lingjing digital humans.

  • FaceUnity: Go to the official FaceUnity website and contact their customer service for billing information.

  • Lingjing: Submit a ticket to activate and use this service.

Large language model (Optional)

  • If you choose a pre-configured system large language model, no fee is currently charged.

  • If you use a non-built-in large language model, you will incur corresponding LLM fees. For billing details, see the pricing documentation for that product.

Billing rules

Total Real-time Conversational AI fee = AI agent service fee + ApsaraVideo Real-time Communication service fee

Fee for each item = Unit price of each service × Billable duration

Billing cycle: Fees are billed hourly and calculated within 30 minutes after an AI agent session ends. Durations of less than one minute are rounded up to one minute.
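The per-item formula and the rounding rule can be sketched in Python as follows. This is a minimal illustration under one stated assumption: each session's duration is rounded up to the next whole minute. The text above only states that durations under one minute are rounded up to one minute, so the handling of partial minutes in longer sessions is an assumption here. The function names `billable_minutes` and `item_fee` are hypothetical.

```python
from decimal import Decimal


def billable_minutes(duration_seconds: int) -> int:
    """Round a session duration up to whole minutes.

    Assumption: any partial minute counts as a full minute, so a
    45-second session is billed as 1 minute.
    """
    if duration_seconds <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding issues.
    return (duration_seconds + 59) // 60


def item_fee(unit_price_per_minute: Decimal, minutes: int) -> Decimal:
    """Fee for one item = unit price of the service x billable duration."""
    return unit_price_per_minute * minutes


# Example: a 45-second audio session at the standard Chinese mainland rate.
minutes = billable_minutes(45)
print(minutes, item_fee(Decimal("0.014"), minutes))
```

Summing `item_fee` over the AI agent service and the ApsaraVideo Real-time Communication service gives the total fee defined by the first formula.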

Billing example

User A has 10 audio-only calls with an AI agent in the Chinese mainland region. Each call lasts 2 minutes. The fees for each module are calculated as follows:

  • AI agent service fee: The billable duration is 20 minutes (10 calls × 2 minutes). The fee is USD 0.28 (20 minutes × USD 0.014/minute).

  • ARTC: Because the calls are bidirectional, the billable duration is 40 minutes (10 calls × 2 minutes × 2). The fee is USD 0.0344 (40 minutes × USD 0.00086/minute).

  • Total fee: USD 0.28 + USD 0.0344 = USD 0.3144.
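The standard (bundled) calculation can be sketched the same way, this time with the rate looked up by specification and region. The rates come from the standard pricing table and the ARTC rate from the example above. The names `STANDARD_USD_PER_MINUTE` and `standard_total_fee` and the region keys are hypothetical. The ARTC rate is passed as a parameter because the applicable rate may differ for other call types; see ApsaraVideo Real-time Communication Fees.

```python
from decimal import Decimal

# Bundled AI agent rates in USD per minute, from the standard pricing table.
STANDARD_USD_PER_MINUTE = {
    ("audio", "chinese_mainland"): Decimal("0.014"),
    ("audio", "singapore"): Decimal("0.028"),
    ("video", "chinese_mainland"): Decimal("0.0502"),
    ("video", "singapore"): Decimal("0.1003"),
}

# ARTC rate used in the billing example above (USD per minute).
EXAMPLE_ARTC_USD_PER_MINUTE = Decimal("0.00086")


def standard_total_fee(
    spec: str,
    region: str,
    calls: int,
    minutes_per_call: int,
    artc_rate: Decimal = EXAMPLE_ARTC_USD_PER_MINUTE,
) -> Decimal:
    """Total fee = AI agent service fee + ARTC service fee (hypothetical helper)."""
    agent_minutes = calls * minutes_per_call
    agent_fee = agent_minutes * STANDARD_USD_PER_MINUTE[(spec, region)]
    # Calls are bidirectional, so the ARTC billable duration is doubled.
    artc_fee = agent_minutes * 2 * artc_rate
    return agent_fee + artc_fee


total = standard_total_fee("audio", "chinese_mainland", calls=10, minutes_per_call=2)
print(f"USD {total}")
```

With the inputs from the example, the result equals USD 0.3144. Passing `"singapore"` as the region doubles the AI agent portion, because the Singapore audio rate is USD 0.028/minute.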