This topic describes the billing for Real-time Conversational AI.
Purchase guide
To use the Alibaba Cloud Real-time Conversational AI service, you must meet the following requirements:
You must activate the Real-time Conversational AI feature. To do so, go to Activate Service. If the service is already active, you can use it immediately.
Note: If the message "The quantity you are purchasing exceeds the available limit. Please select a new quantity!" appears, the service is already active.
Product pricing
Standard pricing for the AI agent service
If an AI agent's input and output are audio-only, billing is based on the audio specifications. The standard audio pricing includes fees for three services: speech-to-text (STT), comprising Automatic Speech Recognition (ASR) and acoustics; text-to-speech (TTS); and agent runtime.
If an AI agent's input or output includes video, billing is based on the video specifications. The standard video pricing includes fees for STT (ASR and acoustics), TTS, and agent runtime (excluding digital humans).
The standard pricing model uses bundled billing: the fees for STT, TTS, and agent runtime are charged together at a single per-minute rate.
Specification/Region | The Chinese mainland (USD/minute) | Singapore (USD/minute) |
Audio | 0.014 | 0.028 |
Video | 0.0502 | 0.1003 |
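As an illustration only, the specification rule and the price table above can be sketched as a small lookup. The rates are copied from the table; the function names and the region keys (`chinese_mainland`, `singapore`) are hypothetical and are not part of any Alibaba Cloud API.

```python
from decimal import Decimal

# Bundled per-minute rates in USD, copied from the pricing table above.
# The dictionary keys are illustrative names, not official identifiers.
STANDARD_RATES = {
    ("audio", "chinese_mainland"): Decimal("0.014"),
    ("audio", "singapore"): Decimal("0.028"),
    ("video", "chinese_mainland"): Decimal("0.0502"),
    ("video", "singapore"): Decimal("0.1003"),
}


def billing_specification(input_has_video: bool, output_has_video: bool) -> str:
    """Return 'video' if either direction carries video, otherwise 'audio'."""
    return "video" if (input_has_video or output_has_video) else "audio"


def standard_rate(input_has_video: bool, output_has_video: bool, region: str) -> Decimal:
    """Look up the bundled per-minute rate for a session."""
    spec = billing_specification(input_has_video, output_has_video)
    return STANDARD_RATES[(spec, region)]


if __name__ == "__main__":
    # Audio-only agent in the Chinese mainland.
    print(standard_rate(False, False, "chinese_mainland"))
    # Agent whose output includes video, in Singapore.
    print(standard_rate(False, True, "singapore"))
```

Decimal values are used so that small per-minute prices are not distorted by binary floating-point rounding.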
Itemized billing model
ApsaraVideo Real-time Communication service pricing
ApsaraVideo Real-time Communication provides calling capabilities and is billed based on call duration. The price is the same globally, regardless of the region. For more information about pricing, see ApsaraVideo Real-time Communication Fees.
By default, latency statistics are enabled for the agent, so voice calls are billed for ApsaraVideo Real-time Communication at the 480P rate. To disable latency statistics, contact Alibaba Cloud sales for configuration.
Digital human (Optional)
Real-time Conversational AI supports connecting to digital human nodes. It currently supports FaceUnity and Lingjing digital humans.
FaceUnity: Go to the official FaceUnity website and contact their customer service for billing information.
Lingjing: Submit a ticket to activate and use this service.
Large language model (Optional)
If you choose a built-in (pre-configured) large language model, no fee is currently charged.
If you use a large language model that is not built in, you incur the corresponding LLM fees. For billing details, see the pricing documentation for that product.
Billing rules
Total Real-time Conversational AI fee = AI agent service fee + ApsaraVideo Real-time Communication service fee
Fee for each item = Unit price of each service × Billable duration
Billing cycle: Fees are billed hourly and calculated within 30 minutes after an AI agent session ends. Durations of less than one minute are rounded up to one minute.
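The rules above can be sketched in a few lines. This is a minimal illustration, not the actual billing implementation. The source states only that durations under one minute are rounded up to one minute; rounding every partial minute up and applying the rounding per session are assumptions made here. The function names are hypothetical.

```python
import math
from decimal import Decimal


def billable_minutes(duration_seconds: float) -> int:
    """Convert a session duration to billable minutes.

    Stated rule: a duration under one minute counts as one minute.
    Assumption: any other partial minute is also rounded up.
    """
    if duration_seconds <= 0:
        return 0
    return max(1, math.ceil(duration_seconds / 60))


def item_fee(unit_price_usd_per_minute: Decimal, minutes: int) -> Decimal:
    """Fee for each item = unit price of the service x billable duration."""
    return unit_price_usd_per_minute * minutes


def total_fee(agent_fee: Decimal, artc_fee: Decimal) -> Decimal:
    """Total fee = AI agent service fee + ApsaraVideo Real-time Communication fee."""
    return agent_fee + artc_fee


if __name__ == "__main__":
    minutes = billable_minutes(20)  # a 20-second session bills as one minute
    print(minutes, item_fee(Decimal("0.014"), minutes))
```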
Billing example
User A has 10 audio-only calls with an AI agent in the Chinese mainland region. Each call lasts 2 minutes. The fees for each module are calculated as follows:
AI agent service fee: The billable duration is 20 minutes (10 calls × 2 minutes). The fee is USD 0.28 (20 minutes × USD 0.014/minute).
ApsaraVideo Real-time Communication (ARTC) service fee: Because the calls are bidirectional, the billable duration is 40 minutes (10 calls × 2 minutes × 2). The fee is USD 0.0344 (40 minutes × USD 0.00086/minute).
Total fee: USD 0.28 + USD 0.0344 = USD 0.3144.
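The arithmetic in this example can be checked with a short script. The unit prices and call counts come from the example itself; the variable names are illustrative.

```python
from decimal import Decimal

# Inputs taken from the billing example above.
calls = 10
minutes_per_call = 2
agent_rate = Decimal("0.014")    # USD/minute, audio, Chinese mainland
artc_rate = Decimal("0.00086")   # USD/minute, rate used in the example
artc_directions = 2              # bidirectional calls double the billable duration

agent_minutes = calls * minutes_per_call        # 10 x 2 = 20
artc_minutes = agent_minutes * artc_directions  # 20 x 2 = 40

agent_fee = agent_rate * agent_minutes
artc_fee = artc_rate * artc_minutes
total_fee = agent_fee + artc_fee

print(f"AI agent service fee: USD {agent_fee:.2f}")
print(f"ARTC fee:             USD {artc_fee:.4f}")
print(f"Total fee:            USD {total_fee:.4f}")
```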