Initiate a call using a wake word - Intelligent Media Services

This topic describes how to quickly initiate a call on smart hardware devices using a wake word.

Background

Smart hardware devices often use a wake word to activate a voice assistant and start an online conversation. Two interaction patterns are common:

The user says the wake word, waits for the device to respond (such as "I'm here"), and then gives a command.
```
User: Hey, Siri.
Siri: I'm here.
User: What's the weather like today?
Siri: It's cloudy today...
```
The user says the wake word and the question in a single phrase, and the device responds directly to the question.
```
User: Hey, Siri, what's the weather like today?
Siri: It's cloudy today...
```

This guide is intended for hardware devices that have integrated a wake-up SDK. It explains how to initiate a call as soon as the wake word is detected so that the device responds immediately.
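The two patterns above differ only in whether a question follows the wake word. The following is a minimal sketch of how an application might tell them apart, assuming the wake-up SDK (or a local speech recognizer) hands the application the recognized text. `WakeUtterance` and `extractQuery` are hypothetical names used for illustration and are not part of any SDK.

```java
import java.util.Optional;

/** Hypothetical helper, not part of any SDK: separates the wake word from a trailing question. */
final class WakeUtterance {
    private WakeUtterance() {}

    /**
     * Returns the question that follows the wake word, or Optional.empty() when the
     * utterance is only the wake word or does not start with it.
     */
    static Optional<String> extractQuery(String utterance, String wakeWord) {
        if (utterance == null || wakeWord == null) {
            return Optional.empty();
        }
        String text = utterance.trim();
        String prefix = wakeWord.trim();
        // Case-insensitive prefix match on the wake word.
        if (prefix.isEmpty() || !text.regionMatches(true, 0, prefix, 0, prefix.length())) {
            return Optional.empty();
        }
        // Reject longer words that merely begin with the wake word.
        if (text.length() > prefix.length()
                && Character.isLetterOrDigit(text.charAt(prefix.length()))) {
            return Optional.empty();
        }
        // Strip the separators between the wake word and the question.
        String rest = text.substring(prefix.length()).replaceFirst("^[\\s,.!?:;]+", "");
        return rest.isEmpty() ? Optional.empty() : Optional.of(rest);
    }

    public static void main(String[] args) {
        // Q&A start: the question follows the wake word in the same phrase.
        System.out.println(extractQuery("Hey, Siri, what's the weather like today?", "Hey, Siri"));
        // Normal start: only the wake word was spoken.
        System.out.println(extractQuery("Hey, Siri.", "Hey, Siri"));
    }
}
```

An empty result would correspond to a normal start (play a response such as "I'm here"), while a non-empty result would correspond to a Q&A start.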

Prerequisite

You have integrated the call agent and implemented basic voice call features. For more information, see Integration overview.

Implementation

Core process:

Pre-fetch token: Obtain the call token from your application server in advance and cache it on the device. This avoids the latency of a network request at the moment the call is initiated.
Pre-start audio capture: Enable RTC audio capture and pre-join capture buffering when creating the call engine. This shortens the time to the first upstream audio frame.
Mutual exclusion: Stop the wake-up SDK, in particular its microphone capture, when a call starts. Restart wake word detection after the call ends.
Response handling:
- For a normal start (playing a response such as "I'm here"):
  - Option 1: Use the RTC engine to play the prompt tone. This leverages its built-in acoustic echo cancellation (AEC) to prevent self-capture.
  - Option 2: Play the prompt tone from the application layer. During playback, you must mute the RTC microphone to prevent the sound from being re-recorded.
- For a Q&A start: The question captured with the wake word is passed as a parameter to the agent to trigger an immediate answer.
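The mutual exclusion step can be sketched as a small coordinator that owns the "in call" state. This is only an illustration under stated assumptions: `WakeWordDetector` and `CallStarter` are hypothetical interfaces standing in for your wake-up SDK and your call setup code, not real SDK types.

```java
/** Hypothetical coordinator: keeps wake word detection and the call from using the microphone at once. */
final class WakeCallCoordinator {
    /** Hypothetical stand-in for the wake-up SDK. */
    interface WakeWordDetector {
        void start();
        void stop();
    }

    /** Hypothetical stand-in for the code that configures and starts the call. */
    interface CallStarter {
        void startCall(String wakeUpQuery);
    }

    private final WakeWordDetector detector;
    private final CallStarter starter;
    private boolean inCall;

    WakeCallCoordinator(WakeWordDetector detector, CallStarter starter) {
        this.detector = detector;
        this.starter = starter;
    }

    /** Invoke when the wake word fires; query is null for a normal start. Returns false if a call is already active. */
    synchronized boolean onWakeWord(String query) {
        if (inCall) {
            return false; // Ignore triggers while a call is active.
        }
        inCall = true;
        detector.stop(); // Release the microphone before the call captures audio.
        try {
            starter.startCall(query);
        } catch (RuntimeException e) {
            // The call could not start, so resume wake word detection.
            inCall = false;
            detector.start();
            throw e;
        }
        return true;
    }

    /** Invoke when the call has ended; restarts wake word detection. */
    synchronized void onCallEnded() {
        if (!inCall) {
            return;
        }
        inCall = false;
        detector.start();
    }

    public static void main(String[] args) {
        WakeWordDetector detector = new WakeWordDetector() {
            @Override public void start() { System.out.println("detector started"); }
            @Override public void stop() { System.out.println("detector stopped"); }
        };
        WakeCallCoordinator coordinator =
                new WakeCallCoordinator(detector, q -> System.out.println("call started, query=" + q));
        coordinator.onWakeWord(null);
        coordinator.onCallEnded();
    }
}
```

In a real integration, `onCallEnded` would be driven by whatever call-end notification your call engine provides.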

The following sections explain how to call the SDK's APIs to implement these key steps.

Configure and initiate the call

The following configuration is recommended for wake word-initiated calls:

- Disable the agent's welcome message.
- If a question was captured together with the wake word, configure a Q&A start by passing the question in wakeUpQuery.

```java
// Disable the agent's welcome message. This is not needed if no welcome message is configured in the workflow.
artcaiCallConfig.agentConfig.agentGreeting = "";
// For a Q&A start, set wakeUpQuery to the question asked.
artcaiCallConfig.agentConfig.wakeUpQuery = "What's the weather like today?";

// Set other parameters.
...

// Initialize the configuration.
mAICallEngine.init(artcaiCallConfig);

// Start the call.
mAICallEngine.call(token);
```
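For the call to start without a network round trip, the token must already be on the device. The following is a minimal sketch of a device-side token cache, assuming the token issued by your application server comes with an expiry time. `CallTokenCache`, `CachedToken`, and the fetcher are hypothetical and only illustrate the caching logic; how the token is actually requested depends on your server.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical device-side cache for a pre-fetched call token. */
final class CallTokenCache {
    /** Hypothetical token holder; the expiry field is an assumption about your server's response. */
    static final class CachedToken {
        final String value;
        final long expiresAtMillis;

        CachedToken(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Supplier<CachedToken> fetcher; // For example, a request to your application server.
    private final LongSupplier clock;            // Current time in milliseconds.
    private final long safetyMarginMillis;       // Refresh this long before the token expires.
    private CachedToken cached;

    CallTokenCache(Supplier<CachedToken> fetcher, LongSupplier clock, long safetyMarginMillis) {
        this.fetcher = fetcher;
        this.clock = clock;
        this.safetyMarginMillis = safetyMarginMillis;
    }

    /** Fetch a token ahead of time, for example at startup or after a call ends. */
    synchronized void prefetch() {
        cached = fetcher.get();
    }

    /** Returns the cached token, fetching a new one only if none is cached or it is about to expire. */
    synchronized String get() {
        long now = clock.getAsLong();
        if (cached == null || now + safetyMarginMillis >= cached.expiresAtMillis) {
            cached = fetcher.get();
        }
        return cached.value;
    }

    public static void main(String[] args) {
        // Fake fetcher for demonstration: a fixed token valid for one hour.
        CallTokenCache cache = new CallTokenCache(
                () -> new CachedToken("demo-token", System.currentTimeMillis() + 3_600_000L),
                System::currentTimeMillis,
                60_000L);
        cache.prefetch();
        System.out.println(cache.get());
    }
}
```

When the wake word fires, the value returned by `get()` would be the token passed to `mAICallEngine.call(token)`.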

Get the RTC Engine and enable audio capture

```java
@Override
public void onAliRtcEngineCreated(AliRtcEngine engine) {
    // Cache the engine instance for later use.
    mRtcEngine = engine;

    // Start audio capture in advance so that audio can be captured during the connection process.
    engine.startAudioCapture();
    // Start the audio player in advance so that it is ready before any playback.
    engine.startAudioPlayer();
    // Enable pre-join capture buffering for the RTC.
    engine.setParameter("{\"audio\":{\"user_specified_ahead_push_stream\":true}}");
}
```

Play a welcome message

For more information, see Play raw audio data or local files.

Note that the audio player must be started in advance, for example by calling startAudioPlayer() when the RTC engine is created.

Mute and unmute the microphone

If your application layer plays the welcome message, you must mute the microphone to prevent the audio from being re-captured. Re-enable the microphone after playback is complete.

```java
// Mute the microphone before playback starts.
mAICallEngine.muteMicrophone(true);

// Unmute the microphone after playback is complete.
mAICallEngine.muteMicrophone(false);
```
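One way to make sure the microphone is always unmuted, even if playback fails, is to wrap the playback in try/finally. The sketch below assumes the playback call blocks until the prompt has finished. `MutedPrompt` and `MicrophoneSwitch` are hypothetical names; `MicrophoneSwitch` only mirrors the `muteMicrophone(boolean)` call shown above.

```java
/** Hypothetical helper: plays an application-layer prompt with the call microphone muted. */
final class MutedPrompt {
    /** Hypothetical stand-in that mirrors the muteMicrophone(boolean) call on the call engine. */
    interface MicrophoneSwitch {
        void muteMicrophone(boolean mute);
    }

    private MutedPrompt() {}

    /**
     * Mutes the microphone, runs the blocking playback, and unmutes afterwards,
     * including when playback throws.
     */
    static void playMuted(MicrophoneSwitch mic, Runnable blockingPlayback) {
        mic.muteMicrophone(true);
        try {
            blockingPlayback.run();
        } finally {
            mic.muteMicrophone(false);
        }
    }

    public static void main(String[] args) {
        playMuted(
                mute -> System.out.println("microphone muted: " + mute),
                () -> System.out.println("playing prompt"));
    }
}
```

If your player is asynchronous, the unmute call would instead belong in its playback-complete callback.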