This topic explains how to use the AICallKit SDK to push external audio PCM data to the SDK, enabling custom audio capture.
Feature introduction
During a call, AICallKit uses its default audio capture module. If the default capture does not meet your requirements due to microphone hardware variations, you can implement a custom audio capture module and use the SDK's external audio input interface to send captured PCM data for communication with the AI agent.
Before you begin
Integrate the AI agent and implement basic calling functionality. For more information, see Integration solution.
Implement a custom capture module capable of obtaining audio PCM data. We provide sample code that demonstrates how to read PCM data from a local file or the microphone.
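As a rough illustration of what such a capture module might look like for the local-file case, the sketch below reads raw PCM from any InputStream in fixed-size chunks. It is not the sample code referred to above; the class name PcmChunkReader and its methods are hypothetical, and it assumes the source is headerless PCM (for example, not a WAV file with a header).

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper (not part of AICallKit or ARTC): reads raw PCM from an
 * InputStream in fixed-size chunks and reports how many bytes each read returned.
 */
final class PcmChunkReader {
    private final InputStream in;
    private final byte[] buffer;

    PcmChunkReader(InputStream in, int chunkBytes) {
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunkBytes must be positive");
        }
        this.in = in;
        this.buffer = new byte[chunkBytes];
    }

    /** The reusable buffer that readChunk() fills starting at index 0. */
    byte[] buffer() {
        return buffer;
    }

    /**
     * Fills the buffer with up to buffer.length bytes.
     * Returns the number of bytes actually read (possibly fewer than the
     * buffer capacity for the last chunk), or -1 at end of stream.
     */
    int readChunk() throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // End of stream reached.
            }
            total += n;
        }
        return total == 0 ? -1 : total;
    }
}
```

A caller would loop on readChunk() until it returns -1 and use the returned byte count, not the buffer capacity, when describing the frame to the SDK.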
Implementation
The AICallKit SDK does not directly provide an external audio input interface. This feature relies on the underlying AliVCSDK_ARTC SDK. You can get the AliRtcEngine instance by calling the getRtcEngine() method of ARTCAICallEngine, and call its API to enable external audio capture and push data. The process is as follows:
Step 1: Disable the SDK's audio capture module
To disable the SDK's built-in audio capture module, set the user_specified_use_external_audio_record field in ARTCAICallBase.artcDefaultExtras to TRUE before initializing ARTCAICallEngine.
You must configure ARTCAICallBase.artcDefaultExtras before calling the init method of ARTCAICallEngine. Otherwise, the configuration will not take effect.
Android
// artcDefaultExtras is a static variable of type JSONObject, with a default value of null.
ARTCAICallBase.artcDefaultExtras = new JSONObject();
try {
    // Disable the SDK's audio capture module.
    ARTCAICallBase.artcDefaultExtras.put("user_specified_use_external_audio_record", "TRUE");
} catch (JSONException e) {
    e.printStackTrace();
}
iOS
// ARTCAICallBase.artcDefaultExtras is a static variable of type [String: Any], with a default value of [:].
// Disable the SDK's audio capture module.
ARTCAICallBase.artcDefaultExtras = ["user_specified_use_external_audio_record": "TRUE"]
Step 2: Get the AliRtcEngine object
After AICallKit initializes the underlying ARTC engine, you can get the AliRtcEngine instance in one of the following ways:
(Recommended) Retrieve the instance within the onAliRtcEngineCreated callback. This is the optimal time.
Android
@Override
public void onAliRtcEngineCreated(AliRtcEngine engine) {
    if (engine != null) {
        // Get the AliRtcEngine instance and add an external audio stream.
    }
}
iOS
public func onAICallRTCEngineCreated() {
    guard let engine = self.engine.getRTCInstance() as? AliRtcEngine else {
        return
    }
    // Save the AliRtcEngine instance and add an external audio stream.
    self.rtcEngine = engine
}
(Alternative) Call the getRtcEngine() method of ARTCAICallEngine. Ensure you call this method only after the onAliRtcEngineCreated callback is triggered.
Step 3: Add an external audio stream
After getting the AliRtcEngine instance, call its addExternalAudioStream method to add an external audio stream. Perform this operation within the onAliRtcEngineCreated callback.
Android
private void addExternalAudio() {
    // Configure based on your use case.
    AliRtcEngine.AliRtcExternalAudioStreamConfig config = new AliRtcEngine.AliRtcExternalAudioStreamConfig();
    config.sampleRate = SAMPLE_RATE; // The sample rate of the external audio stream.
    config.channels = CHANNEL; // The number of channels of the external audio stream.
    config.publishVolume = 100;
    config.playoutVolume = 0;
    config.enable3A = true;
    int result = mAliRtcEngine.addExternalAudioStream(config);
    if (result <= 0) {
        return;
    }
    // The return value is the stream ID.
    mExternalAudioStreamId = result;
}
iOS
func addExternalAudio() {
    let config = AliRtcExternalAudioStreamConfig()
    config.sampleRate = SAMPLE_RATE // The sample rate of the external audio stream.
    config.channels = CHANNEL // The number of channels of the external audio stream.
    config.publishVolume = 100
    config.playoutVolume = 0
    config.enable3A = true
    // Optional chaining yields an optional value, so fall back to 0 when rtcEngine is nil.
    let streamId = self.rtcEngine?.addExternalAudioStream(config) ?? 0
    if streamId <= 0 {
        // Failed to add the external audio stream.
        return
    }
    // The return value is the stream ID.
    self.externalPublishStreamId = streamId
}
Step 4: Push PCM data to the SDK
Begin pushing audio PCM data within the onCallBegin callback. Your application can control the size and frequency of the data pushed. For example, you can push 30 ms of audio data every 20 ms.
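To make the size of one push concrete, the following sketch computes how many samples and bytes a given duration of interleaved PCM occupies. PcmFrameSize is a hypothetical helper, not an SDK API, and the figures in the usage note assume 48 kHz, mono, 16-bit audio, which the text above does not specify.

```java
/** Hypothetical helper, not an SDK API: sizes a PCM buffer for a given duration. */
final class PcmFrameSize {
    private PcmFrameSize() {}

    /** Samples per channel contained in durationMs of audio at sampleRate Hz. */
    static int samplesForDuration(int sampleRate, int durationMs) {
        // Use long arithmetic so that large rates and durations cannot overflow.
        return (int) ((long) sampleRate * durationMs / 1000);
    }

    /** Bytes needed to hold durationMs of interleaved PCM. */
    static int bytesForDuration(int sampleRate, int channels, int bytesPerSample, int durationMs) {
        return samplesForDuration(sampleRate, durationMs) * channels * bytesPerSample;
    }
}
```

Under the assumed format, PcmFrameSize.bytesForDuration(48000, 1, 2, 30) gives 2880 bytes, that is, 1440 samples of 2 bytes each for a 30 ms push.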
When pushing data, you must correctly set the numSamples parameter of AliRtcAudioFrame based on the actual audio length. On some devices, such as when getting data via AudioRecord.read, the actual amount of data read may be less than the buffer's capacity. Use the return value to determine the actual data length.
If the pushExternalAudioStreamRawData operation fails because the SDK's internal buffer is full (error code ERR_SDK_AUDIO_INPUT_BUFFER_FULL: 0x01070101), your code must handle this error by pausing (for example, with sleep) and retrying to avoid blocking.
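The two notes above can be sketched in isolation from the SDK. In the block below, FrameSink is a hypothetical stand-in for the real push call, BoundedRetryPusher is a hypothetical helper, and the only value taken from this topic is the buffer-full error code 0x01070101. It shows the sample count derived from the bytes actually read, and a retry loop that sleeps between attempts and gives up after a fixed number of retries.

```java
/** Hypothetical stand-in for the SDK push call; returns 0 on success or an error code. */
interface FrameSink {
    int push(byte[] data, int numSamples);
}

/** Hypothetical helper, not an SDK API. */
final class BoundedRetryPusher {
    /** Error code quoted in this topic for a full internal buffer. */
    static final int ERR_BUFFER_FULL = 0x01070101;

    private BoundedRetryPusher() {}

    /** Samples per channel contained in bytesRead bytes of interleaved PCM. */
    static int numSamples(int bytesRead, int channels, int bytesPerSample) {
        return bytesRead / (channels * bytesPerSample);
    }

    /**
     * Pushes one frame. While the sink reports a full buffer, sleeps waitMs and
     * retries, up to maxRetries retries. Returns the last code from the sink.
     */
    static int pushWithRetry(FrameSink sink, byte[] data, int numSamples,
                             int maxRetries, long waitMs) {
        int ret = sink.push(data, numSamples);
        for (int retry = 0; ret == ERR_BUFFER_FULL && retry < maxRetries; retry++) {
            try {
                Thread.sleep(waitMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                return ret;
            }
            ret = sink.push(data, numSamples);
        }
        return ret;
    }
}
```

Bounding the retries keeps the capture thread from stalling indefinitely if the buffer never drains; the caller can then decide whether to drop the frame or log the failure.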
Sample code
Android
@Override
public void onCallBegin() {
    startPushAudioRawData();
}
public void startPushAudioRawData() {
    // 1. Read data from AudioRecord. buffer is the byte[] that receives the captured PCM.
    int bytesRead = 0;
    // Read data based on the audio source type.
    bytesRead = audioRecord.read(buffer, 0, buffer.length);
    // 2. Push data to the SDK using the pushExternalAudioStreamRawData interface.
    if (mAliRtcEngine != null && bytesRead > 0) {
        // Construct an AliRtcAudioFrame object.
        AliRtcEngine.AliRtcAudioFrame sample = new AliRtcEngine.AliRtcAudioFrame();
        sample.data = buffer; // The same buffer that AudioRecord.read filled.
        sample.numSamples = bytesRead / (CHANNEL * (bitsPerSample / 8)); // Calculate the number of samples based on the actual bytes read.
        sample.numChannels = CHANNEL;
        sample.samplesPerSec = SAMPLE_RATE; // Sample rate.
        sample.bytesPerSample = bitsPerSample / 8; // The number of bytes per sample.
        // Push the captured data to the SDK.
        int ret = 0;
        // Retry when the push operation fails because the buffer is full.
        int retryCount = 0;
        final int MAX_RETRY_COUNT = 20;
        final int BUFFER_WAIT_MS = 10;
        do {
            ret = mAliRtcEngine.pushExternalAudioStreamRawData(mExternalAudioStreamId, sample);
            if (ret == ErrorCodeEnum.ERR_SDK_AUDIO_INPUT_BUFFER_FULL) {
                // Handle the case where the buffer is full. Wait and retry.
                retryCount++;
                if (mExternalAudioStreamId <= 0 || retryCount >= MAX_RETRY_COUNT) {
                    // Stop the loop if pushing is stopped or the maximum number of retries is reached.
                    break;
                }
                try {
                    // Pause for a short period.
                    Thread.sleep(BUFFER_WAIT_MS);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                    break;
                }
            } else {
                // Exit the loop if the push is successful or another error occurs.
                break;
            }
        } while (retryCount < MAX_RETRY_COUNT);
        // Record a log if the push operation fails.
        if (ret != 0) {
            if (ret == ErrorCodeEnum.ERR_SDK_AUDIO_INPUT_BUFFER_FULL) {
                // If the operation still fails after retries, record a log.
                Log.w("CustomAudioCapture", "Failed to push audio data after retries. Error code: " + ret + ", retries: " + retryCount);
            } else {
                Log.e("CustomAudioCapture", "Failed to push audio data. Error code: " + ret);
            }
        }
    }
}
iOS
let sample = AliRtcAudioFrame()
sample.dataPtr = dataPtr
sample.samplesPerSec = SAMPLE_RATE
sample.bytesPerSample = bytesPerSample
sample.numOfChannels = CHANNEL
sample.numOfSamples = numOfSamples
var retryCount = 0
let MAX_RETRY_COUNT = 20
while retryCount < MAX_RETRY_COUNT {
    if self.externalPublishStreamId <= 0 {
        // Exit the loop if pushing is no longer required.
        break
    }
    let rc = self.rtcEngine?.pushExternalAudioStream(self.externalPublishStreamId, rawData: sample) ?? 0
    // 0x01070101 ERR_SDK_AUDIO_INPUT_BUFFER_FULL: The buffer is full. A retry is required.
    if rc == 0x01070101 && !(pcmInputThread?.isCancelled ?? true) {
        Thread.sleep(forTimeInterval: 0.03) // Wait for 30 ms before retrying.
        retryCount += 1
    } else {
        // Treat any nonzero return value as a failure, matching the Android sample.
        if rc != 0 {
            "pushExternalAudioStream error, ret: \(rc)".printLog()
        }
        break
    }
}
Step 5: Remove the external audio stream
Android
engine.removeExternalAudioStream(mExternalAudioStreamId);
iOS
self.rtcEngine?.removeExternalAudioStream(self.externalPublishStreamId)
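The Step 4 samples stop retrying once the stored stream ID is no longer positive, so it may help to clear that ID as part of removal. The sketch below shows one possible way to do this on a shared handle. ExternalStreamHandle is a hypothetical class, and the IntConsumer passed to release stands in for a call such as removeExternalAudioStream; whether the ID should be cleared before or after removal in a real app is an assumption to verify against your threading model.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/** Hypothetical holder for the external stream ID shared by the push thread and teardown. */
final class ExternalStreamHandle {
    private final AtomicInteger streamId = new AtomicInteger(0);

    /** Stores the ID returned when the external audio stream was added. */
    void set(int id) {
        streamId.set(id);
    }

    /** True while a valid stream ID is held; a push loop can check this before each push. */
    boolean isActive() {
        return streamId.get() > 0;
    }

    /**
     * Clears the ID first so that push loops stop, then passes the previous ID
     * to remover. Returns false, without calling remover, if no stream was active.
     */
    boolean release(IntConsumer remover) {
        int id = streamId.getAndSet(0);
        if (id <= 0) {
            return false;
        }
        remover.accept(id);
        return true;
    }
}
```

Because getAndSet is atomic, calling release twice invokes the remover only once, which guards against removing the same stream ID twice.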