
Intelligent Media Services: Capture and push external audio data

Last Updated: Dec 04, 2025

This topic explains how to use the AICallKit SDK to push external audio PCM data to the SDK, enabling custom audio capture.

Feature introduction

During a call, AICallKit captures audio with its built-in capture module by default. If the default capture does not meet your requirements, for example because of differences in microphone hardware, you can implement your own audio capture module and use the SDK's external audio input interface to send the captured PCM data into the call with the AI agent.

Before you begin

  • Integrate the AI agent and implement basic calling functionality. For more information, see Integration solution.

  • Implement a custom capture module capable of obtaining audio PCM data. We provide sample code that demonstrates how to read PCM data from a local file or the microphone.
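For the file-based case, a capture module can be as small as a reader that returns fixed-size chunks of raw PCM. The Java sketch below is not the provided sample code; it assumes a headerless 16-bit PCM file, and the class name PcmChunkReader and the chunk duration are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper, not part of AICallKit or ARTC: reads headerless
 * 16-bit PCM from a stream in fixed-duration chunks.
 */
final class PcmChunkReader {
    private static final int BYTES_PER_SAMPLE = 2; // 16-bit PCM (assumption)

    private final InputStream in;
    private final byte[] chunk;

    PcmChunkReader(InputStream in, int sampleRate, int channels, int chunkMs) {
        this.in = in;
        // Samples per channel in one chunk, then bytes for all channels.
        int samples = (int) ((long) sampleRate * chunkMs / 1000);
        this.chunk = new byte[samples * channels * BYTES_PER_SAMPLE];
    }

    /** Buffer that readChunk() fills; only the first readChunk() bytes are valid. */
    byte[] buffer() {
        return chunk;
    }

    /**
     * Fills the buffer with up to one chunk. Returns the number of valid bytes,
     * which may be short at the end of the file, or -1 once the stream is exhausted.
     */
    int readChunk() throws IOException {
        int filled = 0;
        while (filled < chunk.length) {
            int n = in.read(chunk, filled, chunk.length - filled);
            if (n < 0) {
                break; // End of stream.
            }
            filled += n;
        }
        return filled == 0 ? -1 : filled;
    }
}
```

The return value of readChunk() plays the same role as the return value of AudioRecord.read: use it, not the buffer length, when computing numSamples for the last, possibly short, chunk.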

Implementation

The AICallKit SDK does not directly provide an external audio input interface. This feature relies on the underlying AliVCSDK_ARTC SDK. You can get the AliRtcEngine instance by calling the getRtcEngine() method of ARTCAICallEngine, and call its API to enable external audio capture and push data. The process is as follows:


Step 1: Disable the SDK's audio capture module

To disable the SDK's built-in audio capture module, set the user_specified_use_external_audio_record field in ARTCAICallBase.artcDefaultExtras to TRUE before initializing ARTCAICallEngine.

Note

You must configure ARTCAICallBase.artcDefaultExtras before calling the init method of ARTCAICallEngine. Otherwise, the configuration will not take effect.

Android

// artcDefaultExtras is a static variable of type JSONObject, with a default value of null.
// Create it only if needed, so that any extras you have already set are kept.
if (ARTCAICallBase.artcDefaultExtras == null) {
    ARTCAICallBase.artcDefaultExtras = new JSONObject();
}
try {
    // Disable the SDK's audio capture module.
    ARTCAICallBase.artcDefaultExtras.put("user_specified_use_external_audio_record", "TRUE");
} catch (JSONException e) {
    e.printStackTrace();
}

iOS

// ARTCAICallBase.artcDefaultExtras is a static variable of type [String: Any], with a default value of [:].
// Disable the SDK's audio capture module. Setting the key keeps any other extras you have already set.
ARTCAICallBase.artcDefaultExtras["user_specified_use_external_audio_record"] = "TRUE"

Step 2: Get the AliRtcEngine object

After AICallKit initializes the underlying ARTC engine, you can get the AliRtcEngine instance in one of the following ways:

  • (Recommended) Get the instance in the onAliRtcEngineCreated callback (onAICallRTCEngineCreated on iOS). This is the earliest point at which the engine is available.

    Android

    @Override
    public void onAliRtcEngineCreated(AliRtcEngine engine) {
        if (engine != null) {
            // Save the AliRtcEngine instance, then add an external audio stream (see Step 3).
            mAliRtcEngine = engine;
            addExternalAudio();
        }
    }

    iOS

    public func onAICallRTCEngineCreated() {
        guard let engine = self.engine.getRTCInstance() as? AliRtcEngine else {
            return
        }
        // Save the AliRtcEngine instance, then add an external audio stream (see Step 3).
        self.rtcEngine = engine
        self.addExternalAudio()
    }
  • (Alternative) Call the getRtcEngine() method of ARTCAICallEngine (on iOS, getRTCInstance(), as in the sample above). Call it only after the onAliRtcEngineCreated callback has been triggered; before that, the engine may not have been created yet.
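If code on another thread needs the engine, it must not use it before the creation callback has run. One generic way to express that ordering in Java is a one-shot latch: the callback publishes the engine and other threads wait for it with a timeout. CallbackValue is a hypothetical helper, not an SDK class; in an app you would call set(engine) inside onAliRtcEngineCreated and declare the holder as CallbackValue<AliRtcEngine>.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper: holds a value delivered asynchronously by a callback
 * (for example, the engine from onAliRtcEngineCreated) and lets other threads wait for it.
 */
final class CallbackValue<T> {
    private final CountDownLatch ready = new CountDownLatch(1);
    private volatile T value;

    /** Call from the creation callback. Only the first value is kept. */
    synchronized void set(T v) {
        if (ready.getCount() == 0) {
            return;
        }
        value = v;
        ready.countDown();
    }

    /** Returns the value, or null if it was not delivered within timeoutMs. */
    T await(long timeoutMs) throws InterruptedException {
        return ready.await(timeoutMs, TimeUnit.MILLISECONDS) ? value : null;
    }
}
```

The timeout keeps a caller from hanging forever if the engine is never created, for example because the call failed to start.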

Step 3: Add an external audio stream

After getting the AliRtcEngine instance, call its addExternalAudioStream method to add an external audio stream. Perform this operation within the onAliRtcEngineCreated callback.

Android

private void addExternalAudio() {
    // Configure based on your use case.
    AliRtcEngine.AliRtcExternalAudioStreamConfig config = new AliRtcEngine.AliRtcExternalAudioStreamConfig();
    config.sampleRate = SAMPLE_RATE; // The sample rate of the external audio stream.
    config.channels = CHANNEL;       // The number of channels of the external audio stream.
    config.publishVolume = 100;      // Volume of the audio sent to the AI agent.
    config.playoutVolume = 0;        // Local playback volume; 0 means the captured audio is not played back locally.
    config.enable3A = true;          // Apply the SDK's 3A processing (AEC, AGC, ANS) to the external audio.

    int result = mAliRtcEngine.addExternalAudioStream(config);
    if (result <= 0) {
        // A value <= 0 means the stream could not be added.
        return;
    }
    // A positive return value is the stream ID.
    mExternalAudioStreamId = result;
}

iOS

func addExternalAudio() {
    // Configure based on your use case.
    let config = AliRtcExternalAudioStreamConfig()
    config.sampleRate = SAMPLE_RATE   // The sample rate of the external audio stream.
    config.channels = CHANNEL         // The number of channels of the external audio stream.
    config.publishVolume = 100
    config.playoutVolume = 0
    config.enable3A = true
    // The return value is the stream ID; a value <= 0 means the stream could not be added.
    guard let streamId = self.rtcEngine?.addExternalAudioStream(config), streamId > 0 else {
        // Failed to add the external audio stream.
        return
    }
    self.externalPublishStreamId = streamId
}

Step 4: Push PCM data to the SDK

Begin pushing audio PCM data within the onCallBegin callback. Your application can control the size and frequency of the data pushed. For example, you can push 30 ms of audio data every 20 ms.
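The size of each push follows from the stream format. For 16-bit mono PCM at 16 kHz, 20 ms of audio is 16000 × 20 / 1000 = 320 samples, or 640 bytes. The sketch below (PcmMath is a hypothetical name) does this arithmetic and also derives numSamples, the per-channel sample count, from a byte count, using the same formula as the Android sample.

```java
/** Hypothetical helper: size arithmetic for interleaved PCM frames. */
final class PcmMath {
    private PcmMath() {}

    /** Samples per channel in durationMs of audio. */
    static int samplesForDuration(int sampleRate, int durationMs) {
        return (int) ((long) sampleRate * durationMs / 1000);
    }

    /** Bytes in durationMs of interleaved PCM across all channels. */
    static int bytesForDuration(int sampleRate, int channels, int bytesPerSample, int durationMs) {
        return samplesForDuration(sampleRate, durationMs) * channels * bytesPerSample;
    }

    /** Samples per channel contained in bytesRead bytes: the value to put in numSamples. */
    static int numSamples(int bytesRead, int channels, int bytesPerSample) {
        return bytesRead / (channels * bytesPerSample);
    }
}
```

For example, 48 kHz stereo 16-bit audio needs 3840 bytes per 20 ms, and a 3840-byte read corresponds to numSamples = 960.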

Note
  • When pushing data, set the numSamples parameter of AliRtcAudioFrame according to the actual audio length. On some devices, a read call (for example, AudioRecord.read) returns less data than the buffer can hold, so compute the length from the return value of the read call rather than from the buffer size.

  • If pushExternalAudioStreamRawData fails because the SDK's internal buffer is full (error code ERR_SDK_AUDIO_INPUT_BUFFER_FULL, 0x01070101), wait briefly (for example, with sleep) and then retry. Cap the number of retries so that the capture thread is not blocked indefinitely.
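The retry rule in the second note does not depend on the SDK, so it can be factored out and tested on its own. In this Java sketch the push call is passed in as an IntSupplier; in an app it would wrap pushExternalAudioStreamRawData. BufferFullRetry is a hypothetical name, and the wait time and retry cap are parameters you choose (the Android sample uses 10 ms and 20 retries).

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: retries a push while it reports "buffer full", with a bounded number of waits. */
final class BufferFullRetry {
    private BufferFullRetry() {}

    /**
     * Calls push until it returns something other than bufferFullCode,
     * or until maxRetries waits have been spent. Returns the last result of push.
     */
    static int pushWithRetry(IntSupplier push, int bufferFullCode, int maxRetries, long waitMs)
            throws InterruptedException {
        int ret = push.getAsInt();
        int retries = 0;
        while (ret == bufferFullCode && retries < maxRetries) {
            Thread.sleep(waitMs); // Give the SDK time to drain its input buffer.
            retries++;
            ret = push.getAsInt();
        }
        return ret;
    }
}
```

Passing 0x01070101 as bufferFullCode matches the error code given in the note. Because the helper returns the last result, the caller can still log a final buffer-full failure separately from other errors.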

Sample code

Android

// Assumed to be declared elsewhere in your class: mAliRtcEngine, mExternalAudioStreamId,
// audioRecord (a started AudioRecord), audioData (the byte[] buffer you read into),
// SAMPLE_RATE, CHANNEL, and bitsPerSample (for example, 16).
private volatile boolean mIsPushing = false;

@Override
public void onCallBegin() {
    // AudioRecord.read() blocks, so read and push on a worker thread instead of the callback thread.
    mIsPushing = true;
    new Thread(this::startPushAudioRawData, "CustomAudioCapture").start();
}

public void stopPushAudioRawData() {
    // Call when the call ends, before removing the external audio stream.
    mIsPushing = false;
}

public void startPushAudioRawData() {
    final int MAX_RETRY_COUNT = 20;
    final int BUFFER_WAIT_MS = 10;
    while (mIsPushing && mExternalAudioStreamId > 0) {
        // 1. Read data from AudioRecord. The return value is the number of bytes actually read.
        int bytesRead = audioRecord.read(audioData, 0, audioData.length);
        if (bytesRead < 0) {
            Log.e("CustomAudioCapture", "AudioRecord.read failed. Error code: " + bytesRead);
            break;
        }
        if (bytesRead == 0 || mAliRtcEngine == null) {
            continue;
        }
        // 2. Construct an AliRtcAudioFrame object.
        AliRtcEngine.AliRtcAudioFrame sample = new AliRtcEngine.AliRtcAudioFrame();
        sample.data = audioData;
        sample.numSamples = bytesRead / (CHANNEL * (bitsPerSample / 8)); // Samples per channel, based on the bytes actually read.
        sample.numChannels = CHANNEL;
        sample.samplesPerSec = SAMPLE_RATE;        // Sample rate.
        sample.bytesPerSample = bitsPerSample / 8; // The number of bytes per sample.

        // 3. Push the frame to the SDK. Retry when the push fails because the buffer is full.
        int ret;
        int retryCount = 0;
        while (true) {
            ret = mAliRtcEngine.pushExternalAudioStreamRawData(mExternalAudioStreamId, sample);
            if (ret != ErrorCodeEnum.ERR_SDK_AUDIO_INPUT_BUFFER_FULL) {
                break; // Success, or an error that retrying will not fix.
            }
            retryCount++;
            if (mExternalAudioStreamId <= 0 || retryCount >= MAX_RETRY_COUNT) {
                break; // Pushing was stopped, or the maximum number of retries was reached.
            }
            try {
                Thread.sleep(BUFFER_WAIT_MS); // Wait for the SDK to drain its buffer.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                mIsPushing = false;
                break;
            }
        }

        // 4. Record a log if the push operation fails.
        if (ret != 0) {
            if (ret == ErrorCodeEnum.ERR_SDK_AUDIO_INPUT_BUFFER_FULL) {
                Log.w("CustomAudioCapture", "Failed to push audio data after retries. Error code: " + ret + ", retries: " + retryCount);
            } else {
                Log.e("CustomAudioCapture", "Failed to push audio data. Error code: " + ret);
            }
        }
    }
}

iOS

// Assumed to be prepared by your capture code: dataPtr (pointer to the PCM bytes),
// bytesPerSample, and numOfSamples (samples per channel actually captured).
let sample = AliRtcAudioFrame()
sample.dataPtr = dataPtr
sample.samplesPerSec = SAMPLE_RATE
sample.bytesPerSample = bytesPerSample
sample.numOfChannels = CHANNEL
sample.numOfSamples = numOfSamples

let maxRetryCount = 20
var retryCount = 0

while retryCount < maxRetryCount {
    // Exit the loop if pushing is no longer required.
    guard self.externalPublishStreamId > 0 else { break }

    let rc = self.rtcEngine?.pushExternalAudioStream(self.externalPublishStreamId, rawData: sample) ?? 0

    // 0x01070101 ERR_SDK_AUDIO_INPUT_BUFFER_FULL: The buffer is full. Wait and retry.
    if rc == 0x01070101 && !(pcmInputThread?.isCancelled ?? true) {
        Thread.sleep(forTimeInterval: 0.03) // Wait 30 ms before retrying.
        retryCount += 1
    } else {
        if rc != 0 {
            print("pushExternalAudioStream error, ret: \(rc)")
        }
        break
    }
}

Step 5: Remove the external audio stream

When the call ends, stop pushing data and then remove the external audio stream.

Android

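int streamId = mExternalAudioStreamId;
// Reset the ID first: the push loop in Step 4 treats an ID <= 0 as "pushing stopped".
mExternalAudioStreamId = 0;
mAliRtcEngine.removeExternalAudioStream(streamId);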

iOS

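let streamId = self.externalPublishStreamId
// Reset the ID first: the push loop in Step 4 exits when the ID is not > 0.
self.externalPublishStreamId = 0
self.rtcEngine?.removeExternalAudioStream(streamId)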
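Removing the stream while another thread is still pushing to it may produce failed pushes. A simple ordering is: signal the push loop to stop, wait for the worker thread to exit, then call removeExternalAudioStream. The Java sketch below shows that ordering; PushLoop is hypothetical, and its pushOnce runnable stands in for your read-and-push step.

```java
/** Hypothetical helper: runs a push step repeatedly on a worker thread and stops it cleanly. */
final class PushLoop {
    private final Runnable pushOnce;
    private volatile boolean running;
    private Thread worker;

    PushLoop(Runnable pushOnce) {
        this.pushOnce = pushOnce;
    }

    /** Starts the worker thread if it is not already running. */
    synchronized void start() {
        if (running) {
            return;
        }
        running = true;
        worker = new Thread(() -> {
            while (running) {
                pushOnce.run();
            }
        }, "external-audio-push");
        worker.start();
    }

    /** Signals the loop to exit and waits for the worker; call before removeExternalAudioStream. */
    synchronized void stop() throws InterruptedException {
        running = false;
        if (worker != null) {
            worker.join();
            worker = null;
        }
    }
}
```

At the end of a call, this would be loop.stop() followed by mAliRtcEngine.removeExternalAudioStream(mExternalAudioStreamId). stop() waits for the current pushOnce call to finish, so a blocking read such as AudioRecord.read can delay shutdown by up to one read.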