
ApsaraVideo Live: Custom audio capture

Last Updated: Mar 26, 2026

The ARTC SDK provides a flexible custom audio capture feature.

Overview

Although the ARTC SDK's built-in audio module is sufficient for most applications, some use cases require custom audio capture. For example:

  • When an audio capture device is occupied by another process.

  • To capture audio from a custom source, like a proprietary system or an audio file, and send it to the SDK.

With custom audio capture, you manage your own audio capture devices and sources and feed the captured data to the SDK.

Sample code

Android: Android/ARTCExample/AdvancedUsage/src/main/java/com/aliyun/artc/api/advancedusage/CustomAudioCaptureAndRender/CustomAudioCaptureActivity.java.

iOS: iOS/ARTCExample/AdvancedUsage/CustomAudioCapture/CustomAudioCaptureVC.swift.

Prerequisites

Before you begin, ensure you have completed the following:

Implementation


1. Enable or disable internal capture

To use custom audio capture, you must first disable the SDK's internal capture module. We recommend doing this by passing the extras parameter when you call getInstance to create the engine. The relevant key is:

user_specified_use_external_audio_record: Controls whether custom audio capture replaces the SDK's internal capture.

  • "TRUE": Use custom audio capture (disables internal capture).

  • "FALSE": Do not use custom audio capture (enables internal capture).

Note

The extras parameter is a JSON string.

Android

String extras = "{\"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = AliRtcEngine.getInstance(this, extras);

iOS

// Create and initialize the engine.
var customAudioCaptureConfig: [String: String] = [:]
// Use custom audio capture.
customAudioCaptureConfig["user_specified_use_external_audio_record"] = "TRUE"
// Serialize to JSON.
guard let jsonData = try? JSONSerialization.data(withJSONObject: customAudioCaptureConfig, options: []),
      let extras = String(data: jsonData, encoding: .utf8) else {
    print("JSON serialization failed")
    return
}
let engine = AliRtcEngine.sharedInstance(self, extras: extras)

Mac

NSString * extras = @"{\"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = [AliRtcEngine sharedInstance:self extras:extras];

Windows

/* Windows supports enabling or disabling audio capture during engine creation. */
/* Disable internal capture. */
char* extra = "{\"user_specified_enable_use_virtual_audio_device\":\"TRUE\", \"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = AliRtcEngine.Create(extra);

/* Enable internal capture. */
char* extra = "{\"user_specified_enable_use_virtual_audio_device\":\"FALSE\", \"user_specified_use_external_audio_record\":\"FALSE\"}";
mAliRtcEngine = AliRtcEngine.Create(extra);

2. Add an external audio stream

Call the addExternalAudioStream method to add an external audio stream and get its stream ID. If you require 3A audio processing, which includes acoustic echo cancellation (AEC), automatic gain control (AGC), and noise suppression (ANS), set the enable3A parameter in the AliRtcExternalAudioStreamConfig object.

Note

When to call this method:

  • If you need 3A audio processing: we recommend that you call addExternalAudioStream after the audio stream is successfully published and the custom capture module has obtained its first audio frame, that is, after the onAudioPublishStateChanged callback reports newState as AliRtcStatsPublished (3).

  • If you do not need 3A audio processing (for example, when streaming audio from a local file, a network source, or TTS-generated data): you can call this method immediately after creating the engine, and then start pushing audio data once the stream is published.

Android

AliRtcEngine.AliRtcExternalAudioStreamConfig config = new AliRtcEngine.AliRtcExternalAudioStreamConfig();
config.sampleRate = SAMPLE_RATE; // Sample rate
config.channels = CHANNEL; // Number of channels
// Publish volume
config.publishVolume = 100;
// Local playout volume
config.playoutVolume = isLocalPlayout ? 100 : 0;
config.enable3A = true;

int result = mAliRtcEngine.addExternalAudioStream(config);
if (result <= 0) {
    return;
}
// The return value is the stream ID. You need it to push data to the SDK.
mExternalAudioStreamId = result;

iOS

/* Set parameters based on your application's needs. */
AliRtcExternalAudioStreamConfig *config = [AliRtcExternalAudioStreamConfig new];
// This must match the number of channels of the external PCM audio stream. Set to 1 for mono or 2 for stereo.
config.channels = _pcmChannels;
// This must match the sample rate of the external PCM audio stream.
config.sampleRate = _pcmSampleRate;
config.playoutVolume = 0;
config.publishVolume = 100;
_externalPlayoutStreamId = [self.engine addExternalAudioStream:config];

Mac

/* Set parameters based on your application's needs. */
AliRtcExternalAudioStreamConfig *config = [AliRtcExternalAudioStreamConfig new];
config.channels = pcmChannels;
/** Sample rate. Default: 48000. Supported values: 8000, 12000, 16000, 24000, 32000, 44100, 48000, 64000, 88200, 96000, 176400, 192000. */
config.sampleRate = pcmSampleRate;
config.playoutVolume = 0;
config.publishVolume = 100;
int ret = [self.engine addExternalAudioStream:config];

Windows

/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);
/* Set parameters based on your application's needs. */
AliEngineExternalAudioStreamConfig config;
config.playoutVolume = currentAudioPlayoutVolume;
config.publishVolume = currentAudioPublishVolume;
config.channels = 1;
config.sampleRate = 48000;
config.publishStream = 0;
audioStreamID = mAliRtcMediaEngine->AddExternalAudioStream(config);

mAliRtcMediaEngine->Release();

3. Implement a custom audio capture module

With custom audio capture, you are responsible for implementing the logic to capture and process audio data, and then sending that data to the SDK.

Alibaba Cloud provides a custom capture sample that demonstrates how to read PCM-formatted data from a local file or a microphone.
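As a rough illustration of such a module (this is not the Alibaba Cloud sample itself), the sketch below splits a 16-bit PCM stream into 10 ms frames, which is the granularity the SDK expects you to push. The class and method names are hypothetical; only the frame-size arithmetic follows from the PCM format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a 16-bit PCM stream into complete 10 ms frames.
public class PcmChunkReader {
    // Bytes in one 10 ms frame: (sampleRate / 100) samples per channel, 2 bytes each.
    public static int frameSizeBytes(int sampleRate, int channels) {
        return (sampleRate / 100) * channels * 2;
    }

    // Reads the stream and returns every complete 10 ms frame.
    public static List<byte[]> readFrames(InputStream in, int sampleRate, int channels) throws IOException {
        int frameSize = frameSizeBytes(sampleRate, channels);
        List<byte[]> frames = new ArrayList<>();
        byte[] buf = new byte[frameSize];
        int filled = 0;
        int n;
        while ((n = in.read(buf, filled, frameSize - filled)) > 0) {
            filled += n;
            if (filled == frameSize) {
                frames.add(buf.clone()); // one frame, ready to be pushed to the SDK
                filled = 0;
            }
        }
        return frames; // a trailing partial frame, if any, is dropped
    }
}
```

In a real capture module the source would be a microphone, file, or network stream rather than an in-memory buffer, and each frame would be handed to the push loop described in Step 4.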

4. Push audio data to the SDK

After the audio stream is successfully published (the status in the onAudioPublishStateChanged callback changes to AliRtcStatsPublished), call the pushExternalAudioStreamRawData method. Pass the audio stream ID obtained in Step 2 together with the captured audio data, which must be wrapped in an AliRtcAudioFrame object. The relevant fields are as follows:

  • data: The audio data.

  • numSamples: The number of sample points per channel in the data provided.

  • bytesPerSample: Bytes per sample point (bit depth / 8). For example, this value is 2 for 16-bit audio.

  • numChannels: Number of audio channels.

  • samplesPerSec: The sample rate, in Hz (e.g., 16000 or 48000).

Note
  • You must start pushing data only after the audio stream is published (when the onAudioPublishStateChanged callback reports a state of AliRtcStatsPublished).

  • Set the numSamples parameter in the AliRtcAudioFrame object to the actual length of the captured data. Methods like AudioRecord.read may return less data than the buffer size, so you must use the method's return value to determine the actual data length.

  • The pushExternalAudioStreamRawData call can fail if the internal buffer is full. Your application must handle this error and implement a retry mechanism.

  • We recommend calling pushExternalAudioStreamRawData every 10 ms to send data.
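To make the relationships between these fields concrete, here is a small helper that derives numSamples and the total buffer size for one frame. The class and method names are illustrative; only the arithmetic follows from the field definitions above.

```java
public class AudioFrameMath {
    // Number of samples per channel in one frame of the given duration.
    public static int numSamples(int samplesPerSec, int frameMs) {
        return samplesPerSec * frameMs / 1000;
    }

    // Total byte size of one frame: samples * channels * bytes per sample.
    public static int frameBytes(int samplesPerSec, int frameMs, int numChannels, int bytesPerSample) {
        return numSamples(samplesPerSec, frameMs) * numChannels * bytesPerSample;
    }

    public static void main(String[] args) {
        // 10 ms of 48 kHz stereo 16-bit PCM: 480 samples per channel, 1920 bytes total.
        System.out.println(numSamples(48000, 10));        // 480
        System.out.println(frameBytes(48000, 10, 2, 2));  // 1920
    }
}
```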

Android

// Assume the captured audio data is in `audioData`, the size is `bytesRead` bytes, and it represents 10 ms of data.
if (mAliRtcEngine != null && bytesRead > 0) {
    // Construct an AliRtcAudioFrame object. `bitsPerSample` is the bit depth, which is typically 16.
    AliRtcEngine.AliRtcAudioFrame sample = new AliRtcEngine.AliRtcAudioFrame();
    sample.data = audioData;
    sample.numSamples = bytesRead / (channels * (bitsPerSample / 8)); // Calculate the number of samples based on the actual number of bytes read.
    sample.numChannels = channels;
    sample.samplesPerSec = sampleRate;
    sample.bytesPerSample = bitsPerSample / 8;

    int ret = 0;
    // Retry the push operation if it fails because the buffer is full.
    int retryCount = 0;
    final int MAX_RETRY_COUNT = 20;
    final int BUFFER_WAIT_MS = 10;
    do {
        // Push the captured data to the SDK.
        ret = mAliRtcEngine.pushExternalAudioStreamRawData(mExternalAudioStreamId, sample);
        if(ret == ErrorCodeEnum.ERR_SDK_AUDIO_INPUT_BUFFER_FULL) {
            // Handle the buffer full scenario. Wait for a short period and retry.
            retryCount++;
            if(mExternalAudioStreamId <= 0 || retryCount >= MAX_RETRY_COUNT) {
                // The stream has been stopped or the maximum retry count is reached. Exit the loop.
                break;
            }

            try {
                // Pause for a short interval.
                Thread.sleep(BUFFER_WAIT_MS);
            } catch (InterruptedException e) {
                e.printStackTrace();
                break;
            }
        } else {
            // Push succeeded or another error occurred. Exit the loop.
            break;
        }
    } while (retryCount < MAX_RETRY_COUNT);
}

iOS

// Construct an AliRtcAudioFrame object from the captured audio data.
let sample = AliRtcAudioFrame()
sample.dataPtr = UnsafeMutableRawPointer(mutating: pcmData)
sample.samplesPerSec = pcmSampleRate
sample.bytesPerSample = Int32(MemoryLayout<Int16>.size)
sample.numOfChannels = pcmChannels
sample.numOfSamples = numOfSamples

var retryCount = 0

while retryCount < 20 {
    if !(pcmInputThread?.isExecuting ?? false) {
        break
    }
    // Push the audio data to the SDK.
    let rc = rtcEngine?.pushExternalAudioStream(externalPublishStreamId, rawData: sample) ?? 0

    // Handle a full buffer.
    // 0x01070101 SDK_AUDIO_INPUT_BUFFER_FULL: the buffer is full; wait briefly and resend the frame.
    if rc == 0x01070101 && !(pcmInputThread?.isCancelled ?? true) {
        Thread.sleep(forTimeInterval: 0.03) // 30ms
        retryCount += 1
    } else {
        if rc < 0 {
            "pushExternalAudioStream error, ret: \(rc)".printLog()
        }
        break
    }
}

Mac

while (true) {
    if (![pcmInputThread isExecuting]) {
        push_error = YES;
        break;
    }

    AliRtcAudioFrame *sample = [AliRtcAudioFrame new];
    sample.dataPtr = pcmData;
    sample.samplesPerSec = pcmSampleRate;
    sample.bytesPerSample = sizeof(int16_t);
    sample.numOfChannels = pcmChannels;
    sample.numOfSamples = numOfSamples;
    int rc = [self.engine pushExternalAudioStream:_externalPublishStreamId rawData:sample];
    count = count + 1;

    /* If the error is AliRtcErrAudioBufferFull, sleep for a moment and then continue pushing. */
    if (rc == AliRtcErrAudioBufferFull && ![pcmInputThread isCancelled]) {
        [NSThread sleepForTimeInterval:0.04];
    } else {
        if (rc < 0) {
            push_error = YES;
        }
        break;
    }
}

Windows

Note

Before you implement custom capture on Windows, you must call the QueryInterface method to get the media engine object.

/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;   
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);

// Construct an audio frame from the captured data. This snippet runs inside the capture loop;
// `frameInfo` is the application's own capture structure.
AliEngineAudioRawData rawData;
rawData.dataPtr = frameInfo.audio_data[0];
rawData.numOfSamples = (int)(frameInfo.audio_data[0].length / (2 * frameInfo.audio_channels));
rawData.bytesPerSample = 2;
rawData.numOfChannels = frameInfo.audio_channels;
rawData.samplesPerSec = frameInfo.audio_sample_rate;
// Push the data to the SDK.
int ret = mAliRtcMediaEngine->PushExternalAudioStreamRawData(audioStreamID, rawData);
// If the buffer is full, wait briefly and retry on the next loop iteration.
if (ret == AliEngineErrorAudioBufferFull) {
    Sleep(40);
    continue;
}

// Release the media engine.
mAliRtcMediaEngine->Release();

5. Remove the external audio stream

To stop publishing audio from the custom source, call removeExternalAudioStream.

Android

mAliRtcEngine.removeExternalAudioStream(mExternalAudioStreamId);

iOS

[self.engine removeExternalAudioStream:_externalPublishStreamId];

Mac

[self.engine removeExternalAudioStream:_externalPublishStreamId];

Windows

/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);

mAliRtcMediaEngine->RemoveExternalAudioStream(audioStreamID);
mAliRtcMediaEngine->Release();

6. (Optional) Dynamically enable or disable internal capture

To dynamically enable or disable the SDK's internal capture during a call, use the setParameter method.

Android

/* Dynamically disable internal capture. */
String parameter = "{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}";
mAliRtcEngine.setParameter(parameter);

/* Dynamically enable internal capture. */
String parameter = "{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}"; 
mAliRtcEngine.setParameter(parameter);

iOS

// Dynamically disable internal capture.
engine.setParameter("{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}")
// Dynamically enable internal capture.
engine.setParameter("{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}")

Mac

// Dynamically disable internal capture.
// Dynamically disable internal capture.
[self.engine setParameter:@"{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}"];
// Dynamically enable internal capture.
[self.engine setParameter:@"{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}"];

Windows

/* Dynamically disable internal capture. */
mAliRtcEngine->SetParameter("{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}");

/* Dynamically enable internal capture. */
mAliRtcEngine->SetParameter("{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}");

FAQ

  • What is the recommended frequency for calling pushExternalAudioStreamRawData?

    • We recommend synchronizing the calls with the physical audio device's clock, calling the method each time the device provides a new data packet.

    • If no physical device clock is available, we recommend sending data every 10 to 50 ms.

  • Can I use the SDK's internal 3A audio processing (AEC, AGC, and ANS) with custom audio capture?

    • Yes. As described in Step 2, you can set the enable3A parameter when adding the external audio stream to enable or disable the SDK's internal 3A audio processing.
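The clock-driven pacing suggested in the first FAQ item can be sketched as follows. This is an illustrative pattern, not SDK code: the loop schedules sends against a monotonic clock at absolute deadlines, so small sleep inaccuracies do not accumulate into drift; where the comment indicates, you would push one frame to the SDK.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical pacing loop for sending one audio frame every `intervalNanos`.
public class PacedSender {
    public static long[] sendTimesNanos(int frames, long intervalNanos) {
        long[] times = new long[frames];
        long next = System.nanoTime();
        for (int i = 0; i < frames; i++) {
            long wait = next - System.nanoTime();
            if (wait > 0) {
                try {
                    TimeUnit.NANOSECONDS.sleep(wait);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            times[i] = System.nanoTime(); // push one frame to the SDK here
            next += intervalNanos;        // absolute schedule, so drift does not accumulate
        }
        return times;
    }
}
```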