The ARTC SDK provides a flexible custom audio capture feature.
Overview
Although the ARTC SDK's built-in audio module is sufficient for most applications, some use cases require custom audio capture. For example:
When an audio capture device is occupied by another process.
To capture audio from a custom source, like a proprietary system or an audio file, and send it to the SDK.
The ARTC SDK's flexible custom audio capture feature allows you to manage your own audio devices and sources.
Sample code
Android: Android/ARTCExample/AdvancedUsage/src/main/java/com/aliyun/artc/api/advancedusage/CustomAudioCaptureAndRender/CustomAudioCaptureActivity.java.
iOS: iOS/ARTCExample/AdvancedUsage/CustomAudioCapture/CustomAudioCaptureVC.swift.
Prerequisites
Before you begin, ensure you have completed the following:
Created an Alibaba Real-Time Communication (ARTC) application and obtained an App ID and App Key from the ApsaraVideo Live console. For instructions, see Create an application.
Integrated the ARTC SDK into your project and implemented basic real-time audio and video calling. For instructions, see Download and integrate the ARTC SDK and Implement an audio/video call.
Implementation
1. Enable or disable internal capture
To use custom audio capture, you must first disable the SDK's internal capture module. We recommend doing this by passing the extras parameter when calling getInstance to create the engine. Use the following parameter:
user_specified_use_external_audio_record: Specifies whether to disable the SDK's internal capture and use custom audio capture instead.
"TRUE": Use custom audio capture (disables internal capture).
"FALSE": Do not use custom audio capture (enables internal capture).
The extras parameter is a JSON string.
Android
String extras = "{\"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = AliRtcEngine.getInstance(this, extras);
iOS
// Create and initialize the engine.
var customAudioCaptureConfig: [String: String] = [:]
// Use custom audio capture.
customAudioCaptureConfig["user_specified_use_external_audio_record"] = "TRUE"
// Serialize to JSON.
guard let jsonData = try? JSONSerialization.data(withJSONObject: customAudioCaptureConfig, options: []),
let extras = String(data: jsonData, encoding: .utf8) else {
print("JSON serialization failed")
return
}
let engine = AliRtcEngine.sharedInstance(self, extras: extras)
Mac
NSString * extras = @"{\"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = [AliRtcEngine sharedInstance:self extras:extras];
Windows
/* Windows supports enabling or disabling audio capture during engine creation. */
/* Disable internal capture. */
const char* extra = "{\"user_specified_enable_use_virtual_audio_device\":\"TRUE\", \"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = AliRtcEngine::Create(extra);
/* Enable internal capture. */
extra = "{\"user_specified_enable_use_virtual_audio_device\":\"FALSE\", \"user_specified_use_external_audio_record\":\"FALSE\"}";
mAliRtcEngine = AliRtcEngine::Create(extra);
2. Add an external audio stream
Call the addExternalAudioStream method to add an external audio stream and get its stream ID. If you require 3A audio processing, which includes acoustic echo cancellation (AEC), automatic gain control (AGC), and noise suppression (ANS), set the enable3A parameter in the AliRtcExternalAudioStreamConfig object.
When to call this method:
If you need 3A audio processing: We recommend that you call addExternalAudioStream after the audio stream is successfully published and the custom capture module obtains the first audio frame. That is, call the method after the onAudioPublishStateChanged callback returns newState as AliRtcStatsPublished (3).
If you do not need 3A audio processing (for example, when streaming audio from a local file, network source, or TTS-generated data): You can call this method immediately after creating the engine. Then, start pushing audio data once the stream is published.
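The timing rule for the 3A case can be sketched as a small gate, shown here in Java with the SDK calls stubbed out. PublishGate, canPush, and the callback wiring are illustrative names, not part of the ARTC SDK; only the constant 3 (AliRtcStatsPublished) comes from the text above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch: gate addExternalAudioStream on the publish state.
public class PublishGate {
    static final int ALI_RTC_STATS_PUBLISHED = 3; // newState value per the docs

    private final AtomicBoolean published = new AtomicBoolean(false);

    // Stand-in for the SDK's onAudioPublishStateChanged callback.
    void onAudioPublishStateChanged(int oldState, int newState) {
        if (newState == ALI_RTC_STATS_PUBLISHED && published.compareAndSet(false, true)) {
            // Safe point to call addExternalAudioStream with enable3A = true,
            // then start pushing captured frames (stubbed out here).
        }
    }

    // The capture thread checks this before pushing data.
    boolean canPush() {
        return published.get();
    }
}
```

The AtomicBoolean ensures the add-stream call runs once even if the callback fires on a different thread than the capture loop.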
Android
AliRtcEngine.AliRtcExternalAudioStreamConfig config = new AliRtcEngine.AliRtcExternalAudioStreamConfig();
config.sampleRate = SAMPLE_RATE; // Sample rate
config.channels = CHANNEL; // Number of channels
// Publish volume
config.publishVolume = 100;
// Local playout volume
config.playoutVolume = isLocalPlayout ? 100 : 0;
config.enable3A = true;
int result = mAliRtcEngine.addExternalAudioStream(config);
if (result <= 0) {
return;
}
// The return value is the stream ID. You need it to push data to the SDK.
mExternalAudioStreamId = result;
iOS
/* Set parameters based on your application's needs. */
AliRtcExternalAudioStreamConfig *config = [AliRtcExternalAudioStreamConfig new];
// This must match the number of channels of the external PCM audio stream. Set to 1 for mono or 2 for stereo.
config.channels = _pcmChannels;
// This must match the sample rate of the external PCM audio stream.
config.sampleRate = _pcmSampleRate;
config.playoutVolume = 0;
config.publishVolume = 100;
_externalPlayoutStreamId = [self.engine addExternalAudioStream:config];
Mac
/* Set parameters based on your application's needs. */
AliRtcExternalAudioStreamConfig *config = [AliRtcExternalAudioStreamConfig new];
config.channels = pcmChannels;
/** Sample rate. Default: 48000. Supported values: 8000, 12000, 16000, 24000, 32000, 44100, 48000, 64000, 88200, 96000, 176400, 192000. */
config.sampleRate = pcmSampleRate;
config.playoutVolume = 0;
config.publishVolume = 100;
int ret = [self.engine addExternalAudioStream:config];
Windows
/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);
/* Set parameters based on your application's needs. */
AliEngineExternalAudioStreamConfig config;
config.playoutVolume = currentAudioPlayoutVolume;
config.publishVolume = currentAudioPublishVolume;
config.channels = 1;
config.sampleRate = 48000;
config.publishStream = 0;
audioStreamID = mAliRtcMediaEngine->AddExternalAudioStream(config);
mAliRtcMediaEngine->Release();
3. Implement a custom audio capture module
With custom audio capture, you are responsible for implementing the logic to capture and process audio data, and then sending that data to the SDK.
Alibaba Cloud provides a custom capture sample that demonstrates how to read PCM-formatted data from a local file or a microphone.
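Alibaba Cloud's sample is the authoritative reference. As a rough standalone illustration of the capture side, the following Java sketch slices a PCM byte source into fixed 10 ms chunks, which is the shape the push step expects. PcmChunker and readFully are hypothetical names, and the in-memory byte array stands in for a real file or microphone source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch (not the Alibaba Cloud sample): read PCM in 10 ms chunks.
public class PcmChunker {
    // Fills buf as far as possible; returns the byte count read, or -1 at end of
    // stream. For this sketch an I/O error is also treated as end of stream.
    static int readFully(InputStream in, byte[] buf) {
        try {
            int total = 0;
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n < 0) break;
                total += n;
            }
            return total == 0 ? -1 : total;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        int sampleRate = 16000, channels = 1, bytesPerSample = 2;
        // 10 ms of audio: (sampleRate / 100) samples per channel.
        int chunkBytes = sampleRate / 100 * channels * bytesPerSample; // 320 bytes
        InputStream pcm = new ByteArrayInputStream(new byte[700]); // fake PCM source
        byte[] buf = new byte[chunkBytes];
        int n;
        while ((n = readFully(pcm, buf)) > 0) {
            // Build an AliRtcAudioFrame from buf[0..n) and push it (see Step 4).
            // Note that n may be smaller than chunkBytes on the final chunk.
        }
    }
}
```

The loop deliberately reports the actual byte count per chunk, since the final read from a file (or an AudioRecord.read call) can return less than a full buffer.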
4. Push audio data to the SDK
After the audio stream is successfully published (the state in the onAudioPublishStateChanged callback changes to AliRtcStatsPublished), call the pushExternalAudioStreamRawData method, passing the audio stream ID obtained in Step 2 and the captured audio data. The captured audio data must be converted into an AliRtcAudioFrame object with the following fields:
data: The audio data.
numSamples: The number of sample points per channel in the data provided.
bytesPerSample: Bytes per sample point (bit depth / 8). For example, this value is 2 for 16-bit audio.
numChannels: Number of audio channels.
samplesPerSec: The sample rate, in Hz (e.g., 16000 or 48000).
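The relationship between these fields reduces to simple arithmetic: one frame occupies numSamples × numChannels × bytesPerSample bytes, with numSamples = samplesPerSec × frameMs / 1000. The helper below is illustrative only (FrameMath is not an SDK class).

```java
// Illustrative helper showing how the AliRtcAudioFrame fields relate.
public class FrameMath {
    /** Bytes needed for one frame of the given duration in milliseconds. */
    static int frameBytes(int samplesPerSec, int numChannels, int bytesPerSample, int frameMs) {
        int numSamples = samplesPerSec * frameMs / 1000; // samples per channel
        return numSamples * numChannels * bytesPerSample;
    }

    public static void main(String[] args) {
        // 10 ms of 48 kHz stereo 16-bit PCM: 480 samples * 2 channels * 2 bytes.
        System.out.println(frameBytes(48000, 2, 2, 10)); // prints 1920
        // 10 ms of 16 kHz mono 16-bit PCM: 160 samples * 1 channel * 2 bytes.
        System.out.println(frameBytes(16000, 1, 2, 10)); // prints 320
    }
}
```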
Start pushing data only after the audio stream is published (when the onAudioPublishStateChanged callback reports a state of AliRtcStatsPublished).
Set the numSamples parameter in the AliRtcAudioFrame object to the actual length of the captured data. Methods like AudioRecord.read may return less data than the buffer size, so use the method's return value to determine the actual data length.
The pushExternalAudioStreamRawData call can fail if the internal buffer is full. Your application must handle this error and implement a retry mechanism.
We recommend calling pushExternalAudioStreamRawData every 10 ms to send data.
Android
// Assume the captured audio data is in `audioData`, the size is `bytesRead` bytes, and it represents 10 ms of data.
if (mAliRtcEngine != null && bytesRead > 0) {
// Construct an AliRtcAudioFrame object. `bitsPerSample` is the bit depth, which is typically 16.
AliRtcEngine.AliRtcAudioFrame sample = new AliRtcEngine.AliRtcAudioFrame();
sample.data = audioData;
sample.numSamples = bytesRead / (channels * (bitsPerSample / 8)); // Calculate the number of samples based on the actual number of bytes read.
sample.numChannels = channels;
sample.samplesPerSec = sampleRate;
sample.bytesPerSample = bitsPerSample / 8;
int ret = 0;
// Retry the push operation if it fails because the buffer is full.
int retryCount = 0;
final int MAX_RETRY_COUNT = 20;
final int BUFFER_WAIT_MS = 10;
do {
// Push the captured data to the SDK.
ret = mAliRtcEngine.pushExternalAudioStreamRawData(mExternalAudioStreamId, sample);
if(ret == ErrorCodeEnum.ERR_SDK_AUDIO_INPUT_BUFFER_FULL) {
// Handle the buffer full scenario. Wait for a short period and retry.
retryCount++;
if(mExternalAudioStreamId <= 0 || retryCount >= MAX_RETRY_COUNT) {
// The stream has been stopped or the maximum retry count is reached. Exit the loop.
break;
}
try {
// Pause for a short interval.
Thread.sleep(BUFFER_WAIT_MS);
} catch (InterruptedException e) {
e.printStackTrace();
break;
}
} else {
// Push succeeded or another error occurred. Exit the loop.
break;
}
} while (retryCount < MAX_RETRY_COUNT);
}
iOS
// Construct an AliRtcAudioFrame object from the captured audio data.
let sample = AliRtcAudioFrame()
sample.dataPtr = UnsafeMutableRawPointer(mutating: pcmData)
sample.samplesPerSec = pcmSampleRate
sample.bytesPerSample = Int32(MemoryLayout<Int16>.size)
sample.numOfChannels = pcmChannels
sample.numOfSamples = numOfSamples
var retryCount = 0
while retryCount < 20 {
if !(pcmInputThread?.isExecuting ?? false) {
break
}
// Push the audio data to the SDK.
let rc = rtcEngine?.pushExternalAudioStream(externalPublishStreamId, rawData: sample) ?? 0
// Handle a full buffer.
// 0x01070101 SDK_AUDIO_INPUT_BUFFER_FULL: The buffer is full. Retransmission is required.
if rc == 0x01070101 && !(pcmInputThread?.isCancelled ?? true) {
Thread.sleep(forTimeInterval: 0.03) // 30ms
retryCount += 1;
} else {
if rc < 0 {
"pushExternalAudioStream error, ret: \(rc)".printLog()
}
break
}
}
Mac
while (true) {
if (![pcmInputThread isExecuting]) {
push_error = YES;
break;
}
AliRtcAudioFrame *sample = [AliRtcAudioFrame new];
sample.dataPtr = pcmData;
sample.samplesPerSec = pcmSampleRate;
sample.bytesPerSample = sizeof(int16_t);
sample.numOfChannels = pcmChannels;
sample.numOfSamples = numOfSamples;
int rc = [self.engine pushExternalAudioStream:_externalPublishStreamId rawData:sample];
count = count + 1;
/* If the error is AliRtcErrAudioBufferFull, sleep for a moment and then continue pushing. */
if (rc == AliRtcErrAudioBufferFull && ![pcmInputThread isCancelled]) {
[NSThread sleepForTimeInterval:0.04];
} else {
if (rc < 0) {
push_error = YES;
}
break;
}
}
Windows
Before you implement custom capture on Windows, you must call the QueryInterface method to get the media engine object.
/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);
// Construct an audio frame from the data.
AliEngineAudioRawData rawData;
rawData.dataPtr = frameInfo.audio_data[0];
rawData.numOfSamples = (int) (frameInfo.audio_data[0].length / (2 * frameInfo.audio_channels));
rawData.bytesPerSample = 2;
rawData.numOfChannels = frameInfo.audio_channels;
rawData.samplesPerSec = frameInfo.audio_sample_rate;
// Push the data to the SDK.
int ret = mAliRtcMediaEngine->PushExternalAudioStreamRawData(audioStreamID, rawData);
// Handle buffer full and other errors.
if ( ret == AliEngineErrorAudioBufferFull ) {
Sleep(40);
continue ;
}
// Release the media engine.
mAliRtcMediaEngine->Release();
5. Remove the external audio stream
To stop publishing audio from the custom source, call removeExternalAudioStream.
Android
mAliRtcEngine.removeExternalAudioStream(mExternalAudioStreamId);
iOS
[self.engine removeExternalAudioStream:_externalPublishStreamId];
Mac
[self.engine removeExternalAudioStream:_externalPublishStreamId];
Windows
/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);
mAliRtcMediaEngine->RemoveExternalAudioStream(audioStreamID);
mAliRtcMediaEngine->Release();
6. (Optional) Dynamically enable or disable internal capture
To dynamically enable or disable the SDK's internal capture during a call, use the setParameter method.
Android
/* Dynamically disable internal capture. */
String parameter = "{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}";
mAliRtcEngine.setParameter(parameter);
/* Dynamically enable internal capture. */
parameter = "{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}";
mAliRtcEngine.setParameter(parameter);
iOS
// Dynamically disable internal capture.
engine.setParameter("{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}")
// Dynamically enable internal capture.
engine.setParameter("{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}")
Mac
// Dynamically disable internal capture.
[self.engine setParameter:@"{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}"];
// Dynamically enable internal capture.
[self.engine setParameter:@"{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}"];
Windows
/* Dynamically disable internal capture. */
mAliRtcEngine->SetParameter("{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}");
/* Dynamically enable internal capture. */
mAliRtcEngine->SetParameter("{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}");
FAQ
What is the recommended frequency for calling pushExternalAudioStreamRawData?
We recommend synchronizing the calls with the physical audio device's clock, calling the method each time the device provides a new data packet.
If no physical device clock is available, we recommend sending data every 10 to 50 ms.
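When pacing with a timer rather than a device clock, computing each deadline from the start time avoids the drift that accumulates if you simply sleep a fixed interval after each push. A minimal Java sketch under that assumption (Pacer and nextDeadlineNanos are illustrative names, not SDK API):

```java
// Illustrative drift-free pacing loop for a 10 ms push interval.
public class Pacer {
    /** Absolute deadline for the given frame, measured from the loop start. */
    static long nextDeadlineNanos(long startNanos, long frameIndex, long intervalNanos) {
        return startNanos + frameIndex * intervalNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        final long INTERVAL = 10_000_000L; // 10 ms in nanoseconds
        long start = System.nanoTime();
        for (long i = 1; i <= 5; i++) { // 5 frames for the demo
            // Capture and push one 10 ms frame here (stubbed out).
            long sleep = nextDeadlineNanos(start, i, INTERVAL) - System.nanoTime();
            if (sleep > 0) {
                Thread.sleep(sleep / 1_000_000, (int) (sleep % 1_000_000));
            }
        }
    }
}
```

Because each deadline is derived from the original start time, a frame that runs long simply shortens the following sleep instead of pushing every later frame off schedule.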
Can I use the SDK's internal 3A audio processing (AEC, AGC, and ANS) with custom audio capture?
Yes. As described in Step 2, you can set the enable3A parameter when adding the external audio stream to enable or disable the SDK's internal 3A audio processing.