The Alibaba Real-Time Communication (ARTC) SDK provides a flexible custom audio capture feature.
Feature introduction
While the ARTC SDK's internal audio module can meet the basic audio requirements for most applications, it may not be sufficient for specific scenarios. In such cases, you may need to implement custom audio capture. Common use cases include:
Resolving issues where the default audio capture device is occupied by another process.
Capturing audio data from a custom source, such as a proprietary capture system or an audio file, and passing it to the SDK for transmission.
The ARTC SDK supports flexible custom capture, allowing you to manage audio devices and sources as needed.
Sample code
Android: Android/ARTCExample/AdvancedUsage/src/main/java/com/aliyun/artc/api/advancedusage/CustomAudioCaptureAndRender/CustomAudioCaptureActivity.java
iOS: iOS/ARTCExample/AdvancedUsage/CustomAudioCapture/CustomAudioCaptureVC.swift
Prerequisites
Make sure you meet the following requirements:
Create an ARTC application and obtain the AppID and AppKey from the ApsaraVideo Live console.
Download the ARTC SDK, integrate it into your project, and implement basic audio and video communication.
Implementation
1. Enable or disable internal capture
To use custom audio capture, you typically need to disable the SDK's internal audio capture module. To do this, pass the extras parameter when you create the engine (getInstance on Android, sharedInstance on iOS, or Create on Windows). The relevant parameter is:
user_specified_use_external_audio_record: Specifies whether to use external audio capture (which disables the SDK's internal capture).
TRUE: Use external audio capture (disables internal SDK capture).
FALSE: Do not use external audio capture (enables internal SDK capture).
The extras parameter is a JSON string.
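On Android, the extras JSON is usually a hard-coded string, as in the sample that follows. If you assemble several settings at runtime, a small helper avoids quoting mistakes. The sketch below is a minimal, hypothetical helper (EngineExtras is not part of the ARTC SDK). It assumes every value is a string such as "TRUE" or "FALSE", as in the samples in this document, rather than a JSON boolean.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not an SDK class): serializes flat string settings into an extras JSON string. */
final class EngineExtras {
    private EngineExtras() {}

    /** Escapes the characters that JSON requires to be escaped inside a string literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds {"k1":"v1","k2":"v2"}, keeping the map's iteration order. */
    static String toJson(Map<String, String> settings) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : settings.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    /** Extras that turn external audio capture on or off; the value is the string "TRUE" or "FALSE". */
    static String externalAudioRecord(boolean useExternal) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("user_specified_use_external_audio_record", useExternal ? "TRUE" : "FALSE");
        return toJson(m);
    }
}
```

For example, EngineExtras.externalAudioRecord(true) yields the same string as the Android sample: {"user_specified_use_external_audio_record":"TRUE"}.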
Android
String extras = "{\"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = AliRtcEngine.getInstance(this, extras);
iOS
// Create and initialize the engine.
var customAudioCaptureConfig: [String: String] = [:]
// Use external capture.
customAudioCaptureConfig["user_specified_use_external_audio_record"] = "TRUE"
// Serialize to JSON.
guard let jsonData = try? JSONSerialization.data(withJSONObject: customAudioCaptureConfig, options: []),
let extras = String(data: jsonData, encoding: .utf8) else {
print("JSON serialization failed")
return
}
let engine = AliRtcEngine.sharedInstance(self, extras: extras)
Windows
/* On Windows, enable or disable internal audio capture when you create the engine. Use one of the following options. */
/* Option 1: Disable the internal capture module. */
const char* extraDisableInternal = "{\"user_specified_enable_use_virtual_audio_device\":\"TRUE\", \"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = AliRtcEngine::Create(extraDisableInternal);
/* Option 2: Enable the internal capture module. */
const char* extraEnableInternal = "{\"user_specified_enable_use_virtual_audio_device\":\"FALSE\", \"user_specified_use_external_audio_record\":\"FALSE\"}";
mAliRtcEngine = AliRtcEngine::Create(extraEnableInternal);
2. Add an external audio stream
Call the addExternalAudioStream method to add an external audio stream and obtain its stream ID. If you need the SDK's 3A audio processing features (AEC, AGC, ANS), set the enable3A parameter in the AliRtcExternalAudioStreamConfig object.
When to call this API:
If you need 3A processing, call addExternalAudioStream after the audio stream has been published and the custom capture module has obtained its first audio frame. Specifically, call it after the onAudioPublishStateChanged callback returns a newState of AliRtcStatsPublished (3).
If you do not need 3A processing (such as when streaming audio from a local file, a network stream, or TTS-generated data), you can call this method right after creating the engine. You can then start pushing audio data after the audio stream is published.
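The timing rules above can be tracked with a small state holder. The following sketch is hypothetical: ExternalStreamGate and its methods are not SDK APIs. It only records events your app already receives, such as onAudioPublishStateChanged, and assumes AliRtcStatsPublished has the value 3, as stated above.

```java
/**
 * Hypothetical helper (not an SDK class) that tracks when it is appropriate to call
 * addExternalAudioStream and when to start pushing data. It never calls the SDK itself.
 */
final class ExternalStreamGate {
    /** Value of AliRtcStatsPublished reported by onAudioPublishStateChanged, per this document. */
    static final int STATS_PUBLISHED = 3;

    private final boolean enable3A;
    private boolean published;
    private boolean firstFrameCaptured;
    private boolean streamAdded;

    ExternalStreamGate(boolean enable3A) {
        this.enable3A = enable3A;
    }

    /** Forward newState from onAudioPublishStateChanged here. */
    void onAudioPublishStateChanged(int newState) {
        published = (newState == STATS_PUBLISHED);
    }

    /** Call when the custom capture module obtains its first audio frame. */
    void onFirstFrameCaptured() {
        firstFrameCaptured = true;
    }

    /** Call after addExternalAudioStream returns a valid stream ID. */
    void onStreamAdded() {
        streamAdded = true;
    }

    /** With 3A: wait for publishing and the first frame. Without 3A: allowed right after engine creation. */
    boolean canAddStream() {
        if (streamAdded) {
            return false;
        }
        return !enable3A || (published && firstFrameCaptured);
    }

    /** Data may be pushed only after the stream is added and the audio stream is published. */
    boolean canPushData() {
        return streamAdded && published;
    }
}
```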
Android
AliRtcEngine.AliRtcExternalAudioStreamConfig config = new AliRtcEngine.AliRtcExternalAudioStreamConfig();
config.sampleRate = SAMPLE_RATE; // Sample rate
config.channels = CHANNEL; // Number of channels
// Publishing volume
config.publishVolume = 100;
// Local playback volume
config.playoutVolume = isLocalPlayout ? 100 : 0;
config.enable3A = true;
int result = mAliRtcEngine.addExternalAudioStream(config);
if (result <= 0) {
return;
}
// The return value is the stream ID. You will need it to push data to the SDK later.
mExternalAudioStreamId = result;
iOS
/* Set the parameters as needed for your business. */
AliRtcExternalAudioStreamConfig *config = [AliRtcExternalAudioStreamConfig new];
// This must be the same as the number of channels of the external PCM audio stream. Set it to 1 for mono or 2 for stereo.
config.channels = _pcmChannels;
// This must be the same as the sample rate of the external PCM audio stream.
config.sampleRate = _pcmSampleRate;
config.playoutVolume = 0;
config.publishVolume = 100;
_externalPublishStreamId = [self.engine addExternalAudioStream:config];
Windows
/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);
/* Set the parameters as needed for your business. */
AliEngineExternalAudioStreamConfig config;
config.playoutVolume = currentAudioPlayoutVolume;
config.publishVolume = currentAudioPublishVolume;
config.channels = 1;
config.sampleRate = 48000;
config.publishStream = 0;
audioStreamID = mAliRtcMediaEngine->AddExternalAudioStream(config);
mAliRtcMediaEngine->Release();
3. Implement a custom audio capture module
You must implement your own logic to capture and process audio data based on your business scenario, and then pass the data to the SDK for transmission.
Alibaba Cloud provides sample code that demonstrates how to read PCM data from a local file or a microphone. For implementation details, see the Custom capture sample.
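As a starting point for a file-based source, the sketch below reads raw PCM (for example, a headerless .pcm file) in 10 ms chunks. PcmChunkReader is a hypothetical class, not part of the SDK or the official sample. It assumes interleaved PCM whose sample rate and channel count match the values you set in Step 2.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical reader (not an SDK class) that slices a raw interleaved PCM stream into 10 ms chunks. */
final class PcmChunkReader {
    private final InputStream in;
    private final byte[] buffer;

    PcmChunkReader(InputStream in, int sampleRate, int channels, int bytesPerSample) {
        this.in = in;
        // 10 ms of audio is sampleRate / 100 samples per channel.
        this.buffer = new byte[sampleRate / 100 * channels * bytesPerSample];
    }

    /** Size in bytes of a full 10 ms chunk. */
    int chunkSize() {
        return buffer.length;
    }

    /**
     * Reads up to one 10 ms chunk and returns a copy sized to the bytes actually read,
     * or null at end of stream. InputStream.read may return fewer bytes than requested,
     * so it loops until the chunk is full or the stream ends; a short final chunk is returned as-is.
     */
    byte[] nextChunk() throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total == 0 ? null : Arrays.copyOf(buffer, total);
    }
}
```

Because the last chunk can be shorter than 10 ms, compute numSamples from the length of the returned array rather than from chunkSize(), as Step 4 requires.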
4. Push audio data to the SDK based on stream ID
After the audio stream is published (when the onAudioPublishStateChanged callback reports a status of AliRtcStatsPublished), call the pushExternalAudioStreamRawData method (pushExternalAudioStream on iOS) to pass the captured audio data to the SDK. Pass the stream ID obtained in Step 2 (mExternalAudioStreamId in the Android sample). The captured audio data must be wrapped in an AliRtcAudioFrame object (AliEngineAudioRawData on Windows) with the following properties. Property names differ slightly by platform; for example, iOS uses dataPtr, numOfSamples, and numOfChannels.
data: The audio data.
numSamples: The number of samples per channel in the provided data.
bytesPerSample: The number of bytes per sample, which depends on the bit depth. For 16-bit PCM, this value is 2.
numChannels: The number of audio channels.
samplesPerSec: The sample rate, such as 16000 or 48000.
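These fields are related by simple arithmetic. For 48 kHz stereo 16-bit PCM, 10 ms of audio contains 480 samples per channel, which is 480 samples x 2 channels x 2 bytes = 1920 bytes. The hypothetical helper below (PcmFrameMath is not an SDK API) encodes these relationships, assuming interleaved PCM.

```java
/** Hypothetical helpers (not SDK APIs) for the size fields of an audio frame, assuming interleaved PCM. */
final class PcmFrameMath {
    private PcmFrameMath() {}

    /** bytesPerSample from the bit depth, for example 16-bit PCM gives 2. */
    static int bytesPerSample(int bitsPerSample) {
        return bitsPerSample / 8;
    }

    /** numSamples: samples per channel contained in byteCount bytes of interleaved PCM. */
    static int numSamples(int byteCount, int channels, int bytesPerSample) {
        return byteCount / (channels * bytesPerSample);
    }

    /** Number of bytes in durationMs milliseconds of interleaved PCM. */
    static int bytesForDuration(int sampleRate, int channels, int bytesPerSample, int durationMs) {
        return sampleRate * durationMs / 1000 * channels * bytesPerSample;
    }
}
```

For a short read, pass the actual byte count: 1000 bytes of 16-bit stereo PCM gives numSamples(1000, 2, 2) = 250, not 480.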
Start pushing data only after the audio stream has been published (after the onAudioPublishStateChanged callback reports a status of AliRtcStatsPublished).
Set the numSamples property of the AliRtcAudioFrame according to the actual length of the captured data. For example, on some devices the amount of audio data returned by AudioRecord.read may be smaller than the buffer size. Use the return value to determine the actual data length.
When you call pushExternalAudioStreamRawData, the call may fail if the internal buffer is full. Handle this error and retry sending the data.
We recommend sending data every 10 ms.
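When no physical device drives the timing (for example, when reading from a file), pushing after a plain 10 ms sleep tends to drift, because each sleep can overshoot slightly. A common alternative, sketched below, computes each deadline from the start time so errors do not accumulate. PushPacer is hypothetical and not part of the SDK; the push callback stands in for your call to pushExternalAudioStreamRawData, including any buffer-full retry handling.

```java
/**
 * Hypothetical pacing helper (not an SDK class): schedules one push every intervalMs,
 * measuring each deadline from the start time so that sleep jitter does not accumulate.
 */
final class PushPacer {
    private final long startNanos;
    private final long intervalNanos;
    private long sent;

    PushPacer(long startNanos, long intervalMs) {
        this.startNanos = startNanos;
        this.intervalNanos = intervalMs * 1_000_000L;
    }

    /** Nanoseconds to wait before the next push; 0 if the next push is already due or late. */
    long delayBeforeNext(long nowNanos) {
        long deadline = startNanos + sent * intervalNanos;
        return Math.max(0L, deadline - nowNanos);
    }

    /** Call once after each chunk is pushed. */
    void markSent() {
        sent++;
    }

    /** Pushes the given number of chunks on the real clock; push is supplied by the caller. */
    static void run(int chunks, long intervalMs, Runnable push) throws InterruptedException {
        PushPacer pacer = new PushPacer(System.nanoTime(), intervalMs);
        for (int i = 0; i < chunks; i++) {
            long delay = pacer.delayBeforeNext(System.nanoTime());
            if (delay > 0) {
                Thread.sleep(delay / 1_000_000L, (int) (delay % 1_000_000L));
            }
            push.run();
            pacer.markSent();
        }
    }
}
```

If one push runs late (for example, after a buffer-full retry), the next delay shrinks so the stream catches up instead of falling further behind.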
Android
// Assume that the captured audio data is in audioData, the data size is bytesRead bytes, and it is 10 ms of data.
if (mAliRtcEngine != null && bytesRead > 0) {
// Construct an AliRtcAudioFrame object. bitsPerSample is the bit depth, which is usually 16.
AliRtcEngine.AliRtcAudioFrame sample = new AliRtcEngine.AliRtcAudioFrame();
sample.data = audioData;
sample.numSamples = bytesRead / (channels * (bitsPerSample / 8)); // Calculate the number of samples based on the actual number of bytes read.
sample.numChannels = channels;
sample.samplesPerSec = sampleRate;
sample.bytesPerSample = bitsPerSample / 8;
int ret = 0;
// Retry when the push fails because the buffer is full.
int retryCount = 0;
final int MAX_RETRY_COUNT = 20;
final int BUFFER_WAIT_MS = 10;
do {
// Push the obtained data to the SDK.
ret = mAliRtcEngine.pushExternalAudioStreamRawData(mExternalAudioStreamId, sample);
if(ret == ErrorCodeEnum.ERR_SDK_AUDIO_INPUT_BUFFER_FULL) {
// Handle the buffer full case. Wait for a period and retry. The maximum retry duration is several hundred ms.
retryCount++;
if(mExternalAudioStreamId <= 0 || retryCount >= MAX_RETRY_COUNT) {
// The stream has stopped or the maximum number of retries has been reached. Exit the loop.
break;
}
try {
// Pause for a period.
Thread.sleep(BUFFER_WAIT_MS);
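} catch (InterruptedException e) {
// Restore the interrupt flag so that the capture thread can detect it, then stop retrying.
Thread.currentThread().interrupt();
break;
}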
} else {
// The push is successful or another error occurred. Exit the loop directly.
break;
}
} while (retryCount < MAX_RETRY_COUNT);
}
iOS
// Construct an AliRtcAudioFrame object from the captured audio data.
let sample = AliRtcAudioFrame()
sample.dataPtr = UnsafeMutableRawPointer(mutating: pcmData)
sample.samplesPerSec = pcmSampleRate
sample.bytesPerSample = Int32(MemoryLayout<Int16>.size)
sample.numOfChannels = pcmChannels
sample.numOfSamples = numOfSamples
var retryCount = 0
while retryCount < 20 {
if !(pcmInputThread?.isExecuting ?? false) {
break
}
// Push the audio data to the SDK.
let rc = rtcEngine?.pushExternalAudioStream(externalPublishStreamId, rawData: sample) ?? 0
// Handle buffer full.
// 0x01070101 SDK_AUDIO_INPUT_BUFFER_FULL: The buffer is full. Retransmission is required.
if rc == 0x01070101 && !(pcmInputThread?.isCancelled ?? true) {
Thread.sleep(forTimeInterval: 0.03) // 30ms
retryCount += 1
} else {
if rc < 0 {
"pushExternalAudioStream error, ret: \(rc)".printLog()
}
break
}
}
Windows
Before you implement custom capture on Windows, you must call the QueryInterface method to get the media engine object.
/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);
// Construct an audio frame from the data.
AliEngineAudioRawData rawData;
rawData.dataPtr = frameInfo.audio_data[0];
rawData.numOfSamples = (int) (frameInfo.audio_data[0].length / (2 * frameInfo.audio_channels));
rawData.bytesPerSample = 2;
rawData.numOfChannels = frameInfo.audio_channels;
rawData.samplesPerSec = frameInfo.audio_sample_rate;
// Push the data to the SDK.
int ret = mAliRtcMediaEngine->PushExternalAudioStreamRawData(audioStreamID, rawData);
// Handle buffer full and other errors.
// Release the media engine.
mAliRtcMediaEngine->Release();
5. Remove the external audio stream
To stop publishing audio from a custom audio source, call the removeExternalAudioStream method to remove the external audio stream.
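It is generally safer to stop the thread that pushes data before you remove the stream, so that no push or retry is still in flight for a stream ID that has been removed. The sketch below is hypothetical (ExternalStreamSession is not an SDK class). Its removeStream callback stands in for a call such as mAliRtcEngine.removeExternalAudioStream(id), and resetting the ID to 0 mirrors the mExternalAudioStreamId <= 0 check in the Android sample in Step 4.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

/**
 * Hypothetical lifecycle wrapper (not an SDK class): stops the push loop first,
 * then removes the external stream exactly once and clears the stored ID.
 */
final class ExternalStreamSession {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final IntConsumer removeStream; // for example: id -> engine.removeExternalAudioStream(id)
    private volatile int streamId;
    private Thread pushThread;

    ExternalStreamSession(int streamId, IntConsumer removeStream) {
        this.streamId = streamId;
        this.removeStream = removeStream;
    }

    boolean isRunning() {
        return running.get();
    }

    int streamId() {
        return streamId;
    }

    /** Starts a background loop that calls pushOnce until stop() is called. */
    synchronized void start(Runnable pushOnce) {
        running.set(true);
        pushThread = new Thread(() -> {
            while (running.get()) {
                pushOnce.run();
            }
        }, "external-audio-push");
        pushThread.start();
    }

    /** Stops pushing, waits for the loop to exit, then removes the stream if it is still registered. */
    synchronized void stop() throws InterruptedException {
        running.set(false);
        if (pushThread != null) {
            pushThread.join();
            pushThread = null;
        }
        int id = streamId;
        streamId = 0;
        if (id > 0) {
            removeStream.accept(id);
        }
    }
}
```

Calling stop() a second time is harmless: the stored ID is already 0, so the stream is not removed twice.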
Android
mAliRtcEngine.removeExternalAudioStream(mExternalAudioStreamId);
iOS
[self.engine removeExternalAudioStream:_externalPublishStreamId];
Windows
/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);
mAliRtcMediaEngine->RemoveExternalAudioStream(audioStreamID);
mAliRtcMediaEngine->Release();
6. (Optional) Dynamically enable or disable internal capture
If your business scenario requires dynamically enabling or disabling the SDK's internal capture during a call, call the setParameter method.
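The parameter string nests the setting under an "audio" object, and the value is the string "TRUE" or "FALSE". If you toggle capture from several places in your app, a hypothetical helper such as the one below (CaptureParameters is not part of the SDK) keeps the JSON in one place; pass its result to setParameter.

```java
/** Hypothetical helper (not an SDK class) that builds the setParameter JSON for toggling internal capture. */
final class CaptureParameters {
    private CaptureParameters() {}

    /** Returns {"audio":{"enable_system_audio_device_record":"TRUE"}} or the "FALSE" variant. */
    static String internalCapture(boolean enabled) {
        return "{\"audio\":{\"enable_system_audio_device_record\":\""
                + (enabled ? "TRUE" : "FALSE") + "\"}}";
    }
}
```

For example, on Android, mAliRtcEngine.setParameter(CaptureParameters.internalCapture(false)) sends the same string as the "disable" sample that follows.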
Android
/* Dynamically disable the internal capture module. */
String disableParameter = "{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}";
mAliRtcEngine.setParameter(disableParameter);
/* Dynamically enable the internal capture module. */
String enableParameter = "{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}";
mAliRtcEngine.setParameter(enableParameter);
iOS
// Dynamically disable internal capture module.
engine.setParameter("{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}")
// Dynamically enable internal capture module.
engine.setParameter("{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}")
Windows
/* Windows supports enabling or disabling internal audio capture only when the engine is created. Use one of the following options. */
/* Option 1: Disable the internal capture module. */
const char* extraDisableInternal = "{\"user_specified_enable_use_virtual_audio_device\":\"TRUE\", \"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = AliRtcEngine::Create(extraDisableInternal);
/* Option 2: Enable the internal capture module. */
const char* extraEnableInternal = "{\"user_specified_enable_use_virtual_audio_device\":\"FALSE\", \"user_specified_use_external_audio_record\":\"FALSE\"}";
mAliRtcEngine = AliRtcEngine::Create(extraEnableInternal);
FAQ
What is the recommended frequency for calling pushExternalAudioStreamRawData (pushExternalAudioStream on iOS)?
Call it on the clock of the physical audio device, that is, each time the device captures data.
If there is no physical device to drive the timing, we recommend sending data every 10 to 50 ms.
Can I use the SDK's internal 3A audio processing (AEC, AGC, ANS) with custom audio capture?
Yes. As described in Step 2, set the enable3A parameter when you add the external audio stream to enable or disable the SDK's internal 3A processing.