This topic describes how to use AICallKit SDK to perform frame capture for inspection.
Before you begin
You must integrate AICallKit SDK in advance. For more information, see Integrate AICallKit SDK for Android, Integrate AICallKit SDK for iOS, and Integrate AICallKit SDK for Web.
Frame capture requires AICallKit SDK V2.1.0 or later.
Feature description
During a call with a visual understanding agent, you can call the frame capture API to capture images for inspection. AICallKit SDK provides two modes that automatically capture images from the user's camera and push them to the large language model (LLM) for analysis. This feature suits scenarios such as industrial inspection and AI glasses.
How it works
To implement frame capture, call startVisionCustomCapture. AICallKit SDK provides two modes to meet different inspection requirements:
One-time frame capture: triggered by an event. For example, when a user clicks a button, images from the camera are captured once and pushed to the LLM for processing.
Parameter | Type | Description
isSingle | Boolean | The frame capture mode. Set this parameter to true for one-time frame capture. Valid values: true (one-time frame capture), false (default, regular frame capture).
text | String | The text prompt sent with the captured frames in the request to the multimodal LLM.
eachDuration | Int | The frame capture window. Unit: seconds.
num | Int | The number of images to capture within the window.
userData | String | Custom business information that is passed to the LLM along with the text and frames.
Example: If the frame capture window (eachDuration) is set to 1 second and the number of images to capture (num) is set to 2, the system starts timing when the capture is triggered and processes the video data within that 1 second. During this window, it captures 2 evenly spaced frames and sends them to the LLM for detection and analysis.
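The timing rule in the one-time example can be sketched in TypeScript as follows. The request shape mirrors the Web parameters in the table above, but the interface name SingleCaptureRequest and the helpers buildSingleCaptureRequest and evenFrameOffsets are hypothetical. Placing each frame at the midpoint of an equal slot is an assumption about what "evenly spaced" means; the SDK may place frames differently.

```typescript
// Hypothetical request shape for one-time capture, mirroring the documented
// Web parameters (isSingle, text, eachDuration, num, userData).
interface SingleCaptureRequest {
  isSingle: true;
  text: string;
  eachDuration: number; // capture window, in seconds
  num: number; // number of frames to capture within the window
  userData: string;
}

// Hypothetical helper: builds a one-time request and rejects invalid values.
function buildSingleCaptureRequest(
  text: string,
  eachDuration: number,
  num: number,
  userData: string = "{}",
): SingleCaptureRequest {
  if (!Number.isInteger(eachDuration) || eachDuration <= 0) {
    throw new RangeError("eachDuration must be a positive integer (seconds)");
  }
  if (!Number.isInteger(num) || num <= 0) {
    throw new RangeError("num must be a positive integer");
  }
  return { isSingle: true, text, eachDuration, num, userData };
}

// Assumption: the window is split into `num` equal slots and one frame is
// taken at the midpoint of each slot. Offsets are seconds from capture start.
function evenFrameOffsets(windowSeconds: number, num: number): number[] {
  const slot = windowSeconds / num;
  return Array.from({ length: num }, (_, i) => (i + 0.5) * slot);
}

const req = buildSingleCaptureRequest("Check the gauge reading", 1, 2);
console.log(req.isSingle, evenFrameOffsets(req.eachDuration, req.num));
// prints: true [ 0.25, 0.75 ]
```

With the example values (a 1-second window and 2 frames), the sketch places frames at 0.25 s and 0.75 s.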
Regular frame capture: during a specified time range, the system periodically captures images from the user's camera and feeds them to the LLM for processing.
Parameter | Type | Description
isSingle | Boolean | The frame capture mode. Set this parameter to false (or omit it) for regular frame capture. Valid values: true (one-time frame capture), false (default, regular frame capture).
text | String | The text prompt sent with the captured frames in the request to the multimodal LLM.
duration | Int | The total frame capture duration. Unit: seconds.
eachDuration | Int | The frame capture interval. Unit: seconds.
num | Int | The number of images to capture in each interval.
userData | String | Custom business information that is passed to the LLM along with the text and frames.
enableASR | Boolean | Specifies whether to send the user speech recognized by automatic speech recognition (ASR) to the LLM as input. Valid values: true, false (default). Note: This parameter is available only in AICallKit SDK V2.2.0 or later.
Example: If the frame capture duration is set to 100 seconds, the frame capture interval to 2 seconds, and the number of images per interval to 2, the system captures images every 2 seconds for 100 seconds, starting from when the API is called. In each cycle, it captures 2 evenly spaced frames and sends them to the LLM for detection and analysis. The frame capture mode and userData stay the same throughout, which keeps the data consistent.
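Note: In regular frame capture mode, user speech is not processed by default. To send ASR-recognized speech to the LLM, set enableASR to true (AICallKit SDK V2.2.0 or later).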
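A similar sketch for regular mode counts the cycles in the example above and attaches enableASR only when the SDK version supports it. RegularCaptureRequest, versionAtLeast, buildRegularCaptureRequest, and cycleStarts are hypothetical names, and sdkVersion is a value the caller is assumed to supply. The schedule (one cycle every eachDuration seconds, from 0 up to but not including duration) is an assumption, not documented SDK behavior.

```typescript
// Hypothetical request shape for regular capture, mirroring the documented
// Web parameters.
interface RegularCaptureRequest {
  isSingle: false;
  text: string;
  duration: number; // total capture period, in seconds
  eachDuration: number; // interval between cycles, in seconds
  num: number; // frames per cycle
  userData: string;
  enableASR?: boolean; // documented as available in SDK V2.2.0 or later
}

// Compares dotted version strings numerically, so "2.10.0" >= "2.2.0".
function versionAtLeast(version: string, min: string): boolean {
  const a = version.split(".").map(Number);
  const b = min.split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true; // versions are equal
}

// Hypothetical builder: enforces the V2.1.0 minimum for frame capture and
// only attaches enableASR when the version is V2.2.0 or later.
function buildRegularCaptureRequest(
  sdkVersion: string,
  text: string,
  duration: number,
  eachDuration: number,
  num: number,
  enableASR: boolean,
  userData: string = "{}",
): RegularCaptureRequest {
  if (!versionAtLeast(sdkVersion, "2.1.0")) {
    throw new Error("Frame capture requires AICallKit SDK V2.1.0 or later");
  }
  const req: RegularCaptureRequest = { isSingle: false, text, duration, eachDuration, num, userData };
  if (versionAtLeast(sdkVersion, "2.2.0")) {
    req.enableASR = enableASR;
  }
  return req;
}

// Assumption: a cycle starts at 0, eachDuration, 2 * eachDuration, ...
// for every start time strictly before `duration`.
function cycleStarts(duration: number, eachDuration: number): number[] {
  const starts: number[] = [];
  for (let t = 0; t < duration; t += eachDuration) starts.push(t);
  return starts;
}

const regular = buildRegularCaptureRequest("2.2.0", "Inspect the weld seam", 100, 2, 2, false);
const starts = cycleStarts(regular.duration, regular.eachDuration);
console.log(starts.length, starts.length * regular.num); // prints: 50 100
```

Under these assumptions, the example settings (100 seconds, a 2-second interval, 2 images per cycle) produce 50 cycles and 100 frames in total.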
Sample code
Android
// Call startVisionCustomCapture after the call starts.
// Start custom frame capture in the onCallBegin callback.
// For the frame capture parameters, see the description of ARTCAICallVisionCustomCaptureRequest.
public void onCallBegin() {
    // Request parameters, in order: text, frame capture mode (isSingle), frame capture interval (eachDuration),
    // number of images to capture each time (num), frame capture duration (duration), and custom business information (userData).
    mARTCAICallEngine.startVisionCustomCapture(
            new ARTCAICallEngine.ARTCAICallVisionCustomCaptureRequest("XXX", false, 5, 2, 100, ""));
}

// End frame capture if needed.
mARTCAICallEngine.stopVisionCustomCapture();
iOS
// Call startVisionCustomCapture after the call starts.
// Start custom frame capture in the onCallBegin callback.
// For the frame capture parameters, see the description of ARTCAICallVisionCustomCaptureRequest.
public func onCallBegin() {
    // The call has started.
    let req = ARTCAICallVisionCustomCaptureRequest()
    req.isSingle = false      // regular frame capture
    req.text = "xxx"          // text prompt for the multimodal LLM
    req.userData = "{}"       // custom business information
    req.duration = 100        // total capture duration, in seconds
    req.eachDuration = 5      // capture interval, in seconds
    req.num = 2               // images per interval
    _ = self.engine.startVisionCustomCapture(req: req)
}

// End frame capture if needed.
_ = self.engine.stopVisionCustomCapture()
Web
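// Call startVisionCustomCapture after the call starts.
// For the frame capture parameters, see the description of AICallVisionCustomCaptureRequest.
engine.startVisionCustomCapture({
  isSingle: false,   // regular frame capture
  text: 'xxx',       // text prompt for the multimodal LLM
  userData: '{}',    // custom business information
  duration: 100,     // total capture duration, in seconds
  eachDuration: 5,   // capture interval, in seconds
  num: 2,            // images per interval
});

// End frame capture if needed.
engine.stopVisionCustomCapture();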
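One-time capture is usually triggered from the UI, for example by a button click. The following TypeScript sketch wires such a trigger to the two engine methods that appear in the Web sample. VisionCaptureEngine, CaptureController, and RecordingEngine are hypothetical; RecordingEngine only records calls so the sketch runs without the SDK. Stopping a running regular capture before starting a one-time capture is a design choice of this sketch, not a documented SDK requirement.

```typescript
// Minimal structural type covering only the two engine methods used in the
// samples above; the real Web engine exposes more than this.
interface VisionCaptureEngine {
  startVisionCustomCapture(req: Record<string, unknown>): unknown;
  stopVisionCustomCapture(): unknown;
}

// Hypothetical controller that keeps at most one capture mode active.
class CaptureController {
  private readonly engine: VisionCaptureEngine;
  private regularRunning = false;

  constructor(engine: VisionCaptureEngine) {
    this.engine = engine;
  }

  // Regular capture with the same values as the Web sample.
  startRegular(text: string): void {
    this.engine.startVisionCustomCapture({
      isSingle: false, text, duration: 100, eachDuration: 5, num: 2, userData: "{}",
    });
    this.regularRunning = true;
  }

  // Intended to be called from a button click handler.
  inspectNow(text: string): void {
    if (this.regularRunning) {
      this.engine.stopVisionCustomCapture();
      this.regularRunning = false;
    }
    this.engine.startVisionCustomCapture({
      isSingle: true, text, eachDuration: 1, num: 2,
      userData: JSON.stringify({ trigger: "button" }),
    });
  }
}

// Stand-in engine that records calls instead of talking to the SDK.
class RecordingEngine implements VisionCaptureEngine {
  calls: string[] = [];
  lastRequest: Record<string, unknown> | null = null;

  startVisionCustomCapture(req: Record<string, unknown>): void {
    this.calls.push("start");
    this.lastRequest = req;
  }

  stopVisionCustomCapture(): void {
    this.calls.push("stop");
  }
}

const mock = new RecordingEngine();
const controller = new CaptureController(mock);
controller.startRegular("Monitor the production line");
controller.inspectNow("Is the valve closed?");
console.log(mock.calls); // prints: [ 'start', 'stop', 'start' ]
```

In an application, the real engine instance from the Web sample would take the place of RecordingEngine.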