Intelligent Media Services: Frame capture

Last Updated: May 15, 2025

This topic describes how to use AICallKit SDK to perform frame capture for inspection.

Feature description

During a call with a visual understanding agent, you can call the frame capture API to capture images for inspection. AICallKit SDK provides two modes that automatically capture images from the user's camera and push them to the large language model (LLM) for analysis and processing. This feature suits scenarios such as industrial inspection and AI glasses applications.

How it works

To implement frame capture with AICallKit SDK, call startVisionCustomCapture. The isSingle parameter selects one of two modes to meet your inspection requirements:

  • One-time frame capture: triggered when an event occurs. For example, when a user clicks a button, images from the camera are pushed to the LLM for processing.

    | Parameter    | Type    | Description |
    |--------------|---------|-------------|
    | isSingle     | Boolean | The frame capture mode. true: one-time frame capture. false (default): regular frame capture. |
    | text         | String  | The text sent to the multimodal LLM with the request. |
    | eachDuration | Int     | The frame capture duration. Unit: seconds. |
    | num          | Int     | The number of images to capture. |
    | userData     | String  | The custom business information, which is passed along with the text and frames to the LLM for processing. |

    Example: If the frame capture duration is set to 1 second and the number of images to capture is set to 2, the system begins timing at the start of the call and processes the video data within that 1-second window. It captures 2 frames evenly spaced across the window and sends them to the LLM for detection and analysis.

  • Regular frame capture: During a specified time range, the system feeds images from the user's camera to the LLM for processing at regular intervals.

    | Parameter    | Type    | Description |
    |--------------|---------|-------------|
    | isSingle     | Boolean | The frame capture mode. true: one-time frame capture. false (default): regular frame capture. |
    | text         | String  | The text sent to the multimodal LLM with the request. |
    | duration     | Int     | The frame capture duration. Unit: seconds. |
    | eachDuration | Int     | The frame capture interval. Unit: seconds. |
    | num          | Int     | The number of images to capture each time. |
    | userData     | String  | The custom business information, which is passed along with the text and frames to the LLM for processing. |
    | enableASR    | Boolean | Specifies whether to send the ASR-recognized human speech to the LLM as input. true: send. false (default): do not send. Note: This parameter is available only in AICallKit SDK V2.2.0 or later. |

    Example: If the frame capture duration is set to 100 seconds, the frame capture interval to 2 seconds, and the number of images to capture each time to 2, the system captures images every 2 seconds for 100 seconds, starting when the API is called. In each cycle, the system captures 2 evenly spaced frames and sends them to the LLM for detection and analysis. The frame capture mode and userData remain unchanged throughout this period to ensure data consistency.

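    Note

    In regular frame capture mode, the user's voice is not processed by default. To send ASR-recognized speech to the LLM, set enableASR to true (AICallKit SDK V2.2.0 or later).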
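The arithmetic in the two examples above can be sketched as a small helper. This is not part of AICallKit SDK: estimateCaptureSchedule is a hypothetical function, and the exact sampling offsets are an assumption, because the text only says that frames are captured evenly. The sketch assumes cycles start at 0, eachDuration, 2 × eachDuration, and so on, stop before duration is reached, and that the num frames in a cycle are spaced eachDuration / num seconds apart.

```javascript
// Illustrative only: estimates when frames would be captured.
// Assumption (not stated by the SDK docs): a cycle starts every
// `eachDuration` seconds until `duration` is reached, and the `num`
// frames inside a cycle are spaced evenly from the cycle start.
function estimateCaptureSchedule({ duration, eachDuration, num }) {
  const values = [duration, eachDuration, num];
  if (!values.every((v) => Number.isInteger(v) && v > 0)) {
    throw new RangeError("duration, eachDuration, and num must be positive integers");
  }
  const schedule = [];
  for (let cycleStart = 0; cycleStart < duration; cycleStart += eachDuration) {
    const frames = [];
    for (let i = 0; i < num; i++) {
      // Time of the i-th frame in this cycle, in seconds from the start.
      frames.push(cycleStart + (i * eachDuration) / num);
    }
    schedule.push({ cycleStart, frames });
  }
  return schedule;
}

// Regular mode example: 100-second duration, 2-second interval, 2 frames per cycle.
const schedule = estimateCaptureSchedule({ duration: 100, eachDuration: 2, num: 2 });
const totalFrames = schedule.reduce((sum, cycle) => sum + cycle.frames.length, 0);
console.log(`cycles: ${schedule.length}, frames sent to the LLM: ${totalFrames}`);
```

For one-time capture, the same helper can be called with duration equal to eachDuration (for example, 1 and 1 with num set to 2), which yields a single cycle. An estimate like this can help you judge how many images a given configuration pushes to the LLM before you start a call.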

Sample code

Android

// Call startVisionCustomCapture after the call is initiated.
// Start custom frame capture in the onCallBegin callback.
// For frame capture parameters, see the description about ARTCAICallVisionCustomCaptureRequest.
public void onCallBegin() {
    // Arguments, in order: text, frame capture mode (false = regular), frame capture interval, number of images to capture each time, frame capture duration, and custom business information.
    mARTCAICallEngine.startVisionCustomCapture(new ARTCAICallEngine.ARTCAICallVisionCustomCaptureRequest("XXX", false, 5, 2, 100, ""));
}

// End frame capture if needed.
mARTCAICallEngine.stopVisionCustomCapture();

iOS

// Call startVisionCustomCapture after the call is initiated.
// Start custom frame capture in the onCallBegin callback.
// For frame capture parameters, see the description about ARTCAICallVisionCustomCaptureRequest.
public func onCallBegin() {
    // The call starts.
    let req = ARTCAICallVisionCustomCaptureRequest()
    req.isSingle = false
    req.text = "xxx"
    req.userData = "{}"
    req.duration = 100
    req.eachDuration = 5
    req.num = 2
    _ = self.engine.startVisionCustomCapture(req: req)
}

// End frame capture if needed.
_ = self.engine.stopVisionCustomCapture()

Web

// Call startVisionCustomCapture after the call is initiated.
// For frame capture parameters, see the description about AICallVisionCustomCaptureRequest.
engine.startVisionCustomCapture({
  isSingle: false,
  text: 'xxx',
  userData: '{}',
  duration: 100,
  eachDuration: 5,
  num: 2,
});

// End frame capture if needed.
engine.stopVisionCustomCapture();
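
The Web sample above uses regular frame capture. For one-time frame capture, a request could be assembled as in the following sketch. The field names come from the parameter table for one-time frame capture. buildSingleCaptureRequest, its validation rules, the "{}" default for userData, and the captureButton element are hypothetical and are not part of AICallKit SDK.

```javascript
// Builds a request object for one-time frame capture (isSingle: true).
// Field names follow the parameter table; the checks below are assumptions
// about sensible input, not rules enforced by the SDK.
function buildSingleCaptureRequest(text, eachDuration, num, userData = "{}") {
  if (typeof text !== "string") {
    throw new TypeError("text must be a string");
  }
  if (!Number.isInteger(eachDuration) || eachDuration <= 0) {
    throw new RangeError("eachDuration must be a positive integer (seconds)");
  }
  if (!Number.isInteger(num) || num <= 0) {
    throw new RangeError("num must be a positive integer");
  }
  return { isSingle: true, text, eachDuration, num, userData };
}

// Capture 2 frames over 1 second, as in the one-time example.
const request = buildSingleCaptureRequest("xxx", 1, 2);
console.log(request);

// Hypothetical wiring: `engine` is the call engine used in the sample above,
// and `captureButton` is a button element on your page.
// captureButton.addEventListener("click", () => {
//   engine.startVisionCustomCapture(request);
// });
```

Building the request in one place keeps the event handler short and makes it easier to reuse the same settings for each button click.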