All Products
Search
Document Center

AgentBay:UI

Last Updated:Feb 03, 2026

Description

The User Interface (UI) class provides methods for interacting with UI elements in the AgentBay cloud environment. You can use these methods to retrieve UI elements, send key events, enter text, perform gestures, and take screenshots.

Method

Description

Environment

Linux Computer Use

Windows Computer Use

Browser Use

Mobile Use

Code Space

click

Clicks at specified coordinates on the screen. Supports left, middle, and right clicks.

Not supported

Not supported

Not supported

Supported

Not supported

input_text

Enters text.

Not supported

Not supported

Not supported

Supported

Not supported

send_key

Sends a key press. Supports Android-specific keys such as HOME, BACK, and volume keys.

Not supported

Not supported

Not supported

Supported

Not supported

swipe

Performs a swipe gesture. Set the start coordinates, end coordinates, and duration.

Not supported

Not supported

Not supported

Supported

Not supported

screenshot

Takes a screenshot of the session and saves it to a specified OSS URL.

Supported

Supported

Supported

Supported

Not supported

list_root_windows

Lists all root windows and their information, including window ID, title, process ID, and process name.

Not supported

Supported

Not supported

Not supported

Not supported

get_active_window

Gets information about the active window.

Not supported

Supported

Not supported

Not supported

Not supported

activate_window

Activates a specified window.

Not supported

Supported

Not supported

Not supported

Not supported

maximize_window

Maximizes a window.

Not supported

Supported

Not supported

Not supported

Not supported

minimize_window

Minimizes a window.

Not supported

Supported

Not supported

Not supported

Not supported

restore_window

Restores a window to its normal state.

Not supported

Supported

Not supported

Not supported

Not supported

close_window

Closes a window.

Not supported

Supported

Not supported

Not supported

Not supported

resize_window

Resizes a window. Specify the width and height.

Not supported

Supported

Not supported

Not supported

Not supported

fullscreen_window

Sets a window to full screen mode.

Not supported

Supported

Not supported

Not supported

Not supported

focus_mode

Enables or disables focus mode.

Not supported

Supported

Not supported

Not supported

Not supported

get_all_ui_elements

Gets all UI elements on the device, including non-interactive elements.

Not supported

Not supported

Not supported

Supported

Not supported

get_clickable_ui_elements

Gets all clickable UI elements.

Not supported

Not supported

Not supported

Supported

Not supported

Properties

Property name

Description

HOME

Home key (3)

BACK

Back key (4)

VOLUME_UP

Volume up key (24)

VOLUME_DOWN

Volume down key (25)

POWER

Power key (26)

MENU

Menu key (82)

Methods

getClickableUIElements - Gets clickable UI elements

Golang

func (ui *UI) GetClickableUIElements(timeoutMs int) (*UIElementsResult, error)

Parameter:

  • timeoutMs (int): The timeout period in milliseconds. If the value is <= 0, the default is 2000 ms.

Return values:

  • *UIElementsResult: A result object that contains the clickable UI elements and the request ID.

  • error: An error message if the operation fails.

UIElementsResult struct:

type UIElementsResult struct {
    RequestID string      // A unique request identifier for debugging.
    Elements  []*UIElement // An array of UI elements.
}

type UIElement struct {
    Bounds     string       // The bounds of the element.
    ClassName  string       // The CSS class name.
    Text       string       // The text content.
    Type       string       // The element type.
    ResourceId string       // The resource ID.
    Index      int          // The element index.
    IsParent   bool         // Indicates whether the element is a parent element.
    Children   []*UIElement // The child elements.
}

Python

def get_all_ui_elements(timeout_ms: int = 2000) -> List[Dict[str, Any]]

Parameter:

  • timeout_ms (int, optional): The timeout period in milliseconds. The default is 2000 ms.

Return value:

  • List[Dict[str, Any]]: A list of all UI elements, including parsing details.

Exception:

  • AgentBayError: Raised if the operation fails.

TypeScript

getClickableUIElements(timeoutMs?: number): Promise<string>

Parameter:

  • timeoutMs (number, optional): The timeout period in milliseconds. The default is 2000 ms.

Return value:

  • Promise<string>: A string representation of the clickable UI elements if the operation is successful.

Exception:

  • APIError: Thrown if the operation fails.

getAllUIElements - Gets all UI elements

Golang

func (ui *UI) GetAllUIElements(timeoutMs int) (*UIElementsResult, error)

Parameter:

  • timeoutMs (int): The timeout period in milliseconds. If the value is <= 0, the default value is 2000 ms.

Return values:

  • *UIElementsResult: A result object that contains all UI elements and the request ID.

  • error: An error message if the operation fails.

Python

def get_all_ui_elements(timeout_ms: int = 2000) -> List[Dict[str, Any]]

Parameter:

  • timeout_ms (int, optional): The timeout period in milliseconds. The default is 2000 ms.

Return value:

  • List[Dict[str, Any]]: A list of all UI elements, including parsing details.

Exception:

  • AgentBayError: Raised if the operation fails.

TypeScript

getAllUIElements(timeoutMs?: number): Promise<string>

Parameter:

  • timeoutMs (number, optional): The timeout period in milliseconds. The default is 2000 ms.

Return value:

  • Promise<string>: A string representation of all UI elements if the operation is successful.

Exception:

  • APIError: Thrown if the operation fails.

sendKey - Sends a key press

Golang

func (ui *UI) SendKey(key int) (*KeyActionResult, error)

Parameter:

  • key (int): The key code to send. Use the KeyCode constant.

Return values:

  • *KeyActionResult: A result object that contains the operation status and the request ID.

  • error: An error message if the operation fails.

KeyActionResult struct:

type KeyActionResult struct {
    RequestID string // A unique request identifier for debugging.
    Success   bool   // Indicates whether the key was sent successfully.
}

Python

def send_key(key: int) -> bool

Parameter:

  • key (int): The key code to send. Use the KeyCode constant.

Return value:

  • bool: Returns True if the operation is successful.

Exception:

  • AgentBayError: Raised if the operation fails.

Example:

# Send the Home keypress
success = agent_bay.ui.send_key(KeyCode.HOME)
print("Keypress send result:", success)

TypeScript

sendKey(key: number): Promise<string>

Parameter:

  • key (number): The key code to send. Use the KeyCode constant.

Return value:

  • Promise<string>: The response text if the operation is successful.

Exception:

  • APIError: Thrown if the operation fails.

Example:

// Send the Home keypress
const result = await agentBay.ui.sendKey(KeyCode.HOME);
console.log("Keypress send result:", result);

inputText - Enters text

Golang

func (ui *UI) InputText(text string) (*TextInputResult, error)

Parameter:

  • text (string): The text to enter.

Return values:

  • *TextInputResult: A result object that contains the entered text and the request ID.

  • error: An error message if the operation fails.

TextInputResult struct:

type TextInputResult struct {
    RequestID string // A unique request identifier for debugging.
    Text      string // The entered text.
}

Python

def input_text(text: str) -> None

Parameter:

  • text (str): The text to enter.

Exception:

  • AgentBayError: Raised if the operation fails.

Example:

# Enter the text "Hello"
agent_bay.ui.input_text("Hello")

TypeScript

inputText(text: string): Promise<string>

Parameter:

  • text (string): The text to enter.

Return value:

  • Promise<string>: The response text if the operation is successful.

Exception:

  • APIError: Thrown if the operation fails.

Example:

// Enter the text "Hello"
const result = await agentBay.ui.inputText("Hello");
console.log("Text input result:", result);

swipe - Performs a swipe

Golang

func (ui *UI) Swipe(startX, startY, endX, endY, durationMs int) (*SwipeResult, error)

Parameters:

  • startX (int): The starting X coordinate.

  • startY (int): The starting Y coordinate.

  • endX (int): The ending X coordinate.

  • endY (int): The ending Y coordinate.

  • durationMs (int): The duration of the swipe in milliseconds.

Return values:

  • *SwipeResult: A result object that contains the operation status and the request ID.

  • error: An error message if the operation fails.

SwipeResult struct:

type SwipeResult struct {
    RequestID string // A unique request identifier for debugging.
    Success   bool   // Indicates whether the swipe was successful.
}

Python

def swipe(start_x: int, start_y: int, end_x: int, end_y: int, duration_ms: int = 300) -> None

Parameters:

  • start_x (int): The starting X coordinate.

  • start_y (int): The starting Y coordinate.

  • end_x (int): The ending X coordinate.

  • end_y (int): The ending Y coordinate.

  • duration_ms (int, optional): The duration of the swipe in milliseconds. The default is 300 ms.

Exception:

  • AgentBayError: Raised if the operation fails.

Example:

# Swipe from (100, 200) to (300, 400)
agent_bay.ui.swipe(100, 200, 300, 400)

TypeScript

swipe(
  startX: number,
  startY: number,
  endX: number,
  endY: number,
  durationMs?: number
): Promise<string>

Parameters:

  • startX (number): The starting X coordinate.

  • startY (number): The starting Y coordinate.

  • endX (number): The ending X coordinate.

  • endY (number): The ending Y coordinate.

  • durationMs (number, optional): The duration of the swipe in milliseconds. The default is 300 ms.

Return value:

  • Promise<string>: The response text if the operation is successful.

Exception:

  • APIError: Thrown if the operation fails.

Example:

// Swipe from (100, 200) to (300, 400)
const result = await agentBay.ui.swipe(100, 200, 300, 400);
console.log("Swipe operation result:", result);

click - Clicks a coordinate point

Golang

func (ui *UI) Click(x, y int, button string) (*UIResult, error)

Parameters:

  • x (int): The X coordinate.

  • y (int): The Y coordinate.

  • button (string): The mouse button to use. If this parameter is empty, the default is left.

Return values:

  • *UIResult: A result object that contains the operation status, component ID, and request ID.

  • error: An error message if the operation fails.

UIResult struct:

type UIResult struct {
    RequestID  string // A unique request identifier for debugging.
    ComponentID string // The component ID, if applicable.
    Success    bool   // Indicates whether the operation was successful.
}

Python

def click(x: int, y: int, button: str = "left") -> None

Parameters:

  • x (int): The X coordinate.

  • y (int): The Y coordinate.

  • button (str, optional): The mouse button to use. The default is left.

Exception:

  • AgentBayError: Raised if the operation fails.

Example:

# Left-click the coordinates (200, 300)
agent_bay.ui.click(200, 300)

TypeScript

click(x: number, y: number, button?: string): Promise<string>

Parameters:

  • x (number): The X coordinate.

  • y (number): The Y coordinate.

  • button (string, optional): The mouse button to use. The default is left.

Return value:

  • Promise<string>: The response text if the operation is successful.

Exception:

  • APIError: Thrown if the operation fails.

Example:

// Left-click the coordinates (500, 800)
const result = await agentBay.ui.click(500, 800);
console.log("Click operation result:", result);

screenshot - Takes a screenshot

Golang

func (ui *UI) Screenshot() (*UIResult, error)

Return values:

  • *UIResult: A result object that contains the operation status and the request ID.

  • error: An error message if the operation fails.

Python

def screenshot() -> str

Return value:

  • str: The screenshot data.

Exception:

  • AgentBayError: Raised if the operation fails.

Example:

# Take a screenshot
image_data = agent_bay.ui.screenshot()
print("Screenshot data:", image_data[:100])  # Print the first 100 characters

TypeScript

screenshot(): Promise<string>

Return value:

  • Promise<string>: The screenshot data if the operation is successful.

Exception:

  • APIError: Thrown if the operation fails.

Example:

// Take a screenshot
const imageData = await agentBay.ui.screenshot();
console.log("Screenshot data:", imageData);