This topic describes how to use the AgentBay SDK to automate mouse, keyboard, and screen operations in cloud computer environments.
Overview
The Computer Use module in the AgentBay SDK provides UI automation for cloud computers. It supports three categories of operations:
Mouse operations - Click, move, drag, and scroll with precise coordinate control.
Keyboard operations - Type text and send key combinations (keyboard shortcuts).
Screen operations - Capture screenshots and retrieve screen dimensions.
Method summary
Category | Method | Description |
Mouse | Click at specified coordinates. | |
Mouse | Move cursor without clicking. | |
Mouse | Drag from one point to another. | |
Mouse | Scroll the mouse wheel at a location. | |
Mouse | Get current cursor coordinates. | |
Keyboard | Type a string at the cursor position. | |
Keyboard | Send key combinations or shortcuts. | |
Keyboard | Release previously held keys. | |
Screen | Capture the screen and get a download URL. | |
Screen | Get screen dimensions and DPI scaling. |
Coordinate system
All mouse and screen operations use a pixel-based coordinate system. The origin (0, 0) is at the top-left corner of the screen. The X axis increases to the right, and the Y axis increases downward. Use get_screen_size() to determine the available screen dimensions, and get_cursor_position() to find the current cursor location.
Result objects
Operations in the Computer Use module return one of two result types:
BoolResult (returned by mouse clicks, movement, scrolling, text input, and key operations):
Property | Type | Description |
|
| Whether the operation completed successfully. |
|
| The boolean result of the operation. |
|
| Error details if the operation failed. Empty string on success. |
OperationResult (returned by screenshot, get_cursor_position, and get_screen_size):
Property | Type | Description |
|
| Whether the operation completed successfully. |
| varies | Operation-specific data. For |
|
| Error details if the operation failed. Empty string on success. |
Prerequisites
Before you begin, make sure you have:
The AgentBay SDK installed (
pip install wuying-agentbay-sdk) and configured with valid credentials.A supported cloud computer image (
windows_latestorlinux_latest).
Create a session
Before you can perform any UI automation, you must create a session connected to a cloud computer.
from agentbay import AgentBay, CreateSessionParams
agent_bay = AgentBay()
# Use windows_latest or linux_latest
session_params = CreateSessionParams(image_id="windows_latest")
session = agent_bay.create(session_params).sessionWhen you are finished, delete the session to release resources:
agent_bay.delete(session)All code examples in the following sections assume that you have already created anagent_bayinstance and an activesessionas shown above. Session creation and deletion code is omitted for brevity.
Mouse operations
click_mouse
Performs a mouse click at the specified screen coordinates.
Signature
session.computer.click_mouse(x, y, button=MouseButton.LEFT)Parameters
Parameter | Type | Required | Default | Description |
|
| Yes | - | Horizontal position in pixels. |
|
| Yes | - | Vertical position in pixels. |
|
| No |
| The mouse button to use. |
MouseButton enum values
Import: from agentbay import MouseButton
Value | Description |
| Standard left-click. |
| Right-click (context menu). |
| Middle-click (scroll wheel button). |
| Double left-click. |
Returns: A BoolResult object. Check result.success to confirm the click was registered.
Example
from agentbay import MouseButton
# Left-click (default button)
result = session.computer.click_mouse(x=500, y=300)
if result.success:
print("Left-click successful")
# Output: Left-click successful
# Right-click
result = session.computer.click_mouse(x=500, y=300, button=MouseButton.RIGHT)
if result.success:
print("Right-click successful")
# Output: Right-click successful
# Middle-click
result = session.computer.click_mouse(x=500, y=300, button=MouseButton.MIDDLE)
if result.success:
print("Middle-click successful")
# Output: Middle-click successful
# Double left-click
result = session.computer.click_mouse(x=500, y=300, button=MouseButton.DOUBLE_LEFT)
if result.success:
print("Double left-click successful")
# Output: Double left-click successfulmove_mouse
Moves the mouse cursor to the specified coordinates without clicking.
Signature
session.computer.move_mouse(x, y)Parameters
Parameter | Type | Required | Description |
|
| Yes | Horizontal position in pixels. |
|
| Yes | Vertical position in pixels. |
Returns: A BoolResult object.
Example
result = session.computer.move_mouse(x=600, y=400)
if result.success:
print("Mouse move successful")
# Output: Mouse move successfuldrag_mouse
Drags the mouse from one point to another while holding the specified button.
Signature
session.computer.drag_mouse(from_x, from_y, to_x, to_y, button=MouseButton.LEFT)Parameters
Parameter | Type | Required | Default | Description |
|
| Yes | - | Starting horizontal position in pixels. |
|
| Yes | - | Starting vertical position in pixels. |
|
| Yes | - | Ending horizontal position in pixels. |
|
| Yes | - | Ending vertical position in pixels. |
|
| No |
| The mouse button to hold during the drag. |
Supported button values for drag: MouseButton.LEFT, MouseButton.RIGHT, MouseButton.MIDDLE.
MouseButton.DOUBLE_LEFT is not supported for drag operations.Returns: A BoolResult object.
Example
from agentbay import MouseButton
result = session.computer.drag_mouse(
from_x=100,
from_y=100,
to_x=200,
to_y=200,
button=MouseButton.LEFT
)
if result.success:
print("Drag operation successful")
# Output: Drag operation successfulscroll
Scrolls the mouse wheel at a specific location on the screen.
Signature
session.computer.scroll(x, y, direction=ScrollDirection.UP, amount=1)Parameters
Parameter | Type | Required | Default | Description |
|
| Yes | - | Horizontal position where the scroll occurs, in pixels. |
|
| Yes | - | Vertical position where the scroll occurs, in pixels. |
|
| No |
| The direction to scroll. |
|
| No |
| Number of scroll increments. |
ScrollDirection enum values
Import: from agentbay import ScrollDirection
Value | Description |
| Scroll upward. |
| Scroll downward. |
| Scroll left. |
| Scroll right. |
Returns: A BoolResult object.
Example
from agentbay import ScrollDirection
# Scroll up
result = session.computer.scroll(x=500, y=500, direction=ScrollDirection.UP, amount=3)
if result.success:
print("Scroll up successful")
# Output: Scroll up successful
# Scroll down
result = session.computer.scroll(x=500, y=500, direction=ScrollDirection.DOWN, amount=5)
if result.success:
print("Scroll down successful")
# Output: Scroll down successfulget_cursor_position
Returns the current position of the mouse cursor.
Signature
session.computer.get_cursor_position()Parameters: None.
Returns: An OperationResult object. When result.success is True, result.data contains a JSON string with x and y fields.
Example
import json
result = session.computer.get_cursor_position()
if result.success:
cursor_data = json.loads(result.data)
print(f"Cursor position: x={cursor_data['x']}, y={cursor_data['y']}")
# Output: Cursor position: x=512, y=384Keyboard operations
input_text
Types a string of text at the current cursor position.
Signature
session.computer.input_text(text)Parameters
Parameter | Type | Required | Description |
|
| Yes | The text to type. |
Returns: A BoolResult object.
Example
result = session.computer.input_text("Hello AgentBay!")
if result.success:
print("Text input successful")
# Output: Text input successfulpress_keys
Sends one or more keys simultaneously, with support for modifier keys. Use this method for keyboard shortcuts such as Ctrl+C or Alt+Tab.
Signature
session.computer.press_keys(keys, hold=False)Parameters
Parameter | Type | Required | Default | Description |
|
| Yes | - | A list of key names to press simultaneously. |
|
| No |
| When set to |
Returns: A BoolResult object.
Example
# Press Ctrl+A to select all
result = session.computer.press_keys(keys=["Ctrl", "a"])
if result.success:
print("Key press successful")
# Output: Key press successful
# Press Ctrl+C to copy
result = session.computer.press_keys(keys=["Ctrl", "c"])
if result.success:
print("Copy command sent")
# Output: Copy command sentrelease_keys
Releases keys that were previously held down with press_keys(hold=True). Always release held keys when you are done to prevent them from interfering with subsequent operations.
Signature
session.computer.release_keys(keys)Parameters
Parameter | Type | Required | Description |
|
| Yes | A list of key names to release. |
Returns: A BoolResult object.
Example
# Hold down the Ctrl key
session.computer.press_keys(keys=["Ctrl"], hold=True)
# ... Perform other operations ...
# Release the Ctrl key
result = session.computer.release_keys(keys=["Ctrl"])
if result.success:
print("Key release successful")
# Output: Key release successfulScreen operations
screenshot
Captures the current screen and returns a download URL. The screenshot is saved to Object Storage Service (OSS), and a URL is returned that you can use to download the image file.
Signature
session.computer.screenshot()Parameters: None.
Returns: An OperationResult object. When result.success is True, result.data contains the download URL for the screenshot image (not the raw image data).
Example
result = session.computer.screenshot()
if result.success:
screenshot_url = result.data
print(f"Screenshot URL: {screenshot_url}")
# Output: Screenshot URL: https://***.***.aliyuncs.com/***/screenshot_1234567890.png?***get_screen_size
Returns the screen dimensions and display scale factor of the cloud computer.
Signature
session.computer.get_screen_size()Parameters: None.
Returns: An OperationResult object. When result.success is True, result.data contains a JSON string with the following fields:
Field | Type | Description |
|
| Screen width in pixels. |
|
| Screen height in pixels. |
|
| The display scale factor (DPI scaling). A value of |
Example
import json
result = session.computer.get_screen_size()
if result.success:
screen_data = json.loads(result.data)
print(f"Screen width: {screen_data['width']}")
print(f"Screen height: {screen_data['height']}")
print(f"DPI scaling factor: {screen_data['dpiScalingFactor']}")
# Output: Screen width: 1024
# Output: Screen height: 768
# Output: DPI scaling factor: 1.0Troubleshooting
"Tool not found" error
Symptom: You receive a "Tool not found" error when calling a Computer Use method.
Cause: The session is not running on a supported cloud computer image.
Solution: When you create a session, make sure the image_id parameter is set to a valid value such as windows_latest or linux_latest.
# Correct - use a supported image ID
session_params = CreateSessionParams(image_id="windows_latest")Screenshot returns a URL, not image data
Symptom: result.data from screenshot() contains a URL string instead of raw image bytes.
Cause: This is the expected behavior. The screenshot() method stores the image in Object Storage Service (OSS) and returns a download URL.
Solution: Use the returned URL to download the image with an HTTP client or browser.
Held keys not released
Symptom: Subsequent keyboard operations behave unexpectedly after using press_keys(hold=True).
Cause: Keys held with press_keys(hold=True) remain active until explicitly released.
Solution: Always pair press_keys(hold=True) with a corresponding release_keys() call.
# Hold a key
session.computer.press_keys(keys=["Shift"], hold=True)
# Perform operations that need Shift held...
# Always release afterward
session.computer.release_keys(keys=["Shift"])Coordinates out of bounds
Symptom: A mouse operation produces unexpected results or no visible effect.
Cause: The target coordinates may be outside the screen area. The SDK does not validate coordinate ranges on the client side — coordinates are sent directly to the cloud computer.
Solution: Call get_screen_size() first to determine the valid coordinate range, and make sure your x and y values fall within (0, 0) to (width, height).