AgentBay: Window management

Last Updated: Oct 30, 2025

This topic describes how to use the Wuying AgentBay software development kit (SDK) to manage windows on a cloud computer. You can control the status, position, and focus of windows, and interact with desktop windows in the cloud environment.

Overview

The Computer Use module provides comprehensive window management features for the desktop environment, including the following:

  1. Window discovery - List and find windows in the system.

  2. Window status control - Maximize, minimize, restore, and close windows.

  3. Window positioning - Resize and reposition windows.

  4. Focus management - Control window focus and activation status.

  5. Desktop automation - Build complex desktop automation workflows.

Create a session

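The window management examples in this topic run in a session that has a desktop environment. The following code creates one from the linux_latest image and stops if session creation fails.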

import os
from agentbay import AgentBay
from agentbay.session_params import CreateSessionParams

api_key = os.getenv("AGENTBAY_API_KEY")
if not api_key:
    raise ValueError("The AGENTBAY_API_KEY environment variable is required")

agent_bay = AgentBay(api_key=api_key)

params = CreateSessionParams(image_id="linux_latest")
result = agent_bay.create(params)

if result.success:
    session = result.session
    print(f"Session created: {session.session_id}")
else:
    print(f"Failed to create session: {result.error_message}")
    raise SystemExit(1)
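
Because a session stays alive until it is deleted, it can help to tie cleanup to a with-block. The following is a minimal sketch, not part of the SDK: managed_session is a hypothetical helper name, and it assumes only the agent_bay.create(params) and agent_bay.delete(session) calls that appear in this topic.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(agent_bay, params):
    """Create a session, yield it, and always delete it afterwards.

    Hypothetical helper: it relies only on create() returning an object with
    success, session, and error_message, and on delete(session).
    """
    result = agent_bay.create(params)
    if not result.success:
        raise RuntimeError(f"Failed to create session: {result.error_message}")
    try:
        yield result.session
    finally:
        # Runs on normal exit and when the with-block raises.
        agent_bay.delete(result.session)
```

A possible usage is `with managed_session(agent_bay, params) as session:` followed by the window operations, so that the session is deleted even if one of them raises.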
    

List windows

You can retrieve information about all available windows in the desktop environment.

result = session.computer.list_root_windows(timeout_ms=5000)

if result.success:
    windows = result.windows
    print(f"Found {len(windows)} windows")
    # Execution result: Found 0 windows (when no windows are open)
    # Or: Found 5 windows (when applications are running)
    
    for window in windows:
        print(f"Title: {window.title}")
        # Execution result: Title: Google Chrome
        print(f"Window ID: {window.window_id}")
        # Execution result: Window ID: 12345678
        print(f"Process: {window.pname if window.pname else 'N/A'}")
        # Execution result: Process: chrome
        print(f"PID: {window.pid if window.pid else 'N/A'}")
        # Execution result: PID: 9876
        print(f"Position: ({window.absolute_upper_left_x}, {window.absolute_upper_left_y})")
        # Execution result: Position: (100, 50)
        print(f"Size: {window.width}x{window.height}")
        # Execution result: Size: 1280x720
        print(f"Child windows: {len(window.child_windows)}")
        # Execution result: Child windows: 0
        print("---")
else:
    print(f"Error listing windows: {result.error_message}")
    

Parameters:

  • timeout_ms (int, optional): The timeout in milliseconds. Default: 3000.

Window object attributes:

  • window_id (int): The unique identifier of the window.

  • title (str): The window title or description text.

  • absolute_upper_left_x (Optional[int]): The X-coordinate of the upper-left corner of the window.

  • absolute_upper_left_y (Optional[int]): The Y-coordinate of the upper-left corner of the window.

  • width (Optional[int]): The window width in pixels.

  • height (Optional[int]): The window height in pixels.

  • pid (Optional[int]): The ID of the process that owns the window.

  • pname (Optional[str]): The name of the process that owns the window.

  • child_windows (List[Window]): A list of child windows.
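
Because child_windows nests Window objects inside each root window, a search that only loops over the root list can miss dialogs and other child windows. The sketch below walks the tree depth-first and finds a window by title. The Window dataclass here is a stand-in that mirrors the documented attributes so the example runs on its own; iter_windows and find_by_title are hypothetical helper names, not SDK functions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Window:
    """Stand-in mirroring the documented attributes (not the SDK class)."""
    window_id: int
    title: str
    pid: Optional[int] = None
    pname: Optional[str] = None
    child_windows: List["Window"] = field(default_factory=list)


def iter_windows(windows: List[Window]) -> Iterator[Window]:
    """Yield every window depth-first, each parent before its children."""
    for window in windows:
        yield window
        yield from iter_windows(window.child_windows)


def find_by_title(windows: List[Window], text: str) -> Optional[Window]:
    """Return the first window whose title contains text, ignoring case."""
    needle = text.lower()
    for window in iter_windows(windows):
        if needle in (window.title or "").lower():
            return window
    return None


roots = [
    Window(1, "Terminal"),
    Window(2, "Google Chrome", child_windows=[Window(3, "Save As")]),
]
print([w.window_id for w in iter_windows(roots)])  # [1, 2, 3]
print(find_by_title(roots, "save as").window_id)   # 3
```

With a real session, the same helpers could be given result.windows from list_root_windows(), assuming the SDK objects expose title and child_windows as listed above.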

Window control operations

You can control the status and position of a window.

Activate a window

result = session.computer.list_root_windows()

if result.success and result.windows:
    window_id = result.windows[0].window_id
    
    activate_result = session.computer.activate_window(window_id)

    if activate_result.success:
        print("Window activated successfully")
        # Execution result: Window activated successfully
    else:
        print(f"Failed to activate window: {activate_result.error_message}")
        

Maximize a window

result = session.computer.list_root_windows()

if result.success and result.windows:
    window_id = result.windows[0].window_id
    
    maximize_result = session.computer.maximize_window(window_id)

    if maximize_result.success:
        print("Window maximized successfully")
        # Execution result: Window maximized successfully
    else:
        print(f"Failed to maximize window: {maximize_result.error_message}")
        

Minimize a window

result = session.computer.list_root_windows()

if result.success and result.windows:
    window_id = result.windows[0].window_id
    
    minimize_result = session.computer.minimize_window(window_id)

    if minimize_result.success:
        print("Window minimized successfully")
        # Execution result: Window minimized successfully
    else:
        print(f"Failed to minimize window: {minimize_result.error_message}")
        

Restore a window

result = session.computer.list_root_windows()

if result.success and result.windows:
    window_id = result.windows[0].window_id
    
    restore_result = session.computer.restore_window(window_id)

    if restore_result.success:
        print("Window restored successfully")
        # Execution result: Window restored successfully
    else:
        print(f"Failed to restore window: {restore_result.error_message}")
        

Resize a window

result = session.computer.list_root_windows()

if result.success and result.windows:
    window_id = result.windows[0].window_id
    
    resize_result = session.computer.resize_window(window_id, 800, 600)

    if resize_result.success:
        print("Window resized to 800x600")
        # Execution result: Window resized to 800x600
    else:
        print(f"Failed to resize window: {resize_result.error_message}")
        

Make a window full screen

result = session.computer.list_root_windows()

if result.success and result.windows:
    window_id = result.windows[0].window_id
    
    fullscreen_result = session.computer.fullscreen_window(window_id)

    if fullscreen_result.success:
        print("Window set to full screen")
        # Execution result: Window set to full screen
    else:
        print(f"Failed to set to full screen: {fullscreen_result.error_message}")
        

Close a window

# Note: Use with caution, as this permanently closes the window.
result = session.computer.list_root_windows()

if result.success and result.windows:
    window_id = result.windows[0].window_id
    
    close_result = session.computer.close_window(window_id)
    
    if close_result.success:
        print("Window closed successfully")
    else:
        print(f"Failed to close window: {close_result.error_message}")
        

Complete window control function

import time

def control_window(session, window_id):
    print(f"Controlling window ID: {window_id}")
    computer = session.computer

    # (success message, failure label, operation, pause in seconds afterwards)
    steps = [
        ("Window activated", "Activation", lambda: computer.activate_window(window_id), 1),
        ("Window maximized", "Maximization", lambda: computer.maximize_window(window_id), 1),
        ("Window minimized", "Minimization", lambda: computer.minimize_window(window_id), 1),
        ("Window restored", "Restore", lambda: computer.restore_window(window_id), 0),
        ("Window resized to 800x600", "Resize", lambda: computer.resize_window(window_id, 800, 600), 0),
    ]

    for ok_message, label, operation, pause in steps:
        try:
            result = operation()
            # Each method returns a BoolResult, so a failed call does not
            # necessarily raise. Check success before reporting it.
            if result.success:
                print(ok_message)
            else:
                print(f"{label} failed: {result.error_message}")
        except Exception as e:
            print(f"{label} failed: {e}")
        time.sleep(pause)

windows = session.computer.list_root_windows()
if windows.success and windows.windows:
    control_window(session, windows.windows[0].window_id)
    

Focus management

You can control the system's focus behavior to prevent windows from stealing focus.

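# focus_mode returns a BoolResult, so check success before reporting the change.
try:
    enable_result = session.computer.focus_mode(True)
    if enable_result.success:
        print("Focus mode enabled - Windows will not steal focus")
        # Execution result: Focus mode enabled - Windows will not steal focus
    else:
        print(f"Failed to enable focus mode: {enable_result.error_message}")
except Exception as e:
    print(f"Failed to enable focus mode: {e}")

try:
    disable_result = session.computer.focus_mode(False)
    if disable_result.success:
        print("Focus mode disabled")
        # Execution result: Focus mode disabled
    else:
        print(f"Failed to disable focus mode: {disable_result.error_message}")
except Exception as e:
    print(f"Failed to disable focus mode: {e}")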
    

Parameters:

  • on (bool): Set to True to enable focus mode, or False to disable it.
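
Focus mode is a switch that stays on until it is turned off, so a script that fails midway can leave it enabled. One way to guard against that is a context manager, sketched below. focus_mode_enabled is a hypothetical helper name; it assumes only that computer.focus_mode(on) returns a result with success and error_message, as described in this topic.

```python
from contextlib import contextmanager


@contextmanager
def focus_mode_enabled(computer):
    """Enable focus mode for the duration of a with-block (hypothetical helper)."""
    result = computer.focus_mode(True)
    if not result.success:
        raise RuntimeError(f"Failed to enable focus mode: {result.error_message}")
    try:
        yield
    finally:
        # Always switch focus mode back off, even if the block raised.
        computer.focus_mode(False)
```

A possible usage is `with focus_mode_enabled(session.computer):` around the steps that must not lose focus.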

Get the active window

You can retrieve information about the currently active window.

# Note: This operation may fail if there is no active window.
result = session.computer.get_active_window(timeout_ms=5000)

if result.success:
    active_window = result.window
    # Execution result when a window is active:
    # Active window:
    #   Title: Google Chrome
    #   Window ID: 87654321
    #   Process: chrome
    #   PID: 4321
    #   Position: (0, 0)
    #   Size: 1920x1080
    print("Active window:")
    print(f"  Title: {active_window.title}")
    print(f"  Window ID: {active_window.window_id}")
    print(f"  Process: {active_window.pname}")
    print(f"  PID: {active_window.pid}")
    print(f"  Position: ({active_window.absolute_upper_left_x}, {active_window.absolute_upper_left_y})")
    print(f"  Size: {active_window.width}x{active_window.height}")
else:
    # Execution result when no window is active:
    # Failed to get active window: Response error (expected result when no window is active)
    print(f"Failed to get active window: {result.error_message}")
    

Parameters:

  • timeout_ms (int, optional): The timeout in milliseconds. Default: 3000.
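
get_active_window can also be used to confirm that an earlier activate_window call took effect, by comparing window IDs. The following is a small sketch under that assumption. is_window_active is a hypothetical helper name, and it treats a failed lookup (for example, when no window is active) as "not active" rather than as an error.

```python
def is_window_active(computer, window_id, timeout_ms=3000):
    """Return True only if the active window could be read and its ID matches."""
    result = computer.get_active_window(timeout_ms=timeout_ms)
    # A failed call is expected when no window is active, so report False.
    if not result.success or result.window is None:
        return False
    return result.window.window_id == window_id
```

For example, `is_window_active(session.computer, window_id)` could be called shortly after `activate_window(window_id)` to decide whether to retry.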

Complete workflow example

The following example shows how to start an application and control its window.

import os
import time
from agentbay import AgentBay
from agentbay.session_params import CreateSessionParams

api_key = os.getenv("AGENTBAY_API_KEY")
if not api_key:
    raise ValueError("The AGENTBAY_API_KEY environment variable is required")

agent_bay = AgentBay(api_key=api_key)

params = CreateSessionParams(image_id="linux_latest")
result = agent_bay.create(params)

if not result.success:
    print(f"Failed to create session: {result.error_message}")
    raise SystemExit(1)

session = result.session
print(f"Session created: {session.session_id}")
# Execution result: Session created: session-04bdwfj7u688ec96t

print("Step 1: Find installed applications...")
apps_result = session.computer.get_installed_apps(
    start_menu=True,
    desktop=False,
    ignore_system_apps=True
)
# Execution result: Found 76 applications

if not apps_result.success:
    print(f"Failed to get applications: {apps_result.error_message}")
    agent_bay.delete(session)
    raise SystemExit(1)

target_app = None
for app in apps_result.data:
    if "chrome" in app.name.lower():
        target_app = app
        break

if not target_app:
    print("Google Chrome not found")
    agent_bay.delete(session)
    raise SystemExit(1)

print(f"Found application: {target_app.name}")
# Execution result: Found application: Google Chrome

print("Step 2: Start the application...")
start_result = session.computer.start_app(target_app.start_cmd)

if not start_result.success:
    print(f"Failed to start application: {start_result.error_message}")
    agent_bay.delete(session)
    raise SystemExit(1)

print(f"Application started, {len(start_result.data)} processes launched")
# Execution result: Application started, 6 processes launched

print("Step 3: Wait for the application window to load...")
time.sleep(5)

print("Step 4: Find the application window...")
windows_result = session.computer.list_root_windows()

if not windows_result.success:
    print(f"Failed to list windows: {windows_result.error_message}")
    agent_bay.delete(session)
    raise SystemExit(1)

app_window = None
for window in windows_result.windows:
    if target_app.name.lower() in window.title.lower():
        app_window = window
        break

if not app_window and windows_result.windows:
    app_window = windows_result.windows[0]
    print("Using the first available window")

if app_window:
    print(f"Found window: {app_window.title}")
    # Execution result: Found window: Welcome to Google Chrome
    
    print("Step 5: Control the window...")
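    try:
        # Each control method returns a BoolResult, so check success
        # before reporting that the step worked.
        activate_result = session.computer.activate_window(app_window.window_id)
        if activate_result.success:
            print("Window activated")
            # Execution result: Window activated
        else:
            print(f"Failed to activate window: {activate_result.error_message}")

        time.sleep(1)
        maximize_result = session.computer.maximize_window(app_window.window_id)
        if maximize_result.success:
            print("Window maximized")
            # Execution result: Window maximized
        else:
            print(f"Failed to maximize window: {maximize_result.error_message}")

        time.sleep(1)
        resize_result = session.computer.resize_window(app_window.window_id, 1024, 768)
        if resize_result.success:
            print("Window resized to 1024x768")
            # Execution result: Window resized to 1024x768
        else:
            print(f"Failed to resize window: {resize_result.error_message}")

    except Exception as e:
        print(f"Window control failed: {e}")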

print("Cleaning up session...")
agent_bay.delete(session)
print("Workflow complete!")
# Execution result: Workflow complete!
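
The workflow above waits a fixed five seconds for the application window to load. An alternative is to poll the window list until a matching title shows up or a deadline passes. The sketch below illustrates that idea; wait_for_window and its parameters are hypothetical and not part of the SDK. It takes the listing function as an argument and assumes only that it returns a result with success and windows, like list_root_windows.

```python
import time


def wait_for_window(list_windows, title_part, timeout_s=10.0, interval_s=0.5):
    """Poll list_windows() until a title contains title_part or time runs out.

    Returns the matching window object, or None if the deadline passes first.
    """
    needle = title_part.lower()
    deadline = time.monotonic() + timeout_s
    while True:
        result = list_windows()
        if result.success:
            for window in result.windows:
                if needle in (window.title or "").lower():
                    return window
        # The list is checked at least once before giving up.
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

A possible usage is `wait_for_window(session.computer.list_root_windows, "chrome")`, which returns as soon as the window is listed instead of always sleeping for the full interval.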

API reference

Window manager methods

| Method | Parameters | Return value | Description |
| --- | --- | --- | --- |
| list_root_windows() | timeout_ms: int = 3000 | WindowListResult | Lists all root windows |
| get_active_window() | timeout_ms: int = 3000 | WindowInfoResult | Gets the currently active window |
| activate_window() | window_id: int | BoolResult | Activates a window |
| maximize_window() | window_id: int | BoolResult | Maximizes a window |
| minimize_window() | window_id: int | BoolResult | Minimizes a window |
| restore_window() | window_id: int | BoolResult | Restores a window |
| close_window() | window_id: int | BoolResult | Closes a window |
| fullscreen_window() | window_id: int | BoolResult | Makes a window full screen |
| resize_window() | window_id: int, width: int, height: int | BoolResult | Resizes a window |
| focus_mode() | on: bool | BoolResult | Enables or disables focus mode |

Return types

WindowListResult

  • success (bool): Indicates whether the operation was successful.

  • windows (List[Window]): A list of window objects.

  • error_message (str): The error message if the operation failed.

  • request_id (str): The unique request identifier.

Window

  • window_id (int): The unique identifier of the window.

  • title (str): The window title or description text.

  • absolute_upper_left_x (Optional[int]): The X-coordinate of the upper-left corner of the window.

  • absolute_upper_left_y (Optional[int]): The Y-coordinate of the upper-left corner of the window.

  • width (Optional[int]): The window width in pixels.

  • height (Optional[int]): The window height in pixels.

  • pid (Optional[int]): The ID of the process that owns the window.

  • pname (Optional[str]): The name of the process that owns the window.

  • child_windows (List[Window]): A list of child windows.

WindowInfoResult

  • success (bool): Indicates whether the operation was successful.

  • window (Window): The window object.

  • error_message (str): The error message if the operation failed.

  • request_id (str): The unique request identifier.

BoolResult

  • success (bool): Indicates whether the operation was successful.

  • data (bool): The result data of the operation.

  • error_message (str): The error message if the operation failed.

  • request_id (str): The unique request identifier.
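
All of the result types above share success, error_message, and request_id, so one small helper can turn any failed result into an exception that carries the request ID for troubleshooting. The following is a sketch under that assumption. ensure_ok and OperationError are hypothetical names, not part of the SDK.

```python
class OperationError(RuntimeError):
    """Raised when a result object reports success == False (hypothetical)."""


def ensure_ok(result, action):
    """Return result unchanged if it succeeded; otherwise raise OperationError."""
    if not result.success:
        raise OperationError(
            f"{action} failed: {result.error_message} "
            f"(request_id={result.request_id})"
        )
    return result
```

For example, `ensure_ok(session.computer.maximize_window(window_id), "Maximize window")` would raise on failure instead of requiring an if/else after every call.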