Discover, control, and manage application windows on an AgentBay cloud computer. List open windows, change their state, resize them, and control focus behavior.
All window management methods belong to the Computer Use module, accessed through session.computer. Each method returns a result object with success, error_message, and operation-specific fields.
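Since every result carries success and error_message, the check-and-report pattern repeated throughout this page can be factored into one place. The following is a minimal sketch, not part of the SDK: require_success and WindowOperationError are hypothetical names, and the SimpleNamespace objects only stand in for real results so the snippet runs without a session.

```python
from types import SimpleNamespace


class WindowOperationError(RuntimeError):
    """Hypothetical exception type, not part of the AgentBay SDK."""


def require_success(result, action):
    """Return `result` unchanged if it succeeded, otherwise raise.

    Relies only on the `success` and `error_message` fields that the
    result objects are documented to carry.
    """
    if not result.success:
        raise WindowOperationError(f"{action} failed: {result.error_message}")
    return result


# Stand-in results (assumption) so the sketch runs without a live session.
ok = SimpleNamespace(success=True, error_message="", data=True)
bad = SimpleNamespace(success=False, error_message="window not found", data=False)

print(require_success(ok, "maximize_window").data)
try:
    require_success(bad, "maximize_window")
except WindowOperationError as exc:
    print(exc)
```

With a live session, the same helper could wrap a call such as require_success(session.computer.maximize_window(window_id), "maximize_window").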
Workflow
1. Create a session with a cloud computer desktop environment.
2. List windows to discover open applications.
3. Select a target window by title, process name, or window ID.
4. Control the window: activate, maximize, minimize, restore, resize, or close it.
5. Manage focus to prevent other windows from interrupting automation.
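The selection step in the workflow above can be sketched as a small filter over the listed windows. This is an illustration under stated assumptions: select_window is a hypothetical helper, it reads only the window_id, title, and pname attributes, and the SimpleNamespace objects stand in for what list_root_windows() would return.

```python
from types import SimpleNamespace


def select_window(windows, *, window_id=None, title=None, pname=None):
    """Return the first window matching every criterion given, or None.

    `title` and `pname` are matched as case-insensitive substrings;
    `window_id` must match exactly. Criteria left as None are ignored.
    """
    for window in windows:
        if window_id is not None and window.window_id != window_id:
            continue
        if title is not None and title.lower() not in (window.title or "").lower():
            continue
        if pname is not None and pname.lower() not in (window.pname or "").lower():
            continue
        return window
    return None


# Stand-in window objects; a live list would come from list_root_windows().
windows = [
    SimpleNamespace(window_id=101, title="Terminal", pname="gnome-terminal"),
    SimpleNamespace(window_id=202, title="New Tab - Google Chrome", pname="chrome"),
    SimpleNamespace(window_id=303, title="Untitled", pname=None),
]

print(select_window(windows, title="chrome").window_id)
print(select_window(windows, pname="terminal").title)
print(select_window(windows, window_id=999))
```

Returning None instead of raising leaves the caller free to decide whether a missing window is an error or a reason to retry.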
Prerequisites
An AgentBay session with a desktop environment
import os

from agentbay import AgentBay, CreateSessionParams

api_key = os.getenv("AGENTBAY_API_KEY")
if not api_key:
    raise ValueError("The AGENTBAY_API_KEY environment variable is required")

agent_bay = AgentBay(api_key=api_key)
params = CreateSessionParams(image_id="linux_latest")
result = agent_bay.create(params)
if result.success:
    session = result.session
    print(f"Session created: {session.session_id}")
else:
    print(f"Failed to create session: {result.error_message}")
    exit(1)

List windows
Call list_root_windows() to get all top-level application windows on the desktop. Root windows refer to top-level application windows, not the X11 root window.
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| timeout_ms | int | No | 3000 | Timeout in milliseconds. |
Returns: WindowListResult
result = session.computer.list_root_windows(timeout_ms=5000)
if result.success:
    windows = result.windows
    print(f"Found {len(windows)} windows")
    for window in windows:
        print(f"Title: {window.title}")
        print(f"Window ID: {window.window_id}")
        print(f"Process: {window.pname if window.pname else 'N/A'}")
        print(f"PID: {window.pid if window.pid else 'N/A'}")
        print(f"Position: ({window.absolute_upper_left_x}, {window.absolute_upper_left_y})")
        print(f"Size: {window.width}x{window.height}")
        print(f"Child windows: {len(window.child_windows)}")
        print("---")
else:
    print(f"Error listing windows: {result.error_message}")

Window object attributes
| Attribute | Type | Description |
|---|---|---|
| window_id | int | Unique identifier of the window. |
| title | str | Window title or description text. |
| absolute_upper_left_x | Optional[int] | X-coordinate of the upper-left corner. |
| absolute_upper_left_y | Optional[int] | Y-coordinate of the upper-left corner. |
| width | Optional[int] | Window width in pixels. |
| height | Optional[int] | Window height in pixels. |
| pid | Optional[int] | Process ID of the window owner. |
| pname | Optional[str] | Process name of the window owner. |
| child_windows | List[Window] | Child windows nested under this window. |
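Because child_windows is itself a list of Window objects, the result of list_root_windows() is a tree rather than a flat list. A minimal sketch of walking that tree depth-first follows; iter_windows is a hypothetical helper, and the Window dataclass here is a stand-in that mirrors only a subset of the attributes in the table above, not the SDK class.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Window:
    """Stand-in mirroring a subset of the documented attributes (not the SDK class)."""
    window_id: int
    title: str
    pname: Optional[str] = None
    child_windows: List["Window"] = field(default_factory=list)


def iter_windows(windows, depth=0) -> Iterator[tuple]:
    """Yield (depth, window) pairs depth-first, parents before children."""
    for window in windows:
        yield depth, window
        yield from iter_windows(window.child_windows, depth + 1)


dialog = Window(12, "Save As")
editor = Window(10, "Editor", "gedit", [Window(11, "Toolbar"), dialog])
desktop = [editor, Window(20, "Terminal", "gnome-terminal")]

# Print an indented outline of the window tree.
for depth, window in iter_windows(desktop):
    print("  " * depth + f"{window.window_id}: {window.title}")
```

The depth value is only there for display; code that just needs every window can ignore it.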
Control window state
All window control methods take a window_id parameter (int) and return a BoolResult. Get the window_id from list_root_windows() first:
result = session.computer.list_root_windows()
if result.success and result.windows:
    window_id = result.windows[0].window_id

Activate a window
Bring a window to the foreground and give it input focus.
activate_result = session.computer.activate_window(window_id)
if activate_result.success:
    print("Window activated successfully")
else:
    print(f"Failed to activate window: {activate_result.error_message}")

Maximize a window
Expand a window to fill the entire screen area.
maximize_result = session.computer.maximize_window(window_id)
if maximize_result.success:
    print("Window maximized successfully")
else:
    print(f"Failed to maximize window: {maximize_result.error_message}")

Minimize a window
Hide a window to the taskbar.
minimize_result = session.computer.minimize_window(window_id)
if minimize_result.success:
    print("Window minimized successfully")
else:
    print(f"Failed to minimize window: {minimize_result.error_message}")

Restore a window
Return a maximized or minimized window to its previous size and position.
restore_result = session.computer.restore_window(window_id)
if restore_result.success:
    print("Window restored successfully")
else:
    print(f"Failed to restore window: {restore_result.error_message}")

Make a window full screen
Set a window to full screen mode.
fullscreen_result = session.computer.fullscreen_window(window_id)
if fullscreen_result.success:
    print("Window set to full screen")
else:
    print(f"Failed to set window to full screen: {fullscreen_result.error_message}")

Resize a window
Change a window to specific pixel dimensions.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| window_id | int | Yes | Target window identifier. |
| width | int | Yes | New width in pixels. |
| height | int | Yes | New height in pixels. |
resize_result = session.computer.resize_window(window_id, 800, 600)
if resize_result.success:
    print("Window resized to 800x600")
else:
    print(f"Failed to resize window: {resize_result.error_message}")

Close a window
Permanently close a window. Use with caution: any unsaved data in the window may be lost.
close_result = session.computer.close_window(window_id)
if close_result.success:
    print("Window closed successfully")
else:
    print(f"Failed to close window: {close_result.error_message}")

Manage focus
Call focus_mode() to prevent windows from stealing focus from the active window. This is useful during automation when background processes might open dialogs or notifications that interrupt the workflow.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| on | bool | Yes | True to enable focus mode, False to disable it. |
Returns: BoolResult
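Focus mode stays in whatever state it was last set to, so automation that fails midway can leave it enabled. One way to guard against that is a context manager that always switches it off again. This is a sketch under assumptions: focus_guard is a hypothetical helper built only on focus_mode(on) and the BoolResult fields success and error_message, and FakeComputer is a stand-in for session.computer so the snippet runs without a session.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def focus_guard(computer):
    """Enable focus mode for the duration of a `with` block.

    Hypothetical helper: it uses only focus_mode(on) and the documented
    BoolResult fields `success` and `error_message`.
    """
    result = computer.focus_mode(True)
    if not result.success:
        raise RuntimeError(f"Could not enable focus mode: {result.error_message}")
    try:
        yield
    finally:
        # Runs even if the body raises, so focus mode is not left on.
        computer.focus_mode(False)


class FakeComputer:
    """Stand-in that records calls so the sketch runs without a session."""

    def __init__(self):
        self.calls = []

    def focus_mode(self, on):
        self.calls.append(on)
        return SimpleNamespace(success=True, error_message="")


computer = FakeComputer()
with focus_guard(computer):
    print("automation steps run here")
print(computer.calls)
```

With a live session, the block would read with focus_guard(session.computer): followed by the window operations. The direct calls, without a guard, look like this: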
# Enable focus mode to prevent focus stealing
enable_result = session.computer.focus_mode(True)
if enable_result.success:
    print("Focus mode enabled - windows will not steal focus")
else:
    print(f"Failed to enable focus mode: {enable_result.error_message}")

# Disable focus mode
disable_result = session.computer.focus_mode(False)
if disable_result.success:
    print("Focus mode disabled")
else:
    print(f"Failed to disable focus mode: {disable_result.error_message}")

Get the active window
Call get_active_window() to retrieve the window that currently has focus. This method may fail if no window is active.
Returns: WindowInfoResult
result = session.computer.get_active_window()
if result.success:
    active_window = result.window
    print("Active window:")
    print(f"  Title: {active_window.title}")
    print(f"  Window ID: {active_window.window_id}")
    print(f"  Process: {active_window.pname}")
    print(f"  PID: {active_window.pid}")
    print(f"  Position: ({active_window.absolute_upper_left_x}, {active_window.absolute_upper_left_y})")
    print(f"  Size: {active_window.width}x{active_window.height}")
else:
    print(f"Failed to get active window: {result.error_message}")

Complete example
Find an application, launch it, locate its window, and control it.
import os
import time

from agentbay import AgentBay, CreateSessionParams

api_key = os.getenv("AGENTBAY_API_KEY")
if not api_key:
    raise ValueError("The AGENTBAY_API_KEY environment variable is required")

agent_bay = AgentBay(api_key=api_key)
params = CreateSessionParams(image_id="linux_latest")
result = agent_bay.create(params)
if not result.success:
    print(f"Failed to create session: {result.error_message}")
    exit(1)

session = result.session
print(f"Session created: {session.session_id}")

# Step 1: Find installed applications
print("Step 1: Find installed applications...")
apps_result = session.computer.get_installed_apps(
    start_menu=True,
    desktop=False,
    ignore_system_apps=True
)
if not apps_result.success:
    print(f"Failed to get applications: {apps_result.error_message}")
    agent_bay.delete(session)
    exit(1)

target_app = None
for app in apps_result.data:
    if "chrome" in app.name.lower():
        target_app = app
        break

if not target_app:
    print("Google Chrome not found")
    agent_bay.delete(session)
    exit(1)

print(f"Found application: {target_app.name}")

# Step 2: Start the application
print("Step 2: Start the application...")
start_result = session.computer.start_app(target_app.start_cmd)
if not start_result.success:
    print(f"Failed to start application: {start_result.error_message}")
    agent_bay.delete(session)
    exit(1)

print(f"Application started, {len(start_result.data)} processes launched")

# Step 3: Wait for the window to load
print("Step 3: Wait for the application window to load...")
time.sleep(5)

# Step 4: Find the application window
print("Step 4: Find the application window...")
windows_result = session.computer.list_root_windows()
if not windows_result.success:
    print(f"Failed to list windows: {windows_result.error_message}")
    agent_bay.delete(session)
    exit(1)

app_window = None
for window in windows_result.windows:
    if target_app.name.lower() in window.title.lower():
        app_window = window
        break

if not app_window and windows_result.windows:
    app_window = windows_result.windows[0]
    print("Using the first available window")

if app_window:
    print(f"Found window: {app_window.title}")

    # Step 5: Control the window
    print("Step 5: Control the window...")
    try:
        session.computer.activate_window(app_window.window_id)
        print("Window activated")
        time.sleep(1)
        session.computer.maximize_window(app_window.window_id)
        print("Window maximized")
        time.sleep(1)
        session.computer.resize_window(app_window.window_id, 1024, 768)
        print("Window resized to 1024x768")
    except Exception as e:
        print(f"Window control failed: {e}")

# Clean up
print("Cleaning up session...")
agent_bay.delete(session)
print("Workflow complete!")

API reference
Methods
| Method | Parameters | Return type | Description |
|---|---|---|---|
| list_root_windows() | timeout_ms: int = 3000 | WindowListResult | List all top-level application windows. |
| get_active_window() | None | WindowInfoResult | Get the currently active window. |
| activate_window() | window_id: int | BoolResult | Activate a window and give it focus. |
| maximize_window() | window_id: int | BoolResult | Maximize a window. |
| minimize_window() | window_id: int | BoolResult | Minimize a window. |
| restore_window() | window_id: int | BoolResult | Restore a window from maximized or minimized state. |
| close_window() | window_id: int | BoolResult | Close a window. |
| fullscreen_window() | window_id: int | BoolResult | Set a window to full screen mode. |
| resize_window() | window_id: int, width: int, height: int | BoolResult | Resize a window to specified dimensions. |
| focus_mode() | on: bool | BoolResult | Enable or disable focus stealing prevention. |
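The complete example waits a fixed five seconds between starting the application and calling list_root_windows(). A polling loop is a common alternative: it returns as soon as the window appears and gives up after a deadline. The sketch below is an assumption-laden illustration, not SDK functionality: wait_for_window is a hypothetical helper that uses only list_root_windows() and the success, windows, and title fields, and FakeComputer stands in for session.computer.

```python
import time
from types import SimpleNamespace


def wait_for_window(computer, title_part, timeout_s=15.0, interval_s=0.5):
    """Poll list_root_windows() until a title contains `title_part`.

    Returns the matching window, or None once `timeout_s` has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = computer.list_root_windows()
        if result.success:
            for window in result.windows:
                if title_part.lower() in (window.title or "").lower():
                    return window
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)


class FakeComputer:
    """Stand-in whose window appears on the third listing."""

    def __init__(self):
        self.calls = 0

    def list_root_windows(self):
        self.calls += 1
        windows = []
        if self.calls >= 3:
            windows.append(SimpleNamespace(window_id=7, title="Google Chrome"))
        return SimpleNamespace(success=True, windows=windows, error_message="")


computer = FakeComputer()
window = wait_for_window(computer, "chrome", timeout_s=2.0, interval_s=0.01)
print(window.window_id, computer.calls)
```

The default timeout and interval are arbitrary starting points; suitable values depend on how long the target application takes to start.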
Return types
WindowListResult
| Attribute | Type | Description |
|---|---|---|
| success | bool | Whether the operation succeeded. |
| windows | List[Window] | List of window objects. |
| error_message | str | Error message if the operation failed. |
| request_id | str | Unique request identifier. |
WindowInfoResult
| Attribute | Type | Description |
|---|---|---|
| success | bool | Whether the operation succeeded. |
| window | Window | The window object. |
| error_message | str | Error message if the operation failed. |
| request_id | str | Unique request identifier. |
BoolResult
| Attribute | Type | Description |
|---|---|---|
| success | bool | Whether the operation succeeded. |
| data | bool | Result data of the operation. |
| error_message | str | Error message if the operation failed. |
| request_id | str | Unique request identifier. |
Window
| Attribute | Type | Description |
|---|---|---|
| window_id | int | Unique identifier of the window. |
| title | str | Window title or description text. |
| absolute_upper_left_x | Optional[int] | X-coordinate of the upper-left corner. |
| absolute_upper_left_y | Optional[int] | Y-coordinate of the upper-left corner. |
| width | Optional[int] | Window width in pixels. |
| height | Optional[int] | Window height in pixels. |
| pid | Optional[int] | Process ID of the window owner. |
| pname | Optional[str] | Process name of the window owner. |
| child_windows | List[Window] | List of child windows. |
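The position and size attributes are typed Optional[int], so code that does arithmetic on them needs to handle None. As a small illustration, the sketch below computes a window's center point and returns None when any geometry field is missing. window_center is a hypothetical helper, and the SimpleNamespace objects stand in for Window instances.

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def window_center(window) -> Optional[Tuple[int, int]]:
    """Return the (x, y) center of a window, or None if geometry is missing."""
    fields = (
        window.absolute_upper_left_x,
        window.absolute_upper_left_y,
        window.width,
        window.height,
    )
    if any(value is None for value in fields):
        return None
    x, y, width, height = fields
    # Integer division keeps the result on whole pixels.
    return x + width // 2, y + height // 2


placed = SimpleNamespace(
    absolute_upper_left_x=100, absolute_upper_left_y=50, width=800, height=600
)
unplaced = SimpleNamespace(
    absolute_upper_left_x=None, absolute_upper_left_y=None, width=None, height=None
)

print(window_center(placed))    # (500, 350)
print(window_center(unplaced))  # None
```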