This topic describes how to use the Wuying AgentBay software development kit (SDK) to manage windows on a cloud computer. You can control the status, position, and focus of windows, and interact with desktop windows in the cloud environment.
Overview
The Computer Use module provides comprehensive window management features for the desktop environment, including the following:
Window discovery - List and find windows in the system.
Window status control - Maximize, minimize, restore, and close windows.
Window positioning - Resize and reposition windows.
Focus management - Control window focus and activation status.
Desktop automation - Build complex desktop automation workflows.
Create a session
Create a session from a desktop image, such as linux_latest, before you call any window management method.
import os
from agentbay import AgentBay
from agentbay.session_params import CreateSessionParams

api_key = os.getenv("AGENTBAY_API_KEY")
if not api_key:
    raise ValueError("The AGENTBAY_API_KEY environment variable is required")

agent_bay = AgentBay(api_key=api_key)
params = CreateSessionParams(image_id="linux_latest")
result = agent_bay.create(params)
if result.success:
    session = result.session
    print(f"Session created: {session.session_id}")
else:
    print(f"Failed to create session: {result.error_message}")
    exit(1)
List windows
You can retrieve information about all available windows in the desktop environment.
result = session.computer.list_root_windows(timeout_ms=5000)
if result.success:
    windows = result.windows
    print(f"Found {len(windows)} windows")
    # Execution result: Found 0 windows (when no windows are open)
    # Or: Found 5 windows (when applications are running)
    for window in windows:
        print(f"Title: {window.title}")
        # Execution result: Title: Google Chrome
        print(f"Window ID: {window.window_id}")
        # Execution result: Window ID: 12345678
        print(f"Process: {window.pname if window.pname else 'N/A'}")
        # Execution result: Process: chrome
        print(f"PID: {window.pid if window.pid else 'N/A'}")
        # Execution result: PID: 9876
        print(f"Position: ({window.absolute_upper_left_x}, {window.absolute_upper_left_y})")
        # Execution result: Position: (100, 50)
        print(f"Size: {window.width}x{window.height}")
        # Execution result: Size: 1280x720
        print(f"Child windows: {len(window.child_windows)}")
        # Execution result: Child windows: 0
        print("---")
else:
    print(f"Error listing windows: {result.error_message}")
Parameters:
timeout_ms(int, optional): The timeout in milliseconds. Default: 3000.
Window object attributes:
window_id(int): The unique identifier of the window.
title(str): The window title or description text.
absolute_upper_left_x(Optional[int]): The X-coordinate of the upper-left corner of the window.
absolute_upper_left_y(Optional[int]): The Y-coordinate of the upper-left corner of the window.
width(Optional[int]): The window width in pixels.
height(Optional[int]): The window height in pixels.
pid(Optional[int]): The ID of the process that owns the window.
pname(Optional[str]): The name of the process that owns the window.
child_windows(List[Window]): A list of child windows.
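Because each window carries a title, a process name, and a list of child windows, a common follow-up is to search the whole window tree for a match. The following is a minimal sketch of that idea, not part of the SDK: the Window dataclass is a stand-in that copies only some of the attributes listed above, and iter_windows and find_windows are hypothetical helper names.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Window:
    """Stand-in with a subset of the attributes listed above; not the SDK class."""
    window_id: int
    title: str
    pname: Optional[str] = None
    child_windows: List["Window"] = field(default_factory=list)


def iter_windows(windows: List[Window]) -> Iterator[Window]:
    """Yield every window depth-first, parents before their children."""
    for window in windows:
        yield window
        yield from iter_windows(window.child_windows)


def find_windows(windows: List[Window], text: str) -> List[Window]:
    """Return windows whose title or process name contains text (case-insensitive)."""
    needle = text.lower()
    return [
        w for w in iter_windows(windows)
        if needle in w.title.lower() or needle in (w.pname or "").lower()
    ]


if __name__ == "__main__":
    roots = [
        Window(1, "Terminal", "xterm"),
        Window(2, "Welcome to Google Chrome", "chrome",
               child_windows=[Window(3, "Downloads", "chrome")]),
    ]
    print([w.window_id for w in find_windows(roots, "chrome")])  # prints [2, 3]
```

With a real session, the same helpers could presumably be applied to result.windows from list_root_windows(), since only title, pname, and child_windows are read.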
Window control operations
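You can control the status and position of a window. Each operation below takes a window_id obtained from list_root_windows() and returns a result whose success attribute indicates whether the operation worked.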
Activate a window
result = session.computer.list_root_windows()
if result.success and result.windows:
    window_id = result.windows[0].window_id
    activate_result = session.computer.activate_window(window_id)
    # Execution result: Window activated successfully
    if activate_result.success:
        print("Window activated successfully")
    else:
        print(f"Failed to activate window: {activate_result.error_message}")
Maximize a window
result = session.computer.list_root_windows()
if result.success and result.windows:
    window_id = result.windows[0].window_id
    maximize_result = session.computer.maximize_window(window_id)
    # Execution result: Window maximized successfully
    if maximize_result.success:
        print("Window maximized successfully")
    else:
        print(f"Failed to maximize window: {maximize_result.error_message}")
Minimize a window
result = session.computer.list_root_windows()
if result.success and result.windows:
    window_id = result.windows[0].window_id
    minimize_result = session.computer.minimize_window(window_id)
    # Execution result: Window minimized successfully
    if minimize_result.success:
        print("Window minimized successfully")
    else:
        print(f"Failed to minimize window: {minimize_result.error_message}")
Restore a window
result = session.computer.list_root_windows()
if result.success and result.windows:
    window_id = result.windows[0].window_id
    restore_result = session.computer.restore_window(window_id)
    # Execution result: Window restored successfully
    if restore_result.success:
        print("Window restored successfully")
    else:
        print(f"Failed to restore window: {restore_result.error_message}")
Resize a window
result = session.computer.list_root_windows()
if result.success and result.windows:
    window_id = result.windows[0].window_id
    resize_result = session.computer.resize_window(window_id, 800, 600)
    # Execution result: Window resized to 800x600
    if resize_result.success:
        print("Window resized to 800x600")
    else:
        print(f"Failed to resize window: {resize_result.error_message}")
Make a window full screen
result = session.computer.list_root_windows()
if result.success and result.windows:
    window_id = result.windows[0].window_id
    fullscreen_result = session.computer.fullscreen_window(window_id)
    # Execution result: Window set to full screen
    if fullscreen_result.success:
        print("Window set to full screen")
    else:
        print(f"Failed to set to full screen: {fullscreen_result.error_message}")
Close a window
# Note: Use with caution, as this permanently closes the window.
result = session.computer.list_root_windows()
if result.success and result.windows:
    window_id = result.windows[0].window_id
    close_result = session.computer.close_window(window_id)
    if close_result.success:
        print("Window closed successfully")
    else:
        print(f"Failed to close window: {close_result.error_message}")
Complete window control function
import time

def control_window(session, window_id):
    print(f"Controlling window ID: {window_id}")
    try:
        session.computer.activate_window(window_id)
        print("Window activated")
    except Exception as e:
        print(f"Activation failed: {e}")
    time.sleep(1)
    try:
        session.computer.maximize_window(window_id)
        print("Window maximized")
    except Exception as e:
        print(f"Maximization failed: {e}")
    time.sleep(1)
    try:
        session.computer.minimize_window(window_id)
        print("Window minimized")
    except Exception as e:
        print(f"Minimization failed: {e}")
    time.sleep(1)
    try:
        session.computer.restore_window(window_id)
        print("Window restored")
    except Exception as e:
        print(f"Restore failed: {e}")
    try:
        session.computer.resize_window(window_id, 800, 600)
        print("Window resized to 800x600")
    except Exception as e:
        print(f"Resize failed: {e}")

windows = session.computer.list_root_windows()
if windows.success and windows.windows:
    control_window(session, windows.windows[0].window_id)
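The function above relies on try/except, while the single-operation examples report failure through result.success. The following is a minimal sketch of a sequence runner that checks success after each step and stops at the first failure. It assumes each operation returns an object with success and error_message, as in the earlier examples. run_steps and the fake result objects are hypothetical and not part of the SDK.

```python
import time
from types import SimpleNamespace


def run_steps(steps, delay_s=1.0):
    """Run (label, callable) pairs in order and stop at the first failure.

    Each callable is assumed to return an object with `success` and
    `error_message` attributes. Returns the labels that succeeded.
    """
    completed = []
    for label, call in steps:
        result = call()
        if not result.success:
            print(f"{label} failed: {result.error_message}")
            break
        print(f"{label} succeeded")
        completed.append(label)
        time.sleep(delay_s)  # give the desktop time to settle between steps
    return completed


if __name__ == "__main__":
    # Fake results so the sketch runs without a cloud session.
    ok = SimpleNamespace(success=True, error_message="")
    bad = SimpleNamespace(success=False, error_message="window not found")
    done = run_steps(
        [("activate", lambda: ok), ("maximize", lambda: bad), ("resize", lambda: ok)],
        delay_s=0,
    )
    print(done)  # prints ['activate']
```

With a real session, a step could be written as ("activate", lambda: session.computer.activate_window(window_id)), using the calls shown earlier in this topic.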
Focus management
You can control the system's focus behavior to prevent windows from stealing focus.
try:
    session.computer.focus_mode(True)
    # Execution result: Focus mode enabled - Windows will not steal focus
    print("Focus mode enabled - Windows will not steal focus")
except Exception as e:
    print(f"Failed to enable focus mode: {e}")

try:
    session.computer.focus_mode(False)
    # Execution result: Focus mode disabled
    print("Focus mode disabled")
except Exception as e:
    print(f"Failed to disable focus mode: {e}")
Parameters:
on(bool): Set to True to enable focus mode, or False to disable it.
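Because focus mode is toggled on and off around a piece of work, it can be convenient to pair the two calls so that focus mode is always disabled again, even if the work in between fails. The following is a minimal sketch using a context manager. focus_mode_enabled and FakeComputer are hypothetical names used for illustration. The only SDK call assumed is focus_mode(on) as described above.

```python
from contextlib import contextmanager


@contextmanager
def focus_mode_enabled(computer):
    """Enable focus mode for the duration of a with-block, then disable it."""
    computer.focus_mode(True)
    try:
        yield computer
    finally:
        # Runs even if the body raises, so focus mode is not left enabled.
        computer.focus_mode(False)


class FakeComputer:
    """Hypothetical stand-in that records focus_mode calls."""

    def __init__(self):
        self.calls = []

    def focus_mode(self, on):
        self.calls.append(on)


if __name__ == "__main__":
    fake = FakeComputer()
    with focus_mode_enabled(fake):
        print("inside block:", fake.calls)  # prints inside block: [True]
    print("after block:", fake.calls)  # prints after block: [True, False]
```

With a real session, the object passed in would presumably be session.computer.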
Get the active window
You can retrieve information about the currently active window.
# Note: This operation may fail if there is no active window.
result = session.computer.get_active_window(timeout_ms=5000)
if result.success:
    active_window = result.window
    # Execution result when a window is active:
    # Active window:
    #   Title: Google Chrome
    #   Window ID: 87654321
    #   Process: chrome
    #   PID: 4321
    #   Position: (0, 0)
    #   Size: 1920x1080
    print("Active window:")
    print(f" Title: {active_window.title}")
    print(f" Window ID: {active_window.window_id}")
    print(f" Process: {active_window.pname}")
    print(f" PID: {active_window.pid}")
    print(f" Position: ({active_window.absolute_upper_left_x}, {active_window.absolute_upper_left_y})")
    print(f" Size: {active_window.width}x{active_window.height}")
else:
    # Execution result when no window is active:
    # Failed to get active window: Response error (expected result when no window is active)
    print(f"Failed to get active window: {result.error_message}")
Parameters:
timeout_ms(int, optional): The timeout in milliseconds. Default: 3000.
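Since the call can fail while no window is active, for example right after an application starts, one option is to retry a few times before giving up. The following is a minimal sketch of such a retry loop. wait_for_active_window and the fake responses are hypothetical. The sketch assumes only that the call returns an object with success, window, and error_message, as in the example above.

```python
import time
from types import SimpleNamespace


def wait_for_active_window(get_active_window, attempts=5, interval_s=1.0):
    """Call get_active_window() until it succeeds or attempts run out.

    Returns the window from the first successful result, or None.
    """
    for attempt in range(1, attempts + 1):
        result = get_active_window()
        if result.success:
            return result.window
        print(f"Attempt {attempt}/{attempts} failed: {result.error_message}")
        if attempt < attempts:
            time.sleep(interval_s)
    return None


if __name__ == "__main__":
    # Fake call that fails twice and then succeeds.
    responses = iter([
        SimpleNamespace(success=False, window=None, error_message="Response error"),
        SimpleNamespace(success=False, window=None, error_message="Response error"),
        SimpleNamespace(success=True, window=SimpleNamespace(title="Google Chrome"),
                        error_message=""),
    ])
    window = wait_for_active_window(lambda: next(responses), interval_s=0)
    print(window.title if window else "No active window")  # prints Google Chrome
```

With a real session, the callable could be lambda: session.computer.get_active_window(timeout_ms=5000).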
Complete workflow example
The following example shows how to start an application and control its window.
import os
import time
from agentbay import AgentBay
from agentbay.session_params import CreateSessionParams

api_key = os.getenv("AGENTBAY_API_KEY")
if not api_key:
    raise ValueError("The AGENTBAY_API_KEY environment variable is required")

agent_bay = AgentBay(api_key=api_key)
params = CreateSessionParams(image_id="linux_latest")
result = agent_bay.create(params)
if not result.success:
    print(f"Failed to create session: {result.error_message}")
    exit(1)

session = result.session
print(f"Session created: {session.session_id}")
# Execution result: Session created: session-04bdwfj7u688ec96t

print("Step 1: Find installed applications...")
apps_result = session.computer.get_installed_apps(
    start_menu=True,
    desktop=False,
    ignore_system_apps=True
)
# Execution result: Found 76 applications
if not apps_result.success:
    print(f"Failed to get applications: {apps_result.error_message}")
    agent_bay.delete(session)
    exit(1)

target_app = None
for app in apps_result.data:
    if "chrome" in app.name.lower():
        target_app = app
        break

if not target_app:
    print("Google Chrome not found")
    agent_bay.delete(session)
    exit(1)

print(f"Found application: {target_app.name}")
# Execution result: Found application: Google Chrome

print("Step 2: Start the application...")
start_result = session.computer.start_app(target_app.start_cmd)
if not start_result.success:
    print(f"Failed to start application: {start_result.error_message}")
    agent_bay.delete(session)
    exit(1)

print(f"Application started, {len(start_result.data)} processes launched")
# Execution result: Application started, 6 processes launched

print("Step 3: Wait for the application window to load...")
time.sleep(5)

print("Step 4: Find the application window...")
windows_result = session.computer.list_root_windows()
if not windows_result.success:
    print(f"Failed to list windows: {windows_result.error_message}")
    agent_bay.delete(session)
    exit(1)

app_window = None
for window in windows_result.windows:
    if target_app.name.lower() in window.title.lower():
        app_window = window
        break

if not app_window and windows_result.windows:
    app_window = windows_result.windows[0]
    print("Using the first available window")

if app_window:
    print(f"Found window: {app_window.title}")
    # Execution result: Found window: Welcome to Google Chrome
    print("Step 5: Control the window...")
    try:
        session.computer.activate_window(app_window.window_id)
        print("Window activated")
        # Execution result: Window activated
        time.sleep(1)
        session.computer.maximize_window(app_window.window_id)
        print("Window maximized")
        # Execution result: Window maximized
        time.sleep(1)
        session.computer.resize_window(app_window.window_id, 1024, 768)
        print("Window resized to 1024x768")
        # Execution result: Window resized to 1024x768
    except Exception as e:
        print(f"Window control failed: {e}")

print("Cleaning up session...")
agent_bay.delete(session)
print("Workflow complete!")
# Execution result: Session deleted successfully
API reference
Window manager methods
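| Method | Parameters | Return value | Description |
| list_root_windows() | timeout_ms: int = 3000 | WindowListResult | Lists all root windows |
| get_active_window() | timeout_ms: int = 3000 | WindowInfoResult | Gets the current active window |
| activate_window() | window_id: int | BoolResult | Activates a window |
| maximize_window() | window_id: int | BoolResult | Maximizes a window |
| minimize_window() | window_id: int | BoolResult | Minimizes a window |
| restore_window() | window_id: int | BoolResult | Restores a window |
| close_window() | window_id: int | BoolResult | Closes a window |
| fullscreen_window() | window_id: int | BoolResult | Makes a window full screen |
| resize_window() | window_id: int, width: int, height: int | BoolResult | Resizes a window |
| focus_mode() | on: bool | BoolResult | Toggles focus mode |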
Return types
WindowListResult
success(bool): Indicates whether the operation was successful.
windows(List[Window]): A list of window objects.
error_message(str): The error message if the operation failed.
request_id(str): The unique request identifier.
Window
window_id(int): The unique identifier of the window.
title(str): The window title or description text.
absolute_upper_left_x(Optional[int]): The X-coordinate of the upper-left corner of the window.
absolute_upper_left_y(Optional[int]): The Y-coordinate of the upper-left corner of the window.
width(Optional[int]): The window width in pixels.
height(Optional[int]): The window height in pixels.
pid(Optional[int]): The ID of the process that owns the window.
pname(Optional[str]): The name of the process that owns the window.
child_windows(List[Window]): A list of child windows.
WindowInfoResult
success(bool): Indicates whether the operation was successful.
window(Window): The window object.
error_message(str): The error message if the operation failed.
request_id(str): The unique request identifier.
BoolResult
success(bool): Indicates whether the operation was successful.
data(bool): The result data of the operation.
error_message(str): The error message if the operation failed.
request_id(str): The unique request identifier.