The browser agent automatically interacts with web pages in a controlled environment. It supports page navigation, information extraction, form interaction, and screenshot feedback to extend the task-handling capabilities of agent mode. - AI Coding Assistant Lingma

The browser agent extends agent mode, allowing it to use a browser to complete tasks. It can open web pages, browse content, click buttons, fill out forms, and scroll pages in a controlled environment. It also provides screenshot feedback on the page status as needed to help you automate tasks that require direct web page access.

Simply describe your needs in natural language in a chat session, such as "Go to the official website, find the latest prices, and summarize the differences." The agent automatically invokes the browser agent when needed, without requiring you to manually switch modes or write scripts.

Core capabilities

The browser agent provides the following capabilities:

Open and navigate web pages
- Open a specific web page using a provided URL.
- Navigate to new pages or tabs within the same site, such as by clicking navigation or pagination links.
- Handle multi-step navigation tasks, such as "Open page A -> Click menu B -> Go to details page C."
Read and extract information
- Read visible text content on the current page, such as titles, paragraphs, lists, and tables.
- Extract key information from the page and provide summaries or comparisons in natural language.
- Find relevant information on a page based on your instructions, such as "Find all price-related content on this page."
Interact with pages
- Click buttons and links, switch tabs, or expand and collapse content.
- Enter text into form elements, such as input and search boxes, and submit the form.
- Scroll the page to view more content and avoid missing key information.
Visual feedback and status awareness
- Take screenshots of the current page state during complex steps, as needed, for subsequent analysis and explanation.
- Detect page status, such as page load completion, a successful form submission, or navigation to a new page, to determine the next action.

Use cases

You can use the browser agent in the following scenarios:

Information retrieval and comparison
- Visit product websites, documentation sites, or blogs to extract key information and generate summaries.
- Compare information across multiple pages or solutions, such as price, feature, or configuration differences.
Online operations and workflow walkthroughs
- Walk through a web-based workflow, such as registering an account or submitting a ticket, provided that permissions are sufficient and risks are controlled.
- Document the typical steps for using a web-based admin system and generate a draft of the instructions.
Development and testing assistance
- Open online documentation or an API reference to extract sections relevant to your current code.
- Browse a web application's interface to check its page structure, copy, or interaction logic and suggest improvements.

Specify the goal and constraints in your task description (for example, "read-only, do not submit any forms" or "only access public documentation pages") to help the agent complete the task more safely and reliably.
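As a rough illustration of the advice above, the following Python sketch joins a goal and a list of constraints into a single task description that you could paste into the chat. The helper name `build_task_description` and the "Constraints:" wording are assumptions made for this example; the browser agent does not require any particular format.

```python
def build_task_description(goal: str, constraints: list[str]) -> str:
    """Join a goal and its constraints into one natural-language task description.

    Hypothetical helper for illustration; not part of the product.
    """
    goal = goal.strip().rstrip(".")
    # Drop empty entries and trailing periods so the joined text reads cleanly.
    cleaned = [c.strip().rstrip(".") for c in constraints if c.strip()]
    if not cleaned:
        return goal + "."
    return f"{goal}. Constraints: " + "; ".join(cleaned) + "."


task = build_task_description(
    "Summarize the pricing differences on the public documentation pages",
    ["read-only, do not submit any forms", "only access public documentation pages"],
)
print(task)
```

Keeping the constraints in a list makes it easy to reuse the same safety boundaries across several tasks.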

Browser types

The browser agent supports two browser types that you can switch between as needed:

Built-in browser: A lightweight browser panel integrated into the IDE that requires no extra configuration. It is suitable for quick previews and simple page interactions.
Chrome: Executes tasks using your local Chrome browser. It supports more complex web applications and pages that require specific browser features or extensions.

You can switch the browser type in the browser agent settings.

How to use in agent mode

The browser agent is built into agent mode and requires no separate configuration. You can invoke it in two ways:

Automatic invocation: Agent mode decides when to use the browser agent based on your request.
Explicit invocation: Use the /browser command to explicitly request the browser agent.

Follow these steps to use it:

Step 1: Enter agent mode

Open the TONGJI Qoder CN chat panel and switch to agent mode.

Step 2: Describe your task

Use /browser to invoke it explicitly, or describe your needs in natural language. For example:

/browser open https://example.com and summarize its main features
/browser find the 2025 pricing plans and organize them into a table
/browser analyze the theme customization options in this component library

Step 3: Review the results

The browser agent will:

Perform the necessary web interactions
Provide a detailed explanation of the actions taken
Provide screenshots for visual verification
Present the extracted data in a structured format

Recommendations and best practices

Define clear goals and boundaries
- State the final outcome you want in a single sentence, rather than describing only one step of the process.
- For security- or permission-sensitive operations, explicitly state constraints like "do not perform submit, payment, or delete actions."
Provide a stable entry URL
- Provide a specific page URL instead of a vague search term to reduce navigation issues.
- If the task involves multiple pages, you can list the key pages or paths in the prompt.
Break down complex tasks
- For very long processes, such as a complex configuration wizard, break the task into smaller goals. Execute and verify the intermediate results step by step.
- After each stage, adjust the next prompt based on the results returned by the browser agent.
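The recommendations above (a specific entry URL, explicit constraints, and smaller goals) can be sketched as a small planning helper. This is a minimal sketch under stated assumptions: the `Stage` class, the `stage_prompts` function, and the example.com URLs are hypothetical, and the only product feature used is the /browser command described earlier.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    goal: str       # the outcome expected from this stage
    entry_url: str  # a specific page to start from (hypothetical URLs below)


def stage_prompts(stages: list[Stage], constraint: str) -> list[str]:
    """Turn each stage into its own /browser prompt, repeating the shared constraint."""
    prompts = []
    for number, stage in enumerate(stages, start=1):
        prompts.append(
            f"/browser open {stage.entry_url} and {stage.goal} "
            f"(stage {number} of {len(stages)}; {constraint})"
        )
    return prompts


prompts = stage_prompts(
    [
        Stage("list the required fields on the first wizard page", "https://example.com/setup"),
        Stage("summarize the available plan options", "https://example.com/setup/plans"),
    ],
    constraint="do not perform submit, payment, or delete actions",
)
for prompt in prompts:
    print(prompt)
```

Treat the generated prompts as a starting plan and send them one at a time, so that you can revise later stages based on what the browser agent returns.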

Security and limitations

Keep the following in mind when using the browser agent:

Permissions and privacy
- Do not use the browser agent to input or expose any sensitive information on web pages, such as passwords, access tokens, or personal data.
- For operations involving account login, payment, or data modification, perform these actions manually first. Then, let the agent perform read-only validation or generate instructions.
Page compatibility and stability
- On sites that rely heavily on front-end frameworks or complex interactions, pages may load slowly and the agent may have difficulty recognizing elements.
- If a page's structure or copy changes frequently, some steps might fail. If this happens, provide a more specific description or use a more stable entry page.
Result credibility
- The browser agent's answers are based on real-time web content. However, the content itself may not be authoritative. Verify critical information yourself before making decisions.
- For scenarios requiring legal, compliance, or high-risk business judgments, do not rely solely on the automated results from the browser agent.
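One way to act on the privacy guidance above is to check a prompt for obviously sensitive text before sending it. The sketch below is illustrative only: the function `find_sensitive` and the three regular expressions are assumptions for this example, not a product feature, and they will miss many kinds of secrets.

```python
import re

# Illustrative patterns only; they are assumptions, not a complete secret detector.
SENSITIVE_PATTERNS = {
    "password assignment": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
    "bearer token": re.compile(r"(?i)\bbearer\s+[a-z0-9._\-]{16,}"),
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def find_sensitive(prompt: str) -> list[str]:
    """Return the names of the patterns that match the prompt text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(prompt)]


prompt = "/browser open https://example.com and summarize its main features"
hits = find_sensitive(prompt)
if hits:
    print("Remove sensitive data before sending:", ", ".join(hits))
else:
    print("No obvious sensitive data found.")
```

A check like this only catches simple cases, so it complements, rather than replaces, reviewing the prompt yourself.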

With the browser agent, TONGJI Qoder CN can understand not only your code but also the web pages you are visiting. You can coordinate code editing and web operations within the same conversation, which reduces the need to switch between your browser and IDE.