What tools does an AI browser agent need?

A complete browser agent needs four tool categories: navigation (visit URL, back, forward, open/switch tab), interaction (click, type, select, check, hover), scrolling (up, down, to element), and extraction (read page, get element text, extract structured data, get URL). Prophet ships 18 such tools, enough to cover most single-page and multi-page workflows.

How does an AI agent decide which browser tool to use?

The language model receives your goal plus the current page's accessibility tree, then emits a structured tool call specifying the tool name and parameters (for example, click the element with role "button" and name "Sign Up"). After the extension executes the call and returns the result, the model decides whether to chain another tool or finish. There is no rigid decision tree; selection is goal-driven.

What is the difference between clicking and typing tools in AI browser agents?

Click activates buttons, links, checkboxes, dropdowns, and any other element with an interactive role, issuing one synthetic event per call. Type (or fill) targets editable inputs, clears any existing content, then enters the specified text character by character. Robust agents also expose a Select Option tool for native and ARIA dropdowns, since the way dropdown items respond to clicks varies by implementation.

How does an AI agent extract data from a web page?

Three layered tools handle extraction. Read Page Content returns the full text content structured by accessibility-tree hierarchy. Extract Structured Data pulls labeled fields (prices, headings, table rows) into JSON. Get Element Text targets one specific element by role and name. The agent chooses whichever returns the minimum tokens to satisfy your question.

Can AI browser agents handle multi-step workflows?

Yes. The agent loops: read page state, call a tool, check the result, decide the next tool. A request like "fill out the contact form and submit it" typically chains 4-6 tool calls (focus name, type name, focus email, type email, click submit, verify confirmation). Each step is informed by the previous step's outcome, so dynamic states like validation errors are handled mid-loop.

What AI Browser Agents Actually Do: 18 Tools Behind Click, Type & Extract

When you ask an AI assistant to "fill out this form" or "find the pricing on this page," the AI needs concrete capabilities to translate your instruction into browser actions. These capabilities are called tools: discrete functions that the AI agent can invoke to interact with the web page. Prophet ships with 18 built-in tools that cover the full range of browser interactions, from simple clicks to complex data extraction.

Understanding what these tools do and how they work helps you give better instructions and get more reliable results. This guide explains each category of tools, how the AI decides which one to use, and how to prompt for the best outcomes.

How AI Agent Tools Work

Before diving into specific tools, it helps to understand the basic mechanism. When you send a message to Prophet, the AI model (Claude) analyzes your request and the current page state. If the request requires interacting with the page, the model generates a "tool call," a structured instruction specifying which tool to use and what parameters to pass.

For example, if you say "click the Sign Up button," the model reads the page's accessibility tree, identifies the element labeled "Sign Up" with a button role, and generates a tool call like: click element with role "button" and name "Sign Up". The extension executes this tool call against the live page in your browser and returns the result to the model, which then decides whether to take another action or respond to you.
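To make the idea of a structured instruction concrete, here is a minimal sketch of the "Sign Up" tool call as data. The `ToolCall` shape and its field names (`tool`, `params`) are assumptions chosen for illustration; the article does not specify Prophet's actual wire format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolCall:
    """Hypothetical tool-call shape: which tool to run and with what parameters."""
    tool: str
    params: dict = field(default_factory=dict)


# The "Sign Up" example, expressed as structured data instead of prose.
call = ToolCall(tool="click", params={"role": "button", "name": "Sign Up"})

# A serialized form that a model could emit and an extension could parse.
payload = json.dumps(asdict(call), sort_keys=True)
print(payload)
```

The point of the structure is that the extension never has to interpret free text: it reads the tool name, looks up the matching function, and passes the parameters through.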

This tool-use loop can execute multiple steps in sequence. A request like "fill out the contact form with my name and email, then submit it" might involve five tool calls: focusing the name field, typing the name, focusing the email field, typing the email, and clicking the submit button. The model chains these tools together automatically based on your high-level instruction.
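The loop described here can be sketched in a few lines. Everything in this example is a stand-in: `scripted_model` replaces the real language model with a fixed plan, and the two tools operate on a plain dictionary instead of a live page. Focusing is folded into the typing tool to keep the sketch short.

```python
from typing import Callable, Optional


def make_tools(form: dict) -> dict[str, Callable[[dict], str]]:
    """Hypothetical tool registry; each tool takes params and returns a result string."""
    def type_text(params: dict) -> str:
        form[params["field"]] = params["text"]
        return f"typed into {params['field']}"

    def click(params: dict) -> str:
        if params["name"] == "Submit":
            form["submitted"] = True
        return f"clicked {params['name']}"

    return {"type": type_text, "click": click}


def scripted_model(history: list[str]) -> Optional[dict]:
    """Stand-in for the model: picks the next call based on the results seen so far."""
    plan = [
        {"tool": "type", "params": {"field": "name", "text": "John"}},
        {"tool": "type", "params": {"field": "email", "text": "john@example.com"}},
        {"tool": "click", "params": {"name": "Submit"}},
    ]
    return plan[len(history)] if len(history) < len(plan) else None


def run_agent(model, tools, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        call = model(history)
        if call is None:  # the model decides it is finished
            break
        result = tools[call["tool"]](call["params"])
        history.append(result)  # each result is fed back before the next decision
    return history


form_state: dict = {}
results = run_agent(scripted_model, make_tools(form_state))
print(results)
```

The `max_steps` cap is a design choice worth keeping in any real loop, since it bounds the damage if the model never decides it is done.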

Navigation Tools

Navigation tools control where the browser goes and what page is loaded.

Navigate to URL opens a specified URL in the current tab. This is used when the AI needs to visit a specific page, such as a help article, a dashboard, or a search results page. The tool waits for the page to finish loading before returning control to the model.

Go back / Go forward mirrors the browser's back and forward buttons. The AI uses these when it needs to return to a previous page after checking something, or when navigating through a multi-step process that requires moving between pages.
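Back and forward operate on the tab's session history, which behaves like a list with a cursor. The `History` class below is a toy model written for this article, not a browser API; it shows why going forward stops being possible once a new URL is visited after going back.

```python
class History:
    """Toy model of one tab's session history (illustrative only)."""

    def __init__(self, start_url: str) -> None:
        self.entries = [start_url]
        self.index = 0

    @property
    def current(self) -> str:
        return self.entries[self.index]

    def navigate(self, url: str) -> None:
        # Visiting a new URL discards any forward entries.
        self.entries = self.entries[: self.index + 1] + [url]
        self.index += 1

    def back(self) -> bool:
        if self.index == 0:
            return False  # nothing to go back to
        self.index -= 1
        return True

    def forward(self) -> bool:
        if self.index == len(self.entries) - 1:
            return False  # nothing to go forward to
        self.index += 1
        return True


history = History("https://example.com/form")
history.navigate("https://example.com/help")
history.back()     # return to the form after checking the help page
print(history.current)
```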

Open new tab creates a new browser tab with a specified URL. This is useful when the AI needs to reference information from another page without losing the current page's state, for example when looking up a help article while filling out a form.

Switch tab changes the active tab to one the AI has previously opened or that was already open. This enables workflows that span multiple pages, like comparing information across different sites.

Interaction Tools

Interaction tools let the AI agent manipulate elements on the current page.

Click is the most frequently used tool. It activates buttons, links, checkboxes, dropdowns, and any other clickable element on the page. The AI identifies the target element through the accessibility tree, using a combination of the element's role (button, link, checkbox), its accessible name (the text label), and its position in the page structure. This approach is often more reliable than CSS selectors or XPath because accessible names tend to persist even when developers change the underlying HTML structure.
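A minimal sketch of role-and-name lookup follows. The `AXNode` type is a simplified stand-in for an accessibility tree node, and the case-insensitive name match is an assumption made for the example, not a documented Prophet behavior.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class AXNode:
    """Simplified accessibility-tree node (illustrative, not a real browser type)."""
    role: str
    name: str = ""
    children: list["AXNode"] = field(default_factory=list)


def walk(node: AXNode) -> Iterator[AXNode]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_all(root: AXNode, role: str, name: str) -> list[AXNode]:
    """Return every node whose role matches and whose name matches, ignoring case."""
    wanted = name.strip().lower()
    return [n for n in walk(root) if n.role == role and n.name.strip().lower() == wanted]


sign_up_button = AXNode("button", "Sign Up")
page = AXNode("main", children=[
    AXNode("navigation", children=[AXNode("link", "Sign Up")]),
    AXNode("form", "Create account", children=[sign_up_button, AXNode("button", "Log In")]),
])

matches = find_all(page, "button", "sign up")
print(len(matches))
```

Note that the link and the button share the name "Sign Up" and only the role tells them apart, which is why the lookup uses both. Nothing in the lookup depends on tag names, class names, or nesting depth, so a markup refactor that keeps the role and label intact leaves it working.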

Type / Fill enters text into input fields, text areas, and other editable elements. The AI first identifies the target input field through its label or placeholder text, focuses it, and then enters the specified text. This tool handles clearing existing content before typing, which is important when editing pre-filled forms.
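The clear-then-type behavior matters most on pre-filled fields. This sketch models an input as a small dataclass; `TextInput` and `fill` are hypothetical names used only to show the order of operations.

```python
from dataclasses import dataclass


@dataclass
class TextInput:
    """Stand-in for an editable field identified by its label."""
    label: str
    value: str = ""
    focused: bool = False


def fill(target: TextInput, text: str) -> str:
    """Focus the field, clear what is there, then enter the new text."""
    target.focused = True
    target.value = ""          # clear pre-filled content first
    for character in text:     # enter the text one character at a time
        target.value += character
    return target.value


email = TextInput(label="Email", value="old@example.com")
fill(email, "john@example.com")
print(email.value)
```

Without the clearing step, typing into this field would append to the old address and leave an invalid value behind.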

Select option chooses an option from a dropdown or select menu. The AI identifies the dropdown by its label, opens it, and selects the specified option by its visible text. This works with both native HTML select elements and custom dropdown components that use ARIA roles.

Check / Uncheck sets the state of checkboxes and radio buttons. The AI reads the current state (checked or unchecked) and only performs the action if the state needs to change, preventing double-toggles.
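The read-before-click guard can be sketched as follows. `Checkbox` and `set_checked` are illustrative names; the `clicks` counter exists only so the example can show that no unnecessary click happens.

```python
from dataclasses import dataclass


@dataclass
class Checkbox:
    """Stand-in for a checkbox whose click handler flips its state."""
    name: str
    checked: bool = False
    clicks: int = 0

    def click(self) -> None:
        self.checked = not self.checked
        self.clicks += 1


def set_checked(box: Checkbox, desired: bool) -> bool:
    """Bring the box to the desired state; return True only if a click was needed."""
    if box.checked == desired:
        return False  # already correct, so clicking would undo it
    box.click()
    return True


newsletter = Checkbox("Subscribe to newsletter", checked=True)
changed = set_checked(newsletter, True)   # asks for a state the box already has
print(changed, newsletter.checked, newsletter.clicks)
```

Expressing the tool as "set to this state" instead of "toggle" makes it safe to repeat, which helps when the agent retries a step.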

Hover moves the mouse over an element to trigger hover states. This is used for menus, tooltips, and other interactive elements that reveal content on hover.

Scrolling Tools

Web pages often extend beyond the visible viewport, and the AI needs to access content that is not currently visible.

Scroll down / Scroll up moves the page viewport vertically. The AI uses these tools when it needs to access content below or above the current view, or when it needs to bring a specific element into view before interacting with it.

Scroll to element scrolls the page until a specific element is visible in the viewport. This is more precise than generic scrolling and is used when the AI knows which element it needs to reach but the element is not currently visible.
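The arithmetic behind scrolling to an element is small enough to show directly. This sketch assumes all values are in pixels measured from the top of the document and that the element is no taller than the viewport; the function name and the `margin` parameter are inventions for the example.

```python
def scroll_target(scroll_y: int, viewport_height: int,
                  element_top: int, element_height: int, margin: int = 0) -> int:
    """Return the scroll offset that makes the element fully visible.

    The offset is unchanged if the element is already inside the viewport.
    """
    view_top = scroll_y
    view_bottom = scroll_y + viewport_height
    element_bottom = element_top + element_height

    if element_top >= view_top and element_bottom <= view_bottom:
        return scroll_y  # already visible, no scrolling needed
    if element_top < view_top:
        # Element is above the viewport: align its top with the viewport top.
        return max(0, element_top - margin)
    # Element is below the viewport: align its bottom with the viewport bottom.
    return element_bottom - viewport_height + margin


# A 40px-tall button 1500px down the page, seen through an 800px viewport at the top.
print(scroll_target(scroll_y=0, viewport_height=800, element_top=1500, element_height=40))
```

Scrolling only as far as needed, instead of always centering the element, keeps the surrounding context stable between steps.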

Data Extraction Tools

Extracting information from web pages is one of the most common AI agent tasks, and Prophet provides specialized tools for structured data extraction.

Read page content returns the full text content of the current page, structured by the accessibility tree's hierarchy. This gives the AI a comprehensive understanding of the page's content and structure, which it uses to answer questions, summarize content, and plan interactions.
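One plausible way to present page text "structured by hierarchy" is an indented outline, one line per node. The node dictionaries and the output format below are assumptions for illustration; the article does not describe the exact format Prophet returns.

```python
def render(node: dict, depth: int = 0) -> list[str]:
    """Flatten a tree of {'role', 'name', 'children'} dicts into indented lines."""
    label = node["role"]
    if node.get("name"):
        label += f' "{node["name"]}"'
    lines = ["  " * depth + label]
    for child in node.get("children", []):
        lines.extend(render(child, depth + 1))
    return lines


page = {
    "role": "main",
    "children": [
        {"role": "heading", "name": "Pricing"},
        {"role": "list", "children": [{"role": "listitem", "name": "Monthly plan"}]},
    ],
}

outline = render(page)
print("\n".join(outline))
```

Indentation carries the parent-child relationships, so the model can tell that "Monthly plan" belongs to the list under the "Pricing" heading without seeing any HTML.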

Extract structured data pulls specific data from the page in a structured format. When you ask "what are the prices listed on this page?" the AI uses this tool to identify price-related elements and return them in a structured way that preserves the relationship between items and their prices.
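Preserving the relationship between items and prices amounts to pairing each cell with its column header. The sketch below does this for a table expressed as nested dictionaries; the table contents are made-up sample values and `extract_rows` is a hypothetical helper, not Prophet's extraction logic.

```python
import json


def extract_rows(table: dict) -> list[dict]:
    """Turn a table node (header row first) into one record per data row."""
    header_row, *data_rows = table["children"]
    headers = [cell["name"] for cell in header_row["children"]]
    records = []
    for row in data_rows:
        values = [cell["name"] for cell in row["children"]]
        records.append(dict(zip(headers, values)))  # pair each value with its header
    return records


def make_row(cell_role: str, *names: str) -> dict:
    """Small helper to build a row node for the sample table."""
    return {"role": "row", "children": [{"role": cell_role, "name": n} for n in names]}


# Sample data invented for this example.
pricing_table = {
    "role": "table",
    "children": [
        make_row("columnheader", "Plan", "Price"),
        make_row("cell", "Monthly", "$12"),
        make_row("cell", "Annual", "$120"),
    ],
}

records = extract_rows(pricing_table)
print(json.dumps(records, indent=2))
```

Returning records instead of a flat list of strings is what lets a follow-up question such as "which plan is cheaper per month?" be answered without re-reading the page.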

Get element text returns the text content of a specific element identified by its role and name. This is used for targeted extraction when the AI needs a specific piece of information rather than the full page content.

Get page URL returns the current page's URL. This seems simple, but it is important for the AI to know where it is, especially after navigating through multiple pages or following redirects.

How the AI Chooses Tools

The AI model does not follow a rigid decision tree when selecting tools. Instead, it evaluates your request against the current page state and selects the tool (or sequence of tools) most likely to accomplish your goal.

For a request like "find the total on this invoice," the AI will first use the read page content tool to understand the page structure, then identify the element containing the total, and return the answer. No interaction tools are needed because the task is purely informational.

For a request like "subscribe to the monthly plan," the AI reads the page to find the subscription options, clicks the monthly plan button, fills in any required form fields, and clicks the confirmation button. Each tool call is informed by the result of the previous one, allowing the AI to handle unexpected states like confirmation dialogs or additional form fields.

You can influence tool selection by being specific in your instructions. "Click the blue button at the top of the page" gives the AI less useful information than "click the Subscribe button in the pricing section." The AI identifies elements by their semantic meaning (labels, roles), not their visual appearance, so descriptions that reference functionality work better than descriptions that reference appearance.

Reliability and Error Handling

Browser automation is inherently unpredictable. Pages load slowly, elements change dynamically, and interactions can trigger unexpected states. Prophet's tools include built-in error handling for common failure modes.

Element not found: If the AI tries to click an element that does not exist, the tool returns an error message that the AI uses to re-evaluate the page and try an alternative approach.
Element not visible: If the target element is outside the viewport, the AI automatically scrolls to bring it into view before retrying the interaction.
Page loading: Navigation tools wait for the page to finish loading before returning, preventing the AI from trying to interact with elements that have not rendered yet.
Dynamic content: The accessibility tree captures the current rendered state of the page, including content loaded dynamically via JavaScript, so the AI always works with the latest page state.
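Two of the failure modes above, element not found and element not visible, can be sketched with a click tool that reports problems as data instead of raising. The `Element` type, the result dictionary, and the way scrolling is simulated are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Stand-in for a page element as seen by the click tool."""
    role: str
    name: str
    in_viewport: bool = True
    clicked: bool = False


def click(page: list[Element], role: str, name: str) -> dict:
    """Click the first matching element, scrolling first if it is off screen."""
    matches = [e for e in page if e.role == role and e.name == name]
    if not matches:
        # Report the failure as a result so the model can re-read the page and re-plan.
        return {"ok": False, "error": f"no {role} named {name!r}"}

    target = matches[0]
    steps = []
    if not target.in_viewport:
        target.in_viewport = True  # stand-in for a real scroll into view
        steps.append("scrolled")
    target.clicked = True
    steps.append("clicked")
    return {"ok": True, "steps": steps}


page = [Element("button", "Submit", in_viewport=False)]
print(click(page, "button", "Cancel"))
print(click(page, "button", "Submit"))
```

Returning an error message keeps the loop alive: the model sees why the call failed and can choose a different element or tool on the next step.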

Tips for Better Tool Usage

These practices lead to more reliable results when working with Prophet's browser tools:

Describe goals, not steps. Say "fill out the contact form with name John and email john@example.com" rather than "click the name field, type John, click the email field, type john@example.com." The AI plans better steps when it understands the goal.
Use semantic descriptions. Refer to elements by their labels and functions: "the search box," "the submit button," "the price in the first row." Avoid visual descriptions like "the red button on the left."
Break complex tasks into phases. For multi-page workflows, guide the AI through one phase at a time: "First, go to the settings page and find the notification preferences" rather than combining many steps into one instruction.
Verify results. After the AI performs actions, ask it to confirm what happened: "Did the form submit successfully?" or "What does the confirmation page say?"

These tools are what turn the AI from a chatbot into a capable browser agent, and understanding them helps you direct it well. For a walkthrough of how these tools work in practice, visit the How It Works page. For an overview of what Prophet can do across different professional workflows, explore the Use Cases directory.